Gramene's new FastBit engine for DAS retrievals

A while ago, Gramene set up a Distributed Annotation System (DAS) server for serving some of our rice genome annotations directly from our markers and sequences database. We soon found that the size and design of our database were not optimized for the type of real-time queries necessary to serve DAS.

One particularly thorny problem was range queries, e.g., finding all features between a start and stop position. SQL databases such as MySQL have a hard time handling these types of queries well. Our first attempt to improve performance at the database level was to implement a binning scheme, but this didn't help us enough. A second attempt was to denormalize the data into presorted (by position) flat files, one for each DAS track. Though this was substantially better for some queries (especially those near the beginning of a segment), it wasn't fast enough. After some more exploring, we found FastBit, "an efficient compressed bitmap index technology" that could turn our files into indexes that can be queried in a fraction of the time. In our testing, we've found this is sufficient to serve many of our Ensembl genome browser tracks from DAS rather than loading them into our Ensembl "core" databases. We have also considered using tabix, a "generic indexer for TAB-delimited genome position files" that also appears to scale well in tests by lab members.
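
To illustrate why a position-sorted flat file favors queries near the beginning of a segment, here is a minimal Python sketch. It is not Gramene's code and does not use FastBit or tabix; the `Feature` type, the `scan_sorted` function, and the sample coordinates are all hypothetical, and the "file" is simply an in-memory list sorted by start position.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Feature(NamedTuple):
    """One hypothetical row of a flat file: a named feature on a segment."""
    name: str
    start: int
    stop: int


def scan_sorted(
    features: Iterable[Feature], qstart: int, qstop: int
) -> Tuple[List[Feature], int]:
    """Return features overlapping [qstart, qstop] and the number of rows read.

    Assumes `features` is sorted by start position, so the scan can stop
    at the first row that starts beyond the end of the query range.
    """
    hits: List[Feature] = []
    rows_read = 0
    for feature in features:
        rows_read += 1
        if feature.start > qstop:
            break  # every later row starts even further right
        if feature.stop >= qstart:
            hits.append(feature)
    return hits, rows_read


# Made-up data: 1,000 features, each 150 bp long, starting every 100 bp.
features = [Feature(f"f{i}", i * 100 + 1, i * 100 + 150) for i in range(1000)]

# One query near the beginning of the segment and one near the end.
early_hits, early_rows = scan_sorted(features, 120, 180)
late_hits, late_rows = scan_sorted(features, 99_020, 99_080)
print(len(early_hits), early_rows)
print(len(late_hits), late_rows)
```

Both queries return the same number of features, but the second one has to read almost every row before it reaches the region of interest. An index over the position columns avoids that sequential scan, which is the kind of work a tool such as FastBit or tabix takes over.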

We hope you will find our DAS useful both on Gramene and on other servers that can take advantage of these systems. If you have any questions, please consider using our feedback form or sending us a message at gramene@gramene.org.