Sunday, July 18, 2010

Preparing for 1.0 -- new search engine

There is a happy and sad moments recently. Happy because Google has just released Android 2.2, a very cheerful update that speeds up most applications by about 2x. My phone become super fast. Sad because the full-text search powered by SQLite FTS3 does not work anymore.

So I think, while the have-to-make-index-first thing is not so good too, it's better to make another search engine that doesn't use SQLite. Instead, the words are searched by scanning the Bible text and finding substrings. If there are multiple words, the subsequent words are looked for from the earlier result.

Performance was not satisfying - on the first correct attempt it took 2.5 minutes to search one word when running in the emulator (~40 seconds on Nexus One). The slowness I found are in:

  1. Reading of Bible text data which uses custom UTF8 decoder (the internal UTF8 decoder was even slower!)
  2. Call to String#indexOf for each verse
  3. Case-insensitive matching that uses toLowerCase
After the following optimizations, it took only 3 seconds to search a word on Nexus One.
  1. Currently the Bible data is all ASCII, I skip the UTF8 decoder altogether and convert the bytes using String(byte[], int)
  2. Instead of calling it for each verse, I call it for each chapter, if something is found, then the verses are identified.
  3. Because of number 1, I can use simpler methods to convert upper case letters to lower case just by adding 0x20 to all uppercase letters.
The downside of this custom search engine is that it's longer to search (it was < 1 second when using FTS3 engine) and no support for * and quotes. The advantages are no additional space required to store the index, the words need not be exact, and we can filter OT and NT.

Searching for two non-consecutive words

No comments:

Post a Comment