Search Lucene with precise edit distances


I would like to search a Lucene index by edit distance. For example, say there is a document with a field first_name; I want all documents whose first names are within 1 edit distance of, say, 'john'.

I know that Lucene supports fuzzy searches (first_name:john~), which take a number between 0 and 1 to control the fuzziness. The problem (for me) is that this number does not translate directly into an edit distance. Also, when the values in the documents are short strings (fewer than 3 characters), the fuzzy search has difficulty finding them. For example, if there is a document with first_name 'j' and I search for first_name:i~0.0, I don't get anything back.

In Lucene's FuzzyQuery, you cannot specify the exact distance. You can only specify a "fuzziness" value between 0 and 1, where values closer to 0 indicate a broad match and values closer to 1 indicate a narrow match. The formula for "fuzziness" is as follows (from Lucene in Action):

http://bit.ly/9hdvuf

From this formula, you can work out the approximate fuzziness for a given distance. So, for stackoverflow to be matched with stackunderflow, at a distance of 3, the fuzziness required is approximately 0.77.
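
To make the arithmetic concrete, here is a minimal sketch in plain Java (no Lucene dependency). The linked formula is not reproduced in this post, so the code assumes it is similarity = 1 - distance / min(length of a, length of b), which is consistent with the 0.77 figure above. The class name FuzzinessSketch and its methods are made up for illustration and are not part of Lucene's API.

```java
// Sketch only: FuzzinessSketch is a hypothetical helper, not a Lucene class.
// Assumption: similarity = 1 - editDistance / min(len(a), len(b)).
class FuzzinessSketch {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions and substitutions each cost 1).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity under the assumed formula; 1.0 means identical strings.
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.equals(b) ? 1.0 : 0.0; // avoid dividing by zero
        }
        return 1.0 - (double) editDistance(a, b) / shorter;
    }

    public static void main(String[] args) {
        // Distance 3 over a shorter length of 13: roughly 0.77.
        System.out.println(editDistance("stackoverflow", "stackunderflow"));
        System.out.println(similarity("stackoverflow", "stackunderflow"));
        // Distance 1 over a shorter length of 1: similarity drops to 0.
        System.out.println(similarity("j", "i"));
    }
}
```

Under this assumed formula, 'j' versus 'i' scores exactly 0. If Lucene only accepts terms whose similarity is strictly greater than the requested minimum, that would explain why first_name:i~0.0 returns nothing for such short values.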

