Search Lucene with precise edit distances
I want to search a Lucene index by edit distance. For example, say there is a document with a field first_name; I want all documents whose first names are within 1 edit distance of, say, 'john'.
I know that Lucene supports fuzzy searches (first_name:john~), which take a number between 0 and 1 to control the fuzziness. The problem (for me) is that this number does not translate directly to an edit distance. Also, when the values in the documents are short strings (fewer than 3 characters), the fuzzy search has difficulty finding them. For example, if there is a document whose first_name is 'j' and I search for first_name:i~0.0, I don't get anything back.
In Lucene's FuzzyQuery, you cannot specify the exact distance. You can only specify a "fuzziness" value between 0 and 1, where values closer to 0 indicate a broad match and values closer to 1 indicate a narrow match. The formula for "fuzziness" is as follows (from Lucene in Action):

    fuzziness = 1 - distance / min(textlen, targetlen)
From the formula, you can work out the approximate fuzziness for a given distance. So, for stackoverflow to be matched with stackunderflow, which are at a distance of 3, the fuzziness required is approximately 0.77.
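To make the arithmetic concrete, here is a minimal sketch that computes the Levenshtein distance between two strings and plugs it into the formula 1 - distance / min(textlen, targetlen). The names FuzzinessSketch, editDistance, and similarity are made up for illustration and are not part of Lucene's API, and Lucene's own scoring may differ in details (for instance, how a prefix length is handled), so treat it as an approximation rather than the library's implementation.

```java
public class FuzzinessSketch {

    // Plain dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions each cost 1).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // fuzziness = 1 - distance / min(textlen, targetlen).
    // Returning 0.0 for an empty string is an assumption of this sketch.
    static double similarity(String text, String target) {
        int minLen = Math.min(text.length(), target.length());
        if (minLen == 0) {
            return 0.0;
        }
        return 1.0 - (double) editDistance(text, target) / minLen;
    }

    public static void main(String[] args) {
        // Distance 3, shorter length 13: roughly 0.77.
        System.out.println(similarity("stackoverflow", "stackunderflow"));
        // Distance 1, shorter length 1: exactly 0.0.
        System.out.println(similarity("j", "i"));
    }
}
```

The second call hints at why very short values are hard to match: with one-character strings, a single edit already drives the computed value down to 0.0, so if a term must score strictly above the requested fuzziness (an assumption about the matching rule), even first_name:i~0.0 would not return 'j'.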