.net - Lucene PorterStemmer question -


Given the following code:

    Dim stemmer As New Lucene.Net.Analysis.PorterStemmer()
    Response.Write(stemmer.Stem("mattress table") & "<br />") ' outputs: mattress t
    Response.Write(stemmer.Stem("mattress") & "<br />")       ' outputs: mattress
    Response.Write(stemmer.Stem("table") & "<br />")          ' outputs: tabl

Could someone explain why the PorterStemmer produces different results when there is a space in the word? I was expecting 'mattress table' to be stemmed to 'mattress tabl'.

Also, what confuses me further is the following code:

    Dim parser As Lucene.Net.QueryParsers.QueryParser = New Lucene.Net.QueryParsers.QueryParser("myfield", New PorterStemmerAnalyzer)
    Dim q As Lucene.Net.Search.Query = parser.Parse("mattress table")
    Response.Write(q.ToString & "<br />") ' outputs: myfield:mattress myfield:tabl
    q = parser.Parse("""mattress table""")
    Response.Write(q.ToString & "<br />") ' outputs field:"mattress tabl"

Could someone explain why I am getting different results from QueryParser() and the Stem() function for the same word(s) when using the same analyzer?

Thanks, Kyle

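PorterStemmerAnalyzer is composed of a series of tokenizers and filters. PorterStemmer is only one of the filters applied to the TokenStream that is generated. The query parser therefore stems each token separately, whereas calling Stem() directly treats the whole string, space included, as a single word. If you want to verify that, try changing the case of the query: the QueryParser output will be in lowercase due to the LowerCaseFilter on the TokenStream.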

Some sample code for a custom analyzer can be checked here. It will give you a peek inside the analyzer.
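
To make the difference concrete, here is a rough Python sketch (not Lucene.Net code) of what is probably happening. It assumes the analyzer splits on whitespace, lowercases, and then stems each token. The names `toy_stem` and `analyze` are hypothetical, and `toy_stem` implements only two of the Porter rules: removing "able" when the remaining stem has measure m > 1 (part of step 4), and removing a final "e" (step 5a). Those two rules seem to be enough to reproduce the outputs in the question. Passed as one string, "mattress table" ends in "able" and the remaining stem "mattress t" has m = 2, so the suffix is removed; for "table" alone the stem "t" has m = 0, so only the final "e" is dropped.

```python
VOWELS = set("aeiou")


def is_consonant(word, i):
    """Porter's definition: anything that is not a vowel, with special handling of 'y'."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a vowel only when it follows a consonant
        return i == 0 or not is_consonant(word, i - 1)
    return True  # note: a space also counts as a consonant here


def measure(stem):
    """Return m in the form [C](VC)^m[V], i.e. the number of vowel-to-consonant transitions."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def ends_cvc(stem):
    """True if the stem ends consonant-vowel-consonant and the last letter is not w, x or y."""
    n = len(stem)
    if n < 3:
        return False
    return (is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")


def toy_stem(word):
    """Reduced stand-in for a Porter stemmer: only the 'able' rule and the final-'e' rule."""
    # Step 4 (one suffix only): drop "able" when the remaining stem has m > 1
    if word.endswith("able") and measure(word[:-4]) > 1:
        word = word[:-4]
    # Step 5a: drop a final "e" when m > 1, or when m == 1 and the stem does not end cvc
    if word.endswith("e"):
        stem = word[:-1]
        m = measure(stem)
        if m > 1 or (m == 1 and not ends_cvc(stem)):
            word = stem
    return word


def analyze(text):
    """Assumed analyzer pipeline: lowercase, split on whitespace, stem each token."""
    return [toy_stem(token) for token in text.lower().split()]


print(toy_stem("mattress table"))  # mattress t  (whole string stemmed as one word)
print(toy_stem("table"))           # tabl
print(analyze("Mattress Table"))   # ['mattress', 'tabl']  (each token stemmed separately)
```

So to get 'mattress tabl' from the stemmer directly, split the input into words first and stem each word on its own, which is what the analyzer does before the query is built.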

