george February 2016

lucene query special characters

I have trouble understanding the handling of special characters in Lucene.
My analyzer has no stop words, so that special characters are not removed:

// Empty stop-word set (ignoreCase = true): no token is dropped as a stop word
CharArraySet stopwords = new CharArraySet(0, true);
return new GermanAnalyzer(stopwords);

Then I create docs like:

doc.add(new TextField("tags", "23", Store.NO));
doc.add(new TextField("tags", "Brüder-Grimm-Weg", Store.NO));

The query tags:brüder\-g works fine, but the fuzzy query tags:brüder\-g~ does not return anything. If the street name were Eselgasse, the query tags:Esel~ would work fine.
I use Lucene 5.3.1.

Thanks for help!

Answers


femtoRgon February 2016

Fuzzy Queries (as well as wildcard or regex queries) are not analyzed by the QueryParser.

If you are using StandardAnalyzer, for instance, "Brüder-Grimm-Weg" will be indexed as three terms, "brüder", "grimm", and "weg". So, after analysis you have:

  • "tags:brüder\-g" --> tags:brüder tags:g
    This matches on tags:brüder

  • "tags:brüder\-g~" --> tags:brüder-g~2
    Since this is not analyzed, it remains a single term, and you have no matches, since there is no single term in your index like "brüder-g"
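
As a rough illustration of the two cases above, here is a plain-Java sketch with no Lucene dependency. The `analyze` and `queryTerms` helpers are hypothetical stand-ins, not Lucene APIs: `analyze` only splits on non-alphanumeric characters and lowercases, which is a simplification of what a real analyzer does, and `queryTerms` assumes that a trailing `~` marks a fuzzy term that skips analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalysisSketch {

    // Hypothetical stand-in for an analyzer: split on anything that is not a
    // letter or digit, then lowercase. Real Lucene analyzers do more than this
    // (GermanAnalyzer, for example, also normalizes and stems).
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Hypothetical model of the query side: a term ending in '~' is treated as
    // fuzzy and kept whole; any other query text goes through analyze().
    public static List<String> queryTerms(String queryText) {
        if (queryText.endsWith("~")) {
            List<String> single = new ArrayList<>();
            single.add(queryText.substring(0, queryText.length() - 1));
            return single;
        }
        return analyze(queryText);
    }

    public static void main(String[] args) {
        // Index side: the hyphenated street name becomes several terms.
        System.out.println("indexed: " + analyze("Brüder-Grimm-Weg"));
        // Query side: the plain query is split, the fuzzy one stays one term.
        System.out.println("plain:   " + queryTerms("brüder-g"));
        System.out.println("fuzzy:   " + queryTerms("brüder-g~"));
    }
}
```

The point of the sketch is only that the index never contains a hyphenated term, while the fuzzy query still carries one. To see what your actual analyzer emits, inspect its token stream directly rather than relying on this model.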
