- Background:
- I have a field called "brand" with the following mapping:
- "brand": {
- "analyzer": "swedish",
- "fields": {
- "search": {
- "analyzer": "search_analyzer",
- "type": "string"
- }
- },
- "type": "string"
- }
- The custom search_analyzer and its filters are defined like this:
- "analyzer": {
- "search_analyzer": {
- "filter": [
- "lowercase",
- "swedish_stop",
- "swedish_stemmer",
- "swedish_folding",
- "split_words"
- ],
- "tokenizer": "standard",
- "type": "custom"
- }
- },
- "filter": {
- "split_words": {
- "type": "word_delimiter"
- },
- "swedish_folding": {
- "type": "icu_folding",
- "unicodeSetFilter": "[^åäöÅÄÖ]"
- },
- "swedish_stemmer": {
- "language": "swedish",
- "type": "stemmer"
- },
- "swedish_stop": {
- "stopwords": "__swedish__",
- "type": "stop"
- }
- }
- Using the _analyze API we can see that the analyzer works as expected (diacritics are stripped in this case):
- GET /<index>/_analyze?field=brand.search&text=Lékué
- {
- "tokens": [
- {
- "end_offset": 5,
- "position": 0,
- "start_offset": 0,
- "token": "lekue",
- "type": "<ALPHANUM>"
- }
- ]
- }
- The term vectors for a document with the brand set to "Lékué" also show the normalized term:
- GET /<index>/product/<id>/_termvector?fields=brand.search
- {
- ...
- "found": true,
- "term_vectors": {
- "brand.search": {
- "field_statistics": {
- "doc_count": 20305,
- "sum_doc_freq": 29968,
- "sum_ttf": 29976
- },
- "terms": {
- "lekue": {
- "term_freq": 1,
- "tokens": [
- {
- "end_offset": 5,
- "position": 0,
- "start_offset": 0
- }
- ]
- }
- }
- }
- },
- "took": 1
- }
- To the problem:
- The problem is that I can't find any documents when I search for the normalized brand "lekue".
- I use the following query DSL:
- GET /<index>/product/_search
- {
- "query": {
- "match": {
- "brand.search": "lekue"
- }
- }
- }
- Which returns:
- {
- ...
- "hits": {
- "total": 0,
- "max_score": null,
- "hits": []
- }
- }
- However, if I change "lekue" to the non-normalized form "Lékué", the search finds the two matching documents:
- GET /<index>/product/_search
- {
- "query": {
- "match": {
- "brand.search": "Lékué"
- }
- }
- }
- Response:
- {
- ...
- "hits": {
- "total": 2,
- "max_score": 12.780518,
- "hits": [...]
- }
- }
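One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: in search_analyzer the stemmer runs before the folding filter, so an accented and an unaccented spelling could be stemmed differently before folding makes them look alike. The Python sketch below only illustrates that ordering effect. toy_stem is a hypothetical stand-in (it drops one trailing unaccented "e"), not the Snowball Swedish stemmer, and fold only approximates icu_folding with the [^åäöÅÄÖ] filter.

```python
import unicodedata

KEEP = "åäöÅÄÖ"  # characters excluded from folding, as in the unicodeSetFilter


def toy_stem(token):
    # Hypothetical stand-in for a suffix-stripping stemmer (assumption):
    # drop one trailing unaccented "e" from tokens longer than four characters.
    if len(token) > 4 and token.endswith("e"):
        return token[:-1]
    return token


def fold(token):
    # Strip combining marks from every character except å, ä and ö.
    out = []
    for ch in token:
        if ch in KEEP:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def analyze(text):
    # lowercase -> stem -> fold: the same relative order as in search_analyzer.
    text = unicodedata.normalize("NFC", text).lower()
    return [fold(toy_stem(tok)) for tok in text.split()]


indexed = analyze("Lékué")
for query in ("Lékué", "lekue"):
    terms = analyze(query)
    print(query, terms, "match" if terms == indexed else "no match")
```

If the real analyzer behaves similarly, running _analyze with text=lekue against brand.search would return a token other than "lekue", and moving the folding filter ahead of the stemmer is one change that could be tried.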