a guest
Mar 29th, 2017
Background:
I have a field called "brand" with the following mapping:

"brand": {
  "analyzer": "swedish",
  "fields": {
    "search": {
      "analyzer": "search_analyzer",
      "type": "string"
    }
  },
  "type": "string"
}

The custom search_analyzer looks like this:

"analyzer": {
  "search_analyzer": {
    "filter": [
      "lowercase",
      "swedish_stop",
      "swedish_stemmer",
      "swedish_folding",
      "split_words"
    ],
    "tokenizer": "standard",
    "type": "custom"
  }
},
"filter": {
  "split_words": {
    "type": "word_delimiter"
  },
  "swedish_folding": {
    "type": "icu_folding",
    "unicodeSetFilter": "[^åäöÅÄÖ]"
  },
  "swedish_stemmer": {
    "language": "swedish",
    "type": "stemmer"
  },
  "swedish_stop": {
    "stopwords": "__swedish__",
    "type": "stop"
  }
}
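
The effect of the lowercase and folding filters on a single word can be approximated outside Elasticsearch. This is only a rough sketch: it assumes that NFKD decomposition followed by removal of combining marks is close enough to icu_folding for accented Latin letters (the real filter folds considerably more), and the names `fold` and `KEEP` are made up for this illustration. The stop, stemmer and word_delimiter filters are not modelled.

```python
import unicodedata

# Characters excluded from folding, mirroring the unicodeSetFilter "[^åäöÅÄÖ]".
# Normalized to NFC so each entry is a single precomposed code point.
KEEP = set(unicodedata.normalize("NFC", "åäöÅÄÖ"))


def fold(text):
    """Strip diacritics from every character not in KEEP, then lowercase."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in KEEP:
            out.append(ch)  # Swedish letters pass through untouched
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        # Drop combining marks (accents) and keep the base character.
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out).lower()


print(fold("Lékué"))
print(fold("Möbel"))
```

Under these assumptions the accented brand name folds to plain ASCII while å, ä and ö survive, which matches the intent of the unicodeSetFilter setting.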

Using the analyze API we can see that the analyzer works as expected (diacritics stripped in this case):

GET /<index>/_analyze?field=brand.search&text=Lékué
{
  "tokens": [
    {
      "end_offset": 5,
      "position": 0,
      "start_offset": 0,
      "token": "lekue",
      "type": "<ALPHANUM>"
    }
  ]
}
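
When the text is passed in the query string as above, non-ASCII characters have to be percent-encoded as UTF-8 by whatever client sends the request. A small sketch of building that path with the Python standard library; `analyze_path` and the index name `my_index` are hypothetical and only stand in for the elided <index>:

```python
from urllib.parse import urlencode


def analyze_path(index, field, text):
    """Build the _analyze request path with a UTF-8 percent-encoded query string."""
    query = urlencode({"field": field, "text": text})
    return f"/{index}/_analyze?{query}"


# The same helper can be used for both spellings to compare the resulting tokens.
print(analyze_path("my_index", "brand.search", "Lékué"))
print(analyze_path("my_index", "brand.search", "lekue"))
```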

The term vectors for a document with the brand set to "Lékué" also show the folded term:

GET /<index>/product/<id>/_termvector?fields=brand.search
{
  ...
  "found": true,
  "term_vectors": {
    "brand.search": {
      "field_statistics": {
        "doc_count": 20305,
        "sum_doc_freq": 29968,
        "sum_ttf": 29976
      },
      "terms": {
        "lekue": {
          "term_freq": 1,
          "tokens": [
            {
              "end_offset": 5,
              "position": 0,
              "start_offset": 0
            }
          ]
        }
      }
    }
  },
  "took": 1
}

The problem:
I can't find any documents when I search for the normalized brand "lekue".
I use the following query DSL:

GET /<index>/product/_search
{
  "query": {
    "match": {
      "brand.search": "lekue"
    }
  }
}

Which returns:
{
  ...
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

However, if I change "lekue" to the non-normalized form "Lékué", I find the documents (2):

GET /<index>/product/_search
{
  "query": {
    "match": {
      "brand.search": "Lékué"
    }
  }
}

Response:
{
  ...
  "hits": {
    "total": 2,
    "max_score": 12.780518,
    "hits": [...]
  }
}
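
The analyze output quoted earlier covers only the accented input, so it does not show which term the analyzer produces for the query text "lekue". One way to narrow this down is to compare the tokens from an _analyze response with the terms in a term vector response. The sketch below assumes both responses are already parsed into Python dicts; the helper names are hypothetical, and the sample data is limited to the excerpts quoted in this post.

```python
def analyzed_tokens(analyze_response):
    """Return the token strings from an _analyze response."""
    return [t["token"] for t in analyze_response["tokens"]]


def indexed_terms(termvector_response, field):
    """Return the set of indexed terms for one field of a term vector response."""
    return set(termvector_response["term_vectors"][field]["terms"])


def missing_terms(analyze_response, termvector_response, field):
    """List analyzed tokens that do not occur among the indexed terms."""
    indexed = indexed_terms(termvector_response, field)
    return [tok for tok in analyzed_tokens(analyze_response) if tok not in indexed]


# Excerpts of the responses quoted above (elided parts left out).
analyze_accented = {"tokens": [{"token": "lekue", "position": 0,
                                "start_offset": 0, "end_offset": 5,
                                "type": "<ALPHANUM>"}]}
termvector = {"found": True,
              "term_vectors": {"brand.search": {"terms": {"lekue": {"term_freq": 1}}}}}

# An empty list means every analyzed token exists as an indexed term.
print(missing_terms(analyze_accented, termvector, "brand.search"))
```

Passing the _analyze response for text=lekue through `missing_terms` would show whether the query side ends up with a different term than the one indexed; that response is not part of this post, so no result is claimed here.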