i_like_robots February 2016

Optimising ElasticSearch aggregated search suggestions

I'm working on implementing an autocomplete field where the suggestions also contain the number of matching documents.

I have implemented this with a terms aggregation and an include filter. For instance, when a user typing 'Chrysler' has entered 'Chry', the following query may be generated:

{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                ...
            ]
        }
    },
    "aggs": {
        "filtered": {
            "filter": {
                ...
            },
            "aggs": {
                "suggestions": {
                    "terms": {
                        "field": "prefLabel",
                        "include": "Chry.*",
                        "min_doc_count": 0
                    }
                }
            }
        }
    }
}
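For reference, here is a minimal Python sketch of how such a query body might be built from the user's input. The `include` value is treated as a regular expression, so typed characters such as `.` or `(` probably need escaping before `.*` is appended. The function names and the `match_all` placeholders standing in for the elided `must` clauses and filter are assumptions for illustration, not part of the original query.

```python
import json

# Characters with special meaning in Lucene regular expressions.
LUCENE_REGEX_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape regex metacharacters so the input matches literally."""
    return "".join("\\" + ch if ch in LUCENE_REGEX_RESERVED else ch for ch in text)


def build_suggestion_query(prefix, must_clauses, agg_filter):
    """Build the aggregation-only request body for a typed prefix.

    must_clauses and agg_filter are hypothetical stand-ins for the parts
    elided in the query above.
    """
    return {
        "size": 0,
        "query": {"bool": {"must": must_clauses}},
        "aggs": {
            "filtered": {
                "filter": agg_filter,
                "aggs": {
                    "suggestions": {
                        "terms": {
                            "field": "prefLabel",
                            # Prefix match: the escaped input followed by anything.
                            "include": escape_lucene_regex(prefix) + ".*",
                            "min_doc_count": 0,
                        }
                    }
                },
            }
        },
    }


# Placeholder clauses only; the real ones depend on the application.
query = build_suggestion_query("Chry", [{"match_all": {}}], {"match_all": {}})
print(json.dumps(query, indent=4))
```

Escaping matters mostly for labels containing punctuation; for plain alphabetic input the generated `include` is the same as in the hand-written query.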

This works fine and I am able to get the data I need. However, I am concerned that this is not very well optimised and more could be done when the documents are indexed.

Currently we have the following mapping:

{
    ...
    "prefLabel":{
        "type":"string",
        "index":"not_analyzed"
    }
}

And I am wondering whether to add an analysed field, like so:

{
    ...
    "prefLabel":{
        "type":"string",
        "index":"not_analyzed",
        "copy_to":"searchLabel"
    },
    "searchLabel":{
        "type":"string",
        "analyzer":"???"
    }
}

So my question is: what is the best index-time analyser for this? (Or is this just crazy?)

Answers


Artur Nowak February 2016

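I think an edge n-gram tokenizer would speed things up, since the prefixes of each label are then stored as terms at index time rather than matched with a regular expression at query time: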

curl -XPUT 'localhost:9200/test_ngram' -d '{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "suggester_analyzer" : {
                    "tokenizer" : "ngram_tokenizer",
                    "filter" : [ "lowercase" ]
                }
            },
            "tokenizer" : {
                "ngram_tokenizer" : {
                    "type" : "edgeNGram",
                    "min_gram" : "2",
                    "max_gram" : "7",
                    "token_chars": [ "letter", "digit" ]
                }
            }
        }
    },
    "mappings": {
        ...
             "searchLabel": {
                 "type": "string",
                 "index_analyzer": "suggester_analyzer",
                 "search_analyzer": "standard"
             }
        ...
    }
}'
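To make the effect of these settings concrete, here is a rough Python approximation of what the tokenizer would emit with `min_gram` 2, `max_gram` 7 and `token_chars` limited to letters and digits. It is only a sketch of the tokenizer step, before any token filters, and not Elasticsearch's actual implementation; the function name `edge_ngrams` is made up for illustration.

```python
def edge_ngrams(text, min_gram=2, max_gram=7):
    """Approximate an edge n-gram tokenizer that keeps only letters and digits."""
    tokens = []
    word = []
    # A trailing space flushes the final word through the same code path.
    for ch in text + " ":
        if ch.isalnum():
            word.append(ch)
            continue
        if word:
            w = "".join(word)
            # Emit every prefix of the word from min_gram up to max_gram characters.
            for n in range(min_gram, min(max_gram, len(w)) + 1):
                tokens.append(w[:n])
            word = []
    return tokens


print(edge_ngrams("Chrysler"))
# ['Ch', 'Chr', 'Chry', 'Chrys', 'Chrysl', 'Chrysle']
```

Two things stand out in this approximation. Prefixes longer than `max_gram` are never indexed, so a user who has typed all eight letters of 'Chrysler' would not match unless `max_gram` is raised. Because the text is split on non-letter, non-digit characters, a multi-word label such as 'Fiat Chrysler' would also produce 'Chry' from its second word, so matches are per word rather than anchored to the start of the label.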

