    Include letters, numbers & underscore always as Elasticsearch token · a0a14f6f
    Dylan Griffith authored
    There are various other regexes here that try to capture tokens in
    different contexts, but at the very least we should always greedily
    capture any run of letters, numbers and underscores as a token.
    It's OK if another regex already covers this in some cases, since
    we de-duplicate tokens anyway.
    
    The test included in this change is an example where we don't correctly
    capture this token today. It is a common example in Ruby, so we should
    cover it.
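
The idea in the message above can be sketched roughly as follows. This is a minimal Python illustration, not the actual Elasticsearch analyzer configuration touched by this commit: `CONTEXT_PATTERNS`, `WORD_PATTERN` and `extract_tokens` are hypothetical names, and the two context-specific regexes are made-up stand-ins for the "various other regexes". It only shows an always-on pattern for runs of letters, numbers and underscores being applied alongside the others, with overlapping matches de-duplicated.

```python
import re

# Hypothetical context-specific patterns, standing in for the existing
# regexes mentioned in the commit message (not the real ones).
CONTEXT_PATTERNS = [
    re.compile(r'"([^"]+)"'),          # text inside double quotes
    re.compile(r"\.([A-Za-z_]\w*)"),   # identifier directly after a dot
]

# Always-on pattern: greedily capture runs of letters, numbers, underscores.
WORD_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def extract_tokens(text):
    """Collect tokens from every pattern, then de-duplicate in order."""
    tokens = []
    for pattern in CONTEXT_PATTERNS:
        tokens.extend(m.group(1) for m in pattern.finditer(text))
    tokens.extend(WORD_PATTERN.findall(text))
    # dict.fromkeys keeps the first occurrence of each token, so a token
    # matched by more than one pattern is only emitted once.
    return list(dict.fromkeys(tokens))


if __name__ == "__main__":
    print(extract_tokens('user.full_name = "a_b"'))
```

Because the final step de-duplicates, adding the word pattern is harmless when a context-specific regex has already produced the same token.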
elasticsearch-word-tokens-with-underscores.yml 146 Bytes