ee/changelogs/unreleased/elasticsearch-word-tokens-with-underscores.yml · 8f5a80e9be642d8576f19792249360c64b2bfa67 · nexedi / gitlab-ce

Include letters, numbers & underscore always as Elasticsearch token · a0a14f6f

Dylan Griffith authored Jul 08, 2020

There are various other regexes here that try to capture tokens
in different contexts, but at the very least we should also always
greedily capture any series of letters, numbers and underscores.
It's OK if another regex already covers this in some cases, since
we de-duplicate tokens anyway.
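
As a rough illustration of the idea, here is a minimal plain-Ruby sketch. It is not the actual Elasticsearch analyzer configuration: the `PATTERNS` list, the `capture_tokens` helper and the sample input are all hypothetical. It only shows how a greedy catch-all pattern can sit alongside context-specific ones when duplicates are dropped afterwards.

```ruby
# Hypothetical patterns for illustration only; the real regexes used by
# the analyzer are not reproduced here.
PATTERNS = [
  /[A-Z]?[a-z]+/,   # assumed context-specific pattern: lowercase word pieces
  /[\p{L}\p{N}_]+/  # catch-all: greedy run of letters, numbers and underscores
].freeze

# Run every pattern over the text, collect all matches, and drop duplicates
# so that overlapping patterns do not produce repeated tokens.
def capture_tokens(text)
  PATTERNS.flat_map { |pattern| text.scan(pattern) }.uniq
end

tokens = capture_tokens("def my_method_name(arg1)")
puts tokens.inspect
```

In this sketch the first pattern alone would only yield the pieces `my`, `method` and `name`, while the catch-all also yields the whole identifier `my_method_name`. Tokens matched by both patterns, such as `def`, appear once thanks to `uniq`.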

The test included in this change is an example of a token that we don't
capture correctly today. Such tokens are common in Ruby, so we should
cover it.


elasticsearch-word-tokens-with-underscores.yml 146 Bytes
