Commits · d93f6633d49296818d1361d6617e23e8244d2480 · nexedi / MariaDB

26 Jul, 2024 40 commits

mhnsw: auto-tune efConstruction · d93f6633

Sergei Golubchik authored Jul 22, 2024

* remove hard-coded ef_construction_multiplier
* instead, let ef_construction go up and down automatically as needed
* "as needed" means that expanding the queue changes the result much
* "much" is defined by the queue stiffness, as in Hooke's law
* also search_layer() now returns only as many elements as needed, the
  caller no longer needs to overallocate result arrays for throwaway nodes
* change _downheap() to return the position where the element ended up
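
To illustrate the last bullet, here is a minimal sketch of a sift-down that reports where the element landed. It is not the MariaDB Queue<> code: the free function `downheap`, the 0-based min-heap layout and the `std::vector<float>` storage are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical min-heap sift-down over a 0-based array.
// Moves the element at index `pos` down until the heap property holds
// and returns the index where that element ended up.
inline std::size_t downheap(std::vector<float> &heap, std::size_t pos)
{
  assert(pos < heap.size());
  const std::size_t n= heap.size();
  for (;;)
  {
    std::size_t smallest= pos;
    const std::size_t left= 2 * pos + 1, right= left + 1;
    if (left < n && heap[left] < heap[smallest])
      smallest= left;
    if (right < n && heap[right] < heap[smallest])
      smallest= right;
    if (smallest == pos)
      return pos;                       // final position of the element
    std::swap(heap[pos], heap[smallest]);
    pos= smallest;
  }
}
```

A caller can use the returned index to tell whether replacing the top of the queue actually changed anything near the top.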

d93f6633

cleanup: FVector · a6c88428
Sergei Golubchik authored Jul 20, 2024
```
make FVector great again
```
a6c88428

mhnsw: store coordinates in 16 bits, not 32 · 1bca1fc5

Sergei Golubchik authored Jul 19, 2024

use int16_t instead of floats, they're faster and smaller.
but perform intermediate SIMD calculations with floats to avoid overflows.
recall drop with such a scheme is below 0.002, often none.

int8_t would've been better but the precision loss is too big
and recall degrades too much.
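
A minimal scalar sketch of the idea, assuming a per-vector scale factor (the commit message does not say how values are scaled, and the real code uses SIMD). `QuantizedVector`, `quantize` and `dot_product` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct QuantizedVector
{
  float scale;                // stored value * scale ~= original float (assumed scheme)
  std::vector<int16_t> dims;  // 16 bits per coordinate instead of 32
};

inline QuantizedVector quantize(const std::vector<float> &v)
{
  float max_abs= 0;
  for (float x : v)
    max_abs= std::fmax(max_abs, std::fabs(x));
  QuantizedVector q;
  q.scale= max_abs > 0 ? max_abs / 32767.0f : 1.0f;
  q.dims.reserve(v.size());
  for (float x : v)
    q.dims.push_back(static_cast<int16_t>(std::lround(x / q.scale)));
  return q;
}

inline float dot_product(const QuantizedVector &a, const QuantizedVector &b)
{
  assert(a.dims.size() == b.dims.size());
  float sum= 0;  // float accumulator: a sum of int16*int16 products could overflow an integer
  for (std::size_t i= 0; i < a.dims.size(); i++)
    sum+= static_cast<float>(a.dims[i]) * static_cast<float>(b.dims[i]);
  return sum * a.scale * b.scale;
}
```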

1bca1fc5

UPDATE/DELETE post-fixes · d029a2e0
Sergei Golubchik authored Jul 18, 2024

d029a2e0
cleanup: prepare_for_insert() -> prepare_for_modify() · 95a59fee
Sergei Golubchik authored Jul 18, 2024
```
make handler::prepare_for_insert() be called to prepare
the handler for all writes: INSERT/UPDATE/DELETE.
```
95a59fee

MDEV-33408 Initial support for vector DELETE and UPDATE · 9de28556

Hugo Wen authored Jul 16, 2024

When the source row is deleted, mark the corresponding node in HNSW
index by setting `tref` to null. An index is added for the `tref` in
secondary table for faster searching of the to-be-marked nodes.

The nodes marked as deleted will still be used for search, but will not
be included in the final query results.

As skipping deleted nodes, and not adding deleted nodes to newly inserted
nodes' neighbor lists, could impact the performance, we now only skip
these nodes in search results.
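
The behavior described above can be sketched as a filter applied only when the final result is built. This is an illustration under assumptions, not the patch itself: `GraphNode` and `final_results` are hypothetical, and `tref` is modeled as a plain pointer that is null for a deleted row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct GraphNode
{
  const char *tref;          // reference to the source row, nullptr once the row is deleted
  float distance_to_target;
};

// `candidates` are the nodes found by the graph walk, closest first.
// Deleted nodes took part in the walk, they are only dropped here.
inline std::vector<const char *>
final_results(const std::vector<GraphNode> &candidates, std::size_t limit)
{
  std::vector<const char *> out;
  for (std::size_t i= 0; i < candidates.size() && out.size() < limit; i++)
  {
    assert(i == 0 || candidates[i - 1].distance_to_target <=
                     candidates[i].distance_to_target);
    if (candidates[i].tref == nullptr)
      continue;              // marked as deleted: hidden from the query result
    out.push_back(candidates[i].tref);
  }
  return out;
}
```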

- for some reason the bitmap is not set for hlindex during the delete so
  I had to temporarily comment out one line

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.

9de28556

non-SIMD fallback · 11a37311
Sergei Golubchik authored Jul 14, 2024

11a37311

mhnsw: inter-statement shared cache · 41334e56

Sergei Golubchik authored Jul 17, 2024

* preserve the graph in memory between statements
* keep it in a TABLE_SHARE, available for concurrent searches
* nodes are generally read-only, walking the graph doesn't change them
* distance to target is cached, calculated only once
* SIMD-optimized bloom filter detects visited nodes
* nodes are stored in an array, not List, to better utilize bloom filter
* auto-adjusting heuristic to estimate the number of visited nodes
  (to configure the bloom filter)
* many threads can concurrently walk the graph. MEM_ROOT and Hash_set
  are protected with a mutex, but walking doesn't need them
* up to 8 threads can concurrently load nodes into the cache,
  nodes are partitioned into 8 mutexes (8 is chosen arbitrarily, might
  need tuning)
* concurrent editing is not supported though
* this is fine for MyISAM, TL_WRITE protects the TABLE_SHARE and the
  graph (note that TL_WRITE_CONCURRENT_INSERT is not allowed, because an
  INSERT into the main table means multiple UPDATEs in the graph)
* InnoDB uses secondary transaction-level caches linked in a list
  in thd->ha_data via a fake handlerton
* on rollback the secondary cache is discarded, on commit nodes
  from the secondary cache are invalidated in the shared cache
  while it is exclusively locked
* on savepoint rollback both caches are flushed. this can be improved
  in the future with a row visibility callback
* graph size is controlled by @@mhnsw_cache_size, the cache is flushed
  when it reaches the threshold
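
The "8 mutexes" bullet can be sketched roughly as follows. This is a simplified illustration, not the shared cache from the commit: `PartitionedNodeCache`, the `id % 8` partitioning and the `std::unordered_map` storage are assumptions, and unlike the real cache this sketch takes the partition lock on every lookup.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <vector>

class PartitionedNodeCache
{
public:
  static constexpr std::size_t PARTITIONS= 8;  // arbitrary, as in the commit message
  using Loader= std::function<std::vector<float>(uint64_t)>;

  // Returns the cached vector for `id`, calling `load` at most once per node.
  // Threads loading nodes from different partitions do not block each other.
  const std::vector<float> &get(uint64_t id, const Loader &load)
  {
    assert(load);
    Partition &p= parts[id % PARTITIONS];
    std::lock_guard<std::mutex> guard(p.lock);
    auto it= p.nodes.find(id);
    if (it == p.nodes.end())
      it= p.nodes.emplace(id, load(id)).first;
    return it->second;  // references into unordered_map survive rehashing
  }

private:
  struct Partition
  {
    std::mutex lock;
    std::unordered_map<uint64_t, std::vector<float>> nodes;
  };
  std::array<Partition, PARTITIONS> parts;
};
```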

41334e56

mhnsw: change storage format · 656845ef

Sergei Golubchik authored Jun 18, 2024

instead of one row per node per layer, have one row per node.
store all neighbors for all layers in that row, and the vector itself too

it completely avoids searches in the graph table and
will allow implementing deletions in the future

656845ef

mhnsw: closest neighbor precalc heuristic · 5147ca6f

Sergei Golubchik authored Jun 13, 2024

This is based on the heuristic that if a candidate neighbor
has a very close neighbor of its own, then this close neighbor
is also likely a candidate neighbor itself.

Meaning, we might replace the loop that compares a candidate
with all neighbors if we know the distance between the candidate
and its closest neighbor. Which can be precalculated.

This gives the most speedup when the number of neighbors
and the number of dimensions are large. In the tests it was
2.5-3x speedup, with the recall being worse by 0.1%-1%

Incidentally, in the opposite case it gives both little speedup
and notably worse recall. Tests have shown 1.13x speedup
with recall going down by ~20% in the worst - smallest - case.

Thus, this heuristic is only enabled above a certain threshold.
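
A rough sketch of how such a shortcut could look next to the usual HNSW neighbor-selection loop. It is an interpretation of the description above, not the committed code: `Candidate`, `select_neighbors`, the exact discard condition and the rule that the shortcut applies only once a neighbor has been selected are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate
{
  std::vector<float> coords;
  float dist_to_target;            // distance to the node being inserted
  float dist_to_closest_neighbor;  // precalculated and stored with the node
};

inline float euclidean(const std::vector<float> &a, const std::vector<float> &b)
{
  assert(a.size() == b.size());
  float sum= 0;
  for (std::size_t i= 0; i < a.size(); i++)
    sum+= (a[i] - b[i]) * (a[i] - b[i]);
  return std::sqrt(sum);
}

// `candidates` must be sorted by dist_to_target, closest first.
// Returns indexes into `candidates` of the selected neighbors.
inline std::vector<std::size_t>
select_neighbors(const std::vector<Candidate> &candidates,
                 std::size_t max_neighbors, bool use_precalc)
{
  std::vector<std::size_t> selected;
  for (std::size_t i= 0;
       i < candidates.size() && selected.size() < max_neighbors; i++)
  {
    const Candidate &c= candidates[i];
    bool discard= false;
    if (use_precalc && !selected.empty())
    {
      // Shortcut: one comparison instead of a loop over `selected`.
      discard= c.dist_to_closest_neighbor < c.dist_to_target;
    }
    else
    {
      // Exact check: drop c if it is closer to a selected neighbor than to the target.
      for (std::size_t s : selected)
        if (euclidean(c.coords, candidates[s].coords) < c.dist_to_target)
        {
          discard= true;
          break;
        }
    }
    if (!discard)
      selected.push_back(i);
  }
  return selected;
}
```

The exact path costs one distance computation per selected neighbor, so the saving grows with both the number of neighbors and the number of dimensions, which matches the benchmark observation in the message.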

5147ca6f

mhnsw: return an error if lazy neighbor read failed · 1165e6a6
Sergei Golubchik authored Jun 12, 2024

1165e6a6
mhnsw: SIMD for euclidean distance · 86a3ac12
Sergei Golubchik authored Jun 06, 2024

86a3ac12

mhnsw: configurable parameters · 1249210b

Sergei Golubchik authored Jun 11, 2024

1. introduce alpha. the value of 1.1 is optimal, so hard-code it.

2. define ef and efConstruction in terms of limit and M, that is,
   ef = ef_limit_multiplier * limit
   efConstruction = ef_construction_multiplier * M

3. ef_construction_multiplier=4 is almost always optimal, so
   hard-code it too

4. rename hnsw_max_connection_per_layer to mhnsw_max_edges_per_node
   (max_connection is rather ambiguous in MariaDB) and add a help text

5. rename hnsw_ef_search to mhnsw_limit_multiplier and add a help text
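
The two formulas from item 2 written out as a small sketch. `HnswParams` and `derive_params` are hypothetical names, and rounding up to an integer is an assumption, the commit message only gives the products.

```cpp
#include <cassert>
#include <cmath>

struct HnswParams
{
  unsigned ef;               // candidate queue size during search
  unsigned ef_construction;  // candidate queue size during insert
};

inline HnswParams derive_params(unsigned limit, unsigned M,
                                double ef_limit_multiplier,
                                double ef_construction_multiplier= 4.0)
{
  assert(ef_limit_multiplier > 0 && ef_construction_multiplier > 0);
  HnswParams p;
  p.ef= static_cast<unsigned>(std::ceil(ef_limit_multiplier * limit));
  p.ef_construction=
    static_cast<unsigned>(std::ceil(ef_construction_multiplier * M));
  return p;
}
```

For example, LIMIT 10 with a limit multiplier of 2 and M = 6 gives ef = 20 and efConstruction = 24.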

1249210b

InnoDB support for hlindexes and mhnsw · a91bc7b7

Sergei Golubchik authored Jun 08, 2024

* mhnsw:
  * use primary key, innodb loves it (and the index cannot have dupes anyway)
    * MyISAM is ok with that, performance-wise
  * must be ha_rnd_init(0) because we aren't going to scan
    * MyISAM resets the position on ha_rnd_init(0) so query it before
    * oh, and use the correct handler, just in case
  * HA_ERR_RECORD_IS_THE_SAME is no error
* innodb:
  * return ref_length on create
  * don't assume table->pos_in_table_list is set
  * ok, assume away, but only for system versioned tables
* set alter_info on create (InnoDB needs to check for FKs)
* pair external_lock/external_unlock correctly

a91bc7b7

mhnsw: cache start node too, don't push too much in pg_discard · 3804afad
Sergei Golubchik authored Jun 07, 2024

3804afad
bugfix: properly reset db_plugin when hlindex discovery fails · fd80cb58
Sergei Golubchik authored Jun 07, 2024
```
otherwise it'll be free'd twice
```
fd80cb58
mhnsw: build indexes with the columns of exactly right size · 23d76688
Sergei Golubchik authored Jun 07, 2024

23d76688
cleanups · f43f9204
Sergei Golubchik authored Jun 07, 2024

f43f9204
mhnsw: remove EXTEND_CANDIDATES and KEEP_PRUNED_CONNECTIONS · e903541e
Sergei Golubchik authored Jun 06, 2024

e903541e
mhnsw: search intermediate layers with ef=1 · c532a0c1
Sergei Golubchik authored Jun 06, 2024
```
also add missing candidates.empty();
```
c532a0c1
mhnsw: fix the heuristic neighbor selection algorithm · 03239fd1
Sergei Golubchik authored Jun 05, 2024

03239fd1
mhnsw: don't prefix blob ref array with its length · 5dab4964
Sergei Golubchik authored Jun 05, 2024

5dab4964
mhnsw: don't create many empty layers · 696d2705
Sergei Golubchik authored Jun 05, 2024

696d2705
mhnsw: remove a redundant loop and ha_update_row · 76549358
Sergei Golubchik authored Jun 05, 2024

76549358
mhnsw: modify target's neighbors directly · f661b93c
Sergei Golubchik authored Jun 05, 2024

f661b93c
mhnsw: cache neighbors too · f6bc9879
Sergei Golubchik authored Jun 04, 2024

f6bc9879
mhnsw: don't guess whether it's insert or update · 69839ff8
Sergei Golubchik authored Jun 05, 2024
```
we know it every time
```
69839ff8

mhnsw: refactor FVector* classes · 691b3f26

Sergei Golubchik authored Jun 04, 2024

Now there's an FVector class which is a pure vector, an array of floats.
It doesn't necessarily correspond to a row in the table, and usually
there is only one FVector instance - the one we're searching for.

And there's an FVectorNode class, which is a node in the graph.
It has a ref (identifying a row in the source table), possibly an array
of floats (or not — in which case it will be read lazily from the
source table as needed). There are many FVectorNodes and they're
cached to avoid re-reading them from the disk.
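
The split between the two classes might look roughly like this. Only the class names come from the commit message, the members, the `RowReader` callback and the use of Euclidean distance are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// A pure vector: an array of floats, not tied to any table row.
class FVector
{
public:
  explicit FVector(std::vector<float> c) : coords(std::move(c)) {}
  float distance_to(const FVector &other) const
  {
    assert(coords.size() == other.coords.size());
    float sum= 0;
    for (std::size_t i= 0; i < coords.size(); i++)
      sum+= (coords[i] - other.coords[i]) * (coords[i] - other.coords[i]);
    return std::sqrt(sum);
  }
private:
  std::vector<float> coords;
};

// A graph node: a row reference plus a vector that is read on first use.
class FVectorNode
{
public:
  using RowReader= std::function<std::vector<float>(uint64_t)>;
  explicit FVectorNode(uint64_t row_ref) : ref(row_ref) {}
  const FVector &vec(const RowReader &read_row)
  {
    if (!loaded)  // lazy read from the source table, done once
      loaded.reset(new FVector(read_row(ref)));
    return *loaded;
  }
private:
  uint64_t ref;
  std::unique_ptr<FVector> loaded;
};
```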

691b3f26

mhnsw: fix memory management · 960f8ac3
Sergei Golubchik authored Jun 03, 2024
```
move everything into a query-local memroot which is freed at the end
```
960f8ac3

mhnsw: simplify memory management of returned results · 57db6c20

Sergei Golubchik authored Jun 03, 2024

instead of pointers to FVectorRef's (which are stored elsewhere)
let's return one big array of all refs. Freeing this array will
free the complete result set.

57db6c20

misc changes · f5e2c4cf

Sergei Golubchik authored Jun 01, 2024

* sysvars should be REQUIRED_ARG
* fix a mix of US and UK spelling (use US)
* use consistent naming
* work if VEC_DISTANCE arguments are in the swapped order (const, col)
* work if VEC_DISTANCE argument is NULL/invalid or wrong length
* abort INSERT if the value is invalid or wrong length
* store the "number of neighbors" in a blob in endianness-independent way
* use field->store(longlong, bool) not field->store(double)
* a lot more error checking everywhere
* cleanup after errors
* simplify calling conventions, remove reinterpret_cast's
* todo/XXX comments
* whitespaces
* use float consistently
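
As an illustration of the "endianness-independent way" bullet, a count can be written byte by byte in a fixed order instead of copying the in-memory integer. The helper names and the 16-bit little-endian layout are assumptions, the commit message does not state the width or byte order used.

```cpp
#include <cassert>
#include <cstdint>

// Writes `v` as two bytes, least significant first, regardless of host byte order.
inline void store_u16_le(unsigned char *to, uint16_t v)
{
  assert(to != nullptr);
  to[0]= static_cast<unsigned char>(v & 0xFF);
  to[1]= static_cast<unsigned char>(v >> 8);
}

// Reads back a value written by store_u16_le().
inline uint16_t load_u16_le(const unsigned char *from)
{
  assert(from != nullptr);
  return static_cast<uint16_t>(from[0] | (from[1] << 8));
}
```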

memory management is still totally PoC quality

Initial HNSW implementation

f5e2c4cf

Initial HNSW implementation · 5f6880c7

Vicențiu Ciorbaru authored Feb 17, 2024

This commit includes the work done in collaboration with Hugo Wen from
Amazon:

    MDEV-33408 Alter HNSW graph storage and fix memory leak

    This commit changes the way HNSW graph information is stored in the
    second table. Instead of storing connections as separate records, it now
    stores neighbors for each node, leading to significant performance
    improvements and storage savings.

    Comparing with the previous approach, the insert speed is 5 times faster,
    search speed improves by 23%, and storage usage is reduced by 73%, based
    on ann-benchmark tests with random-xs-20-euclidean and
    random-s-100-euclidean datasets.

    Additionally, in previous code, vector objects were not released after
    use, resulting in excessive memory consumption (over 20GB for building
    the index with 90,000 records), preventing tests with large datasets.
    Vectors are now released appropriately during the insert and
    search functions. Note there are still some vectors that need to be
    cleaned up after search query completion. Needs to be addressed in a
    future commit.

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

As well as the commit:

    Introduce session variables to manage HNSW index parameters

    Three variables:

    hnsw_max_connection_per_layer
    hnsw_ef_constructor
    hnsw_ef_search

    ann-benchmark tool is also updated to support these variables in commit
    https://github.com/HugoWenTD/ann-benchmarks/commit/e09784e for branch
    https://github.com/HugoWenTD/ann-benchmarks/tree/mariadb-configurable

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.
Co-authored-by: Hugo Wen <wenhug@amazon.com>

5f6880c7

cleanup: simplify Queue<>, add const · 3f6348f0
Vicențiu Ciorbaru authored May 29, 2024
```
also add const to methods in List<> and Hash_set<>
while we're at it
```
3f6348f0
cleanup: C++11 range-based for loop for Hash_set<> · 9523b2e5
Sergei Golubchik authored Jul 12, 2024

9523b2e5

initial support for vector indexes · 42d0579c

Sergei Golubchik authored Jan 17, 2024

MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitation:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

added support for so-called "high level indexes", they are not visible
to the storage engine, implemented on the sql level. For every such
index in a table, say, t1, the server implicitly creates a second
table named like t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, not accessible directly,
doesn't go into the table cache, needs no MDLs.
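
The naming scheme from the example above, as a small sketch. `hlindex_table_name` is a hypothetical helper, and the two-digit zero padding is inferred from the single "05" example rather than from the server code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the name of the implicit second table for index number `index_no`.
inline std::string hlindex_table_name(const std::string &table, unsigned index_no)
{
  assert(index_no < 100);  // two digits assumed, following the "05" example
  char suffix[8];
  std::snprintf(suffix, sizeof(suffix), "#i#%02u", index_no);
  return table + suffix;
}
```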

MDEV-33406 basic optimizer support for k-NN searches

for a query like SELECT ... ORDER BY func() optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.

42d0579c

cleanup: init_tmp_table_share(bool thread_specific) · c1558680

Sergei Golubchik authored Jul 18, 2024

let the caller tell init_tmp_table_share() whether the table
should be thread_specific or not.

In particular, internal tmp tables created in the slave thread
are perfectly thread specific

c1558680

cleanup: thd->alloc<>() and thd->calloc<>() · 537abed6

Sergei Golubchik authored Jun 01, 2024

create templates

  thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)

and the same for thd->calloc(). By default the type is char,
so old usage of thd->alloc(size) works too.
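
A minimal sketch of such templates on a stand-in allocator. `Arena` is hypothetical and uses malloc instead of a MEM_ROOT, only the `alloc<X>(n)` / `calloc<X>(n)` call shape and the `char` default follow the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

class Arena  // hypothetical stand-in for THD, frees everything at once
{
public:
  Arena()= default;
  Arena(const Arena &)= delete;
  Arena &operator=(const Arena &)= delete;
  ~Arena()
  {
    for (void *p : blocks)
      std::free(p);
  }

  // alloc<X>(n) instead of (X*)alloc(sizeof(X)*n); plain alloc(size) still returns char*
  template <typename T= char> T *alloc(std::size_t n)
  {
    assert(n > 0);
    void *p= std::malloc(sizeof(T) * n);
    if (p)
      blocks.push_back(p);
    return static_cast<T *>(p);
  }

  // Same, but zero-filled. Intended for trivially copyable types only.
  template <typename T= char> T *calloc(std::size_t n)
  {
    T *p= alloc<T>(n);
    if (p)
      std::memset(p, 0, sizeof(T) * n);
    return p;
  }

private:
  std::vector<void *> blocks;
};
```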

537abed6

Revert "MDEV-15458 Segfault in heap_scan() upon UPDATE after ADD SYSTEM VERSIONING" · dcc5fcd8

Sergei Golubchik authored Feb 09, 2024

This partially reverts 43623f04

Engines have to set ::position() after ::write_row(), otherwise
the server won't be able to refer to the row just inserted.
This is important for high-level indexes.

heap part isn't reverted, so heap doesn't support high-level indexes.
to fix this, it'll need info->lastpos in addition to info->current_ptr

dcc5fcd8

cleanup: unused function argument · f45e431e
Sergei Golubchik authored Jan 26, 2024

f45e431e
open frm for DROP TABLE · 11dc5c45
Sergei Golubchik authored Jan 27, 2024
```
needed to get partitioning and information about
secondary objects
```
11dc5c45