Commits · 960f8ac3861e65d9bf18a93157f530fe7c575547 · nexedi / MariaDB

26 Jul, 2024 27 commits

mhnsw: fix memory management · 960f8ac3
Sergei Golubchik authored Jun 03, 2024
```
move everything into a query-local memroot which is freed at the end
```

mhnsw: simplify memory management of returned results · 57db6c20

Sergei Golubchik authored Jun 03, 2024

instead of pointers to FVectorRef's (which are stored elsewhere)
let's return one big array of all refs. Freeing this array will
free the complete result set.
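
The idea in this message can be pictured with a minimal sketch: the search returns one contiguous array that owns every ref, so releasing that array releases the whole result set. Everything here is an assumption made for illustration: `VectorRef` is a hypothetical stand-in for `FVectorRef`, and `ResultSet`, `make_result_set` and `result_at` are invented names, not the server's code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the FVectorRef named in the commit message.
struct VectorRef
{
  std::size_t row_id;   // illustrative payload only
  float distance;
};

// One contiguous allocation owns every ref in the result set.
struct ResultSet
{
  std::unique_ptr<VectorRef[]> refs;
  std::size_t count= 0;
};

// Copy the candidates into a single array, so the caller has exactly
// one object to release instead of one pointer per result.
inline ResultSet make_result_set(const std::vector<VectorRef> &candidates)
{
  ResultSet rs;
  rs.count= candidates.size();
  rs.refs.reset(new VectorRef[rs.count]);
  for (std::size_t i= 0; i < rs.count; i++)
    rs.refs[i]= candidates[i];
  return rs;
}

// Bounds-checked access to one ref of the result set.
inline const VectorRef &result_at(const ResultSet &rs, std::size_t i)
{
  assert(i < rs.count);
  return rs.refs[i];
}
```

Destroying the `ResultSet` (or calling `refs.reset()`) frees all refs at once, which is the property the message describes.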

misc changes · f5e2c4cf

Sergei Golubchik authored Jun 01, 2024

* sysvars should be REQUIRED_ARG
* fix a mix of US and UK spelling (use US)
* use consistent naming
* work if VEC_DISTANCE arguments are in the swapped order (const, col)
* work if VEC_DISTANCE argument is NULL/invalid or wrong length
* abort INSERT if the value is invalid or wrong length
* store the "number of neighbors" in a blob in endianness-independent way
* use field->store(longlong, bool) not field->store(double)
* a lot more error checking everywhere
* cleanup after errors
* simplify calling conventions, remove reinterpret_cast's
* todo/XXX comments
* whitespaces
* use float consistently

memory management is still totally PoC quality
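
As a rough illustration of the "endianness-independent" item in the list above, a count can be written byte by byte with shifts, so that the stored bytes do not depend on the host byte order. The 16-bit width, the little-endian layout and the function names are assumptions for this sketch; the commit message does not state which format or helpers it uses.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a 16-bit count stored least significant byte first.
// The shifts operate on values, not on memory, so the result is the
// same on little-endian and big-endian hosts.
inline void store_count_le(unsigned char *buf, std::uint16_t n)
{
  assert(buf != nullptr);
  buf[0]= static_cast<unsigned char>(n & 0xFF);
  buf[1]= static_cast<unsigned char>(n >> 8);
}

// Inverse of store_count_le(): rebuild the count from its two bytes.
inline std::uint16_t load_count_le(const unsigned char *buf)
{
  assert(buf != nullptr);
  return static_cast<std::uint16_t>(buf[0] | (buf[1] << 8));
}
```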

Initial HNSW implementation

Initial HNSW implementation · 5f6880c7

Vicențiu Ciorbaru authored Feb 17, 2024

This commit includes the work done in collaboration with Hugo Wen from
Amazon:

    MDEV-33408 Alter HNSW graph storage and fix memory leak

    This commit changes the way HNSW graph information is stored in the
    second table. Instead of storing connections as separate records, it now
    stores neighbors for each node, leading to significant performance
    improvements and storage savings.

    Compared with the previous approach, the insert speed is 5 times faster,
    search speed improves by 23%, and storage usage is reduced by 73%, based
    on ann-benchmark tests with the random-xs-20-euclidean and
    random-s-100-euclidean datasets.

    Additionally, in the previous code, vector objects were not released
    after use, resulting in excessive memory consumption (over 20GB for
    building the index with 90,000 records) and preventing tests with
    large datasets. Vectors are now released appropriately during the
    insert and search functions. Note that some vectors still need to be
    cleaned up after search query completion; this needs to be addressed
    in a future commit.

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

As well as the commit:

    Introduce session variables to manage HNSW index parameters

    Three variables:

    hnsw_max_connection_per_layer
    hnsw_ef_constructor
    hnsw_ef_search

    ann-benchmark tool is also updated to support these variables in commit
    https://github.com/HugoWenTD/ann-benchmarks/commit/e09784e for branch
    https://github.com/HugoWenTD/ann-benchmarks/tree/mariadb-configurable

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.
Co-authored-by: Hugo Wen <wenhug@amazon.com>
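
The storage change described under MDEV-33408 above (one record per node holding its neighbors, instead of one record per connection) can be pictured roughly as a grouping step. This is only a sketch under assumptions: `NodeId`, `Edge`, `NeighborLists`, `group_by_node` and `neighbors_of` are hypothetical names, and the real on-disk layout of the second table is not shown in the message.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using NodeId= std::uint32_t;
using Edge= std::pair<NodeId, NodeId>;                  // (node, neighbor)
using NeighborLists= std::map<NodeId, std::vector<NodeId>>;

// Turn "one record per connection" into "one record per node":
// every edge (node, neighbor) is appended to that node's list.
inline NeighborLists group_by_node(const std::vector<Edge> &edges)
{
  NeighborLists lists;
  for (const Edge &e : edges)
    lists[e.first].push_back(e.second);
  return lists;
}

// Look up the neighbor list of a node that is known to exist.
inline const std::vector<NodeId> &neighbors_of(const NeighborLists &lists,
                                               NodeId node)
{
  const auto it= lists.find(node);
  assert(it != lists.end());
  return it->second;
}
```

With four connections spread over two nodes, the grouped form needs two records instead of four, which gives an intuition for the storage savings reported above.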

cleanup: simplify Queue<>, add const · 3f6348f0
Vicențiu Ciorbaru authored May 29, 2024
```
also add const to methods in List<> and Hash_set<>
while we're at it
```
cleanup: C++11 range-based for loop for Hash_set<> · 9523b2e5
Sergei Golubchik authored Jul 12, 2024

initial support for vector indexes · 42d0579c

Sergei Golubchik authored Jan 17, 2024

MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitations:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

Added support for so-called "high-level indexes". They are not visible
to the storage engine and are implemented on the SQL level. For every
such index in a table, say, t1, the server implicitly creates a second
table named like t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure and no frm, is not accessible directly,
doesn't go into the table cache, and needs no MDLs.
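
A small sketch of the naming scheme in the example above. The function name `hlindex_table_name` is hypothetical, and the two-digit zero-padded decimal suffix is an assumption inferred only from the "t1#i#05" example; the server's actual formatting may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build the name of the implicit second table for index number index_no
// of the given table, e.g. ("t1", 5) -> "t1#i#05".
inline std::string hlindex_table_name(const std::string &table,
                                      unsigned index_no)
{
  assert(index_no < 100);   // assumption: two decimal digits, as in "05"
  char suffix[8];           // "#i#NN" plus the terminating NUL fits easily
  std::snprintf(suffix, sizeof(suffix), "#i#%02u", index_no);
  return table + suffix;
}
```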

MDEV-33406 basic optimizer support for k-NN searches

For a query like SELECT ... ORDER BY func(), the optimizer will use
item_func->part_of_sortkey() to decide which keys can be used
to resolve ORDER BY.

cleanup: init_tmp_table_share(bool thread_specific) · c1558680

Sergei Golubchik authored Jul 18, 2024

let the caller tell init_tmp_table_share() whether the table
should be thread_specific or not.

In particular, internal tmp tables created in the slave thread
are perfectly thread specific

cleanup: thd->alloc<>() and thd->calloc<>() · 537abed6

Sergei Golubchik authored Jun 01, 2024

create templates

  thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)

and the same for thd->calloc(). By default the type is char,
so the old usage thd->alloc(size) works too.
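
The shape of such templates can be sketched as follows. `Arena` is a hypothetical stand-in for THD's allocator (it simply tracks malloc'ed blocks), so only the interface, a member template with `char` as the default type, mirrors what the message describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical arena standing in for THD; it frees everything at once
// in its destructor. Not the server's implementation.
class Arena
{
  std::vector<void*> m_blocks;
public:
  Arena()= default;
  Arena(const Arena&)= delete;
  Arena &operator=(const Arena&)= delete;
  ~Arena() { for (void *p : m_blocks) std::free(p); }

  // alloc<X>(n) replaces (X*)alloc(sizeof(X)*n). T defaults to char,
  // so a plain alloc(size) still means "size bytes".
  template <typename T= char> T *alloc(std::size_t n)
  {
    assert(n > 0);
    void *p= std::malloc(sizeof(T) * n);
    assert(p != nullptr);
    m_blocks.push_back(p);
    return static_cast<T*>(p);
  }

  // Same as alloc<T>(n), but the memory is zero-filled.
  template <typename T= char> T *calloc(std::size_t n)
  {
    T *p= alloc<T>(n);
    std::memset(p, 0, sizeof(T) * n);
    return p;
  }
};
```

A call such as `arena.alloc<int>(4)` needs no cast at the call site, while `arena.alloc(16)` keeps working for byte buffers because of the default template argument.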

Revert "MDEV-15458 Segfault in heap_scan() upon UPDATE after ADD SYSTEM VERSIONING" · dcc5fcd8

Sergei Golubchik authored Feb 09, 2024

This partially reverts 43623f04

Engines have to set ::position() after ::write_row(), otherwise
the server won't be able to refer to the row just inserted.
This is important for high-level indexes.

The heap part isn't reverted, so heap doesn't support high-level indexes.
To fix this, it'll need info->lastpos in addition to info->current_ptr.

cleanup: unused function argument · f45e431e
Sergei Golubchik authored Jan 26, 2024

open frm for DROP TABLE · 11dc5c45
Sergei Golubchik authored Jan 27, 2024
```
needed to get partitioning and information about
secondary objects
```
cleanup: extract ha_create_table_from_share() · a41093a3
Sergei Golubchik authored Jan 25, 2024

cleanup: generalize ER_INNODB_NO_FT_TEMP_TABLE · b8067642
Sergei Golubchik authored Jan 25, 2024

cleanup: generalize ER_SPATIAL_CANT_HAVE_NULL · c3a479b2
Sergei Golubchik authored Jan 17, 2024

cleanup: make_long_hash_field_name() and add_hash_field() · 94e1ce28
Sergei Golubchik authored Jan 18, 2024

cleanup: key algorithm vs key flags · a0c7fb7a

Sergei Golubchik authored Jan 14, 2024

The information about the index algorithm was stored in two
places and split inconsistently between them.

BTREE index could have key->algorithm == HA_KEY_ALG_BTREE, if the user
explicitly specified USING BTREE or HA_KEY_ALG_UNDEF, if not.

RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL

FULLTEXT index had key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT

HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF

long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit:

All indexes except BTREE and HASH always have key->algorithm
set, HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage to keep frms backward compatible).

As a side effect ALTER TABLE now detects FULLTEXT index renames correctly

cleanup: Queue and Bounded_queue · f93ff671

Sergei Golubchik authored Feb 07, 2024

Bounded_queue<> pretended to be a typesafe C++ wrapper
on top of pure C queues.h.

But it wasn't, it was tightly bound to filesort and only useful there.

* implement Queue<> - a typesafe C++ wrapper on top of QUEUE
* move Bounded_queue to filesort.cc, remove pointless "generalizations",
  change it to use Queue.
* remove bounded_queue.h
* change subselect_rowid_merge_engine to use Queue, not QUEUE
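
To illustrate what "a typesafe C++ wrapper on top of" an untyped C-style queue can look like, here is a minimal sketch. `Raw_queue`, `raw_push`, `raw_pop` and `Typed_queue` are hypothetical names invented for this example; they are not the QUEUE API from queues.h nor the Queue<> class added by the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical untyped queue in the style of a C API: elements are void*
// and ordering comes from a comparison callback.
struct Raw_queue
{
  std::vector<void*> items;
  int (*cmp)(void *arg, const void *a, const void *b);
  void *cmp_arg;
};

// Insert keeping items sorted, smallest element at the front.
inline void raw_push(Raw_queue *q, void *elem)
{
  auto pos= std::upper_bound(q->items.begin(), q->items.end(), elem,
                             [q](const void *a, const void *b)
                             { return q->cmp(q->cmp_arg, a, b) < 0; });
  q->items.insert(pos, elem);
}

// Remove and return the smallest element.
inline void *raw_pop(Raw_queue *q)
{
  assert(!q->items.empty());
  void *top= q->items.front();
  q->items.erase(q->items.begin());
  return top;
}

// Typesafe wrapper: callers see Element* and a typed comparator,
// and the casts live in exactly one place.
template <typename Element, typename Param= void> class Typed_queue
{
public:
  typedef int (*Compare)(Param *arg, const Element *a, const Element *b);

  explicit Typed_queue(Compare cmp, Param *arg= nullptr)
    : m_cmp(cmp), m_arg(arg)
  {
    m_raw.cmp= &Typed_queue::trampoline;
    m_raw.cmp_arg= this;
  }
  Typed_queue(const Typed_queue&)= delete;
  Typed_queue &operator=(const Typed_queue&)= delete;

  void push(Element *e) { raw_push(&m_raw, e); }
  Element *pop() { return static_cast<Element*>(raw_pop(&m_raw)); }
  bool is_empty() const { return m_raw.items.empty(); }
  std::size_t elements() const { return m_raw.items.size(); }

private:
  // Adapts the untyped callback signature to the typed comparator.
  static int trampoline(void *self, const void *a, const void *b)
  {
    Typed_queue *q= static_cast<Typed_queue*>(self);
    return q->m_cmp(q->m_arg, static_cast<const Element*>(a),
                    static_cast<const Element*>(b));
  }

  Raw_queue m_raw;
  Compare m_cmp;
  Param *m_arg;
};
```

The point of the design is that user code never touches `void*`: a comparator with the wrong element type fails to compile instead of misbehaving at run time.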

cleanup: lex_string_set3() · 4a04b28f
Sergei Golubchik authored Jan 27, 2024

cleanup: remove unconditional #ifdef's · dbce3bc8
Sergei Golubchik authored Jan 10, 2024

cleanup: const in List::push_front() · ae2011d0
Sergei Golubchik authored Jun 15, 2024

reject invalid spatial key declarations in the parser · 02eb1b1a
Sergei Golubchik authored Jan 08, 2024

cleanup: pass TABLE_SHARE to store_key_options() · 956a8a53
Sergei Golubchik authored Jan 26, 2024

cleanup: spaces, casts, comments · 70a9bcc4
Sergei Golubchik authored Jan 08, 2024

fix main.plugin_vars test to cleanup after itself · 95a69fca
Sergei Golubchik authored Feb 05, 2024

make INFORMATION_SCHEMA.STATISTICS.COMMENT not nullable · 5ac4bc45
Sergei Golubchik authored Jan 19, 2024
```
as it can never be null (only "" or "disabled")
```
MDEV-32885 VEC_DISTANCE() function · 796a3b99
Sergei Golubchik authored Nov 25, 2023

24 Jul, 2024 1 commit

Cleanup Whitespace in unittest/ directory · 4dde925f

Souradeep Saha authored Jun 27, 2024

Cleanup unnecessary whitespace at the end of lines and end of files
in the unittest/ directory. Note that all code changes are
non-functional.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.

17 Jul, 2024 1 commit

MDEV-33750: Rename mysql to mariadb in Debian directory · d8374262

Tuukka Pasanen authored May 27, 2024

As this is MariaDB, the variable names in mariadb-server.*inst
should reflect that where possible. This commit changes variable
and directory names accordingly.

16 Jul, 2024 6 commits

MDEV-33988 DELETE single table to support table aliases · 75d354a2

Daniel Black authored May 01, 2024

Gain MySQL compatibility by allowing table aliases in a single-table
DELETE statement.

This now supports the syntax of:

DELETE [delete_opts] FROM tbl_name [[AS] tbl_alias] [PARTITION (partition_name [, partition_name] ...)] ....

The delete.test is from MySQL commit 1a72b69778a9791be44525501960b08856833b8d
/ Change-Id: Iac3a2b5ed993f65b7f91acdfd60013c2344db5c0.

Co-Author: Gleb Shchepa <gleb.shchepa@oracle.com> (for delete.test)

Reviewed by Igor Babaev (igor@mariadb.com)

MDEV-34571: Fix funcs_1.is_columns_is_embedded · 0cd20e3a
Brandon Nesterenko authored Jul 16, 2024
```
Result file needed re-recording to account for the
new information_schema columns
```

MDEV-33627 : Implement option --dir in mariadb-import · 9e25d6f0

Vladislav Vaintroub authored Jun 03, 2024

With that, it is possible to restore the full "instance" from a backup
made with mariadb-dump --dir.

The patch implements executing DDL (tables, views, triggers) using
statements that are stored in the .sql files created by
mariadb-dump --dir.

Care is taken to create triggers correctly after the data is loaded,
to disable foreign key and unique key checks, etc.

The files are loaded in descending order by datafile size -
to ensure better work distribution when running with --parallel option.
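
The ordering described above can be sketched as a simple sort. `DataFile`, `larger_first` and `order_for_parallel_load` are hypothetical names; the sketch only shows the "largest first" ordering, not how mariadb-import actually discovers files or dispatches them to workers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one data file to be loaded.
struct DataFile
{
  std::string name;
  std::uint64_t size_bytes;
};

// Orders files by size, largest first.
inline bool larger_first(const DataFile &a, const DataFile &b)
{
  return a.size_bytes > b.size_bytes;
}

// Sort so that the biggest loads start first; the small files then fill
// the remaining worker time instead of leaving one worker with a huge
// file at the very end.
inline void order_for_parallel_load(std::vector<DataFile> &files)
{
  std::stable_sort(files.begin(), files.end(), larger_first);
  assert(std::is_sorted(files.begin(), files.end(), larger_first));
}
```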

In addition to the --dir option, the following options are implemented
for partial restore:

include-only options:
--database             -  import one or several databases
--table                -  import one or several tables

exclude options:
--ignore-database      -  ignore one or several databases when importing
--ignore-table         -  ignore one or several tables when importing

All options above are only valid together with the --dir option,
and can be specified multiple times.

MDEV-33627 refactor threading in mariadb-import · 04988d87
Vladislav Vaintroub authored Jul 16, 2024
```
Use threadpool, instead of one-thread-and-connection-per-table
```
MDEV-33627 preparation - tpool fix · c483c5ca
Vladislav Vaintroub authored Jul 16, 2024
```
Fix tpool to not use maintenance timer for fixed pool size.
```
MDEV-33627 preparation - compile mysqlimport as C++ · 59167c56
Vladislav Vaintroub authored May 13, 2024

12 Jul, 2024 1 commit

MDEV-34571 Add pages accessed and pages read from disk to table_stats · ecc79611

Monty authored Jul 11, 2024

Trivial patch, using the handler statistics already collected for
the slow query log.

The reason for the changes in test cases was mainly to change to use
select TABLE_SCHEMA ... from information_schema.table_statistics instead
of 'show table_statistics' to avoid future changes to test results
if we add more columns to table_statistics.

11 Jul, 2024 2 commits

MDEV-19123 Debian configuration - no explicit configuration for utf8mb4 · d1af7fde

Daniel Black authored Jun 20, 2024

There is no need for a character-set-server configuration when utf8mb4
is now the server default.

Also remove the character-set-collations setting, as it's no longer
required: uca1400_ai_ci is now the default for all character sets that
support it. ref: MDEV-25829 / MDEV-34430.

MDEV-19123 Change default charset from latin1 to utf8mb4 · 36eba988
Alexander Barkov authored May 28, 2024
```
Changing the default server character set from latin1 to utf8mb4.
```

10 Jul, 2024 2 commits

Merge remote-tracking branch 'origin/11.5' into 11.6 · a2a5ba14
Alexander Barkov authored Jul 10, 2024

Merge remote-tracking branch 'origin/11.4' into 11.5 · 4e805aed
Alexander Barkov authored Jul 10, 2024