  1. 22 Aug, 2024 1 commit
  2. 18 Apr, 2024 1 commit
      MDEV-31340 Remove MY_COLLATION_HANDLER::strcasecmp() · fd247cc2
      Alexander Barkov authored
      This patch also fixes:
        MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
        MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
        MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
        MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
        MDEV-33088 Cannot create triggers in the database `MYSQL`
        MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
        MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
        MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
        MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
        MDEV-33120 System log table names are case insensitive with lower-case-table-names=0
      
      - Removing the virtual function strcasecmp() from MY_COLLATION_HANDLER
      
      - Adding a wrapper function CHARSET_INFO::streq(), to compare
        two strings for equality. For now it calls strnncoll() internally.
        In the future it will turn into a virtual function.
      
      - Adding new accent sensitive case insensitive collations:
          - utf8mb4_general1400_as_ci
          - utf8mb3_general1400_as_ci
        They implement accent sensitive case insensitive comparison.
        The weight of a character is equal to the code point of its
        upper case variant. These collations use Unicode-14.0.0 casefolding data.
      
        The result of
           my_charset_utf8mb3_general1400_as_ci.strcoll()
        is very close to the former
           my_charset_utf8mb3_general_ci.strcasecmp()
      
        There is only a difference in a couple dozen rare characters, because:
          - the switch from "tolower" to "toupper" comparison, to make
            utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
          - the switch from Unicode-3.0.0 to Unicode-14.0.0
        This difference should be tolerable. See the list of affected
        characters in the MDEV description.
      
        Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
        Unlike utf8mb4_general_ci, it does not treat all non-BMP characters
        as equal.
      
      - Adding classes representing names of the file based database objects:
      
          Lex_ident_db
          Lex_ident_table
          Lex_ident_trigger
      
        Their comparison collation depends on the underlying
        file system case sensitivity and on --lower-case-table-names
        and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.
      
      - Adding classes representing names of other database objects,
        whose names have case insensitive comparison style,
        using my_charset_utf8mb3_general1400_as_ci:
      
        Lex_ident_column
        Lex_ident_sys_var
        Lex_ident_user_var
        Lex_ident_sp_var
        Lex_ident_ps
        Lex_ident_i_s_table
        Lex_ident_window
        Lex_ident_func
        Lex_ident_partition
        Lex_ident_with_element
        Lex_ident_rpl_filter
        Lex_ident_master_info
        Lex_ident_host
        Lex_ident_locale
        Lex_ident_plugin
        Lex_ident_engine
        Lex_ident_server
        Lex_ident_savepoint
        Lex_ident_charset
        engine_option_value::Name
      
      - All the mentioned Lex_ident_xxx classes implement a method streq():
      
        if (ident1.streq(ident2))
           do_equal();
      
        This method works as a wrapper for CHARSET_INFO::streq().
      
      - Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
        in class members and in function/method parameters.
      
      - Replacing all calls like
          system_charset_info->coll->strcasecmp(ident1, ident2)
        with
          ident1.streq(ident2)
      
      - Taking advantage of the C++11 user defined literal operator
        for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
        data types. Usage example:
      
        const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;
      
        is now a shorter version of:
      
        const Lex_ident_column primary_key_name=
          Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
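The identifier classes, their streq() method and the user defined literal described in this commit can be illustrated with a minimal sketch. It is not the MariaDB code: `Cstring_sketch`, `Ident_sketch`, `ascii_streq_ci()` and the `_ident` literal are hypothetical stand-ins for LEX_CSTRING, Lex_ident_column, CHARSET_INFO::streq() and `"PRIMARY"_Lex_ident_column`, and the comparison is ASCII-only for brevity.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical stand-in for LEX_CSTRING: a pointer plus an explicit length.
struct Cstring_sketch
{
  const char *str;
  size_t length;
};

// Hypothetical stand-in for CHARSET_INFO::streq(): ASCII-only,
// case insensitive equality. The real function is collation-aware.
static bool ascii_streq_ci(const Cstring_sketch &a, const Cstring_sketch &b)
{
  if (a.length != b.length)
    return false;
  for (size_t i= 0; i < a.length; i++)
  {
    if (std::toupper((unsigned char) a.str[i]) !=
        std::toupper((unsigned char) b.str[i]))
      return false;
  }
  return true;
}

// Hypothetical stand-in for a Lex_ident_xxx class: the type itself
// decides which comparison style applies to this kind of identifier.
struct Ident_sketch: Cstring_sketch
{
  constexpr Ident_sketch(const char *s, size_t len): Cstring_sketch{s, len} {}
  bool streq(const Ident_sketch &other) const
  {
    return ascii_streq_ci(*this, other);
  }
};

// User defined literal: the compiler passes the literal's length,
// so no strlen() call and no STRING_WITH_LEN-style macro is needed.
constexpr Ident_sketch operator""_ident(const char *s, size_t len)
{
  return Ident_sketch(s, len);
}
```

With these definitions, `"PRIMARY"_ident.streq("primary"_ident)` evaluates to true, and the call site no longer has to name a collation.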
  3. 12 Mar, 2024 1 commit
      MDEV-33621 Unify duplicate code in my_wildcmp_uca_impl() and my_wildcmp_unicode_impl() · 1e889a6e
      Alexander Barkov authored
      This is a refactoring patch, it does not change the behaviour.
      The MTR tests are being added only to cover the LIKE predicate better.
      (these tests should have been added earlier under terms of MDEV 9711).
      This patch does not need its own specific MTR tests.
      
      Moving the duplicate code into a new shared file ctype-wildcmp.inl
      and including it from multiple places, to define the following functions:
      
      - my_wildcmp_uca_impl(), in ctype-uca.c
      
        For utf8mb3, utf8mb4, ucs2, utf16, utf32, using cs->cset->mb_wc().
        For UCA based collations.
      
      - my_wildcmp_mb2_or_mb4_general_ci_impl(), in ctype-ucs2.c:
      
        For ucs2, utf16, utf32, using cs->cset->mb_wc().
        For general_ci-style collations:
            - xxx_general_ci
            - xxx_general_mysql500_ci
            - xxx_general_nopad_ci
      
      - my_wildcmp_mb2_or_mb4_bin_impl(), in ctype-ucs2.c:
      
        For ucs2, utf16, utf32, using cs->cset->mb_wc().
        For _bin collations:
            - xxx_bin
            - xxx_nopad_bin
      
      - my_wildcmp_utf8mb3_general_ci_impl(), in ctype-utf8.c
      
        Optimized for utf8mb3, using my_mb_wc_utf8mb3_quick().
      
        For general_ci-style collations:
            - utf8mb3_general_ci
            - utf8mb3_general_mysql500_ci
            - utf8mb3_general_nopad_ci
      
      - my_wildcmp_utf8mb4_general_ci_impl(), in ctype-utf8.c
      
        Optimized for utf8mb4, using my_mb_wc_utf8mb4_quick().
      
        For general_ci-style collations:
            - utf8mb4_general_ci
            - utf8mb4_general_nopad_ci
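The idea of writing the wildcard matcher once and instantiating it several times can be sketched as follows. The commit does this with a shared file ctype-wildcmp.inl that is included from several places; to stay in a single file, this sketch uses a macro instead. The names `DEFINE_WILDCMP_SKETCH`, `weight_bin()` and `weight_ci()` are hypothetical, the matcher is ASCII-only, and escape characters are not handled.

```cpp
#include <cctype>

// One shared matcher body. FUNC is the name of the function to define,
// WEIGHT maps a character to the value that is actually compared.
#define DEFINE_WILDCMP_SKETCH(FUNC, WEIGHT)                            \
static bool FUNC(const char *s, const char *p)                         \
{                                                                      \
  for (; *p; s++, p++)                                                 \
  {                                                                    \
    if (*p == '%')                                                     \
    {                                                                  \
      /* '%' matches any run of characters, including an empty one */  \
      for (;; s++)                                                     \
      {                                                                \
        if (FUNC(s, p + 1))                                            \
          return true;                                                 \
        if (!*s)                                                       \
          return false;                                                \
      }                                                                \
    }                                                                  \
    if (!*s)                                                           \
      return false;                                                    \
    /* '_' matches any single character */                             \
    if (*p != '_' && WEIGHT(*s) != WEIGHT(*p))                         \
      return false;                                                    \
  }                                                                    \
  return !*s;                                                          \
}

// Binary style: compare the bytes as they are.
static int weight_bin(char c) { return (unsigned char) c; }
// general_ci style (ASCII only here): compare upper-case variants.
static int weight_ci(char c) { return std::toupper((unsigned char) c); }

DEFINE_WILDCMP_SKETCH(wildcmp_bin_sketch, weight_bin)
DEFINE_WILDCMP_SKETCH(wildcmp_ci_sketch, weight_ci)
```

Both functions share one matching algorithm, so a fix to the loop applies to every instantiation.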
  4. 28 Feb, 2024 1 commit
      MDEV-31531 Remove my_casedn_str() and my_caseup_str() · 929c2e06
      Alexander Barkov authored
      Under terms of MDEV 27490 we'll add support for non-BMP identifiers
      and upgrade casefolding information to Unicode version 14.0.0.
      In Unicode-14.0.0 conversion to lower and upper cases can increase octet length
      of the string, so conversion won't be possible in-place any more.
      
      This patch removes virtual functions performing in-place casefolding:
        - my_charset_handler_st::casedn_str()
        - my_charset_handler_st::caseup_str()
      and fixes the code to use the non-inplace functions instead:
        - my_charset_handler_st::casedn()
        - my_charset_handler_st::caseup()
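Why in-place conversion stops working can be seen by comparing UTF-8 lengths. The sketch below only counts bytes; `utf8_length()` is a hypothetical helper, not a MariaDB function. One pair from current Unicode data is U+0251 (LATIN SMALL LETTER ALPHA), whose upper case variant is U+2C6D.

```cpp
#include <cstddef>

// Number of bytes needed to encode a Unicode code point in UTF-8.
static size_t utf8_length(char32_t wc)
{
  if (wc < 0x80)
    return 1;
  if (wc < 0x800)
    return 2;
  if (wc < 0x10000)
    return 3;
  return 4;
}

static const char32_t SMALL_ALPHA= 0x0251;    // encoded in 2 bytes
static const char32_t CAPITAL_ALPHA= 0x2C6D;  // encoded in 3 bytes
```

Upper-casing a string that contains U+0251 therefore needs one more byte per occurrence than the source provides, which is why a separate destination buffer is required.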
  5. 19 Oct, 2023 1 commit
      MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref · 4941ac91
      Sergei Petrunia authored
      (Variant#3: Allow cross-charset comparisons, use a special
      CHARSET_INFO to create lookup keys. Review input addressed.)
      
      Equalities that compare utf8mb{3,4}_general_ci strings, like:
      
        WHERE ... utf8mb3_key_col=utf8mb4_value    (MB3-4-CMP)
      
      can now be used to construct ref[const] access and also participate
      in multiple-equalities.
      This means that utf8mb3_key_col can be used for key-lookups when
      compared with an utf8mb4 constant, field or expression using '=' or
      '<=>' comparison operators.
      
      This is controlled by optimizer_switch='cset_narrowing=on', which is
      OFF by default.
      
      IMPLEMENTATION
      Item value comparison in (MB3-4-CMP) is done using utf8mb4_general_ci.
      This is valid as any utf8mb3 value is also an utf8mb4 value.
      
      When making index lookup value for utf8mb3_key_col, we do "Charset
      Narrowing": characters that are in the Basic Multilingual Plane (=BMP) are
      copied as-is, as they can be represented in utf8mb3. Characters that are
      outside the BMP cannot be represented in utf8mb3 and are replaced
      with U+FFFD, the "Replacement Character".
      
      In utf8mb4_general_ci, the Replacement Character compares as equal to any
      character that's not in BMP. Because of this, the constructed lookup value
      will find all index records that would be considered equal by the original
      condition (MB3-4-CMP).
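The "Charset Narrowing" step described above can be sketched on decoded code points. This is an illustration only: `narrow_to_bmp()` is a hypothetical name, and the server works on encoded strings through a special CHARSET_INFO rather than on `std::u32string`.

```cpp
#include <string>

static const char32_t REPLACEMENT_CHARACTER= 0xFFFD;

// Build a lookup value that is representable in utf8mb3:
// BMP characters are copied as-is, everything else becomes U+FFFD.
static std::u32string narrow_to_bmp(const std::u32string &value)
{
  std::u32string result;
  result.reserve(value.size());
  for (char32_t wc : value)
    result.push_back(wc <= 0xFFFF ? wc : REPLACEMENT_CHARACTER);
  return result;
}
```

For example, narrowing "a", U+1F600, "b" gives "a", U+FFFD, "b", which is the key value used for the index lookup.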
      Approved-by: Monty <monty@mariadb.org>
  6. 18 Oct, 2023 1 commit
      MDEV-26494 Fix buffer overflow of string lib on Arm64 · 8f2f8f31
      Xiaotong Niu authored
      In the hexlo function, the element type of the array hex_lo_digit is not
      explicitly declared as signed char, causing elements with a value of -1
      to be converted to 255 on Arm64. The problem occurs because plain "char"
      is unsigned by default with Arm64 compilers, but signed with x86
      compilers. This problem can be seen at https://godbolt.org/z/rT775xshj
      
      The above issue causes a "use-after-poison" error in the
      my_mb_wc_filename function. The code snippet where the error occurred
      is shown below, copied from the following link.
      https://github.com/MariaDB/server/blob/5fc19e71375fb39eb85354321bf852d998aecf81/strings/ctype-utf8.c#L2728
      
      2728    if ((byte1= hexlo(byte1)) >= 0 &&
      2729     (byte2= hexlo(byte2)) >= 0)
        	{
      2731    	int byte3= hexlo(s[3]);
          		…
        	}
      
      Consider line 2729 when byte2 is 0, which indicates the end of the string s.
      (1) On x86, hexlo(0) returns -1 and line 2731 is skipped, as expected.
      (2) On Arm64, hexlo(0) returns 255 and line 2731 is executed, which is
      not expected: it accesses s[3] after the null character of string s,
      thus raising the "use-after-poison" error.
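The effect of the element type can be reproduced with a small sketch. It is not the code from ctype-utf8.c: `Hex_table_sketch`, `hexlo_sketch()` and `hexlo_as_unsigned_sketch()` are hypothetical names, the table is filled at run time for brevity, and the second function only models what a platform with unsigned plain "char" would return.

```cpp
// Lookup table for lower-case hex digits; -1 marks "not a hex digit".
struct Hex_table_sketch
{
  signed char digit[256];   // explicitly signed, so -1 is preserved
  Hex_table_sketch()
  {
    for (int i= 0; i < 256; i++)
      digit[i]= -1;
    for (int i= 0; i < 10; i++)
      digit['0' + i]= (signed char) i;
    for (int i= 0; i < 6; i++)
      digit['a' + i]= (signed char) (10 + i);
  }
};

// Returns the digit value, or -1 on every platform for a non-digit.
static int hexlo_sketch(int x)
{
  static const Hex_table_sketch table;
  return table.digit[(unsigned char) x];
}

// Models the bug: the same element read through an unsigned type,
// which is what plain "char" amounts to on Arm64.
static int hexlo_as_unsigned_sketch(int x)
{
  return (unsigned char) hexlo_sketch(x);
}
```

With the unsigned reading, the end-of-string byte 0 yields 255, so a check of the form `>= 0` passes when it should fail.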
      
      The problem was discovered when executing the main.mysqlcheck test.
      Signed-off-by: Xiaotong Niu <xiaotong.niu@arm.com>
  7. 12 Sep, 2023 1 commit
      MDEV-31496: Make optimizer handle UCASE(varchar_col)=... · e987b935
      Sergei Petrunia authored
      (Review input addressed)
      (Added handling of UPDATE/DELETE and partitioning w/o index)
      
      If the properties of the used collation allow, do the following
      equivalent rewrites:
      
      1. UPPER(key_col)=expr  ->  key_col=expr
         expr=UPPER(key_col)  ->  expr=key_col
         (also rewrite both sides of the equality at the same time)
      
      2. UPPER(key_col) IN (constant-list)  -> key_col IN (constant-list)
      
      - Mark utf8mb{3,4}_general_ci as collations that allow this.
      - Add optimizer_switch='sargable_casefold=ON' to control this.
        (ON by default in this patch)
      - Cover the rewrite in Optimizer Trace, rewrite name is
        "sargable_casefold_removal".
  8. 18 Apr, 2023 4 commits
      MDEV-30577 Case folding for uca1400 collations is not up to date · c21745db
      Alexander Barkov authored
      Adding casefolding for Unicode-14.0.0 into uca1400 collations.
      MDEV-31071 Refactor case folding data types in Unicode collations · 6075f12c
      Alexander Barkov authored
      This is a non-functional change. It changes the way case folding data
      and weight data (for simple Unicode collations) are stored:
      
      - Removing data types MY_UNICASE_CHARACTER, MY_UNICASE_INFO
      - Using data types MY_CASEFOLD_CHARACTER, MY_CASEFOLD_INFO instead.
      
      This patch changes simple Unicode collations in the same way
      that MDEV-30695 previously changed Asian collations.
      
      No new MTR tests are needed. The underlying code is thoroughly
      covered by a number of ctype_*_ws.test and ctype_*_casefold.test
      files, which were added recently as a preparation
      for this change.
      
      Old and new Unicode data layout
      -------------------------------
      
      Case folding data is now stored in separate tables
      consisting of MY_CASEFOLD_CHARACTER elements with two members:
      
          typedef struct casefold_info_char_t
          {
            uint32 toupper;
            uint32 tolower;
          } MY_CASEFOLD_CHARACTER;
      
      while weight data (for simple non-UCA collations xxx_general_ci
      and xxx_general_mysql500_ci) is stored in separate arrays of
      uint16 elements.
      
      Before this change case folding data and simple weight data were
      stored together, in tables of the following elements with three members:
      
          typedef struct unicase_info_char_st
          {
            uint32 toupper;
            uint32 tolower;
            uint32 sort;          /* weights for simple collations */
          } MY_UNICASE_CHARACTER;
      
      This data format was redundant, because weights (the "sort" member) were
      needed only for these two simple Unicode collations:
      - xxx_general_ci
      - xxx_general_mysql500_ci
      
      Adding case folding information for Unicode-14.0.0 using the old
      format would waste memory without purpose.
      
      Detailed changes
      ----------------
      - Changing the underlying data types as described above
      
      - Including unidata-dump.c into the sources.
        This program was earlier used to dump UnicodeData.txt
        (e.g. https://www.unicode.org/Public/14.0.0/ucd/UnicodeData.txt)
        into MySQL / MariaDB source files.
        It was originally written in 2002, but has not been distributed yet
        together with MySQL / MariaDB sources.
      
      - Removing the old format Unicode data earlier dumped from UnicodeData.txt
        (versions 3.0.0 and 5.2.0) from ctype-utf8.c.
        Adding Unicode data in the new format into separate header files,
        to make the code easier to maintain:
      
          - ctype-unicode300-casefold.h
          - ctype-unicode300-casefold-tr.h
          - ctype-unicode300-general_ci.h
          - ctype-unicode300-general_mysql500_ci.h
          - ctype-unicode520-casefold.h
      
      - Adding a new file ctype-unidata.c as an aggregator for
        the header files listed above.
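The saving from the new layout, and a lookup through it, can be sketched as follows. The two element structs mirror the typedefs quoted in this commit message; everything else (`Casefold_info_sketch`, its `maxchar` and `page` members, the 256-entry page size and `toupper_sketch()`) is an assumption made for illustration.

```cpp
#include <cstdint>

// New element: case folding only.
struct Casefold_character_sketch
{
  uint32_t toupper;
  uint32_t tolower;
};

// Old element: case folding plus a weight used by only two collations.
struct Unicase_character_sketch
{
  uint32_t toupper;
  uint32_t tolower;
  uint32_t sort;
};

// On common ABIs the old element costs 12 bytes per character
// and the new one 8.
static_assert(sizeof(Casefold_character_sketch) == 8, "two uint32 members");
static_assert(sizeof(Unicase_character_sketch) == 12, "three uint32 members");

// Assumed two-level layout: one page per 256 code points,
// with a null pointer for pages that have no case pairs.
struct Casefold_info_sketch
{
  char32_t maxchar;
  const Casefold_character_sketch *const *page;
};

static char32_t toupper_sketch(const Casefold_info_sketch &info, char32_t wc)
{
  if (wc > info.maxchar)
    return wc;                       // outside the table: unchanged
  const Casefold_character_sketch *page= info.page[wc >> 8];
  return page ? page[wc & 0xFF].toupper : wc;
}
```

Weights for xxx_general_ci and xxx_general_mysql500_ci would then live in separate uint16 arrays, so the other collations do not pay for them.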
      MDEV-31069 Reuse duplicate char-to-weight conversion code in ctype-utf8.c and ctype-ucs2.c · 2ad287ca
      Alexander Barkov authored
      Removing similar functions from ctype-utf8.c and ctype-ucs2.c
      
      - my_tosort_utf16()
      - my_tosort_utf32()
      - my_tosort_ucs2()
      - my_tosort_unicode()
      
      Adding new shared functions into ctype-unidata.h:
      
      - my_tosort_unicode_bmp()  - reused for utf8mb3, ucs2
      - my_tosort_unicode()      - reused for utf8mb4, utf16, utf32
      
      For simplicity, the new version of my_tosort_unicode*()
      does not include the code handling the MY_CS_LOWER_SORT flag because:
      - it affects performance negatively
      - we don't have any collations with this flag yet anyway
      (This code was most likely earlier erroneously merged from
      MySQL's utf8_tolower_ci at some point.)
  9. 01 Mar, 2023 1 commit
      MDEV-30746 Regression in ucs2_general_mysql500_ci · 965bdf3e
      Alexander Barkov authored
      1. Adding a separate MY_COLLATION_HANDLER
         my_collation_ucs2_general_mysql500_ci_handler
         implementing a proper order for ucs2_general_mysql500_ci
         The problem happened because ucs2_general_mysql500_ci
         erroneously used my_collation_ucs2_general_ci_handler.
      
      2. Cosmetic changes: Renaming:
         - plane00_mysql500 to my_unicase_mysql500_page00
         - my_unicase_pages_mysql500 to my_unicase_mysql500_pages
         to use the same naming style with:
         - my_unicase_default_page00
         - my_unicase_default_pages
      
      3. Moving code fragments from
         - handler::check_collation_compatibility() in handler.cc
         - upgrade_collation() in table.cc
         into new methods in class Charset, to make the code easier to reuse.
  10. 23 Feb, 2023 1 commit
  11. 21 Feb, 2023 1 commit
      MDEV-30695 Refactor case folding data types in Asian collations · 33f8f92b
      Alexander Barkov authored
      This is a non-functional change and should not change the server behavior.
      
      Casefolding information is now stored in items of a new data type MY_CASEFOLD_CHARACTER:
      
      typedef struct casefold_info_char_t
      {
        uint32 toupper;
        uint32 tolower;
      } MY_CASEFOLD_CHARACTER;
      
      Before this change, casefolding tables for Asian collations were stored in:
      
      typedef struct unicase_info_char_st
      {
        uint32 toupper;
        uint32 tolower;
        uint32 sort;
      } MY_UNICASE_CHARACTER;
      
      The "sort" member was not used in the code handling Asian collations,
      it only wasted space.
      (it's only used by Unicode _general_ci and _general_mysql500_ci collations).
      
      Unicode collations (at least UCA and _bin) should also be refactored later,
      but under terms of a separate task.
  12. 17 Feb, 2023 1 commit
      MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 · 7f6b648d
      Alexander Barkov authored
      String length growth during upper/lower conversion
      in Unicode collations depends only on the underlying MY_UNICASE_INFO
      used in the collation.
      
      Maintaining separate members CHARSET_INFO::caseup_multiply and
      CHARSET_INFO::casedn_multiply duplicated this information
      and caused bugs like this (when MY_UNICASE_INFO and case??_multiply
      went out of sync because of incomplete CHARSET_INFO initialization).
      
      Fix:
      
      Changing CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply
      from members to virtual functions.
      The virtual functions in Unicode collations calculate case conversion
      growth factors from the MY_UNICASE_INFO. This guarantees that the growth
      factors are always in sync with the MY_UNICASE_INFO.
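The design choice, deriving the growth factor from the case folding data instead of storing it next to the data, can be sketched like this. The commit does not say how the factor is computed, so the calculation below is an assumption; `Charset_sketch`, `Unicode_charset_sketch`, `Case_pair_sketch` and `utf8_length()` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Number of bytes needed to encode a code point in UTF-8.
static size_t utf8_length(char32_t wc)
{
  if (wc < 0x80)
    return 1;
  if (wc < 0x800)
    return 2;
  if (wc < 0x10000)
    return 3;
  return 4;
}

struct Case_pair_sketch
{
  char32_t lower;
  char32_t upper;
};

struct Charset_sketch
{
  virtual ~Charset_sketch() {}
  // Simple charsets never grow on case conversion.
  virtual size_t caseup_multiply() const { return 1; }
};

struct Unicode_charset_sketch: Charset_sketch
{
  std::vector<Case_pair_sketch> casefold;

  // Computed from the data on every call, so it cannot go stale.
  size_t caseup_multiply() const override
  {
    size_t factor= 1;
    for (const Case_pair_sketch &pair : casefold)
    {
      size_t from= utf8_length(pair.lower);
      size_t to= utf8_length(pair.upper);
      size_t needed= (to + from - 1) / from;   // round up
      if (needed > factor)
        factor= needed;
    }
    return factor;
  }
};
```

Adding the pair U+0251 / U+2C6D to the data changes the result from 1 to 2 without touching any separately stored member.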
  13. 10 Aug, 2022 1 commit
      MDEV-27009 Add UCA-14.0.0 collations · 13344682
      Alexander Barkov authored
      - Added one neutral and 22 tailored (language specific) collations based on
        Unicode Collation Algorithm version 14.0.0.
      
        Collations were added for Unicode character sets
        utf8mb3, utf8mb4, ucs2, utf16, utf32.
      
        Every tailoring was added with four accent and case
        sensitivity flag combinations, e.g:
      
        * utf8mb4_uca1400_swedish_as_cs
        * utf8mb4_uca1400_swedish_as_ci
        * utf8mb4_uca1400_swedish_ai_cs
        * utf8mb4_uca1400_swedish_ai_ci
      
        and their _nopad_ variants:
      
        * utf8mb4_uca1400_swedish_nopad_as_cs
        * utf8mb4_uca1400_swedish_nopad_as_ci
        * utf8mb4_uca1400_swedish_nopad_ai_cs
        * utf8mb4_uca1400_swedish_nopad_ai_ci
      
      - Introducing the concept of contextually typed named collations:
      
        CREATE DATABASE db1 CHARACTER SET utf8mb4;
        CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci);
      
        The idea is that there is no need to specify the character set prefix
        in the new collation names. It's enough to type just the suffix
        "uca1400_as_ci". The character set is taken from the context.
      
        In the above example script the context character set is utf8mb4.
        So the CREATE TABLE will make a column with the collation
        utf8mb4_uca1400_as_ci.
      
        Short collation names can be used in any part of the SQL syntax
        where the COLLATE clause is understood.
      
      - New collations are displayed only one time
        (without character set combinations) by these statements:
      
           SELECT * FROM INFORMATION_SCHEMA.COLLATIONS;
           SHOW COLLATION;
      
        For example, all these collations:
        - utf8mb3_uca1400_swedish_as_ci
        - utf8mb4_uca1400_swedish_as_ci
        - ucs2_uca1400_swedish_as_ci
        - utf16_uca1400_swedish_as_ci
        - utf32_uca1400_swedish_as_ci
        have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION,
        with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix
        without the character set name:
      
      SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS
      WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
      
      +-----------------------+
      | COLLATION_NAME        |
      +-----------------------+
      | uca1400_swedish_as_ci |
      +-----------------------+
      
        Note, the behaviour of old collations did not change.
        Non-unicode collations (e.g. latin1_swedish_ci) and
        old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci)
        are still displayed with the character set prefix, as before.
      
      - The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed.
      
        The NOT NULL constraint was removed from these columns:
        - CHARACTER_SET_NAME
        - ID
        - IS_DEFAULT
        and from the corresponding columns in SHOW COLLATION.
      
        For example:
      
      SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
      FROM INFORMATION_SCHEMA.COLLATIONS
      WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
      +-----------------------+--------------------+------+------------+
      | COLLATION_NAME        | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
      +-----------------------+--------------------+------+------------+
      | uca1400_swedish_as_ci | NULL               | NULL | NULL       |
      +-----------------------+--------------------+------+------------+
      
        The NULL value in these columns now means that the collation
        is applicable to multiple character sets.
        The behaviour of old collations did not change.
        Make sure your client programs can handle NULL values in these columns.
      
      - The structure of the table
        INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed.
      
        Three new NOT NULL columns were added:
        - FULL_COLLATION_NAME
        - ID
        - IS_DEFAULT
      
        New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY.
        The column COLLATION_NAME contains the collation name without the character
        set prefix. The column FULL_COLLATION_NAME contains the collation name with
        the character set prefix.
      
        Old collations have full collation name in both FULL_COLLATION_NAME and
        COLLATION_NAME.
      
      SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
      FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
      WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4|latin1).*swedish.*ci$';
      +-----------------------------+-------------------------------------+--------------------+------+------------+
      | COLLATION_NAME              | FULL_COLLATION_NAME                 | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
      +-----------------------------+-------------------------------------+--------------------+------+------------+
      | latin1_swedish_ci           | latin1_swedish_ci                   | latin1             |    8 | Yes        |
      | latin1_swedish_nopad_ci     | latin1_swedish_nopad_ci             | latin1             | 1032 |            |
      | utf8mb4_swedish_ci          | utf8mb4_swedish_ci                  | utf8mb4            |  232 |            |
      | uca1400_swedish_ai_ci       | utf8mb4_uca1400_swedish_ai_ci       | utf8mb4            | 2368 |            |
      | uca1400_swedish_as_ci       | utf8mb4_uca1400_swedish_as_ci       | utf8mb4            | 2370 |            |
      | uca1400_swedish_nopad_ai_ci | utf8mb4_uca1400_swedish_nopad_ai_ci | utf8mb4            | 2372 |            |
      | uca1400_swedish_nopad_as_ci | utf8mb4_uca1400_swedish_nopad_as_ci | utf8mb4            | 2374 |            |
      +-----------------------------+-------------------------------------+--------------------+------+------------+
      
      - Other INFORMATION_SCHEMA queries:
      
        SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS;
        SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS;
        SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
        SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
        SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES;
        SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS;
        SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS;
        SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES;
        SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES;
        SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS;
        SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS;
        SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS;
      
        display full collation names, including the character set prefix,
        for all collations, including new collations.
      
        Corresponding SHOW commands also display full collation names
        in collation related columns:
      
        SHOW CREATE TABLE t1;
        SHOW CREATE DATABASE db1;
        SHOW TABLE STATUS;
        SHOW CREATE FUNCTION f1;
        SHOW CREATE PROCEDURE p1;
        SHOW CREATE EVENT ev1;
        SHOW CREATE TRIGGER tr1;
        SHOW CREATE VIEW;
      
        These INFORMATION_SCHEMA queries and SHOW statements may change in
        the future, to display short collation names.
  14. 21 Jan, 2022 1 commit
  15. 17 Jan, 2022 1 commit
  16. 27 Sep, 2021 1 commit
  17. 13 Sep, 2021 1 commit
  18. 19 May, 2021 2 commits
      Change CHARSET_INFO character set and collation names to LEX_CSTRING · a206658b
      Monty authored
      This change removed 68 explicit strlen() calls from the code.
      
      The following renames were done to ensure we don't use the old names
      when merging code from earlier releases, as using the new variables
      for print function could result in crashes:
      - charset->csname renamed to charset->cs_name
      - charset->name renamed to charset->coll_name
      
      Almost all changes were mechanical, except:
      - Changed to use the new Protocol::store(LEX_CSTRING..) when possible
      - Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
      - Changed to use String->append(LEX_CSTRING&) when possible
      
      Other things:
      - There were compiler issues with ensuring that all character set names
        point to the same string: gcc doesn't allow one to use integer constants
        when defining global structures (constant char * pointers work fine).
        To get around this, I declared defines for each character set name
        length.
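How a pointer-plus-length name removes strlen() calls can be shown with a minimal sketch. `Lex_cstring_sketch`, `CSTRING_WITH_LEN_SKETCH` and `name_equals()` are hypothetical stand-ins for illustration and do not reproduce the actual LEX_CSTRING definitions.

```cpp
#include <cstddef>
#include <cstring>

// A string that carries its own length.
struct Lex_cstring_sketch
{
  const char *str;
  size_t length;
};

// sizeof of a string literal includes the trailing '\0',
// so the length is known at compile time.
#define CSTRING_WITH_LEN_SKETCH(literal) { literal, sizeof(literal) - 1 }

static const Lex_cstring_sketch utf8mb4_name=
  CSTRING_WITH_LEN_SKETCH("utf8mb4");

// Comparison can reject on length first and never scans for '\0'.
static bool name_equals(const Lex_cstring_sketch &a,
                        const Lex_cstring_sketch &b)
{
  return a.length == b.length && !std::memcmp(a.str, b.str, a.length);
}
```

Code that previously had to call strlen() on the name before copying or sending it can read the `length` member instead.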
      MDEV-8334: Rename utf8 to utf8mb3 · 2fdb556e
      Rucha Deodhar authored
      This patch changes the main name of 3 byte character set from utf8 to
      utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
      so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
  19. 23 Jul, 2020 1 commit
      MDEV-7947 strcmp() takes 0.37% in OLTP RO · dbcd3384
      Monty authored
      This patch ensures that all identical character sets share the same
      cs->csname.
      This allows us to replace strcmp() in my_charset_same() with comparisons
      of pointers. This fixes a long standing performance issue that could cause
      a strcmp() for every item sent through the protocol class to the end user.
      
      One consequence of this patch is that we don't allow one to add a character
      definition in the Index.xml file that changes the csname of an existing
      character set. This is by design as changing character set names of existing
      ones is extremely dangerous, especially as some storage engines just record
      character set numbers.
      
      As we now have a hash over character set's csname, we can in the future
      use that for faster access to a specific character set. This could be done
      by changing the hash to non unique and use the hash to find the next
      character set with same csname.
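The pointer comparison relies on every distinct name being stored exactly once. A minimal sketch of that idea, assuming a `std::unordered_set` as the hash (the server uses its own hash implementation), with hypothetical names `intern_csname()`, `Charset_sketch` and `charset_same_sketch()`:

```cpp
#include <string>
#include <unordered_set>

// Return one canonical pointer per distinct name. Elements of an
// unordered_set are never moved, so the pointer stays valid.
static const char *intern_csname(const std::string &name)
{
  static std::unordered_set<std::string> names;
  return names.insert(name).first->c_str();
}

struct Charset_sketch
{
  const char *csname;   // always obtained from intern_csname()
};

// Equal names share one pointer, so no strcmp() is needed.
static bool charset_same_sketch(const Charset_sketch &a,
                                const Charset_sketch &b)
{
  return a.csname == b.csname;
}
```

The comparison is only correct if no code path stores a name without interning it, which matches the restriction on Index.xml described above.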
  20. 10 Jun, 2020 1 commit
      MDEV-22849 Reuse skip_trailing_space() in my_hash_sort_utf8mbX · 9b9a354d
      Alexander Barkov authored
      Replacing the slow loop in my_hash_sort_utf8mbX() with the fast
      skip_trailing_space(), which consumes 8 bytes in one iteration,
      and is around 8 times faster on long data.
      
      Also, renaming:
      - my_hash_sort_utf8() to my_hash_sort_utf8mb3()
      - my_hash_sort_utf8_nopad() to my_hash_sort_utf8mb3_nopad()
      to make merging to 10.5 easier (automatically?).
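The 8-bytes-per-iteration idea can be sketched as below. This is not the server's skip_trailing_space(): the name `skip_trailing_space_sketch()` is hypothetical, and unaligned reads are done through memcpy here rather than by aligning the pointer first.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns a pointer just past the last non-space byte of [ptr, ptr+len).
static const unsigned char *
skip_trailing_space_sketch(const unsigned char *ptr, size_t len)
{
  const unsigned char *end= ptr + len;
  const uint64_t eight_spaces= 0x2020202020202020ULL;

  // Fast path: drop eight spaces per iteration.
  while (end - ptr >= 8)
  {
    uint64_t word;
    std::memcpy(&word, end - 8, sizeof(word));
    if (word != eight_spaces)
      break;
    end-= 8;
  }
  // Slow path: at most a few remaining bytes, one at a time.
  while (end > ptr && end[-1] == ' ')
    end--;
  return end;
}
```

Because all eight bytes of the pattern are identical, the comparison does not depend on byte order.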
  21. 09 May, 2020 1 commit
  22. 28 Jan, 2020 1 commit
  23. 28 Jun, 2019 1 commit
  24. 11 May, 2019 1 commit
  25. 03 Apr, 2019 1 commit
  26. 19 Oct, 2018 1 commit
  27. 17 Oct, 2018 1 commit
  28. 15 Oct, 2018 1 commit
  29. 19 Jul, 2018 1 commit
      Simplify caseup() and casedn() in charsets · e2ac4098
      Alexander Barkov authored
      After the MDEV-13118 fix there's no code in the server that
      wants caseup/casedn to change the argument in place for simple
      charsets.  Let's remove this logic and always return the result in a
      new string for all charsets, both simple and complex.
      
      1. Removing the optimization that *some* character sets used in casedn()
        and caseup(), which allowed (and required) to change the case in-place,
        overwriting the string passed as the "src" argument.
        Now all CHARSET_INFO's work in the same way:
        none of them change the source string in-place, all of them now convert
        case from the source string to the destination string, leaving
        the source string untouched.
      
      2. Adding "const" qualifier to the "char *src" parameter
         to caseup() and casedn().
      
      3. Removing duplicate implementations in ctype-mb.c.
        Now both caseup() and casedn() implementations for all CJK character sets
        use internally the same function my_casefold_mb()
        (the former my_casefold_mb_varlen()).
      
      4. Removing the "unused" attribute from parameters of some my_case{up|dn}_xxx()
         implementations, as the affected parameters are now *used* in the code.
         Previously these parameters were used only in DBUG_ASSERT().
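The calling convention after this change, a read-only source and a separate destination, can be sketched for the ASCII case. `caseup_sketch()` is a hypothetical name and its parameter list is an assumption, not the actual handler signature.

```cpp
#include <cctype>
#include <cstddef>

// Converts src to upper case into dst and returns the number of bytes
// written. src is const: it is never modified, even for simple charsets.
static size_t caseup_sketch(const char *src, size_t srclen,
                            char *dst, size_t dstlen)
{
  size_t n= srclen < dstlen ? srclen : dstlen;
  for (size_t i= 0; i < n; i++)
    dst[i]= (char) std::toupper((unsigned char) src[i]);
  return n;
}
```

A caller that still wants the result in the original variable has to copy it back explicitly, which makes the extra buffer visible at the call site.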
  30. 05 Apr, 2018 1 commit
      Misc. typos · 3dd01669
      luz.paz authored
      Found via `codespell -i 3 -w --skip="./debian/po" -I ../mariadb-server-word-whitelist.txt  ./cmake/ ./debian/ ./Docs/ ./include/ ./man/ ./plugin/ ./strings/`
  31. 17 Oct, 2017 1 commit
  32. 17 May, 2017 2 commits
  33. 10 Mar, 2017 1 commit
  34. 24 Nov, 2016 1 commit
  35. 10 Oct, 2016 1 commit