  1. 22 Aug, 2024 1 commit
  2. 18 Apr, 2024 1 commit
    • MDEV-31340 Remove MY_COLLATION_HANDLER::strcasecmp() · fd247cc2
      Alexander Barkov authored
      This patch also fixes:
        MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
        MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
        MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
        MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
        MDEV-33088 Cannot create triggers in the database `MYSQL`
        MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
        MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
        MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
        MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
        MDEV-33120 System log table names are case insensitive with lower-case-table-names=0
      
      - Removing the virtual function strcasecmp() from MY_COLLATION_HANDLER
      
      - Adding a wrapper function CHARSET_INFO::streq(), to compare
        two strings for equality. For now it calls strnncoll() internally.
        In the future it will turn into a virtual function.
      
      - Adding new accent sensitive case insensitive collations:
          - utf8mb4_general1400_as_ci
          - utf8mb3_general1400_as_ci
        They implement accent sensitive case insensitive comparison.
        The weight of a character is equal to the code point of its
        upper case variant. These collations use Unicode-14.0.0 casefolding data.
      
        The result of
           my_charset_utf8mb3_general1400_as_ci.strcoll()
        is very close to the former
           my_charset_utf8mb3_general_ci.strcasecmp()
      
        The results differ only for a couple dozen rare characters, because of:
          - the switch from "tolower" to "toupper" comparison, to make
            utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
          - the switch from Unicode-3.0.0 to Unicode-14.0.0
        This difference should be tolerable. See the list of affected
        characters in the MDEV description.
      
        Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
        Unlike utf8mb4_general_ci, it does not treat all non-BMP characters
        as equal.
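
The weight rule described above (weight of a character = code point of its upper case variant) can be illustrated with a minimal sketch. The names toy_toupper() and toy_as_ci_compare() are hypothetical, and the tiny mapping below covers only ASCII and a few Latin-1 letters; it is an assumption standing in for the Unicode-14.0.0 casefolding data, not the server implementation.

```cpp
#include <cstddef>
#include <string>

// Toy stand-in for Unicode case folding data (assumption: ASCII and a part
// of Latin-1 only). Maps a code point to its upper case variant.
char32_t toy_toupper(char32_t c)
{
  if (c >= U'a' && c <= U'z')
    return c - 0x20;
  // Latin-1 lower case letters U+00E0..U+00FE, except the division sign U+00F7
  if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
    return c - 0x20;
  return c;  // Everything else, including non-BMP, keeps its own code point
}

// Accent sensitive, case insensitive comparison:
// the weight of a character is the code point of its upper case variant.
// Returns -1, 0 or 1.
int toy_as_ci_compare(const std::u32string &a, const std::u32string &b)
{
  std::size_t n = a.size() < b.size() ? a.size() : b.size();
  for (std::size_t i = 0; i < n; i++)
  {
    char32_t wa = toy_toupper(a[i]);
    char32_t wb = toy_toupper(b[i]);
    if (wa != wb)
      return wa < wb ? -1 : 1;
  }
  if (a.size() == b.size())
    return 0;
  return a.size() < b.size() ? -1 : 1;
}
```

With this rule, letters that differ only in case compare as equal, letters that differ in accent do not, and two different non-BMP characters get different weights because each keeps its own code point.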
      
      - Adding classes representing names of the file based database objects:
      
          Lex_ident_db
          Lex_ident_table
          Lex_ident_trigger
      
        Their comparison collation depends on the underlying
        file system case sensitivity and on --lower-case-table-names
        and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.
      
      - Adding classes representing names of other database objects,
        whose names have case insensitive comparison style,
        using my_charset_utf8mb3_general1400_as_ci:
      
        Lex_ident_column
        Lex_ident_sys_var
        Lex_ident_user_var
        Lex_ident_sp_var
        Lex_ident_ps
        Lex_ident_i_s_table
        Lex_ident_window
        Lex_ident_func
        Lex_ident_partition
        Lex_ident_with_element
        Lex_ident_rpl_filter
        Lex_ident_master_info
        Lex_ident_host
        Lex_ident_locale
        Lex_ident_plugin
        Lex_ident_engine
        Lex_ident_server
        Lex_ident_savepoint
        Lex_ident_charset
        engine_option_value::Name
      
      - All the mentioned Lex_ident_xxx classes implement a method streq():
      
        if (ident1.streq(ident2))
           do_equal();
      
        This method works as a wrapper for CHARSET_INFO::streq().
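
A rough sketch of the idea of an identifier class that carries its own comparison style. All names here (Toy_compare_bin, Toy_compare_ci, Toy_ident) are hypothetical, the case insensitive policy is ASCII-only, and the policy is chosen at compile time for brevity. According to the description above, the file based identifier classes pick their collation depending on the file system and --lower-case-table-names, which this sketch does not model.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a binary (case sensitive) comparison
struct Toy_compare_bin
{
  static bool streq(const std::string &a, const std::string &b)
  { return a == b; }
};

// Hypothetical stand-in for a case insensitive comparison (ASCII only)
struct Toy_compare_ci
{
  static bool streq(const std::string &a, const std::string &b)
  {
    if (a.size() != b.size())
      return false;
    for (std::size_t i = 0; i < a.size(); i++)
    {
      if (std::toupper((unsigned char) a[i]) !=
          std::toupper((unsigned char) b[i]))
        return false;
    }
    return true;
  }
};

// An identifier whose type determines how its name is compared
template <class Compare>
class Toy_ident
{
  std::string m_name;
public:
  explicit Toy_ident(const std::string &name) : m_name(name) {}
  // Thin wrapper over the comparison policy
  bool streq(const Toy_ident &other) const
  { return Compare::streq(m_name, other.m_name); }
};

typedef Toy_ident<Toy_compare_ci>  Toy_ident_column;    // always case insensitive
typedef Toy_ident<Toy_compare_bin> Toy_ident_table_cs;  // case sensitive variant
```

Because the comparison style is bound to the type, a caller cannot accidentally compare a table name with the rule meant for column names.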
      
      - Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
        in class members and in function/method parameters.
      
      - Replacing all calls like
          system_charset_info->coll->strcasecmp(ident1, ident2)
        to
          ident1.streq(ident2)
      
      - Taking advantage of the C++11 user-defined literal operator
        for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
        data types. Usage example:
      
        const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;
      
        is now a shorter version of:
      
        const Lex_ident_column primary_key_name=
          Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
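
For readers unfamiliar with user-defined literals, here is a self-contained sketch of how such an operator can be declared. Toy_cstring, Toy_column_name and the _toy_column suffix are hypothetical stand-ins, not the definitions from m_strings.h or lex_ident.h.

```cpp
#include <cstddef>

// Hypothetical stand-in for a pointer + length string
struct Toy_cstring
{
  const char *str;
  std::size_t length;
};

// Hypothetical stand-in for a typed identifier built on top of it
struct Toy_column_name : Toy_cstring
{
  constexpr explicit Toy_column_name(Toy_cstring s) : Toy_cstring(s) {}
};

// The compiler passes the literal text and its length (without the
// trailing '\0'), so no strlen() call or length macro is needed.
constexpr Toy_column_name operator""_toy_column(const char *str,
                                                std::size_t length)
{
  return Toy_column_name(Toy_cstring{str, length});
}

const Toy_column_name primary_key_name = "PRIMARY"_toy_column;
```

The length is computed at compile time, which is the same effect the longer STRING_WITH_LEN form achieves.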
  3. 18 Apr, 2023 1 commit
    • MDEV-31071 Refactor case folding data types in Unicode collations · 6075f12c
      Alexander Barkov authored
      This is a non-functional change. It changes how case folding data
      and weight data (for simple Unicode collations) are stored:
      
      - Removing data types MY_UNICASE_CHARACTER, MY_UNICASE_INFO
      - Using data types MY_CASEFOLD_CHARACTER, MY_CASEFOLD_INFO instead.
      
      This patch changes simple Unicode collations in the same way
      that MDEV-30695 previously changed Asian collations.
      
      No new MTR tests are needed. The underlying code is thoroughly
      covered by a number of ctype_*_ws.test and ctype_*_casefold.test
      files, which were added recently as a preparation
      for this change.
      
      Old and new Unicode data layout
      -------------------------------
      
      Case folding data is now stored in separate tables
      consisting of MY_CASEFOLD_CHARACTER elements with two members:
      
          typedef struct casefold_info_char_t
          {
            uint32 toupper;
            uint32 tolower;
          } MY_CASEFOLD_CHARACTER;
      
      while weight data (for simple non-UCA collations xxx_general_ci
      and xxx_general_mysql500_ci) is stored in separate arrays of
      uint16 elements.
      
      Before this change case folding data and simple weight data were
      stored together, in tables of the following elements with three members:
      
          typedef struct unicase_info_char_st
          {
            uint32 toupper;
            uint32 tolower;
            uint32 sort;          /* weights for simple collations */
          } MY_UNICASE_CHARACTER;
      
      This data format was redundant, because weights (the "sort" member) were
      needed only for these two simple Unicode collations:
      - xxx_general_ci
      - xxx_general_mysql500_ci
      
      Adding case folding information for Unicode-14.0.0 using the old
      format would waste memory without purpose.
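
The memory argument can be made concrete with a small sketch. The struct names below are illustrative copies of the two layouts shown above, and the page size of 256 characters is an assumption made only for the arithmetic.

```cpp
#include <cstddef>
#include <cstdint>

// Old layout: case folding and simple-collation weight stored together
struct Old_unicase_character
{
  std::uint32_t toupper;
  std::uint32_t tolower;
  std::uint32_t sort;  // weights for simple collations
};

// New layout: case folding only
struct New_casefold_character
{
  std::uint32_t toupper;
  std::uint32_t tolower;
};

// Assumption for illustration: data is grouped in pages of 256 characters
constexpr std::size_t toy_page_size = 256;

// Bytes per page in the old combined format
constexpr std::size_t old_page_bytes()
{ return toy_page_size * sizeof(Old_unicase_character); }

// Bytes per page of case folding data in the new format
constexpr std::size_t new_casefold_page_bytes()
{ return toy_page_size * sizeof(New_casefold_character); }

// Bytes per page of uint16 weights, needed only by the simple collations
constexpr std::size_t new_weight_page_bytes()
{ return toy_page_size * sizeof(std::uint16_t); }
```

On a typical platform with 4-byte uint32_t and no padding, a case folding table that needs no weights shrinks by one third per page, and the collations that do need weights pay 2 bytes per character instead of 4.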
      
      Detailed changes
      ----------------
      - Changing the underlying data types as described above
      
      - Including unidata-dump.c into the sources.
        This program was earlier used to dump UnicodeData.txt
        (e.g. https://www.unicode.org/Public/14.0.0/ucd/UnicodeData.txt)
        into MySQL / MariaDB source files.
        It was originally written in 2002, but had not previously been
        distributed together with MySQL / MariaDB sources.
      
      - Removing the old format Unicode data earlier dumped from UnicodeData.txt
        (versions 3.0.0 and 5.2.0) from ctype-utf8.c.
        Adding Unicode data in the new format into separate header files,
        to make the code easier to maintain:
      
          - ctype-unicode300-casefold.h
          - ctype-unicode300-casefold-tr.h
          - ctype-unicode300-general_ci.h
          - ctype-unicode300-general_mysql500_ci.h
          - ctype-unicode520-casefold.h
      
      - Adding a new file ctype-unidata.c as an aggregator for
        the header files listed above.
  4. 04 Apr, 2023 1 commit
    • MDEV-30034 UNIQUE USING HASH accepts duplicate entries for tricky collations · 8020b1bd
      Alexander Barkov authored
      - Adding a new argument "flag" to MY_COLLATION_HANDLER::strnncollsp_nchars()
        and a flag MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES.
        The flag defines if strnncollsp_nchars() should emulate trailing spaces
        which were possibly trimmed earlier (e.g. in InnoDB CHAR compression).
        This is important for NOPAD collations.
      
        For example, with this input:
         - str1= 'a '    (Latin letter a followed by one space)
         - str2= 'a  '   (Latin letter a followed by two spaces)
         - nchars= 3
        if the flag is given, strnncollsp_nchars() will virtually restore
        one trailing space to str1, padding it up to nchars (3) characters,
        and compare the two strings as equal:
        - str1= 'a  '  (one extra trailing space emulated)
        - str2= 'a  '  (as is)
      
        If the flag is not given, strnncollsp_nchars() does not add trailing
        virtual spaces, so in case of a NOPAD collation, str1 will be compared
        as less than str2 because it is shorter.
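
The effect of the flag can be sketched with a toy NOPAD comparison over single-byte strings. The function toy_nopad_compare_nchars() and the TOY_* constants are hypothetical; the real strnncollsp_nchars() works on collation weights and does not copy its arguments.

```cpp
#include <cstddef>
#include <string>

// Hypothetical flag values, standing in for
// MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES
enum Toy_flags
{
  TOY_NONE = 0,
  TOY_EMULATE_TRIMMED_TRAILING_SPACES = 1
};

// Compare at most nchars characters of a and b in NOPAD style:
// without the flag, a shorter string sorts before a longer one
// that starts with the same characters.
// Returns -1, 0 or 1.
int toy_nopad_compare_nchars(std::string a, std::string b,
                             std::size_t nchars, int flags)
{
  if (a.size() > nchars)
    a.resize(nchars);
  if (b.size() > nchars)
    b.resize(nchars);
  if (flags & TOY_EMULATE_TRIMMED_TRAILING_SPACES)
  {
    // Virtually restore spaces that may have been trimmed earlier
    a.resize(nchars, ' ');
    b.resize(nchars, ' ');
  }
  int cmp = a.compare(b);
  return cmp < 0 ? -1 : (cmp > 0 ? 1 : 0);
}
```

Calling it with 'a ' and 'a  ' and nchars= 3 gives equality when the flag is set and "less than" when it is not, matching the two cases described above.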
      
      - Field_string::cmp_prefix() now passes the new flag.
        Field_varstring::cmp_prefix() and Field_blob::cmp_prefix() do
        not pass the new flag.
      
      - The branch in cmp_whole_field() in storage/innobase/rem/rem0cmp.cc
        (which handles the CHAR data type) now also passes the new flag.
      
      - Fixing UCA collations to respect the new flag.
        Other collations are possibly also affected, however
        I had no success in making an SQL script demonstrating the problem.
        Other collations will be extended to respect this flag in a separate
        patch later.
      
      - Changing the meaning of the last parameter of Field::cmp_prefix()
        from "number of bytes" (internal length)
        to "number of characters" (user visible length).
      
        The code calling cmp_prefix() from handler.cc was wrong.
        After this change, the call in handler.cc became correct.
      
        The code calling cmp_prefix() from key_rec_cmp() in key.cc
        was adjusted according to this change.
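
The difference between the two units matters for multi-byte character sets. As a small illustration (toy_utf8_char_count() is a hypothetical helper, and it assumes well-formed UTF-8 input), the same prefix has different lengths in bytes and in characters:

```cpp
#include <cstddef>
#include <string>

// Count characters in a well-formed UTF-8 string by counting
// the bytes that are not continuation bytes (10xxxxxx).
std::size_t toy_utf8_char_count(const std::string &s)
{
  std::size_t n = 0;
  for (std::size_t i = 0; i < s.size(); i++)
  {
    unsigned char c = (unsigned char) s[i];
    if ((c & 0xC0) != 0x80)
      n++;
  }
  return n;
}
```

For example, the UTF-8 string "\xC3\xA9t\xC3\xA9" occupies 5 bytes but contains 3 characters, so a caller passing a byte count where a character count is expected would compare a longer prefix than intended.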
      
      - Old strnncollsp_nchars() related tests in unittest/strings/strings-t.c
        now pass the new flag.
        A few new tests were also added, without the flag.
  5. 17 Jan, 2022 1 commit
  6. 13 Sep, 2021 1 commit
  7. 29 Jan, 2020 1 commit
  8. 28 Jan, 2020 1 commit
  9. 28 Jun, 2019 1 commit
  10. 19 Oct, 2018 1 commit
  11. 06 Sep, 2016 1 commit
  12. 31 Mar, 2016 1 commit
    • MDEV-8360 Clean-up CHARSET_INFO: strnncollsp: diff_if_only_endspace_difference · 1d73005b
      Alexander Barkov authored
      - Removing the "diff_if_only_endspace_difference" argument from
        MY_COLLATION_HANDLER::strnncollsp(), my_strnncollsp_simple(),
        as well as from the function template MY_FUNCTION_NAME(strnncollsp)
        in strcoll.ic
      
      - Removing the "diff_if_only_space_different" argument from ha_compare_text(),
        hp_rec_key_cmp().
      
      - Adding a new function my_strnncollsp_padspace_bin() and reusing
        it instead of duplicate code pieces in my_strnncollsp_8bit_bin(),
        my_strnncollsp_latin1_de(), my_strnncollsp_tis620(),
        my_strnncollsp_utf8_cs().
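
The kind of shared helper described above can be sketched as follows. Both function names are hypothetical and the code is only an illustration of PAD SPACE binary comparison, not the body of my_strnncollsp_padspace_bin(): once the common prefix is equal, the leftover tail of the longer string is compared against spaces.

```cpp
#include <cstddef>

// Compare the remaining tail of the longer string against a run of spaces.
// Returns -1 if the tail sorts before spaces, 1 if after, 0 if all spaces.
int toy_padspace_tail_compare(const unsigned char *tail, std::size_t length)
{
  for ( ; length; tail++, length--)
  {
    if (*tail < ' ')
      return -1;
    if (*tail > ' ')
      return 1;
  }
  return 0;
}

// PAD SPACE binary comparison reusing the helper for either side
int toy_strnncollsp_bin(const unsigned char *a, std::size_t a_length,
                        const unsigned char *b, std::size_t b_length)
{
  std::size_t n = a_length < b_length ? a_length : b_length;
  for (std::size_t i = 0; i < n; i++)
  {
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  }
  if (a_length > n)
    return toy_padspace_tail_compare(a + n, a_length - n);
  if (b_length > n)
    return -toy_padspace_tail_compare(b + n, b_length - n);
  return 0;
}
```

Factoring the tail loop out means every collation that pads with spaces can share one piece of code for the "strings differ only in trailing bytes" case.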
      
      - Adding more tests for better coverage of the trailing space handling.
      
      - Removing the unused definition of HA_END_SPACE_ARE_EQUAL
  13. 23 Mar, 2016 1 commit
  14. 06 Jul, 2015 2 commits
  15. 03 Jul, 2015 2 commits
  16. 26 Jun, 2015 1 commit