    MDEV-11343 LOAD DATA INFILE fails to load data with an escape character... · dd0ff302
    Alexander Barkov authored
    MDEV-11343 LOAD DATA INFILE fails to load data with an escape character followed by a multi-byte character
    
    Partially backporting MDEV-9874 from 10.2 to 10.0
    
    READ_INFO::read_field() raised the ER_INVALID_CHARACTER_STRING error
    when reading an escape character followed by a multi-byte character.
    
    Raising wellformedness errors in READ_INFO::read_field() was wrong,
    because the main goal of READ_INFO::read_field() is to *unescape* the
    data which was presumably escaped using mysql_real_escape_string(),
    using the same character set as the one specified in
    "LOAD DATA INFILE ... CHARACTER SET ..." (or assumed by default).
    
    During LOAD DATA, multi-byte characters are not always scanned as a single
    entity! In case of escaped data, parts of a multi-byte character can be
    scanned on different loop iterations. So the old code erroneously tested
    wellformedness in the middle of a multi-byte character.
    
    Moreover, the data after unescaping can go into a BLOB field, not a text field.
    Wellformedness tests are meaningless in this case.
    
    After this patch, wellformedness is only checked later, at
    Field::store(str,length,cs) time. The loop that scans bytes only
    reverts the changes made by mysql_real_escape_string().
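
    The idea can be illustrated with a minimal Python sketch. This is not the
    server code: unescape_bytes(), is_well_formed_utf8() and store() are
    hypothetical names, the escape map is simplified, and utf8 is assumed as
    the LOAD DATA character set. The byte loop only undoes escaping, and the
    wellformedness test runs once, on the complete value, and only for text
    fields.

```python
ESCAPE = 0x5C  # backslash

# Simplified escape map (an assumption for illustration, not the server's table).
UNESCAPE_MAP = {
    ord("0"): 0x00,
    ord("b"): 0x08,
    ord("n"): 0x0A,
    ord("r"): 0x0D,
    ord("t"): 0x09,
    ord("Z"): 0x1A,
}


def unescape_bytes(data: bytes) -> bytes:
    """Undo escaping byte by byte, with no wellformedness checks."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESCAPE and i + 1 < len(data):
            nxt = data[i + 1]
            # Unknown escapes keep the following byte literally. For "\ä" this
            # byte is only the FIRST byte of the multi-byte character; the
            # second byte is copied on the next loop iteration.
            out.append(UNESCAPE_MAP.get(nxt, nxt))
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data is a complete, valid utf8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def store(value: bytes, is_text_field: bool) -> bytes:
    """Hypothetical stand-in for the store step: validate text fields only."""
    if is_text_field and not is_well_formed_utf8(value):
        raise ValueError("invalid character string")
    return value


raw = b"\\\xc3\xa4"            # backslash followed by utf8 "ä" (0xC3 0xA4)
value = unescape_bytes(raw)    # b"\xc3\xa4"
store(value, is_text_field=True)
store(b"\xc3", is_text_field=False)  # BLOB-like field: no wellformedness test
```

    A check made right after the escape iteration would see only the byte
    0xC3, which is not well-formed on its own, although the complete value is.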
    
    Note, in some cases users can supply data which did not really go through
    mysql_real_escape_string() and was escaped by some other means,
    or was not escaped at all. The file reported in this MDEV contains
    the string "\ä", which is an example of such improperly escaped data, as
    - either there should be two backslashes:   "\\ä"
    - or there should be no backslashes at all: "ä"
    mysql_real_escape_string() could not generate "\ä".
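
    Why "\ä" cannot be escaper output can be shown with a simplified Python
    model. real_escape_sketch() and could_be_escaper_output() are hypothetical
    helpers; the model works on Python strings, ignores character set details
    and SQL modes, and assumes the documented set of escaped characters
    (NUL, \n, \r, backslash, single quote, double quote, Ctrl-Z).

```python
# Characters the escaper rewrites, and what each one becomes.
ESCAPES = {
    "\0": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "\x1a": "\\Z",
}


def real_escape_sketch(s: str) -> str:
    """Simplified model of the escaping step."""
    return "".join(ESCAPES.get(ch, ch) for ch in s)


def could_be_escaper_output(s: str) -> bool:
    """True if s could have been produced by real_escape_sketch()."""
    i = 0
    while i < len(s):
        if s[i] == "\\":
            # In escaper output a backslash always starts a known escape.
            if i + 1 >= len(s) or s[i + 1] not in "0nr\\'\"Z":
                return False
            i += 2
        else:
            # A special character can never appear unescaped.
            if s[i] in ESCAPES:
                return False
            i += 1
    return True


print(real_escape_sketch("ä"))         # no backslash is added
print(real_escape_sketch("\\ä"))       # the backslash is doubled
print(could_be_escaper_output("\\ä"))  # a lone backslash before "ä"
```

    In this model every backslash in the output starts one of seven known
    escape sequences, so a single backslash directly before "ä" can only come
    from some other escaping method or from unescaped input.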
loaddata.result 18.4 KB