    golang_str,strconv: Fix decoding of rune-error · 598eb479
    Kirill Smelkov authored
    The error rune (U+FFFD) is returned by _utf8_decode_rune to indicate a
    decoding error. But the error rune is itself a valid Unicode codepoint:
    
       >>> x = u"�"
       >>> x
       u'\ufffd'
       >>> x.encode('utf-8')
       '\xef\xbf\xbd'
    
    Therefore the caller should treat only (r=_rune_error, size=1) as a
    UTF-8 decoding error.
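
    A minimal sketch of this rule, assuming Python 3. The names RUNE_ERROR,
    decode_rune and is_decode_error are hypothetical stand-ins for
    illustration, not the actual pygolang code. The (RUNE_ERROR, 0) result
    for empty input is an assumption borrowed from Go's utf8.DecodeRune.

```python
RUNE_ERROR = u"\ufffd"  # hypothetical stand-in for _rune_error


def decode_rune(data):
    """Decode the first UTF-8 character of bytes `data` -> (rune, size).

    Invalid input yields (RUNE_ERROR, 1); empty input yields (RUNE_ERROR, 0).
    """
    if len(data) == 0:
        return RUNE_ERROR, 0
    # A UTF-8 sequence is at most 4 bytes long; try the shortest prefix first.
    for size in range(1, min(4, len(data)) + 1):
        try:
            return data[:size].decode("utf-8"), size
        except UnicodeDecodeError:
            continue
    return RUNE_ERROR, 1


def is_decode_error(rune, size):
    # The rune alone is not enough: a correctly encoded U+FFFD decodes
    # to the same rune, but with size=3.
    return rune == RUNE_ERROR and size == 1
```

    With this sketch, decode_rune(b"\xef\xbf\xbd") gives (u"\ufffd", 3), which
    is not an error, while decode_rune(b"\xff") gives (u"\ufffd", 1), which is.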
    
    But e.g. strconv.quote did not also inspect the size, and so it quoted
    � into just "\xef" instead of "\xef\xbf\xbd".
    _utf8_decode_surrogateescape had a similar error.
    
    -> Fix it.
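
    The effect on a caller can be sketched as follows, assuming Python 3.
    quote_sketch and _decode_rune are hypothetical helpers, not the real
    strconv.quote: the sketch escapes only undecodable bytes and ignores
    quotes, backslashes and non-printable characters. The check_size=False
    branch is an assumed model of how a missing size check can lose bytes.

```python
RUNE_ERROR = u"\ufffd"  # hypothetical stand-in for _rune_error


def _decode_rune(data):
    """Hypothetical decoder: first UTF-8 character of `data` -> (rune, size)."""
    for size in range(1, min(4, len(data)) + 1):
        try:
            return data[:size].decode("utf-8"), size
        except UnicodeDecodeError:
            pass
    return RUNE_ERROR, 1


def quote_sketch(data, check_size=True):
    """Quote bytes `data`, escaping undecodable bytes as \\xNN."""
    out = []
    while data:
        r, size = _decode_rune(data)
        if check_size:
            is_error = (r == RUNE_ERROR and size == 1)  # rune and size
        else:
            is_error = (r == RUNE_ERROR)                # rune only
        if is_error:
            out.append("\\x%02x" % data[0])  # escape the first byte only
        else:
            out.append(r)                    # valid character goes as is
        data = data[size:]                   # skip the whole decoded sequence
    return '"' + "".join(out) + '"'
```

    Here quote_sketch(b"\xef\xbf\xbd", check_size=False) escapes the first
    byte and skips all three, so the last two bytes are lost; with the size
    check the character is kept as is, and a truly invalid byte such as
    b"\xff" is still escaped.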
    
    Without the fix, e.g. the added test for strconv.quote fails with:
    
        >           assert quote(tin) == tquoted
        E           assert '"\xef"' == '"�"'
        E             - "\xef"
        E             + "�"
    
    /reviewed-by @jerome
    /reviewed-at nexedi/pygolang!18
strconv.py 5.49 KB