Kirill Smelkov authored
Error rune (u+fffd) is returned by _utf8_decode_rune to indicate an error
in decoding. But the error rune itself is a valid unicode codepoint:

    >>> x = u"�"
    >>> x
    u'\ufffd'
    >>> x.encode('utf-8')
    '\xef\xbf\xbd'

This way only (r=_rune_error, size=1) should be treated by the caller as
a utf8 decoding error. But e.g. strconv.quote was not careful to also
inspect the size, and this way was quoting � into just "\xef" instead of
"\xef\xbf\xbd". _utf8_decode_surrogateescape was also subject to a similar
error.

-> Fix it.

Without the fix e.g. the added test for strconv.quote fails as

    >       assert quote(tin) == tquoted
    E       assert '"\xef"' == '"�"'
    E         - "\xef"
    E         + "�"

/reviewed-by @jerome
/reviewed-at nexedi/pygolang!18
598eb479
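
The distinction the message draws between a decoding error (r=_rune_error, size=1) and a genuinely encoded u+fffd (size=3) can be illustrated with a minimal standalone sketch. This is not the pygolang implementation: `decode_rune`, `quote` and the `check_size` flag are hypothetical names made up for illustration, the code is written for Python 3 rather than the Python 2 shown in the message, and the escaping rules are simplified.

```python
RUNE_ERROR = 0xFFFD  # u+fffd, doubles as the "decoding failed" marker


def decode_rune(b):
    """Decode the first UTF-8 rune of bytes b -> (rune, size).

    Hypothetical helper: returns (RUNE_ERROR, 1) when b does not start
    with a valid UTF-8 sequence, and (RUNE_ERROR, 0) for empty input.
    """
    if not b:
        return RUNE_ERROR, 0
    for size in range(1, min(4, len(b)) + 1):
        try:
            return ord(b[:size].decode('utf-8')), size
        except UnicodeDecodeError:
            continue
    return RUNE_ERROR, 1


def quote(b, check_size=True):
    """Simplified quoting of bytes b into a double-quoted string.

    check_size=False mimics the bug described above: any RUNE_ERROR is
    treated as a decoding failure, even a validly encoded u+fffd.
    """
    out = ['"']
    i = 0
    while i < len(b):
        r, size = decode_rune(b[i:])
        is_error = (r == RUNE_ERROR) and (size == 1 or not check_size)
        if is_error:
            out.append('\\x%02x' % b[i])        # escape the offending byte
        elif r in (ord('"'), ord('\\')):
            out.append('\\' + chr(r))           # escape quote and backslash
        elif chr(r).isprintable():
            out.append(chr(r))                  # keep printable runes as-is
        else:
            out.extend('\\x%02x' % c for c in b[i:i + size])
        i += size
    out.append('"')
    return ''.join(out)


valid = '\ufffd'.encode('utf-8')    # b'\xef\xbf\xbd'
print(decode_rune(valid))           # valid u+fffd: size is 3
print(decode_rune(b'\xef'))         # truncated sequence: size is 1
print(quote(valid))                 # rune kept intact
print(quote(valid, check_size=False))  # buggy variant: only first byte survives
```

In this sketch the buggy variant yields `"\xef"` for a validly encoded u+fffd, which matches the failure quoted in the message; checking the size as well keeps the rune intact.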