golang/_golang_str.pyx · 598eb4799f8839353bf20af44f9621268df02a0b · Carlos Ramos Carreño / pygolang

golang_str,strconv: Fix decoding of rune-error · 598eb479

Kirill Smelkov authored Oct 03, 2022

The error rune (U+FFFD) is returned by _utf8_decode_rune to indicate a
decoding error. But the error rune itself is a valid Unicode codepoint:

   >>> x = u"�"
   >>> x
   u'\ufffd'
   >>> x.encode('utf-8')
   '\xef\xbf\xbd'

Therefore the caller should treat only (r=_rune_error, size=1) as a UTF-8
decoding error; _rune_error with a larger size is a correctly decoded U+FFFD.

But strconv.quote, for example, did not also inspect the size, and so was
quoting � as just "\xef" instead of "\xef\xbf\xbd".
_utf8_decode_surrogateescape had a similar bug.
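
The distinction can be illustrated with a minimal Python 3 sketch. It is not
the pygolang code: decode_rune and quote_bytes are hypothetical stand-ins for
_utf8_decode_rune and strconv.quote, and the quoting is simplified (it does
not escape quotes or control characters). Returning size 0 for empty input
is an assumption of the sketch.

```python
RUNE_ERROR = 0xFFFD  # U+FFFD; plays the role of _rune_error in this sketch


def decode_rune(b: bytes):
    """Hypothetical stand-in for _utf8_decode_rune: return (rune, size)."""
    if not b:
        return RUNE_ERROR, 0  # assumption: empty input yields size 0
    # Try prefixes of growing length until one decodes as a single codepoint.
    for size in range(1, min(4, len(b)) + 1):
        try:
            return ord(b[:size].decode("utf-8")), size
        except UnicodeDecodeError:
            continue
    # No valid sequence starts here: report an error that consumes one byte.
    return RUNE_ERROR, 1


def quote_bytes(b: bytes) -> str:
    """Simplified quote: hex-escape only bytes that really fail to decode."""
    out = []
    i = 0
    while i < len(b):
        r, size = decode_rune(b[i:])
        if r == RUNE_ERROR and size == 1:
            # Genuine decoding error: escape the single offending byte.
            out.append("\\x%02x" % b[i])
        else:
            # Valid codepoint, including a correctly encoded U+FFFD (size 3).
            out.append(chr(r))
        i += size
    return '"' + "".join(out) + '"'
```

Checking only r == RUNE_ERROR, without the size, would send a valid U+FFFD
down the error branch, which is the kind of mistake described above.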

-> Fix it.

Without the fix, the added test for strconv.quote fails, e.g.:

    >           assert quote(tin) == tquoted
    E           assert '"\xef"' == '"�"'
    E             - "\xef"
    E             + "�"

/reviewed-by @jerome
/reviewed-at nexedi/pygolang!18
