    golang_str: Fix ustr to provide buffer interface, like bstr already does · 8a240b5b
    Kirill Smelkov authored
    Kazuhiko reports that using base64.b64encode with ustr fails on py3:
    
        >>> base64.b64encode(b('a'))
        b'YQ=='
        >>> base64.b64encode(u('a'))
        Traceback (most recent call last):
          File "<console>", line 1, in <module>
          File "/*/lib/python3.8/base64.py", line 58, in b64encode
            encoded = binascii.b2a_base64(s, newline=False)
        TypeError: a bytes-like object is required, not 'pyustr'
    
    which uncovers a thinko of mine: initially, in 105d03d4 (golang_str: Add
    test for memoryview(bstr)), I made only bstr provide the buffer interface, while
    ustr did not provide it, on the mistaken reasoning that it contains unicode
    characters, not binary data. But to fully respect the promise that ustr can be
    automatically converted to bytes, ustr should also provide the buffer
    interface so that things like PyArg_Parse("s#") or PyArg_Parse("y") can
    accept it.
    
    While PyArg_Parse("s#") is not yet completely fixed by this patch, as
    it still reports UnicodeEncodeError for ustr corresponding to non-UTF8 data,
    adding the buffer interface to ustr is still a step in the right direction
    because of the way e.g. base64.b64encode(u) is implemented:
    
        base64.b64encode(x)     ->  binascii.b2a_base64(x)
    
        binascii.b2a_base64(u)  ->  py2: PyArg_ParseTuple('s*', u)  ->  _PyUnicode_AsDefaultEncodedString(u)
                                    py3: PyObject_GetBuffer(u)      ->  u.tp_as_buffer.bf_getbuffer
    
    Here we see that on py3 it tries to retrieve the object's data via
    .tp_as_buffer.bf_getbuffer, and if no buffer interface is provided, that
    fails. But we can't let base64.b64encode(ustr) fail if
    base64.b64encode(bstr) works ok, because both bstr and ustr represent the
    same string entity, just in two different forms.
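
    The py3 behaviour described above can be observed with the standard
    library alone. The following is only a sketch: the helper
    accepts_as_bytes_like and the class NoBuffer are hypothetical names
    introduced for illustration and are not part of pygolang. It checks which
    objects binascii.b2a_base64 accepts through the buffer protocol:

```python
import base64
import binascii


def accepts_as_bytes_like(obj):
    """Return True if binascii.b2a_base64 accepts obj (hypothetical helper)."""
    try:
        binascii.b2a_base64(obj, newline=False)
    except TypeError:
        # py3 raises TypeError when the object exposes no buffer interface
        return False
    return True


class NoBuffer:
    """Hypothetical stand-in for a type that provides no buffer interface."""


results = {
    "bytes": accepts_as_bytes_like(b"a"),
    "bytearray": accepts_as_bytes_like(bytearray(b"a")),
    "memoryview": accepts_as_bytes_like(memoryview(b"a")),
    "str": accepts_as_bytes_like("a"),
    "NoBuffer": accepts_as_bytes_like(NoBuffer()),
}

# base64.b64encode goes through the same binascii call for its argument
encoded = base64.b64encode(b"a")
print(results, encoded)
```

    On py3 the plain str and the buffer-less object are rejected, while every
    object that exports a buffer is accepted; before this patch ustr fell into
    the rejected group.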
    
    -> So teach ustr to provide the buffer interface so that e.g. memoryview starts to
       work on it and observes the corresponding bytes data. This fixes
       base64.b64encode(ustr) on py3 and also fixes the t_hash case on py2, and the y,
       y_star and y_hash test_strings_capi_getargs_to_cstr cases on py3.
    
    Note: the original unicode on py2 has:
    
        .bf_getreadbuf      -> []wchar  for     []UCS                                   ; used by buffer(u)
        .bf_getcharbuffer   -> []byte   for     encode([]UCS, sys.defaultencoding)      ; used by t#  and PyObject_AsCharBuffer
        .bf_getbuffer = 0                                                               ; used by memoryview(u)
    
    and on py3:
    
        .tp_as_buffer = 0
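
    The py3 side of this note can be checked from Python without pygolang.
    This is a minimal sketch; has_buffer is a hypothetical helper, and UTF-8
    is used here only as an assumed encoding for the illustration:

```python
def has_buffer(obj):
    """Return True if memoryview() can export obj's data (hypothetical helper)."""
    try:
        memoryview(obj)
    except TypeError:
        # raised when the type has no bf_getbuffer slot
        return False
    return True


text = "aβ"                   # builtin py3 str: tp_as_buffer is not set
data = text.encode("utf-8")   # explicit conversion to bytes (assumed encoding)
view = memoryview(data)       # bytes exports its data via the buffer interface

print(has_buffer(text), has_buffer(data), view.tobytes())
```

    The builtin str needs an explicit encode step before its bytes can be
    observed, whereas a type that provides the buffer interface lets
    memoryview see the bytes directly.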
    
    /reported-by @kazuhiko
    /reported-at nexedi/pygolang!21 (comment 172595)