golang_str: Fix ustr to provide buffer interface, like bstr already does

Kazuhiko reports that using base64.b64encode with ustr fails on py3:

    >>> base64.b64encode(b('a'))
    b'YQ=='
    >>> base64.b64encode(u('a'))
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File "/*/lib/python3.8/base64.py", line 58, in b64encode
        encoded = binascii.b2a_base64(s, newline=False)
    TypeError: a bytes-like object is required, not 'pyustr'

which uncovers a thinko of mine: initially in 105d03d4 (golang_str: Add
test for memoryview(bstr)) I made only bstr provide the buffer interface,
while ustr did not provide it, on the wrong assumption that it contains
unicode characters, not binary data.

But to fully respect the promise that ustr can be automatically converted
to bytes, ustr should also provide the buffer interface, so that things
like PyArg_Parse("s#") or PyArg_Parse("y") can accept it. PyArg_Parse("s#")
is not yet completely fixed by this patch, as it still reports
UnicodeEncodeError for ustr corresponding to non-UTF8 data. Adding the
buffer interface to ustr is still a step in the right direction because of
the way e.g. base64.b64encode(u) is implemented:

    base64.b64encode(x) -> binascii.b2a_base64(x)

    binascii.b2a_base64(u)

        py2: PyArg_ParseTuple('s*', u) -> _PyUnicode_AsDefaultEncodedString(u)
        py3: PyObject_GetBuffer(u)     -> u.tp_as_buffer.bf_getbuffer

Here we see that on py3 it ends up retrieving the object's data via
.tp_as_buffer.bf_getbuffer, and if no buffer interface is provided, that
fails. But we can't let base64.b64encode(ustr) fail if
base64.b64encode(bstr) works, because both bstr and ustr represent the same
string entity, just in two different forms.

-> So teach ustr to provide the buffer interface, so that e.g. memoryview
starts to work on it and observes the corresponding bytes data. This fixes
base64.b64encode(ustr) on py3 and also fixes t_hash/py2, and the y, y_star
and y_hash test_strings_capi_getargs_to_cstr cases on py3.
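
For illustration, the py3 behaviour described above can be reproduced with
builtin types only, without pygolang. This is a minimal sketch, not part of
the patch: try_b64encode is a hypothetical helper introduced here, and a
plain py3 str stands in for an object that provides no buffer interface.

```python
import array
import base64

def try_b64encode(obj):
    """Return base64.b64encode(obj), or the TypeError it raised (hypothetical helper)."""
    try:
        return base64.b64encode(obj)
    except TypeError as e:
        return e

# Objects that provide the buffer interface are accepted by b64encode.
for obj in (b'a', bytearray(b'a'), memoryview(b'a'), array.array('B', b'a')):
    assert try_b64encode(obj) == b'YQ=='

# A plain py3 str provides no buffer interface, so b64encode rejects it,
# and memoryview refuses it for the same reason.
assert isinstance(try_b64encode('a'), TypeError)
try:
    memoryview('a')
except TypeError:
    pass
else:
    raise AssertionError("memoryview(str) unexpectedly succeeded")
```

With the buffer interface in place, ustr is expected to fall into the first
group rather than the second.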

Note: the original unicode on py2 has:

    .bf_getreadbuf    -> []wchar for []UCS                              ; used by buffer(u)
    .bf_getcharbuffer -> []byte for encode([]UCS, sys.defaultencoding)  ; used by t# and PyObject_AsCharBuffer
    .bf_getbuffer     = 0                                               ; used by memoryview(u)

and on py3:

    .tp_as_buffer     = 0

/reported-by @kazuhiko
/reported-at nexedi/pygolang!21 (comment 172595)