golang_str_pickle: Fix it so that py3 can load what py2 saved and vice versa
Since ebd18f3f (golang_str: bstr/ustr pickle support) bstr and ustr have
support for pickling. However in that patch I only verified that an
object can be dumped and loaded back on the same python version, which
missed the fact that a bstr pickled on py2 cannot be loaded on py3:

on py2:

    (z-dev) kirr@deca:~/src/tools/go/pygolang$ ipython
    Python 2.7.18 (default, Apr 28 2021, 17:39:59)
    ...

    In [1]: from golang import *

    In [2]: s = bstr('мир') + b'\xff'

    In [3]: s
    Out[3]: b(b'мир\xff')

    In [5]: import pickle

    In [6]: p = pickle.dumps(1)

    In [7]: p
    Out[7]: 'I1\n.'

    In [8]: import pickletools

    In [9]: p = pickle.dumps(s, 1)

    In [10]: p
    Out[10]: 'ccopy_reg\n_reconstructor\nq\x00(cgolang._golang\n_pybstr\nq\x01h\x01U\x07\xd0\xbc\xd0\xb8\xd1\x80\xffq\x02tq\x03Rq\x04.'

    In [11]: pickletools.dis(p)
        0: c    GLOBAL     'copy_reg _reconstructor'
       25: q    BINPUT     0
       27: (    MARK
       28: c        GLOBAL     'golang._golang _pybstr'
       52: q        BINPUT     1
       54: h        BINGET     1
       56: U        SHORT_BINSTRING '\xd0\xbc\xd0\xb8\xd1\x80\xff'
       65: q        BINPUT     2
       67: t        TUPLE      (MARK at 27)
       68: q    BINPUT     3
       70: R    REDUCE
       71: q    BINPUT     4
       73: .    STOP
    highest protocol among opcodes = 1

on py3:

    (py39.venv) kirr@deca:~/src/tools/go/pygolang-master$ ipython
    Python 3.9.19+ (heads/3.9:40d77b93672, Apr 12 2024, 06:40:05)
    ...

    In [1]: from golang import *

    In [2]: import pickle

    In [3]: p = b'ccopy_reg\n_reconstructor\nq\x00(cgolang._golang\n_pybstr\nq\x01h\x01U\x07\xd0\xbc\xd0\xb8\xd1\x80\xffq\x02tq\x03Rq\x04.'

    In [4]: s = pickle.loads(p)
    ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    Cell In[4], line 1
    ----> 1 s = pickle.loads(p)

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

This happens in the above example because pickling bstr relies on the
SHORT_BINSTRING opcode, which is not really handled well on py3: py3
unpickles its payload as str and, by default, decodes it with the ASCII
codec.
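The failure can be reproduced with the standard library alone. The sketch below is py3 only, and `PY2_DUMP`, `MINIMAL` and `py2_str_opcodes` are illustrative names, not part of pygolang. It scans the py2 dump from the session above with `pickletools.genops`, which only parses opcodes and does not import or call anything, and then loads a minimal pickle that holds just the SHORT_BINSTRING payload. py3's `pickle.loads` decodes such py2 `str` payloads using its `encoding`/`errors` arguments, which default to 'ASCII'/'strict'.

```python
import pickle
import pickletools

# The protocol-1 dump of b(b'мир\xff') made by py2, copied from the session above.
PY2_DUMP = (b'ccopy_reg\n_reconstructor\nq\x00(cgolang._golang\n_pybstr\n'
            b'q\x01h\x01U\x07\xd0\xbc\xd0\xb8\xd1\x80\xffq\x02tq\x03Rq\x04.')

# Opcodes that carry a py2 `str` object. py3's Unpickler turns their payload
# into `str` using loads(encoding=..., errors=...), by default 'ASCII'/'strict'.
PY2_STR_OPCODES = {'STRING', 'BINSTRING', 'SHORT_BINSTRING'}


def py2_str_opcodes(data):
    """Return the names of py2-str opcodes found in pickle `data`.

    pickletools.genops only parses the stream, so this works even
    without pygolang installed.
    """
    return [op.name for op, arg, pos in pickletools.genops(data)
            if op.name in PY2_STR_OPCODES]


# A minimal pickle with only the problematic opcode: SHORT_BINSTRING with a
# 1-byte length, the same 7 payload bytes as above, then STOP.
PAYLOAD = 'мир'.encode('utf-8') + b'\xff'
MINIMAL = b'U' + bytes([len(PAYLOAD)]) + PAYLOAD + b'.'

if __name__ == '__main__':
    print(py2_str_opcodes(PY2_DUMP))      # ['SHORT_BINSTRING']
    try:
        pickle.loads(MINIMAL)             # py3 defaults: ASCII, strict
    except UnicodeDecodeError as e:
        print('UnicodeDecodeError:', e)
    # Keeping py2 str payloads as raw bytes avoids the decode step.
    print(pickle.loads(MINIMAL, encoding='bytes'))
```

Passing `encoding='bytes'` (or `'latin1'`) to `loads` avoids the error, but that choice belongs to whoever calls `loads`, so presumably a library cannot rely on it for pickles that arbitrary code may load.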
-> Rework how bstr and ustr are pickled by taking full control over what
we emit, at which protocol level, and how, and by asserting in tests
that pickling produces exactly the expected data. This way we know that
pickling bstr/ustr works the same way on both py2 and py3, and, by also
asserting that this data can be unpickled back into the same string
object, that both py2 and py3 can load what either of them saved.

For reference, the dump for the above b(b'мир\xff') now becomes:

    In [5]: p
    Out[5]: 'cgolang\nbstr\nq\x00(X\t\x00\x00\x00\xd0\xbc\xd0\xb8\xd1\x80\xed\xb3\xbfq\x01tq\x02Rq\x03.'

    In [7]: pickletools.dis(p)
        0: c    GLOBAL     'golang bstr'
       13: q    BINPUT     0
       15: (    MARK
       16: X        BINUNICODE u'\u043c\u0438\u0440\udcff'
       30: q        BINPUT     1
       32: t        TUPLE      (MARK at 15)
       33: q    BINPUT     2
       35: R    REDUCE
       36: q    BINPUT     3
       38: .    STOP
    highest protocol among opcodes = 1

See comments in the code, and the added golden vectors in the test, for
details.
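The shape of the new dump can be reproduced on py3 with a small stand-in class. This is a hedged sketch, not pygolang's implementation: `BStr` and `golden` are hypothetical, and the bytes-to-text mapping assumes UTF-8 with the `surrogateescape` error handler, which maps the stray 0xff byte to U+DCFF, consistent with u'\u043c\u0438\u0440\udcff' in the dump above. Because `__reduce__` returns `(BStr, (text,))`, py3's pickler at protocol 1 emits GLOBAL, MARK, BINUNICODE, TUPLE and REDUCE, and the output is compared byte-for-byte against a golden vector, in the spirit of the golden vectors mentioned above.

```python
import pickle


class BStr(bytes):
    """Hypothetical stand-in for golang.bstr; NOT pygolang's real class.

    Assumption: bytes map to text via UTF-8 + surrogateescape, so invalid
    bytes such as 0xff become lone surrogates (U+DC80..U+DCFF).
    """

    def __new__(cls, x=b''):
        if isinstance(x, str):
            x = x.encode('utf-8', 'surrogateescape')
        return super().__new__(cls, x)

    def __reduce__(self):
        # Reconstruct as BStr(<text>). py3 pickles the text argument with
        # BINUNICODE at protocols 1-3, writing it as UTF-8 with surrogates
        # passed through (0xff -> U+DCFF -> b'\xed\xb3\xbf').
        return (BStr, (self.decode('utf-8', 'surrogateescape'),))


def golden(cls):
    """Expected protocol-1 dump of cls(UTF-8 'мир' + one 0xff byte)."""
    glob = (b'c' + cls.__module__.encode('ascii') + b'\n'
            + cls.__qualname__.encode('ascii') + b'\n')
    return glob + (b'q\x00(X\t\x00\x00\x00\xd0\xbc\xd0\xb8\xd1\x80\xed\xb3\xbf'
                   b'q\x01tq\x02Rq\x03.')


if __name__ == '__main__':
    s = BStr('мир'.encode('utf-8') + b'\xff')
    p = pickle.dumps(s, 1)
    assert p == golden(BStr)              # exact bytes, golden-vector style
    t = pickle.loads(p)
    assert type(t) is BStr and t == s     # and it loads back unchanged
```

Apart from the module and class names in GLOBAL, the bytes match the dump above. The sketch only exercises py3; producing identical output on py2 as well is not covered here.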