    golang_str_pickle: Fix it so that py3 can load what py2 saved and back · 1ec5ed82
    Kirill Smelkov authored
    Since ebd18f3f (golang_str: bstr/ustr pickle support) bstr and ustr have
    support for pickling. However, in that patch I only verified that an
    object can be dumped and loaded back on the same python version, which
    missed the fact that a bstr pickled on py2 cannot be loaded on py3:
    
    on py2:
    
        (z-dev) kirr@deca:~/src/tools/go/pygolang$ ipython
        Python 2.7.18 (default, Apr 28 2021, 17:39:59)
        ...
    
        In [1]: from golang import *
    
        In [2]: s = bstr('мир') + b'\xff'
    
        In [3]: s
        Out[3]: b(b'мир\xff')
    
        In [5]: import pickle
    
        In [6]: p = pickle.dumps(1)
    
        In [7]: p
        Out[7]: 'I1\n.'
    
        In [8]: import pickletools
    
        In [9]: p = pickle.dumps(s, 1)
    
        In [10]: p
        Out[10]: 'ccopy_reg\n_reconstructor\nq\x00(cgolang._golang\n_pybstr\nq\x01h\x01U\x07\xd0\xbc\xd0\xb8\xd1\x80\xffq\x02tq\x03Rq\x04.'
    
        In [11]: pickletools.dis(p)
            0: c    GLOBAL     'copy_reg _reconstructor'
           25: q    BINPUT     0
           27: (    MARK
           28: c        GLOBAL     'golang._golang _pybstr'
           52: q        BINPUT     1
           54: h        BINGET     1
           56: U        SHORT_BINSTRING '\xd0\xbc\xd0\xb8\xd1\x80\xff'
           65: q        BINPUT     2
           67: t        TUPLE      (MARK at 27)
           68: q    BINPUT     3
           70: R    REDUCE
           71: q    BINPUT     4
           73: .    STOP
        highest protocol among opcodes = 1
    
    on py3:
    
        (py39.venv) kirr@deca:~/src/tools/go/pygolang-master$ ipython
        Python 3.9.19+ (heads/3.9:40d77b93672, Apr 12 2024, 06:40:05)
        ...
    
        In [1]: from golang import *
    
        In [2]: import pickle
    
        In [3]: p = b'ccopy_reg\n_reconstructor\nq\x00(cgolang._golang\n_pybstr\nq\x01h\x01U\x07\xd0\xbc\xd0\xb8\xd1\x80\xffq\x02tq\x03Rq\x04.'
    
        In [4]: s = pickle.loads(p)
        ---------------------------------------------------------------------------
        UnicodeDecodeError                        Traceback (most recent call last)
        Cell In[4], line 1
        ----> 1 s = pickle.loads(p)
    
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
    
    which happens in the above example because pickling bstr relies on the
    SHORT_BINSTRING opcode, whose payload py3 decodes as ASCII by default.
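
    The failure can be reproduced on py3 with the standard library alone.
    The sketch below is an illustration, not part of the patch: it feeds
    pickle.loads a hand-built pickle that contains only the SHORT_BINSTRING
    from the dump above, and shows that it loads only if the reader passes
    non-default encoding/errors arguments, which the pickling side cannot
    control.

```python
import pickle

# Hand-built protocol-1 pickle: SHORT_BINSTRING ('U'), 1-byte length 7,
# the 7 payload bytes from the example above, then STOP ('.').
p = b'U\x07\xd0\xbc\xd0\xb8\xd1\x80\xff.'

# Default py3 behaviour: the payload is decoded as ASCII and fails.
try:
    pickle.loads(p)
    raised = False
except UnicodeDecodeError:
    raised = True

# encoding='bytes' keeps the payload as raw bytes.
as_bytes = pickle.loads(p, encoding='bytes')

# UTF-8 with surrogateescape maps the invalid byte 0xff to U+DCFF.
as_text = pickle.loads(p, encoding='utf-8', errors='surrogateescape')

print(raised, as_bytes, ascii(as_text))
```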
    
    -> Rework how bstr and ustr are pickled by taking full control of what
    we emit at each protocol level, and by asserting in tests that pickling
    produces exactly the expected output.
    
    This way we know that pickling bstr/ustr works the same way on both py2
    and py3 and, by also asserting that this data unpickles back into the
    same string object, that both py2 and py3 can load what either py2 or
    py3 saved.
    
    For reference, the dump for the above b(b'мир\xff') now becomes
    
        In [5]: p
        Out[5]: 'cgolang\nbstr\nq\x00(X\t\x00\x00\x00\xd0\xbc\xd0\xb8\xd1\x80\xed\xb3\xbfq\x01tq\x02Rq\x03.'
    
        In [7]: pickletools.dis(p)
            0: c    GLOBAL     'golang bstr'
           13: q    BINPUT     0
           15: (    MARK
           16: X        BINUNICODE u'\u043c\u0438\u0440\udcff'
           30: q        BINPUT     1
           32: t        TUPLE      (MARK at 15)
           33: q    BINPUT     2
           35: R    REDUCE
           36: q    BINPUT     3
           38: .    STOP
        highest protocol among opcodes = 1
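
    The new dump can be inspected on py3 without importing golang. The
    sketch below is an illustration based only on the bytes shown above: it
    walks the opcodes with pickletools.genops and recovers the original
    bytes from the BINUNICODE argument. Reading that argument as "UTF-8
    with surrogateescape" is an inference from the \udcff in the
    disassembly; the code comments of the patch remain the authoritative
    description.

```python
import pickletools

# The new protocol-1 dump of b(b'мир\xff'), copied from the output above.
p = (b'cgolang\nbstr\nq\x00(X\t\x00\x00\x00'
     b'\xd0\xbc\xd0\xb8\xd1\x80\xed\xb3\xbfq\x01tq\x02Rq\x03.')

# genops only parses opcodes; it neither imports `golang` nor runs REDUCE.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(p)]

# The string payload travels as BINUNICODE, which py2 and py3 read alike.
text = next(arg for name, arg in ops if name == 'BINUNICODE')

# Assumption: the invalid byte 0xff was escaped as the lone surrogate U+DCFF,
# so encoding with surrogateescape gives back the original bytes.
raw = text.encode('utf-8', 'surrogateescape')

print(ops[0], ascii(text), raw)
```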
    
    See the comments in the code and the golden vectors added to the tests for details.