-
Kirill Smelkov authored
In Python bytes is immutable and read-only array of bytes. It is also hashable and so is different from go []byte in that it can be used as a dict key. Thus the closes approximation for Python bytes in Go is some type derived from Go's string - it will be different from string and at the same time will inherit from string it immutability property and being able to be used as map key. So - add ogórek.Bytes type to represent Python bytes - add support to decode BINBYTES* pickle opcodes (these are protocol 3 opcodes) - add support to encode ogórek.Bytes via those BINBYTES* opcodes - for protocols <= 2, where there is no opcodes to directly represent bytes, adopt the same approach as Python - by pickling bytes as _codecs.encode(byt.decode('latin1'), 'latin1') this way unpickling it on Python3 will give bytes, while unpickling it on Python2 will give str: In [1]: sys.version Out[1]: '3.6.6 (default, Jun 27 2018, 14:44:17) \n[GCC 8.1.0]' In [2]: byt = b'\x01\x02\x03' In [3]: _codecs.encode(byt.decode('latin1'), 'latin1') Out[3]: b'\x01\x02\x03' --- In [1]: sys.version Out[1]: '2.7.15+ (default, Aug 31 2018, 11:56:52) \n[GCC 8.2.0]' In [2]: byt = b'\x01\x02\x03' In [3]: _codecs.encode(byt.decode('latin1'), 'latin1') Out[3]: '\x01\x02\x03' - correspondingly teach decoder to recognize particular calls to _codecs.encode as being representation for bytes and decode it appropriately. - since we now have to emit byt.decode('latin1') as UNICODE - add, so far internal, `type unicode(string)` that instructs ogórek encoder to always emit the string with UNICODE opcodes (regular string is encoded to unicode pickle object only for protocol >= 3). - For []byte encoding preserve the current status - even though dispatching in Encoder.encode changes, the end result is the same - []byte was and stays currently encoded as just regular string. This was added in 555efd8f "first draft of dumb pickle encoder", and even though that might be not a good choice, changing it is a topic for another patch.
2fe0e876