    Add PyDict mode · 3bf6c92d
    Kirill Smelkov authored
    Similarly to StrictUnicode mode (see b28613c2), add a new opt-in mode that
    requests decoding Python dicts as ogórek.Dict instead of the builtin map.
    As explained in the recent patch "Add custom Dict that mirrors Python dict
    behaviour", this is needed to fix decoding issues that arise from the
    different behaviour of Python dict and builtin Go map:
    
        ---- 8< ----
        Ogórek currently represents an unpickled dict via map[any]any, which is
        logical, but also causes issues because the behaviour of the builtin Go
        map differs from that of Python's dict. For example:
    
        - Python's dict allows tuples to be used as keys, while Go map
          does not (https://github.com/kisielk/og-rek/issues/50).

        - Python's dict allows both long and int to be used interchangeably as
          keys, while Go map does not handle *big.Int as a key with the same
          semantics (https://github.com/kisielk/og-rek/issues/55).

        - Python's dict allows numbers (all of int and float) to be used
          interchangeably as keys, but on the Go side int(1) and float64(1.0)
          are considered different keys by the builtin map.

        - In the Python world, bytestring (str from py2) is considered to be
          related to both unicode (str on py3) and bytes, but the builtin map
          considers string, Bytes and ByteString all to be different keys.
    
        - etc...
    
        All in all, there are many differences in behaviour between the builtin
        Python dict and Go map that result in generally different semantics
        when decoding pickled data. Those differences can be fixed only if we
        add a custom dict implementation that mirrors what Python does.

        -> Do that: add custom Dict that implements a key -> value mapping
           mirroring Python behaviour.
    
        For now we are only adding the Dict type itself and its tests.
        Later we will use this new Dict to handle decoding dictionaries from pickles.
        ---- 8< ----
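    The Go-map bullets in the quoted patch can be reproduced with the
    standard library alone. The sketch below does not import ogórek; its
    helper functions are hypothetical, and a plain []any slice stands in for
    a decoded tuple. It shows int(1) and float64(1.0) becoming two keys, two
    *big.Int values equal to 1 becoming two keys (pointer keys compare by
    address), and a slice key panicking at run time:

```go
package main

import (
	"fmt"
	"math/big"
)

// numericKeyCount inserts int(1) and float64(1.0) into a builtin map and
// reports how many keys the map ends up with.
func numericKeyCount() int {
	m := map[any]any{}
	m[1] = "int"       // key has dynamic type int
	m[1.0] = "float64" // key has dynamic type float64: a different key
	return len(m)
}

// bigIntKeyCount inserts two distinct *big.Int holding the same value.
// Pointer keys compare by address, so the map sees two different keys.
func bigIntKeyCount() int {
	m := map[any]any{}
	m[big.NewInt(1)] = "a"
	m[big.NewInt(1)] = "b"
	return len(m)
}

// sliceKeyPanics uses a slice (a hypothetical stand-in for a decoded tuple)
// as a map key and reports whether the runtime panicked.
func sliceKeyPanics() (panicked bool) {
	defer func() {
		panicked = recover() != nil
	}()
	m := map[any]any{}
	m[[]any{1, 2}] = "tuple" // panics: hash of unhashable type
	return false
}

func main() {
	fmt.Println(numericKeyCount()) // 2, whereas Python's {1: ..., 1.0: ...} has 1 key
	fmt.Println(bigIntKeyCount())  // 2
	fmt.Println(sliceKeyPanics())  // true
}
```

    Python collapses each of the first two cases into a single key and
    accepts a tuple as a key, which is the gap a Python-mirroring Dict has
    to close.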
    
    In this patch we add a new Decoder option to activate PyDict-mode
    decoding, teach the encoder to also support encoding Dict, and adjust
    the tests.
    
    The behaviour of the new system is explained by the following excerpt
    from doc.go:
    
        For dicts there are two modes. In the first, default, mode Python dicts are
        decoded into a standard Go map. This mode uses the builtin Go type, but
        cannot fully mirror Python behaviour because e.g. int(1), big.Int(1) and
        float64(1.0) are all treated as different keys by Go, while Python treats
        them as equal. It also does not support decoding dicts with tuples used
        as keys:
    
             dict    ↔  map[any]any                       PyDict=n mode, default
                     ←  ogórek.Dict
    
        With PyDict=y mode, however, Python dicts are decoded as ogórek.Dict, which
        mirrors the behaviour of Python dict with respect to key equality and with
        respect to which types are allowed to be used as keys.
    
             dict    ↔  ogórek.Dict                       PyDict=y mode
                     ←  map[any]any