    pandas: Fix unpickle np arrays with py2+pd>0.19.x · d223aede
    Levin Zimmermann authored
    Pandas 0.20.0 introduced a bug fix [1] that changed the behaviour of
    'DataFrame.to_records()': the dtype names of the resulting record array are
    now unicode if the data frame's column names are unicode. Before this bug fix
    the dtype names were always str, no matter whether the column names were str or unicode.
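
    To check which type the dtype names have in a given Python/pandas
    combination, one can inspect 'dtype.names' of the result directly. A minimal
    sketch; the helper 'field_name_types' is hypothetical and not part of this commit:

```python
import pandas as pd


def field_name_types(records):
    """Return the type name of each field name of a record array (hypothetical helper)."""
    return [type(name).__name__ for name in records.dtype.names]


r = pd.DataFrame({u'A': [1, 2, 3]}).to_records()
# Per the description above: with Python 2 and pandas >= 0.20 the column 'A'
# shows up as 'unicode', with older pandas as 'str'. On Python 3 every name is 'str'.
print(field_name_types(r))
```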
    
    Unfortunately, numpy fails to unpickle arrays whose dtype names are unicode [2].
    Since many of our data frame columns are unicode, loading such arrays often
    fails. Python 3 no longer has this problem, so until we move to Python 3 we
    work around it with a simple monkey patch to pandas, which basically
    reverts the mentioned bug fix.
    
    [1] https://github.com/pandas-dev/pandas/issues/11879
    [2] Minimal example to reproduce this error with Python 2 and pandas > 0.19.x:
    
    ''
    import os
    
    import numpy as np
    import pandas as pd
    
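    # With pandas > 0.19.x on Python 2 the dtype names of r are unicode
    r = pd.DataFrame({u'A':[1,2,3]}).to_records()
    a = np.ndarray(shape=r.shape, dtype=r.dtype.fields)
    p = "t"

    # Remove a leftover file from a previous run, if any
    try:
      os.remove(p)
    except OSError:
      pass

    # Saving succeeds; loading the array back is the step that breaks
    with open(p, 'wb') as f:
      np.save(f, a)
    with open(p, 'rb') as f:
      np.load(f)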
    ''
    
    /reviewed-on !1738
    /reviewed-by @jerome @klaus