Commit 1730ba8e authored by Levin Zimmermann

pandas: Fix unpickle np arrays with py2+pd>0.19.x

Pandas 0.20.0 introduced a bug fix [1] that changed the behaviour of
'DataFrame.to_records()': the dtype names of the resulting record array are
now unicode if the data frame's column names are unicode. Before this bug fix,
the dtype names were always str, whether the column names were str or unicode.

Unfortunately, np unpickling breaks if the dtype names are unicode [2]. Since
many of our data frame columns are unicode, loading arrays often
fails. In python3 this is no longer a problem, so until we move to python3
we work around it with a simple monkey patch to pandas, which basically
reverts the mentioned bug fix.
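
A minimal sketch of what such a monkey patch could look like is given
below. It is an illustration under assumptions, not necessarily the code in
this commit: the helper names 'native_names' and 'patch_to_records' are
hypothetical, and the sketch assumes that the object returned by
'to_records()' exposes a writable 'dtype.names' attribute, as NumPy record
arrays do.

```python
import sys


def _native_str(name):
  """Return name as the interpreter's native str type."""
  if sys.version_info[0] == 2 and not isinstance(name, str):
    # Python 2: unicode -> byte str, the type dtype names had before the fix.
    return name.encode("utf-8")
  # Python 3 (or already a native str): nothing to convert.
  return name


def native_names(names):
  """Convert an iterable of dtype field names to a tuple of native str."""
  return tuple(_native_str(n) for n in names)


def patch_to_records(frame_cls):
  """Wrap frame_cls.to_records so its result only has native str dtype names.

  frame_cls is expected to be pandas.DataFrame (hypothetical helper).
  """
  original = frame_cls.to_records

  def to_records(self, *args, **kwargs):
    records = original(self, *args, **kwargs)
    # Assumption: dtype.names is writable, as on NumPy record arrays.
    records.dtype.names = native_names(records.dtype.names)
    return records

  frame_cls.to_records = to_records
```

Under these assumptions the patch would be applied once at start-up, e.g.
'patch_to_records(pandas.DataFrame)', before any data frame is converted.
On python3 the wrapper leaves the names unchanged.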

[1] https://github.com/pandas-dev/pandas/issues/11879
[2] Small example to reproduce this error:

''
import os

import numpy as np
import pandas as pd

r = pd.DataFrame({u'A':[1,2,3]}).to_records()
a = np.ndarray(shape=r.shape, dtype=r.dtype.fields)
p = "t"

# Remove a stale file from a previous run, if any.
try:
  os.remove(p)
except OSError:
  pass

with open(p, 'wb') as f:
  np.save(f, a)
with open(p, 'rb') as f:
  np.load(f)
''

/reviewed-on !1738
/reviewed-by @jerome @klaus
parent 6180f1af
Pipeline #26887 failed with stage
in 0 seconds