Commit d223aede authored by Levin Zimmermann

pandas: Fix unpickle np arrays with py2+pd>0.19.x

Pandas 0.20.0 introduced a bug fix [1] which changed the behaviour of
'DataFrame.to_records()': the dtype field names of the resulting record
array are now unicode if the data frame's column names are unicode. Before
this fix, the dtype names were always str, regardless of whether the column
names were str or unicode.

Unfortunately, numpy unpickling breaks if dtype names are unicode [2]. Since
many of our data frame columns are unicode, loading such arrays often
fails. Python 3 no longer has this problem, so until ERP5 is migrated to
Python 3 we work around it with a simple monkey patch to pandas, which
essentially reverts the bug fix mentioned above.

[1] https://github.com/pandas-dev/pandas/issues/11879
[2] Minimal example to reproduce this error (Python 2, pandas > 0.19.x):

```python
import os

import numpy as np
import pandas as pd

r = pd.DataFrame({u'A': [1, 2, 3]}).to_records()
a = np.ndarray(shape=r.shape, dtype=r.dtype.fields)
p = "t"

# Remove a leftover file from a previous run, if any.
try:
    os.remove(p)
except OSError:
    pass

with open(p, 'wb') as f:
    np.save(f, a)
with open(p, 'rb') as f:
    np.load(f)  # fails: the saved dtype has unicode field names
```
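For comparison, the same round trip succeeds on Python 3, where every `str` is unicode. This sketch assumes a recent numpy, writes to an in-memory `io.BytesIO` buffer instead of the file `t`, and builds the structured dtype directly rather than through pandas; the field names `index` and `A` are only meant to roughly mirror what `to_records()` produces for the frame above.

```python
import io

import numpy as np

# Structured dtype whose field names are written as unicode literals;
# on Python 3 these are ordinary str objects.
dtype = np.dtype([(u"index", "<i8"), (u"A", "<i8")])
a = np.zeros(3, dtype=dtype)

# Round-trip through an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
np.save(buf, a)
buf.seek(0)
b = np.load(buf)
print(b.dtype.names)  # ('index', 'A')
```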

/reviewed-on nexedi/erp5!1738
/reviewed-by @jerome @klaus
parent 611419d0
```diff
@@ -32,7 +32,7 @@
 """
 from __future__ import absolute_import
 from App.config import getConfiguration
-from .patches import python, globalrequest
+from .patches import python, globalrequest, Pandas
 import six
 if six.PY2:
     from .patches import pylint
```
```python
##############################################################################
#
# Copyright (c) 2023 Nexedi SARL and Contributors. All Rights Reserved.
#
# WARNING: This program as such is intended to be used by professional
# programmers who take the whole responsability of assessing all potential
# consequences resulting from its eventual inadequacies and bugs
# End users who are looking for a ready-to-use solution with commercial
# garantees and support are strongly adviced to contract a Free Software
# Service Company
#
# This program is Free Software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
##############################################################################
import numpy as np
try:
    import pandas as pd
except ImportError:
    pass
else:
    # This monkey-patch reverts https://github.com/pandas-dev/pandas/commit/25dcff59
    #
    # We're often using unicode strings in DataFrame column names,
    # which makes it impossible to unpickle np arrays. With python3
    # this isn't a problem anymore, so we should remove this once ERP5
    # is fully migrated to Python3 only support.
    pd_DataFrame_to_records = pd.DataFrame.to_records
    def DataFrame_to_records(*args, **kwargs):
        record = pd_DataFrame_to_records(*args, **kwargs)
        record.dtype = np.dtype([(str(k), v) for k, v in record.dtype.descr])
        return record
    pd.DataFrame.to_records = DataFrame_to_records
```
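The patch above rebuilds the dtype from `dtype.descr` and unpacks each entry as a `(name, format)` pair. A more general variant, sketched here and not part of this commit, rebuilds the dtype from `dtype.fields` instead: `descr` entries for subarray fields are 3-tuples `(name, format, shape)`, which the two-name unpacking would reject. The helper name `with_str_field_names` is hypothetical.

```python
import numpy as np


def with_str_field_names(arr):
    """Return a view of the structured array `arr` whose field names are str.

    Hypothetical helper (sketch only): rebuilds the dtype from `dtype.fields`,
    keeping each field's format and byte offset plus the total itemsize, so
    the new dtype describes exactly the same memory layout.
    """
    dt = arr.dtype
    if dt.names is None:
        # Not a structured array: nothing to rename.
        return arr
    names = dt.names
    new_dtype = np.dtype({
        "names": [str(name) for name in names],
        "formats": [dt.fields[name][0] for name in names],
        "offsets": [dt.fields[name][1] for name in names],
        "itemsize": dt.itemsize,
    })
    # A view shares the data buffer; only the dtype description changes.
    return arr.view(new_dtype)
```

For example, `with_str_field_names(np.zeros(3, dtype=[(u"A", "<i8"), (u"B", "<f8", (2,))])).dtype.names` gives `('A', 'B')`, whereas iterating `dtype.descr` with `for k, v in ...` on that array would raise a `ValueError` on the subarray field `B`.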