pandas: Fix unpickle np arrays with py2+pd>0.19.x

Pandas 0.20.0 introduced a bug fix [1] that changed the behaviour of 'DataFrame.to_records()': the field names of the resulting record array's dtype are now unicode if the DataFrame's column names are unicode. Before this fix, the dtype names were always str, whether the column names were str or unicode. Unfortunately, NumPy fails to unpickle arrays whose dtype names are unicode [2]. Since many of our DataFrame columns have unicode names, loading such arrays often fails. Python 3 does not have this problem, so until ERP5 runs on Python 3 only, we work around it with a simple monkey patch to pandas that effectively reverts the bug fix mentioned above.

[1] https://github.com/pandas-dev/pandas/issues/11879
[2] Small example to reproduce this error:

    import os
    import numpy as np
    import pandas as pd

    r = pd.DataFrame({u'A': [1, 2, 3]}).to_records()
    a = np.ndarray(shape=r.shape, dtype=r.dtype.fields)
    p = "t"
    try:
        os.remove(p)
    except OSError:
        pass
    with open(p, 'wb') as f:
        np.save(f, a)
    with open(p, 'rb') as f:
        np.load(f)

/reviewed-on !1738
/reviewed-by @jerome @klaus
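For comparison with the reproducer, a minimal Python 3 sketch of the same save/load round trip, assuming only NumPy is installed. It builds a structured array directly with `np.zeros` as a stand-in for the output of `to_records()`, coerces every field name to `str` the way the patch does, and reads the array back from an in-memory `io.BytesIO` buffer instead of a file. The helper name `roundtrip` is hypothetical and not part of ERP5.

```python
import io

import numpy as np


def roundtrip(arr):
    """Write ``arr`` with np.save and read it back from an in-memory buffer."""
    buf = io.BytesIO()
    np.save(buf, arr)
    buf.seek(0)
    return np.load(buf)


# Stand-in for pd.DataFrame({u'A': [1, 2, 3]}).to_records(): an 'index'
# field plus one field per column (built directly so pandas is not needed).
records = np.zeros(3, dtype=[('index', '<i8'), (u'A', '<i8')])
records['index'] = [0, 1, 2]
records['A'] = [1, 2, 3]

# Rebuild the dtype with str field names, as the monkey patch does, but
# through a view instead of assigning to records.dtype.
str_dtype = np.dtype([(str(k), v) for k, v in records.dtype.descr])
records = records.view(str_dtype)

loaded = roundtrip(records)
print(loaded.dtype.names)     # ('index', 'A')
print(loaded['A'].tolist())   # [1, 2, 3]
```

Under Python 3, `str` and unicode are the same type, so the rename is a no-op there; the sketch only shows that the rebuilt dtype keeps the data intact through `np.save`/`np.load`.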

Commit d223aede authored Feb 23, 2023 by Levin Zimmermann

Showing with 47 additions and 1 deletion

product/ERP5Type/__init__.py +1 -1

product/ERP5Type/patches/Pandas.py +46 -0

--- a/product/ERP5Type/__init__.py
+++ b/product/ERP5Type/__init__.py
@@ -32,7 +32,7 @@
 """
 from __future__ import absolute_import
 from App.config import getConfiguration
-from .patches import python, globalrequest
+from .patches import python, globalrequest, Pandas
 import six
 if six.PY2:
  from .patches import pylint

--- a/product/ERP5Type/patches/Pandas.py
+++ b/product/ERP5Type/patches/Pandas.py
+##############################################################################
+#
+# Copyright (c) 2023 Nexedi SARL and Contributors. All Rights Reserved.
+#
+# WARNING: This program as such is intended to be used by professional
+# programmers who take the whole responsibility of assessing all potential
+# consequences resulting from its eventual inadequacies and bugs.
+# End users who are looking for a ready-to-use solution with commercial
+# guarantees and support are strongly advised to contract a Free Software
+# Service Company
+#
+# This program is Free Software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+#
+##############################################################################
+
+import numpy as np
+
+try:
+  import pandas as pd
+except ImportError:
+  pass
+else:
+  # This monkey-patch reverts https://github.com/pandas-dev/pandas/commit/25dcff59
+  #
+  # We often use unicode strings as DataFrame column names, and
+  # under Python 2 NumPy cannot unpickle arrays whose dtype field
+  # names are unicode. Python 3 does not have this problem, so this
+  # patch should be removed once ERP5 supports Python 3 only.
+  pd_DataFrame_to_records = pd.DataFrame.to_records
+  def DataFrame_to_records(*args, **kwargs):
+    record = pd_DataFrame_to_records(*args, **kwargs)
+    record.dtype = np.dtype([(str(k), v) for k, v in record.dtype.descr])
+    return record
+  pd.DataFrame.to_records = DataFrame_to_records
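The core of the patch is rebuilding the dtype with `str` field names. Below is a minimal sketch of a slightly more defensive variant, assuming NumPy only; the helper name `with_str_field_names` is hypothetical and not part of ERP5 or pandas. It reads formats and offsets from `dtype.fields` instead of unpacking `dtype.descr` into pairs, because `descr` yields 3-tuples `(name, format, shape)` for subarray fields, which `for k, v in ...` cannot unpack. `DataFrame.to_records()` does not normally produce subarray fields, so this is a hedge rather than a fix for an observed failure.

```python
import numpy as np


def with_str_field_names(arr):
    """Return a view of structured array ``arr`` whose field names are str."""
    dtype = arr.dtype
    if dtype.names is None:  # not a structured array: nothing to rename
        return arr
    # Keep each field's format and byte offset so the memory layout is unchanged.
    new_dtype = np.dtype({
        'names': [str(name) for name in dtype.names],
        'formats': [dtype.fields[name][0] for name in dtype.names],
        'offsets': [dtype.fields[name][1] for name in dtype.names],
        'itemsize': dtype.itemsize,
    })
    # A view reinterprets the same buffer; no data is copied.
    return arr.view(new_dtype)


# 'B' is a subarray field: its descr entry is ('B', '<f8', (3,)).
arr = np.zeros(2, dtype=[(u'A', '<i8'), (u'B', '<f8', (3,))])
renamed = with_str_field_names(arr)
print(renamed.dtype.names)   # ('A', 'B')
print(renamed['B'].shape)    # (2, 3)
```

Returning a view rather than assigning to `arr.dtype` leaves the caller's array untouched, which may be preferable when the same array is referenced elsewhere.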