
Pickle still fails for numpy.void objects

Over a year ago I reported a bug I encountered while pickling some fairly complex data. At the time I didn't know what the issue was and believed it might have had something to do with recursive referencing.

I've run into the issue several times since while working on my project, but each time I only made arbitrary changes until the error disappeared. Now I have finally taken the time to home in on the source of the issue and refine my MWE. This is what I came up with:

import pickle
import numpy as np


# create data
dtypes = [('f0', 'O')]
# for some reason, I need at least 19 extra fields (20 in total) for it to
# crash immediately
dtypes += [(f'f{i+1}', 'i4') for i in range(19)]
data = np.empty(1, dtype=dtypes)
# print(data[0])

# dump data
dump = pickle.dumps(data[0], pickle.HIGHEST_PROTOCOL)
# print('dumping works')

# load data
load = pickle.loads(dump)
# print('loading works')

# process crashes here if len(dtypes) > 19
print(load)

# process prints random data, e.g.

# (((...), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 0, 0,
# -1931060898, 32763, 1472326776, 503, 1482667496, 503, 0, 0, 1484270024,
# 503, 1472326776, 503, -1930803631, 32763, 1484270024, 503)

# or

# ((((((...), False, True), False, True), dtype('int32'), None), 0, 0), 0, 0)

# or

# (((...), <cell at 0x0000017018A25888: str object at 0x000001701AF998F0>,
# <cell at 0x0000017019084498: bool object at 0x00007FFB8D0EA970>,
# <cell at 0x00000170190842B8: int object at 0x00007FFB8D16A270>,
# <cell at 0x000001701927E2E8: str object at 0x0000017018989BB0>,
# <cell at 0x0000017018A3E798: bool object at 0x00007FFB8D0EA970>),
# 0, 0, -1931060898, 32763, 451341512)

# and crashes immediately afterwards if 2 <= len(dtypes) <= 19.

# process finishes with exit code 0 if data has no additional fields except f0,
# and prints 

# (((...),),)

Now I'm aware that similar issues have been reported in the past:

pickling/unpickling numpy.void and numpy.record for multiprocessing

Segmentation fault with numpy.void and pickle

Python - pickling fails for numpy.void objects

And in a quite recent and very similar case

Segfault after loading pickled void objects

a fix seems to have been introduced, however my code still causes a crash:

Process finished with exit code -1073741819 (0xC0000005)

Now for Python - pickling fails for numpy.void objects, the accepted answer is a comment by jottos (Dec 29 '09 at 18:42):

so, pickling will only work with top level module functions and classes, and will not pickle class data, so if some numpy class code/data are required to produce a representation of the numpy void type pickling isn't going to work as expected. It may be that the numpy package has implemented an internal repr to print the void type as a tuple, if this is the case then what you pickled certainly is not going to be what you printed.

But this is from over ten years ago, and it seems that bug fixes have been introduced since then. So is that still what is going on here, or is it something else? Especially since my code displays such arbitrary behavior.

Setup Info

Windows: 10 Home, v. 21H1, build 19043.1288

PyCharm: 2021.2 (Professional), build #PY-212.4746.96

Python (via anaconda): 3.7.7 [MSC v.1916 64 bit (AMD64)]

Numpy: 1.19.2

Pickle: 4.0

mapf asked Dec 19 '25 08:12


1 Answer

The issue you're hitting is a known bug in older NumPy versions (specifically ≤1.19.3) when pickling structured arrays with ≥20 fields. The crash (0xC0000005) happens because NumPy's internal pickling logic for numpy.void objects uses deep recursion that overflows the stack on Windows (which has smaller default stack sizes). This was fixed in NumPy 1.19.4 (GH#15163).

Upgrading NumPy to 1.19.4 or later may resolve it.
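If upgrading is not an option, one possible workaround is to avoid pickling the numpy.void scalar itself. The sketch below is not from the question or from NumPy's documentation of this bug. It only assumes that pickling a structured array, or a plain Python tuple, takes a different code path than pickling a numpy.void scalar. The value 'payload' and the names restored_row, as_tuple and rebuilt are made up for illustration.

```python
import pickle
import numpy as np

# Same layout as in the question: one object field plus 19 int32 fields.
dtypes = [('f0', 'O')] + [(f'f{i+1}', 'i4') for i in range(19)]
data = np.zeros(1, dtype=dtypes)
data['f0'][0] = 'payload'  # hypothetical value for the object field

# Option 1: pickle a length-1 array slice instead of the numpy.void scalar,
# then index the row again after loading.
dump = pickle.dumps(data[0:1], pickle.HIGHEST_PROTOCOL)
restored_row = pickle.loads(dump)[0]

# Option 2: convert the row to a plain Python tuple before pickling,
# and rebuild the structured scalar from the tuple afterwards.
as_tuple = data[0].item()
restored_tuple = pickle.loads(pickle.dumps(as_tuple, pickle.HIGHEST_PROTOCOL))
rebuilt = np.array([restored_tuple], dtype=data.dtype)[0]

print(restored_row['f0'], rebuilt['f0'])
```

Option 1 keeps the dtype attached to the pickled data, while option 2 requires the dtype to be available again on the loading side. Whether either one avoids the crash on a given NumPy version should be checked on that version.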

luxiu lu answered Dec 21 '25 22:12


