I am using NumPy structured arrays to arrange relevant data and then pass it into C++ land. On occasion I noticed a memory copy error that leads to calculation issues. I first explored using hashing functions to make sure the data was not corrupted, but now use np.where() to see where the values differ.
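For reference, the checksum approach I started with (the commented-out checksum_numpy_array call in the code below) was essentially a hash over the raw bytes. A minimal sketch, assuming a C-contiguous array (the helper name is the one referenced in the code, but this body is illustrative):

import hashlib
import numpy as np

def checksum_numpy_array(arr: np.ndarray) -> str:
    # Hash the raw buffer; ascontiguousarray guards against non-contiguous views
    return hashlib.sha256(np.ascontiguousarray(arr).tobytes()).hexdigest()

A checksum only tells you that something changed; the np.where() comparison used below additionally tells you which indices differ, which is why I switched to it.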
Issue: When I copy several NumPy arrays into a structured array, the underlying arrays sometimes develop errors. On my MacBook with Python 3.12.8 & Numpy 2.0.2, it completes the full 1000 loops (correct output pasted below). On my Ubuntu Server 24.04 with Python 3.12.8 and Numpy 2.0.2, after running for a few iterations the arrays eventually develop errors. Sometimes the error consists of 2 elements, other times 16 elements. Sometimes it occurs as early as the 2nd loop, other times not until around loop 45.
Troubleshooting attempt: I made a separate "copy only" function that just calls np.copy on the arrays. That code runs without issue on both systems. You can enable it by uncommenting the TestCopy(...) call and commenting out the CoalesceData call.
I stripped away an enormous code base and reduced it to the small, repeatable file pasted below.
How the example below works: the CoalesceData function checks the original arrays, passes them through the storage mechanism (a structured array), and then checks the fields of the structured array to see if they match the original arrays.
Environment: Python 3.12.8 & Numpy 2.0.2 (have tried other versions of numpy as well)
import numpy as np
import os
import sys
class HashStorage():
def __init__(self):
self.storage = {}
def check_diff(self, name, arr):
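        # On the first call for a given name, remember a reference to the array;
        # on later calls, compare element-wise against it and report any mismatching indices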
if name in self.storage:
expected_arr = self.storage[name]
print(
f'Checking: {name} for differences, new_id: {id(arr)} old_id: {id(expected_arr)}')
diffs = np.where(arr != expected_arr)
diff_length = len(diffs[0])
assert diff_length == 0, f'indices: {diffs}, {expected_arr[diffs]} : {arr[diffs]}'
else:
self.storage[name] = arr
def check_numpy_hash(self, name, arr):
# called_name = f'{sys._getframe(1).f_code.co_name}:{sys._getframe(1).f_lineno}_{name}'
called_name = f'{sys._getframe(1).f_code.co_name}_{name}'
# computed_hash = checksum_numpy_array(arr)
self.check_diff(called_name, arr)
hash_global_storage = HashStorage()
def CoalesceData(o_np, h_np, l_np, c_np, timestamp_np, s_np, a_np, m_np):
global hash_global_storage
hash_global_storage.check_numpy_hash('o_np', o_np)
hash_global_storage.check_numpy_hash('h_np', h_np)
hash_global_storage.check_numpy_hash('l_np', l_np)
hash_global_storage.check_numpy_hash('c_np', c_np)
hash_global_storage.check_numpy_hash('timestamp_np', timestamp_np)
hash_global_storage.check_numpy_hash('s_np', s_np)
hash_global_storage.check_numpy_hash('a_np', a_np)
hash_global_storage.check_numpy_hash('m_np', m_np)
# create structured array
dt = np.dtype([
('open', o_np.dtype), # 4 bytes
('high', h_np.dtype), # 4 bytes
('low', l_np.dtype), # 4 bytes
('close', c_np.dtype), # 4 bytes
('timestamp', timestamp_np.dtype), # 8 bytes
('a', a_np.dtype), # 4 bytes
('s', s_np.dtype), # 1 byte
('m', m_np.dtype) # 2 bytes
], align=True)
structured_array = np.zeros(len(o_np), dtype=dt)
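    # With align=True each field is padded to its natural alignment, so the itemsize
    # works out to 32 bytes; at ~124.9M rows this allocates roughly 4 GB on every call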
structured_array['open'] = o_np
structured_array['high'] = h_np
structured_array['low'] = l_np
structured_array['close'] = c_np
structured_array['timestamp'] = timestamp_np
structured_array['s'] = s_np
structured_array['a'] = a_np
structured_array['m'] = m_np
print(structured_array.flags)
print(f'Structured Array address: {id(structured_array)}')
# Now, check arrays in the structured array to make sure everything is deterministic
# autopep8: off
hash_global_storage.check_numpy_hash('o_np', structured_array['open'])
hash_global_storage.check_numpy_hash('h_np', structured_array['high'])
hash_global_storage.check_numpy_hash('l_np', structured_array['low'])
hash_global_storage.check_numpy_hash('c_np', structured_array['close'])
hash_global_storage.check_numpy_hash('timestamp_np', structured_array['timestamp'])
hash_global_storage.check_numpy_hash('s_np', structured_array['s'])
hash_global_storage.check_numpy_hash('a_np', structured_array['a'])
hash_global_storage.check_numpy_hash('m_np', structured_array['m'])
# autopep8: on
hash_global_storage.check_numpy_hash(
'structured_array', structured_array)
return structured_array
def TestCopy(o_np, h_np, l_np, c_np, timestamp_np, s_np, a_np, m_np):
global hash_global_storage
hash_global_storage.check_numpy_hash('o_np', o_np)
hash_global_storage.check_numpy_hash('h_np', h_np)
hash_global_storage.check_numpy_hash('l_np', l_np)
hash_global_storage.check_numpy_hash('c_np', c_np)
hash_global_storage.check_numpy_hash('timestamp_np', timestamp_np)
hash_global_storage.check_numpy_hash('s_np', s_np)
hash_global_storage.check_numpy_hash('a_np', a_np)
hash_global_storage.check_numpy_hash('m_np', m_np)
o_copy = o_np.copy()
h_copy = h_np.copy()
l_copy = l_np.copy()
c_copy = c_np.copy()
timestamp_copy = timestamp_np.copy()
s_copy = s_np.copy()
a_copy = a_np.copy()
m_copy = m_np.copy()
hash_global_storage.check_numpy_hash('o_np', o_copy)
hash_global_storage.check_numpy_hash('h_np', h_copy)
hash_global_storage.check_numpy_hash('l_np', l_copy)
hash_global_storage.check_numpy_hash('c_np', c_copy)
hash_global_storage.check_numpy_hash('timestamp_np', timestamp_copy)
hash_global_storage.check_numpy_hash('s_np', s_copy)
hash_global_storage.check_numpy_hash('a_np', a_copy)
hash_global_storage.check_numpy_hash('m_np', m_copy)
if __name__ == '__main__':
# create a test
df_length = 124_882_868
o_np = np.random.rand(df_length).astype(np.float32)
h_np = np.random.rand(df_length).astype(np.float32)
l_np = np.random.rand(df_length).astype(np.float32)
c_np = np.random.rand(df_length).astype(np.float32)
timestamp_np = np.full(df_length, 12313131, dtype=np.int64)
a_np = np.random.rand(df_length).astype(np.float32)
s_np = np.random.randint(6, size=df_length, dtype=np.int8)
m_np = np.random.randint(1000, size=df_length, dtype=np.int16)
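    # The eight source arrays above (5x float32 + int64 + int8 + int16 per row) total roughly 3.9 GB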
for i in range(0, 1000):
print(f"--------Working on iteration: {i}")
# autopep8: off
        # note: argument order matches the function signatures (s_np before a_np)
        CoalesceData(o_np, h_np, l_np, c_np, timestamp_np, s_np, a_np, m_np)
        # TestCopy(o_np, h_np, l_np, c_np, timestamp_np, s_np, a_np, m_np)  # Works
# autopep8: on
When I run the above code on my MacBook Air, using a Conda environment (Python 3.12.8 and Numpy 2.0.2), the program executes successfully without a memory copy error:
--------Working on iteration: 999
Checking: CoalesceData_o_np for differences, new_id: 4354134000 old_id: 4354134000
Checking: CoalesceData_h_np for differences, new_id: 4354134096 old_id: 4354134096
Checking: CoalesceData_l_np for differences, new_id: 4354134192 old_id: 4354134192
Checking: CoalesceData_c_np for differences, new_id: 4354134288 old_id: 4354134288
Checking: CoalesceData_timestamp_np for differences, new_id: 4354133904 old_id: 4354133904
Checking: CoalesceData_s_np for differences, new_id: 4354134480 old_id: 4354134480
Checking: CoalesceData_a_np for differences, new_id: 4354134672 old_id: 4354134672
Checking: CoalesceData_m_np for differences, new_id: 4354134768 old_id: 4354134768
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
Structured Array address: 4354134576
Checking: CoalesceData_o_np for differences, new_id: 4354134960 old_id: 4354134000
Checking: CoalesceData_h_np for differences, new_id: 4354134960 old_id: 4354134096
Checking: CoalesceData_l_np for differences, new_id: 4354134960 old_id: 4354134192
Checking: CoalesceData_c_np for differences, new_id: 4354134960 old_id: 4354134288
Checking: CoalesceData_timestamp_np for differences, new_id: 4354134960 old_id: 4354133904
Checking: CoalesceData_s_np for differences, new_id: 4354134960 old_id: 4354134480
Checking: CoalesceData_a_np for differences, new_id: 4354134960 old_id: 4354134672
Checking: CoalesceData_m_np for differences, new_id: 4354134960 old_id: 4354134768
Checking: CoalesceData_structured_array for differences, new_id: 4354134576 old_id: 4354134384
When running on my Ubuntu server, also with Python 3.12.8 and Numpy 2.0.2, I get:
--------Working on iteration: 3
Checking: CoalesceData_o_np for differences, new_id: 133772149707888 old_id: 133772149707888
Checking: CoalesceData_h_np for differences, new_id: 133772149707984 old_id: 133772149707984
Checking: CoalesceData_l_np for differences, new_id: 133772149708080 old_id: 133772149708080
Checking: CoalesceData_c_np for differences, new_id: 133772149708176 old_id: 133772149708176
Checking: CoalesceData_timestamp_np for differences, new_id: 133772149707792 old_id: 133772149707792
Checking: CoalesceData_s_np for differences, new_id: 133772149708368 old_id: 133772149708368
Checking: CoalesceData_a_np for differences, new_id: 133772149708560 old_id: 133772149708560
Checking: CoalesceData_m_np for differences, new_id: 133772149708656 old_id: 133772149708656
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
Structured Array address: 133772149708464
Checking: CoalesceData_o_np for differences, new_id: 133772149708848 old_id: 133772149707888
Checking: CoalesceData_h_np for differences, new_id: 133772149708848 old_id: 133772149707984
Checking: CoalesceData_l_np for differences, new_id: 133772149708848 old_id: 133772149708080
Traceback (most recent call last):
File "/home/memo/bt/np_error_debug.py", line 105, in <module>
CoalesceData(o_np, h_np, l_np, c_np, timestamp_np,
File "/home/memo/bt/np_error_debug.py", line 74, in CoalesceData
hash_global_storage.check_numpy_hash('l_np', structured_array['low'])
File "/home/memo/bt/np_error_debug.py", line 26, in check_numpy_hash
self.check_diff(called_name, arr)
File "/home/memo/bt/np_error_debug.py", line 18, in check_diff
assert diff_length == 0, f'indices: {diffs}, {expected_arr[diffs]} : {arr[diffs]}'
^^^^^^^^^^^^^^^^
AssertionError: indices: (array([97087371, 97087372]),), [0.61208063 0.56773347] : [0. 0.]
Another run produced an error involving a different number of elements:
File "/home/memo/bt/np_error_debug.py", line 41, in check_diff
assert diff_length == 0, f'indices: {diffs}, {expected_arr[diffs]} : {arr[diffs]}'
^^^^^^^^^^^^^^^^
AssertionError: indices: (array([120253452, 120253453, 120253454, 120253455, 120253456, 120253457,
120253458, 120253459, 120253460, 120253461, 120253462, 120253463,
120253464, 120253465, 120253466, 120253467]),), [0.38179564 0.7686447 0.06995761 0.76895595 0.7134335 0.12035605
0.9882022 0.7208525 0.5113986 0.11400567 0.08236554 0.09342069
0.85959834 0.6065078 0.5138216 0.66513485] : [0.87417835 0.9489298 0.16752674 0.16250128 0.13623057 0.8921764
0.14262542 0.8389298 0.37004778 0.5679792 0.79316586 0.1225264
0.86325306 0.6123406 0.5594882 0.2388553 ]
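For anyone debugging something similar: structured_array['low'] is a view into the structured array's own buffer, not into the original l_np, so when the comparison above fails the data has changed during or after the copy into the structured array. A minimal sketch (small sizes and field names chosen purely for illustration) of how to confirm where the field data actually lives:

import numpy as np

dt = np.dtype([('low', np.float32), ('timestamp', np.int64)], align=True)
l_np = np.random.rand(10).astype(np.float32)
sa = np.zeros(len(l_np), dtype=dt)
sa['low'] = l_np

# Field access returns a view into the structured array's buffer...
print(np.shares_memory(sa['low'], sa))    # True
# ...and shares nothing with the original source array
print(np.shares_memory(sa['low'], l_np))  # False

# So if this assertion ever fails, the copy into (or the memory behind)
# the structured array is what changed, not the source array
assert np.array_equal(sa['low'], l_np)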
Update: After wasting 2-3 days on this, I started debugging from the ground up (per a comment above). I ran memtester in Linux (as opposed to MemTest86+ accessed through GRUB), and it reported memory errors (screenshot not reproduced here).
I decided to update my motherboard's BIOS (from a Jan 2023 version to a Feb 2025 version) and made sure the RAM was at stock settings. I then re-ran MemTest86+ overnight for 6 passes, as well as memtester in Linux, and saw no errors.
Oddly enough, when I re-run memtester now, I can only request a ~12 GB block and have mlock succeed, versus the 92 GB I was able to run in the earlier screenshot.
I can now run the entire Python script posted above without errors developing, which now matches exactly what I see on my MacBook.
My server had never shown signs of instability before. I'll probably run a bunch more passes, but I do think this was the issue. It's something to keep in the back of your mind should you encounter an unusual bug like this. I can't remember what level of stability testing I did two years ago.