Numpy Structured Array - Memory copy error

I am using NumPy structured arrays to arrange relevant data and then pass it into C++. Occasionally I see a memory copy error, which leads to incorrect calculations. I first used hashing functions to check whether the data was corrupted, but I now use np.where() to see where the values differ.
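A hash-based check of the kind mentioned above might look like this. It is only a sketch: the script below references a `checksum_numpy_array` helper in a commented-out line but never defines it, so this implementation is a hypothetical stand-in, not the author's.

```python
import hashlib

import numpy as np


def checksum_numpy_array(arr):
    """Return a SHA-256 hex digest of an array's dtype, shape and raw bytes.

    Hypothetical implementation: the original script only references this
    name in a commented-out line.
    """
    # Field views of a structured array are strided; hashlib needs a
    # C-contiguous buffer, so make a contiguous copy when necessary.
    contiguous = np.ascontiguousarray(arr)
    digest = hashlib.sha256()
    digest.update(str(contiguous.dtype).encode())
    digest.update(str(contiguous.shape).encode())
    digest.update(contiguous.view(np.uint8))
    return digest.hexdigest()


data = np.random.rand(1_000).astype(np.float32)
corrupted = data.copy()
corrupted[500] += 1.0

print(checksum_numpy_array(data) == checksum_numpy_array(data.copy()))  # True
print(checksum_numpy_array(data) == checksum_numpy_array(corrupted))    # False
# The hash only says *that* the bytes changed; np.where says *where*.
print(np.where(data != corrupted)[0])                                   # [500]
```

A matching digest only tells you the bytes are identical. Once the digests differ, a comparison with np.where (as `check_diff` does below) is what locates the bad elements.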

Issue: When I copy several NumPy arrays into a structured array, the data sometimes ends up corrupted. On my MacBook with Python 3.12.8 and NumPy 2.0.2, the script completes all 1000 iterations (correct output pasted below). On my Ubuntu Server 24.02 machine, also with Python 3.12.8 and NumPy 2.0.2, errors appear in the underlying arrays after a few iterations. Sometimes 2 elements differ, other times 16. The error can appear as early as the 2nd iteration or as late as the 45th.

Troubleshooting attempt: I wrote a separate "copy only" function, TestCopy, that calls .copy() on each array. It runs without issue on both systems. You can enable it by uncommenting the TestCopy(...) call and commenting out the CoalesceData(...) call.

Possible thoughts:

  • Could this be a hardware or memory instability error?
  • Or are there OS-specific NumPy implementation details?
  • The traditional .copy() appears to work, so could this be an issue in NumPy's structured array implementation?
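One concrete difference between the two code paths in the third point: `.copy()` of a 1-D array writes one contiguous block, while assigning into a field of a structured array writes one element per record (a stride equal to the record size), and reading `structured_array['low']` returns a strided view into the record buffer rather than a separate array. The sketch below uses the field layout the script's comments describe (with `align=True`); it only illustrates the different memory access pattern and does not by itself show a NumPy bug.

```python
import numpy as np

# Field layout from the script's dtype comments, with align=True padding.
dt = np.dtype([
    ('open', np.float32), ('high', np.float32), ('low', np.float32),
    ('close', np.float32), ('timestamp', np.int64), ('a', np.float32),
    ('s', np.int8), ('m', np.int16),
], align=True)

src = np.random.rand(10).astype(np.float32)
records = np.zeros(len(src), dtype=dt)
records['low'] = src            # strided write: one float32 per 32-byte record

low_view = records['low']       # a view into `records`, not a copy
print(dt.itemsize)                          # 32
print(dt.fields['low'][1])                  # 8 (byte offset of 'low')
print(src.strides, low_view.strides)        # (4,) (32,)
print(low_view.flags['C_CONTIGUOUS'])       # False
print(np.shares_memory(low_view, records))  # True
print(np.array_equal(low_view, src))        # True
```

Because the field is a view, a later comparison such as `structured_array['low'] != expected_arr` reads directly from the record buffer, so any bytes that changed there show up immediately.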

I stripped down an enormous code base to a small, reproducible script, pasted below.

How the example below works:

  1. Create several large random in-memory NumPy arrays (100M+ elements each).
  2. Pass them to the CoalesceData function.
  3. Record each original array in the HashStorage checker (the first time a name is seen, that array is stored as the reference).
  4. Combine the arrays into a structured array.
  5. Check each field of the structured array against the stored original arrays.
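The steps above can be condensed into a smaller, self-contained check. This is a sketch, not the author's script: it replaces the HashStorage class with np.array_equal against reference copies taken once before the loop, uses np.random.default_rng instead of np.random.rand, and defaults to 1M elements instead of ~125M. The names build_inputs, coalesce and find_first_mismatch are hypothetical.

```python
import numpy as np


def build_inputs(n, rng):
    """Random inputs shaped like the question's arrays (smaller n by default)."""
    return {
        'open': rng.random(n, dtype=np.float32),
        'high': rng.random(n, dtype=np.float32),
        'low': rng.random(n, dtype=np.float32),
        'close': rng.random(n, dtype=np.float32),
        'timestamp': np.full(n, 12313131, dtype=np.int64),
        'a': rng.random(n, dtype=np.float32),
        's': rng.integers(6, size=n, dtype=np.int8),
        'm': rng.integers(1000, size=n, dtype=np.int16),
    }


def coalesce(inputs, dt):
    """Copy each input array into the matching field of a new structured array."""
    records = np.zeros(len(inputs['open']), dtype=dt)
    for name, arr in inputs.items():
        records[name] = arr
    return records


def find_first_mismatch(n=1_000_000, iterations=10, seed=0):
    """Return (iteration, field, bad_indices) for the first mismatch, or None."""
    rng = np.random.default_rng(seed)
    inputs = build_inputs(n, rng)
    # Independent reference copies, taken once before the loop.
    reference = {name: arr.copy() for name, arr in inputs.items()}
    dt = np.dtype([(name, arr.dtype) for name, arr in inputs.items()], align=True)

    for i in range(iterations):
        records = coalesce(inputs, dt)
        for name, expected in reference.items():
            # Check both the source array and the copy inside the records.
            for candidate in (inputs[name], records[name]):
                if not np.array_equal(candidate, expected):
                    return i, name, np.flatnonzero(candidate != expected)
    return None


print(find_first_mismatch(n=100_000, iterations=5))  # None when every comparison matches
```

Raising `n` towards 124_882_868 and `iterations` towards 1000 brings the memory footprint closer to the original run; on a machine with faulty RAM, a non-None result would point at the iteration, field and indices that changed.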

Environment: Python 3.12.8 and NumPy 2.0.2 (I have tried other NumPy versions as well)

import numpy as np
import sys


class HashStorage():
    def __init__(self):
        self.storage = {}

    def check_diff(self, name, arr):
        if name in self.storage:

            expected_arr = self.storage[name]
            print(
                f'Checking: {name} for differences, new_id: {id(arr)} old_id: {id(expected_arr)}')
            diffs = np.where(arr != expected_arr)
            diff_length = len(diffs[0])
            assert diff_length == 0, f'indices: {diffs}, {expected_arr[diffs]} : {arr[diffs]}'
        else:
            self.storage[name] = arr

    def check_numpy_hash(self, name, arr):
        # called_name = f'{sys._getframe(1).f_code.co_name}:{sys._getframe(1).f_lineno}_{name}'
        called_name = f'{sys._getframe(1).f_code.co_name}_{name}'
        # computed_hash = checksum_numpy_array(arr)
        self.check_diff(called_name, arr)


hash_global_storage = HashStorage()


def CoalesceData(o_np, h_np, l_np, c_np, timestamp_np, s_np, a_np, m_np):

    global hash_global_storage

    hash_global_storage.check_numpy_hash('o_np', o_np)
    hash_global_storage.check_numpy_hash('h_np', h_np)
    hash_global_storage.check_numpy_hash('l_np', l_np)
    hash_global_storage.check_numpy_hash('c_np', c_np)
    hash_global_storage.check_numpy_hash('timestamp_np', timestamp_np)
    hash_global_storage.check_numpy_hash('s_np', s_np)
    hash_global_storage.check_numpy_hash('a_np', a_np)
    hash_global_storage.check_numpy_hash('m_np', m_np)

    # create structured array
    dt = np.dtype([
        ('open', o_np.dtype),  # 4 bytes
        ('high', h_np.dtype),  # 4 bytes
        ('low', l_np.dtype),  # 4 bytes
        ('close', c_np.dtype),  # 4 bytes
        ('timestamp', timestamp_np.dtype),  # 8 bytes
        ('a', a_np.dtype),  # 4 bytes
        ('s', s_np.dtype),  # 1 byte
        ('m', m_np.dtype)  # 2 bytes
    ], align=True)

    structured_array = np.zeros(len(o_np), dtype=dt)
    structured_array['open'] = o_np
    structured_array['high'] = h_np
    structured_array['low'] = l_np
    structured_array['close'] = c_np
    structured_array['timestamp'] = timestamp_np
    structured_array['s'] = s_np
    structured_array['a'] = a_np
    structured_array['m'] = m_np

    print(structured_array.flags)
    print(f'Structured Array address: {id(structured_array)}')

    # Now, check arrays in the structured array to make sure everything is deterministic
    # autopep8: off
    hash_global_storage.check_numpy_hash('o_np', structured_array['open'])
    hash_global_storage.check_numpy_hash('h_np', structured_array['high'])
    hash_global_storage.check_numpy_hash('l_np', structured_array['low'])
    hash_global_storage.check_numpy_hash('c_np', structured_array['close'])
    hash_global_storage.check_numpy_hash('timestamp_np', structured_array['timestamp'])
    hash_global_storage.check_numpy_hash('s_np', structured_array['s'])
    hash_global_storage.check_numpy_hash('a_np', structured_array['a'])
    hash_global_storage.check_numpy_hash('m_np', structured_array['m'])
    # autopep8: on

    hash_global_storage.check_numpy_hash(
        'structured_array', structured_array)

    return structured_array


def TestCopy(o_np, h_np, l_np, c_np, timestamp_np, s_np, a_np, m_np):
    global hash_global_storage

    hash_global_storage.check_numpy_hash('o_np', o_np)
    hash_global_storage.check_numpy_hash('h_np', h_np)
    hash_global_storage.check_numpy_hash('l_np', l_np)
    hash_global_storage.check_numpy_hash('c_np', c_np)
    hash_global_storage.check_numpy_hash('timestamp_np', timestamp_np)
    hash_global_storage.check_numpy_hash('s_np', s_np)
    hash_global_storage.check_numpy_hash('a_np', a_np)
    hash_global_storage.check_numpy_hash('m_np', m_np)

    o_copy = o_np.copy()
    h_copy = h_np.copy()
    l_copy = l_np.copy()
    c_copy = c_np.copy()
    timestamp_copy = timestamp_np.copy()
    s_copy = s_np.copy()
    a_copy = a_np.copy()
    m_copy = m_np.copy()

    hash_global_storage.check_numpy_hash('o_np', o_copy)
    hash_global_storage.check_numpy_hash('h_np', h_copy)
    hash_global_storage.check_numpy_hash('l_np', l_copy)
    hash_global_storage.check_numpy_hash('c_np', c_copy)
    hash_global_storage.check_numpy_hash('timestamp_np', timestamp_copy)
    hash_global_storage.check_numpy_hash('s_np', s_copy)
    hash_global_storage.check_numpy_hash('a_np', a_copy)
    hash_global_storage.check_numpy_hash('m_np', m_copy)


if __name__ == '__main__':

    # create a test

    df_length = 124_882_868

    o_np = np.random.rand(df_length).astype(np.float32)
    h_np = np.random.rand(df_length).astype(np.float32)
    l_np = np.random.rand(df_length).astype(np.float32)
    c_np = np.random.rand(df_length).astype(np.float32)
    timestamp_np = np.full(df_length, 12313131, dtype=np.int64)
    a_np = np.random.rand(df_length).astype(np.float32)
    s_np = np.random.randint(6, size=df_length, dtype=np.int8)
    m_np = np.random.randint(1000, size=df_length, dtype=np.int16)

    for i in range(0, 1000):
        print(f"--------Working on iteration: {i}")
        # autopep8: off
        # Pass s_np and a_np in the same order as the function signature
        CoalesceData(o_np, h_np, l_np, c_np, timestamp_np, s_np, a_np, m_np)
        # TestCopy(o_np, h_np, l_np, c_np, timestamp_np, s_np, a_np, m_np) # Works
        # autopep8: on

When I run the code on my MacBook Air in a Conda environment (Python 3.12.8 and NumPy 2.0.2), the program completes without any memory copy error. Output of the final iteration:

--------Working on iteration: 999
Checking: CoalesceData_o_np for differences, new_id: 4354134000 old_id: 4354134000
Checking: CoalesceData_h_np for differences, new_id: 4354134096 old_id: 4354134096
Checking: CoalesceData_l_np for differences, new_id: 4354134192 old_id: 4354134192
Checking: CoalesceData_c_np for differences, new_id: 4354134288 old_id: 4354134288
Checking: CoalesceData_timestamp_np for differences, new_id: 4354133904 old_id: 4354133904
Checking: CoalesceData_s_np for differences, new_id: 4354134480 old_id: 4354134480
Checking: CoalesceData_a_np for differences, new_id: 4354134672 old_id: 4354134672
Checking: CoalesceData_m_np for differences, new_id: 4354134768 old_id: 4354134768
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

Structured Array address: 4354134576
Checking: CoalesceData_o_np for differences, new_id: 4354134960 old_id: 4354134000
Checking: CoalesceData_h_np for differences, new_id: 4354134960 old_id: 4354134096
Checking: CoalesceData_l_np for differences, new_id: 4354134960 old_id: 4354134192
Checking: CoalesceData_c_np for differences, new_id: 4354134960 old_id: 4354134288
Checking: CoalesceData_timestamp_np for differences, new_id: 4354134960 old_id: 4354133904
Checking: CoalesceData_s_np for differences, new_id: 4354134960 old_id: 4354134480
Checking: CoalesceData_a_np for differences, new_id: 4354134960 old_id: 4354134672
Checking: CoalesceData_m_np for differences, new_id: 4354134960 old_id: 4354134768
Checking: CoalesceData_structured_array for differences, new_id: 4354134576 old_id: 4354134384

When running on my Ubuntu server, also with Python 3.12.8 and Numpy 2.0.2, I get:

--------Working on iteration: 3
Checking: CoalesceData_o_np for differences, new_id: 133772149707888 old_id: 133772149707888
Checking: CoalesceData_h_np for differences, new_id: 133772149707984 old_id: 133772149707984
Checking: CoalesceData_l_np for differences, new_id: 133772149708080 old_id: 133772149708080
Checking: CoalesceData_c_np for differences, new_id: 133772149708176 old_id: 133772149708176
Checking: CoalesceData_timestamp_np for differences, new_id: 133772149707792 old_id: 133772149707792
Checking: CoalesceData_s_np for differences, new_id: 133772149708368 old_id: 133772149708368
Checking: CoalesceData_a_np for differences, new_id: 133772149708560 old_id: 133772149708560
Checking: CoalesceData_m_np for differences, new_id: 133772149708656 old_id: 133772149708656
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

Structured Array address: 133772149708464
Checking: CoalesceData_o_np for differences, new_id: 133772149708848 old_id: 133772149707888
Checking: CoalesceData_h_np for differences, new_id: 133772149708848 old_id: 133772149707984
Checking: CoalesceData_l_np for differences, new_id: 133772149708848 old_id: 133772149708080
Traceback (most recent call last):
  File "/home/memo/bt/np_error_debug.py", line 105, in <module>
    CoalesceData(o_np, h_np, l_np, c_np, timestamp_np,
  File "/home/memo/bt/np_error_debug.py", line 74, in CoalesceData
    hash_global_storage.check_numpy_hash('l_np', structured_array['low'])
  File "/home/memo/bt/np_error_debug.py", line 26, in check_numpy_hash
    self.check_diff(called_name, arr)
  File "/home/memo/bt/np_error_debug.py", line 18, in check_diff
    assert diff_length == 0, f'indices: {diffs}, {expected_arr[diffs]} : {arr[diffs]}'
           ^^^^^^^^^^^^^^^^
AssertionError: indices: (array([97087371, 97087372]),), [0.61208063 0.56773347] : [0. 0.]

Another run, with 16 mismatched elements instead of 2:

  File "/home/memo/bt/np_error_debug.py", line 41, in check_diff
    assert diff_length == 0, f'indices: {diffs}, {expected_arr[diffs]} : {arr[diffs]}'
           ^^^^^^^^^^^^^^^^
AssertionError: indices: (array([120253452, 120253453, 120253454, 120253455, 120253456, 120253457,
       120253458, 120253459, 120253460, 120253461, 120253462, 120253463,
       120253464, 120253465, 120253466, 120253467]),), [0.38179564 0.7686447  0.06995761 0.76895595 0.7134335  0.12035605
 0.9882022  0.7208525  0.5113986  0.11400567 0.08236554 0.09342069
 0.85959834 0.6065078  0.5138216  0.66513485] : [0.87417835 0.9489298  0.16752674 0.16250128 0.13623057 0.8921764
 0.14262542 0.8389298  0.37004778 0.5679792  0.79316586 0.1225264
 0.86325306 0.6123406  0.5594882  0.2388553 ]
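Both failures above are runs of adjacent indices (2 and 16 float32 elements, i.e. 8 and 64 bytes). To summarise mismatches like these instead of printing the raw index array, the indices from np.where can be grouped into contiguous runs. mismatch_runs is a hypothetical helper, not part of the original script; it could be called from check_diff when the assertion is about to fail.

```python
import numpy as np


def mismatch_runs(expected, actual):
    """Group differing indices into contiguous runs.

    Returns a list of (start, stop, nbytes) tuples, where stop is exclusive
    and nbytes is the run length times the element size of `expected`.
    """
    idx = np.flatnonzero(expected != actual)
    if idx.size == 0:
        return []
    # A new run starts wherever consecutive differing indices are not adjacent.
    breaks = np.flatnonzero(np.diff(idx) > 1) + 1
    runs = []
    for chunk in np.split(idx, breaks):
        start, stop = int(chunk[0]), int(chunk[-1]) + 1
        runs.append((start, stop, (stop - start) * expected.itemsize))
    return runs


expected = np.ones(64, dtype=np.float32)
actual = expected.copy()
actual[2:4] = 0.0      # 2 bad elements, like the first traceback
actual[20:36] = 0.5    # 16 bad elements, like the second
print(mismatch_runs(expected, actual))  # [(2, 4, 8), (20, 36, 64)]
```

Reporting byte counts alongside indices makes it easier to compare failures across runs, since the same corruption looks different in element terms for int8, int16 and float32 fields.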
asked Oct 16 '25 at 13:10 by Deftness


1 Answer

Update: Since I had already lost 2-3 days on this, I started debugging from the ground up (per a comment on the question). I ran memtester from within Linux (rather than MemTest86+ booted through GRUB), and it showed the following:

[Screenshot of memtester output]

I then updated my motherboard's BIOS (from a Jan 2023 release to a Feb 2025 release) and made sure the RAM was at stock settings. I re-ran MemTest86+ overnight for 6 passes, as well as memtester in Linux, and saw no errors.

Oddly enough, when I re-run memtester now, I can only request a ~12 GB block and still get the mlock, whereas I was able to test 92 GB in the screenshot above.

I can now run the entire Python script posted above without any errors, which matches what I see on my MacBook.

My server had never shown signs of instability before. I'll probably run more passes, but I do think this was the issue. I can't remember how thoroughly I stability-tested it 2 years ago. This is something for all of us to keep in the back of our minds when we run into an unusual bug.

answered Oct 19 '25 at 04:10 by Deftness