
Pandas MemoryError on server with more Memory

I have a method working with DataFrames in pandas that behaves differently on two different systems. While trying to load and work with a particular source CSV, I get a MemoryError on a Windows Server machine with 16 GB of RAM, but not on my local computer with only 12 GB:

def load_table(self, name, source_folder="", columns=None):
    """Load a table from memory or csv by name.

    Loads a table from memory or csv. If loaded from csv, saves the
    result table to the temporary list. An explicit call to save_table is
    necessary if the results are to survive clearing temporary storage.
    @param string name the name of the table to load
    @param string source_folder the folder to look for the csv if the table
        is not already in memory
    @return DataFrame returns a DataFrame representing the table if found.
    @raises IOError if table cannot be loaded
    """
    # using copy in these first two to avoid modification of existing data
    # without an explicit save_table
    if name in self.tables:
        result = self.tables[name].copy()
    elif name in self.temp_tables:
        result = self.temp_tables[name].copy()
    elif os.path.isfile(name + ".csv"):
        data_frame = pd.read_csv(name + ".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(name + ".xlsx"):
        data_frame = pd.read_excel(name + ".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder + name + ".csv"):
        data_frame = pd.read_csv(source_folder + name + ".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder + name + ".xlsx"):
        data_frame = pd.read_excel(source_folder + name + ".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    else:
        raise IOError("table %s could not be loaded" % name)
    return result

and save_temp is like this:

def save_temp(self, data_frame, name):
    """Save a table to the temporary storage.

    @param DataFrame data_frame the data frame we are storing
    @param string name the key to index this value
    @raises ValueError if the data frame is empty
    """
    if data_frame.empty:
        raise ValueError("The data frame passed was empty", name, data_frame)
    self.temp_tables[name] = data_frame.copy()

Sometimes the MemoryError happens on the read_csv. In the interactive interpreter I loaded this file manually, which worked, and then saved it into the tables dictionary referenced here. After that, calling load_table errors out on the .copy() instead.

Taking the manually loaded DataFrame and calling .copy() on it also produces a MemoryError with no message on the server box, but not locally.

The server machine is running Windows Server 2012 R2, whereas my local machine is Windows 7.

Both are 64-bit machines.

The server is 2.20 GHz with 2 processors, while my local machine is 3.4 GHz. Server: 16 GB RAM; local: 12 GB RAM.

Changing the .copy() to .copy(False) allows the code to run on the server machine, but does not answer the question of why it would get a MemoryError on the machine with more memory in the first place.
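For context, .copy(False) is the positional form of .copy(deep=False): a deep copy duplicates the underlying data (doubling memory for that frame), while a shallow copy shares it. A minimal sketch of the difference (a small illustrative frame, not the asker's data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

deep = df.copy()          # default deep=True: allocates a new data buffer
shallow = df.copy(False)  # deep=False: shares the buffer, near-zero extra memory

# The deep copy owns its own data, so mutating it leaves df untouched
deep.iloc[0, 0] = 99
print(df.iloc[0, 0])   # 0
print(deep.iloc[0, 0])  # 99
```

This is why .copy(False) sidesteps the MemoryError: it never allocates a second buffer. The trade-off (at least in older pandas such as 0.16) is that the shallow copy is not an independent table, so the original "avoid modification of existing data" intent of the copy is lost.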

Edited to add: both are using pandas 0.16.0 and numpy 1.9.2. The server is apparently using 32-bit Python, while my local machine is 64-bit; Python 2.7.8 on both.

asked Oct 22 '25 by lathomas64
1 Answer

So your issue was that despite the same pandas version and a 64-bit operating system, the server was running 32-bit Python, which on Windows is limited to roughly 2 GB of addressable memory per process, no matter how much RAM the machine has. A deep copy of a large DataFrame needs a second full-size allocation, which pushes the 32-bit process over that limit.
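A quick way to confirm which interpreter a machine is actually running (standard library only, works on Python 2 and 3):

```python
import platform
import struct

# Pointer size in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one
print(struct.calcsize("P") * 8)

# platform.architecture() reports the same thing as a string, e.g. '64bit'
print(platform.architecture()[0])
```

Running this on both boxes would have surfaced the mismatch immediately; the fix is to install 64-bit Python on the server.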

answered Oct 25 '25 by EdChum