I have a method working with pandas DataFrames that behaves differently on two systems. While trying to load and work with a particular source CSV, I get MemoryErrors on a Windows Server machine with 16 GB of RAM, but not on my local computer with only 12 GB:
def load_table(self, name, source_folder="", columns=None):
    """Load a table from memory or csv by name.

    Loads a table from memory or csv. If loaded from csv, saves the
    result table to the temporary list. An explicit call to save_table is
    necessary if the results are to survive clearing temporary storage.
    @param string name the name of the table to load
    @param string source_folder the folder to look for the csv if the
        table is not already in memory
    @return DataFrame returns a DataFrame representing the table if found
    @raises IOError if the table cannot be loaded
    """
    # Using copy in these first two to avoid modification of existing
    # data without an explicit save_table.
    if name in self.tables:
        result = self.tables[name].copy()
    elif name in self.temp_tables:
        result = self.temp_tables[name].copy()
    elif os.path.isfile(name + ".csv"):
        data_frame = pd.read_csv(name + ".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(name + ".xlsx"):
        data_frame = pd.read_excel(name + ".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder + name + ".csv"):
        data_frame = pd.read_csv(source_folder + name + ".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder + name + ".xlsx"):
        data_frame = pd.read_excel(source_folder + name + ".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    else:
        raise IOError("Table %s could not be loaded" % name)
    return result
and save_temp is like this:
def save_temp(self, data_frame, name):
    """Save a table to the temporary storage.

    @param DataFrame data_frame the data frame we are storing
    @param string name the key to index this value
    @raises ValueError if the data frame passed is empty
    """
    if data_frame.empty:
        raise ValueError("The data frame passed was empty", name, data_frame)
    self.temp_tables[name] = data_frame.copy()
Sometimes the MemoryError happens on the read_csv. In the interactive interpreter I loaded this file manually, which worked, and then saved it into the tables dictionary referenced here. Calling load_table after that errors out on the copy instead.
Taking the manually loaded DataFrame and calling .copy() on it also produces a MemoryError with no message on the server box, but not locally.
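To see how close a single frame gets to exhausting the address space, it can help to measure its footprint. A minimal sketch with a stand-in DataFrame (the deep=True flag, which also counts the Python string objects behind object-dtype columns, requires pandas >= 0.17, so it may not exist on 0.16.0):

```python
import pandas as pd

# Hypothetical stand-in for the CSV in question.
df = pd.DataFrame({"id": range(3), "name": ["a", "bb", "ccc"]})

# Per-column footprint in bytes; for CSVs full of strings, the object
# columns counted by deep=True usually dominate.
usage = df.memory_usage(deep=True)
print(usage)
print("total bytes:", usage.sum())
```

Keep in mind that a deep .copy() transiently needs roughly this much memory again on top of the original.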
The server machine is running Windows Server 2012 R2, whereas my local machine is Windows 7. Both are 64-bit machines. The server has two 2.20 GHz processors and 16 GB of RAM; my local machine has a 3.4 GHz processor and 12 GB of RAM.
Changing the .copy() to .copy(False) allows the code to run on the server machine, but does not answer the question of why it gets a MemoryError on the machine with more memory in the first place.
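For context, .copy(False) is a shallow copy: the new frame reuses the original's data buffers instead of allocating fresh ones, which is why it sidesteps the failing allocation, at the cost that in-place edits can leak through to the original. A minimal sketch of the difference (names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(1000000)})

deep = df.copy()          # default deep copy allocates new buffers
shallow = df.copy(False)  # shallow copy reuses df's buffers

# The shallow copy shares memory with the original, so no large
# allocation happens, but in-place edits can be visible through df.
print(np.shares_memory(df["a"].to_numpy(), shallow["a"].to_numpy()))  # True
```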
Edited to add: both are using pandas 0.16.0 and numpy 1.9.2. The server is apparently using 32-bit Python while my local machine is 64-bit; both are on Python 2.7.8.
So your issue was that despite the same version of pandas and a 64-bit operating system, you had 32-bit Python, which is limited to roughly 2 GB of addressable memory per process on Windows.
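A quick way to confirm which build you are on, since the OS bitness says nothing about the interpreter's: the pointer size of the Python build itself tells you.

```python
import struct
import sys

# A 64-bit Python build has 8-byte pointers; a 32-bit build has 4-byte ones.
bits = struct.calcsize("P") * 8
print("%d-bit Python" % bits)

# Equivalent check: sys.maxsize is 2**63 - 1 on 64-bit builds
# and only 2**31 - 1 on 32-bit builds.
print(sys.maxsize > 2**32)
```

Running this on both machines would have shown the mismatch immediately; a single large read_csv plus a deep copy can easily exhaust a 2 GB address space even with 16 GB of physical RAM.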