I have a pandas DataFrame whose size I checked with sys.getsizeof:
sys.getsizeof(df)
# output: 136
If I transpose it I get
sys.getsizeof(df.T)
# output: 341
If I transpose twice I get
sys.getsizeof(df.T.T)
# output: 136
How is pandas managing the memory?
UPDATE:
I used df.memory_usage instead, which yielded the following (and confused me even more, since copying yielded a smaller in-memory size). Is this related to the datatypes of the objects? Or maybe to the column and index strings?
df = pd.DataFrame({"Total Unique Authors": author_count,
"Earliest Year": [earliest_year],
"Latest Year": [latest_year],
"Total Reviews": [total_reviews]})
print(df.memory_usage().sum())
print(df.copy().memory_usage().sum())
print(df.T.memory_usage().sum())
print(df.T.copy().memory_usage().sum())
OUTPUT
112
112
224
64
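For completeness, here is a self-contained version of the same experiment with hypothetical placeholder values (the real author_count etc. are not shown above), adding memory_usage(deep=True), which also counts the Python objects, such as strings, that an object-dtype index or column refers to:
import pandas as pd

# A minimal sketch with placeholder values standing in for
# author_count, earliest_year, latest_year and total_reviews.
df = pd.DataFrame({"Total Unique Authors": [42],
                   "Earliest Year": [1990],
                   "Latest Year": [2020],
                   "Total Reviews": [1234]})

# Default (shallow) accounting: the index plus each column's buffer.
print(df.memory_usage().sum())

# deep=True additionally introspects object-dtype contents, such as
# the Python strings referenced by an object index.
print(df.memory_usage(deep=True).sum())

# After a transpose, the column labels become the (object) row index,
# so shallow and deep totals can diverge noticeably.
print(df.T.memory_usage(deep=True).sum())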
Taken from the sys documentation:
Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
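To illustrate that second point with built-in objects only (a sketch, not pandas-specific): getsizeof reports only the container itself, not what it references.
import sys

outer = [list(range(1000))]
print(sys.getsizeof(outer))     # size of the one-element outer list only
print(sys.getsizeof(outer[0]))  # the inner list is counted separately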
However, I cannot reproduce your finding:
import sys
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10,3))
print(sys.getsizeof(df))
print(sys.getsizeof(df.T))
leads to
344
344
As commented by coldspeed, df.info() or df.memory_usage() is more helpful.
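For example, a short sketch of both (the exact output varies with pandas version and platform):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3))

# Per-column dtypes plus a memory footprint line; 'deep' also
# measures referenced Python objects for object-dtype columns.
df.info(memory_usage='deep')

# Total bytes, index included.
print(df.memory_usage(deep=True).sum())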