I have 6 large TSV files which I am reading into dataframes in Google Colab. However, the files are too big and Colab cannot handle them.
import pandas as pd
# `drive` is assumed to be an already-authenticated PyDrive GoogleDrive object
#Crew data
downloaded = drive.CreateFile({'id':'16'}) 
downloaded.GetContentFile('title.crew.tsv') 
df_crew = pd.read_csv('title.crew.tsv',header=None,sep='\t',dtype='unicode')
#Ratings data
downloaded = drive.CreateFile({'id':'15'}) 
downloaded.GetContentFile('title.ratings.tsv') 
df_ratings = pd.read_csv('title.ratings.tsv',header=None,sep='\t',dtype='unicode')
#Episode data
downloaded = drive.CreateFile({'id':'14'}) 
downloaded.GetContentFile('title.episode.tsv') 
df_episode = pd.read_csv('title.episode.tsv',header=None,sep='\t',dtype='unicode')
#Name Basics data
downloaded = drive.CreateFile({'id':'13'}) 
downloaded.GetContentFile('name.basics.tsv') 
df_name = pd.read_csv('name.basics.tsv',header=None,sep='\t',dtype='unicode')
#Principals data
downloaded = drive.CreateFile({'id':'12'}) 
downloaded.GetContentFile('title.principals.tsv') 
df_principals = pd.read_csv('title.principals.tsv',header=None,sep='\t',dtype='unicode')
#Title Basics data
downloaded = drive.CreateFile({'id':'11'}) 
downloaded.GetContentFile('title.basics.tsv') 
df_title = pd.read_csv('title.basics.tsv',header=None,sep='\t',dtype='unicode')
Error: Your session crashed after using all available RAM. Runtime logs say this:

How can Google Colab handle RAM better? The combined size of all my TSV files is 2,800 MB. Please advise!
The simplest approach is to load data only when you need it and delete it from memory once you are done with it. You can force this by deleting the references and asking the garbage collector to release the memory (see this thread: https://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python).
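A minimal sketch of that idea, assuming each file can be reduced to a small result before the next one is loaded. The helper name `summarize_one_at_a_time` and the `load` and `summarize` callables are hypothetical and not part of the question's code.

```python
import gc


def summarize_one_at_a_time(paths, load, summarize):
    """Load each file in turn, keep only a small summary, then free the big object.

    `load` and `summarize` are caller-supplied functions (hypothetical names).
    """
    results = {}
    for path in paths:
        data = load(path)                # the large object exists only in this iteration
        results[path] = summarize(data)  # keep just the small result
        del data                         # drop the only reference to the large object
        gc.collect()                     # ask the garbage collector to reclaim it now
    return results
```

In the question's setting, `load` could be something like `lambda p: pd.read_csv(p, sep='\t')` and `summarize` whatever reduced result you need from each file. Note that this only helps if no other variable still references the dataframe.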
If you want more RAM in Colab, there used to be a hack: you intentionally cause the session to run out of RAM, and Colab then offers you a higher-RAM runtime. With Colab Pro this option can also be selected under Runtime -> Change Runtime Type. At $10 a month, Colab Pro may well be a good option for you.
I saw this hack here, but in short: keep appending to a list in a while loop until the RAM is depleted.
a = []
while True:  # deliberately exhausts RAM; the session will crash
    a.append("1")
If you are working with a neural network model: the RAM offered in Google Colab without a Colab Pro account is around 12 GB. Some models can exhaust it, which crashes the session. You can reduce the size of the training and testing datasets and check whether the model then runs; it might work well.
One way to do this is to shuffle the dataset and use only a subset of it.
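As a rough illustration of shuffling and keeping a subset, here is a small sketch using only the standard library. The function name `subsample` and its `fraction` and `seed` parameters are assumptions made for this example; it picks random indices rather than copying the whole dataset, to avoid using extra memory.

```python
import random


def subsample(rows, fraction, seed=0):
    """Return a randomly ordered subset holding roughly `fraction` of `rows`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(rows)
    # Keep at least one row when the dataset is non-empty, never more than n.
    keep = min(n, max(1, int(n * fraction)))
    rng = random.Random(seed)               # seeded so the subset is reproducible
    indices = rng.sample(range(n), keep)    # distinct indices in random order
    return [rows[i] for i in indices]
```

If the data is already in a pandas dataframe, `df.sample(frac=0.25, random_state=0)` gives a comparable random subset.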