I am trying to read a .csv file called ratings.csv from http://grouplens.org/datasets/movielens/20m/ the file is 533.4MB in my computer.
This is what am writing in jupyter notebook
import pandas as pd
ratings = pd.read_cv('./movielens/ratings.csv', sep=',')
The problem from here is the kernel would break or die and ask me to restart and its keeps repeating the same. There is no any error. Please can you suggest any alternative of solving this, it is as if my computer has no capability of running this.
This works but it keeps rewriting
chunksize = 20000
for ratings in pd.read_csv('./movielens/ratings.csv', chunksize=chunksize):
ratings.append(ratings)
ratings.head()
Only the last chunk is written others are written-off
You should consider using the chunksize
parameter in read_csv
when reading in your dataframe, because it returns a TextFileReader
object you can then pass to pd.concat
to concatenate your chunks.
chunksize = 100000
tfr = pd.read_csv('./movielens/ratings.csv', chunksize=chunksize, iterator=True)
df = pd.concat(tfr, ignore_index=True)
If you just want to process each chunk individually, use,
chunksize = 20000
for chunk in pd.read_csv('./movielens/ratings.csv',
chunksize=chunksize,
iterator=True):
do_something_with_chunk(chunk)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With