I am trying to build a predictor that tells me whether a tweet is talking about a natural disaster or not.
I'm using the Kaggle dataset.
I've got:
    text               target
15  What's up man?      0
16  I love fruits       0
17  Summer is lovely    0
18  My car is so fast   0
The list goes on.
For the target column, I get these value counts:
0 4342
1 3271
Name: target, dtype: int64
This is my DataBlock
dls_lm = DataBlock(
blocks=(TextBlock.from_df('text', seq_len=15, is_lm=True), CategoryBlock),
get_x=ColReader('text'), get_y=ColReader('target'), splitter=ColSplitter())
These are my DataLoaders:
dls = dls_lm.dataloaders(df2, bs=24)
This is the error I'm getting:
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'is_valid'
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
5 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:
KeyError: 'is_valid'
If anyone knows how I can fix this, it would really help me. Thanks!
The reason for this error is the parameter splitter=ColSplitter().
Replace it with something like splitter=RandomSplitter(valid_pct=0.1, seed=42)
The signature of ColSplitter is
def ColSplitter(col='is_valid'):
    "Split `items` (supposed to be a dataframe) by value in `col`"
What does that mean? Well, fastai splits your input data into a training set and a validation set so that it can assess the performance of your model after every epoch.
ColSplitter expects your input DataFrame to have a column is_valid that specifies which items (rows) should be in the validation set.
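If you would rather keep ColSplitter, one option is to add that column yourself. Here is a minimal sketch, assuming a DataFrame shaped like the one in the question; the toy rows, the 10% fraction and the seed are illustrative choices, not values taken from the Kaggle dataset:

```python
import pandas as pd

# Toy stand-in for the question's DataFrame (df2); the rows are made up.
df2 = pd.DataFrame({
    "text": [f"tweet number {i}" for i in range(20)],
    "target": [i % 2 for i in range(20)],
})

# Pick 10% of the rows at random to serve as the validation set.
valid_idx = df2.sample(frac=0.1, random_state=42).index

# ColSplitter() looks for a column named 'is_valid' by default,
# so mark the sampled rows as True and everything else as False.
df2["is_valid"] = df2.index.isin(valid_idx)

print(df2["is_valid"].sum())  # number of rows flagged for validation
```

With the column in place, the original splitter=ColSplitter() should be able to find it.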
Since you don't have a column called is_valid in your input data, you should replace ColSplitter with a different splitting strategy, e.g. random splitting:
splitter=RandomSplitter(valid_pct=0.1, seed=42)
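To make it concrete what a random split amounts to, here is a plain-Python sketch. It is not fastai's implementation: random_split_indices is a hypothetical helper written only for illustration, and the row count is simply the sum of the two class counts from the question.

```python
import random

def random_split_indices(n_items, valid_pct=0.1, seed=42):
    """Hypothetical helper: shuffle row indices and cut off a validation slice."""
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)     # size of the validation set
    return idxs[cut:], idxs[:cut]      # (training indices, validation indices)

# 4342 + 3271 rows, as reported by the value counts in the question.
train_idx, valid_idx = random_split_indices(4342 + 3271)
print(len(train_idx), len(valid_idx))
```

The point is that no extra column is needed: the split is derived from the number of rows alone, which is why this strategy works on the DataFrame as it is.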