Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas read_csv does not load a comma separated CSV properly

Now,I analyze Titanic challenge of Kaggel. My code is this: code

But my ideal output is: ideal output

So,in my last code is

df["Age"].fillna(df.Age.median(), inplace=True)

and error happens

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'Age'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-4-9763f0a9951c> in <module>()
----> 1 df["Age"].fillna(df.Age.median(), inplace=True)

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
  1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'Age'

I use sep=',' so I really cannot understand why this code cannot separate in each comma.How can I fix this?

I followed one answer,but error happens (I do not know why) error

My data is data

like image 930
user8385498 Avatar asked Aug 31 '25 18:08

user8385498


2 Answers

Attention!

The main issue was downloading the data. If you run a problem of loading and processing the Kaggle Titanic Dataset, you may re-download the CSV from here and re-run your program.


You can pass delimiter=',':

df = pd.read_csv("Desktop/data/train.csv", delimiter=',')
print(df.head())

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  


print(df.columns)

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

Next, you can create a mapping of sorts:

mapping = {'male' : 0, 'female' : 1}

And you'll call pd.Series.replace:

df.Sex = df.Sex.replace(mapping)
print(df.Sex)

0    0
1    1
2    1
3    1
4    0
Name: Sex, dtype: int64
like image 80
cs95 Avatar answered Sep 02 '25 17:09

cs95


Your read_csv looks fine, the replace in the same line seems to be causing trouble.

Try to first read the csv as is into the variable df. This way your code will be cleaner.

df = pd.read_csv('Desktop/data/train.csv',sep=',')
df['Sex'] = df['Sex'].map( {'female': 1, 'male': 0} )

But you can leave the sep argument altogether as comma is standard delimiter

Alternatively do the cleaning with replace on a new line after you read the file into df and use inplace=True:

df['Sex'].replace({'male': 0, 'female': 1}, inplace=True)

General advice:

Kaggle webpage supports script sharing and commenting in kernel section. Try to look at it to see how you can go about the analysis if you are stuck somewhere:

https://www.kaggle.com/c/titanic/kernels

like image 45
StefanK Avatar answered Sep 02 '25 17:09

StefanK