Now,I analyze Titanic challenge of Kaggel.
My code is this:
But my ideal output is:
So,in my last code is
df["Age"].fillna(df.Age.median(), inplace=True)
and error happens
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2133 try:
-> 2134 return self._engine.get_loc(key)
2135 except KeyError:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()
KeyError: 'Age'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-4-9763f0a9951c> in <module>()
----> 1 df["Age"].fillna(df.Age.median(), inplace=True)
/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2057 return self._getitem_multilevel(key)
2058 else:
-> 2059 return self._getitem_column(key)
2060
2061 def _getitem_column(self, key):
/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
2064 # get column
2065 if self.columns.is_unique:
-> 2066 return self._get_item_cache(key)
2067
2068 # duplicate columns & possible reduce dimensionality
/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
1384 res = cache.get(item)
1385 if res is None:
-> 1386 values = self._data.get(item)
1387 res = self._box_item_values(item, values)
1388 cache[item] = res
/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
3541
3542 if not isnull(item):
-> 3543 loc = self.items.get_loc(item)
3544 else:
3545 indexer = np.arange(len(self.items))[isnull(self.items)]
/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2134 return self._engine.get_loc(key)
2135 except KeyError:
-> 2136 return self._engine.get_loc(self._maybe_cast_indexer(key))
2137
2138 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()
KeyError: 'Age'
I use sep=','
so I really cannot understand why this code cannot separate in each comma.How can I fix this?
I followed one answer,but error happens (I do not know why)
My data is
Attention!
The main issue was downloading the data. If you run a problem of loading and processing the Kaggle Titanic Dataset, you may re-download the CSV from here and re-run your program.
You can pass delimiter=','
:
df = pd.read_csv("Desktop/data/train.csv", delimiter=',')
print(df.head())
PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
Name Sex Age SibSp \
0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
Parch Ticket Fare Cabin Embarked
0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
print(df.columns)
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')
Next, you can create a mapping of sorts:
mapping = {'male' : 0, 'female' : 1}
And you'll call pd.Series.replace
:
df.Sex = df.Sex.replace(mapping)
print(df.Sex)
0 0
1 1
2 1
3 1
4 0
Name: Sex, dtype: int64
Your read_csv looks fine, the replace in the same line seems to be causing trouble.
Try to first read the csv as is into the variable df. This way your code will be cleaner.
df = pd.read_csv('Desktop/data/train.csv',sep=',')
df['Sex'] = df['Sex'].map( {'female': 1, 'male': 0} )
But you can leave the sep argument altogether as comma is standard delimiter
Alternatively do the cleaning with replace on a new line after you read the file into df and use
inplace=True
:
df['Sex'].replace({'male': 0, 'female': 1}, inplace=True)
General advice:
Kaggle webpage supports script sharing and commenting in kernel section. Try to look at it to see how you can go about the analysis if you are stuck somewhere:
https://www.kaggle.com/c/titanic/kernels
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With