I am trying to create a new column that contains city names. I have a list of the city names I need, and CSV files in which the city names appear under different column names.
I want to check whether any city name from the list appears in a specific range of columns in the CSV files, and write that city name into a new column, City.
My code is:
import pandas as pd
import numpy as np
City_Name_List = ['Amsterdam', 'Antwerp', 'Brussels', 'Ghent', 'Asheville', 'Austin', 'Boston', 'Broward County',
                  'Cambridge', 'Chicago', 'Clark County Nv', 'Columbus', 'Denver', 'Hawaii', 'Jersey City', 'Los Angeles',
                  'Nashville', 'New Orleans', 'New York City', 'Oakland', 'Pacific Grove', 'Portland', 'Rhode Island', 'Salem Or', 'San Diego']
data = {'host_identity_verified': ['t', 't', 't', 't', 't', 't', 't', 't', 't', 't'],
        'neighbourhood': ['Amsterdam, North Holland, Netherlands', 'Amsterdam, North Holland, Netherlands', 'NaN',
                          'Amsterdam, North Holland, Netherlands', 'Amsterdam, North Holland, Netherlands',
                          'Amsterdam, North Holland, Netherlands', 'Amsterdam, North Holland, Netherlands', 'NaN',
                          'Amsterdam, North Holland, Netherlands', 'Amsterdam, North Holland, Netherlands'],
        'neighbourhood_cleansed': ['Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West',
                                   'Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West'],
        'neighbourhood_group_cleansed': ['NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN'],
        'latitude': [52.36575, 52.36509, 52.37297, 52.38761, 52.36719, 52.36575, 52.36509, 52.37297, 52.38761, 52.36719]}
df = pd.DataFrame(data)
df['City'] = [x for x in City_Name_List if x in df.loc[:,'host_identity_verified':'latitude'].values][0]
When I run the code, I get this message:
Traceback (most recent call last):
  File "C:/Users/YAZAN/PycharmProjects/Yazan_Work/try.py", line 63, in <module>
    df['City'] = [x for x in City_Name_List if x in df.loc[:,'host_identity_verified':'latitude'].values][0]
IndexError: list index out of range
This is due to the fact that the city name Amsterdam in the data is followed by other words.
I want my output to be as follows:
0 Amsterdam
1 Amsterdam
2 Amsterdam
3 Amsterdam
4 Amsterdam
5 Amsterdam
6 Amsterdam
7 Amsterdam
8 Amsterdam
9 Amsterdam
Name: City, dtype: object
I tried relentlessly to solve this issue. I tried to use endswith, startswith, and regex, but to no avail. I might be using these methods incorrectly. I hope someone can help me.
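The issue is that when you write x in df.loc[...].values, you are not checking whether the city name is contained in each individual string, but whether it is equal to one of the values in the whole selection, which it is not ('Amsterdam' is not the same as 'Amsterdam, North Holland, Netherlands'). The filtered list is therefore empty, and taking [0] of it raises the IndexError. What you need is something like this: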
df['City'] = [parts[0] if parts[0] in City_Name_List else '' for parts in df['neighbourhood'].str.split(',')]
This splits each value in df['neighbourhood'] on the commas and takes the first piece, then checks whether that piece is in your list of city names. If it is, it goes into the 'City' column; otherwise an empty string is used, which is what happens for the rows in your sample where neighbourhood is 'NaN'.
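If the city name is not always the first comma-separated piece, or it can sit in any of several columns, the same idea can be written as a substring search. The following is only a sketch under some assumptions: the list and data are shortened versions of the ones in the question, text_columns stands in for whatever column range you actually want to scan, and the final fill step assumes that each CSV file covers a single city (which is what your expected output suggests).

```python
import re
import pandas as pd

# Shortened versions of the question's list and data, for illustration only.
City_Name_List = ['Amsterdam', 'Antwerp', 'Brussels', 'New York City']
df = pd.DataFrame({
    'neighbourhood': ['Amsterdam, North Holland, Netherlands', 'NaN',
                      'Amsterdam, North Holland, Netherlands'],
    'neighbourhood_cleansed': ['Centrum-Oost', 'Centrum-West', 'Centrum-West'],
})

# One regex that matches any listed city as a whole word. re.escape guards
# against special characters, and longer names are tried first.
names = sorted(City_Name_List, key=len, reverse=True)
pattern = r'\b(' + '|'.join(re.escape(name) for name in names) + r')\b'

# Join the text columns of each row into one string, then pull out the
# first city name found (NaN where no listed city appears).
text_columns = ['neighbourhood', 'neighbourhood_cleansed']  # assumed column range
combined = df[text_columns].astype(str).apply(' | '.join, axis=1)
extracted = combined.str.extract(pattern, expand=False)

# Assumption: each CSV file covers a single city, so rows without a match
# take the most common city found in the same file.
df['City'] = extracted.fillna(extracted.mode().iloc[0])
print(df['City'])
```

Note that extracted.mode().iloc[0] raises an IndexError if no row in the file matches any city, so you may want to check extracted.notna().any() first. If your files can mix cities, drop the fillna step and leave the unmatched rows empty.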