I have several pandas DataFrames stored in a dictionary:
import pandas as pd
df1=pd.DataFrame({'product':['ajoijoft','bbhjbh','cser','sesrd','yfgjke','tfyfyf','drdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df2=pd.DataFrame({'product':['ajyughjoijoft','bdrddbhjbh','rdtrdcser','sdtrdthddesrd','yawafgjke','tesrgsfyfyf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df3=pd.DataFrame({'product':['joijoft','bdbhjbh','rdcser','sdhddesrd','wajke','yf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df_dict = {"A":df1,'B':df2, "C":df3}
I want to know the length of each string in product, so I wrote the following.
for i, ii in df_dict.items():
    ii['Productsize'] = ii['product'].str.len()
This worked, and I got the length of every "product" string.
Next, I want to remove rows that have a short product string, that is: Productsize <= 6
I tried to use this code:
for i, ii in df_dict.items():
    ii=ii[~(ii['Productsize'] <= 6)]
However, this did not work. If I apply the same filter to a single DataFrame outside the loop, as below, it does work.
df1=df1[~(df1['Productsize'] <= 6)]
Does anyone know what the problem might be?
I tried what you suggested. Unfortunately, it still does not work. Do you know why? Here is the code.
df1=pd.DataFrame({'product':['ajoijoft','bbhjbh','cser','sesrd','yfgjke','tfyfyf','drdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df2=pd.DataFrame({'product':['ajyughjoijoft','bdrddbhjbh','rdtrdcser','sdtrdthddesrd','yawafgjke','tesrgsfyfyf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df3=pd.DataFrame({'product':['joijoft','bdbhjbh','rdcser','sdhddesrd','wajke','yf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df_dict = {"A":df1,'B':df2, "C":df3}
for i, ii in df_dict.items():
    ii['Productsize'] = ii['product'].str.len()
for i, ii in df_dict.items():
    df_dict[i] = ii[~(ii['Productsize'] <= 6)]
First, you should use a dictionary or list to hold many similarly structured DataFrames rather than flood your global environment with separate variables. A container keeps things organized and sets you up for bulk operations such as pd.concat to build a master set. But be sure to assign the DataFrames to the dictionary directly rather than creating separate objects first.
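As a rough sketch of the bulk operation mentioned above, passing the dictionary itself to pd.concat uses its keys as an outer index level. The two small frames and the level names "source" and "row" are made up for illustration.

```python
import pandas as pd

# Small illustrative frames; the data is hypothetical.
df_dict = {
    "A": pd.DataFrame({"product": ["ajoijoft", "cser"], "price": [1, 3]}),
    "B": pd.DataFrame({"product": ["bdrddbhjbh", "yf"], "price": [2, 6]}),
}

# Passing a dict to pd.concat uses its keys as the outer index level.
master = pd.concat(df_dict, names=["source", "row"])

# Move the dictionary key out of the index into an ordinary column.
master = master.reset_index(level="source")
```

Each row of master then carries the key of the DataFrame it came from, so the combined set can still be grouped or filtered per source.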
The reason your dictionary DataFrames do not update is that you are not assigning the result back to the dictionary. Every instance of df needs to be replaced with df_dict[key]. So,
df[~(df['Productsize'] <= 6)]
would be replaced with
df_dict[key][~(df_dict[key]['Productsize'] <= 6)]
You lose no functionality of a DataFrame when it is stored in a container; only the way you reference it changes. Therefore, adjust accordingly:
for k, v in df_dict.items():
    df_dict[k]['Productsize'] = df_dict[k]['product'].str.len()
    df_dict[k] = df_dict[k][~(df_dict[k]['Productsize'] <= 6)]
Alternatively, work with the value variable from the dictionary loop, but assign the temporary result back to the current key, as @phi explains.
for k, v in df_dict.items():
    v['Productsize'] = v['product'].str.len()
    v = v[~(v['Productsize'] <= 6)]
    df_dict[k] = v
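To make the difference concrete, here is a minimal sketch with a shortened, made-up frame. It shows that rebinding the loop variable leaves the dictionary untouched, and then uses a dict comprehension as one possible alternative to the explicit loop. Note that the standalone name df1 keeps pointing at the unfiltered frame either way, which may be why the updated code in the question appears not to work if df1 is inspected instead of df_dict["A"].

```python
import pandas as pd

# Shortened illustrative data (string lengths 8, 4 and 7).
df1 = pd.DataFrame({"product": ["ajoijoft", "cser", "drdrtjg"], "price": [1, 3, 7]})
df_dict = {"A": df1}

for k, v in df_dict.items():
    v["Productsize"] = v["product"].str.len()  # modifies the stored frame in place
    v = v[v["Productsize"] > 6]                # only rebinds the local name v

# The dictionary still holds the unfiltered frame.
rebound_len = len(df_dict["A"])

# Building a new dictionary stores the filtered frames under the same keys.
df_dict = {k: v[v["Productsize"] > 6] for k, v in df_dict.items()}

filtered_len = len(df_dict["A"])  # rows remaining in the dictionary entry
original_len = len(df1)           # df1 itself is a separate name and is unchanged
```

The condition v["Productsize"] > 6 is equivalent to the ~(... <= 6) form used above as long as the column has no missing values.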