I've got a many row, many column dataframe with different 'placeholder' values needing substitution (in a subset of columns). I've read many examples in the forum using nested lists or dictionaries, but haven't had luck with variations..
# A test dataframe
df = pd.DataFrame({'Sample':['alpha','beta','gamma','delta','epsilon'],
'element1':[1,-0.01,-5000,1,-2000],
'element2':[1,1,1,-5000,2],
'element3':[-5000,1,1,-0.02,2]})
# List of headings containing values to replace
headings = ['element1', 'element2', 'element3']
And I am trying to do something like this (obviously this doesn't work):
# If any rows have value <-1, NaN
df[headings].replace(df[headings < -1], np.nan)
# If a value is between -1 and 0, make a replacement
df[headings].replace(df[headings < 0 & headings > -1], 0.05)
So, is there possibly a better way to accomplish this using loops or fancy pandas tricks?
You can set the Sample column as index and then replace values on the whole data frame based on conditions:
df = df.set_index('Sample')
df[df < -1] = np.nan
df[(df < 0) & (df > -1)] = 0.05
Which gives:
# element1 element2 element3
# Sample
# alpha 1.00 1.0 NaN
# beta 0.05 1.0 1.00
# gamma NaN 1.0 1.00
# delta 1.00 NaN 0.05
# epsilon NaN 2.0 2.00
Here is the successful answer as suggested by @Psidom.
The solution involves taking a slice out of the dataframe, applying the function, then reincorporates the amended slice:
df1 = df.loc[:, headings]
df1[df1 < -1] = np.nan
df1[(df1 < 0)] = 0.05
df.loc[:, headings] = df1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With