I know there are a ton of resources online for outlier removal, but I haven't yet managed to get exactly what I want, so I am posting here. I have an array (or DataFrame) of 4 columns. I want to remove rows from the DataFrame based on outlier values in one column. The following is what I have tried, but it is not perfect.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

def outliers2(data2, m=4.5):
    c = []
    data = data2[:, 1]  # Choosing the column
    d = np.abs(data - np.median(data))  # absolute deviation from the median
    mdev = np.median(d)  # median absolute deviation
    for i in range(len(data)):
        if abs(data[i] - mdev) < m * np.std(data):
            c.append(data2[i])
    return c
x = pd.DataFrame(outliers2(np.array(b)))
column = ['t','orig_w','filt_w','smt_w']
x.columns = column
#Plot
plt.rcParams['figure.figsize'] = [10,8]
plt.plot(b.t,b.orig_w,'o',label='Original',alpha=0.8) # Original
plt.plot(x.t,x.orig_w,'.',c='r',label='Outlier removed',alpha=0.8) # After outlier removal
plt.legend()
The plot illustrates how the results look: red points after the outlier treatment, drawn over the blue original points. I would really like to get rid of the vertical groups of points around the x~0 mark. What should I do?
A link to the data file is provided here: Full data
The green circles show the kind of points I would like to get rid of.

You could use scipy's median_filter:
import pandas as pd
from matplotlib import pyplot as plt
from scipy.ndimage import median_filter
b = pd.read_csv("test.csv")
x = b.copy()
x.orig_w = median_filter(b.orig_w, size=15)
#Plot
plt.rcParams['figure.figsize'] = [10,8]
#Original
plt.plot(b.t,b.orig_w,'o',label='Original',alpha=0.8)
# After outlier removal
plt.plot(x.t,x.orig_w,'.',c='r',label='Outlier removed',alpha=0.8)
plt.legend()
plt.show()
Sample output:
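
Note that median_filter replaces each value with the median of its neighbourhood, so every row is kept and the column is smoothed. If the rows themselves should be dropped, as the question asks, one possible variation is to compare each point with a rolling median and discard the rows that sit too far from it. The following is only a sketch under assumptions: the function name drop_local_outliers is made up for illustration, window=15 is borrowed from the size used above, and m=4.5 is the threshold from the question. Both values would likely need tuning on the real data.

```python
import numpy as np
import pandas as pd

def drop_local_outliers(df, column, window=15, m=4.5):
    """Hypothetical helper: drop rows whose `column` value is far from the local median."""
    values = df[column]
    # The rolling median follows the trend, so each point is judged against its neighbours.
    local_median = values.rolling(window, center=True, min_periods=1).median()
    residual = (values - local_median).abs()
    # Median absolute deviation of the residuals: a robust estimate of the noise level.
    mad = residual.median()
    if mad == 0:
        # No usable noise scale (most points equal their local median); leave the data as is.
        return df.copy()
    # Keep only the rows within m robust deviations of the local median.
    return df[residual <= m * mad]
```

With the DataFrame from above, this would be called as x = drop_local_outliers(b, "orig_w"), and the plotting code stays the same. Unlike the filtered version, x then has fewer rows than b, and the remaining values of orig_w are unchanged.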
