Outlier removal techniques from an array

Question

I know there are a ton of resources online for outlier removal, but I haven't yet managed to get exactly what I want, so I am posting here. I have an array (or DataFrame) of 4 columns, and I want to remove rows from the DataFrame based on the outlier values of one column. The following is what I have tried, but the result is not perfect.

def outliers2(data2, m = 4.5):
    c=[]
    data = data2[:,1] # Choosing the column
    d = np.abs(data - np.median(data)) # absolute deviation from the median
    mdev = np.median(d) # median absolute deviation (MAD)
    for i in range(len(data)):
        if (abs(data[i] - mdev) < m * np.std(data)):
            c.append(data2[i])            
    return c

x = pd.DataFrame(outliers2(np.array(b)))
column = ['t','orig_w','filt_w','smt_w']
x.columns = column

#Plot
plt.rcParams['figure.figsize'] = [10,8]
plt.plot(b.t,b.orig_w,'o',label='Original',alpha=0.8) # Original
plt.plot(x.t,x.orig_w,'.',c='r',label='Outlier removed',alpha=0.8) # After outlier removal
plt.legend()

The plot shows the result: the red points are what remains after the outlier treatment, drawn over the blue original points. I would really like to get rid of the vertical groups of points around the x ~ 0 mark. What should I do?

A link to the data file is provided here: Full data. In the attached image, the green circles mark the kind of points I would like to get rid of.
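For reference, the function above computes the median absolute deviation (MAD) but then compares each value against it using the standard deviation, so the two statistics are mixed. A minimal sketch of the plain median/MAD rule it appears to be aiming for is shown here. The name `drop_mad_outliers`, the toy `demo` array, and the 1.4826 scaling constant are illustrative assumptions, not part of the original post.

```python
import numpy as np

def drop_mad_outliers(rows, col=1, m=4.5):
    """Keep rows whose value in column `col` lies within m scaled MADs of the median."""
    rows = np.asarray(rows, dtype=float)
    values = rows[:, col]
    abs_dev = np.abs(values - np.median(values))  # distance of each value from the median
    mad = np.median(abs_dev)                      # median absolute deviation
    if mad == 0:
        # At least half the values equal the median; the rule degenerates, so keep everything.
        return rows
    # 1.4826 rescales the MAD so it is comparable to a standard deviation for normal data.
    keep = abs_dev < m * 1.4826 * mad
    return rows[keep]

# Hypothetical toy data: column 0 plays the role of t, column 1 of orig_w.
demo = np.array([[0, 1.0], [1, 1.1], [2, 0.9], [3, 1.05], [4, 25.0], [5, 0.95]])
cleaned = drop_mad_outliers(demo)
```

This rule only compares each value against the global median of the column, so points that are unusual locally but sit inside the overall value range can still pass through it.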

Mr. T · Accepted Answer

You could use scipy's median_filter:

import pandas as pd
from matplotlib import pyplot as plt
from scipy.ndimage import median_filter

b = pd.read_csv("test.csv")

x = b.copy()
x.orig_w = median_filter(b.orig_w, size=15)

#Plot
plt.rcParams['figure.figsize'] = [10,8]
#Original
plt.plot(b.t,b.orig_w,'o',label='Original',alpha=0.8) 
# After outlier removal
plt.plot(x.t,x.orig_w,'.',c='r',label='Outlier removed',alpha=0.8) 
plt.legend()
plt.show()

Sample output: [image]
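Note that median_filter replaces every value with a smoothed one and keeps all rows. If the rows themselves should be dropped, as the question asks, one possible variation is to use the filtered series only as a baseline and discard rows that sit far from it. The following is a rough sketch under that assumption. The function name `drop_local_outliers`, the MAD-based threshold, and the synthetic `demo` frame are hypothetical choices for illustration and have not been tuned on the asker's data.

```python
import numpy as np
import pandas as pd
from scipy.ndimage import median_filter

def drop_local_outliers(df, col="orig_w", size=15, m=4.5):
    """Drop rows whose `col` value is far from its running median."""
    values = df[col].to_numpy(dtype=float)
    baseline = median_filter(values, size=size)  # running median, as in the answer above
    resid = np.abs(values - baseline)            # distance of each point from the baseline
    mad = np.median(resid)                       # typical size of that distance
    if mad == 0:
        # More than half the points coincide with the baseline; no usable scale, keep all rows.
        return df.copy()
    keep = resid < m * 1.4826 * mad
    return df[keep]

# Hypothetical synthetic data: a slow trend with small noise and two injected spikes.
rng = np.random.default_rng(0)
t = np.arange(50)
w = 0.01 * t + rng.normal(0.0, 0.05, size=t.size)
w[[10, 30]] += 5.0
demo = pd.DataFrame({"t": t, "orig_w": w})
cleaned = drop_local_outliers(demo)
```

The window size and the multiplier m both control how aggressive the filter is, so they would need adjusting against the real data. The rows are assumed to be sorted by t already, because the running median works on row order.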

Tags: python, pandas, numpy, scipy, outliers

Ayan Mitra