 

Remove outlier with Python

I have a DataFrame with 30 rows and 9 columns, and I want to remove outliers at the 2-sigma level.

I currently do it like this:

import numpy as np
from scipy import stats

# keep only the rows in which every column is within 2 sigma
df[(np.abs(stats.zscore(df)) < 2).all(axis=1)]

But this removes the whole row if there is an outlier in a single column. I only want to delete that single value. How can I do this? Also, the first column contains the time and should never be touched. How can I exclude this single column?

This is what the data looks like:

Trace for Mass: 60Ni    61Ni    62Ni    63Cu    64Ni    65Cu    66Zn
Resolution: High    High    High    High    High    High    High
                            
Time    Intensity   Intensity   Intensity   Intensity   Intensity   Intensity   Intensity
[sec]   [cps]   [cps]   [cps]   [cps]   [cps]   [cps]   [cps]

0.  4.246875178068876e-003  4.550645244307816e-004  8.364085806533694e-004  3.21496045216918e-003   3.215973265469074e-003  1.595904817804694e-003  1.983924303203821e-003  
1.051999807357788   4.264393821358681e-003  5.171436932869256e-004  8.292743586935103e-004  3.154967911541462e-003  3.216561861336231e-003  1.622977200895548e-003  1.874359208159149e-003  
2.102999925613403   4.27544629201293e-003   4.796394787263125e-004  8.318902109749615e-004  3.211528761312366e-003  3.147452371194959e-003  1.622740761376917e-003  1.879810937680304e-003  
3.154999971389771   4.278738517314196e-003  4.829006502404809e-004  7.972901221364737e-004  3.218628698959947e-003  3.22998408228159e-003   1.604416524060071e-003  1.938240835443139e-003  
4.206999778747559   4.211603198200464e-003  4.424861108418554e-004  8.007381693460047e-004  3.2428870908916e-003    3.166524693369865e-003  1.590821426361799e-003  1.903632888570428e-003  
5.257999897003174   4.267803858965635e-003  5.1306706154719e-004    8.309389813803136e-004  3.144200425595045e-003  3.117314074188471e-003  1.603707205504179e-003  1.815222087316215e-003  
6.309999942779541   4.182798787951469e-003  5.052632768638432e-004  7.896805764175952e-004  3.130593337118626e-003  3.10095027089119e-003   1.570251770317555e-003  1.817710697650909e-003  
7.361000061035156   4.296375438570976e-003  4.910536226816475e-004  8.9122453937307e-004    3.204192267730832e-003  3.028199542313814e-003  1.533132861368358e-003  1.788084045983851e-003  
8.413000106811523   4.335530567914248e-003  6.025235052220523e-004  8.631621603854001e-004  3.268211148679256e-003  2.987353131175041e-003  1.608435995876789e-003  1.796260941773653e-003  
9.463999748229981   4.290143493562937e-003  4.839488829020411e-004  8.525795419700444e-004  3.222533734515309e-003  3.005951410159469e-003  1.583610195666552e-003  1.700276043266058e-003  
10.51599979400635   4.287909716367722e-003  5.497571546584368e-004  9.083477198146284e-004  3.219338599592447e-003  2.950039459392428e-003  1.682562520727515e-003  1.783343963325024e-003  
11.56699943542481   4.260278772562742e-003  4.665948799811304e-004  7.738673011772335e-004  3.193542594090104e-003  2.853760728612542e-003  1.568833249621093e-003  1.736654434353113e-003  
12.61899948120117   4.26474679261446e-003   5.00720867421478e-004   8.611407829448581e-004  3.217800287529826e-003  2.865647897124291e-003  1.595077337697148e-003  1.658685388974845e-003  
13.67099952697754   4.222772549837828e-003  4.647313617169857e-004  8.633999968878925e-004  3.159464336931706e-003  2.801976399496198e-003  1.629361184313893e-003  1.673259655945003e-003  
14.72200012207031   4.23405971378088e-003   4.880253691226244e-004  8.320091292262077e-004  3.10550956055522e-003   2.766199875622988e-003  1.57923623919487e-003   1.671363832429051e-003  
15.77400016784668   4.263806156814098e-003  5.268111126497388e-004  8.335548918694258e-004  3.150589996948838e-003  2.747958991676569e-003  1.52225757483393e-003   1.638660905882716e-003  
16.82500076293945   4.173276014626026e-003  5.153965321369469e-004  7.848058012314141e-004  3.132368205115199e-003  2.736426191404462e-003  1.501098275184631e-003  1.646955031901598e-003  
17.87699890136719   4.209604579955339e-003  4.582091642078012e-004  7.977656787261367e-004  3.183129709213972e-003  2.714420203119516e-003  1.604771241545677e-003  1.606788486242294e-003  
18.92900085449219   4.214542452245951e-003  4.919854109175503e-004  8.5032032802701e-004    3.177686594426632e-003  2.588512841612101e-003  1.560558215714991e-003  1.607973361387849e-003  
19.97999954223633   4.171629901975393e-003  4.438837058842182e-004  8.449696470052004e-004  3.142070723697543e-003  2.649111207574606e-003  1.58833886962384e-003   1.547667197883129e-003  
21.0310001373291    4.234999883919954e-003  5.094563821330667e-004  8.215457201004028e-004  3.189756069332361e-003  2.645698608830571e-003  1.556538976728916e-003  1.515797688625753e-003  
22.08300018310547   4.159520845860243e-003  5.21336798556149e-004   7.7945546945557e-004    3.093914361670613e-003  2.504269825294614e-003  1.597914495505393e-003  1.550629152916372e-003  
23.13399887084961   4.095097538083792e-003  5.284418002702296e-004  8.160762954503298e-004  3.164552384987474e-003  2.605574205517769e-003  1.5143376076594e-003    1.545534702017903e-003  
24.18600082397461   4.190911073237658e-003  4.741653683595359e-004  8.253505802713335e-004  3.078178269788623e-003  2.457562601193786e-003  1.61718437448144e-003   1.502647297456861e-003  
25.23799896240234   4.155758768320084e-003  4.477270995266736e-004  8.012137841433287e-004  3.119352972134948e-003  2.549331868067384e-003  1.551455701701343e-003  1.538307638838887e-003  
26.28899955749512   4.055834375321865e-003  4.267746699042618e-004  8.247561054304242e-004  3.050019731745124e-003  2.364743268117309e-003  1.565523212775588e-003  1.418655156157911e-003  
27.34099960327148   4.160813987255096e-003  4.637996316887438e-004  8.405701955780387e-004  3.15011665225029e-003   2.621341263875365e-003  1.558548538014293e-003  1.534871873445809e-003  
28.39200019836426   4.123781807720661e-003  5.418366636149585e-004  8.308201213367283e-004  3.128936979919672e-003  2.427210099995136e-003  1.607372076250613e-003  1.475754892453551e-003  
29.44400024414063   4.185620695352554e-003  4.987408174201846e-004  7.421225891448557e-004  3.080426249653101e-003  2.371448557823896e-003  1.567532890476286e-003  1.444243011064827e-003  
30.49600028991699   4.092158749699593e-003  5.319360643625259e-004  8.368841372430325e-004  3.113200422376394e-003  2.385094529017806e-003  1.580300158821046e-003  1.433581346645951e-003  

This file is read by:

import pandas as pd

pd.options.display.float_format = '{:.4f}'.format

data = pd.read_csv(dateiname, sep='\t', names=['Time', '60Ni', '61Ni', '62Ni', '63Cu', '64Ni', '65Cu', '66Zn'], skiprows=6, nrows=30, index_col=False, dtype=float)
Nobody asked Oct 27 '25 08:10



1 Answer

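If you need to replace outliers with missing values, use DataFrame.mask. Note that mask replaces the values where the condition is True, so the condition has to select the outliers (|z| >= 2), not the values to keep:

df = df.mask(np.abs(stats.zscore(df)) >= 2)

# equivalent, with the replacement value written out explicitly
#df = df.mask(np.abs(stats.zscore(df)) >= 2, np.nan)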

I just want to get this single value deleted.

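This is not possible. A DataFrame is rectangular, so a single cell cannot be deleted; it can only be replaced by a missing value (NaN) as shown above. Actually removing data is only possible for whole row(s), as in your solution.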
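To also leave the time column untouched, one option is to compute the z-scores only on the intensity columns and write the masked values back. The following is a minimal sketch, not a definitive implementation: the helper name mask_outliers, its exclude and n_sigma parameters, and the small demo frame are made up for illustration. It computes the z-score with plain pandas using ddof=0, which matches the default of scipy.stats.zscore.

```python
import numpy as np
import pandas as pd


def mask_outliers(df, exclude=("Time",), n_sigma=2.0):
    """Return a copy of df where per-column outliers are replaced by NaN.

    Hypothetical helper: columns listed in `exclude` are never modified.
    """
    value_cols = [c for c in df.columns if c not in exclude]
    values = df[value_cols]
    # Column-wise z-score; ddof=0 matches scipy.stats.zscore's default.
    z = (values - values.mean()) / values.std(ddof=0)
    out = df.copy()
    # mask() replaces entries where the condition is True with NaN.
    out[value_cols] = values.mask(z.abs() >= n_sigma)
    return out


# Made-up demo data: one obvious outlier in the last row of "60Ni".
demo = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "60Ni": [1.0, 1.0, 1.0, 1.0, 1.0, 10.0],
    "61Ni": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
})
cleaned = mask_outliers(demo)
print(cleaned)
```

Only the single outlying cell becomes NaN; the rest of its row and the Time column are kept. Most pandas reductions such as mean() skip NaN by default, so the masked values drop out of later statistics without removing rows.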

jezrael answered Oct 30 '25 00:10
