I have a DataFrame with 30 rows and 9 columns, and I want to remove outliers beyond 2 sigma.
I currently do it like this:
import numpy as np
from scipy import stats
df[(np.abs(stats.zscore(df)) < 2).all(axis=1)]
But this removes the whole row if there is an outlier in a single column. I only want that single value deleted. How can I do this? Also, the first column contains the time and should never be touched. How can I exclude this one column?
This is what the data looks like:
Trace for Mass: 60Ni 61Ni 62Ni 63Cu 64Ni 65Cu 66Zn
Resolution: High High High High High High High
Time Intensity Intensity Intensity Intensity Intensity Intensity Intensity
[sec] [cps] [cps] [cps] [cps] [cps] [cps] [cps]
0. 4.246875178068876e-003 4.550645244307816e-004 8.364085806533694e-004 3.21496045216918e-003 3.215973265469074e-003 1.595904817804694e-003 1.983924303203821e-003
1.051999807357788 4.264393821358681e-003 5.171436932869256e-004 8.292743586935103e-004 3.154967911541462e-003 3.216561861336231e-003 1.622977200895548e-003 1.874359208159149e-003
2.102999925613403 4.27544629201293e-003 4.796394787263125e-004 8.318902109749615e-004 3.211528761312366e-003 3.147452371194959e-003 1.622740761376917e-003 1.879810937680304e-003
3.154999971389771 4.278738517314196e-003 4.829006502404809e-004 7.972901221364737e-004 3.218628698959947e-003 3.22998408228159e-003 1.604416524060071e-003 1.938240835443139e-003
4.206999778747559 4.211603198200464e-003 4.424861108418554e-004 8.007381693460047e-004 3.2428870908916e-003 3.166524693369865e-003 1.590821426361799e-003 1.903632888570428e-003
5.257999897003174 4.267803858965635e-003 5.1306706154719e-004 8.309389813803136e-004 3.144200425595045e-003 3.117314074188471e-003 1.603707205504179e-003 1.815222087316215e-003
6.309999942779541 4.182798787951469e-003 5.052632768638432e-004 7.896805764175952e-004 3.130593337118626e-003 3.10095027089119e-003 1.570251770317555e-003 1.817710697650909e-003
7.361000061035156 4.296375438570976e-003 4.910536226816475e-004 8.9122453937307e-004 3.204192267730832e-003 3.028199542313814e-003 1.533132861368358e-003 1.788084045983851e-003
8.413000106811523 4.335530567914248e-003 6.025235052220523e-004 8.631621603854001e-004 3.268211148679256e-003 2.987353131175041e-003 1.608435995876789e-003 1.796260941773653e-003
9.463999748229981 4.290143493562937e-003 4.839488829020411e-004 8.525795419700444e-004 3.222533734515309e-003 3.005951410159469e-003 1.583610195666552e-003 1.700276043266058e-003
10.51599979400635 4.287909716367722e-003 5.497571546584368e-004 9.083477198146284e-004 3.219338599592447e-003 2.950039459392428e-003 1.682562520727515e-003 1.783343963325024e-003
11.56699943542481 4.260278772562742e-003 4.665948799811304e-004 7.738673011772335e-004 3.193542594090104e-003 2.853760728612542e-003 1.568833249621093e-003 1.736654434353113e-003
12.61899948120117 4.26474679261446e-003 5.00720867421478e-004 8.611407829448581e-004 3.217800287529826e-003 2.865647897124291e-003 1.595077337697148e-003 1.658685388974845e-003
13.67099952697754 4.222772549837828e-003 4.647313617169857e-004 8.633999968878925e-004 3.159464336931706e-003 2.801976399496198e-003 1.629361184313893e-003 1.673259655945003e-003
14.72200012207031 4.23405971378088e-003 4.880253691226244e-004 8.320091292262077e-004 3.10550956055522e-003 2.766199875622988e-003 1.57923623919487e-003 1.671363832429051e-003
15.77400016784668 4.263806156814098e-003 5.268111126497388e-004 8.335548918694258e-004 3.150589996948838e-003 2.747958991676569e-003 1.52225757483393e-003 1.638660905882716e-003
16.82500076293945 4.173276014626026e-003 5.153965321369469e-004 7.848058012314141e-004 3.132368205115199e-003 2.736426191404462e-003 1.501098275184631e-003 1.646955031901598e-003
17.87699890136719 4.209604579955339e-003 4.582091642078012e-004 7.977656787261367e-004 3.183129709213972e-003 2.714420203119516e-003 1.604771241545677e-003 1.606788486242294e-003
18.92900085449219 4.214542452245951e-003 4.919854109175503e-004 8.5032032802701e-004 3.177686594426632e-003 2.588512841612101e-003 1.560558215714991e-003 1.607973361387849e-003
19.97999954223633 4.171629901975393e-003 4.438837058842182e-004 8.449696470052004e-004 3.142070723697543e-003 2.649111207574606e-003 1.58833886962384e-003 1.547667197883129e-003
21.0310001373291 4.234999883919954e-003 5.094563821330667e-004 8.215457201004028e-004 3.189756069332361e-003 2.645698608830571e-003 1.556538976728916e-003 1.515797688625753e-003
22.08300018310547 4.159520845860243e-003 5.21336798556149e-004 7.7945546945557e-004 3.093914361670613e-003 2.504269825294614e-003 1.597914495505393e-003 1.550629152916372e-003
23.13399887084961 4.095097538083792e-003 5.284418002702296e-004 8.160762954503298e-004 3.164552384987474e-003 2.605574205517769e-003 1.5143376076594e-003 1.545534702017903e-003
24.18600082397461 4.190911073237658e-003 4.741653683595359e-004 8.253505802713335e-004 3.078178269788623e-003 2.457562601193786e-003 1.61718437448144e-003 1.502647297456861e-003
25.23799896240234 4.155758768320084e-003 4.477270995266736e-004 8.012137841433287e-004 3.119352972134948e-003 2.549331868067384e-003 1.551455701701343e-003 1.538307638838887e-003
26.28899955749512 4.055834375321865e-003 4.267746699042618e-004 8.247561054304242e-004 3.050019731745124e-003 2.364743268117309e-003 1.565523212775588e-003 1.418655156157911e-003
27.34099960327148 4.160813987255096e-003 4.637996316887438e-004 8.405701955780387e-004 3.15011665225029e-003 2.621341263875365e-003 1.558548538014293e-003 1.534871873445809e-003
28.39200019836426 4.123781807720661e-003 5.418366636149585e-004 8.308201213367283e-004 3.128936979919672e-003 2.427210099995136e-003 1.607372076250613e-003 1.475754892453551e-003
29.44400024414063 4.185620695352554e-003 4.987408174201846e-004 7.421225891448557e-004 3.080426249653101e-003 2.371448557823896e-003 1.567532890476286e-003 1.444243011064827e-003
30.49600028991699 4.092158749699593e-003 5.319360643625259e-004 8.368841372430325e-004 3.113200422376394e-003 2.385094529017806e-003 1.580300158821046e-003 1.433581346645951e-003
The file is read with:
import pandas as pd
pd.options.display.float_format = '{:.4f}'.format
data = pd.read_csv(dateiname, sep='\t', names=['Time', '60Ni', '61Ni', '62Ni', '63Cu', '64Ni', '65Cu', '66Zn'], skiprows=6, nrows=30, index_col=False, dtype=float)
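If you need to replace outliers with missing values, use DataFrame.mask. It replaces values where the condition is True, so the condition must select the outliers:
df = df.mask(np.abs(stats.zscore(df)) >= 2)
# equivalent, with the replacement value given explicitly
#df = df.mask(np.abs(stats.zscore(df)) >= 2, np.nan)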
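You wrote: "I just want to get this single value deleted."
Deleting a single cell is not possible, because a DataFrame is rectangular. You can only remove whole rows, as in your solution, or replace the value with NaN.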
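To leave the Time column untouched, one option is to compute the z-scores on the intensity columns only and mask just those. The following is a minimal sketch, not a tested solution for the real file: the small DataFrame is made-up example data, and only the column names Time, 60Ni and 61Ni are taken from the question.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up example data (not the measurements from the question):
# one obvious spike in each intensity column.
df = pd.DataFrame({
    "Time": np.arange(10, dtype=float),
    "60Ni": [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 9.0, 1.0, 1.1],
    "61Ni": [2.0, 2.1, 7.5, 2.0, 1.9, 2.05, 2.0, 1.95, 2.1, 2.0],
})

# Every column except Time takes part in the outlier test.
value_cols = df.columns.drop("Time")

# Absolute z-scores per column (axis=0), as a plain NumPy array.
z = np.abs(stats.zscore(df[value_cols].to_numpy(), axis=0))

# mask() replaces entries where the condition is True, so only the
# individual outlier cells become NaN; all rows and Time stay intact.
cleaned = df.copy()
cleaned[value_cols] = df[value_cols].mask(z >= 2)

print(cleaned)
```

The same pattern should carry over to the DataFrame read from the file, with value_cols covering all columns except Time. Later steps such as mean() skip NaN by default, so the masked cells do not need to be removed.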