
Use pandas to check the first digit of a column

Tags: python, pandas

Problem
I need to test the first digit of each number in a column against a condition.

Conditions
If the first digit of checkVar is greater than 5 or less than 2, then set newVar = 1.

Solution

One thought I had was to convert the column to a string, strip the leading whitespace, and then take [0], but I can't figure out the code.

Perhaps something like:

df.ix[df.checkVar.str[0:1].str.contains('1'),'newVar']=1

It isn't what I want, and for some reason I get this error:

invalid index to scalar variable.

Testing my original variable, I get values that should meet the condition:

df.checkVar.value_counts()
301    62
1      15
2       5
999     3
dtype: int64   

Ideally, the result would look something like this:

            checkVar  newVar
NaN  1         nan    
     2         nan
     3         nan
     4         nan
     5       301.0
     6       301.0
     7       301.0
     8       301.0
     9       301.0
     10      301.0
     11      301.0
     12      301.0
     13      301.0
     14        1.0     1
     15        1.0     1

UPDATE
My final solution, since the actual problem was more complex:

# Get a Series of the first digits of the non-NaN numbers in each column
w = df.EligibilityStatusSP3.dropna().astype(str).str[0].astype(int)
v = df.EligibilityStatusSP2.dropna().astype(str).str[0].astype(int)
u = df.EligibilityStatusSP1.dropna().astype(str).str[0].astype(int)
t = df.EligibilityStatus.dropna().astype(str).str[0].astype(int)
# Flag a row if any of the first-digit conditions holds
df['MCelig'] = ((t < 5)|(t == 9)|(u < 5)|(v < 5)|(w < 5)).astype(int)
# Rows with no usable value end up as NaN after assignment; set them to 0
df.MCelig = df.MCelig.fillna(0)
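
The four near-identical lines above could be folded into a small helper. The following is only a sketch under assumptions: the helper names `first_digit` and `flag` are hypothetical, and the values in the toy frame are made up (only the column names come from the post). Each condition is expanded to the full index before combining, so the result does not depend on how `|` aligns Series that have different indexes.

```python
import numpy as np
import pandas as pd


def first_digit(col: pd.Series) -> pd.Series:
    """First digit of each non-NaN value (hypothetical helper)."""
    return col.dropna().astype(str).str[0].astype(int)


def flag(cond: pd.Series, index: pd.Index) -> pd.Series:
    """Expand a condition to the full index; rows dropped as NaN become False."""
    return cond.reindex(index, fill_value=False).astype(bool)


# Toy frame: the values are assumptions for illustration only.
df = pd.DataFrame({
    "EligibilityStatus":    [301.0, 999.0, 601.0, np.nan],
    "EligibilityStatusSP1": [np.nan, np.nan, 701.0, np.nan],
    "EligibilityStatusSP2": [np.nan, np.nan, 801.0, np.nan],
    "EligibilityStatusSP3": [np.nan, np.nan, np.nan, 201.0],
})

# Main column: first digit below 5, or exactly 9.
t = first_digit(df["EligibilityStatus"])
eligible = flag((t < 5) | (t == 9), df.index)

# Secondary columns: first digit below 5.
for name in ["EligibilityStatusSP1", "EligibilityStatusSP2", "EligibilityStatusSP3"]:
    d = first_digit(df[name])
    eligible = eligible | flag(d < 5, df.index)

df["MCelig"] = eligible.astype(int)
print(df["MCelig"].tolist())
```

Because every condition already covers the full index, no `fillna(0)` step is needed at the end.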
Chet Meinzer asked Dec 30 '25 08:12


1 Answer

t = df.checkVar.dropna().astype(str).str[0].astype(int) #get a series of the first digits of non-nan numbers
df['newVar'] = ((t > 5) | (t < 2)).astype(int)
df.newVar = df.newVar.fillna(0)
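
To see what each step does, here is a minimal runnable sketch of the same approach on toy data. The values are assumptions modeled on the value_counts() output in the question, not the asker's real frame.

```python
import numpy as np
import pandas as pd

# Toy data: one NaN plus the values seen in the question's value_counts().
df = pd.DataFrame({"checkVar": [np.nan, 301.0, 1.0, 2.0, 999.0]})

# First digit of every non-NaN value, e.g. "301.0" -> "3" -> 3.
t = df["checkVar"].dropna().astype(str).str[0].astype(int)

# Assignment aligns on the index, so the row dropped as NaN gets NaN here.
df["newVar"] = ((t > 5) | (t < 2)).astype(int)

# Fill that gap with 0 and return to an integer column.
df["newVar"] = df["newVar"].fillna(0).astype(int)

print(df)
```

Note that the string slice assumes non-negative numbers: for a negative value, `str[0]` would be the minus sign and the cast to int would fail.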

This might be slightly better (I'm unsure), but here is another, very similar way to approach it:

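t = df.checkVar.dropna().astype(str).str[0].astype(int)
df['newVar'] = 0  # default for rows where checkVar is NaN
# Overwrite only the non-NaN rows; writing through .loc modifies df itself
df.loc[t.index, 'newVar'] = ((t > 5) | (t < 2)).astype(int)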
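
A runnable sketch of this "default to 0, then overwrite" variant on toy data (the values are assumptions). It writes through `df.loc` rather than calling an in-place method on `df.newVar`, on the assumption that pandas' Copy-on-Write mode may be enabled, in which case an in-place change to a selected column might not reach the parent frame.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration.
df = pd.DataFrame({"checkVar": [np.nan, 301.0, 1.0, 2.0, 999.0]})

# First digits of the non-NaN rows only.
t = df["checkVar"].dropna().astype(str).str[0].astype(int)

# Start every row at 0, then overwrite just the rows that have a value.
df["newVar"] = 0
df.loc[t.index, "newVar"] = ((t > 5) | (t < 2)).astype(int)

print(df)
```

Since the column starts out as integers and is only ever assigned integers, it never passes through NaN and no `fillna` is required.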
acushner answered Dec 31 '25 23:12