Problem
I need to test the first digit of each number in a column against two conditions.
Conditions
If the first digit of checkVar is greater than 5
or
the first digit of checkVar is less than 2,
then set newVar = 1.
Solution
One thought I had was to convert the column to a string, strip any leading whitespace, and then take [0], but I can't figure out the code.
Perhaps something like:
df.ix[df.checkVar.str[0:1].str.contains('1'),'newVar']=1
This isn't what I want, and for some reason I get this error:
invalid index to scalar variable.
Checking my original variable, I get values that should meet the condition:
df.checkVar.value_counts()
301 62
1 15
2 5
999 3
dtype: int64
ideally it would look something like this:
        checkVar  newVar
NaN  1       nan
     2       nan
     3       nan
     4       nan
     5     301.0
     6     301.0
     7     301.0
     8     301.0
     9     301.0
    10     301.0
    11     301.0
    12     301.0
    13     301.0
    14       1.0       1
    15       1.0       1
UPDATE
My final solution, since the actual problem was more complex:
w = df.EligibilityStatusSP3.dropna().astype(str).str[0].astype(int)
v = df.EligibilityStatusSP2.dropna().astype(str).str[0].astype(int)
u = df.EligibilityStatusSP1.dropna().astype(str).str[0].astype(int)
t = df.EligibilityStatus.dropna().astype(str).str[0].astype(int) #get a series of the first digits of non-nan numbers
df['MCelig'] = ((t < 5)|(t == 9)|(u < 5)|(v < 5)|(w < 5)).astype(int)
df.MCelig = df.MCelig.fillna(0)
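The same multi-column idea can be sketched in a self-contained form. The sample values and the helper names `first_digit` and `below_five` are hypothetical, introduced only for illustration. Each per-column mask is reindexed back to the full frame, so rows that were NaN in a column count as False for that column instead of relying on how pandas aligns Series of different lengths.

```python
import numpy as np
import pandas as pd


def first_digit(s: pd.Series) -> pd.Series:
    """First digit of each non-NaN value (index covers only the non-NaN rows)."""
    return s.dropna().astype(str).str[0].astype(int)


def below_five(s: pd.Series) -> pd.Series:
    """True where the first digit is < 5; rows that were NaN become False."""
    return (first_digit(s) < 5).reindex(s.index, fill_value=False)


# Hypothetical sample data, not the asker's real frame.
df = pd.DataFrame({
    "EligibilityStatus":    [901.0, 601.0, np.nan, 701.0],
    "EligibilityStatusSP1": [np.nan, 301.0, np.nan, 801.0],
    "EligibilityStatusSP2": [np.nan, np.nan, np.nan, 601.0],
    "EligibilityStatusSP3": [np.nan, np.nan, 101.0, np.nan],
})

# The main column also qualifies when its first digit is exactly 9.
main = df["EligibilityStatus"]
starts_with_nine = (first_digit(main) == 9).reindex(df.index, fill_value=False)

elig = below_five(main) | starts_with_nine
for col in ["EligibilityStatusSP1", "EligibilityStatusSP2", "EligibilityStatusSP3"]:
    elig = elig | below_five(df[col])

df["MCelig"] = elig.astype(int)
print(df["MCelig"].tolist())
```

Because every mask shares the frame's index, no trailing fillna(0) is needed in this variant.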
t = df.checkVar.dropna().astype(str).str[0].astype(int) #get a series of the first digits of non-nan numbers
df['newVar'] = ((t > 5) | (t < 2)).astype(int)
df.newVar = df.newVar.fillna(0)
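To see why the fillna(0) step is needed, here is a minimal runnable sketch on hypothetical sample data (the values are made up, and all are assumed to be positive numbers). The Series `t` only covers the non-NaN rows, so assigning the flag back to the frame aligns on the index and leaves NaN wherever checkVar was missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's real frame.
df = pd.DataFrame({"checkVar": [np.nan, 301.0, 1.0, 999.0, 2.0]})

# "301.0" -> "3", "1.0" -> "1", and so on; NaN rows are dropped first.
t = df["checkVar"].dropna().astype(str).str[0].astype(int)

# Assignment aligns on the index, so the dropped NaN row gets NaN here.
df["newVar"] = ((t > 5) | (t < 2)).astype(int)

# Fill the gaps with 0 and restore an integer dtype.
df["newVar"] = df["newVar"].fillna(0).astype(int)
print(df)
```

Note that astype(str) on a negative number would put "-" in position 0, so this sketch assumes non-negative values.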
This might be slightly better (I'm unsure), but here is another, very similar way to approach it:
t = df.checkVar.dropna().astype(str).str[0].astype(int)
df['newVar'] = 0
df.newVar.update(((t > 5) | (t < 2)).astype(int))
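A variant of the same idea uses .loc instead of update. This is only a sketch on hypothetical sample data; the motivation is that an in-place call made through attribute access, such as df.newVar.update(...), may not write back to the frame on recent pandas versions with copy-on-write enabled, whereas a single .loc assignment does.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's real frame.
df = pd.DataFrame({"checkVar": [np.nan, 301.0, 1.0, 999.0, 2.0]})

# First digit of every non-NaN value.
t = df["checkVar"].dropna().astype(str).str[0].astype(int)

# Index labels of the rows whose first digit is > 5 or < 2.
hits = t[(t > 5) | (t < 2)].index

# Default everything to 0, then flag the matching rows in one step.
df["newVar"] = 0
df.loc[hits, "newVar"] = 1
print(df)
```

The column stays integer-typed throughout, so no fillna is needed.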