How can I calculate the age of a person (based off the dob column) and add a column to the dataframe with the new value?
The dataframe looks like the following:
lname fname dob
0 DOE LAURIE 03011979
1 BOURNE JASON 06111978
2 GRINCH XMAS 12131988
3 DOE JOHN 11121986
I tried doing the following:
now = datetime.now()
df1['age'] = now - df1['dob']
But, received the following error:
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'str'
import datetime as DT
import io
import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = 'warn'
content = '''   ssno       lname    fname      pos_title            ser   gender  dob
0  23456789   PLILEY   JODY       BUDG ANAL            0560  F       031871
1  987654321  NOEL     HEATHER    PRTG SRVCS SPECLST   1654  F       120852
2  234567891  SONJU    LAURIE     SUPVY CONTR SPECLST  1102  F       010999
3  345678912  MANNING  CYNTHIA    SOC SCNTST           0101  F       081692
4  456789123  NAUERTZ  ELIZABETH  OFF AUTOMATION ASST  0326  F       031387'''
# Columns are separated by two or more spaces; a regex separator needs the python engine
df = pd.read_csv(io.StringIO(content), sep=r'\s{2,}', engine='python')
df['dob'] = df['dob'].apply('{:06}'.format)
now = pd.Timestamp('now')
df['dob'] = pd.to_datetime(df['dob'], format='%m%d%y') # 1
df['dob'] = df['dob'].where(df['dob'] < now, df['dob'] - np.timedelta64(100, 'Y')) # 2
df['age'] = (now - df['dob']).astype('<m8[Y]') # 3
print(df)
yields
ssno lname fname pos_title ser gender \
0 23456789 PLILEY JODY BUDG ANAL 560 F
1 987654321 NOEL HEATHER PRTG SRVCS SPECLST 1654 F
2 234567891 SONJU LAURIE SUPVY CONTR SPECLST 1102 F
3 345678912 MANNING CYNTHIA SOC SCNTST 101 F
4 456789123 NAUERTZ ELIZABETH OFF AUTOMATION ASST 326 F
dob age
0 1971-03-18 00:00:00 43
1 1952-12-08 18:00:00 61
2 1999-01-09 00:00:00 15
3 1992-08-16 00:00:00 22
4 1987-03-13 00:00:00 27
1. The values in the dob column are currently strings. First, convert them to Timestamps using pd.to_datetime.
2. The format '%m%d%y' converts the last two digits to years, but unfortunately assumes 52 means 2052. Since that's probably not Heather Noel's birth year, let's subtract 100 years from dob whenever the dob is greater than now. You may want to subtract a few years from now in the condition df['dob'] < now, since it may be slightly more likely to have a 101-year-old worker than a 1-year-old worker.
3. Subtract dob from now to obtain timedelta64[ns]. To convert that to years, use astype('<m8[Y]') or astype('timedelta64[Y]').
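A note for anyone running this on a newer pandas: the year-unit pieces above (np.timedelta64(100, 'Y') and astype('<m8[Y]')) may be rejected by recent releases. A NumPy year is also an average-length year (365.2425 days), which is why Heather Noel's dob shows 18:00:00 in the output. Below is a minimal sketch of the same three steps using calendar arithmetic instead. The helper name add_age, its parameters, the three-row sample frame and the fixed reference date are my own assumptions for illustration, not part of the original answer, and the dob values are assumed to be zero-padded strings already.

```python
import pandas as pd


def add_age(df, now, dob_col="dob"):
    """Return a copy of df with dob parsed and an integer 'age' column added.

    Hypothetical helper: assumes dob_col holds zero-padded MMDDYY strings.
    """
    out = df.copy()

    # 1: parse the MMDDYY strings into Timestamps
    dob = pd.to_datetime(out[dob_col], format="%m%d%y")

    # 2: a two-digit year that lands in the future is assumed to be
    #    exactly one century too late, so shift it back 100 calendar years
    dob = dob.where(dob < now, dob - pd.DateOffset(years=100))

    # 3: difference in calendar years, minus one if this year's
    #    birthday has not happened yet as of `now`
    had_birthday = (dob.dt.month < now.month) | (
        (dob.dt.month == now.month) & (dob.dt.day <= now.day)
    )
    out[dob_col] = dob
    out["age"] = now.year - dob.dt.year - (~had_birthday).astype(int)
    return out


df = pd.DataFrame(
    {
        "lname": ["PLILEY", "NOEL", "SONJU"],
        "dob": ["031871", "120852", "010999"],
    }
)
now = pd.Timestamp("2014-09-01")  # fixed reference date so the result is reproducible
result = add_age(df, now)
print(result)
```

With that reference date the ages come out as 43, 61 and 15, the same values as in the output shown earlier. For live data, pass pd.Timestamp('now') instead of the fixed date.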
I found an easier solution:
import pandas as pd
from datetime import datetime
from datetime import date
d = {'col0': [1, 2, 6],
'col1': [3, 8, 3],
'col2': ['17.02.1979', '11.11.1993', '01.08.1961']}
df = pd.DataFrame(data=d)
def calculate_age(born):
born = datetime.strptime(born, "%d.%m.%Y").date()
today = date.today()
return today.year - born.year - ((today.month, today.day) < (born.month, born.day))
df['age'] = df['col2'].apply(calculate_age)
print(df)
output:
col0 col1 col2 age
0 1 3 17.02.1979 39
1 2 8 11.11.1993 24
2 6 3 01.08.1961 57
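Note that the ages printed above depend on the day the script was run. If you want a reproducible result, one option is to pass the reference date in explicitly. This is only a sketch: the optional today parameter and the date 2018-09-01 are my own assumptions (the date was picked because it is consistent with the ages shown above), and only the date column is kept to keep it short.

```python
from datetime import date, datetime

import pandas as pd


def calculate_age(born, today=None):
    """Age in whole years for a 'DD.MM.YYYY' string.

    `today` is a hypothetical optional reference date; it defaults to the
    current date, which matches the behaviour of the function above.
    """
    born = datetime.strptime(born, "%d.%m.%Y").date()
    today = today or date.today()
    # Subtract one if this year's birthday has not been reached yet
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


df = pd.DataFrame({"col2": ["17.02.1979", "11.11.1993", "01.08.1961"]})
reference = date(2018, 9, 1)  # assumed reference date, for reproducibility only

# Extra keyword arguments to Series.apply are forwarded to the function
df["age"] = df["col2"].apply(calculate_age, today=reference)
print(df)
```

Calling df["col2"].apply(calculate_age) without the extra argument falls back to the current date.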