
Pandas get the age from a date (example: date of birth)

Tags: python, pandas

How can I calculate the age of a person (based off the dob column) and add a column to the dataframe with the new value?

The dataframe looks like the following:

    lname      fname     dob
0    DOE       LAURIE    03011979
1    BOURNE    JASON     06111978
2    GRINCH    XMAS      12131988
3    DOE       JOHN      11121986

I tried doing the following:

now = datetime.now()
df1['age'] = now - df1['dob']

But, received the following error:

TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'str'

Dave asked Sep 06 '25 03:09


2 Answers

import datetime as DT
import io
import numpy as np
import pandas as pd

pd.options.mode.chained_assignment = 'warn'

content = '''     ssno        lname         fname    pos_title             ser  gender  dob
0    23456789    PLILEY     JODY        BUDG ANAL             0560  F      031871
1    987654321   NOEL       HEATHER     PRTG SRVCS SPECLST    1654  F      120852
2    234567891   SONJU      LAURIE      SUPVY CONTR SPECLST   1102  F      010999
3    345678912   MANNING    CYNTHIA     SOC SCNTST            0101  F      081692
4    456789123   NAUERTZ    ELIZABETH   OFF AUTOMATION ASST   0326  F      031387'''

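# Raw string avoids an invalid-escape warning; a regex separator needs the python engine.
df = pd.read_csv(io.StringIO(content), sep=r'\s{2,}', engine='python')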
df['dob'] = df['dob'].apply('{:06}'.format)

now = pd.Timestamp('now')
df['dob'] = pd.to_datetime(df['dob'], format='%m%d%y')    # 1
df['dob'] = df['dob'].where(df['dob'] < now, df['dob'] -  np.timedelta64(100, 'Y'))   # 2
df['age'] = (now - df['dob']).astype('<m8[Y]')    # 3
print(df)

yields

        ssno    lname      fname            pos_title   ser gender  \
0   23456789   PLILEY       JODY            BUDG ANAL   560      F   
1  987654321     NOEL    HEATHER   PRTG SRVCS SPECLST  1654      F   
2  234567891    SONJU     LAURIE  SUPVY CONTR SPECLST  1102      F   
3  345678912  MANNING    CYNTHIA           SOC SCNTST   101      F   
4  456789123  NAUERTZ  ELIZABETH  OFF AUTOMATION ASST   326      F   

                  dob  age  
0 1971-03-18 00:00:00   43  
1 1952-12-08 18:00:00   61  
2 1999-01-09 00:00:00   15  
3 1992-08-16 00:00:00   22  
4 1987-03-13 00:00:00   27  

  1. It looks like the values in your dob column are currently strings. First, convert them to Timestamps using pd.to_datetime.
  2. The format '%m%d%y' converts the last two digits to years, but unfortunately assumes 52 means 2052. Since that's probably not Heather Noel's birth year, let's subtract 100 years from dob whenever the dob is greater than now. You may want to subtract a few years from now in the condition df['dob'] < now, since it may be slightly more likely to have a 101-year-old worker than a 1-year-old worker...
  3. You can subtract dob from now to obtain timedelta64[ns]. To convert that to years, use astype('<m8[Y]') or astype('timedelta64[Y]').
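The three steps above can also be sketched without the 'Y' timedelta unit, which recent pandas releases may reject. This is only a sketch under assumptions, not the code from the answer: the helper name add_age, the fixed reference date 2014-09-07 (picked so the ages agree with the output shown above), and the use of pd.DateOffset(years=100) for the century correction are all introduced here for illustration.

```python
import pandas as pd


def add_age(df, now):
    """Return a copy of df with 'dob' parsed and an integer 'age' column.

    add_age is a hypothetical helper; `now` is a pd.Timestamp reference date.
    """
    out = df.copy()
    # Step 1: parse MMDDYY strings. %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999.
    dob = pd.to_datetime(out['dob'], format='%m%d%y')
    # Step 2: a date of birth after `now` is assumed to belong to the previous century.
    dob = dob.where(dob <= now, dob - pd.DateOffset(years=100))
    # Step 3: calendar-year difference, minus one if this year's birthday is still ahead.
    before_birthday = (dob.dt.month > now.month) | (
        (dob.dt.month == now.month) & (dob.dt.day > now.day)
    )
    out['dob'] = dob
    out['age'] = now.year - dob.dt.year - before_birthday.astype(int)
    return out


df = pd.DataFrame({'lname': ['PLILEY', 'NOEL', 'SONJU'],
                   'dob': ['031871', '120852', '010999']})
# Assumed reference date; use pd.Timestamp('now') for the current age.
result = add_age(df, now=pd.Timestamp('2014-09-07'))
print(result)
```

Because the century shift is done in calendar years, the corrected dob stays at midnight (1952-12-08 rather than 1952-12-08 18:00:00), and the age is counted from birthdays instead of an average year length.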
unutbu answered Sep 07 '25 15:09


I found an easier solution:

import pandas as pd
from datetime import datetime
from datetime import date

d = {'col0': [1, 2, 6], 
     'col1': [3, 8, 3], 
     'col2': ['17.02.1979', '11.11.1993', '01.08.1961']}

df = pd.DataFrame(data=d)

def calculate_age(born):
    born = datetime.strptime(born, "%d.%m.%Y").date()
    today = date.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

# The date strings live in 'col2'.
df['age'] = df['col2'].apply(calculate_age)
print(df)

output:

   col0  col1        col2  age
0     1     3  17.02.1979   39
1     2     8  11.11.1993   24
2     6     3  01.08.1961   57
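The ages printed above depend on the day the script is run, since calculate_age calls date.today(). A small variation, sketched here with an optional today parameter that is not part of the original answer, makes the result reproducible. The reference date 2018-09-07 is an assumption, chosen only because it is consistent with the ages shown.

```python
from datetime import date, datetime

import pandas as pd


def calculate_age(born, today=None):
    """Age in whole years for a 'dd.mm.YYYY' string, as of `today`."""
    born = datetime.strptime(born, "%d.%m.%Y").date()
    if today is None:
        today = date.today()
    # The tuple comparison is True (counts as 1) when the birthday has not yet occurred this year.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


df = pd.DataFrame({'col2': ['17.02.1979', '11.11.1993', '01.08.1961']})
reference = date(2018, 9, 7)  # assumed reference date, for illustration only
# Extra keyword arguments given to Series.apply are forwarded to the function.
df['age'] = df['col2'].apply(calculate_age, today=reference)
print(df['age'].tolist())
```

Leaving today unset falls back to the current date, so the function behaves like the one above when called with a single argument.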
nnaqa answered Sep 07 '25 17:09