
Creating new row for each year in a date range in Python?

I have a dataframe where each row has a range of years. This is the code to build it.

import pandas as pd

original = pd.DataFrame({'City': ['Paris','Rome','New York', 'Tokyo'], 'Color': ['red', 'orange', 'blue', 'purple'], 'Years': ['2010-2012', '2019-2020', '2015-2018', '2002-2003']})

The table looks something like this.

City      Color     Years
Paris     red       2010-2012
Rome      orange    2019-2020
New York  blue      2015-2018
Tokyo     purple    2002-2003

I want to create a new row for each year in the range of 'Years'. The dataframe should look like this.

City      Color     Years
Paris     red       2010
Paris     red       2011
...
New York  blue      2018
Tokyo     purple    2002
Tokyo     purple    2003

This is the code I'm using right now. I'm trying to add a new row for each year, but it only returns an empty dataframe, and I'm not sure why.

df_empty = pd.DataFrame({'City': [], 'Color': [], 'Years': []})

for index, row in original.iterrows():
    dates = [int(s) for s in row['Years'].split("-") if s.isdigit()]
    for i in range(dates[0],dates[1] + 1):
        newrow = row
        newrow.append(pd.Series([str(i)]))
        df_empty.add(newrow)
Aubrey asked Aug 31 '25 01:08



1 Answer

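The idea is to split the 'Years' column with Series.str.split into a new DataFrame holding the start and end years, then repeat each index value once per year in its range (end - start + 1). GroupBy.cumcount then numbers the repeated rows within each original index value, and adding that counter to the start year gives every year in the range: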

df = original['Years'].str.split('-', expand=True).astype(int)
original['Years'] = df[0]
df = original.loc[original.index.repeat(df[1] - df[0] + 1)]
df['Years'] += df.groupby(level=0).cumcount()
df = df.reset_index(drop=True)
print(df)
        City   Color  Years
0      Paris     red   2010
1      Paris     red   2011
2      Paris     red   2012
3       Rome  orange   2019
4       Rome  orange   2020
5   New York    blue   2015
6   New York    blue   2016
7   New York    blue   2017
8   New York    blue   2018
9      Tokyo  purple   2002
10     Tokyo  purple   2003
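
To make the steps above easier to follow, here is a small sketch that prints the intermediate values. The variable names (bounds, counts, repeated_index, offsets) are only illustrative and do not appear in the code above.

```python
import pandas as pd

original = pd.DataFrame({
    'City': ['Paris', 'Rome', 'New York', 'Tokyo'],
    'Color': ['red', 'orange', 'blue', 'purple'],
    'Years': ['2010-2012', '2019-2020', '2015-2018', '2002-2003'],
})

# Step 1: split 'start-end' into two integer columns (0 = start, 1 = end).
bounds = original['Years'].str.split('-', expand=True).astype(int)

# Step 2: how many rows each original row expands to (inclusive range).
counts = bounds[1] - bounds[0] + 1

# Step 3: repeat each index label that many times.
repeated_index = original.index.repeat(counts)

# Step 4: running position of each repeated row within its index label.
expanded = original.loc[repeated_index]
offsets = expanded.groupby(level=0).cumcount()

print(counts.tolist())          # [3, 2, 4, 2]
print(repeated_index.tolist())  # [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
print(offsets.tolist())         # [0, 1, 2, 0, 1, 0, 1, 2, 3, 0, 1]
```

Adding offsets to the start year of each repeated row is what produces 2010, 2011, 2012 for Paris, and so on.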

Another solution uses DataFrame.explode together with a list comprehension that builds each range from the first 4 and last 4 characters of the 'Years' string:

original['Years'] = [list(range(int(x[:4]), int(x[-4:]) + 1))
                     for x in original['Years']]

original = original.explode('Years').reset_index(drop=True)
print(original)
        City   Color Years
0      Paris     red  2010
1      Paris     red  2011
2      Paris     red  2012
3       Rome  orange  2019
4       Rome  orange  2020
5   New York    blue  2015
6   New York    blue  2016
7   New York    blue  2017
8   New York    blue  2018
9      Tokyo  purple  2002
10     Tokyo  purple  2003
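
As for why the loop in the question leaves df_empty empty: DataFrame.add performs element-wise addition and returns a new object instead of appending a row in place, and its result is never assigned. Series.append likewise returned a new Series and was removed in pandas 2.0. If an explicit loop is still preferred, one possible sketch is to collect plain dicts and build the DataFrame once at the end. The names records and result are only illustrative.

```python
import pandas as pd

original = pd.DataFrame({
    'City': ['Paris', 'Rome', 'New York', 'Tokyo'],
    'Color': ['red', 'orange', 'blue', 'purple'],
    'Years': ['2010-2012', '2019-2020', '2015-2018', '2002-2003'],
})

records = []
for row in original.itertuples(index=False):
    # 'Years' is assumed to always have the form 'start-end'.
    start, end = (int(part) for part in row.Years.split('-'))
    for year in range(start, end + 1):
        # One output row per year, copying the other columns unchanged.
        records.append({'City': row.City, 'Color': row.Color, 'Years': year})

# Build the frame once; growing a DataFrame row by row is slow.
result = pd.DataFrame(records)
print(result)
```

This should be noticeably slower than the vectorized versions on large frames, but it keeps the structure of the original attempt.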
jezrael answered Sep 02 '25 13:09
