I want to replicate the rows of a dataframe in preparation for adding a column. The dataframe contains a year column, and I want to add a fixed column of months. The idea is to replicate the rows for each year exactly 12 times and then add a column with the fixed values 1-12. My code is the following:
all_years = dataframe["Year"].unique().tolist()
new_dataset = pd.DataFrame()
for idx, year in enumerate(all_years):
    rows_dataframe = pd.concat(
        [dataframe.where(dataframe["Year"] == year).dropna()] * 12,
        ignore_index=True)
    new_dataset = pd.concat([rows_dataframe, new_dataset], ignore_index=True)
The results are correct, but can I avoid the for loop here, and implement this in a more "pandas-ic" way?
EDIT: The expected result for one year (here 2012) is shown below. Note that the Months column is not added by my code; I included it here to show the final output.
+-------+--------+---------+
| Years | Months | SomeCol |
+-------+--------+---------+
| 2011 | 12 | val1 |
+-------+--------+---------+
| 2012 | 1 | val1 |
+-------+--------+---------+
| 2012 | 2 | val1 |
+-------+--------+---------+
| 2012 | 3 | val1 |
+-------+--------+---------+
| 2012 | 4 | val1 |
+-------+--------+---------+
| 2012 | 5 | ... |
+-------+--------+---------+
| 2012 | 6 | ... |
+-------+--------+---------+
| 2012 | 7 | val1 |
+-------+--------+---------+
| 2012 | 8 | val1 |
+-------+--------+---------+
| 2012 | 9 | val1 |
+-------+--------+---------+
| 2012 | 10 | |
+-------+--------+---------+
| 2012 | 11 | |
+-------+--------+---------+
| 2012 | 12 | |
+-------+--------+---------+
| 2013 | 1 | ... |
+-------+--------+---------+
Use a combination of pd.DataFrame.loc and pd.Index.repeat:
dataframe = dataframe.loc[dataframe.index.repeat(12)].reset_index(drop=True)
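To get the Months column from the question as well, one option is to tile the values 1-12 once per original row. The following is only a minimal sketch: the sample data and the names `expanded` and `n_rows` are made up for illustration, and it assumes the dataframe has a unique index (for example the default RangeIndex), since `loc` with duplicate labels would return extra rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question's code and table.
dataframe = pd.DataFrame({"Year": [2011, 2012, 2013],
                          "SomeCol": ["val1", "val2", "val3"]})

n_rows = len(dataframe)

# Repeat every row 12 times; the 12 copies of each row stay adjacent.
expanded = dataframe.loc[dataframe.index.repeat(12)].reset_index(drop=True)

# Cycle through 1..12 once for each original row.
expanded["Months"] = np.tile(np.arange(1, 13), n_rows)

print(expanded.head(13))
```

Because `Index.repeat` keeps the copies of each row together, tiling 1-12 lines the months up with the repeated rows. If a year spans several rows, each of those rows gets its own run of 1-12.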