Pandas Melt function for time series data

I am trying to melt my pandas data frame, but I am not quite sure how to assign the variables properly. I looked through the other examples on Stack Overflow but can't find a variation matching this one. My data frame (df1) looks like this:

[IN]: df1
[OUT]:
             40025.0    21201.0       30061.0   46021.0
date                
2020-08-08  0.000861    0.001292    0.000287    0.001177
2020-08-09  0.001147    0.001290    0.000344    0.001204
2020-08-10  0.001431    0.001288    0.000401    0.001231

Each column is a different FIPS code, the values are the number of Covid cases per day (the data has already been processed for later clustering), and the index is a datetime index (one row per day). The data frame is 804 columns by 470 rows. I would like my data frame to look like this:

[image: desired output]

I know I can make this work if I leave "date" as a column (as opposed to the index) by doing this:

df1 = df1.melt(id_vars="date", var_name="FIPS", value_name="Covid_cases")

But if I do that, I get an error when trying to set the "date" column as the index. I need the index to be a datetime index because I am going to k-means cluster the time series data and then plot the time series clusters. Any input would be greatly appreciated! Thank you!

Rachel Cyr asked Sep 06 '25 03:09


2 Answers

If date is currently the index, you should be able to call reset_index() before melting and then set_index('date') afterwards:

df1 = (df1
    .reset_index()
    .melt(id_vars='date', var_name='FIPS', value_name='Covid_cases')
    .set_index('date')
)
               FIPS  Covid_cases
date                            
2020-08-08  40025.0     0.000861
2020-08-09  40025.0     0.001147
2020-08-10  40025.0     0.001431
2020-08-08  21201.0     0.001292
2020-08-09  21201.0     0.001290
2020-08-10  21201.0     0.001288
2020-08-08  30061.0     0.000287
2020-08-09  30061.0     0.000344
2020-08-10  30061.0     0.000401
2020-08-08  46021.0     0.001177
2020-08-09  46021.0     0.001204
2020-08-10  46021.0     0.001231
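
To try this without the full data set, here is a minimal sketch that rebuilds only the three sample rows shown in the question (the names long_df and alt are illustrative, not from the original post). It also shows a possible alternative, assuming pandas 1.1 or later: melt accepts ignore_index=False, which keeps the existing index so the reset_index()/set_index() round trip is not needed.

```python
import pandas as pd

# Sample data rebuilt from the three rows shown in the question.
dates = pd.to_datetime(["2020-08-08", "2020-08-09", "2020-08-10"])
df1 = pd.DataFrame(
    {
        40025.0: [0.000861, 0.001147, 0.001431],
        21201.0: [0.001292, 0.001290, 0.001288],
        30061.0: [0.000287, 0.000344, 0.000401],
        46021.0: [0.001177, 0.001204, 0.001231],
    },
    index=pd.DatetimeIndex(dates, name="date"),
)

# Approach from this answer: move the index to a column, melt, move it back.
long_df = (
    df1
    .reset_index()
    .melt(id_vars="date", var_name="FIPS", value_name="Covid_cases")
    .set_index("date")
)

# Alternative (assumes pandas >= 1.1): keep the index while melting.
alt = df1.melt(ignore_index=False, var_name="FIPS", value_name="Covid_cases")

print(type(long_df.index).__name__)  # DatetimeIndex
print(long_df.shape)                 # (12, 2)
```

Either way the result should still carry a DatetimeIndex, which is what the later clustering and plotting steps rely on.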
tdy answered Sep 07 '25 17:09


Alternatively, you can do this via stack():

df1 = (
    df1.stack()
    .reset_index()
    # 'level_1' is the default name given to the unnamed column level
    .rename(columns={'level_1': 'FIPS', 0: 'Covid_cases'})
    .set_index('date')
)
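
A variation on the same idea, as a sketch only: naming the column axis before stacking avoids relying on the auto-generated 'level_1' label. The two-column sample and the name stacked are illustrative, taken from the first columns shown in the question. Note that stack() orders rows date by date, whereas melt() orders them column by column, and depending on the pandas version stack() may drop missing values, so it is worth checking the result if the real data contains NaNs.

```python
import pandas as pd

# Two columns of the sample data from the question, with a named DatetimeIndex.
dates = pd.date_range("2020-08-08", periods=3, name="date")
df1 = pd.DataFrame(
    {
        40025.0: [0.000861, 0.001147, 0.001431],
        21201.0: [0.001292, 0.001290, 0.001288],
    },
    index=dates,
)

stacked = (
    df1.rename_axis(columns="FIPS")  # name the column level up front
    .stack()                         # Series indexed by (date, FIPS)
    .rename("Covid_cases")           # name the values
    .reset_index(level="FIPS")       # FIPS becomes a column, date stays the index
)

print(stacked)
```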

Nk03 answered Sep 07 '25 16:09