How do I turn this:
year artist genre genre_sales
1999 A Pop/Rock 10
1999 B Hip/Hop 15
1999 C Country 8
2000 A Pop/Rock 11
2000 B Hip/Hop 14
2000 D Jazz 1
2001 B Hip/Hop 18
2001 C Country 10
Into this:
year artist genre genre_sales
1999 A Pop/Rock Nan
1999 B Hip/Hop Nan
1999 C Country Nan
2000 A Pop/Rock 10
2000 B Hip/Hop 15
2000 D Jazz (1999 Jazz sales despite D not releasing in '99)
2001 B Hip/Hop 14
2001 C Country (2000 country values, not the 8 from '99)
I saw the groupby-shift question and cookbook page and assumed I could do something like:
df.groupby(['year','artist'])['genre_sales'].shift(1)
or
df.groupby(['year','genre','artist'])['genre_sales'].shift(1)
but I think those would only work if artists produced one album per year.
Current 'Genre Sales' is annual sales in the artist's genre for that year. I'm trying to get the PREVIOUS year's 'Genre Sales' in a row with each artist. Each genre will have multiple artists every year, and each artist could have zero or multiple entries in that genre in a given year.
Another way to do it is using the groupby-apply combination
create a function that is applied to each segment and adds a new column that is the target column shifted:
def shiftCol(grp, newCol, col):
grp[newCol] = grp[col].shift()
return grp
Then just call the function, specifying the name of the columns to shift
df.groupby(['artist']).apply(shiftCol, newCol = 'prev_genre_sales',col = 'genre_sales')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With