
How do I convert a pandas dataframe column of unique rows into separate column headings, count, and sum the adjacent row values?

I have dynamic item names, so I'd like the code to:

  • Determine all of the item names found per user/date
  • Create column headings of the unique items found
  • Count how many times each item occurs per user/date
  • Sum the Minutes across all items per user/date

This is the approach I'm currently taking, but is there a more streamlined way to do it than creating an empty dataframe and trying to populate it? Any suggestions are welcome, thank you!

Example df:

Name    Date        Item    Minutes
Dave    10-02-2017  item1   3
Dave    10-02-2017  item2   5
Joe     10-02-2017  item3   2
Dave    10-02-2017  item2   1
Dave    10-02-2017  item2   2
Marcia  10-02-2017  item1   5
Amy     10-02-2017  item2   3
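
For anyone who wants to run the snippets on this page, the example table can be rebuilt roughly as follows. This is only a sketch: it assumes Date is kept as a plain string and Minutes as integers, which the question does not state.

```python
import pandas as pd

# Rebuild the example data from the question.
# Assumption: Date stays a string, Minutes is an integer column.
df = pd.DataFrame({
    "Name": ["Dave", "Dave", "Joe", "Dave", "Dave", "Marcia", "Amy"],
    "Date": ["10-02-2017"] * 7,
    "Item": ["item1", "item2", "item3", "item2", "item2", "item1", "item2"],
    "Minutes": [3, 5, 2, 1, 2, 5, 3],
})

print(df)
```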

Code:

#find unique values in df column
unique_df = pd.DataFrame(df['Item'].unique())
#number length of unique rows
unique_df_len = len(unique_df)
#create empty dataframe using unique number of items discovered
new_df = pd.DataFrame([(0,)*unique_df_len])
#replace columns headings with unique row value names
new_df.columns = unique_df.iloc[:,0]
#loop through empty dataframe column headings
for column_name in list(new_df):
    #loop through df looking for each item name
    for index, row in df.iterrows():
        df['Item'] = df.lookup(df.index,df[column_name])

This is where I'm stuck. The second loop above doesn't work.

Desired Output:

Name    Date        item1   item2   item3   total minutes
Dave    10-02-2017  1       3       0       11
Joe     10-02-2017  0       0       1       2
Marcia  10-02-2017  1       0       0       5
Amy     10-02-2017  0       1       0       3
Mike asked Sep 12 '25 22:09



1 Answer

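A simple pivot_table does this: count the rows per item with aggfunc=len, and compute the total Minutes per Name/Date separately with groupby.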

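import pandas as pd

# total Minutes per Name/Date, computed before df is overwritten below
total = df.groupby(['Name', 'Date']).Minutes.sum()

# count rows per item; Name/Date pairs without an item get 0
df = pd.pivot_table(df, index=['Name', 'Date'], columns='Item', values='Minutes', aggfunc=len, fill_value=0)
df
Out[1070]: 
Item               item1  item2  item3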
Name   Date                           
Amy    10-02-2017      0      1      0
Dave   10-02-2017      1      3      0
Joe    10-02-2017      0      0      1
Marcia 10-02-2017      1      0      0

df['total minutes']=total

df.reset_index()
Out[1111]: 
Item    Name        Date  item1  item2  item3  total minutes
0        Amy  10-02-2017      0      1      0              3
1       Dave  10-02-2017      1      3      0             11
2        Joe  10-02-2017      0      0      1              2
3     Marcia  10-02-2017      1      0      0              5
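
The steps above can also be wrapped into one function so the original df is not overwritten. This is a sketch only: the name summarize_items is hypothetical, and clearing columns.name is an optional extra to drop the leftover "Item" label seen in the output above.

```python
import pandas as pd

def summarize_items(df):
    """Hypothetical helper: item counts per (Name, Date) plus total Minutes."""
    # Count rows per item; missing Name/Date/Item combinations become 0.
    counts = pd.pivot_table(df, index=["Name", "Date"], columns="Item",
                            values="Minutes", aggfunc=len, fill_value=0)
    # The groupby result shares the (Name, Date) index, so it aligns on assignment.
    counts["total minutes"] = df.groupby(["Name", "Date"])["Minutes"].sum()
    counts.columns.name = None  # remove the "Item" label from the column axis
    return counts.reset_index()

# Example data from the question (Date kept as a string by assumption).
example = pd.DataFrame({
    "Name": ["Dave", "Dave", "Joe", "Dave", "Dave", "Marcia", "Amy"],
    "Date": ["10-02-2017"] * 7,
    "Item": ["item1", "item2", "item3", "item2", "item2", "item1", "item2"],
    "Minutes": [3, 5, 2, 1, 2, 5, 3],
})

result = summarize_items(example)
print(result)
```

Rows come back sorted by Name and Date rather than in order of first appearance, as in the output above.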

Or, starting from the original df, you can use crosstab to get the counts:

df=pd.crosstab(index=[df['Name'],df['Date']],columns=df['Item'])
df
Out[1093]: 
Item               item1  item2  item3
Name   Date                           
Amy    10-02-2017      0      1      0
Dave   10-02-2017      1      3      0
Joe    10-02-2017      0      0      1
Marcia 10-02-2017      1      0      0
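
The crosstab output has the counts but not the total minutes column from the desired output. One possible way to add it, sketched here with illustrative variable names (counts, total, out), is to join a groupby sum on the shared Name/Date index:

```python
import pandas as pd

# Example data from the question (Date kept as a string by assumption).
df = pd.DataFrame({
    "Name": ["Dave", "Dave", "Joe", "Dave", "Dave", "Marcia", "Amy"],
    "Date": ["10-02-2017"] * 7,
    "Item": ["item1", "item2", "item3", "item2", "item2", "item1", "item2"],
    "Minutes": [3, 5, 2, 1, 2, 5, 3],
})

# Occurrences of each item per (Name, Date).
counts = pd.crosstab(index=[df["Name"], df["Date"]], columns=df["Item"])

# Total Minutes per (Name, Date), named so it becomes a column on join.
total = df.groupby(["Name", "Date"])["Minutes"].sum().rename("total minutes")

# Both objects are indexed by (Name, Date), so join aligns them row by row.
out = counts.join(total).reset_index()
print(out)
```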
BENY answered Sep 14 '25 17:09
