Function to split DF into multiple DFs and perform all operations on each one

I have a DF with about 20,000 rows in it. I have built a Python script to run lots of cleaning and mathematical operations on this data, including pivot tables.

I would like to split this DF into 3 separate DFs, then split each of those 3 into 6 more DFs based on column values, and run the operations on each of the 18 resulting DFs. In the end I would like to output 18 separate Excel files.

Note: I cannot split the data after running all operations on the original DF because I am creating pivot tables during the process.

I would really like to write a function to do the split in the beginning but do not know how to do this.

Things I have tried: Repeating operations for all DFs.

| total |  big  |  med  | small|   Type   |   Name   |
|:-----:|:-----:|:-----:|:----:|:--------:|:--------:| 
|   5   |   4   |   0   |   1  |   Pig    |   John   |
|   6   |   0   |   3   |   3  |  Horse   |   Mike   | 
|   5   |   2   |   3   |   0  |   Cow    |   Rick   |
|   5   |   2   |   3   |   0  |   Horse  |   Paul   |
|   5   |   2   |   3   |   0  |   Cow    |   Nick   |
|   5   |   2   |   3   |   0  |   Cow    |   Peter  |

So I would like to split by 'Type' and 'Name'. After splitting, I would like to run operations on all data frames; for the sake of example, let's say 'small' * 3. After running operations on all of these DFs, I would like to export them all. I would really like to avoid nested for loops because in reality there are about 100 lines of operations and I do not want everything indented.

3 different 'Types', 6 different 'Names'

FYI df combos = Pig/John, Pig/Mike, Pig/Rick, Horse/John....etc

EDIT:

def main():
    
    for idx, dg in df.groupby(['Type', 'Name']):
        dg = func_1()   # function that loads entire file as df
        dg = func_2(dg)
        dg = func_3(dg)
        dg = func_4(dg)
        df = fun_5(dg)

I'm having trouble making this work. Any thoughts?

big_soapy asked Dec 06 '25 08:12


1 Answer

The mantra of DataFrame.groupby is "split-apply-combine". In this case the last step is undesirable: you want something like "split-apply-export", so we can iterate over the groups manually.

# SPLIT
for idx, gp in df.groupby(['Type', 'Name']):
    # `idx` is a tuple holding one unique (Type, Name) combination, e.g. ('Pig', 'John')
    # `gp` is the subset of the DataFrame equivalent to:
    #     df[df['Type'].eq(idx[0]) & df['Name'].eq(idx[1])]

    # APPLY whatever complicated operation(s)
    # Work on a copy so the assignment below cannot trigger SettingWithCopyWarning
    gp = gp.copy()
    gp['small'] = gp['small']*3

    # EXPORT
    # Creates files 'Cow_Nick.csv', 'Cow_Peter.csv', 'Cow_Rick.csv', ...
    gp.to_csv(f"{'_'.join(idx)}.csv")

Output: 'Horse_Mike.csv'

'small' was multiplied by 3, original index still preserved.

,total,big,med,small,Type,Name
1,6,0,3,9,Horse,Mike
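
Since the question asks for a function rather than 100 indented lines inside the loop, the same pattern can be wrapped up as below. This is only a minimal sketch, assuming the real pipeline can be written as one function that takes a group and returns a DataFrame. The names `process` and `split_apply_export` and the temporary output directory are hypothetical, and the sample data is copied from the question's table.

```python
import tempfile
from pathlib import Path

import pandas as pd


def process(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the ~100 lines of cleaning and pivoting."""
    out = group.copy()  # copy so the original frame is left untouched
    out["small"] = out["small"] * 3
    return out


def split_apply_export(df: pd.DataFrame, out_dir: Path):
    """Run `process` on every (Type, Name) group and write one CSV per group."""
    written = []
    # groupby only yields combinations that actually occur in the data
    for (type_, name), group in df.groupby(["Type", "Name"]):
        path = out_dir / f"{type_}_{name}.csv"
        process(group).to_csv(path)
        written.append(path)
    return written


# Sample data copied from the table in the question.
df = pd.DataFrame({
    "total": [5, 6, 5, 5, 5, 5],
    "big": [4, 0, 2, 2, 2, 2],
    "med": [0, 3, 3, 3, 3, 3],
    "small": [1, 3, 0, 0, 0, 0],
    "Type": ["Pig", "Horse", "Cow", "Horse", "Cow", "Cow"],
    "Name": ["John", "Mike", "Rick", "Paul", "Nick", "Peter"],
})

out_dir = Path(tempfile.mkdtemp())  # assumed location; use any folder you like
files = split_apply_export(df, out_dir)
```

The loop body stays two lines long no matter how large `process` grows. To get Excel files instead, `to_csv` could be swapped for `to_excel`, provided an Excel writer engine such as openpyxl is installed.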

ALollz answered Dec 07 '25 20:12


