I have a DataFrame (DF) with about 20,000 rows. I have built a Python script that runs many cleaning and mathematical operations on this data, including pivot tables.
I would like to split this DF into 3 separate DFs, then split each of those 3 into 6 more DFs based on column values, and run the operations on each of the 18 resulting DFs. In the end I would like to output 18 separate Excel files.
Note: I cannot split the data after running all operations on the original DF because I am creating pivot tables during the process.
I would really like to write a function to do the split in the beginning but do not know how to do this.
Things I have tried: Repeating operations for all DFs.
| total | big | med | small| Type | Name |
|:-----:|:-----:|:-----:|:----:|:--------:|:--------:|
| 5 | 4 | 0 | 1 | Pig | John |
| 6 | 0 | 3 | 3 | Horse | Mike |
| 5 | 2 | 3 | 0 | Cow | Rick |
| 5 | 2 | 3 | 0 | Horse | Paul |
| 5 | 2 | 3 | 0 | Cow | Nick |
| 5 | 2 | 3 | 0 | Cow | Peter |
So I would like to split by 'Type' and 'Name'. After splitting, I would like to run operations on all of the data frames; for the sake of example, let's say 'small' * 3. After running the operations on all of these DFs, I would like to export them all. I would really like to avoid nested for loops, because in reality there are about 100 lines of operations and I do not want everything indented.
3 different 'Types', 6 different 'Names'
FYI df combos = Pig/John, Pig/Mike, Pig/Rick, Horse/John....etc
EDIT:
    def main():
        for idx, dg in df.groupby(['Type', 'Name']):
            dg = func_1()  # function that loads entire file as df
            dg = func_2(dg)
            dg = func_3(dg)
            dg = func_4(dg)
            df = fun_5(dg)
I'm having trouble making this work. Any thoughts?
The mantra of DataFrame.groupby is "split-apply-combine". In this case, the last part is undesirable and you want something like "split-apply-export" so we can manually iterate over the groups.
    # SPLIT
    for idx, gp in df.groupby(['Type', 'Name']):
        # `idx` is a tuple holding one unique (Type, Name) combination, e.g. ('Pig', 'John')
        # `gp` is the subset of the DataFrame equivalent to:
        #   df[df['Type'].eq(idx[0]) & df['Name'].eq(idx[1])]

        # APPLY whatever complicated operation(s)
        gp = gp.copy()  # work on a copy so the assignment cannot trigger SettingWithCopyWarning
        gp['small'] = gp['small'] * 3

        # EXPORT
        # Creates files 'Cow_Nick.csv', 'Cow_Peter.csv', 'Cow_Rick.csv', ...
        gp.to_csv(f"{'_'.join(idx)}.csv")
Output: 'Horse_Mike.csv'
'small' was multiplied by 3, and the original index is still preserved.

    ,total,big,med,small,Type,Name
    1,6,0,3,9,Horse,Mike
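Since the question mentions roughly 100 lines of operations and a wish to avoid deep indentation, one option is to move the "apply" step into its own function so the loop body stays a single call. The following is only a sketch under assumptions: `process`, `split_apply`, and `export_all` are hypothetical names, the body of `process` stands in for the real cleaning and pivoting steps, and the sample frame is rebuilt from the table in the question.

```python
import pandas as pd


def process(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the real cleaning/pivot operations."""
    out = group.copy()               # leave the original rows untouched
    out["small"] = out["small"] * 3  # example operation from the question
    return out


def split_apply(df: pd.DataFrame, keys=("Type", "Name")) -> dict:
    """Map each (Type, Name) combination present in df to its processed sub-frame."""
    return {idx: process(gp) for idx, gp in df.groupby(list(keys))}


def export_all(results: dict, out_dir: str = ".") -> None:
    """Write one CSV per group, named like 'Horse_Mike.csv'."""
    for idx, frame in results.items():
        frame.to_csv(f"{out_dir}/{'_'.join(idx)}.csv")


# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "total": [5, 6, 5, 5, 5, 5],
        "big": [4, 0, 2, 2, 2, 2],
        "med": [0, 3, 3, 3, 3, 3],
        "small": [1, 3, 0, 0, 0, 0],
        "Type": ["Pig", "Horse", "Cow", "Horse", "Cow", "Cow"],
        "Name": ["John", "Mike", "Rick", "Paul", "Nick", "Peter"],
    }
)

results = split_apply(df)
```

Calling `export_all(results)` would then write the files. Note that `groupby` only yields combinations that actually occur in the data, so the sample above produces 6 groups rather than 18. For Excel output, `to_csv` could be swapped for `DataFrame.to_excel` with an `.xlsx` file name, which needs an Excel writer engine such as openpyxl installed.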