I have a bit of code which currently looks like this:
if os.path.isfile('D:\\df_1'):
    df_1 = pd.read_pickle('D:\\df_1')
else:
    df_1 = pd.DataFrame(columns=['Date', 'Location', 'Product'])
if os.path.isfile('D:\\df_2'):
    df_2 = pd.read_pickle('D:\\df_2')
else:
    df_2 = pd.DataFrame(columns=['Date', 'Location', 'Product'])
[...]
if os.path.isfile('D:\\df_20'):
    df_20 = pd.read_pickle('D:\\df_20')
else:
    df_20 = pd.DataFrame(columns=['Date', 'Location', 'Product'])
Basically, I'm checking whether each DataFrame already exists: if it does, I load it; otherwise I create an empty one. I need this because the code then appends new data to each of the DataFrames. So I will have something like:
[retrieve new data and clean it]
df_1 = pd.concat([df_1, df_1_new_data])
I do this for all 20 dataframes I have (they contain different things, so I want to keep them separate), and then save them so I can load them again the next day and add new data to them:
df_1.to_pickle('D:\\df_1')
df_2.to_pickle('D:\\df_2')
[...]
df_20.to_pickle('D:\\df_20')
Now, it's already quite heavy with 20 dataframes, and I will probably need to add more! Is there a way to read the different dataframes and then write them to pickle in a for loop or something like that, so the many lines I have now are reduced to a simple two-line loop? Thank you!
DRY: you shouldn't write the same stuff many times (more than once, really). Use functions, loops, and other basic language tools.
def create_df(path):
    if os.path.isfile(path):
        df = pd.read_pickle(path)
    else:
        df = pd.DataFrame(columns=['Date', 'Location', 'Product'])
    return df
all_paths = (...)
# dict where the key is the path and the value is the dataframe
all_df = {p: create_df(p) for p in all_paths}
for p in all_paths:
    all_df[p].to_pickle(p)
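Putting it all together, the whole daily workflow (load or create, append new data, save) becomes one short loop over the dict. This is a minimal runnable sketch: the paths and the `new_data` dict are hypothetical placeholders, so substitute your own `'D:\\df_1'` ... `'D:\\df_20'` paths and your real cleaned data.

```python
import os
import pandas as pd

COLUMNS = ['Date', 'Location', 'Product']

def create_df(path):
    """Load the pickled DataFrame at `path` if it exists, else return an empty one."""
    if os.path.isfile(path):
        return pd.read_pickle(path)
    return pd.DataFrame(columns=COLUMNS)

# Hypothetical paths; in your case something like ['D:\\df_1', ..., 'D:\\df_20'].
all_paths = ['df_1.pkl', 'df_2.pkl']

# Load (or create) every dataframe in one comprehension.
all_df = {p: create_df(p) for p in all_paths}

# Hypothetical stand-in for your retrieved-and-cleaned new data, one frame per path.
new_data = {p: pd.DataFrame([['2024-01-01', 'NY', 'Widget']], columns=COLUMNS)
            for p in all_paths}

# Append the new rows and write everything back out.
for p in all_paths:
    all_df[p] = pd.concat([all_df[p], new_data[p]], ignore_index=True)
    all_df[p].to_pickle(p)
```

Adding a 21st dataframe is now just one more entry in `all_paths` rather than another copy-pasted if/else block.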