I have a directory of CSV data files and I load all of them in one line using pandas.read_csv() within a list comprehension statement.
import glob
import pandas as pd
file_list = glob.glob('../data/*.csv')
df_list = [pd.read_csv(f) for f in file_list]
df = pd.concat(df_list, ignore_index=True)
Now I want to print the file path each time a data file is loaded, but I cannot find a way to use multiple statements in a list comprehension. For example, something like [pd.read_csv(f); print(f) for f in file_list] causes a SyntaxError.
The closest I can get is to rely on print() returning None and put it in an if clause, which works like a pass after printing.
df_list = [pd.read_csv(f) for f in file_list if print(f) is None]
Is there a proper way of doing this? I like list comprehension for its conciseness, but it does not seem to allow multiple statements.
If you want a list comprehension (understandable given its modest speed advantage over a for loop that calls append), you could simplify your solution slightly, because None is falsy:
df_list = [pd.read_csv(f) for f in file_list if not print(f)]
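To see why the filter never drops a file, here is a minimal sketch. The file names are made up and str.upper stands in for pd.read_csv, so nothing is read from disk. The if clause is evaluated before the expression on the left, so each path is printed just before it is "loaded".

```python
file_list = ["a.csv", "b.csv"]  # hypothetical paths; nothing is read from disk

# print() always returns None, so `not print(f)` is always True:
# every element passes the filter, and printing happens as a side effect.
assert print("probe") is None

# str.upper is only a stand-in for pd.read_csv in this sketch.
loaded = [f.upper() for f in file_list if not print(f)]
```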
Alternatively make a function that does the work:
def read_and_print(f):
    print(f)
    return pd.read_csv(f)
df_list = [read_and_print(f) for f in file_list]
However, both approaches violate the command-query separation principle, which Python generally follows, because the same call has both a side effect and a return value of interest. Nonetheless, I think they are quite pragmatic, particularly if you want to print() now to see what is being loaded but plan to remove the print() calls later.
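If the printing is meant to be temporary, one possible way to keep it easy to remove is a small wrapper. This is only a sketch: announce and fake_read are hypothetical names, and fake_read stands in for pd.read_csv so the example runs without data files.

```python
import functools


def announce(loader):
    """Wrap `loader` so it prints its first argument before running (hypothetical helper)."""
    @functools.wraps(loader)
    def wrapper(path, *args, **kwargs):
        print(path)                            # side effect, kept in one place
        return loader(path, *args, **kwargs)   # delegate to the real loader
    return wrapper


def fake_read(path):
    """Stand-in for pd.read_csv; just returns the upper-cased path."""
    return path.upper()


noisy_read = announce(fake_read)
results = [noisy_read(p) for p in ["a.csv", "b.csv"]]
```

With pandas this would be announce(pd.read_csv), and dropping the wrapper later removes the printing without touching the comprehension.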
List comprehensions were not designed for that. They are meant only for populating a list by looping over some iterable, optionally keeping only the items that meet a condition. Python emphasises readability over minimising lines of code.
The proper way to do what you want is not to use a list comprehension at all, but a for loop:
df_list = []  # the list must exist before the loop appends to it
for f in file_list:
    print(f)
    df_list.append(pd.read_csv(f))
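If the loop feels too verbose at the call site, it can be tucked into a helper so the caller still writes one line. The sketch below assumes a hypothetical load_all function that takes the reader as a parameter. It is exercised on throwaway files, with Path.read_text standing in for pd.read_csv.

```python
import tempfile
from pathlib import Path


def load_all(paths, reader):
    """Print each path, then load it with `reader` (hypothetical helper)."""
    loaded = []
    for path in paths:
        print(path)                  # report which file is being loaded
        loaded.append(reader(path))  # load it with the supplied reader
    return loaded


# Demo on temporary files; Path.read_text is a stand-in for pd.read_csv.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.csv", "x,y\n1,2\n"), ("b.csv", "x,y\n3,4\n")]:
        Path(tmp, name).write_text(text)
    paths = sorted(Path(tmp).glob("*.csv"))
    contents = load_all(paths, Path.read_text)
```

With pandas, the equivalent call would be pd.concat(load_all(file_list, pd.read_csv), ignore_index=True).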