I want to take the log of each cell in a very sparse pandas DataFrame and must avoid the 0s. At first I was checking for 0s with a lambda function, then I thought it might be faster to replace the many 0s with NaNs. I got some inspiration from this closely related question, and tried using a "mask." Is there a better way?
import math
import numpy as np

# first approach
# 7.61 s ± 1.46 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
def get_log_1(df):
    return df.applymap(
        lambda x: math.log(x) if x != 0 else 0)

# second approach (faster!)
# 5.36 s ± 968 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
def get_log_2(df):
    return (df
            .replace(0, np.nan)
            .applymap(math.log)
            .replace(np.nan, 0))

# third approach (even faster!!)
# 4.76 s ± 941 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
def get_log_3(df):
    return (df
            .mask(df <= 0)
            .applymap(math.log)
            .fillna(0))
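My actual data isn't shown here, but a sparse test frame along these lines (the shape and fill rate are arbitrary, just for trying the functions) gives the same general picture:

import numpy as np
import pandas as pd

# hypothetical test frame: roughly 95% of the cells are zero
rng = np.random.default_rng(0)
values = rng.random((5000, 1000)) * (rng.random((5000, 1000)) < 0.05)
df = pd.DataFrame(values)

# in IPython/Jupyter:
# %timeit get_log_1(df)
# %timeit get_log_2(df)
# %timeit get_log_3(df)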
One possible solution is to use numpy.log:

print(np.log(df.mask(df <= 0)).fillna(0))
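The gain comes from np.log operating on the whole frame in one vectorized call instead of invoking math.log per cell through applymap. A minimal illustration with made-up values:

import numpy as np
import pandas as pd

# toy frame with a few zeros
df = pd.DataFrame({"a": [0.0, 1.0, 10.0], "b": [100.0, 0.0, 1.0]})

# positive cells get their natural log, zero cells come back as 0.0
print(np.log(df.mask(df <= 0)).fillna(0))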
Or pure numpy, using a masked array via np.ma.log:

df1 = pd.DataFrame(np.ma.log(df.values).filled(0), index=df.index, columns=df.columns)
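Here np.ma.log masks the cells where the log is undefined (values <= 0) instead of producing -inf, and .filled(0) writes 0 into those masked positions. A quick sanity check on made-up data that the two variants agree:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 3, size=(4, 3)).astype(float))

a = np.log(df.mask(df <= 0)).fillna(0)
b = pd.DataFrame(np.ma.log(df.values).filled(0), index=df.index, columns=df.columns)
print(a.equals(b))  # True: both put 0.0 where df was 0 and log(x) elsewhere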