Related to: why should I make a copy of a data frame in pandas
I noticed the following in the popular backtesting library:
def __init__(self, data: pd.DataFrame):
    data = data.copy(False)
This is on line 631. What's the purpose of such a copy?
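A shallow copy allows you to change the structure of the DataFrame (its index and columns) without affecting the original, and without copying the underlying data.
In backtesting, the developer converts the index to datetime format (line 640) and adds a new 'Volume' column filled with np.nan values if the DataFrame doesn't already have one. Because these changes are made on the shallow copy, they are not reflected in the original DataFrame.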
Example
>>> a = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['i', 's'])
>>> b = a.copy(False)
>>> a
    i  s
 0  1  a
 1  2  b
>>> b
    i  s
 0  1  a
 1  2  b
>>> b.index = pd.to_datetime(b.index)
>>> b['volume'] = 0
>>> b
                               i  s  volume
1970-01-01 00:00:00.000000000  1  a       0
1970-01-01 00:00:00.000000001  2  b       0
>>> a
    i  s
 0  1  a
 1  2  b
Of course, if you don't create a shallow copy, those changes to the DataFrame's structure will be reflected in the original.
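To see the contrast side by side, here is a minimal sketch. The `prepare` function and its `copy` flag are hypothetical, written only to mimic the index conversion and 'Volume' column described above; this is not the library's actual code.

```python
import numpy as np
import pandas as pd


def prepare(data: pd.DataFrame, copy: bool) -> pd.DataFrame:
    """Hypothetical helper: apply the structural changes described above."""
    if copy:
        # New DataFrame object; the underlying data is not duplicated.
        data = data.copy(deep=False)
    # Structural change 1: replace the index with a datetime index.
    data.index = pd.to_datetime(data.index)
    # Structural change 2: add a 'Volume' column if it is missing.
    if 'Volume' not in data:
        data['Volume'] = np.nan
    return data


original = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['i', 's'])

# With a shallow copy, the caller's DataFrame keeps its structure.
prepare(original, copy=True)
cols_after_shallow_copy = list(original.columns)
index_type_after_shallow_copy = type(original.index).__name__

# Without a copy, the same changes land on the caller's DataFrame.
prepare(original, copy=False)
cols_without_copy = list(original.columns)
index_type_without_copy = type(original.index).__name__

print(cols_after_shallow_copy, index_type_after_shallow_copy)
print(cols_without_copy, index_type_without_copy)
```

After the first call the original still has only its 'i' and 's' columns and its default index. After the second call it has gained the 'Volume' column and a datetime index, which is the side effect the shallow copy is there to prevent.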