I am using arma_order_select_ic from the statsmodels library to calculate the (p,q) order for an ARMA model. I use a for loop to iterate over the different companies, each of which is one column of the data frame. The code is as follows:
import pandas as pd
from statsmodels.tsa.stattools import arma_order_select_ic
df = pd.read_csv("Adjusted_Log_Returns.csv", index_col = 'Date').dropna()
main_df = pd.DataFrame()
for i in range(146):
    order_selection = arma_order_select_ic(df.iloc[i].values, max_ar = 4,
                                           max_ma = 2, ic = "aic")
    ticker = [df.columns[i]]
    df_aic_min = pd.DataFrame([order_selection["aic_min_order"]], index =
                              ticker)
    main_df = main_df.append(df_aic_min)
main_df.to_csv("aic_min_orders.csv")
The code runs fine and I get all the results in the CSV file at the end. What confuses me is that when I compute (p,q) outside the for loop for a single company, I get a different result:
order_selection = arma_order_select_ic(df["ABL"].values, max_ar = 4,
                                       max_ma = 2, ic = "aic")
The order for the company ABL is (1,1) when computed in the for loop, while it is (4,1) when computed outside of it.
So my question is: what am I doing wrong, or why does this happen? Any help would be appreciated.
Thanks in advance.
Your code is clearly meant to find the ARMA parameters for each column (one company per column), but that is not what it does: df.iloc[i] selects row i, so the loop fits the model to the rows and then labels each result with the name of column i.
Consider this:
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [3, 4]})
>>> df.iloc[0]
a    3
Name: 0, dtype: int64
>>> df['a']
0    3
1    4
Name: a, dtype: int64
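The same distinction can be checked on a non-square frame, where a row and a column differ in length. This is only a small sketch: the tickers AAA and BBB and the values are made up for illustration. It also shows that df.iloc[:, i] is the positional way to select a column, if you prefer to keep an integer loop.

```python
import pandas as pd

# Toy frame: 3 dates (rows) x 2 hypothetical tickers (columns).
toy = pd.DataFrame(
    {"AAA": [0.01, -0.02, 0.03], "BBB": [0.00, 0.01, -0.01]},
    index=["2020-01-01", "2020-01-02", "2020-01-03"],
)

row0 = toy.iloc[0]     # first row: one value per ticker
col0 = toy.iloc[:, 0]  # first column: one value per date

print(len(row0), len(col0))     # 2 3
print(col0.equals(toy["AAA"]))  # True
```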
You should probably change your code to
for c in df.columns:
    order_selection = arma_order_select_ic(df[c].values, max_ar = 4,
                                           max_ma = 2, ic = "aic")
    ticker = [c]
    # ... the rest of the loop body stays as it was
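For completeness, here is one way the whole column loop could be written. Treat it as a sketch rather than the definitive version: select_orders and aic_order are hypothetical helper names, not part of pandas or statsmodels. The results are collected in a dict and converted to a DataFrame once, since DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd


def select_orders(df, selector):
    """Apply `selector` to each column of `df` and tabulate the results.

    `selector` is any callable that maps a 1-D array to a (p, q) tuple.
    (Hypothetical helper, introduced here for illustration only.)
    """
    orders = {ticker: selector(df[ticker].to_numpy()) for ticker in df.columns}
    return pd.DataFrame.from_dict(orders, orient="index", columns=["p", "q"])


def aic_order(series):
    """Return the AIC-minimising (p, q), with the question's max_ar and max_ma."""
    # Imported here so select_orders can be tried without statsmodels installed.
    from statsmodels.tsa.stattools import arma_order_select_ic

    result = arma_order_select_ic(series, max_ar=4, max_ma=2, ic="aic")
    return result["aic_min_order"]
```

With the question's data this would be used as main_df = select_orders(df, aic_order), followed by main_df.to_csv("aic_min_orders.csv"). The row for ABL should then match the result of calling arma_order_select_ic on df["ABL"].values directly.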