statsmodels library
library for estimating statistical models
importing statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf (specifying models from pandas df)
from statsmodels.tsa.ar_model import AutoReg
scikit-learn library
library for machine learning tools
importing scikit-learn
import sklearn
from sklearn.linear_model import LinearRegression
declaring data as time-series
df['x'] = pd.to_datetime(df['x'])
do this before setting the date as the index, bc once it is the index pandas no longer treats it as a column
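A minimal sketch of that ordering, assuming a hypothetical frame with a string 'Date' column and one value column (both names are illustrative, not from the course data):

```python
import pandas as pd

# Hypothetical data: dates stored as strings plus one value column.
df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "price": [100.0, 101.5, 99.8],
})

# Convert while 'Date' is still an ordinary column...
df["Date"] = pd.to_datetime(df["Date"])
# ...and only then move it into the index.
df = df.set_index("Date")

print(type(df.index).__name__)   # index class after conversion
print("Date" in df.columns)      # the column is gone once it is the index
```

After set_index, df['Date'] raises a KeyError, which is why the conversion has to come first (or be applied to df.index instead).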
mean reverting
deviations from the avg are expected to reverse, return to avg over time
what to do if a time series variable is not mean reverting
work w/ the variable's differences
2 mean reversion tests
ADF
KPSS
ADF test
not robust
suboptimal test
good in periods of calm markets
Ho: has a unit root, Ha: has no unit root
rejecting null provides evidence supporting stationarity (no unit root)
robust
produces reliable results even w/ outliers and violated assumptions
KPSS test
robust
optimal
Ho: trend stationary, Ha: has a unit root (not trend stationary)
generating variable differences
$\Delta G_T = G_T - G_{T-1}$
generating variable differences in python
df2 = df.set_index('Date').diff().reset_index()
1st row becomes NaN bc there is no prior value to difference against
df.dropna() for NaNs
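A small worked sketch of the differencing step, assuming a hypothetical frame with a 'Date' column and a single variable named 'GT' (values made up for illustration):

```python
import pandas as pd

# Hypothetical data: four daily observations of one variable.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "GT": [10.0, 12.0, 11.0, 15.0],
})

# First differences of every non-date column; row 0 has no predecessor, so it is NaN.
df2 = df.set_index("Date").diff().reset_index()

# dropna() returns a new frame, so the result has to be assigned back.
df2 = df2.dropna()

print(df2)
```

Note that dropna() does not modify the frame in place; without the assignment the NaN row stays in df2.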
descriptive stats in python
df.describe()
df.corr()
correlation
measure of linear dependence
most financial variables are non-linearly dependent
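To illustrate why "linear" matters here, a sketch with made-up data where y is fully determined by x, yet the correlation is essentially zero because the relationship is not linear:

```python
import numpy as np
import pandas as pd

# Illustrative data: y depends on x exactly, but through a square, not a line.
x = np.linspace(-1, 1, 201)
df = pd.DataFrame({"x": x, "y": x ** 2})

# Pearson correlation from the correlation matrix.
corr_xy = df.corr().loc["x", "y"]
print(corr_xy)
```

A near-zero df.corr() entry therefore rules out linear dependence only, not dependence in general.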
correlation threshold to run a regression
0.5
skew
measure of asymmetry around the mean
x = df['x'].skew() (n/a for time series)
kurtosis
measure of tail fatness
x = df['x'].kurt()
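A sketch comparing skew and kurtosis on simulated data (the column names and distributions are assumptions chosen for illustration). pandas reports excess kurtosis, so a normal sample comes out near 0 rather than 3:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated columns: one roughly normal, one with fat tails (Student t, 3 d.o.f.).
df = pd.DataFrame({
    "normal": rng.normal(size=5000),
    "fat_tails": rng.standard_t(df=3, size=5000),
})

skew = df.skew()   # asymmetry around the mean, per column
kurt = df.kurt()   # excess kurtosis, per column (normal is about 0)

print(skew)
print(kurt)
```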
testing for mean reversion steps
mean reversion test (KPSS then ADF)
transform variable if needed
check skew/kurtosis
jarque bera testing
import scipy.stats as stats
x = stats.jarque_bera(df['x'])
Ho: normally distributed
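A sketch of reading the Jarque-Bera result, using simulated samples (an assumption for illustration, not course data). A small p-value rejects Ho, i.e. is evidence against normality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated samples: one drawn from a normal, one from a strongly skewed distribution.
normal_sample = rng.normal(size=2000)
skewed_sample = rng.exponential(size=2000)

# jarque_bera returns the test statistic and the p-value.
jb_stat_normal, jb_p_normal = stats.jarque_bera(normal_sample)
jb_stat, jb_p = stats.jarque_bera(skewed_sample)

print("normal sample p-value:", jb_p_normal)
print("skewed sample p-value:", jb_p)
if jb_p <= 0.05:
    print("Reject Ho for the skewed sample: not normally distributed")
```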
running OLS in python
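import statsmodels.api as sm
x = df[['x1', 'x2']]
y = df['y']
x = sm.add_constant(x)  # adds the intercept column
model = sm.OLS(y, x).fit()  # dependent variable goes first, then the regressors
print(model.summary())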
unit root
a feature of non-stationary time series data
key library for ADF test
from statsmodels.tsa.stattools import adfuller
key library for KPSS test
from statsmodels.tsa.stattools import kpss
key conclusion of ADF and KPSS tests
reject Ho in ADF + fail to reject Ho in KPSS = likely stationary
fail to reject Ho in ADF + reject Ho in KPSS = likely not stationary
ADF test in python (all variable loop)
print('ADF Test Results')
for column in df.columns:
    print('\nColumn: ' + column)
    adftest = adfuller(df[column], autolag='AIC')
    print('ADF Stat: ' + str(adftest[0]))
    print('P-Value : ' + str(adftest[1]))
    if adftest[1] <= 0.05:
        print("Reject Ho, the series is stationary")
    else:
        print("Fail to reject Ho, the series isn't stationary")
KPSS test in python (all variable loop)
print('KPSS Test Results')
for column in df.columns:
    print('\nColumn: ' + column)
    kpsstest = kpss(df[column], regression='c', nlags='auto')
    print('KPSS Stat: ' + str(kpsstest[0]))
    print('P-Value : ' + str(kpsstest[1]))
    if kpsstest[1] <= 0.05:
        print("Reject Ho, the series isn't stationary")
    else:
        print("Fail to reject Ho, the series is stationary")