OLS Regression Model




This article covers the ordinary least squares (OLS) model in Python. We will only go over how to fit an OLS model in Python, without explaining how to interpret the results.

Here, we will use the sklearn and statsmodels packages to fit an OLS model and compare the two.

In [105]:
import numpy as np
from sklearn import linear_model as lm
import statsmodels.api as sm
from data_source_lib import *  # our own data source library
from matplotlib import pyplot as pl

Use our data source class to get daily stock prices for AAPL and SPY.

In [121]:
data_source = get_stock_data(tic_list=["AAPL","SPY"],freq = "daily")
data = data_source.get_ondemand_data()

# We can screen each stock by ticker names
AAPL = data[data.Ticker=="AAPL"]
SPY = data[data.Ticker=="SPY"]
Finished SPY
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.11it/s]
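
The data source library used above is our own and is not shown in this article, so the filtering step may be easier to follow on a small stand-in table. The sketch below assumes the downloaded data is a pandas DataFrame with a `Ticker` column and a `close` column, which is what the filtering code above suggests. The `date` column, the `demo` names, and the prices are hypothetical; the prices are random numbers, not real quotes.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded data (random walk, not real prices).
rng = np.random.default_rng(0)
n_days = 5
dates = pd.date_range("2018-01-01", periods=n_days, freq="D")
demo = pd.DataFrame({
    "date": list(dates) * 2,
    "Ticker": np.repeat(["AAPL", "SPY"], n_days),
    "close": np.concatenate([
        150 + rng.normal(0, 1, n_days).cumsum(),
        270 + rng.normal(0, 1, n_days).cumsum(),
    ]),
})

# A boolean mask keeps only the rows that belong to one ticker.
demo_aapl = demo[demo["Ticker"] == "AAPL"]
demo_spy = demo[demo["Ticker"] == "SPY"]
print(demo_aapl.shape, demo_spy.shape)
```

Writing `demo["Ticker"]` instead of `demo.Ticker` gives the same result and also works when a column name contains spaces.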

First, we will use the sklearn package.

In [109]:
# Create the model instance
reg = lm.LinearRegression()
In [122]:
# Before fitting, convert the data
# to NumPy arrays

AAPL = np.array(AAPL[["close"]])  # numpy is available as np via our data source library
SPY = np.array(SPY[["close"]])

# Double brackets, as in SPY[["close"]], select a one-column DataFrame, which converts to a 2D array
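
To make the single-bracket versus double-bracket difference concrete, here is a minimal sketch on a made-up three-row table (the `prices` frame and its values are hypothetical):

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({"close": [10.0, 11.0, 12.5]})

one_d = np.array(prices["close"])    # Series -> shape (3,)
two_d = np.array(prices[["close"]])  # one-column DataFrame -> shape (3, 1)
reshaped = one_d.reshape(-1, 1)      # another way to get a column vector

print(one_d.shape, two_d.shape, reshaped.shape)
```

sklearn requires `X` to be 2D, while `y` may be 1D or 2D. When `y` is 2D, as it is here, `coef_` comes back as a 2D array and `intercept_` as a 1D array.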
In [123]:
reg.fit(X=AAPL,y=SPY)
Out[123]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [124]:
reg.coef_
Out[124]:
array([[0.39143143]])
In [125]:
reg.intercept_
Out[125]:
array([199.09234191])
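
The two numbers above are the slope and intercept of the fitted line. As a rough sketch of how they are used, the example below fits a model on synthetic data with a known relationship (y = 2 + 3x, chosen only for illustration) and checks that a prediction built by hand from `coef_` and `intercept_` matches `predict()`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known relationship y = 2 + 3x (no noise).
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 + 3.0 * x

demo_reg = LinearRegression().fit(x, y)

# With a 2D y, coef_ has shape (1, 1) and intercept_ has shape (1,).
slope = demo_reg.coef_[0, 0]
intercept = demo_reg.intercept_[0]

# The fitted line computed by hand matches what predict() returns.
manual = intercept + slope * x
print(np.allclose(manual, demo_reg.predict(x)))
```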

However, sklearn does not provide the full statistical output of an OLS model, so we turn to statsmodels for more information.
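
For reference, sklearn does expose R-squared through the `score()` method, but it does not report standard errors, t-statistics, or p-values. The sketch below uses synthetic data (an assumption for illustration, not the stock data from this article) to show that `score()` agrees with R-squared computed from its definition.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a noisy straight line.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 1))
y = 1.0 + 0.5 * x[:, 0] + rng.normal(scale=0.3, size=50)

demo_reg = LinearRegression().fit(x, y)

# score() returns R-squared on the data passed in.
r2 = demo_reg.score(x, y)

# The same number from its definition: 1 - SS_res / SS_tot.
resid = y - demo_reg.predict(x)
r2_manual = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(np.isclose(r2, r2_manual))
```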

Second, the statsmodels package

In [133]:
# Add a constant (intercept) column to the independent variable

AAPL2 = sm.add_constant(AAPL)
In [134]:
model = sm.OLS(SPY,AAPL2)
In [135]:
model = model.fit()
In [136]:
model.summary()
Out[136]:
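                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.639
Model:                            OLS   Adj. R-squared:                  0.636
Method:                 Least Squares   F-statistic:                     221.5
Date:                Sun, 28 Oct 2018   Prob (F-statistic):           1.89e-29
Time:                        23:33:17   Log-Likelihood:                -383.96
No. Observations:                 127   AIC:                             771.9
Df Residuals:                     125   BIC:                             777.6
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        199.0923      5.344     37.257      0.000     188.516     209.668
x1             0.3914      0.026     14.882      0.000       0.339       0.443
==============================================================================
Omnibus:                       35.484   Durbin-Watson:                   0.103
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               59.620
Skew:                          -1.304   Prob(JB):                     1.13e-13
Kurtosis:                       5.114   Cond. No.                     2.44e+03
==============================================================================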