OLS Regression Model

This article covers the Ordinary Least Squares (OLS) model in Python. It only shows how to fit an OLS model in Python; it does not explain how to interpret the results.

Here, we will use the sklearn and statsmodels packages to fit an OLS model and compare the differences between them.

import numpy as np  # imported explicitly instead of relying on the star import below
from sklearn import linear_model as lm
import statsmodels.api as sm
from data_source_lib import *
from matplotlib import pyplot as pl


Use our data source class to get daily stock prices for AAPL and SPY.

data_source = get_stock_data(tic_list=["AAPL", "SPY"], freq="daily")
data = data_source.get_ondemand_data()

# Filter the rows for each stock by its ticker name
AAPL = data[data.Ticker=="AAPL"]
SPY = data[data.Ticker=="SPY"]

Finished SPY
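The `data_source_lib` module is the author's own helper and is not shown here. To follow along without it, here is a minimal stand-in. It assumes the returned table is a pandas DataFrame in long format with `Ticker` and `close` columns, which is what the filtering code above suggests. The `Date` column and the prices are hypothetical: the prices are random walks, not real quotes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data source: random-walk prices, not real quotes
rng = np.random.default_rng(0)
n_days = 127
dates = pd.bdate_range("2018-05-01", periods=n_days)

frames = []
for ticker, start in [("AAPL", 180.0), ("SPY", 270.0)]:
    # Cumulative sum of small random steps gives a price-like series
    close = start + np.cumsum(rng.normal(0.0, 1.0, n_days))
    frames.append(pd.DataFrame({"Date": dates, "Ticker": ticker, "close": close}))

# Stack both tickers into one long table
data = pd.concat(frames, ignore_index=True)

# Same boolean-mask filtering as in the article
AAPL = data[data.Ticker == "AAPL"]
SPY = data[data.Ticker == "SPY"]
print(AAPL.shape, SPY.shape)
```

Because the prices are random, any regression fitted on this stand-in will not reproduce the numbers shown in this article.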

### First, we will use the sklearn package

# define the instance
reg = lm.LinearRegression()

# Before fitting, convert the data
# to numpy arrays

AAPL = np.array(AAPL[["close"]]) # numpy (np) is already imported in our data source library
SPY = np.array(SPY[["close"]])

# Double brackets (SPY[["close"]]) select a DataFrame rather than a Series, which gives a 2D array

reg.fit(X=AAPL,y=SPY)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
reg.coef_

array([[0.39143143]])
reg.intercept_

array([199.09234191])
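To check that the fitted attributes mean what we expect, here is a small sketch on synthetic data where the true slope and intercept are known. The data, the seed, and the variable names are assumptions made for illustration only. Note that `fit` expects `X` with shape `(n_samples, n_features)`, which is why a 1D array has to be reshaped into a single column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known relationship: y = 2 + 3x + noise
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, size=200)

# fit() expects X as a 2D array, so turn the 1D array into one column
X = x.reshape(-1, 1)

reg = LinearRegression()
reg.fit(X, y)

print("slope:", reg.coef_[0])
print("intercept:", reg.intercept_)
print("R-squared:", reg.score(X, y))
```

In this sketch `y` is 1D, so `coef_` is a 1D array and `intercept_` is a scalar. In the article both inputs are 2D, which is why `coef_` is printed as a nested array and `intercept_` as a one-element array.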

However, the sklearn package does not provide the full statistical summary of an OLS model, such as standard errors, t-statistics, and p-values. For that information we use statsmodels instead.

### Second, the statsmodels package

# Add a constant (intercept) column to the independent variable
AAPL2 = sm.add_constant(AAPL)

model = sm.OLS(SPY, AAPL2)

model = model.fit()

model.summary()

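Dep. Variable:                      y   R-squared:                0.639
Model:                            OLS   Adj. R-squared:           0.636
Method:                 Least Squares   F-statistic:              221.5
Date:                Sun, 28 Oct 2018   Prob (F-statistic):    1.89e-29
Time:                        23:33:17   Log-Likelihood:         -383.96
No. Observations:                 127   AIC:                      771.9
Df Residuals:                     125   BIC:                      777.6
Df Model:                           1
Covariance Type:            nonrobust

            coef   std err        t   P>|t|   [0.025   0.975]
const   199.0923     5.344   37.257   0.000  188.516  209.668
x1        0.3914     0.026   14.882   0.000    0.339    0.443

Omnibus:          35.484   Durbin-Watson:      0.103
Prob(Omnibus):         0   Jarque-Bera (JB):   59.62
Skew:             -1.304   Prob(JB):        1.13e-13
Kurtosis:          5.114   Cond. No.:           2440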