This article covers the Ordinary Least Squares (OLS) model in Python. It only shows how to fit an OLS model; it does not explain how to interpret the results.
We will use the sklearn and statsmodels packages to fit the same OLS model and compare what each one reports.
In [105]:
from sklearn import linear_model as lm
import statsmodels.api as sm
import numpy as np
from data_source_lib import *
from matplotlib import pyplot as pl
Use our data source class to get the daily closing prices of AAPL and SPY.
In [121]:
data_source = get_stock_data(tic_list=["AAPL", "SPY"], freq="daily")
data = data_source.get_ondemand_data()
# Select each stock's rows by ticker name
AAPL = data[data.Ticker == "AAPL"]
SPY = data[data.Ticker == "SPY"]
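Filtering by ticker gives two separate frames, and the regression pairs them row by row, so it assumes both have the same dates in the same order. Here is a minimal sketch of one way to guarantee that. It uses synthetic prices instead of the real download and assumes the long-format frame has a date column named "date". That column name is a guess, since only Ticker and close are visible here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for get_ondemand_data(): long format, one row per (date, Ticker).
# The "date" column name is an assumption made for this sketch.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=5, freq="B")
long_data = pd.DataFrame({
    "date": np.tile(dates, 2),
    "Ticker": ["AAPL"] * 5 + ["SPY"] * 5,
    "close": np.concatenate([100 + rng.normal(size=5).cumsum(),
                             300 + rng.normal(size=5).cumsum()]),
})
# Drop one SPY row to mimic a trading day that is missing for one ticker only.
long_data = long_data.drop(index=7)

# One column per ticker, indexed by date; keep only dates present for both tickers.
prices = long_data.pivot(index="date", columns="Ticker", values="close").dropna()

X = prices[["AAPL"]].to_numpy()  # shape (n, 1)
y = prices[["SPY"]].to_numpy()   # shape (n, 1)
print(prices.shape)  # (4, 2)
```

Pivoting on the date makes the alignment explicit, so a missing day for one ticker is dropped instead of silently shifting every later row.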
First, the sklearn package
In [109]:
# Create a LinearRegression estimator (it fits an intercept by default)
reg = lm.LinearRegression()
In [122]:
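# Convert the closing prices to NumPy arrays before fitting
# (sklearn also accepts pandas objects directly)
AAPL = np.array(AAPL[["close"]])  # we have imported numpy in our data source library
SPY = np.array(SPY[["close"]])
# Double brackets select a one-column DataFrame, so each array is 2D with shape (n, 1);
# sklearn requires X to be 2D, i.e. (n_samples, n_features)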
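The difference between single and double brackets is easy to check. Here is a small sketch on a made-up one-column frame, with illustrative values only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"close": [1.0, 2.0, 3.0]})

one_d = np.array(df["close"])    # Series -> 1D array, shape (3,)
two_d = np.array(df[["close"]])  # one-column DataFrame -> 2D array, shape (3, 1)

# A 1D array can also be turned into a single-feature 2D array with reshape.
also_two_d = one_d.reshape(-1, 1)
print(one_d.shape, two_d.shape, also_two_d.shape)  # (3,) (3, 1) (3, 1)
```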
In [123]:
reg.fit(X=AAPL, y=SPY)  # regress SPY (y) on AAPL (X)
In [124]:
reg.coef_
In [125]:
reg.intercept_
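The values of coef_ and intercept_ for the AAPL/SPY fit are not shown here. This self-contained illustration uses synthetic data with a known intercept of 5 and slope of 0.8, and it shows what the same calls return when y is a 2D (n, 1) array like SPY above. In that case coef_ has shape (1, 1) and intercept_ has shape (1,). score() returns R² on the data passed to it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
x = rng.uniform(100, 200, size=(250, 1))                  # regressor, shape (250, 1)
y = 5.0 + 0.8 * x + rng.normal(scale=2.0, size=(250, 1))  # true intercept 5, slope 0.8

reg = LinearRegression()  # fit_intercept=True by default
reg.fit(X=x, y=y)

print(reg.coef_.shape, reg.intercept_.shape)  # (1, 1) (1,)
print(reg.coef_[0, 0], reg.intercept_[0])     # estimates near 0.8 and 5.0
print(reg.score(x, y))                        # R^2 on the training data
```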
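However, sklearn's LinearRegression only reports the fitted coefficients and intercept (plus R² through its score() method). It does not give standard errors, t-statistics, p-values, or confidence intervals for the coefficients. For that fuller statistical output, we use statsmodels instead.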
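To make the gap concrete: the point estimates come from the same least-squares problem in both packages, but the standard errors need an extra step. Here is a minimal NumPy sketch of the classical (non-robust) textbook formulas on synthetic data: residual variance = RSS / (n - k) and Var(beta) = residual variance * (X'X)^-1, where k counts the intercept column. It is meant only to illustrate what a library like statsmodels adds on top of the fit, not to reproduce either library's internal code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + rng.normal(size=n)  # true intercept 1.5, slope 2.0

# Design matrix with an explicit column of ones for the intercept.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]  # [intercept, slope]

k = X.shape[1]
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)            # unbiased residual variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # classical OLS covariance matrix
se = np.sqrt(np.diag(cov_beta))             # standard errors
t_stats = beta / se                         # t-statistics

# sklearn gives the same point estimates but no standard errors or t-statistics.
reg = LinearRegression().fit(x.reshape(-1, 1), y)
print(beta, se, t_stats)
print(reg.intercept_, reg.coef_)
```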
Second, the statsmodels package
In [133]:
# statsmodels does not add an intercept automatically;
# add_constant prepends a column of ones to the regressors
AAPL2 = sm.add_constant(AAPL)
In [134]:
model = sm.OLS(SPY, AAPL2)  # dependent variable (endog) first, then regressors (exog)
In [135]:
model = model.fit()  # fit() returns a results object; "model" now refers to the results
In [136]:
model.summary()