Shiny APP

This is a Shiny web application for data visualization. Click for app view: https://kenpyfin.shinyapps.io/ShinyHW/

Untitled

Run Regression in Python with Statsmodel Package

Run Regression
In [9]:
from statsmodels import api as sm
from my_libs import *

Regress the SPY and VIX index

  • Need to translate the result into np.array
  • Need to change type to float
In [51]:
spy = get_price_data(["SPY"],method='day',back_day=20).dropna().Return.values.astype(float)
spy_ = spy*30
All price data of Close is actually Adj Close
Connection Successful
Finished SPY

Constructed a model of vix = intercept + b0 * spy + b1 * spy * 30

In [52]:
ip = pd.DataFrame({"spy":spy,"spy_":spy_})
dp = get_price_data(["^VIX"],method='day',back_day=20).dropna().Return.values.astype(float)
All price data of Close is actually Adj Close
Connection Successful
no data for ^VIX
'NoneType' object has no attribute 'index'
switching to realtimeday method
Finished ^VIX
In [53]:
ip = sm.add_constant(ip)
/home/ken/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2389: FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
  return ptp(axis=axis, out=out, **kwargs)
In [54]:
sm.OLS(dp,ip).fit().summary()
/home/ken/.local/lib/python2.7/site-packages/scipy/stats/stats.py:1416: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=13
  "anyway, n=%i" % int(n))
Out[54]:
OLS Regression Results
Dep. Variable: y R-squared: 0.737
Model: OLS Adj. R-squared: 0.713
Method: Least Squares F-statistic: 30.80
Date: Sun, 04 Aug 2019 Prob (F-statistic): 0.000173
Time: 19:40:53 Log-Likelihood: 25.241
No. Observations: 13 AIC: -46.48
Df Residuals: 11 BIC: -45.35
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 0.0033 0.011 0.305 0.766 -0.021 0.027
spy -0.0109 0.002 -5.550 0.000 -0.015 -0.007
spy_ -0.3256 0.059 -5.550 0.000 -0.455 -0.196
Omnibus: 9.222 Durbin-Watson: 1.071
Prob(Omnibus): 0.010 Jarque-Bera (JB): 4.912
Skew: -1.262 Prob(JB): 0.0858
Kurtosis: 4.641 Cond. No. 5.85e+17


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 3.82e-35. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

OLS Regression Model




OLS Regression Model







We will talk about Ordinary Least Square model in Python. In this article, we will just go over how to use OLS in Python without explaining the interpretation of the result.

Here, we will use sklearn and statsmodels packages to perform OLS modeling and compare the differences

In [105]:
from sklearn import linear_model as lm
import statsmodels.api as sm
from data_source_lib import *
from matplotlib import pyplot as pl

Use our data source class to get AAPL and SPY daily stock price

In [121]:
data_source = get_stock_data(tic_list=["AAPL","SPY"],freq = "daily")
data = data_source.get_ondemand_data()

# We can screen each stock by ticker names
AAPL = data[data.Ticker=="AAPL"]
SPY = data[data.Ticker=="SPY"]
Finished SPY
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.11it/s]

First, we will use sklearn package

In [109]:
# define the instance 
reg = lm.LinearRegression()
In [122]:
# Before applying the data, we should 
# turn it into numpy array

AAPL = np.array(AAPL[["close"]]) # we have imported numpy in our data source libary
SPY = np.array(SPY[["close"]])

# The reason I use SPY[["close"]] is to get 2D array
In [123]:
reg.fit(X=AAPL,y=SPY)
Out[123]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [124]:
reg.coef_
Out[124]:
array([[0.39143143]])
In [125]:
reg.intercept_
Out[125]:
array([199.09234191])

However, the sklearn package didn’t offer full statistic information in the OLS model. So we should use statsmodel instead for more information

Second, statsmodel Package

In [133]:
# This is to add a constant into the independent side of the model

AAPL2 = sm.add_constant(AAPL)
In [134]:
model = sm.OLS(SPY,AAPL2)
In [135]:
model = model.fit()
In [136]:
model.summary()
Out[136]:
OLS Regression Results
Dep. Variable: y R-squared: 0.639
Model: OLS Adj. R-squared: 0.636
Method: Least Squares F-statistic: 221.5
Date: Sun, 28 Oct 2018 Prob (F-statistic): 1.89e-29
Time: 23:33:17 Log-Likelihood: -383.96
No. Observations: 127 AIC: 771.9
Df Residuals: 125 BIC: 777.6
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 199.0923 5.344 37.257 0.000 188.516 209.668
x1 0.3914 0.026 14.882 0.000 0.339 0.443
Omnibus: 35.484 Durbin-Watson: 0.103
Prob(Omnibus): 0.000 Jarque-Bera (JB): 59.620
Skew: -1.304 Prob(JB): 1.13e-13
Kurtosis: 5.114 Cond. No. 2.44e+03

Graphing/Plotting




There are a lot to say for graphing, or plotting. The package we used in Python is matplotlib, versus ggplot in R programming. In order to illustrate plotting, I also import numpy here to create some sample dataset.

In [4]:
from matplotlib import pylab as plt
import numpy as np

You can simply create plots like this:

In [28]:
x =np.linspace(-np.pi, np.pi, 255,endpoint=True)
# numpy's linespace function is pretty good at creating a x axis

y = np.sin(x)
plt.plot(x,y)
plt.show()
In [32]:
# Of course you can overlap plots

x =np.linspace(-np.pi, np.pi, 255,endpoint=True)
y = np.sin(x)
z = 2*x
plt.plot(x,y)
plt.plot(x,z)
plt.show()

Just need to make sure two plots run at the same time and they will be overlapped. At least one axis is the same among plots otherwise it will return an error.

If you want to customize the output, you need to do more.

Creating Subplot

You can have more than one plot in one canvas. The way to control it is to use subplot() method

The syntax for subplot is plt.subplot(No.row No.Col No.)

The top left plot of a 2×2 plot is plt.subplot(221)
The plot on its right is plt.subplot(222)

In [40]:
my_plot = plt.subplot(221) 
my_plot.plot(x,y)
# usually we store it into a variale for further formatting

my_plot2=plt.subplot(222) 
my_plot2.plot(x,z)

plt.show()

Setting The Plot Space

In [42]:
# Set the canvas
# The value in figsize is how many increments
plt.figure(figsize=(8,5), dpi=80)

my_plot = plt.subplot(111)
my_plot.plot(x,z)

# You can also set how the plot is being framed

my_plot.spines['right'].set_color('none')
my_plot.spines['top'].set_color('none')
my_plot.xaxis.set_ticks_position('bottom')
my_plot.yaxis.set_ticks_position('left')

# This property sets how the graph looks like
my_plot.spines['left'].set_position(('axes',0))
my_plot.spines['bottom'].set_position(("axes",0))

##### What if we change "axes" to" data"? 

plt.show()

And More

In [40]:
# Set the canvas
# The value in figsize is how many increments
plt.figure(figsize=(8,5), dpi=80)

my_plot = plt.subplot(111)

# You can also set color, line width, style and label
my_plot.plot(x,y,color="red", linewidth=1.5, linestyle="-", label="cosine")

# You can also set how the plot is being framed

my_plot.spines['right'].set_color('none')
my_plot.spines['top'].set_color('none')
my_plot.xaxis.set_ticks_position('bottom')
my_plot.yaxis.set_ticks_position('left')

# This property sets how the graph looks like
my_plot.spines['left'].set_position(('data',0))
my_plot.spines['bottom'].set_position(("data",0))


# I can also manipulate the axises 
plt.xlim(x.min()*1.1, x.max()*1.1) # set limits of current axis
plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],
           [r'$-\pi

#39;, r‘$-\pi/2


#39;, r‘$0


#39;, r‘$+\pi/2


#39;, r‘$+\pi


#39;])
plt.ylim(y.min()*1.1,y.max()*1.1)
plt.yticks(range(10,10,1)
)
# annotate a specific point
plt.annotate(r‘$\sin(\frac{\pi}{2})=1


#39;,
xy=(np.pi/2,1), xycoords=‘data’,
xytext=(60, 40), textcoords=‘offset points’, fontsize=16,
arrowprops=dict(arrowstyle=“->”, connectionstyle=“arc3,rad=.2”))

plt.show()

Scatter Plot and More

In [44]:
import time
np.random.seed(int(time.time()))
#trial = [i for i in np.random.rand(100) ]
trial = np.array(np.random.rand(100))
y = trial *2
plt.scatter(trial,y)

# we can save the picture file
plt.savefig("test.png",dpi=72)