Run Regression in Python with the statsmodels Package

Run Regression
In [9]:
from statsmodels import api as sm
from my_libs import *

Regress the VIX index on SPY returns

  • Need to convert the result into an np.array
  • Need to cast the type to float
In [51]:
spy = get_price_data(["SPY"],method='day',back_day=20).dropna().Return.values.astype(float)
spy_ = spy*30
All price data of Close is actually Adj Close
Connection Successful
Finished SPY

Constructed a model of vix = intercept + b0 * spy + b1 * spy_, where spy_ = spy * 30 is perfectly collinear with spy

In [52]:
ip = pd.DataFrame({"spy":spy,"spy_":spy_})
dp = get_price_data(["^VIX"],method='day',back_day=20).dropna().Return.values.astype(float)
All price data of Close is actually Adj Close
Connection Successful
no data for ^VIX
'NoneType' object has no attribute 'index'
switching to realtimeday method
Finished ^VIX
In [53]:
ip = sm.add_constant(ip)
/home/ken/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2389: FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
  return ptp(axis=axis, out=out, **kwargs)
In [54]:
sm.OLS(dp,ip).fit().summary()
/home/ken/.local/lib/python2.7/site-packages/scipy/stats/stats.py:1416: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=13
  "anyway, n=%i" % int(n))
Out[54]:
OLS Regression Results
Dep. Variable: y R-squared: 0.737
Model: OLS Adj. R-squared: 0.713
Method: Least Squares F-statistic: 30.80
Date: Sun, 04 Aug 2019 Prob (F-statistic): 0.000173
Time: 19:40:53 Log-Likelihood: 25.241
No. Observations: 13 AIC: -46.48
Df Residuals: 11 BIC: -45.35
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 0.0033 0.011 0.305 0.766 -0.021 0.027
spy -0.0109 0.002 -5.550 0.000 -0.015 -0.007
spy_ -0.3256 0.059 -5.550 0.000 -0.455 -0.196
Omnibus: 9.222 Durbin-Watson: 1.071
Prob(Omnibus): 0.010 Jarque-Bera (JB): 4.912
Skew: -1.262 Prob(JB): 0.0858
Kurtosis: 4.641 Cond. No. 5.85e+17


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 3.82e-35. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
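That warning is expected here: spy_ is exactly 30 × spy, so the design matrix is rank-deficient. A minimal numpy sketch with synthetic returns (hypothetical values, not the SPY data above) reproduces the symptom:

```python
import numpy as np

rng = np.random.default_rng(0)
spy = rng.normal(0, 0.01, 13)      # 13 synthetic daily returns
spy_ = spy * 30                    # perfectly collinear copy

# Design matrix with a constant column, mirroring sm.add_constant(ip)
X = np.column_stack([np.ones_like(spy), spy, spy_])

print(np.linalg.matrix_rank(X))    # 2, not 3: one column is redundant
print(np.linalg.cond(X))           # enormous condition number, like Cond. No. above
```

Whenever one regressor is an exact linear function of another, OLS cannot attribute the effect to either one, which is why both slope estimates above share the same t-statistic.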

OLS Regression Model

We will talk about the Ordinary Least Squares (OLS) model in Python. In this article, we will just go over how to run OLS in Python without explaining how to interpret the results.

Here, we will use the sklearn and statsmodels packages to perform OLS modeling and compare the differences.

In [105]:
from sklearn import linear_model as lm
import statsmodels.api as sm
from data_source_lib import *
from matplotlib import pyplot as pl

Use our data source class to get AAPL and SPY daily stock prices

In [121]:
data_source = get_stock_data(tic_list=["AAPL","SPY"],freq = "daily")
data = data_source.get_ondemand_data()

# We can screen each stock by ticker names
AAPL = data[data.Ticker=="AAPL"]
SPY = data[data.Ticker=="SPY"]
Finished SPY
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.11it/s]

First, we will use the sklearn package

In [109]:
# define the instance 
reg = lm.LinearRegression()
In [122]:
# Before applying the data, we should 
# turn it into numpy array

AAPL = np.array(AAPL[["close"]]) # we have imported numpy in our data source library
SPY = np.array(SPY[["close"]])

# The reason I use SPY[["close"]] is to get 2D array
In [123]:
reg.fit(X=AAPL,y=SPY)
Out[123]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [124]:
reg.coef_
Out[124]:
array([[0.39143143]])
In [125]:
reg.intercept_
Out[125]:
array([199.09234191])
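To sanity-check the fitting procedure, here is a minimal sketch on synthetic data (not the AAPL/SPY series) where the true slope and intercept are known in advance:

```python
import numpy as np
from sklearn import linear_model as lm

rng = np.random.default_rng(42)
x = rng.uniform(100, 200, 200).reshape(-1, 1)  # 2D array, as reg.fit expects
y = 0.4 * x + 199.0                            # known slope and intercept, no noise

reg = lm.LinearRegression()
reg.fit(X=x, y=y)
print(reg.coef_)       # recovers ~0.4
print(reg.intercept_)  # recovers ~199.0
```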

However, the sklearn package doesn’t offer the full statistical information for the OLS model, so we should use statsmodels instead for more detail.

Second, the statsmodels Package

In [133]:
# This is to add a constant into the independent side of the model

AAPL2 = sm.add_constant(AAPL)
In [134]:
model = sm.OLS(SPY,AAPL2)
In [135]:
model = model.fit()
In [136]:
model.summary()
Out[136]:
OLS Regression Results
Dep. Variable: y R-squared: 0.639
Model: OLS Adj. R-squared: 0.636
Method: Least Squares F-statistic: 221.5
Date: Sun, 28 Oct 2018 Prob (F-statistic): 1.89e-29
Time: 23:33:17 Log-Likelihood: -383.96
No. Observations: 127 AIC: 771.9
Df Residuals: 125 BIC: 777.6
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 199.0923 5.344 37.257 0.000 188.516 209.668
x1 0.3914 0.026 14.882 0.000 0.339 0.443
Omnibus: 35.484 Durbin-Watson: 0.103
Prob(Omnibus): 0.000 Jarque-Bera (JB): 59.620
Skew: -1.304 Prob(JB): 1.13e-13
Kurtosis: 5.114 Cond. No. 2.44e+03

Graphing/Plotting




There is a lot to say about graphing, or plotting. The package we use in Python is matplotlib, versus ggplot in R. To illustrate plotting, I also import numpy here to create some sample datasets.

In [4]:
from matplotlib import pyplot as plt
import numpy as np

You can simply create plots like this:

In [28]:
x = np.linspace(-np.pi, np.pi, 255, endpoint=True)
# numpy's linspace function is pretty good at creating an x axis

y = np.sin(x)
plt.plot(x,y)
plt.show()
In [32]:
# Of course you can overlap plots

x =np.linspace(-np.pi, np.pi, 255,endpoint=True)
y = np.sin(x)
z = 2*x
plt.plot(x,y)
plt.plot(x,z)
plt.show()

Just make sure both plot calls run before plt.show() and the lines will be overlaid on the same axes. The x and y arrays in each call must have matching lengths, otherwise matplotlib will raise an error.

If you want to customize the output, you need to do more.

Creating Subplot

You can have more than one plot on one canvas. The way to control this is the subplot() method.

The syntax is plt.subplot(nrows, ncols, index).

The top left plot of a 2×2 grid is plt.subplot(221).
The plot to its right is plt.subplot(222).

In [40]:
my_plot = plt.subplot(221) 
my_plot.plot(x,y)
# usually we store it in a variable for further formatting

my_plot2=plt.subplot(222) 
my_plot2.plot(x,z)

plt.show()

Setting The Plot Space

In [42]:
# Set the canvas
# The values in figsize are (width, height) in inches
plt.figure(figsize=(8,5), dpi=80)

my_plot = plt.subplot(111)
my_plot.plot(x,z)

# You can also set how the plot is being framed

my_plot.spines['right'].set_color('none')
my_plot.spines['top'].set_color('none')
my_plot.xaxis.set_ticks_position('bottom')
my_plot.yaxis.set_ticks_position('left')

# This property sets where the axis spines are placed
my_plot.spines['left'].set_position(('axes',0))
my_plot.spines['bottom'].set_position(("axes",0))

##### What if we change "axes" to "data"?

plt.show()
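The commented question above can be answered with a short off-screen sketch (the Agg backend is assumed so this runs without a display): 'axes' coordinates are fractions of the axes box, while 'data' coordinates pin the spine to a value on the other axis.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
from matplotlib import pyplot as plt

ax = plt.subplot(111)
ax.plot([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# 'axes' coordinates: 0 is the left/bottom edge of the axes box
ax.spines['left'].set_position(('axes', 0))
# 'data' coordinates: ('data', 0) draws the bottom spine through y=0,
# so the x axis passes through the data origin
ax.spines['bottom'].set_position(('data', 0))

print(ax.spines['bottom'].get_position())
```

With ('data', 0) the spines cross at the origin of the data, which is why the next example's sine curve appears centered on its axes.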

And More

In [40]:
# Set the canvas
# The values in figsize are (width, height) in inches
plt.figure(figsize=(8,5), dpi=80)

my_plot = plt.subplot(111)

# You can also set color, line width, style and label
my_plot.plot(x, y, color="red", linewidth=1.5, linestyle="-", label="sine")

# You can also set how the plot is being framed

my_plot.spines['right'].set_color('none')
my_plot.spines['top'].set_color('none')
my_plot.xaxis.set_ticks_position('bottom')
my_plot.yaxis.set_ticks_position('left')

# This property sets where the axis spines are placed
my_plot.spines['left'].set_position(('data',0))
my_plot.spines['bottom'].set_position(("data",0))


# I can also manipulate the axes
plt.xlim(x.min()*1.1, x.max()*1.1)  # set limits of the current axis
plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],
           [r'$-\pi$', r'$-\pi/2$', r'$0$', r'$+\pi/2$', r'$+\pi$'])
plt.ylim(y.min()*1.1, y.max()*1.1)
plt.yticks([-1, 0, +1])

# annotate a specific point
plt.annotate(r'$\sin(\frac{\pi}{2})=1$',
             xy=(np.pi/2, 1), xycoords='data',
             xytext=(60, 40), textcoords='offset points', fontsize=16,
             arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=.2"))

plt.show()

Scatter Plot and More

In [44]:
import time
np.random.seed(int(time.time()))
#trial = [i for i in np.random.rand(100) ]
trial = np.array(np.random.rand(100))
y = trial *2
plt.scatter(trial,y)

# we can save the picture file
plt.savefig("test.png",dpi=72)

Histogram

Let’s get some finance data for this example

In [1]:
from data_source_lib import *

# import our magic library
In [11]:
get_ins = get_stock_data(["AAPL"],freq= "daily",day_range=300)
my_data = get_ins.get_ondemand_data()["close"]
Finished AAPL
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.07it/s]
In [26]:
plt.figure(figsize=(8,5), dpi=80)
my_hist = plt.hist(my_data)
plt.xticks(range(50,300,10))
plt.show()

Technical Analysis

You may have heard of technical analysts. Technical indicators are the basis of quantitative analysis of the stock market. There is a package in Python, talib, that makes calculating all the technical indicators much easier. I chose some of my favorite indicators and integrated them into my code library.

Applying talib is a little bit tricky, but take a look at this one line of code.

price.loc[price.Ticker==i,"ADXR"]= ta.ADXR(price.loc[price.Ticker==i].High.values, price.loc[price.Ticker==i].Low.values, price.loc[price.Ticker==i].Close.values, timeperiod=14)

The variable price is a DataFrame returned from our data-getting object. Since we only want to analyze one stock’s time series at a time, we filter for one ticker per calculation using price.Ticker == i. The ADXR (Average Directional Movement Index Rating) indicator then takes the High, Low, and Close series; basically, pass them into the ta.ADXR() function.
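The key trick in that line is boolean-mask assignment: compute an indicator on one ticker's rows and write it back to exactly those rows. A minimal pandas sketch (fabricated numbers, with a plain rolling mean standing in for the talib call so it runs without TA-Lib):

```python
import pandas as pd

price = pd.DataFrame({
    "Ticker": ["AAPL"] * 4 + ["SPY"] * 4,
    "Close":  [10.0, 11.0, 12.0, 13.0, 100.0, 101.0, 102.0, 103.0],
})

for i in set(price.Ticker):
    mask = price.Ticker == i
    # rolling(2).mean() stands in for a talib indicator;
    # the result is assigned back to the masked rows only
    price.loc[mask, "MA_fast"] = price.loc[mask].Close.rolling(2).mean()

print(price)
```

Because the left and right sides of the assignment share the same row index, pandas aligns the values automatically, and each ticker's rolling window never leaks into another ticker's rows.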

For more information about the talib, check TA-Lib : Technical Analysis Library

Here’s the full picture of the function.

 











In [16]:
import talib as ta
import data_source_lib as da
In [15]:
get_data = da.get_stock_data(["AAPL"],freq = "daily")
price = get_data.get_ondemand_data()
Finished AAPL
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.56it/s]
In [40]:
def get_technicals(price):
    import pandas as pd
    import tqdm
    from IPython.display import clear_output
    if not isinstance(price, pd.DataFrame):
        raise TypeError("Please feed a DataFrame object")
    for i in tqdm.tqdm(list(set(price.Ticker))):
        mask = price.Ticker == i
        price.loc[mask, "ADXR"] = ta.ADXR(price.loc[mask].High.values, price.loc[mask].Low.values,
                                          price.loc[mask].Close.values, timeperiod=14)
        price.loc[mask, "APO"] = ta.APO(price.loc[mask].Close.values, fastperiod=12, slowperiod=26, matype=0)
        price.loc[mask, "AROONOSC"] = ta.AROONOSC(price.loc[mask].High.values, price.loc[mask].Low.values, timeperiod=14)
        price.loc[mask, "CCI"] = ta.CCI(price.loc[mask].High.values, price.loc[mask].Low.values,
                                        price.loc[mask].Close.values, timeperiod=14)
        price.loc[mask, "MFI"] = ta.MFI(price.loc[mask].High.values, price.loc[mask].Low.values,
                                        price.loc[mask].Close.values,
                                        price.loc[mask].Volume.values.astype(float), timeperiod=14)
        price.loc[mask, "MACD"], price.loc[mask, "MACD_signal"], price.loc[mask, "MACD_hist"] = \
            ta.MACD(price.loc[mask].Close.values, fastperiod=12, slowperiod=26, signalperiod=9)
        price.loc[mask, "ROCP"] = ta.ROCP(price.loc[mask].Close.values, timeperiod=10)
        price.loc[mask, "RSI"] = ta.RSI(price.loc[mask].Close.values, timeperiod=14)
        # moving averages computed per ticker, not over the whole frame
        price.loc[mask, "MA_fast"] = price.loc[mask].Close.rolling(10).mean()
        price.loc[mask, "MA_slow"] = price.loc[mask].Close.rolling(30).mean()
        clear_output()
        print "\nDone:", i

Create Your Own Abstract Data Reader Object

Based on the available stock data sources we have, we should create our own data reader as an abstract data structure: we hide the data-fetching process, so the object only takes in commands and gives out a standardized data table.

Here are all the dependencies we need:

from enum import Enum
from datetime import datetime, timedelta
import pandas as pd
import time
from IPython.display import clear_output
import tqdm
import requests as re
import json

One thing to mention is that json is used to parse the web-request data before wrapping it into a DataFrame, and tqdm is a package for progress-bar display.

Our goal is to build a class object to get data, so let’s build the constructor first. Since getting data from sources involves many arguments, one of the best practices for me is to initialize those parameters in the constructor.

     def __init__(self, tic_list, output="table", **kwargs):
        self.arg_list = {"freq": 'minutes', "start_date": datetime.now()-timedelta(days=256),
                         "end_date": datetime.now(), "day_range": 256, "file_name": ""}

        self.tic_list = tic_list
        self.output = output
        self.error = []  # collect tickers that failed after all retries

        for key, arg in kwargs.iteritems():

            if key in ["freq", "start_date", "end_date"]:
                self.arg_list[key] = arg

            if key in ["day_range"]:
                self.arg_list[key] = arg
                self.arg_list["start_date"] = datetime.now()-timedelta(days=arg)

In the constructor, I take in a list of tickers and a couple of data-format settings: frequency, start date, and end date. I also set a variable to control the output type, which gives me the flexibility to choose among data storage options. Later, I will talk about using a MongoDB database to store stock data.
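The pattern above (a defaults dict overridden by recognized keywords) can be sketched in isolation. This is a simplified, hypothetical version of the constructor; the class and attribute names are illustrative only:

```python
from datetime import datetime, timedelta

class FetcherConfig(object):
    """Hypothetical, stripped-down version of the get_stock_data constructor."""
    def __init__(self, tic_list, output="table", **kwargs):
        # defaults, overridden only when the caller passes a known keyword
        self.arg_list = {"freq": "minutes",
                         "start_date": datetime.now() - timedelta(days=256),
                         "end_date": datetime.now()}
        self.tic_list = tic_list
        self.output = output
        for key, arg in kwargs.items():
            if key in ("freq", "start_date", "end_date"):
                self.arg_list[key] = arg
            if key == "day_range":
                # a day range implies a start date relative to now
                self.arg_list["start_date"] = datetime.now() - timedelta(days=arg)

cfg = FetcherConfig(["AAPL"], freq="daily", day_range=30)
print(cfg.arg_list["freq"])          # daily
```

Unknown keywords are silently ignored, so callers can pass whatever they like; a stricter design would raise on unrecognized keys.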

Next comes the most important part: the data query functions.

The first one is the historical data query function, get_ondemand_data()

def get_ondemand_data(self, interval = 1):
         
         self.result = pd.DataFrame()
         
         for i in tqdm.tqdm(range(len(self.tic_list))):
             trial = 0
             i = self.tic_list[i]
             while trial <3:
                 try:
                     api_key = '95b5894daf3abced33fe48e7f265315e'
                     start_date=self.arg_list["start_date"].strftime("%Y%m%d%H%M%S")
                     end_date=self.arg_list["end_date"].strftime("%Y%m%d%H%M%S")
                     # This is the required format for datetimes to access the API

                     api_url = 'http://marketdata.websol.barchart.com/getHistory.csv?' + \
                                             'key={}&symbol={}&type={}&startDate={}&endDate={}&interval={}'\
                                              .format(api_key, i, self.arg_list["freq"], start_date,end_date,interval)

                     temp = pd.read_csv(api_url, parse_dates=['timestamp'])
                     temp.set_index('timestamp', inplace=True)



                     #index= pd.MultiIndex.from_product([[i],temp.index])
                     #temp=pd.DataFrame(data=temp.values,index=index,columns=temp.columns)

                     self.result = self.result.append(temp)
                     clear_output()
                     print "Finished", i
                     
                     #time.sleep(5)
                     trial=3

                 except Exception as e:
                     print e
                     print "error occurred in getting data for", i
                     trial +=1
                     time.sleep(10)
                     if trial == 3:
                         self.error.append([i,'get_ondemand'])
         return self.data_output()

I won’t go deep on this, but please note that I added an outer loop to retry the data request in case a connection error occurs.
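The retry pattern can be isolated into a small sketch. Here with_retries and flaky are hypothetical stand-ins for the API call; the real method inlines this logic with trial counters:

```python
import time

def with_retries(fetch, max_trials=3, wait=0.0):
    """Call fetch() up to max_trials times; return its result, or None if all fail."""
    for trial in range(max_trials):
        try:
            return fetch()
        except Exception as e:
            print(e)
            time.sleep(wait)  # back off before retrying
    return None

calls = {"n": 0}
def flaky():
    # fails twice, then succeeds, simulating a transient connection error
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("connection error")
    return "data"

print(with_retries(flaky))  # succeeds on the third trial
```

After the last failed trial the original code also records the ticker in self.error, so failures can be inspected after the batch finishes.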

The second one is the get stock quote function, get_quote()

def get_quote(self):

    self.result = pd.DataFrame()

    for i in tqdm.tqdm(range(len(self.tic_list))):
        i = self.tic_list[i]

        # the request and parsing must happen inside the loop, once per ticker
        profile = "https://financialmodelingprep.com/api/company/price/{}".format(i)

        temp = re.get(profile, verify=False).text

        # strip the markup around the JSON payload before parsing
        temp = temp.replace("\n", "").replace("<pre>", "").replace("</pre>", "")

        temp = json.loads(temp)

        temp = pd.DataFrame(temp).transpose()

        self.result = self.result.append(temp)

    return self.data_output()

Lastly, we need to create a function to standardize the output.

def data_output(self):
  
   self.result = self.result.reset_index()
   self.result["Close"] = self.result["close"]
   self.result = self.result.rename(columns={'symbol':'Ticker','timestamp':"TimeStamp","high":"High","low":"Low","open":"Open","volume":"Volume"})
   self.result["Return"]=( self.result.Close.diff(1)/self.result.Close)
   
   if self.output == "table":
       
       return self.result
    
   if self.output == "file":
       self.result.to_csv(self.arg_list["file_name"])
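A tiny sketch of what data_output does to the raw columns, on fabricated rows. Note that Return here follows the author's definition above: the price difference divided by the current close, not the previous one:

```python
import pandas as pd

raw = pd.DataFrame({"symbol": ["AAPL", "AAPL"],
                    "close":  [100.0, 102.0],
                    "volume": [10, 20]})

# rename to the standardized column names, keeping the lowercase close too
out = raw.rename(columns={"symbol": "Ticker", "volume": "Volume"})
out["Close"] = out["close"]
# the author's Return definition: diff divided by the *current* close
out["Return"] = out.Close.diff(1) / out.Close

print(out.Return.tolist())  # first row is NaN, no previous close to diff against
```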

Put them all together. You can save them into a .py file so that next time you want to use them, you can just import the file by name.












In [34]:
from enum import Enum
from datetime import datetime, timedelta
import pandas as pd
import time
from IPython.display import clear_output
import tqdm
import requests as re
import json
In [ ]:
class get_stock_data():
    
     def __init__(self,tic_list, output="table", **kwargs):
        self.arg_list = {"freq": 'minutes',"start_date": datetime.now()-timedelta(days =256),\
                    "end_date":datetime.now(), "day_range": 256, "file_name":""}
        
        self.tic_list = tic_list
        self.output = output
        
        
        for key , arg in kwargs.iteritems():
            
            if key in ["freq","start_date","end_date"]:
                self.arg_list[key]=arg
            
            if key in ["day_range"]:
                self.arg_list[key]=arg
                self.arg_list["start_date"] = datetime.now()-timedelta(days =arg)
    
        self.error = []
    
     def data_output(self):
       
        self.result = self.result.reset_index()
        self.result["Close"] = self.result["close"]
        self.result = self.result.rename(columns={'symbol':'Ticker','timestamp':"TimeStamp","high":"High","low":"Low","open":"Open","volume":"Volume"})
        self.result["Return"]=( self.result.Close.diff(1)/self.result.Close)
        
        if self.output == "table":
            
            return self.result
    
        if self.output == "file":
            self.result.to_csv(self.arg_list["file_name"])
     
        

    
     def get_ondemand_data(self, interval = 1):
            
            self.result = pd.DataFrame()
            
            for i in tqdm.tqdm(range(len(self.tic_list))):
                trial = 0
                i = self.tic_list[i].upper()
                while trial <3:
                    try:
                        api_key = '95b5894daf3abced33fe48e7f265315e'
                        start_date=self.arg_list["start_date"].strftime("%Y%m%d%H%M%S")
                        end_date=self.arg_list["end_date"].strftime("%Y%m%d%H%M%S")
                        # This is the required format for datetimes to access the API

                        api_url = 'http://marketdata.websol.barchart.com/getHistory.csv?' + \
                                                'key={}&symbol={}&type={}&startDate={}&endDate={}&interval={}'\
                                                 .format(api_key, i, self.arg_list["freq"], start_date,end_date,interval)

                        temp = pd.read_csv(api_url, parse_dates=['timestamp'])
                        temp.set_index('timestamp', inplace=True)



                        #index= pd.MultiIndex.from_product([[i],temp.index])
                        #temp=pd.DataFrame(data=temp.values,index=index,columns=temp.columns)

                        self.result = self.result.append(temp)
                        clear_output()
                        print "Finished", i
                        
                        #time.sleep(5)
                        trial=3

                    except Exception as e:
                        print e
                        print "error occurred in getting data for", i
                        trial +=1
                        time.sleep(10)
                        if trial == 3:
                            self.error.append([i,'get_ondemand'])
            return self.data_output()
           
            
            
     def get_quote(self):
        
        self.result = pd.DataFrame()
        
        for i in tqdm.tqdm(range(len(self.tic_list))):
            i = self.tic_list[i].upper()

            # the request and parsing must happen inside the loop, once per ticker
            profile = "https://financialmodelingprep.com/api/company/price/{}".format(i)

            temp = re.get(profile, verify=False).text

            # strip the markup around the JSON payload before parsing
            temp = temp.replace("\n", "").replace("<pre>", "").replace("</pre>", "")

            temp = json.loads(temp)

            temp = pd.DataFrame(temp).transpose()

            self.result = self.result.append(temp)

        return self.data_output()
            
            
         
        
    
In [38]:
my = get_stock_data(["AAPL"],day_range=2)
my.get_ondemand_data().head()
Finished AAPL

100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.24s/it]

Out[38]:
TimeStamp Ticker tradingDay Open High Low close Volume Close Return
0 2018-09-21 13:30:00 AAPL 2018-09-21 220.78 221.0988 220.6601 221.0599 8036258 221.0599 NaN
1 2018-09-21 13:31:00 AAPL 2018-09-21 221.05 221.1700 220.1900 220.4000 153898 220.4000 -0.002994
2 2018-09-21 13:32:00 AAPL 2018-09-21 220.42 220.4600 220.1200 220.2200 160661 220.2200 -0.000817
3 2018-09-21 13:33:00 AAPL 2018-09-21 220.22 220.5650 219.9509 220.4600 257272 220.4600 0.001089
4 2018-09-21 13:34:00 AAPL 2018-09-21 220.47 220.7000 220.2906 220.6101 129467 220.6101 0.000680