Graphing/Plotting




There is a lot to say about graphing, or plotting. The package we use in Python is matplotlib, the counterpart of ggplot2 in R. To illustrate plotting, I also import numpy here to create some sample data.

In [4]:
import matplotlib.pyplot as plt
import numpy as np

You can simply create plots like this:

In [28]:
x = np.linspace(-np.pi, np.pi, 255, endpoint=True)
# numpy's linspace function is a convenient way to create evenly spaced x values

y = np.sin(x)
plt.plot(x,y)
plt.show()
In [32]:
# Of course, you can overlay plots

x = np.linspace(-np.pi, np.pi, 255, endpoint=True)
y = np.sin(x)
z = 2*x
plt.plot(x,y)
plt.plot(x,z)
plt.show()

As long as both plot() calls run before plt.show(), the lines are drawn on the same axes and overlap. Within each plot() call, x and y must have the same length; otherwise matplotlib raises an error.
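
Here is a minimal sketch of overlaying two lines, which also shows the error matplotlib raises when the x and y passed to a single plot() call differ in length. It assumes the non-interactive Agg backend so that it runs without opening a window; the variable names are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 255)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))   # first line
ax.plot(x, 2 * x)       # second line, drawn on the same axes
line_count = len(ax.lines)

# x has 255 points but y has only 100, so matplotlib refuses to plot
try:
    ax.plot(x, np.sin(x[:100]))
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
plt.close(fig)
```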

If you want to customize the output, you need to do more.

Creating Subplot

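You can have more than one plot on one canvas. The way to control this is the subplot() function.

The syntax is plt.subplot(nrows, ncols, index), where index counts from 1, left to right and then top to bottom. When all three numbers are single digits they can be written together, as in plt.subplot(221).

The top-left plot of a 2×2 grid is plt.subplot(221).
The plot to its right is plt.subplot(222).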

In [40]:
my_plot = plt.subplot(221) 
my_plot.plot(x,y)
# usually we store it in a variable for further formatting

my_plot2=plt.subplot(222) 
my_plot2.plot(x,z)

plt.show()
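
As an alternative sketch, plt.subplots (with an "s") creates the figure and the whole grid of axes in one call. The example below assumes the off-screen Agg backend and fills three of the four slots; the cosine curve is only there for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 255)

# plt.subplots creates the figure and a 2x2 array of axes in one call
fig, axes = plt.subplots(nrows=2, ncols=2)
axes[0, 0].plot(x, np.sin(x))   # same slot as plt.subplot(221)
axes[0, 1].plot(x, 2 * x)       # same slot as plt.subplot(222)
axes[1, 0].plot(x, np.cos(x))   # same slot as plt.subplot(223)
grid_shape = axes.shape
plt.close(fig)
```

Indexing the returned array by row and column can be easier to read than three-digit codes once the grid gets larger.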

Setting The Plot Space

In [42]:
# Set the canvas
# figsize is the (width, height) of the figure in inches; dpi is dots per inch
plt.figure(figsize=(8,5), dpi=80)

my_plot = plt.subplot(111)
my_plot.plot(x,z)

# You can also set how the plot is being framed

my_plot.spines['right'].set_color('none')
my_plot.spines['top'].set_color('none')
my_plot.xaxis.set_ticks_position('bottom')
my_plot.yaxis.set_ticks_position('left')

# Set where the left and bottom spines (the axis lines) are placed
my_plot.spines['left'].set_position(('axes',0))
my_plot.spines['bottom'].set_position(("axes",0))

##### What if we change "axes" to "data"?

plt.show()

And More

In [40]:
# Set the canvas
# figsize is the (width, height) of the figure in inches; dpi is dots per inch
plt.figure(figsize=(8,5), dpi=80)

my_plot = plt.subplot(111)

# You can also set color, line width, style and label
my_plot.plot(x,y,color="red", linewidth=1.5, linestyle="-", label="sine")

# You can also set how the plot is being framed

my_plot.spines['right'].set_color('none')
my_plot.spines['top'].set_color('none')
my_plot.xaxis.set_ticks_position('bottom')
my_plot.yaxis.set_ticks_position('left')

# Place the left and bottom spines at data coordinate 0, so the axes cross at the origin
my_plot.spines['left'].set_position(('data',0))
my_plot.spines['bottom'].set_position(("data",0))


# I can also manipulate the axes
plt.xlim(x.min()*1.1, x.max()*1.1) # set limits of current axis
plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],
           [r'$-\pi$', r'$-\pi/2$', r'$0$', r'$+\pi/2$', r'$+\pi$'])
plt.ylim(y.min()*1.1,y.max()*1.1)
plt.yticks([-1, 0, 1])  # ticks at the extremes and the zero of the sine curve
# annotate a specific point
plt.annotate(r'$\sin(\frac{\pi}{2})=1$',
             xy=(np.pi/2,1), xycoords='data',
             xytext=(60, 40), textcoords='offset points', fontsize=16,
             arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=.2"))

plt.show()

Scatter Plot and More

In [44]:
import time
np.random.seed(int(time.time()))
trial = np.random.rand(100)  # 100 uniform random numbers in [0, 1)
y = trial * 2
plt.scatter(trial, y)

# save the figure to an image file
plt.savefig("test.png", dpi=72)

Histogram

Let’s get some finance data for this example

In [1]:
from data_source_lib import *  # import our magic library
In [11]:
get_ins = get_stock_data(["AAPL"],freq= "daily",day_range=300)
my_data = get_ins.get_ondemand_data()["close"]
Finished AAPL
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.07it/s]
In [26]:
plt.figure(figsize=(8,5), dpi=80)
my_hist = plt.hist(my_data)
plt.xticks(range(50,300,10))
plt.show()
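
The data_source_lib module used above is not shown in this post, so here is a stand-alone sketch of the same histogram call. The prices are synthetic random numbers (an assumption, not real AAPL data), and the Agg backend is assumed so that no window is needed. It also shows what plt.hist returns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
# hypothetical closing prices: 300 values scattered around 100
fake_close = 100 + 5 * rng.randn(300)

fig = plt.figure(figsize=(8, 5), dpi=80)
# hist returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, patches = plt.hist(fake_close, bins=20)
plt.close(fig)
```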

Choosing The Right Python Data Type for Analysis


In [4]:
import datetime 

Dictionary

A dictionary is used when there is a mapping relationship. In stock market data, for example, prices are linked to a specific date.

In [94]:
stock = {"Header":["Open","High","Low","Close"],"12/12/2015":[32.03,50,40,32]}
In [95]:
stock["12/12/2015"]
Out[95]:
[32.03, 50, 40, 32]

Dictionaries provide some useful methods.

In [74]:
list(stock.keys())  # all keys, as a list
Out[74]:
['Header', '12/12/2015']
In [78]:
list(stock.items())  # all (key, value) pairs, as a list of tuples that represent the relation
Out[78]:
[('Header', ['Open', 'High', 'Low', 'Close']),
 ('12/12/2015', [32.03, 50, 40, 32])]
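
One possible refinement, sketched below with made-up prices, is to key the dictionary by datetime.date objects instead of strings. The names header, prices and row are hypothetical and only serve the illustration.

```python
import datetime

# hypothetical prices, keyed by date objects instead of strings
header = ["Open", "High", "Low", "Close"]
prices = {
    datetime.date(2015, 12, 11): [31.50, 33.00, 31.00, 32.03],
    datetime.date(2015, 12, 14): [32.03, 34.00, 31.80, 33.10],
}

# date keys compare chronologically, which "MM/DD/YYYY" strings do not
first_day = min(prices)
# pair each header name with its value for that day
row = dict(zip(header, prices[first_day]))
close_on_first_day = row["Close"]
```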

A dictionary can also be passed to the constructor of a pandas DataFrame. We will talk about this in later posts.

Tuple

Anything you want treated as a single unit should be a tuple, because a tuple is immutable. Tuples are also often used to pass a group of arguments to a function call.

In [68]:
x=12
y=23
z=23.6

point = (x,y,z) # a coordinate can be represented by a tuple because the
                # position of each element matters 

Some people also argue that a tuple is comparable to a struct in C/C++, since a tuple usually holds a heterogeneous collection.
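
If the struct comparison appeals to you, the standard library's collections.namedtuple gets even closer, because each field has a name. A small sketch, where Point is a hypothetical type made up for this example:

```python
from collections import namedtuple

# a tuple type whose fields also have names, similar to a C struct
Point = namedtuple("Point", ["x", "y", "z"])
point = Point(x=12, y=23, z=23.6)

by_name = point.z        # access by field name
by_position = point[2]   # still works like a plain tuple
x, y, z = point          # and unpacks like one
```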

As I mentioned, tuples are also used to represent relations in discrete math. Here is an example using a dictionary and tuples.

In [66]:
# for example, we have y = x^2
f = {1:1,2:4,3:9}
list(f.items())  # items() gives the (key, value) pairs of the dictionary, each one a tuple
Out[66]:
[(1, 1), (2, 4), (3, 9)]

List

A list is like an array in other programming languages, and you should use it wherever you would use an array. In Python, a list also comes with more built-in methods for working with its contents.

List as stack operation

In [14]:
l = []
l.append(1)
l.append(2)
l.append(3)
In [15]:
l
Out[15]:
[1, 2, 3]
In [22]:
l.pop()  # last in, first out: removes and returns the last element
Out[22]:
3
In [17]:
l
Out[17]:
[1, 2]
In [18]:
l.append(4)
l
Out[18]:
[1, 2, 4]
In [19]:
l.pop()
Out[19]:
4
In [20]:
l.pop()
Out[20]:
2
In [21]:
l
Out[21]:
[1]
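
To put the two orderings side by side: list.pop() with no argument takes from the end, so a list behaves like a stack. When you need first in, first out instead, collections.deque from the standard library is a common choice. A minimal sketch:

```python
from collections import deque

stack = [1, 2, 3]
top = stack.pop()          # list.pop() takes from the end: last in, first out

queue = deque([1, 2, 3])
front = queue.popleft()    # deque.popleft() takes from the front: first in, first out
```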

List sorting

In [32]:
l = [2,3,9,4]
l.sort()
In [33]:
l
Out[33]:
[2, 3, 4, 9]
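
Note that sort() changes the list in place. If you want to keep the original, the built-in sorted() returns a new list. The sketch below also shows the reverse and key arguments; the key function here is just an arbitrary example.

```python
l = [2, 3, 9, 4]

ascending = sorted(l)                  # returns a new list, l is unchanged
descending = sorted(l, reverse=True)
# key= sorts by a computed value, here the distance from 4
by_distance = sorted(l, key=lambda v: abs(v - 4))
```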

List Reversing

In [34]:
l.reverse()
In [35]:
l
Out[35]:
[9, 4, 3, 2]

List Extend Method

In [40]:
l.extend([1]) # takes an iterable and appends each of its elements to the end of the list
In [39]:
l
Out[39]:
[9, 4, 3, 2, 1]
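
The difference between extend and append is easy to mix up, so here is a short side-by-side sketch with two throwaway lists:

```python
a = [9, 4, 3, 2]
a.extend([1, 0])     # adds each element of the argument to the end

b = [9, 4, 3, 2]
b.append([1, 0])     # adds the whole list as one nested element
```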

Other List Methods

In [42]:
l.remove(1) # remove the first element equal to 1
l
Out[42]:
[9, 4, 3, 2]
In [52]:
l.index(4) # return the position of an element
Out[52]:
1
In [62]:
l.insert(0,2) # insert 2 into index 0
l
Out[62]:
[2, 9, 4, 3, 2]

We will talk about data structure usage in more depth later in my blog, when we go further into analysis.

Python Basic Data Structure — Set




Set







A set in Python is the same concept as a set in discrete math: it contains unique, unordered elements.

In Python, a set is defined by putting its elements inside “{}”. Note that an empty {} creates a dictionary, so use set() for an empty set.

In [2]:
s = {1,2,3}
s
Out[2]:
{1, 2, 3}
In [5]:
s = {"1231",(1,2,3),2} # A set can contain different data types

A set is unordered, so its elements cannot be accessed by index.

In [7]:
s[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-4e98c4f87897> in <module>()
----> 1 s[0]

TypeError: 'set' object does not support indexing

However, its elements can be accessed by iterating over it.

In [6]:
for ele in s:
    print(ele)
2
1231
(1, 2, 3)

If you build a set from repeated elements, each one is stored only once. For example:

In [12]:
s_1 = {1,1,(1,2),(1,2)}
s_1
Out[12]:
{1, (1, 2)}

You can turn other Python data structures into a set by using the set() function.

In [15]:
l = [1,2,3,3,2]
t = (2,2,2,1)
set(l)
Out[15]:
{1, 2, 3}
In [16]:
set(t)
Out[16]:
{1, 2}
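
Since a Python set follows the discrete math concept, the usual set operations are available as operators. A brief sketch with two small example sets:

```python
a = {1, 2, 3}
b = {3, 4}

union = a | b            # elements in either set
common = a & b           # elements in both sets
only_a = a - b           # elements in a but not in b
has_two = 2 in a         # membership test

# removing duplicates from a list; sorted() gives a predictable order back
unique_sorted = sorted(set([1, 2, 3, 3, 2]))
```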

Python Basic Data Structure — Tuple




Tuple







First of all, a tuple is defined by putting values inside “()” and using “,” to separate them. A tuple is sequential, meaning the ordering matters.

In [3]:
t=(1,2,3)
t
Out[3]:
(1, 2, 3)

Like a list or a dictionary, a tuple can hold values of different types.

In [4]:
t=("1",2.0,3)
t
Out[4]:
('1', 2.0, 3)
In [5]:
t[0]  #accessing tuple value is the same as list
Out[5]:
'1'

However, you cannot alter an element of a tuple, because a tuple is immutable. In discrete math, tuples usually represent functions (as ordered pairs) and other directed objects.

In [6]:
t[0]=2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-0c0cc230f53e> in <module>()
----> 1 t[0]=2

TypeError: 'tuple' object does not support item assignment
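
One practical consequence of immutability is that a tuple of hashable values can serve as a dictionary key or a set element, while a list cannot. A small sketch, where grid is a hypothetical lookup table:

```python
# hypothetical grid: (row, column) tuples as dictionary keys
grid = {(0, 0): "origin", (0, 1): "right"}
label = grid[(0, 1)]

# a list is mutable and therefore unhashable, so it cannot be a key
try:
    grid[[1, 1]] = "fails"
    error_name = None
except TypeError as err:
    error_name = type(err).__name__
```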

One thing to pay attention to is the singleton: to create a tuple with a single element, you need to leave a trailing comma. For example:

In [15]:
t2=(2)   # this won't give you a tuple
t2  
Out[15]:
2
In [16]:
type(t2)
Out[16]:
int
In [17]:
t3=(2,)  # you have to leave a comma
t3
Out[17]:
(2,)
In [19]:
type(t3)
Out[19]:
tuple
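
It is really the comma, not the parentheses, that builds a tuple. The sketch below shows this with a function that returns two values, then unpacks the result and passes a tuple as arguments to a function call; min_max is a hypothetical helper written for this example.

```python
def min_max(values):
    # the comma builds the tuple; the parentheses are optional here
    return min(values), max(values)

result = min_max([3, 1, 2])
low, high = result            # unpack into two names

args = (2, 10)
power = pow(*args)            # * spreads the tuple into positional arguments
```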

Python Basic Data Structure — Dictionary




Dictionary







In this part, we will show examples of dictionaries. A dictionary gives you a way to map one value (the key) to another.

In [2]:
my_dict = {"1":1,"2":2}
print(my_dict)
{'1': 1, '2': 2}

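Unlike a list, a dictionary is accessed by key rather than by position. You can use square brackets, as in my_dict["1"], or the get method, which returns None instead of raising an error when the key is missing.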

In [4]:
my_dict.get("1")
Out[4]:
1
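
Here is a short sketch comparing square-bracket access with get, including what happens when the key does not exist. The key "9" is simply one that is not in the dictionary.

```python
my_dict = {"1": 1, "2": 2}

by_bracket = my_dict["1"]          # raises KeyError if the key is missing
by_get = my_dict.get("1")
missing = my_dict.get("9")         # None, no exception
fallback = my_dict.get("9", 0)     # explicit default value

try:
    my_dict["9"]
    raised = False
except KeyError:
    raised = True
```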
In [20]:
my_dict.update({"1":2})

You can use the copy method to create a new dictionary (a shallow copy).

In [21]:
new_dict = my_dict.copy()
print(new_dict)
{'1': 2, '2': 2}
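
Keep in mind that copy() makes a shallow copy: the new dictionary has its own set of keys, but mutable values such as lists are still shared with the original. The sketch below contrasts it with copy.deepcopy from the standard library, using a made-up dictionary.

```python
import copy

original = {"prices": [1, 2]}
shallow = original.copy()
deep = copy.deepcopy(original)

original["prices"].append(3)   # mutate the inner list shared with the shallow copy
original["new"] = 0            # add a key to the original only
```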

You can use the keys and values methods to get the keys and the values, and wrap them in list() to get plain lists.

In [22]:
print (list(new_dict.keys()))

print (list(new_dict.values()))
['1', '2']
[2, 2]
In [30]:
new_dict["3"] =3
print(new_dict)
{'1': 2, '3': 3, '2': 2}

Python Basic Data Structure– List

List

The list is the most common type for handling data.

In [2]:
#generate list of 0 to 9 
my_list = list(range(10))
In [3]:
my_list
Out[3]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

As you can see, a list is enclosed in [ ].

In [4]:
# You can retrieve a list element by its index
my_list[0]
Out[4]:
0

List can hold different data types

In [5]:
# To add something to a list, use the append method
# This method can only append one element at a time
my_list.append("1")
In [6]:
my_list
Out[6]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, '1']
In [7]:
# You can also store a list in a list
my_list.append(["1",1.2032,0x2f2])
In [8]:
# when you print the list, the hexadecimal number is shown in decimal
my_list
Out[8]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, '1', ['1', 1.2032, 754]]

List Indexing

In [9]:
# remember index starts with 0
print (my_list[0])
print (my_list[11])
# You can also count backward, and it starts with -1
print (my_list[-1])
0
['1', 1.2032, 754]
['1', 1.2032, 754]

List Slicing

In [10]:
start_position = 0
end_position = 10

my_list[start_position:end_position]
Out[10]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [11]:
start_position = 2
end_position = 3

my_list[start_position:end_position]
Out[11]:
[2]
In [12]:
start_position = -5
end_position = -1

my_list[start_position:end_position]
Out[12]:
[7, 8, 9, '1']

Pay attention to these 3 examples. When slicing a list, the indices do not point at elements but at the boundaries between elements. That's why my_list[2:3] gives you only the element at index 2.
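
A few more slicing forms, sketched on a fresh list of 0 to 9, may help make the idea concrete: either end can be omitted, and an optional third number sets the step.

```python
my_list = list(range(10))

one_item = my_list[2:3]        # cut at positions 2 and 3
first_three = my_list[:3]      # omitted start means the beginning
last_two = my_list[-2:]        # omitted end means through the last element
every_other = my_list[::2]     # the third number is the step
reversed_copy = my_list[::-1]  # a negative step walks backward
```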