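1 Python Data Analysis: NumPy, Pandas, and the Tushare Financial Data Interface Package

Python Data Analysis

1 The NumPy Module

1.1 Introduction

NumPy (Numerical Python) is a foundational library for scientific computing that supports operations on multidimensional arrays and matrices.

1.2 The ndarray Object

1.2.1 Introduction

An ndarray is a multidimensional array that stores elements of a single type; it is a collection of same-typed data.

Every element of an ndarray occupies a storage region of the same size in memory.

Unlike a list, an array requires all of its elements to have the same type. When the input mixes types, NumPy promotes them in the order string > float > integer.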

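1.2.2 Creating ndarray Objects

1.2.2.1 numpy.array

numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

Parameter	Description
object	An array or a nested sequence
dtype	Data type of the elements
copy	Whether the object is copied
order	Memory layout of the new array: C is row-major, F is column-major, and K (the default) keeps the layout of the input where possible
subok	If False (the default), the result is forced to be a base-class array
ndmin	Minimum number of dimensions of the resulting array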

import numpy as np

# One-dimensional array
np.array([1, 2, 3])  # array([1, 2, 3])
# Two-dimensional array
np.array([[1, 2, 3], [4, 5, 6]])
'''
array([[1, 2, 3],
       [4, 5, 6]])
'''
# All elements must share one type; promotion order: string > float > integer
np.array([1, 1.2, '12'])  # array(['1', '1.2', '12'], dtype='<U32')
np.array([1, 1.2, 12])  # array([ 1. ,  1.2, 12. ])
# Specify the minimum number of dimensions
np.array([1, 2, 3], ndmin=5)  # array([[[[[1, 2, 3]]]]])
# Specify dtype
np.array([1, 2, 3], dtype=complex)  # array([1.+0.j, 2.+0.j, 3.+0.j])

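1.2.2.2 Array-creation routines provided by NumPy

numpy.empty

Creates an uninitialized array; its elements hold whatever values happen to be in memory.

arr = np.empty([3, 2], dtype=int)
arr.fill(100)  # fills in place; arr now holds:
'''
array([[100, 100],
       [100, 100],
       [100, 100]])
'''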

numpy.zeros

np.zeros(shape=(2, 3))  # float by default
'''
array([[0., 0., 0.],
       [0., 0., 0.]])
'''
np.zeros(3, dtype=int)  # integer dtype (the np.int alias was removed in NumPy 1.24)
'''
array([0, 0, 0])
'''

numpy.ones

np.ones(3)  # float by default
'''
array([1., 1., 1.])
'''
np.ones(3, dtype=int)  # integer dtype
'''
array([1, 1, 1])
'''

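np.linspace

Returns a one-dimensional arithmetic sequence with a given number of elements; the stop value is included by default.

np.linspace(start, stop, num)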

np.linspace(0, 20, num=10)
'''
array([ 0.        ,  2.22222222,  4.44444444,  6.66666667,  8.88888889,
       11.11111111, 13.33333333, 15.55555556, 17.77777778, 20.        ])
'''

np.arange

Returns a one-dimensional arithmetic sequence with a given step; the stop value is excluded.

np.arange(start, stop, step)
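The original gives no example for np.arange, so here is a minimal sketch. The values are chosen purely for illustration; the point is the contrast between a step-based and a count-based sequence.

```python
import numpy as np

# Step-based sequence: start at 0, stop before 10, step 2
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# Count-based sequence for comparison: 5 points from 0 to 10, stop included
points = np.linspace(0, 10, num=5)
print(points.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]
```

With a non-integer step, np.linspace is usually the safer choice, because floating-point rounding can make the length of an np.arange result hard to predict.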

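np.random.randint

Returns an array of random integers drawn from a given range; the lower bound is included and the upper bound is excluded.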

np.random.randint(0, 100, size=(2, 3))
'''
array([[84, 26, 20],
       [10, 88,  4]])
'''

np.random.random

Returns an array of random floats in the half-open interval [0, 1).

np.random.random(size=(2, 3))
'''
array([[0.34844375, 0.44087602, 0.82370203],
       [0.04277734, 0.8713185 , 0.57144526]])
'''

1.2.2.3 matplotlib.pyplot

An ndarray can also be created from image data.

import matplotlib.pyplot as plt

# imread returns a NumPy array.
img_arr = plt.imread('./test.jpg')
# Display the image from the NumPy array.
plt.imshow(img_arr)

1.2.3 Basic NumPy Types

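arr0 = np.array([1, 2, 3], dtype='float32')  # array([1., 2., 3.], dtype=float32)
# astype returns a converted copy; arr0 itself is unchanged
arr1 = arr0.astype('int8')
arr1  # array([1, 2, 3], dtype=int8)
arr0  # array([1., 2., 3.], dtype=float32)

# Assigning to dtype reinterprets the same bytes in place instead of converting the values
# (the result shown is for a little-endian machine)
arr0.dtype = 'int16'
arr0  # array([    0, 16256,     0, 16384,     0, 16448], dtype=int16)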

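1.2.4 Attributes of ndarray Objects

Attribute	Description
ndim	Rank, i.e. the number of axes (dimensions)
shape	Size of the array along each dimension
size	Total number of elements
dtype	Element type of the array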

arr = np.random.random(size=(2, 3))

arr.ndim  # 2
arr.shape  # (2, 3)
arr.size  # 6
arr.dtype  # dtype('float64')

1.3 Operating on ndarray Objects

1.3.1 Indexing

arr = np.random.randint(0, 100, size=(3, 4))
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9],
       [47, 12,  8, 47]])
'''
arr[1][3]  # 9

1.3.2 Slicing

# Take the first two rows
arr[0:2]
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9]])
'''
# Take the first two columns
arr[:, 0:2]
'''
array([[36, 50],
       [71,  3],
       [47, 12]])
'''
# Take the first two rows of the first two columns
arr[0:2, 0:2]
'''
array([[36, 50],
       [71,  3]])
'''

1.3.3 Flipping

arr
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9],
       [47, 12,  8, 47]])
'''
# Reverse the order of the rows
arr[::-1]
'''
array([[47, 12,  8, 47],
       [71,  3,  8,  9],
       [36, 50, 31, 15]])
'''
# Reverse the order of the columns
arr[:, ::-1]
'''
array([[15, 31, 50, 36],
       [ 9,  8,  3, 71],
       [47,  8, 12, 47]])
'''
# Reverse both rows and columns
arr[::-1, ::-1]
'''
array([[47,  8, 12, 47],
       [ 9,  8,  3, 71],
       [15, 31, 50, 36]])
'''

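1.3.4 Flip Example: Flipping an Image

img_arr.shape  # (626, 413, 3)
# Flip the image vertically (the two forms are equivalent)
plt.imshow(img_arr[::-1])
plt.imshow(img_arr[::-1, :, :])
# Flip the image horizontally
plt.imshow(img_arr[:, ::-1, :])
# Reverse the order of the color channels (RGB -> BGR), which alters the colors
plt.imshow(img_arr[:, :, ::-1])
# Crop the image
plt.imshow(img_arr[170:390, 100:320, :])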

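1.3.5 Reshaping with reshape

Reshaping cannot change the number of elements in the array.

arr.shape  # (3, 4)
# Two-dimensional array => one-dimensional array
arr1 = arr.reshape((12,))  # array([36, 50, 31, 15, 71,  3,  8,  9, 47, 12,  8, 47])
# Reshape to a different two-dimensional shape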
arr2 = arr.reshape(2, 6)
'''
array([[36, 50, 31, 15, 71,  3],
       [ 8,  9, 47, 12,  8, 47]])
'''

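1.3.6 Joining Arrays with concatenate

Joins several NumPy arrays either horizontally (side by side) or vertically (one on top of another).

numpy.concatenate((a1, a2, ...), axis)

a1, a2, …: arrays whose shapes match except along the joining axis;

axis: the axis to join along; 0 (the default) joins vertically, 1 joins horizontally.

With axis=0 the arrays are stacked vertically, so they must have the same number of columns;

with axis=1 the arrays are placed side by side, so they must have the same number of rows.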

arr1 = np.random.randint(0, 100, size=(2, 1))
arr2 = np.random.randint(0, 100, size=(2, 3))
np.concatenate((arr1, arr2), axis=1)
'''
array([[83, 24, 42, 66],
       [96, 66, 25, 52]])
'''
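The example above joins along axis=1. As a complement, the sketch below joins along axis=0 and shows what happens when the shape requirement is not met. The array names and values are made up for illustration.

```python
import numpy as np

top = np.array([[1, 2, 3]])                 # shape (1, 3)
bottom = np.array([[4, 5, 6], [7, 8, 9]])   # shape (2, 3)

# axis=0 stacks rows, so both inputs need the same number of columns
stacked = np.concatenate((top, bottom), axis=0)
print(stacked.shape)  # (3, 3)

# Joining along axis=1 fails here because the row counts differ (1 vs 2)
try:
    np.concatenate((top, bottom), axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```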

1.3.7 Concatenation Example: A 3x3 Image Grid

img_arr_3 = np.concatenate((img_arr, img_arr, img_arr), axis=1)  # join horizontally
img_arr_9 = np.concatenate((img_arr_3, img_arr_3, img_arr_3), axis=0)  # join vertically
plt.imshow(img_arr_9)

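1.4 Functions

1.4.1 Statistical Functions

1.4.1.1 amin and amax

numpy.amin() returns the minimum of the elements along a given axis.

numpy.amax() returns the maximum of the elements along a given axis.

The axis parameter selects the direction: 0 gives one result per column, 1 gives one result per row. If axis is omitted, the result is computed over the whole array.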

arr
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9],
       [47, 12,  8, 47]])
'''
np.amin(arr, axis=0)  # array([36,  3,  8,  9])
np.amin(arr, axis=1)  # array([15,  3,  8])
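Only np.amin is demonstrated above. A matching sketch for np.amax on the same array, including the case where axis is omitted, could look like this:

```python
import numpy as np

arr = np.array([[36, 50, 31, 15],
                [71,  3,  8,  9],
                [47, 12,  8, 47]])

col_max = np.amax(arr, axis=0)  # one value per column
row_max = np.amax(arr, axis=1)  # one value per row
overall = np.amax(arr)          # axis omitted: maximum of the whole array

print(col_max)  # [71 50 31 47]
print(row_max)  # [50 71 47]
print(overall)  # 71
```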

1.4.1.2 Range: ptp

numpy.ptp() computes the difference between the maximum and the minimum along a given axis, i.e. max - min.

arr
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9],
       [47, 12,  8, 47]])
'''
np.ptp(arr, axis=0)  # array([35, 47, 23, 38])
np.ptp(arr, axis=1)  # array([35, 68, 39])

1.4.1.3 Median: median

numpy.median() computes the median.

arr
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9],
       [47, 12,  8, 47]])
'''
np.median(arr, axis=0)  # array([47., 12.,  8., 15.])
np.median(arr, axis=1)  # array([33.5,  8.5, 29.5])

1.4.1.4 Arithmetic Mean: mean

numpy.mean() computes the arithmetic mean.

arr
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9],
       [47, 12,  8, 47]])
'''
np.mean(arr, axis=0)  # array([51.33333333, 21.66666667, 15.66666667, 23.66666667])
np.mean(arr, axis=1)  # array([33.  , 22.75, 28.5 ])

1.4.1.5 Variance: var

The variance is the mean of the squared differences between each sample and the mean of all samples, i.e. mean((x - x.mean())**2).

arr
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9],
       [47, 12,  8, 47]])
'''
np.var(arr, axis=0)  # array([213.55555556, 414.88888889, 117.55555556, 278.22222222])
np.var(arr, axis=1)  # array([156.5   , 781.1875, 344.25  ])
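To connect the formula with the function, the following sketch recomputes the row-wise variance by hand and compares it with np.var. It relies on np.var dividing by the number of samples by default (the population variance).

```python
import numpy as np

arr = np.array([[36, 50, 31, 15],
                [71,  3,  8,  9],
                [47, 12,  8, 47]])

# keepdims=True keeps the row means as a (3, 1) column so they broadcast per row
row_means = arr.mean(axis=1, keepdims=True)
manual_var = np.mean((arr - row_means) ** 2, axis=1)

print(manual_var.tolist())                           # [156.5, 781.1875, 344.25]
print(np.allclose(manual_var, np.var(arr, axis=1)))  # True
```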

1.4.1.6 Standard Deviation: std

The standard deviation is the square root of the variance and measures how spread out a set of data is.

std = sqrt(mean((x - x.mean())**2))

arr
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9],
       [47, 12,  8, 47]])
'''
np.std(arr, axis=0)  # array([14.61354014, 20.36882149, 10.84230398, 16.67999467])
np.std(arr, axis=1)  # array([12.509996  , 27.94973166, 18.55397532])

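1.4.2 Mathematical Functions

1.4.2.1 Trigonometric Functions

The standard trigonometric functions are sin(), cos(), and tan(); they take angles in radians.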

a = np.array([0, 45, 60, 270, 540])
np.sin(a*np.pi/180)  # array([ 0.00000000e+00,  7.07106781e-01,  8.66025404e-01, -1.00000000e+00, 3.67394040e-16])
np.cos(a*np.pi/180)  # array([ 1.00000000e+00,  7.07106781e-01,  5.00000000e-01, -1.83697020e-16, -1.00000000e+00])
np.tan(a*np.pi/180)  # array([ 0.00000000e+00,  1.00000000e+00,  1.73205081e+00,  5.44374645e+15, -3.67394040e-16])

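1.4.2.2 Rounding: around

numpy.around() returns its input rounded to a given number of decimals.

The decimals parameter is the number of decimal places to round to and defaults to 0; if it is negative, rounding is applied to the corresponding position to the left of the decimal point.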

a = np.array([1.0, 5.55, 123, 0.567, 25.532])  
np.around(a)  # array([  1.,   6., 123.,   1.,  26.])
np.around(a, decimals=1)  # array([  1. ,   5.6, 123. ,   0.6,  25.5])
np.around(a, decimals=-1)  # array([  0.,  10., 120.,   0.,  30.])
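One detail worth knowing: for values exactly halfway between two candidates, NumPy rounds to the nearest even value rather than always rounding up. A small sketch, with inputs chosen here to make the effect visible:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

# Exact halves go to the nearest even integer, not always upward
rounded = np.around(halves)
print(rounded)  # [0. 2. 2. 4.]
```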

1.4.3 Linear Algebra and Matrices

1.4.3.1 Matrix Product: dot

For two-dimensional arrays, numpy.dot() computes the matrix product.

np.dot([[2,1], [4,3]], [[1,2], [1,0]])
'''
array([[3, 4],
       [7, 8]])
'''

1.4.3.2 Transpose: T

arr
'''
array([[36, 50, 31, 15],
       [71,  3,  8,  9],
       [47, 12,  8, 47]])
'''
arr.T
'''
array([[36, 71, 47],
       [50,  3, 12],
       [31,  8,  8],
       [15,  9, 47]])
'''

1.4.3.3 A Brief Look at the numpy.matlib Matrix Library

NumPy includes a matrix library, numpy.matlib, whose functions return matrix objects rather than ndarray objects.

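2 Pandas

Pandas is a toolkit for analyzing structured data. It is used for data mining and data analysis, and it also provides data-cleaning features.

Data structures

Data structure	Dimensions	Description
Series	1	Labeled, homogeneous one-dimensional array
DataFrame	2	Labeled, size-mutable, heterogeneous two-dimensional table

2.1 Series

2.1.1 Introduction

A Series is an object similar to a one-dimensional array, made up of data (a NumPy ndarray) and an associated index.

Imports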

import pandas as pd
import numpy as np
from pandas import Series, DataFrame

2.1.2 Creating a Series

list1 = [1, 2, 3, 4, 5]
Series(data=list1)  # implicit index
'''
0    1
1    2
2    3
3    4
4    5
dtype: int64
'''

dict1 = {
    'A': 100,
    'B': 99,
    'C': 120,
}
Series(data=dict1)  # explicit index
'''
A    100
B     99
C    120
dtype: int64
'''

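2.1.3 Indexing and Slicing

Setting the index

Implicit index: the positional index generated automatically when none is given (0, 1, 2, …);

Explicit index: a custom index, set through the index parameter or by passing a dict.

s = Series(data=np.random.randint(0, 100, size=(3,)), index=['A', 'B', 'C'])
'''
A    44
B    90
C    39
dtype: int32
'''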

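Selecting by index

s.iloc[0]  # 44, by position (plain s[0] on a labeled Series is deprecated in recent pandas)
s['A']  # 44
s.A  # 44

Selecting by slice

s[0: 3]
s['A': 'D']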
'''
A    44
B    90
C    39
dtype: int32
'''
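Positional and label-based slices treat their end point differently, which the two slices above do not make obvious. The sketch below uses the explicit .iloc and .loc accessors on the same three-element Series to show the difference.

```python
import pandas as pd

s = pd.Series([44, 90, 39], index=['A', 'B', 'C'])

by_position = s.iloc[0:2]   # positions 0 and 1; the stop position is excluded
by_label = s.loc['A':'B']   # labels 'A' through 'B'; the stop label is included

print(by_position.index.tolist())  # ['A', 'B']
print(by_label.index.tolist())     # ['A', 'B']
print(s.loc['A':'C'].tolist())     # [44, 90, 39]
```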

2.1.4 Attributes

# Index
s.index  # Index(['A', 'B', 'C'], dtype='object')
# Values
s.values  # array([44, 90, 39])

s.size  # 3
s.shape  # (3,)

2.1.5 Common Methods

head and tail

s.head(2)
'''
A    44
B    90
dtype: int32
'''

s.tail(2)
'''
B    90
C    39
dtype: int32
'''

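unique and nunique

s = Series(data=[1,1,2,2,3,3,3,3,3,3,4,5,6,7,7,7])
# Distinct values, with duplicates removed
s.unique()  # array([1, 2, 3, 4, 5, 6, 7])
# Number of distinct values
s.nunique()  # 7

Arithmetic

Elements with matching index labels are combined; labels present on only one side produce NaN.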

s1 = Series(data=[1,2,3,4,5], index=['a','b','c','d','e'])
s2 = Series(data=[1,2,3,4,5], index=['a','b','f','d','e'])
s3 = s1 + s2
'''
a     2.0
b     4.0
c     NaN
d     8.0
e    10.0
f     NaN
dtype: float64
'''
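If NaN is not the desired result for unmatched labels, one option is the add method with fill_value, which substitutes a default for the missing side. This is a sketch of that alternative, not something the original text uses:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
s2 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'f', 'd', 'e'])

# A label missing from one Series is treated as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total.to_dict())
# {'a': 2.0, 'b': 4.0, 'c': 3.0, 'd': 8.0, 'e': 10.0, 'f': 3.0}
```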

isnull and notnull

# Test whether each element of the Series is missing: True if missing, False otherwise.
s3.isnull()
'''
a    False
b    False
c     True
d    False
e    False
f     True
dtype: bool
'''
# Select the missing entries
s3[s3.isnull()]
'''
c   NaN
f   NaN
dtype: float64
'''

# Test whether each element of the Series is present: True if present, False otherwise.
s3.notnull()
'''
a     True
b     True
c    False
d     True
e     True
f    False
dtype: bool
'''
# Select the non-missing entries (data cleaning).
s3[s3.notnull()]
'''
a     2.0
b     4.0
d     8.0
e    10.0
dtype: float64
'''

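2.2 DataFrame

2.2.1 Introduction

A DataFrame is the tabular data structure in Pandas. It holds an ordered set of columns, each of which may have a different data type (numeric, string, boolean, and so on), and can be viewed as a dict of Series.

Row index: index
Column index: columns
Values: values

2.2.2 Creating a DataFrame

From an ndarray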

df = DataFrame(data=np.random.randint(0, 100, size=(5, 6)), columns=['a','b','c','d','e','f'], index=['A','B','C','D','E'])
'''
    a	b	c	d	e	f
A	16	49	89	28	14	17
B	86	35	95	90	4	85
C	88	67	57	13	1	76
D	31	34	62	30	52	89
E	92	56	98	20	1	16
'''

From a dict

dict1 = {
    'name': ['A', 'B', 'C'],
    'salary': [10000, 20000, 30000]
}
df = DataFrame(data=dict1, index=['a', 'b', 'c'])
'''
   name	salary
a	A	10000
b	B	20000
c	C	30000
'''

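Exercise

Build a DataFrame named score_df from the exam score table below (row labels: 國文 Chinese, 數學 math, 英語 English, 理綜 integrated science).

	張三	李四
國文	150	0
數學	150	0
英語	150	0
理綜	300	0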

score_dict = {
    '張三': [150, 150, 150, 300],
    '李四': [0, 0, 0, 0]
}
score_df = DataFrame(data=score_dict, index=['國文', '數學', '英語', '理綜'])

2.2.3 Attributes

dict1 = {
    'name': ['A', 'B', 'C'],
    'salary': [10000, 20000, 30000]
}
df = DataFrame(data=dict1, index=['a','b','c'])

df.values
'''
array([['A', 10000],
       ['B', 20000],
       ['C', 30000]], dtype=object)
'''

df.columns
'''
Index(['name', 'salary'], dtype='object')
'''

df.index
'''
Index(['a', 'b', 'c'], dtype='object')
'''

df.shape  # (3, 2)

2.2.4 Indexing

df
'''
	    張三	李四
國文	150	    0
數學	150	    0
英語	150	    0
理綜	300	    0
'''

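Selecting columns by label.

df['張三']
df[['張三', '李四']]

Selecting rows with iloc and loc.

iloc selects rows by implicit (positional) index;

loc selects rows by explicit (label) index.

df.loc['國文']
df.iloc[0]
df.iloc[[1, 2, 3]]

Selecting elements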

df.loc['數學', '張三']  # 150
df.iloc[1, 0]  # 150

df.iloc[[0, 2], 0]
'''
國文    150
英語    150
Name: 張三, dtype: int64
'''

2.2.5 Slicing

Slicing rows

df[1: 3]
'''
	    張三	李四
數學	150	     0
英語	150	     0
'''

Slicing columns

df.iloc[:, 0: 1]
'''
	    張三
國文	150
數學	150
英語	150
理綜	300
'''

Indexing
df[col]: select a column
df.loc[index]: select a row
df.iloc[index, col]: select an element

Slicing
df[index1: index3]: slice rows
df.iloc[:, col1: col3]: slice columns

2.2.6 Exercises

Initial data

# Midterm exam scores
midterm_score_dict = {
    '張三': [150, 150, 150, 300],
    '李四': [0, 0, 0, 0]
}
midterm_score_df = DataFrame(data=midterm_score_dict, index=['國文', '數學', '英語', '理綜'])

# Final exam scores
final_score_dict = {
    '張三': [100, 90, 90, 100],
    '李四': [0, 0, 0, 0]
}
final_score_df = DataFrame(data=final_score_dict, index=['國文', '數學', '英語', '理綜'])

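Compute the average of the midterm and final scores.

張三 was caught cheating on the midterm math exam; record that score as 0.

李四 reported the cheating and is rewarded with 100 extra points in every midterm subject.

Add 10 points to every midterm subject for every student.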
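The original lists the four tasks without solutions. One possible set of answers is sketched below, assuming the tasks are applied in the order given and that the average is taken before any scores are adjusted.

```python
import pandas as pd

subjects = ['國文', '數學', '英語', '理綜']
midterm_score_df = pd.DataFrame(
    {'張三': [150, 150, 150, 300], '李四': [0, 0, 0, 0]}, index=subjects)
final_score_df = pd.DataFrame(
    {'張三': [100, 90, 90, 100], '李四': [0, 0, 0, 0]}, index=subjects)

# 1. Element-wise average; rows and columns are aligned by label
average_df = (midterm_score_df + final_score_df) / 2

# 2. Set a single cell by row label and column label
midterm_score_df.loc['數學', '張三'] = 0

# 3. Add 100 to every row of one column
midterm_score_df['李四'] += 100

# 4. Add 10 to every cell of the table
midterm_score_df += 10

print(average_df)
print(midterm_score_df)
```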

2.2.7 Converting Time Data Types

pd.to_datetime(col)

Prepare the data

info_dict = {
    'name': ['Jay', 'Tom', 'Bobo'],
    'hire_date': ['2010-10-11', '2012-12-01', '2011-11-12'],
    'salary': [10000, 20000, 30000]
}
df = DataFrame(data=info_dict)
'''
    name	hire_date	salary
0	Jay	    2010-10-11	10000
1	Tom	    2012-12-01	20000
2	Bobo	2011-11-12	30000
'''

Inspect the column types

df.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   name       3 non-null      object
 1   hire_date  3 non-null      object
 2   salary     3 non-null      int64 
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes
'''

Convert the time data from strings to a datetime type
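The conversion step itself is not shown in the original. A minimal sketch using pd.to_datetime, as named at the top of this section, might be:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jay', 'Tom', 'Bobo'],
    'hire_date': ['2010-10-11', '2012-12-01', '2011-11-12'],
    'salary': [10000, 20000, 30000],
})

# Replace the string column with parsed timestamps
df['hire_date'] = pd.to_datetime(df['hire_date'])

# The column now exposes datetime accessors such as .dt.year
print(df['hire_date'].dt.year.tolist())  # [2010, 2012, 2011]
```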

Inspect the column types again

df.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   name       3 non-null      object        
 1   hire_date  3 non-null      datetime64[ns]
 2   salary     3 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 200.0+ bytes
'''

2.2.8 Using a Column as the Row Index

Set the hire_date column as the row index.

new_df = df.set_index('hire_date')
'''
	        name	salary
hire_date		
2010-10-11	Jay	    10000
2012-12-01	Tom	    20000
2011-11-12	Bobo	30000
'''

new_df.shape  # (3, 2)

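3 The Tushare Financial Data Interface Package

3.1 Overview

Tushare is a financial data interface package whose main purpose is to provide stock and other financial data in a form that is easy to analyze. Most of what Tushare returns is a Pandas DataFrame, which makes it convenient to analyze and visualize the data with Pandas, NumPy, and Matplotlib.

Installing Tushare

python -m pip install tushare

3.2 Stock Analysis

Requirements:

Use Tushare to fetch roughly the last ten years of price data for Kweichow Moutai (貴州茅台) [600519];
Print every date on which the stock closed more than 3% above its open;
Print every date on which the stock opened more than 2% below the previous day's close;
Suppose that, starting on January 1, 2010, you buy 1 lot on the first trading day of every month and sell everything on the last trading day of every year. What is the profit to date?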

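3.2.1 Question 1

Use the Tushare package to fetch the historical price data of a stock (600519).

Imports

import tushare as ts
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

Fetch the stock's historical price data with Tushare

Persist the data to storage

Load the data from an external file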

df = pd.read_csv('./maotai.csv')
df.head()

Drop the Unnamed: 0 column

Note that in the drop family of functions, axis=0 refers to rows and axis=1 refers to columns.
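The sketch below shows where a column named Unnamed: 0 typically comes from and how drop removes it. The two-row table is a hypothetical stand-in for the real price data, and an in-memory buffer replaces the CSV file so the example is self-contained.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for illustration
prices = pd.DataFrame({'date': ['2010-01-04', '2010-01-05'],
                       'open': [10.0, 11.0]})

# Writing without index=False stores the row index as an unnamed first column
buffer = io.StringIO()
prices.to_csv(buffer)
buffer.seek(0)

loaded = pd.read_csv(buffer)
print(loaded.columns.tolist())  # ['Unnamed: 0', 'date', 'open']

# axis=1 tells drop to look for the label among the columns
loaded = loaded.drop(labels='Unnamed: 0', axis=1)
print(loaded.columns.tolist())  # ['date', 'open']
```

Passing index=False to to_csv when saving would avoid the extra column in the first place.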

Inspect the data.

df.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2551 entries, 0 to 2550
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   date    2551 non-null   object 
 1   open    2551 non-null   float64
 2   close   2551 non-null   float64
 3   high    2551 non-null   float64
 4   low     2551 non-null   float64
 5   volume  2551 non-null   float64
 6   code    2551 non-null   int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 139.6+ KB
'''

Type conversion: convert the string dates in the date column to a datetime type.

df['date'].dtype  # dtype('<M8[ns]')

df.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2551 entries, 0 to 2550
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    2551 non-null   datetime64[ns]
 1   open    2551 non-null   float64       
 2   close   2551 non-null   float64       
 3   high    2551 non-null   float64       
 4   low     2551 non-null   float64       
 5   volume  2551 non-null   float64       
 6   code    2551 non-null   int64         
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 139.6 KB
'''

Set the date column as the row index of the data.

df.info()
'''
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2551 entries, 2010-01-04 to 2020-07-13
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    2551 non-null   float64
 1   close   2551 non-null   float64
 2   high    2551 non-null   float64
 3   low     2551 non-null   float64
 4   volume  2551 non-null   float64
 5   code    2551 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 139.5 KB
'''

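3.2.2 Question 2

Print every date on which the stock closed more than 3% above its open.

(close - open) / open > 0.03
(df['close'] - df['open']) / df['open'] > 0.03

Note: whenever an operation on df yields a boolean Series, the natural next step is to use it as a row filter on the original data.

Select the rows that meet the condition

Select the dates that meet the condition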

df.loc[(df['close'] - df['open']) / df['open'] > 0.03].index
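To make the boolean-filter step concrete, here is a sketch on a tiny table. The prices and dates are invented for illustration and are not taken from the real data.

```python
import pandas as pd

# Hypothetical prices, invented for illustration
df = pd.DataFrame(
    {'open': [100.0, 100.0, 100.0], 'close': [101.0, 104.0, 99.0]},
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
)

# Intraday gain per row: 0.01, 0.04, -0.01
gain = (df['close'] - df['open']) / df['open']
mask = gain > 0.03            # boolean Series aligned with df's rows

rows = df.loc[mask]           # rows that meet the condition
dates = rows.index            # the dates of those rows
print(dates.strftime('%Y-%m-%d').tolist())  # ['2020-01-03']
```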

3.2.3 Question 3

Print every date on which the stock opened more than 2% below the previous day's close.

(open - previous close) / previous close < -0.02

df['close'].shift(1).head()
'''
date
2010-01-04        NaN
2010-01-05    108.446
2010-01-06    108.127
2010-01-07    106.417
2010-01-08    104.477
Name: close, dtype: float64
'''

(df['open'] - df['close'].shift(1)) / df['close'].shift(1) < -0.02
df.loc[(df['open'] - df['close'].shift(1)) / df['close'].shift(1) < -0.02]
df.loc[(df['open'] - df['close'].shift(1)) / df['close'].shift(1) < -0.02].index
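The role of shift(1) is easier to see on a small table. The sketch below uses invented prices; the first row has no previous close, so its comparison evaluates to False and the row is never selected.

```python
import pandas as pd

# Hypothetical prices, invented for illustration
df = pd.DataFrame(
    {'open': [100.0, 97.0, 103.0], 'close': [100.0, 102.0, 101.0]},
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
)

# shift(1) moves each close down one row, lining it up with the next day's open
prev_close = df['close'].shift(1)            # NaN, 100.0, 102.0
gap = (df['open'] - prev_close) / prev_close

mask = gap < -0.02                           # NaN compares as False
dates = df.loc[mask].index
print(dates.strftime('%Y-%m-%d').tolist())   # ['2020-01-03']
```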

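3.2.4 Question 4

Suppose that, starting on January 1, 2010, you buy 1 lot (100 shares) on the first trading day of every month and sell everything on the last trading day of every year. What is the profit to date?

Analysis:

Buying
On the first trading day of each month, buy one lot (100 shares) at the opening price,
so one year of buying amounts to 12 months * 100 shares = 1200 shares.

Selling
On the last trading day of each year (around 12-31), sell all shares at the opening price,
so 1200 shares are sold each year.

It is now July 2020, so only 700 shares can be bought in 2020 and none of them can be sold yet. The value of these remaining shares must be included when computing the total profit.

Resampling the data with resample

# Row for the first trading day of each month.
df_monthly = new_df.resample(rule='M').first()

Compute the total amount spent buying shares

Compute the income from selling shares; A stands for year.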

df_yearly = new_df.resample('A').last()[0:-1]
recv = df_yearly['open'].sum() * 1200  # 4368184.8

Compute the value of the remaining shares

Compute the total profit
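The code for the buying cost, the remaining value, and the total is missing from the original. The sketch below is one way to put the whole calculation together. It swaps resample for groupby on the index year and month, assumes the last year in the data is incomplete (as 2020 is in the text), and runs on a handful of invented prices rather than the real data. The function name and the sample values are hypothetical.

```python
import pandas as pd

def monthly_buy_yearly_sell(prices, lot=100):
    """Profit of buying one lot per month and selling everything each year end.

    prices: DataFrame sorted by a DatetimeIndex, with 'open' and 'close' columns.
    Assumes the last year in the data is incomplete and is therefore not sold.
    """
    # First trading day of each (year, month) pair: one purchase per month
    monthly = prices.groupby([prices.index.year, prices.index.month]).head(1)
    cost = monthly['open'].sum() * lot

    # Last trading day of each year, excluding the final, incomplete year
    last_year = prices.index.year.max()
    yearly = prices.groupby(prices.index.year).tail(1)
    sell_days = yearly[yearly.index.year < last_year]

    # Shares held at each year end = purchases made during that year * lot
    buys_per_year = monthly.groupby(monthly.index.year).size()
    shares_sold = buys_per_year.reindex(sell_days.index.year).to_numpy() * lot
    income = (sell_days['open'].to_numpy() * shares_sold).sum()

    # Shares bought in the final year are valued at the latest close
    remaining_value = prices['close'].iloc[-1] * buys_per_year.loc[last_year] * lot
    return income + remaining_value - cost

# Hypothetical prices, invented for illustration
sample = pd.DataFrame(
    {'open': [10.0, 12.0, 14.0, 20.0, 15.0, 16.0],
     'close': [10.0, 12.0, 14.0, 20.0, 15.0, 18.0]},
    index=pd.to_datetime(['2019-01-02', '2019-02-01', '2019-12-02',
                          '2019-12-31', '2020-01-02', '2020-02-03']),
)
profit = monthly_buy_yearly_sell(sample)
print(profit)  # 2900.0
```

On the sample, the five purchases cost 6700, the 300 shares sold at the end of 2019 bring in 6000, and the 200 shares still held are worth 3600, which gives 2900.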
