移動标準差以及移動平均值(movstd、movmean)

最近在工作中遇到這樣一個問題：

有一個序列長度為 n n n 的序列 T = [ t 0 , t 1 , … , t n − 1 ] T=[t_0, t_1, \dots, t_{n-1}] T=[t0,t1,…,tn−1]，給定一個窗大小 m ( m < = n ) m (m <= n) m(m<=n)，下标從0開始，計算窗大小的均值和标準差，即計算T[0:m-1]、T[1:m]、T[2:m+1]…T[n-m+1:n] 的平均值和标準差

移動标準差以及移動平均值(movstd、movmean)移動标準差以及移動平均值(movstd、movmean)總結

暴力解法

最簡單的無腦的方法就是暴力循環了，很明顯這種方法特别慢，時間複雜度為 O ( n ∗ m ) O(n*m) O(n∗m)

下面為你呈現暴力代碼

import numpy as np
import time

# generate time sequence
n = 1000 * 1000
m = 1000
T = np.random.rand(n)

# brute force
means = np.zeros(n - m + 1)
stds = np.zeros(n - m + 1)

start_time = time.time()
for i in range(n - m + 1):
    means[i] = np.mean( T[i:i+m] )
end_time = time.time()
print('Running time of brute force for mean is {}s'.format((end_time - start_time)))

start_time = time.time()
for i in range(n - m + 1):
    stds[i] = np.std( T[i:i+m] )
end_time = time.time()
print('Running time of brute force for std is {}s'.format((end_time - start_time)))

Running time of brute force for mean is 5.774143934249878s
Running time of brute force for std is 20.721827030181885s

movmean

有沒有辦法進行優化呢？這裡介紹移動标準差(movstd)和移動平均值(movmean)

先從移動平均值(movmean)開始，它很簡單并且符合直覺：在滑動的過程中，有很多重疊部分，我們可以利用重疊的部分，進而節約計算時間

移動标準差以及移動平均值(movstd、movmean)移動标準差以及移動平均值(movstd、movmean)總結

如上圖所示，計算時可以利用前一個均值，這樣就避免了不必要的加法操作，平均值的計算複雜度降低為 O ( n ) O(n) O(n)

μ i ∗ m = ( t i + t i + 1 + ⋯ + t i + m − 1 ) \mu_i*m = (t_i + t_{i+1} + \dots + t_{i+m-1}) μi∗m=(ti+ti+1+⋯+ti+m−1)

μ i + 1 ∗ m = μ i ∗ m − t i + t m \mu_{i+1}*m = \mu_i*m - t_i + t_m μi+1∗m=μi∗m−ti+tm

下面代碼顯示了如何實作 movmean

def movmean(T, m):
    assert(m <= T.shape[0])
    n = T.shape[0]
    
    sums = np.zeros(n - m + 1)
    sums[0] = np.sum(T[0:m])
    
    cumsum = np.cumsum(T)
    cumsum = np.insert(cumsum, 0, 0) # 在數組開頭插入一個0
    
    sums = cumsum[m:] - cumsum[:-m]
    return sums/m

start_time = time.time()
means_2 = movmean(T, m)
end_time = time.time()
print('Running time of movmean is {}s'.format((end_time - start_time)))

Running time of movmean is 0.009449005126953125s

movstd

在介紹移動标準差之前，我們先回顧下标準差計算公式：

σ = 1 m ∑ i = 1 m ( t i − u ) 2 \sigma = \sqrt{\frac{1}{m} \sum_{i=1}^m(t_i - u)^2} σ=m1i=1∑m(ti−u)2

假設有一個長度為 3 的序列 [a, b, c]，我們來計算一下它的标準差

首先計算均值：

μ = 1 m ( a + b + c ) \mu = \frac{1}{m}(a+b+c) μ=m1(a+b+c)

然後标準差：

σ = 1 3 ( ( a − μ ) 2 + ( b − μ ) 2 + ( c − μ ) 2 ) = 1 3 ( a 2 + b 2 + c 2 − 2 a μ − 2 b μ − 2 c μ + μ 2 ) = 1 3 ( a 2 + b 2 + c 2 ) − ( 1 3 ( a + b + c ) ) 2 \begin{array}{l} \sigma &= \sqrt{ \frac{1}{3} ((a-\mu)^2 + (b-\mu)^2 + (c-\mu)^2)} \\ &= \sqrt{ \frac{1}{3} (a^2 + b^2 + c^2 -2a\mu - 2b\mu - 2c\mu + \mu^2)} \\ &= \sqrt{ \frac{1}{3}(a^2+b^2+c^2) - (\frac{1}{3}(a+b+c))^2 } \\ \end{array} σ=31((a−μ)2+(b−μ)2+(c−μ)2)

=31(a2+b2+c2−2aμ−2bμ−2cμ+μ2)

=31(a2+b2+c2)−(31(a+b+c))2

我們可以發現，标準差的計算可以用累計和來表示，而累加和是可以在 O ( n ) O(n) O(n)時間内完成，這就是 movstd

下面的代碼展示了如何計算 movstd

def movstd(T, m):
    n = T.shape[0]
    
    cumsum = np.cumsum(T)
    cumsum_square = np.cumsum(T**2)
    
    cumsum = np.insert(cumsum, 0, 0)               # 在數組開頭插入一個0
    cumsum_square = np.insert(cumsum_square, 0, 0) # 在數組開頭插入一個0
    
    seg_sum = cumsum[m:] - cumsum[:-m]
    seg_sum_square = cumsum_square[m:] - cumsum_square[:-m]
    
    return np.sqrt( seg_sum_square/m - (seg_sum/m)**2 )

start_time = time.time()
stds_2 = movstd(T, m)
end_time = time.time()
print('Running time of movstd is {}s'.format((end_time - start_time)))

Running time of movstd is 0.03198814392089844s

總結

通過提前計算好累計和，移動平均和移動标準差以空間換時間，算法速度比起暴力方法提升了幾個數量級

移動标準差以及移動平均值(movstd、movmean)移動标準差以及移動平均值(movstd、movmean)總結

移動标準差以及移動平均值(movstd、movmean)

暴力解法

movmean

movstd

總結

繼續閱讀

【數學】HDU 1222 Wolf and Rabbit

多屬性決策模型詳解（matlab）

2020年高教社模組化國賽真題B題--穿越沙漠

網際網路面試題之趣味數學題

多元函數微分學強化班

【數學】MIT線性代數總結

ZOJ 1007 Numerical Summation of a Series

BWT (Burrows–Wheeler_transform) 解碼分析

Day07-數學基礎-經濟問題(DataWhale)

HDOJ 3664 Permutation Counting / UVALive 5092 DP

MACD起飛預警（附源碼可選股），不怕得罪和驚動主力，可慢慢埋伏？啟動時第一時間被發現，真的這麼難嗎？MACD金叉背離這

有關拉格朗日反演的擴充形式的證明

時序分析(9) -- 時序平滑（上)時序分析(9)

【CF 125D】 Two progressions 劃分等差數列

Codeforces Round #319 (Div. 2) B C D

2017多校訓練Contest4: 1003 Counting Divisors hdu6069