Table of Contents
- Learning Algorithm
- Viterbi Learning Algorithm
- Baum-Welch Learning Algorithm
- Python Implementation
Code repository: https://gitee.com/liangcd/speech_learning/tree/master/HMM
For the basic concepts of HMMs, probability computation, and the prediction (decoding) algorithm, please see the previous article. Thank you for reading!
Learning Algorithm
Given an observation sequence $O=(o_{1}, o_{2}, \ldots, o_{T})$ with emission densities $b_{j}(o_{t})=\sum_{m=1}^{M} c_{jm} \mathcal{N}(o_{t}; \mu_{jm}, \Sigma_{jm})$, estimate the GMM-HMM parameters $\lambda$ that maximize $P(O\mid\lambda)$. The parameters $\lambda$ include:
- the initial state probability vector $\pi = (\pi_i)$
- the transition probability matrix $A=[a_{ij}]_{N\times N}$
- the GMM parameters $(c_{jm},\mu_{jm},\Sigma_{jm})$ of each state $j$, where $j=1,2,\ldots,N$ and $m=1,\ldots,M$ indexes the GMM components.
Viterbi Learning Algorithm
Suppose the state-observation alignment is known, i.e., each observation $o_t$ corresponds to one specific state. Such an alignment can be obtained with the Viterbi decoding algorithm, or from manual annotation. Knowing the state of every observation, we have:
$\pi_i$ can be obtained by maximum likelihood estimation:
- Let $C(i)$ denote the number of times the initial state is $i$
- $\widehat{\pi}_{i}=\frac{C(i)}{\sum_{k} C(k)}$
$a_{ij}$ can also be obtained by maximum likelihood estimation:
- Let $C(i \rightarrow j)$ denote the number of transitions from state $i$ to state $j$
- $\hat{a}_{ij}=\frac{C(i \rightarrow j)}{\sum_{k} C(i \rightarrow k)}$
The alignment also yields, for each state $j$, the set of observations $Z_{j}=\left(y_{1}, y_{2}, \ldots, y_{N}\right)$ assigned to it. Since each state has its own GMM, $Z_j$ is exactly the observation set used to estimate that state's GMM.
Question: obtaining the alignment with the Viterbi algorithm requires the model parameters $\lambda$, so how is the initial $\lambda$ obtained?
When the GMM has only one component, $b_{j}(o_{t})=\mathcal{N}(o_{t}; \mu_{j}, \Sigma_{j})$. Let $|Z_j|$ denote the number of elements in $Z_j$; then:
$$\hat{\mu}_{j}=\frac{\sum_{o_{t} \in Z_{j}} o_{t}}{\left|Z_{j}\right|}$$
$$\hat{\Sigma}_{j}=\frac{\sum_{o_{t} \in Z_{j}}\left(o_{t}-\hat{\mu}_{j}\right)\left(o_{t}-\hat{\mu}_{j}\right)^{T}}{\left|Z_{j}\right|}$$
When the GMM has multiple components, $b_{j}(o_{t})=\sum_{m=1}^{M} c_{jm} \mathcal{N}(o_{t}; \mu_{jm}, \Sigma_{jm})$, and the GMM parameters can be updated with the EM algorithm (see the previous article).
Viterbi learning algorithm:
1. Initialize the GMM-HMM parameters $\lambda = (\pi_i, a_{ij}, \text{GMM parameters})$, where the GMM parameters of state $j$ are $(c_{jm},\mu_{jm},\Sigma_{jm})$
2. Using $\lambda$ and the Viterbi algorithm, compute the state-observation alignment, i.e., the hidden state of each observation
3. Update the parameters $\lambda$ (a hard-count sketch follows this list):
   - $\hat{\pi}_{i}=\frac{C(i)}{\sum_{k} C(k)}$, where $C(i)$ is the number of times the initial state is $i$
   - $\hat{a}_{ij}=\frac{C(i \rightarrow j)}{\sum_{k} C(i \rightarrow k)}$, where $C(i \rightarrow j)$ is the number of transitions from state $i$ to state $j$
   - Update the GMM parameters with the EM algorithm described in the previous article
4. Repeat steps 2 and 3 until convergence
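As an illustration, here is a minimal numpy sketch of the hard-count updates in step 3, assuming a single-Gaussian emission per state and a single training sequence whose Viterbi alignment is given as one state index per frame; the function and variable names (`viterbi_reestimate`, `obs`, `states`) are my own, not from the repository.

```python
import numpy as np

def viterbi_reestimate(obs, states, N):
    """Hard-count re-estimation from a single aligned sequence (sketch).

    obs:    (T, D) array of observation vectors
    states: length-T array of aligned state indices in [0, N); assumes every
            state occurs at least once in the alignment
    N:      number of HMM states
    """
    obs, states = np.asarray(obs), np.asarray(states)
    T, D = obs.shape

    pi = np.zeros(N)
    pi[states[0]] = 1.0                   # C(i): one-hot for a single sequence;
                                          # with many sequences, accumulate counts instead
    A = np.zeros((N, N))
    for t in range(T - 1):                # C(i -> j)
        A[states[t], states[t + 1]] += 1
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1e-10)

    mu, Sigma = np.zeros((N, D)), np.zeros((N, D, D))
    for j in range(N):                    # single-Gaussian ML estimate from Z_j
        Z_j = obs[states == j]
        mu[j] = Z_j.mean(axis=0)
        diff = Z_j - mu[j]
        Sigma[j] = diff.T @ diff / len(Z_j)
    return pi, A, mu, Sigma
```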
Baum-Welch Learning Algorithm
First consider the single-component GMM case.
The Viterbi learning algorithm is an approximation: it only considers the single best alignment path. In reality, at each time $t$ every state $j$ occurs with some probability, rather than a hard alignment in which each time step is assigned to exactly one state.
State occupancy probability: given the model $\lambda$ and the observations $O$, the probability of being in state $q_i$ at time $t$ is $\gamma_{t}(i)$:
$$\gamma_{t}(i)=P\left(i_{t}=q_{i} \mid O, \lambda\right)=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{T}(j)}$$
This probability can be used in the EM algorithm to learn the parameters $\lambda$:
- E-step: estimate the state occupancy probabilities
- M-step: re-estimate the parameters $\lambda$ (maximization) based on the estimated state occupancy probabilities
Question: prove that $\gamma_{t}(i)=P\left(i_{t}=q_{i} \mid O, \lambda\right)=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{T}(j)}$ holds.
Proof: omitting $\lambda$, write $o_1^t = (o_1, o_2, \ldots, o_t)$ and $O = o_1^T$.
$$\begin{aligned} P\left(i_{t}=q_{i}, O \mid \lambda\right) &=P\left(i_{t}=q_{i}, o_{1}^{t}, o_{t+1}^{T}\right) \\ &=P\left(i_{t}=q_{i}, o_{1}^{t}\right) P\left(o_{t+1}^{T} \mid i_{t}=q_{i}, o_{1}^{t}\right) \\ &=P\left(i_{t}=q_{i}, o_{1}^{t}\right) P\left(o_{t+1}^{T} \mid i_{t}=q_{i}\right) \\ &=\alpha_{t}(i) \beta_{t}(i) \\ P(O \mid \lambda) &=\sum_{j=1}^{N} \alpha_{T}(j) \\ P\left(i_{t}=q_{i} \mid O, \lambda\right) &=\frac{P\left(i_{t}=q_{i}, O \mid \lambda\right)}{P(O \mid \lambda)}=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{T}(j)} \end{aligned}$$
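In matrix form the state occupancy probabilities are essentially a one-liner. A minimal numpy sketch, assuming `alphas` and `betas` are $(N, T)$ arrays stored in natural time order (the function name is my own):

```python
import numpy as np

def state_occupancy(alphas, betas):
    """gamma[i, t] = P(i_t = q_i | O, lambda).

    alphas, betas: (N, T) forward/backward probabilities in natural time order,
    i.e. alphas[i, t] = alpha_{t+1}(i) and betas[i, t] = beta_{t+1}(i).
    """
    prob_O = np.sum(alphas[:, -1])       # P(O | lambda) = sum_j alpha_T(j)
    return alphas * betas / prob_O       # elementwise alpha_t(i) * beta_t(i) / P(O | lambda)
```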
For a given state, summing the state occupancy probabilities over all time steps gives a soft count, which can be used to re-estimate the HMM parameters:
$$\begin{aligned} \hat{\mu}_{j}&=\frac{\sum_{t=1}^{T} \gamma_{t}(j)\, o_{t}}{\sum_{t=1}^{T} \gamma_{t}(j)} \\ \hat{\Sigma}_{j}&=\frac{\sum_{t=1}^{T} \gamma_{t}(j)\left(o_{t}-\hat{\mu}_{j}\right)\left(o_{t}-\hat{\mu}_{j}\right)^{T}}{\sum_{t=1}^{T} \gamma_{t}(j)} \end{aligned}$$
Compare with the Viterbi learning algorithm (hard counts):
$$\hat{\mu}_{j}=\frac{\sum_{o_{t} \in Z_{j}} o_{t}}{\left|Z_{j}\right|}, \qquad \hat{\Sigma}_{j}=\frac{\sum_{o_{t} \in Z_{j}}\left(o_{t}-\hat{\mu}_{j}\right)\left(o_{t}-\hat{\mu}_{j}\right)^{T}}{\left|Z_{j}\right|}$$
Now consider the case where the GMM has multiple components.
When $b_{j}(o_{t})=\sum_{m=1}^{M} c_{jm} \mathcal{N}(o_{t}; \mu_{jm}, \Sigma_{jm})$, define, analogously to the above, $\zeta_{t}(j, k)$ as the probability of being in state $q_j$ and in the $k$-th GMM component at time $t$, given the model $\lambda$ and the observations $O$:
$$\begin{aligned} \zeta_{t}(j, k) &=P\left(i_{t}=q_{j}, m_{t}=k \mid O, \lambda\right) \\ &=\frac{P\left(i_{t}=q_{j}, m_{t}=k, O \mid \lambda\right)}{P(O \mid \lambda)} \\ &=\frac{\sum_{i} \alpha_{t-1}(i) a_{i j} c_{j k} b_{j k}\left(o_{t}\right) \beta_{t}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)} \end{aligned}$$
where $b_{jk}(o_{t})=\mathcal{N}(o_{t}; \mu_{jk}, \Sigma_{jk})$ denotes the $k$-th component density of state $j$.
Then:
$$\begin{aligned} \hat{\mu}_{j k}&=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\, o_{t}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{\Sigma}_{j k}&=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\left(o_{t}-\hat{\mu}_{j k}\right)\left(o_{t}-\hat{\mu}_{j k}\right)^{T}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{c}_{j k}&=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \zeta_{t}(j, m)} \end{aligned}$$
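As a sketch of these per-state updates in numpy, assuming `zeta_j[t, k]` holds $\zeta_{t}(j, k)$ for one fixed state $j$ (the function name and array layout are assumptions of mine):

```python
import numpy as np

def update_state_gmm(obs, zeta_j):
    """Soft-count GMM update for one state j (sketch).

    obs:    (T, D) observation vectors
    zeta_j: (T, M) values zeta_t(j, k), one column per GMM component
    """
    T, D = obs.shape
    M = zeta_j.shape[1]
    occ = zeta_j.sum(axis=0)                     # sum_t zeta_t(j, k), shape (M,)
    c = occ / occ.sum()                          # c_hat_{jk}
    mu = (zeta_j.T @ obs) / occ[:, None]         # mu_hat_{jk}, shape (M, D)
    Sigma = np.zeros((M, D, D))
    for k in range(M):                           # Sigma_hat_{jk}
        diff = obs - mu[k]
        Sigma[k] = (zeta_j[:, k, None] * diff).T @ diff / occ[k]
    return c, mu, Sigma
```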
Analogously to the state occupancy probability, define $\xi_{t}(i, j)$ as the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$, given the model $\lambda$ and the observations $O$:
$$\begin{aligned} \xi_{t}(i, j) &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j} \mid O, \lambda\right) \\ &=\frac{P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right)}{P(O \mid \lambda)} \\ &=\frac{\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)} \end{aligned}$$
Moreover, $\gamma_{t}(i)=\sum_{k=1}^{N} \xi_{t}(i, k)$ for $t \le T-1$.
From these probabilities, the transition probabilities and the initial probabilities can be re-estimated:
$$\begin{aligned} \hat{a}_{i j} &=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_{t}(i, k)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)} \\ \hat{\pi}_{i} &=\gamma_{1}(i) \end{aligned}$$
Question: prove that $\xi_{t}(i, j)=\frac{\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$.
Proof: from the recursion step in the derivation of the forward algorithm, $\alpha_{t}(i)\, a_{ij}\, b_{j}(o_{t+1})=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda\right)$.
From the definition of the backward probability, $\beta_{t+1}(j)=P\left(o_{t+2}^{T} \mid i_{t+1}=q_{j}, \lambda\right)$.
$$\begin{aligned} P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda\right) &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1}, o_{t+2}^{T} \mid \lambda\right) \\ &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda\right) P\left(o_{t+2}^{T} \mid i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1}, \lambda\right) \\ &=P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda\right) P\left(o_{t+2}^{T} \mid i_{t+1}=q_{j}, \lambda\right) \\ &=\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j) \end{aligned}$$
Dividing by $P(O \mid \lambda)=\sum_{i=1}^{N} \alpha_{T}(i)$ gives the stated expression.
Summary of the Baum-Welch learning algorithm
1. Initialize the GMM-HMM parameters $\lambda=\left(\pi_{i}, a_{i j},\left(c_{j m}, \mu_{j m}, \Sigma_{j m}\right)\right)$
2. E-step: for all times $t$ and all states $i$:
   - Recursively compute the forward probabilities $\alpha_{t}(i)$ and the backward probabilities $\beta_{t}(i)$
   - Compute $\zeta_{t}(j, k)=\frac{\sum_{i} \alpha_{t-1}(i) a_{i j} c_{j k} b_{j k}\left(o_{t}\right) \beta_{t}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$, $\xi_{t}(i, j)=\frac{\alpha_{t}(i) a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$, and $\gamma_{t}(i)=\sum_{k=1}^{N} \xi_{t}(i, k)$
3. M-step: update the parameters (a sketch of one iteration for the discrete-observation case follows this list):
$$\begin{aligned} \hat{\mu}_{j k} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\, o_{t}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{\Sigma}_{j k} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\left(o_{t}-\hat{\mu}_{j k}\right)\left(o_{t}-\hat{\mu}_{j k}\right)^{T}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{c}_{j k} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \zeta_{t}(j, m)} \\ \hat{a}_{i j} &=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_{t}(i, k)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)} \\ \hat{\pi}_{i} &=\gamma_{1}(i) \end{aligned}$$
4. Repeat steps 2 and 3 until convergence
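For intuition, here is a minimal sketch of one Baum-Welch iteration for a discrete-observation HMM (as in the box-and-ball example below); it replaces the GMM emission update with a count-based update of the emission matrix B, so it is not the GMM-HMM update above. The function name and array layouts are my own.

```python
import numpy as np

def baum_welch_iteration(O, HMM_model, alphas, betas):
    """One EM iteration for a discrete-observation HMM (sketch).

    O:         length-T sequence of observation symbol indices
    HMM_model: (pi, A, B), with B of shape (N, V) over discrete symbols
    alphas:    (N, T) forward probabilities,  alphas[i, t] = alpha_{t+1}(i)
    betas:     (N, T) backward probabilities, betas[i, t]  = beta_{t+1}(i)
               (natural time order; note backward_algorithm below stores betas reversed)
    """
    pi, A, B = HMM_model
    O = np.asarray(O)
    N, T = alphas.shape
    prob_O = np.sum(alphas[:, -1])                    # P(O | lambda)

    # E-step: gamma_t(i) and xi_t(i, j)
    gamma = alphas * betas / prob_O
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        for i in range(N):
            for j in range(N):
                xi[t, i, j] = alphas[i, t] * A[i][j] * B[j][O[t + 1]] * betas[j, t + 1] / prob_O

    # M-step: re-estimate pi, A and the discrete emission matrix B
    new_pi = gamma[:, 0]                              # pi_hat_i = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:, :-1].sum(axis=1, keepdims=True)
    new_B = np.zeros_like(B, dtype=float)
    for v in range(new_B.shape[1]):
        new_B[:, v] = gamma[:, O == v].sum(axis=1) / gamma.sum(axis=1)
    return new_pi, new_A, new_B
```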
Python Implementation
Consider the box-and-ball model $\lambda=(A, B, \pi)$ with state set $Q=\{1,2,3\}$ and observation set $V=\{\text{red}, \text{white}\}$:
$$A=\begin{bmatrix}0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5\end{bmatrix}, \quad B=\begin{bmatrix}0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3\end{bmatrix}, \quad \pi=(0.2, 0.4, 0.4)^{T}$$
Let $T=3$ and $O=(\text{red}, \text{white}, \text{red})$.
- Implement the forward algorithm and the backward algorithm, and use each to compute $P(O\mid\lambda)$
- Implement the Viterbi algorithm to find the optimal state sequence (the best path); the numpy encoding of the model used by the code below follows this list
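A minimal numpy setup of this model, mapping red to 0 and white to 1 (these variable names are mine and simply match the function signatures used below):

```python
import numpy as np

# Box-and-ball model: 3 states, 2 observation symbols (red = 0, white = 1)
pi = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])
HMM_model = (pi, A, B)
O = [0, 1, 0]                      # red, white, red
```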
Forward algorithm:
Formulas:
- Initialization: $\alpha_1(i) = \pi_i b_i(o_1), \quad i=1,2,\ldots,N$
- Recursion: for $t=1,2,\ldots,T-1$:
$$\alpha_{t+1}(i)=\left[\sum_{j=1}^{N} \alpha_{t}(j) a_{j i}\right] b_{i}\left(o_{t+1}\right), \quad i=1,2,\ldots,N$$
- Termination: $P(O \mid \lambda)=\sum_{i=1}^{N} \alpha_{T}(i)$
```python
import numpy as np

def forward_algorithm(O, HMM_model):
    """HMM Forward Algorithm.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Return:
        prob: the probability of HMM_model generating O.
    """
    pi, A, B = HMM_model
    T = len(O)
    N = len(pi)
    prob = 0.0
    alphas = np.zeros((N, T))
    for t in range(T):
        for i in range(N):
            if t == 0:
                # Initialization: alpha_1(i) = pi_i * b_i(o_1)
                alphas[i][t] = pi[i] * B[i][O[t]]
            else:
                # Recursion: alpha_{t+1}(i) = [sum_j alpha_t(j) * a_ji] * b_i(o_{t+1})
                alphas[i][t] = np.dot([alpha[t-1] for alpha in alphas], [a[i] for a in A]) * B[i][O[t]]
    # Termination: P(O|lambda) = sum_i alpha_T(i)
    prob = np.sum([alpha[T-1] for alpha in alphas])
    return prob
```
Backward algorithm:
- Initialization: $\beta_{T}(i) = 1, \quad i=1,2,\ldots,N$
- Recursion: for $t=T-1, T-2, \ldots, 1$:
$$\beta_{t}(i)=\sum_{j=1}^{N} a_{i j} b_{j}\left(o_{t+1}\right) \beta_{t+1}(j), \quad i=1,2, \cdots, N$$
- Termination: $P(O \mid \lambda)=\sum_{i=1}^{N} \pi_{i} b_{i}\left(o_{1}\right) \beta_{1}(i)$
```python
def backward_algorithm(O, HMM_model):
    """HMM Backward Algorithm.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Return:
        prob: the probability of HMM_model generating O.
    """
    pi, A, B = HMM_model
    T = len(O)
    N = len(pi)
    prob = 0.0
    # Note: betas is stored in reversed time order, column t holds beta_{T-t}
    betas = np.zeros((N, T))
    # Initialization: beta_T(i) = 1
    for i in range(N):
        betas[i][0] = 1
    # Recursion: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for t in range(1, T):
        for i in range(N):
            for j in range(N):
                betas[i][t] += A[i][j] * B[j][O[T-t]] * betas[j][t-1]
    # Termination: P(O|lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    for i in range(N):
        prob += pi[i] * B[i][O[0]] * betas[i][-1]
    return prob
```
Viterbi algorithm:
Formulas:
- Initialization: $\delta_{1}(i)=\pi_{i} b_{i}(o_{1}), \ \psi_{1}(i)=0, \quad i=1,2,\ldots,N$
- Recursion:
$$\begin{aligned} \delta_{t+1}(i) &=\max _{i_{1}, i_{2}, \cdots, i_{t}} P\left(i_{t+1}=i, i_{t}, \cdots, i_{1}, o_{t+1}, \cdots, o_{1} \mid \lambda\right) \\ &=\max _{1 \leqslant j \leqslant N}\left[\delta_{t}(j) a_{j i}\right] b_{i}\left(o_{t+1}\right), \quad i=1,2, \cdots, N ; \; t=1,2, \cdots, T-1 \end{aligned}$$
$$\psi_{t}(i)=\arg \max _{1 \leqslant j \leqslant N}\left[\delta_{t-1}(j) a_{j i}\right], \quad i=1,2, \cdots, N$$
```python
def Viterbi_algorithm(O, HMM_model):
    """Viterbi decoding.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Returns:
        best_prob: the probability of the best state sequence
        best_path: the best state sequence
    """
    pi, A, B = HMM_model
    T = len(O)
    N = len(pi)
    best_prob, best_path = 0.0, []
    # Begin Assignment
    deltas = np.zeros((N, T), dtype=np.float64)   # deltas[i][t] = delta_{t+1}(i)
    nodes = np.zeros((N, T), dtype=int)           # nodes[i][t]  = psi_{t+1}(i)
    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    for i in range(N):
        deltas[i][0] = pi[i] * B[i][O[0]]
    # Recursion: delta_{t+1}(i) = max_j [delta_t(j) * a_ji] * b_i(o_{t+1})
    for t in range(1, T):
        for i in range(N):
            tmp = [deltas[j][t-1] * A[j][i] for j in range(N)]
            nodes[i][t] = int(np.argmax(tmp))
            deltas[i][t] = tmp[nodes[i][t]] * B[i][O[t]]
    # Termination and backtracking
    best_path = np.zeros(T, dtype=int)
    best_path[T-1] = np.argmax(deltas[:, T-1])
    best_prob = deltas[best_path[T-1]][T-1]
    for t in range(T-2, -1, -1):
        best_path[t] = nodes[best_path[t+1]][t+1]
    #
    # transform the states to 1-based labels, since python indices start from '0'
    best_path = [(val + 1) for val in best_path]
    return best_prob, best_path
```
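A quick usage check with the model defined above; the expected values in the comments come from working this textbook example by hand, not from running the repository code:

```python
prob_fwd = forward_algorithm(O, HMM_model)
prob_bwd = backward_algorithm(O, HMM_model)
best_prob, best_path = Viterbi_algorithm(O, HMM_model)
print(prob_fwd, prob_bwd)      # both should be approximately 0.130218
print(best_prob, best_path)    # approximately 0.0147 and path [3, 3, 3]
```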