
Hidden Markov Model (HMM) Explained (2): Python Implementation

Contents

      • Learning Algorithms
        • Viterbi Learning Algorithm
        • Baum-Welch Learning Algorithm
      • Python Implementation

Code: https://gitee.com/liangcd/speech_learning/tree/master/HMM

For the basic concepts of HMMs, probability computation, and the prediction (decoding) algorithm, please see the previous article. Thanks for reading!

Learning Algorithms

Given an observation sequence $O=(o_1, o_2, \ldots, o_T)$ with emission densities $b_j(o_t)=\sum_{m=1}^{M} c_{jm}\,\mathcal{N}(o_t;\mu_{jm},\Sigma_{jm})$, we want to estimate the GMM-HMM parameters $\lambda$ that maximize $P(O \mid \lambda)$. The parameters $\lambda$ consist of (a small container sketch follows the list):

  • the initial state probability vector $\pi = (\pi_i)$
  • the transition probability matrix $A=[a_{ij}]_{N\times N}$
  • the GMM parameters of each state $j$: $(c_{jm},\mu_{jm},\Sigma_{jm})$, where $j=1,2,\ldots,N$ and $m=1,\ldots,M$ indexes the GMM components
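
As a concrete illustration only (not part of the original article), these parameters can be kept in a plain Python container; the function and field names below are hypothetical.

import numpy as np

# A minimal, hypothetical GMM-HMM parameter container: N states, M Gaussian
# components per state, D-dimensional observations.
def init_gmm_hmm(N, M, D, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "pi": np.full(N, 1.0 / N),                  # initial state probabilities pi_i
        "A": np.full((N, N), 1.0 / N),              # transition matrix a_ij
        "c": np.full((N, M), 1.0 / M),              # mixture weights c_jm
        "mu": rng.standard_normal((N, M, D)),       # component means mu_jm
        "Sigma": np.tile(np.eye(D), (N, M, 1, 1)),  # component covariances Sigma_jm
    }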

Viterbi Learning Algorithm

Suppose we already have a state-observation alignment, i.e. each observation $o_t$ is assigned to a specific state. Such an alignment can be obtained with the Viterbi decoding algorithm, or by manual annotation. Once the state of every observation is known, the parameters can be estimated as follows.

$\pi_i$ can be estimated by maximum likelihood:

  • let $C(i)$ denote the number of times the initial state is $i$
  • $\widehat{\pi}_{i}=\dfrac{C(i)}{\sum_{k} C(k)}$

$a_{ij}$ can likewise be estimated by maximum likelihood:

  • let $C(i \rightarrow j)$ denote the number of transitions from state $i$ to state $j$
  • $\hat{a}_{ij}=\dfrac{C(i \rightarrow j)}{\sum_{k} C(i \rightarrow k)}$

The alignment also gives, for each state $j$, the set of observations assigned to it, $Z_{j}=(y_{1}, y_{2}, \ldots, y_{N})$; since each state has exactly one GMM, $Z_j$ is also the observation set used to estimate that state's GMM (see the sketch below).
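
A minimal sketch of these counting-based estimates, assuming the alignment is given as parallel lists of state sequences (states numbered 0 to N-1) and observation sequences, one pair per utterance; the function and variable names are illustrative, and every state is assumed to occur at least once.

import numpy as np

def viterbi_counts(state_seqs, obs_seqs, N):
    """Hard-count MLE of pi and A, plus the per-state observation sets Z_j."""
    init_counts = np.zeros(N)          # C(i): number of times state i is the initial state
    trans_counts = np.zeros((N, N))    # C(i -> j): transition counts
    Z = {j: [] for j in range(N)}      # observations aligned to each state j
    for states, obs in zip(state_seqs, obs_seqs):
        init_counts[states[0]] += 1
        for t in range(len(states) - 1):
            trans_counts[states[t], states[t + 1]] += 1
        for s, o in zip(states, obs):
            Z[s].append(o)
    pi_hat = init_counts / init_counts.sum()
    A_hat = trans_counts / trans_counts.sum(axis=1, keepdims=True)
    return pi_hat, A_hat, Z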

Question: obtaining the alignment with the Viterbi algorithm already requires model parameters $\lambda$. Where does the initial $\lambda$ come from?

When the GMM has a single component, $b_{j}(o_{t})=\mathcal{N}(o_{t};\mu_{j},\Sigma_{j})$. Let $|Z_j|$ denote the number of elements of $Z_j$. Then:

$$\hat{\mu}_{j}=\frac{\sum_{o_{t} \in Z_{j}} o_{t}}{|Z_{j}|}$$

$$\hat{\Sigma}_{j}=\frac{\sum_{o_{t} \in Z_{j}}(o_{t}-\hat{\mu}_{j})(o_{t}-\hat{\mu}_{j})^{T}}{|Z_{j}|}$$
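
A minimal sketch of these two estimates, assuming Z_j is an array of shape $(|Z_j|, D)$ holding the observations aligned to state $j$; the function name is illustrative.

import numpy as np

def single_gaussian_mle(Z_j):
    """ML estimates of (mu_j, Sigma_j) from the observations aligned to one state."""
    Z_j = np.asarray(Z_j, dtype=float)
    mu_hat = Z_j.mean(axis=0)                 # sample mean over Z_j
    diff = Z_j - mu_hat
    Sigma_hat = diff.T @ diff / len(Z_j)      # divides by |Z_j|, as in the formula above
    return mu_hat, Sigma_hat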

When the GMM has multiple components, $b_{j}(o_{t})=\sum_{m=1}^{M} c_{jm}\,\mathcal{N}(o_{t};\mu_{jm},\Sigma_{jm})$, and the GMM parameters can be updated with the EM algorithm (see the previous article).

Viterbi learning algorithm
  1. Initialize the GMM-HMM parameters $\lambda = (\pi_i, a_{ij}, \text{GMM parameters})$, where the GMM parameters of state $j$ are $(c_{jm},\mu_{jm},\Sigma_{jm})$
  2. Run Viterbi decoding with the current $\lambda$ to obtain the state-observation alignment, i.e. the hidden state of every observation
  3. Update the parameters $\lambda$
    • $\hat{\pi}_{i}=\frac{C(i)}{\sum_{k} C(k)}$, where $C(i)$ is the number of times the initial state is $i$
    • $\hat{a}_{ij}=\frac{C(i \rightarrow j)}{\sum_{k} C(i \rightarrow k)}$, where $C(i \rightarrow j)$ is the number of transitions from state $i$ to state $j$
    • update the GMM parameters with the EM algorithm described in the previous article
  4. Repeat steps 2 and 3 until convergence (a schematic training-loop sketch follows)
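
A schematic sketch of this loop under stated assumptions: viterbi_counts is the counting sketch above, while decode_fn (best state sequence per utterance) and fit_gmms_fn (EM re-fit of each state's GMM on its aligned data) are hypothetical helpers supplied by the caller; the model keys are likewise illustrative.

def viterbi_train(obs_seqs, model, decode_fn, fit_gmms_fn, n_iters=10):
    """Hypothetical Viterbi-training loop; `model` holds pi, A and per-state GMMs."""
    for _ in range(n_iters):
        # Step 2: align every observation to a state with the current model
        state_seqs = [decode_fn(obs, model) for obs in obs_seqs]
        # Step 3: hard-count re-estimation of pi and A, plus per-state data sets Z_j
        model["pi"], model["A"], Z = viterbi_counts(state_seqs, obs_seqs, len(model["pi"]))
        # Step 3 (cont.): re-fit each state's GMM on its aligned observations via EM
        model["gmms"] = fit_gmms_fn(Z)
    return model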

Baum-Welch Learning Algorithm

First consider the case of a single-component GMM.

The Viterbi learning algorithm is an approximation: it considers only the single best alignment path. In reality, at each time $t$ every state $j$ is occupied with some probability, rather than through a hard alignment in which each time step belongs to exactly one state.

State occupation probability: given the model $\lambda$ and the observations $O$, the probability of being in state $q_i$ at time $t$ is $\gamma_{t}(i)$:

$$\gamma_{t}(i)=P(i_{t}=q_{i} \mid O, \lambda)=\frac{\alpha_{t}(i)\,\beta_{t}(i)}{\sum_{i=1}^{N} \alpha_{T}(i)}$$
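
A minimal sketch of this computation, assuming alphas and betas are $(N, T)$ matrices with columns in natural time order (note that backward_algorithm in the Python section below stores its columns in reversed time order, so its matrix would first need flipping with betas[:, ::-1]); the function name is illustrative.

import numpy as np

def state_occupation(alphas, betas):
    """gamma[i, t] = P(i_t = q_i | O, lambda) from (N, T) forward/backward matrices."""
    prob_O = np.sum(alphas[:, -1])      # P(O | lambda) = sum_i alpha_T(i)
    return alphas * betas / prob_O      # elementwise alpha_t(i) * beta_t(i) / P(O | lambda)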

These probabilities can be plugged into an EM algorithm to learn the parameters $\lambda$:

  • E-step: estimate the state occupation probabilities
  • M-step: re-estimate the parameters $\lambda$ based on the estimated state occupation probabilities (maximization)

Question: prove that $\gamma_{t}(i)=P(i_{t}=q_{i} \mid O, \lambda)=\frac{\alpha_{t}(i)\,\beta_{t}(i)}{\sum_{i=1}^{N} \alpha_{T}(i)}$ holds.

Proof: omitting $\lambda$, write $o_1^t = o_1, o_2, \ldots, o_t$ and $O = o_1^T$.

$$\begin{aligned} P(i_{t}=q_{i}, O \mid \lambda) &=P(i_{t}=q_{i}, o_{1}^{t}, o_{t+1}^{T}) \\ &=P(i_{t}=q_{i}, o_{1}^{t})\, P(o_{t+1}^{T} \mid i_{t}=q_{i}, o_{1}^{t}) \\ &=P(i_{t}=q_{i}, o_{1}^{t})\, P(o_{t+1}^{T} \mid i_{t}=q_{i}) \\ &=\alpha_{t}(i)\,\beta_{t}(i) \\ P(O \mid \lambda) &=\sum_{j=1}^{N} \alpha_{T}(j) \\ P(i_{t}=q_{i} \mid O, \lambda) &=\frac{P(i_{t}=q_{i}, O \mid \lambda)}{P(O \mid \lambda)}=\frac{\alpha_{t}(i)\,\beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{T}(j)} \end{aligned}$$

For a given state, summing the state occupation probabilities over all time steps gives a soft count, which can be used to re-estimate the HMM parameters:

$$\begin{aligned} \hat{\mu}_{j}&=\frac{\sum_{t=1}^{T} \gamma_{t}(j)\, o_{t}}{\sum_{t=1}^{T} \gamma_{t}(j)} \\ \hat{\Sigma}_{j}&=\frac{\sum_{t=1}^{T} \gamma_{t}(j)(o_{t}-\hat{\mu}_{j})(o_{t}-\hat{\mu}_{j})^{T}}{\sum_{t=1}^{T} \gamma_{t}(j)} \end{aligned}$$
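
A minimal sketch of these soft-count updates, assuming gamma is the $(N, T)$ matrix from state_occupation above and obs is a $(T, D)$ observation matrix; the function name is illustrative.

import numpy as np

def soft_update_single_gaussian(gamma, obs):
    """Soft-count re-estimation of (mu_j, Sigma_j) for every state j.
    gamma: (N, T) state occupation probabilities; obs: (T, D) observations."""
    N, T = gamma.shape
    D = obs.shape[1]
    mu_hat = np.zeros((N, D))
    Sigma_hat = np.zeros((N, D, D))
    for j in range(N):
        w = gamma[j]                                     # soft counts gamma_t(j)
        mu_hat[j] = w @ obs / w.sum()
        diff = obs - mu_hat[j]
        Sigma_hat[j] = (w[:, None] * diff).T @ diff / w.sum()
    return mu_hat, Sigma_hat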

Compare with the Viterbi algorithm (hard counts):

$$\hat{\mu}_{j}=\frac{\sum_{o_{t} \in Z_{j}} o_{t}}{|Z_{j}|}, \qquad \hat{\Sigma}_{j}=\frac{\sum_{o_{t} \in Z_{j}}(o_{t}-\hat{\mu}_{j})(o_{t}-\hat{\mu}_{j})^{T}}{|Z_{j}|}$$

Now consider a GMM with multiple components.

When $b_{j}(o_{t})=\sum_{m=1}^{M} c_{jm}\,\mathcal{N}(o_{t};\mu_{jm},\Sigma_{jm})$, analogously to the above, define $\zeta_{t}(j, k)$ as the probability, given the model $\lambda$ and observations $O$, of being in state $q_j$ at time $t$ with the observation generated by the $k$-th GMM component:

$$\begin{aligned} \zeta_{t}(j, k) &=P(i_{t}=q_{j}, m_{t}=k \mid O, \lambda) \\ &=\frac{P(i_{t}=q_{j}, m_{t}=k, O \mid \lambda)}{P(O \mid \lambda)} \\ &=\frac{\sum_{i} \alpha_{t-1}(i)\, a_{ij}\, c_{jk}\, b_{jk}(o_{t})\, \beta_{t}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)} \end{aligned}$$

Then:

$$\begin{aligned} \hat{\mu}_{jk}&=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\, o_{t}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{\Sigma}_{jk}&=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)(o_{t}-\hat{\mu}_{jk})(o_{t}-\hat{\mu}_{jk})^{T}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{c}_{jk}&=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)}{\sum_{t=1}^{T} \sum_{k} \zeta_{t}(j, k)} \end{aligned}$$
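
A minimal sketch of these component-level updates, assuming zeta is an $(N, M, T)$ array with zeta[j, k, t] equal to $\zeta_t(j, k)$ and obs is $(T, D)$; names are illustrative.

import numpy as np

def gmm_soft_updates(zeta, obs):
    """Component-level soft-count updates for mu_jk, Sigma_jk and c_jk."""
    N, M, T = zeta.shape
    D = obs.shape[1]
    c_hat = zeta.sum(axis=2) / zeta.sum(axis=(1, 2))[:, None]   # c_jk
    mu_hat = np.zeros((N, M, D))
    Sigma_hat = np.zeros((N, M, D, D))
    for j in range(N):
        for k in range(M):
            w = zeta[j, k]                         # soft counts zeta_t(j, k)
            mu_hat[j, k] = w @ obs / w.sum()
            diff = obs - mu_hat[j, k]
            Sigma_hat[j, k] = (w[:, None] * diff).T @ diff / w.sum()
    return c_hat, mu_hat, Sigma_hat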

Analogously to the state occupation probability, define $\xi_{t}(i, j)$ as the probability, given the model $\lambda$ and observations $O$, of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$:

$$\begin{aligned} \xi_{t}(i, j) &=P(i_{t}=q_{i}, i_{t+1}=q_{j} \mid O, \lambda) \\ &=\frac{P(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda)}{P(O \mid \lambda)} \\ &=\frac{\alpha_{t}(i)\, a_{ij}\, b_{j}(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)} \end{aligned}$$

Moreover, $\gamma_{t}(i)=\sum_{k=1}^{N} \xi_{t}(i, k)$.

From these probabilities we obtain the transition and initial probabilities:

$$\begin{aligned} \hat{a}_{ij} &=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_{t}(i, k)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)} \\ \hat{\pi}_{i} &=\gamma_{1}(i) \end{aligned}$$
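
A minimal sketch of these updates, assuming alphas/betas are $(N, T)$ matrices in natural time order, A is the $(N, N)$ transition matrix, B[i][o] gives the emission probability (discrete case for simplicity), and O is an integer-encoded observation sequence; names are illustrative.

import numpy as np

def transition_updates(alphas, betas, A, B, O):
    """Compute xi_t(i, j) and the resulting updates for A and pi."""
    N, T = alphas.shape
    prob_O = np.sum(alphas[:, -1])                  # P(O | lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi_t(i, j) = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j) / P(O | lambda)
        xi[t] = (alphas[:, t][:, None] * A * B[:, O[t + 1]][None, :]
                 * betas[:, t + 1][None, :]) / prob_O
    gamma = alphas * betas / prob_O                 # gamma_t(i)
    A_hat = xi.sum(axis=0) / gamma[:, :T - 1].sum(axis=1, keepdims=True)
    pi_hat = gamma[:, 0]
    return A_hat, pi_hat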

Question: prove that $\xi_{t}(i, j)=\frac{\alpha_{t}(i)\, a_{ij}\, b_{j}(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$.

Proof: from the recursion step in the derivation of the forward algorithm, $\alpha_{t}(i)\, a_{ij}\, b_{j}(o_{t+1})=P(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda)$,

and from the definition of the backward probability, $\beta_{t+1}(j)=P(o_{t+2}^{T} \mid i_{t+1}=q_{j}, \lambda)$. Therefore:

$$\begin{aligned} P(i_{t}=q_{i}, i_{t+1}=q_{j}, O \mid \lambda) &=P(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1}, o_{t+2}^{T} \mid \lambda) \\ &=P(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda)\, P(o_{t+2}^{T} \mid i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1}, \lambda) \\ &=P(i_{t}=q_{i}, i_{t+1}=q_{j}, o_{1}^{t+1} \mid \lambda)\, P(o_{t+2}^{T} \mid i_{t+1}=q_{j}, \lambda) \\ &=\alpha_{t}(i)\, a_{ij}\, b_{j}(o_{t+1})\, \beta_{t+1}(j) \end{aligned}$$

Summary of the Baum-Welch learning algorithm
  1. Initialize the GMM-HMM parameters $\lambda=(\pi_{i}, a_{ij},(c_{jm}, \mu_{jm}, \Sigma_{jm}))$
  2. E-step: for all times $t$ and states $i$
    • recursively compute the forward probabilities $\alpha_{t}(i)$ and backward probabilities $\beta_{t}(i)$
    • compute $\zeta_{t}(j, k)=\frac{\sum_{i} \alpha_{t-1}(i)\, a_{ij}\, c_{jk}\, b_{jk}(o_{t})\, \beta_{t}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$, $\xi_{t}(i, j)=\frac{\alpha_{t}(i)\, a_{ij}\, b_{j}(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_{T}(i)}$, and $\gamma_{t}(i)=\sum_{k=1}^{N} \xi_{t}(i, k)$
  3. M-step: update the parameters

$$\begin{aligned} \hat{\mu}_{jk} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)\, o_{t}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{\Sigma}_{jk} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)(o_{t}-\hat{\mu}_{jk})(o_{t}-\hat{\mu}_{jk})^{T}}{\sum_{t=1}^{T} \zeta_{t}(j, k)} \\ \hat{c}_{jk} &=\frac{\sum_{t=1}^{T} \zeta_{t}(j, k)}{\sum_{t=1}^{T} \sum_{k} \zeta_{t}(j, k)} \\ \hat{a}_{ij} &=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_{t}(i, k)}=\frac{\sum_{t=1}^{T-1} \xi_{t}(i, j)}{\sum_{t=1}^{T-1} \gamma_{t}(i)} \\ \hat{\pi}_{i} &=\gamma_{1}(i) \end{aligned}$$

  4. Repeat steps 2 and 3 until convergence

Python Implementation

Consider the box-and-ball model $\lambda=(A, B, \pi)$ with state set $Q=\{1,2,3\}$ and observation set $V=\{\text{red}, \text{white}\}$:

$$A=\begin{bmatrix}0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5\end{bmatrix}, \quad B=\begin{bmatrix}0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3\end{bmatrix}, \quad \pi=(0.2,0.4,0.4)^{T}$$

Let $T=3$ and $O=(\text{red}, \text{white}, \text{red})$.

  • Implement the forward and backward algorithms and use each to compute $P(O|\lambda)$
  • Implement the Viterbi algorithm to find the optimal state sequence (optimal path)

Forward algorithm:

Formulas:

  1. Initialization: $\alpha_1(i) = \pi_i b_i(o_1),\ i=1,2,\ldots,N$
  2. Recursion: for $t=1,2,\ldots,T-1$,

    $$\alpha_{t+1}(i)=\Big[\sum_{j=1}^{N} \alpha_{t}(j)\, a_{ji}\Big] b_{i}(o_{t+1})$$

  3. Termination: $P(O \mid \lambda)=\sum_{i=1}^{N} \alpha_{T}(i)$

import numpy as np

def forward_algorithm(O, HMM_model):
    """HMM Forward Algorithm.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Return:
        prob: the probability of HMM_model generating O.
    """
    pi, A, B = HMM_model
    T = len(O)
    N = len(pi)
    prob = 0.0

    # alphas[i][t] holds alpha_{t+1}(i): the forward probability of state i at (0-based) time t
    alphas = np.zeros((N, T))
    for t in range(T):
        for i in range(N):
            if t == 0:
                # Initialization: alpha_1(i) = pi_i * b_i(o_1)
                alphas[i][t] = pi[i] * B[i][O[t]]
            else:
                # Recursion: alpha_{t+1}(i) = [sum_j alpha_t(j) * a_ji] * b_i(o_{t+1})
                alphas[i][t] = np.dot([alpha[t-1] for alpha in alphas], [a[i] for a in A]) * B[i][O[t]]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    prob = np.sum([alpha[T-1] for alpha in alphas])

    return prob
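
For the box-and-ball example above, a quick check (encoding red as 0 and white as 1) might look like this; the printed probability should be about 0.130218.

pi = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])
O = [0, 1, 0]  # red, white, red
print(forward_algorithm(O, (pi, A, B)))  # expected: about 0.130218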
           

Backward algorithm:

  1. Initialization: $\beta_{T}(i) = 1,\ i=1,2,\ldots,N$
  2. Recursion: for $t=T-1, T-2, \ldots, 1$,

    $$\beta_{t}(i)=\sum_{j=1}^{N} a_{ij}\, b_{j}(o_{t+1})\, \beta_{t+1}(j), \quad i=1,2,\cdots,N$$

  3. Termination: $P(O \mid \lambda)=\sum_{i=1}^{N} \pi_{i}\, b_{i}(o_{1})\, \beta_{1}(i)$

def backward_algorithm(O, HMM_model):
    """HMM Backward Algorithm.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Return:
        prob: the probability of HMM_model generating O.
    """
    pi, A, B = HMM_model
    T = len(O)
    N = len(pi)
    prob = 0.0
    # Note: the columns of betas run backwards in time: betas[i][t] stores beta_{T-t}(i),
    # so column 0 holds beta_T and the last column holds beta_1.
    betas = np.zeros((N, T))
    # Initialization: beta_T(i) = 1
    for i in range(N):
        betas[i][0] = 1
    # Recursion: beta_{T-t}(i) = sum_j a_ij * b_j(o_{T-t+1}) * beta_{T-t+1}(j)
    for t in range(1, T):
        for i in range(N):
            for j in range(N):
                betas[i][t] += A[i][j]*B[j][O[T-t]]*betas[j][t-1]
    # Termination: P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    for i in range(N):
        prob += pi[i]*B[i][O[0]]*betas[i][-1]

    return prob
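
Reusing pi, A, B and O from the forward-algorithm snippet, the backward algorithm should give the same probability:

print(backward_algorithm(O, (pi, A, B)))  # expected: about 0.130218, same as the forward result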
           

Viterbi algorithm:

Formulas:

$$\begin{aligned} \delta_{t+1}(i) &=\max_{i_1, i_2, \cdots, i_t} P(i_{t+1}=i, i_{t}, \cdots, i_{1}, o_{t+1}, \cdots, o_{1} \mid \lambda) \\ &=\max_{1 \leqslant j \leqslant N}[\delta_{t}(j)\, a_{ji}]\, b_{i}(o_{t+1}), \quad i=1,2,\cdots,N;\ t=1,2,\cdots,T-1 \end{aligned}$$

$$\psi_{t}(i)=\arg\max_{1 \leqslant j \leqslant N}[\delta_{t-1}(j)\, a_{ji}], \quad i=1,2,\cdots,N$$

def Viterbi_algorithm(O, HMM_model):
    """Viterbi decoding.
    Args:
        O: (o1, o2, ..., oT), observations
        HMM_model: (pi, A, B), (init state prob, transition prob, emitting prob)
    Returns:
        best_prob: the probability of the best state sequence
        best_path: the best state sequence
    """
    pi, A, B = HMM_model
    T = len(O)
    N = len(pi)
    best_prob, best_path = 0.0, []
    # Begin Assignment
    # deltas[t][i]: max probability over paths ending in state i at time t
    # nodes[t][i]: back-pointer to the best predecessor state
    deltas = np.zeros((T, N), dtype=np.float64)
    nodes = np.zeros((T, N), dtype=int)
    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    for i in range(N):
        deltas[0][i] = pi[i] * B[i][O[0]]
    # Recursion: delta_{t+1}(i) = max_j [delta_t(j) * a_ji] * b_i(o_{t+1})
    for t in range(1, T):
        for i in range(N):
            tmp = [deltas[t-1][j] * A[j][i] for j in range(N)]
            nodes[t][i] = int(np.argmax(tmp))
            deltas[t][i] = tmp[nodes[t][i]] * B[i][O[t]]
    # Backtracking from the most probable final state
    best_path = np.zeros(T, dtype=int)
    best_path[T-1] = np.argmax(deltas[T-1])
    for t in range(T-2, -1, -1):
        best_path[t] = nodes[t+1][best_path[t+1]]
    best_prob = deltas[T-1][best_path[T-1]]
    # transform the states to 1-based labels since Python indices start from 0
    best_path = [(val + 1) for val in best_path]

    return best_prob, best_path
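
On the same example (again reusing pi, A, B and O from the forward-algorithm snippet), the optimal path is the standard result for this model:

best_prob, best_path = Viterbi_algorithm(O, (pi, A, B))
print(best_prob, best_path)  # expected: about 0.0147 and [3, 3, 3]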