Whiteboard Derivation with Pytorch: Hidden Markov Models (HMM)
State Transition Matrix and Observation Probability Matrix

State transition matrix:

$$\begin{aligned} A &=\left[a_{i j}\right]_{N \times N} \\ a_{ij} &= P(i_{t+1} = q_j \mid i_t=q_i) \end{aligned}$$

Observation probability matrix:
$$\begin{aligned} B &=\left[b_{j}(k)\right]_{N \times M} \\ b_j(k) &= P(o_{t}=v_k \mid i_t=q_j) \end{aligned}$$
The Box-and-Ball Model

Suppose there are 4 boxes, each containing red and white balls.

Balls are drawn as follows to produce an observation sequence of colors. To start, pick one of the 4 boxes uniformly at random, draw a ball from it, record its color, and put it back. Then move from the current box to the next one according to these rules: if the current box is box 1, the next box is always box 2; if the current box is box 2 or 3, move to the box on its left or right with probability 0.4 and 0.6, respectively; if the current box is box 4, stay in box 4 or move to box 3, each with probability 0.5. After the move, draw a ball from the new box, record its color, and put it back. Repeating this 5 times yields an observation sequence of ball colors:
$$O = \{\text{red},\text{red},\text{white},\text{white},\text{red}\}$$

Throughout this process the observer sees only the sequence of ball colors; the sequence of boxes the balls are drawn from is hidden.
- State set:
$$Q = \{\text{box }1,\text{box }2,\text{box }3,\text{box }4\}$$
- Observation set:
$$V = \{\text{red},\text{white}\}$$
- Initial probability distribution:
$$\pi= (0.25, 0.25, 0.25, 0.25)^T$$
- State transition matrix, modeling $P(i_{t+1} \mid i_1,i_2,...,i_t,o_1,o_2,...,o_t)$:
$$A=\left[\begin{array}{cccc} 0 & 1 & 0 & 0 \\ 0.4 & 0 & 0.6 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0 & 0.5 & 0.5 \end{array}\right]$$
- Observation probability matrix, modeling $P(o_{t+1} \mid i_1,i_2,...,i_{t+1},o_1,o_2,...,o_t)$:
$$B=\left[\begin{array}{ll} 0.5 & 0.5 \\ 0.3 & 0.7 \\ 0.6 & 0.4 \\ 0.8 & 0.2 \end{array}\right]$$
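To make the generative process concrete, here is a minimal simulation sketch (not part of the original post). It samples a box sequence from $\pi$ and $A$ and, at each step, a ball color from the corresponding row of $B$; colors are recorded as column indices of $B$ (red = 1, white = 0, matching the encoding used in the code further below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters of the box-and-ball example above
pi = [0.25, 0.25, 0.25, 0.25]
A = [[0,   1,   0,   0  ],
     [0.4, 0,   0.6, 0  ],
     [0,   0.4, 0,   0.6],
     [0,   0,   0.5, 0.5]]
B = [[0.5, 0.5],
     [0.3, 0.7],
     [0.6, 0.4],
     [0.8, 0.2]]

box = rng.choice(4, p=pi)  # start: pick one of the 4 boxes uniformly
boxes, colors = [], []
for _ in range(5):         # repeat 5 times, as in the example
    boxes.append(box)
    colors.append(rng.choice(2, p=B[box]))  # draw a ball, record its color
    box = rng.choice(4, p=A[box])           # move to the next box (row of A)
print(boxes, colors)       # an observer sees only `colors`
```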
Two Assumptions

Homogeneous Markov assumption:

$$P\left(i_{t} \mid i_{t-1}, o_{t-1}, \cdots, i_{1}, o_{1}\right)=P\left(i_{t} \mid i_{t-1}\right), \quad t=1,2, \cdots, T$$

Observation independence assumption:
$$P\left(o_{t} \mid i_{T}, o_{T}, i_{T-1}, o_{T-1}, \cdots, i_{t+1}, o_{t+1}, i_{t}, i_{t-1}, o_{t-1}, \cdots, i_{1}, o_{1}\right)=P\left(o_{t} \mid i_{t}\right)$$
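Taken together, the two assumptions factor the joint probability of a state sequence $I = (i_1, i_2, \ldots, i_T)$ and the observation sequence $O$ into purely local terms, which is what every algorithm below exploits:

$$P(O, I \mid \lambda) = P(I \mid \lambda)\, P(O \mid I, \lambda) = \pi_{i_1} \prod_{t=2}^{T} a_{i_{t-1} i_{t}} \prod_{t=1}^{T} b_{i_t}(o_t)$$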
That is everything we know; let's see what we can do with it.

Three Problems

Probability computation (evaluation)
Given a model $\lambda=(A, B, \pi)$ and an observation sequence $O=\left(o_{1}, o_{2}, \cdots, o_{T}\right)$, compute the probability $P(O \mid \lambda)$ of the observation sequence $O$ under the model $\lambda$.
Direct computation
$$P(O|\lambda) = \sum_{I}P(O,I|\lambda) = \sum_{I}P(O|I,\lambda)\cdot P(I|\lambda)$$
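The sum ranges over all $N^T$ possible state sequences, so the direct method takes $O(T N^T)$ time, which is exactly why the forward and backward algorithms below exist. For a toy problem it still makes a useful ground truth. Here is a minimal brute-force sketch (the function name `brute_force_prob` is mine, not from the original post), using the factorization of $P(O, I \mid \lambda)$ shown above:

```python
import itertools

def brute_force_prob(O, Pi, A, B):
    """P(O | lambda) by summing P(O, I | lambda) over all N**T state sequences."""
    N, T = len(A), len(O)
    total = 0.0
    for I in itertools.product(range(N), repeat=T):
        # P(O, I | lambda) = pi_{i_1} b_{i_1}(o_1) * prod_{t>=2} a_{i_{t-1} i_t} b_{i_t}(o_t)
        p = Pi[I[0]] * B[I[0]][O[0]]
        for t in range(1, T):
            p *= A[I[t - 1]][I[t]] * B[I[t]][O[t]]
        total += p
    return total
```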
Forward algorithm

Define the forward probability as

$$\alpha_t(i) = P(o_1,o_2,...,o_t,i_t=q_i|\lambda)$$

Then the quantity we want is
$$P(O|\lambda) = P(o_1,o_2,...,o_T|\lambda) = \sum_{i=1}^{N}P(o_1,o_2,...,o_T,i_T=q_i|\lambda) = \sum_{i=1}^{N}\alpha_T(i)$$
Algorithm steps:

(1) Initialization

$$\alpha_1(i) = P(o_1,i_1=q_i|\lambda) = P(i_1=q_i|\lambda)\cdot P(o_1|i_1=q_i,\lambda) = \pi_i b_i(o_1)$$

(2) Recursion
$$\begin{aligned} \alpha_{t+1}(i) &= P(o_1,o_2,...,o_t,o_{t+1},i_{t+1}=q_i|\lambda) \\ &= P(o_1,o_2,...,o_t,i_{t+1}=q_i|\lambda)\cdot P(o_{t+1}|o_1,o_2,...,o_t,i_{t+1} = q_i,\lambda) \\ &= \left[\sum_{j=1}^{N} P(o_1,o_2,...,o_t,i_t=q_j,i_{t+1}=q_i|\lambda)\right]\cdot P(o_{t+1}|i_{t+1}=q_i) \\ &= \left[\sum_{j=1}^{N}P(o_1,o_2,...,o_t,i_t=q_j|\lambda)\cdot P(i_{t+1}=q_i|o_1,o_2,...,o_t,i_t=q_j,\lambda)\right]\cdot b_i(o_{t+1}) \\ &= \left[\sum_{j=1}^{N}\alpha_t(j)\cdot P(i_{t+1}=q_i|i_t=q_j,\lambda)\right]\cdot b_i(o_{t+1}) \\ &= \left[\sum_{j=1}^{N}\alpha_t(j)\cdot a_{ji}\right]\cdot b_i(o_{t+1}) \end{aligned}$$

(3) Termination
$$P(O|\lambda) = \sum_{i=1}^{N}\alpha_T(i)$$
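Each recursion step is just a matrix-vector product followed by an elementwise product, $\alpha_{t+1} = (A^{\top} \alpha_t) \odot b(o_{t+1})$, so the whole algorithm runs in $O(T N^2)$ time. A vectorized sketch of this view (not from the original post):

```python
import numpy as np

def forward_prob(O, Pi, A, B):
    """Vectorized forward algorithm; O(T * N^2) time."""
    Pi, A, B = np.asarray(Pi), np.asarray(A), np.asarray(B)
    alpha = Pi * B[:, O[0]]              # (1) initialization
    for o in O[1:]:
        alpha = (A.T @ alpha) * B[:, o]  # (2) one recursion step
    return alpha.sum()                   # (3) termination
```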
Implementation

Define the observations and the model parameters $\lambda = (\pi, A, B)$:
```python
import numpy as np

# Observation sequence red, red, white, white, red,
# encoded as column indices of B: red -> 1, white -> 0
O = [1, 1, 0, 0, 1]
Pi = [0.25, 0.25, 0.25, 0.25]
A = [
    [0,   1,   0,   0  ],
    [0.4, 0,   0.6, 0  ],
    [0,   0.4, 0,   0.6],
    [0,   0,   0.5, 0.5],
]
B = [
    [0.5, 0.5],
    [0.3, 0.7],
    [0.6, 0.4],
    [0.8, 0.2],
]
```
Define the model:
```python
class ForwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)  # number of hidden states

    def predict(self, O):
        # (1) Initialization: alpha_1(i) = pi_i * b_i(o_1)
        alpha = np.zeros(shape=(self.N,), dtype=float)
        for i in range(self.N):
            alpha[i] = self.Pi[i] * self.B[i][O[0]]
        # (2) Recursion over the remaining observations:
        #     alpha_{t+1}(i) = [sum_j alpha_t(j) * a_ji] * b_i(o_{t+1})
        for observation in O[1:]:
            temp = np.zeros_like(alpha)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += alpha[j] * self.A[j][i]
                temp[i] = temp[i] * self.B[i][observation]
            alpha = temp
        # (3) Termination: P(O|lambda) = sum_i alpha_T(i)
        return np.sum(alpha)
```
Prediction:

```python
model = ForwardModel(Pi, A, B)
model.predict(O)
```

0.024612384

Readers can verify this value for themselves.
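One quick way to verify it: compare against the brute-force enumeration from the direct-computation section (using the `brute_force_prob` sketch above):

```python
assert np.isclose(model.predict(O), brute_force_prob(O, Pi, A, B))
```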
Backward algorithm

Define the backward probability as

$$\beta_{t}(i)=P\left(o_{t+1}, o_{t+2}, \cdots, o_{T} \mid i_{t}=q_{i}, \lambda\right)$$

Then
$$\begin{aligned} P(O|\lambda) &= P(o_1,o_2,...,o_T|\lambda) \\ &=\sum_{i=1}^{N}P(o_1,o_2,...,o_T,i_1=q_i|\lambda) \\ &=\sum_{i=1}^{N}P(o_1|o_2,...,o_T,i_1=q_i,\lambda)\cdot P(o_2,...,o_T,i_1=q_i|\lambda) \\ &=\sum_{i=1}^{N}P(o_1|i_1=q_i,\lambda)\cdot P(o_2,...,o_T|i_1=q_i,\lambda)\cdot P(i_1=q_i|\lambda) \\ &=\sum_{i=1}^{N}b_i(o_1)\cdot \beta_1(i)\cdot \pi_i \end{aligned}$$
Algorithm steps:

(1) Initialization

$$\beta_T(i) = 1$$
(2) Recursion

$$\begin{aligned} \beta_{t}(i) &= P\left(o_{t+1}, o_{t+2}, \cdots, o_{T} \mid i_{t}=q_{i}, \lambda\right) \\ &=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T},i_{t+1}=q_j \mid i_{t}=q_{i}, \lambda\right) \\ &=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T}\mid i_{t+1}=q_j,i_{t}=q_{i}, \lambda\right)\cdot P(i_{t+1}=q_j|i_t=q_i,\lambda) \\ &=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T}\mid i_{t+1}=q_j,i_{t}=q_{i}, \lambda\right)\cdot a_{ij} \end{aligned}$$

The next step is the crucial one: conditioned on $i_{t+1}=q_j$, the future observations $o_{t+1},\ldots,o_T$ are independent of $i_t$, since $i_t$ influences everything after time $t$ only through $i_{t+1}$ (this follows from the two HMM assumptions, or from d-separation in the graphical model). Hence
$$\begin{aligned} \beta_t(i) &= \sum_{j=1}^{N}P(o_{t+1},...,o_T|i_{t+1}=q_j,\lambda)\cdot a_{ij} \\ &=\sum_{j=1}^{N}P(o_{t+2},...,o_T|i_{t+1}=q_j,\lambda)\cdot P(o_{t+1}|o_{t+2},...,o_T,i_{t+1}=q_j,\lambda) \cdot a_{ij} \\ &= \sum_{j=1}^{N}\beta_{t+1}(j)\cdot b_j(o_{t+1})\cdot a_{ij} \end{aligned}$$

(3) Termination
$$P(O|\lambda) =\sum_{i=1}^{N}b_i(o_1)\cdot \beta_1(i)\cdot \pi_i$$
Implementation

The data and parameter definitions are the same as above and are omitted here. Define the model:
```python
class BackwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)  # number of hidden states

    def predict(self, O):
        # (1) Initialization: beta_T(i) = 1
        beta = np.ones(shape=(self.N,))
        # (2) Recursion backwards over o_T, ..., o_2:
        #     beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        for o in reversed(O[1:]):
            temp = np.zeros_like(beta)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += beta[j] * self.B[j][o] * self.A[i][j]
            beta = temp
        # (3) Termination: P(O|lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
        res = 0.0
        for i in range(self.N):
            res += self.B[i][O[0]] * beta[i] * self.Pi[i]
        return res
```
Prediction:

```python
model = BackwardModel(Pi, A, B)
model.predict(O)
```

0.024612384

This matches the value produced by the forward algorithm.
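The agreement goes deeper than matching endpoints: the forward and backward probabilities satisfy $P(O \mid \lambda)=\sum_{i=1}^{N} \alpha_{t}(i)\, \beta_{t}(i)$ at every time step $t$. A small sketch to check this numerically (the helpers `alpha_table` and `beta_table` are mine, not from the original post):

```python
def alpha_table(O, Pi, A, B):
    # alpha[t, i] = P(o_1, ..., o_{t+1}, i_{t+1} = q_i | lambda)  (0-indexed t)
    N, T = len(A), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = [Pi[i] * B[i][O[0]] for i in range(N)]
    for t in range(1, T):
        for i in range(N):
            alpha[t, i] = sum(alpha[t - 1, j] * A[j][i] for j in range(N)) * B[i][O[t]]
    return alpha

def beta_table(O, Pi, A, B):
    # beta[t, i] = P(o_{t+2}, ..., o_T | i_{t+1} = q_i, lambda)  (0-indexed t)
    N, T = len(A), len(O)
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t, i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1, j] for j in range(N))
    return beta

alpha, beta = alpha_table(O, Pi, A, B), beta_table(O, Pi, A, B)
for t in range(len(O)):
    # the same P(O|lambda) is recovered at every time step
    assert np.isclose(np.sum(alpha[t] * beta[t]), model.predict(O))
```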
Learning problem (Learning)

Baum-Welch algorithm

The rest is left as a placeholder, to be filled in tomorrow.