1. Self-attention enhances the embeddings across the sequence; it does no dimensionality reduction.
2. Hierarchical attention reduces dimensionality (it pools the sequence into a single vector).
3. Bahdanau attention is encoder-decoder attention over the encoder embeddings; it does no dimensionality reduction. (A shape sketch of items 1-3 follows this list.)
4. WeChat deep-semantic-matching:
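Before the WeChat notes, a minimal NumPy sketch (illustrative names; random tensors stand in for learned weights) of the shape behaviour of variants 1-3: self-attention keeps [batch, seqlen, dim], hierarchical attention pools the seqlen axis away, and Bahdanau attention returns one context vector per decoder step.

    import numpy as np

    batch, seqlen, dim = 2, 5, 8
    x = np.random.randn(batch, seqlen, dim)            # sequence / encoder embeddings

    def softmax(a, axis=-1):
        e = np.exp(a - a.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # 1. self-attention: every position attends over the whole sequence;
    #    output shape == input shape (embedding enhancement, no reduction)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(dim)   # [batch, seqlen, seqlen]
    enhanced = softmax(scores) @ x                     # [batch, seqlen, dim]

    # 2. hierarchical attention: a context vector u scores each position and
    #    the sequence is pooled into one vector (the seqlen axis is reduced)
    u = np.random.randn(dim)                           # learned in practice
    alpha = softmax(x @ u)                             # [batch, seqlen]
    pooled = (alpha[..., None] * x).sum(axis=1)        # [batch, dim]

    # 3. Bahdanau attention: the current decoder state queries the encoder
    #    outputs; a dot-product score stands in for the paper's additive score
    s = np.random.randn(batch, dim)                    # decoder state
    e = (x @ s[..., None]).squeeze(-1)                 # [batch, seqlen]
    context = (softmax(e)[..., None] * x).sum(axis=1)  # [batch, dim] per step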
Keyword-Attention:
keyword mask (one per sentence): shape [batch, seqlen], elements in {0, 1}; reshaped to [batch, seqlen, 1]
real_mask_a (real, non-padding tokens of sentence a): shape [batch, seqlen]; reshaped to [batch, 1, seqlen]
real_mask_b: shape [batch, seqlen]; reshaped to [batch, 1, seqlen]
kw_mask_a = real_mask_a * kw_mask_b   (reshaped tensors; broadcasts to [batch, seqlen, seqlen])
kw_mask_b = real_mask_b * kw_mask_a
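A runnable NumPy sketch of the mask algebra above (shapes taken from the notes; the random 0/1 masks are placeholders for real keyword and padding masks):

    import numpy as np

    batch, seqlen = 2, 4
    kw_a   = np.random.randint(0, 2, (batch, seqlen))   # keyword mask, sentence a
    kw_b   = np.random.randint(0, 2, (batch, seqlen))   # keyword mask, sentence b
    real_a = np.random.randint(0, 2, (batch, seqlen))   # real-token mask, sentence a
    real_b = np.random.randint(0, 2, (batch, seqlen))   # real-token mask, sentence b

    # reshape: keyword masks to [batch, seqlen, 1], real masks to [batch, 1, seqlen],
    # then broadcast-multiply into a pairwise attention mask
    kw_mask_a = real_a.reshape(batch, 1, seqlen) * kw_b.reshape(batch, seqlen, 1)
    kw_mask_b = real_b.reshape(batch, 1, seqlen) * kw_a.reshape(batch, seqlen, 1)
    print(kw_mask_a.shape)  # (2, 4, 4): b's keywords as queries, a's real tokens as keys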
Self-Attention:
The model combines keyword attention with plain self-attention (kw-attention + self-attention); a sketch follows.
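A sketch of one way the two branches could be combined, under my own assumptions (-1e9 additive masking, a single head, and a simple sum of the branch outputs); the actual WeChat model may fuse them differently:

    import numpy as np

    def attention(q, k, v, mask):
        # q, k, v: [batch, seqlen, dim]; mask: [batch, seqlen, seqlen] in {0, 1}
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
        scores = np.where(mask > 0, scores, -1e9)       # drop masked-out pairs
        w = np.exp(scores - scores.max(-1, keepdims=True))
        return (w / w.sum(-1, keepdims=True)) @ v

    batch, seqlen, dim = 2, 4, 8
    x = np.random.randn(batch, seqlen, dim)
    full_mask = np.ones((batch, seqlen, seqlen))        # plain self-attention
    kw_mask = np.random.randint(0, 2, (batch, seqlen, seqlen))  # e.g. kw_mask_a above
    out = attention(x, x, x, full_mask) + attention(x, x, x, kw_mask)  # [batch, seqlen, dim]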