ACL 2021 Named Entity Recognition: 1. A Unified Generative Framework for Various NER Subtasks; 2. A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition; 3. Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter
Table of Contents
1. A Unified Generative Framework for Various NER Subtasks
Method
2. A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition
Method
Word Representation
Graph Convolutional Network
Span Representation
Decoding
Reflections
3. Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter
Method
Char-Words Pair Sequence
Lexicon Adapter
Lexicon Enhanced BERT
1. A Unified Generative Framework for Various NER Subtasks
The first paper is from Qiu Xipeng's team at Fudan University: a unified framework that can handle multiple NER subtasks at once.
In "have much muscle pain and fatigue.", "muscle pain" and "muscle fatigue" are two different named entities, and "muscle" is the start of both, so simple sequence labeling cannot handle this case.
Therefore, this paper proposes a unified generative framework for NER, aiming to solve the different NER subtasks simultaneously. The key is how to model the labels of each subtask in a unified way. Using a Seq2Seq model (here, pretrained BART), the entities are linearized into a sequence of entity pointer indices, which resolves the incompatibility among the three subtasks in one stroke. Concretely, given an input token sequence $X=[x_1,\dots,x_n]$, the target sequence is $Y=[s_{11},e_{11},\dots,s_{1j},e_{1j},t_1,\dots,s_{i1},e_{i1},\dots,s_{ij},e_{ij},t_i]$, where $s$ and $e$ denote the start and end tokens of an entity span, and $t_i$ is the index of the entity's tag type. Because one sentence may contain several entities with the same tag, the spans $s_{11},e_{11},\dots,s_{1j},e_{1j}$ all map to a single $t_1$. Let $G=[g_1,\dots,g_l]$ denote the tokens of the $l$ distinct tags; for example, with two entity types, Person and Organization, $G$ is [Person, Organization]. The original sentence and the tag tokens are then concatenated, so that both can be addressed by one shared index space; note in particular that $t_i\in(n, n+l]$.
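As a concrete illustration of this linearization, here is a minimal sketch (my own illustration with a made-up tag set, not the authors' code):

```python
# Sketch of the entity linearization described above. Token positions are
# 1-based; tag indices are shifted past the sentence length n so that
# t lies in (n, n + l].
def linearize(n, tag_vocab, entities):
    """entities: list of (positions, tag); positions is the flat list
    [s_1, e_1, s_2, e_2, ...] of an entity's span boundaries."""
    target = []
    for positions, tag in entities:
        target.extend(positions)
        target.append(n + 1 + tag_vocab.index(tag))  # tag index in (n, n + l]
    return target

# "have much muscle pain and fatigue" (n = 6), with a hypothetical tag set:
# "muscle pain" -> [3, 4]; discontinuous "muscle fatigue" -> [3, 3, 6, 6]
target = linearize(6, ["Disorder"],
                   [([3, 4], "Disorder"), ([3, 3, 6, 6], "Disorder")])
# target == [3, 4, 7, 3, 3, 6, 6, 7]
```

Both entities end with the same tag index 7, which is how one tag serves several spans in the target sequence.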
Method
Learning can now proceed with Seq2Seq. Generating the target tag sequence is viewed as modeling a probability distribution:
$$P(Y\mid X)=\prod_{t=1}^{m} P(y_t \mid X, Y_{<t})$$
$y_0$ is a special token marking the start of decoding. To compute this probability, the tokens are first encoded:
$$H^e=\mathrm{BARTEncoder}([x_1,\dots,x_n])$$
Note that the tokens here are the original tokens, not the tokens concatenated with the tags, so $H^e\in\mathbb{R}^{n\times d}$. During decoding, the target sequence $Y$ must be incorporated. But $Y$ is a sequence of indices, which the pretrained BART model cannot consume directly, so the indices are first converted back to tokens:
$$\hat{y}_t=\begin{cases} X_{y_t}, & y_t \le n \\ G_{y_t-n}, & y_t > n \end{cases}$$
$$h_t^d=\mathrm{BARTDecoder}(H^e;\hat{Y}_{<t})$$
With both the encoder and decoder outputs in hand, the paper computes the index probability distribution $P_t$ as follows:
$$E^e=\mathrm{TokenEmbed}(X),\quad \hat{H}^e=\mathrm{MLP}(H^e),\quad \bar{H}^e=\alpha\,\hat{H}^e+(1-\alpha)\,E^e$$
$$G^d=\mathrm{TokenEmbed}(G),\quad P_t=\mathrm{Softmax}\big([\bar{H}^e\otimes h_t^d\,;\,G^d\otimes h_t^d]\big)$$
The TokenEmbed parameters here are shared between $X$ and $G$. During training, the negative log-likelihood loss is used.
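To make the final step concrete, here is a toy sketch that scores the decoder state against both the encoder states and the tag-token embeddings with plain dot products and takes one softmax over the $n+l$ candidate indices (a simplification of the paper's formulation; names and shapes are my own):

```python
import numpy as np

def index_distribution(H_e, G_d, h_t):
    """One softmax over the n sentence positions plus l tag tokens."""
    scores = np.concatenate([H_e @ h_t, G_d @ h_t])  # shape (n + l,)
    exp = np.exp(scores - scores.max())              # stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
H_e = rng.normal(size=(6, 4))   # n = 6 encoder states, d = 4
G_d = rng.normal(size=(2, 4))   # l = 2 tag-token embeddings
P_t = index_distribution(H_e, G_d, rng.normal(size=4))  # (8,), sums to 1
```

At each decoding step the argmax of $P_t$ either points back into the sentence (a span boundary) or into the tag vocabulary (closing an entity).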
2. A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition
Method
Word Representation
First, BERT is used to represent the words. BERT's tokenizer splits a word into word pieces, which reduces the sparsity caused by the morphological variants of English words; for example, "fevers" is split into "fever" and "##s". In such cases, the first word piece is used as the final word representation. This yields a matrix $H=\{h_1,\dots,h_N\}\in\mathbb{R}^{N\times d_h}$.
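The first-piece selection can be sketched as follows (my own illustration; "##" marks continuation pieces in BERT's WordPiece convention):

```python
def first_subword_indices(pieces):
    """Indices of the word pieces that start a word (no '##' prefix)."""
    return [i for i, p in enumerate(pieces) if not p.startswith("##")]

pieces = ["he", "has", "fever", "##s"]
word_idx = first_subword_indices(pieces)  # [0, 1, 2]
# h_1..h_N would then be the BERT hidden vectors at these positions,
# so "fevers" is represented by the vector of "fever".
```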
Graph Convolutional Network
Next, a graph convolutional network is applied to incorporate dependency syntax, using AGGCN (attention-guided GCN):
$$\tilde{A}^t=\mathrm{softmax}\!\left(\frac{(H^t W_Q)(H^t W_K)^{\top}}{\sqrt{d_{head}}}\right)$$
這裡将 H t ∈ R N × d h e a d H^t\in R^{N×d_{head}} Ht∈RN×dhead映射到一個query和一個key,并計算注意力系數作為圖卷積的卷積核。因為是多頭,是以在這一步要根據不同的 A ~ t \tilde{A}^t A~t執行多次卷積,然後再經過一個全連接配接得到 H ~ t ∈ R N × d n \tilde{H}^t\in R^{N×d_{n}} H~t∈RN×dn。多頭再進行拼接就會得到一個整合之後的特征:
$$\bar{H}=\big[\,H\,;\,[\tilde{H}^1;\dots;\tilde{H}^{N_{head}}]\,W_1\,\big]$$
Here $W_1\in\mathbb{R}^{(N_{head}\times d_h)\times d_h}$ maps the concatenated features back to $d_h$ dimensions, and $H$ is then concatenated with the result to give the final feature representation.
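The attention-guided step can be sketched with a toy NumPy version (my own shape assumptions, not the paper's code): each head builds an attention matrix that acts as a soft adjacency, applies one graph convolution with it, and the heads are concatenated and projected back to $d_h$.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggcn_layer(H, Wq, Wk, Wg, W1):
    heads = []
    for q, k, g in zip(Wq, Wk, Wg):  # one (Wq, Wk, Wg) triple per head
        # Attention coefficients as a soft adjacency matrix (N x N).
        A = softmax((H @ q) @ (H @ k).T / np.sqrt(q.shape[1]))
        heads.append(np.tanh(A @ H @ g))       # graph convolution with kernel A
    return np.concatenate(heads, axis=-1) @ W1 # concat heads, map back to d_h

rng = np.random.default_rng(1)
N, d, n_head = 5, 4, 2
H = rng.normal(size=(N, d))
Wq = rng.normal(size=(n_head, d, d)); Wk = rng.normal(size=(n_head, d, d))
Wg = rng.normal(size=(n_head, d, d)); W1 = rng.normal(size=(n_head * d, d))
out = aggcn_layer(H, Wq, Wk, Wg, W1)  # shape (N, d)
```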
Span Representation
An example illustrates the form of the spans. For "The mitral valve leaflets are mildly thickened", the spans are:
"The", "The mitral", "The mitral valve", …, "mildly", "mildly thickened", "thickened".
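The enumeration above can be sketched as follows (a simple illustration; real span-based models usually cap the span length for efficiency):

```python
def enumerate_spans(tokens, max_len=None):
    """All contiguous spans of up to max_len tokens (all lengths if None)."""
    limit = max_len or len(tokens)
    return [" ".join(tokens[i:j])
            for i in range(len(tokens))
            for j in range(i + 1, min(i + limit, len(tokens)) + 1)]

spans = enumerate_spans(["mildly", "thickened"])
# ["mildly", "mildly thickened", "thickened"]
```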
For spans longer than two tokens, the concatenation of the head word and the tail word is used as the representation; the span feature is denoted $s_{ij}$:
$$s_{ij}=[\,\bar{h}_i\,;\,\bar{h}_j\,]$$
$$p^{1}_{ij}=\mathrm{Softmax}\big(\mathrm{MLP}_1(s_{ij})\big)$$
$p^1$ is the probability distribution of the current span over entity types. Likewise, an MLP classifies the relation between two entity spans:
$$p^{2}=\mathrm{Softmax}\big(\mathrm{MLP}_2([\,s_{ij}\,;\,s_{kl}\,])\big)$$
3. Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter
Char-Words Pair Sequence
First, a point of terminology: in Chinese, each individual character is a character, while the units produced by word segmentation are words; in English, every word is simply a word. To fully exploit lexicon information, the character sequence is extended into a character-words pair sequence. Denote a lexicon by $D$ and a sentence by $s_c=\{c_1,\dots,c_n\}$. By matching against $D$, all the words in $D$ that contain character $c_i$ are collected, forming $s_{cw}=\{(c_1,ws_1),\dots,(c_n,ws_n)\}$, as shown in the figure:
(Figure: an example of a character-words pair sequence.)
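Building the pair sequence can be sketched with a naive scan (the lexicon below is made up for illustration; the paper builds the match with a trie for efficiency):

```python
def char_word_pairs(sentence, lexicon):
    """For each character, collect the lexicon words that cover it."""
    pairs = [(c, []) for c in sentence]
    for i in range(len(sentence)):
        for j in range(i + 2, len(sentence) + 1):  # words of length >= 2
            word = sentence[i:j]
            if word in lexicon:
                for k in range(i, j):              # word covers chars i..j-1
                    pairs[k][1].append(word)
    return pairs

pairs = char_word_pairs("美国人民", {"美国", "美国人", "人民"})
# 美 -> ["美国", "美国人"], 国 -> ["美国", "美国人"],
# 人 -> ["美国人", "人民"], 民 -> ["人民"]
```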
Lexicon Adapter
這個的目的是将Lexicon與Bert相結合。對于 s c s_c sc的第 i i i個輸入,使用兩個向量表示: ( h i c , x i w s ) (h_i^c, x_i^{ws}) (hic,xiws)。前者是 c i c_i ci對應的字元向量,在論文裡是Bert的某一層的輸出,後者是字元對應的所有增強詞彙的一組詞嵌入,也就是 x i w s = { x i 1 w , . . . , x i m w } x_i^{ws}=\{x_{i1}^{w},...,x_{im}^{w}\} xiws={xi1w,...,ximw}。其中 x i j w x_{ij}^{w} xijw通過一個預訓練的單詞嵌入lookup table e w e^w ew表示:
$$x_{ij}^{w}=e^{w}(w_{ij})$$
To align these two different representations, a nonlinear transformation is applied to the word vectors:
$$v_{ij}^{w}=W_2\big(\tanh(W_1 x_{ij}^{w}+b_1)\big)+b_2$$
Each paired word now has its own representation, and all the words paired with one character form a set $V_i=\{v_{i1}^{w},\dots,v_{im}^{w}\}$. Since different words enhance the character to different degrees, an attention mechanism computes a weighted sum:
$$a_i=\mathrm{softmax}\big(h_i^{c}\,W_{attn}\,V_i^{\top}\big)$$
$$z_i^{w}=\sum_{j=1}^{m} a_{ij}\,v_{ij}^{w}$$
Finally, the enhanced character representation is the sum of the original character vector and the aggregated word vector:
$$\tilde{h}_i=h_i^{c}+z_i^{w}$$
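The adapter step can be sketched end-to-end with toy NumPy code (my own shapes and parameter names, not the paper's implementation): project each word embedding into the character space with a two-layer nonlinear map, score it against the character vector with bilinear attention, and add the weighted sum back to the character.

```python
import numpy as np

def lexicon_adapter(h_c, X_w, W1, b1, W2, b2, W_attn):
    """h_c: (d_c,) character vector; X_w: (m, d_w) paired word embeddings."""
    V = np.tanh(X_w @ W1.T + b1) @ W2.T + b2  # align word vectors: (m, d_c)
    scores = h_c @ W_attn @ V.T               # bilinear attention: (m,)
    e = np.exp(scores - scores.max())
    a = e / e.sum()                           # attention weights
    return h_c + a @ V                        # enhanced character vector

rng = np.random.default_rng(2)
d_c, d_w, m = 4, 6, 3
out = lexicon_adapter(rng.normal(size=d_c), rng.normal(size=(m, d_w)),
                      rng.normal(size=(d_c, d_w)), np.zeros(d_c),
                      rng.normal(size=(d_c, d_c)), np.zeros(d_c),
                      rng.normal(size=(d_c, d_c)))  # shape (d_c,)
```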
The overall pipeline is shown in the figure:
(Figure: overall structure of the Lexicon Adapter.)
Lexicon Enhanced BERT
Lexicon Enhanced BERT consists of the Lexicon Adapter (LA) and BERT. First, at the input layer ($l=0$), $s_c$ is embedded and fed into the Transformer layers:
$$G=\mathrm{LN}\big(H^{l-1}+\mathrm{MHAttn}(H^{l-1})\big)$$
$$H^{l}=\mathrm{LN}\big(G+\mathrm{FFN}(G)\big)$$
The output of embedding $s_c$ is $H^0$. MHAttn is multi-head attention, FFN is a two-layer feed-forward network, and LN is layer normalization. At the $k$-th layer, the output $H^k$ is combined with the paired-word representations through LA:
$$\tilde{h}_i^{k}=\mathrm{LA}\big(h_i^{k}, x_i^{ws}\big)$$
Finally, the remaining $L-k$ encoder layers produce the output $H^L$. From here it is the usual BERT recipe: a dimension transformation adapts the output to the final classification:
$$O=H^{L}W_o+b_o$$
followed by the corresponding loss function:
$$p_i=\mathrm{softmax}(o_i)$$
$$\mathcal{L}=-\sum_{i=1}^{n}\log p_i[y_i]$$
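The whole forward pass can be summarized structurally (toy stand-in callables rather than real BERT; the point is only where LA is injected between the layers):

```python
def lebert_forward(H0, layers, lexicon_adapter, k, word_inputs):
    """Run Transformer layers 1..L; apply the Lexicon Adapter once
    after layer k, then continue through the remaining L - k layers."""
    H = H0
    for l, layer in enumerate(layers, start=1):
        H = layer(H)
        if l == k:
            H = lexicon_adapter(H, word_inputs)
    return H

# Smoke example: 4 "layers" that each add 1, an "adapter" that multiplies
# by 10, so the injection point is visible in the arithmetic.
layers = [lambda H: [h + 1 for h in H]] * 4
adapter = lambda H, _: [h * 10 for h in H]
out = lebert_forward([0], layers, adapter, k=2, word_inputs=None)
# layers 1-2 give 2, the adapter gives 20, layers 3-4 give 22 -> [22]
```

The design choice is that injecting lexicon information after an intermediate layer (rather than only at the input) lets the later BERT layers mix the word-level evidence into their contextual representations.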