
ACL 2021 Named Entity Recognition: 1. A Unified Generative Framework for Various NER Subtasks; 2. A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition; 3. Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter

Table of Contents

  • 1. A Unified Generative Framework for Various NER Subtasks
    • Method
  • 2. A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition
    • Method
      • Word Representation
      • Graph Convolutional Network
      • Span Representation
      • Decoding
    • Some Thoughts
  • 3. Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter
    • Method
      • Char-Words Pair Sequence
      • Lexicon Adapter
      • Lexicon Enhanced BERT

1. A Unified Generative Framework for Various NER Subtasks

First, let's look at a unified framework from Xipeng Qiu's team at Fudan University that can handle multiple NER subtasks.


Anyone familiar with NER knows that flat NER can be cast as a sequence labeling task, but some special variants, such as nested NER and discontinuous NER, cannot be reduced to sequence labeling so easily. Take discontinuous NER as an example:

In “have much muscle pain and fatigue.”, “muscle pain” and “muscle fatigue” are two distinct named entities. “muscle” is the start of both, and “muscle fatigue” is not even contiguous, so plain sequence labeling cannot encode them.

The paper therefore proposes a unified generative NER framework that solves the different subtasks at once. For each subtask, the key question is how to model its labels uniformly. Using a Seq2Seq model (pretrained BART here), entities are linearized into a sequence of entity pointer indices, which resolves the incompatibility among the three subtasks in one stroke. Concretely, given an input token sequence $X=[x_1,\dots,x_n]$, the target sequence is $Y=[s_{11},e_{11},\dots,s_{1j},e_{1j},t_1,\dots,s_{i1},e_{i1},\dots,s_{ij},e_{ij},t_i]$, where $s,e$ denote the start and end tokens of an entity span and $t_i$ is the index of the entity tag type. Since one tag may cover several entity spans in a sentence, the span indices $s_{11},e_{11},\dots,s_{1j},e_{1j}$ are followed by a single $t_1$. Let $G=[g_1,\dots,g_l]$ be the tokens of the $l$ distinct tags; for example, with two entity types, Person and Organization, $G$ is [Person, Organization]. The original sentence and the tag tokens are then concatenated, so everything can be addressed with one shared index space, with the constraint $t_i\in (n,n+l]$.
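To make the linearization concrete, here is a minimal sketch (my illustration, not the authors' code; the entity offsets and the type name are made up, and positions are 1-based and inclusive):

```python
# Minimal sketch of the pointer-index linearization (illustrative only).
# Tag indices live in (n, n + l], i.e. after the n sentence positions.

def linearize(tokens, entities, tag_vocab):
    """entities: list of (span_positions, tag), where span_positions is a list
    of (start, end) pairs; discontinuous entities have more than one pair."""
    n = len(tokens)
    target = []
    for spans, tag in entities:
        for start, end in spans:
            target += [start, end]
        target.append(n + tag_vocab.index(tag) + 1)  # tag index in (n, n+l]
    return target

tokens = "have much muscle pain and fatigue .".split()  # n = 7
tag_vocab = ["Disorder"]                                 # hypothetical tag set
entities = [
    ([(3, 4)], "Disorder"),            # "muscle pain"
    ([(3, 3), (6, 6)], "Disorder"),    # "muscle ... fatigue" (discontinuous)
]
print(linearize(tokens, entities, tag_vocab))  # [3, 4, 8, 3, 3, 6, 6, 8]
```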

Method

Learning then proceeds with a standard Seq2Seq setup. Generating the target sequence is treated as modeling a probability distribution:

$$P(Y\mid X)=\prod_{t=1}^{m}P(y_t\mid X,\,Y_{<t})$$

$y_0$ is a special token marking the start of decoding. To compute this probability, the input tokens are first encoded:

$$H^e=\text{Encoder}(X)$$

Note that the tokens here are the original sentence tokens, not the tag-concatenated sequence, so $H^e\in R^{n\times d}$. During decoding, the target sequence $Y$ must be incorporated, but $Y$ is a sequence of indices that the pretrained BART cannot consume directly, so each index first has to be converted back to a token:

$$\hat{y}_t=\begin{cases}X_{y_t}, & \text{if } y_t\le n\\ G_{y_t-n}, & \text{if } y_t> n\end{cases}$$
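As a quick illustration, this conversion is a small lookup (the helper name is mine):

```python
def index_to_token(y_t, sentence_tokens, tag_tokens):
    """Map a pointer index back to a token: positions 1..n address the
    sentence, positions n+1..n+l address the tag tokens G."""
    n = len(sentence_tokens)
    return sentence_tokens[y_t - 1] if y_t <= n else tag_tokens[y_t - n - 1]
```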

With the converted tokens and the encoder output, the decoder's last hidden state can then be computed:

$$h_t^d=\text{Decoder}(H^e;\,\hat{Y}_{<t})$$

With both the encoder and decoder outputs in hand, the paper computes the index probability distribution $P_t$ as follows:

$$\begin{aligned}
E^e &= \text{TokenEmbed}(X)\\
\hat{H}^e &= \alpha H^e + (1-\alpha)E^e\\
G^d &= \text{TokenEmbed}(G)\\
P_t &= \text{Softmax}\big([\hat{H}^e h_t^d;\ G^d h_t^d]\big)
\end{aligned}$$

The TokenEmbed parameters are shared between $X$ and $G$. Training uses the negative log-likelihood loss with teacher forcing (rather than feeding each state's own output as the next input, the corresponding gold token from the training data is fed in); at inference time, the target sequence is generated autoregressively.
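A minimal sketch of one teacher-forced training step (assuming a generic encoder-decoder `model` that returns per-step logits over the $n+l$ index space; this is my scaffolding, not the paper's code):

```python
import torch
import torch.nn.functional as F

def training_step(model, input_ids, target_indices):
    """Teacher forcing: the decoder sees the gold prefix target_indices[:, :-1]
    and is trained to predict target_indices[:, 1:] with cross-entropy (NLL)."""
    logits = model(input_ids, decoder_input_ids=target_indices[:, :-1])
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),   # (batch * steps, n + l)
        target_indices[:, 1:].reshape(-1),     # gold next index at each step
    )
    loss.backward()
    return loss
```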

2. A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition


This paper also targets joint extraction of overlapped and discontinuous entities, but unlike the first one it exhaustively enumerates text spans instead of using a generative model. Every enumerated span is classified (binary) as entity or not, and predefined inter-entity relations then decide whether a pair of entities is in an Overlapping or a Succession relation. Overall the model is still two-stage, encoder plus decoder: the encoder incorporates the dependency-syntax graph, reinforced with a GCN, and the decoder is a multi-task design that jointly recognizes the relations between entities. As shown in the figure, the model comprises four modules: (1) word representation, (2) graph convolutional network, (3) span representation, and (4) joint decoding.

[Figure: overall architecture, with modules for (1) word representation, (2) GCN, (3) span representation, (4) joint decoding]

Method

Word Representation

First, BERT produces the word representations. BERT's tokenizer splits a word into word pieces, which reduces the sparsity caused by English morphology; for example, the word "fevers" is split into "fever" and "##s". In that case, the first word piece is used as the final word representation, yielding a matrix $H=\{h_1,\dots,h_N\}\in R^{N\times d_h}$.
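A sketch of first-subtoken pooling with the Hugging Face transformers API (the model name is only an example):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")

words = ["have", "much", "muscle", "pain", "and", "fatigue"]
enc = tok(words, is_split_into_words=True, return_tensors="pt")
hidden = bert(**enc).last_hidden_state[0]      # (num_subtokens, d_h)

# Keep the first subtoken of each word as that word's representation.
first_idx, seen = [], set()
for pos, wid in enumerate(enc.word_ids(0)):
    if wid is not None and wid not in seen:
        seen.add(wid)
        first_idx.append(pos)
H = hidden[torch.tensor(first_idx)]            # (N, d_h), one row per word
```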

Graph Convolutional Network

Next, a graph convolution injects dependency syntax, using AGGCN (attention-guided GCN):

$$\tilde{A}^t=\text{softmax}\!\left(\frac{Q^t (K^t)^{\top}}{\sqrt{d_{head}}}\right),\qquad Q^t=H^t W_Q,\ K^t=H^t W_K$$

這裡将 H t ∈ R N × d h e a d H^t\in R^{N×d_{head}} Ht∈RN×dhead​映射到一個query和一個key,并計算注意力系數作為圖卷積的卷積核。因為是多頭,是以在這一步要根據不同的 A ~ t \tilde{A}^t A~t執行多次卷積,然後再經過一個全連接配接得到 H ~ t ∈ R N × d n \tilde{H}^t\in R^{N×d_{n}} H~t∈RN×dn​。多頭再進行拼接就會得到一個整合之後的特征:

$$\tilde{H}=[\tilde{H}^1;\dots;\tilde{H}^{N_{head}}]\,W_1$$

Here $W_1\in R^{(N_{head}\times d_h)\times d_h}$ maps the concatenated features back to $d_h$ dimensions, and $H$ is then concatenated with the result to give the final feature representation.
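A compact sketch of the attention-guided graph convolution idea (a simplification of AGGCN; the layer sizes and names are my choices, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class AttnGuidedGCN(nn.Module):
    """One attention-guided graph-conv step: attention scores act as a dense,
    soft adjacency matrix; one convolution per head, then merge and concat."""
    def __init__(self, d_h, n_heads):
        super().__init__()
        self.q = nn.Linear(d_h, d_h)
        self.k = nn.Linear(d_h, d_h)
        self.convs = nn.ModuleList(nn.Linear(d_h, d_h) for _ in range(n_heads))
        self.merge = nn.Linear(n_heads * d_h, d_h)
        self.n_heads, self.d_head = n_heads, d_h // n_heads

    def forward(self, H):                        # H: (N, d_h)
        Q = self.q(H).view(-1, self.n_heads, self.d_head)
        K = self.k(H).view(-1, self.n_heads, self.d_head)
        A = torch.softmax(                       # (n_heads, N, N) soft adjacency
            torch.einsum("ihd,jhd->hij", Q, K) / self.d_head ** 0.5, dim=-1)
        outs = [torch.relu(conv(A[h] @ H)) for h, conv in enumerate(self.convs)]
        merged = self.merge(torch.cat(outs, dim=-1))
        return torch.cat([H, merged], dim=-1)    # final feature, (N, 2 * d_h)
```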

Span Representation

An example makes the span format clear. For "The mitral valve leaflets are mildly thickened", the spans are:

“The”, “The mitral”, “The mitral valve”, …, “mildly”, “mildly thickened”, “thickened”.

For spans with more than two words, the representation concatenates the head word and the tail word; the span feature is denoted $s_{ij}$:

$$s_{ij}=[h_i;\ h_j;\ w_{j-i+1}]$$

Here $w$ is a fixed 20-dimensional span-width embedding.
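A sketch of span enumeration with boundary concatenation and a width embedding (`max_width` and the names are my choices):

```python
import torch
import torch.nn as nn

def span_representations(H, width_embed, max_width):
    """Enumerate all spans up to max_width words and build
    s_ij = [h_i; h_j; w_(j-i+1)]. H: (N, d_h); width_embed:
    nn.Embedding(max_width, 20). Illustrative only."""
    N = H.size(0)
    spans, reps = [], []
    for i in range(N):
        for j in range(i, min(i + max_width, N)):
            w = width_embed(torch.tensor(j - i))     # bucket for width j-i+1
            reps.append(torch.cat([H[i], H[j], w]))
            spans.append((i, j))
    return spans, torch.stack(reps)                  # (num_spans, 2*d_h + 20)
```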

Decoding

Now that every span has a representation, the model first identifies all valid entity fragments, then classifies each pair of fragments to reveal the relation between them. This part is simple: an MLP followed by softmax classification:

$$p_1=\text{softmax}\big(\text{MLP}_1(s_{ij})\big)$$

$p_1$ is the probability that the current span belongs to each entity class. Likewise, an MLP recognizes the relation between two entity spans:

$$p_2=\text{softmax}\big(\text{MLP}_2([s_{ij};\ s_{kl}])\big)$$

Although overlapped entities can already be identified in the first step, the Overlapped relation is used here as an auxiliary strategy to further strengthen the model. The decoding algorithm is described below:

[Algorithm: joint decoding of entity fragments and fragment-pair relations]
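A rough sketch of the decoding procedure as I read it (the helper names `entity_clf` and `relation_clf` are hypothetical stand-ins for the two MLP heads):

```python
def decode(spans, reps, entity_clf, relation_clf):
    """Step 1: keep spans the first MLP classifies as entity fragments.
    Step 2: classify every fragment pair; fragments in a Succession relation
    are combined into a single (possibly discontinuous) entity."""
    fragments = [s for s, r in zip(spans, reps) if entity_clf(r) != "NONE"]
    entities = [[f] for f in fragments]                 # continuous entities
    for a in range(len(fragments)):
        for b in range(a + 1, len(fragments)):
            if relation_clf(fragments[a], fragments[b]) == "Succession":
                entities.append([fragments[a], fragments[b]])
    return entities
```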

Some Thoughts

Personally, I suspect exhaustive span enumeration may not be as expensive as it sounds: in entity recognition, a discontinuous entity is unlikely to straddle two different sentences, so enumeration only needs to look within a single clause, and the shorter the clause, the cheaper it gets. A Seq2Seq model, by contrast, is painfully slow because of autoregressive decoding. Which approach is better is, of course, a matter of taste.

3. Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter


No NER roundup is complete without lexicon enhancement! To set the stage: lexicon enhancement is a branch of Chinese NER that improves results by bringing in word lists (lexicons), since Chinese text has no spaces between words. The paper's main contribution is integrating a lexicon directly into BERT, yielding LEBERT, with improved results on three tasks: named entity recognition, word segmentation, and part-of-speech tagging. There is little background to cover, so let's go straight to the model.

Method


The model consists of three main components: the char-words pair sequence, the Lexicon Adapter, and Lexicon Enhanced BERT.

Char-Words Pair Sequence

One clarification first: in Chinese, each 字 is a character, while the units produced by word segmentation are words; in English, every token is simply a word. To fully exploit the lexicon information, the character sequence is expanded into a character-words pair sequence. Given a lexicon $D$ and a sentence $s_c=\{c_1,\dots,c_n\}$, each character $c_i$ is matched against $D$ to find all words in $D$ containing $c_i$, producing $s_{cw}=\{(c_1,ws_1),\dots,(c_n,ws_n)\}$, as shown in the figure:

[Figure: a character sequence paired with its matched lexicon words]
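A sketch of the character-word matching (a plain substring scan; the paper builds this from a lexicon, and the toy lexicon here is mine):

```python
def build_char_word_pairs(sentence, lexicon, max_word_len=4):
    """For each character c_i, collect every lexicon word that covers c_i.
    `lexicon` is a set of words; a trie would be faster, but this shows the idea."""
    pairs = [(c, []) for c in sentence]
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, min(i + max_word_len, n) + 1):
            word = sentence[i:j]
            if word in lexicon:
                for k in range(i, j):        # word covers characters i..j-1
                    pairs[k][1].append(word)
    return pairs

lexicon = {"美国", "人民", "美国人"}          # toy lexicon (illustrative)
print(build_char_word_pairs("美国人民", lexicon))
```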

Lexicon Adapter

The adapter's purpose is to fuse the lexicon with BERT. For the $i$-th position of $s_c$ there are two representations, $(h_i^c, x_i^{ws})$: the former is the character vector for $c_i$, in the paper the output of some BERT layer; the latter is the set of word embeddings of all matched words for that character, $x_i^{ws}=\{x_{i1}^{w},\dots,x_{im}^{w}\}$, where each $x_{ij}^{w}$ comes from a pretrained word-embedding lookup table $e^w$:

$$x_{ij}^{w}=e^w(w_{ij})$$

To align the two kinds of representations, a nonlinear transformation is applied to each word vector:

$$v_{ij}^{w}=W_2\left(\tanh\big(W_1 x_{ij}^{w}+b_1\big)\right)+b_2$$

This gives every matched word its own representation; the set for one character is $V_i=\{v_{i1}^{w},\dots,v_{im}^{w}\}$. Since different words enhance a character to different degrees, an attention mechanism computes a weighted sum:

$$a_i=\text{softmax}\big(h_i^{c}\,W_{attn}\,V_i^{\top}\big),\qquad z_i^{w}=\sum_{j=1}^{m}a_{ij}\,v_{ij}^{w}$$

Finally, the enhanced character representation is the sum of the original character vector and the aggregated word vector:

$$\tilde{h}_i=h_i^{c}+z_i^{w}$$

The overall flow is shown in the figure:

[Figure: structure of the Lexicon Adapter]
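Putting the adapter together as a PyTorch sketch that follows the equations above (the dimension names and module layout are my choices):

```python
import torch
import torch.nn as nn

class LexiconAdapter(nn.Module):
    """Fuse a character hidden state with its matched-word embeddings:
    nonlinear alignment -> bilinear attention -> weighted sum -> residual add."""
    def __init__(self, d_char, d_word):
        super().__init__()
        self.align = nn.Sequential(
            nn.Linear(d_word, d_char), nn.Tanh(), nn.Linear(d_char, d_char))
        self.w_attn = nn.Linear(d_char, d_char, bias=False)

    def forward(self, h_c, x_ws):          # h_c: (d_char,), x_ws: (m, d_word)
        V = self.align(x_ws)               # (m, d_char) aligned word vectors
        scores = self.w_attn(h_c) @ V.T    # (m,) bilinear attention scores
        a = torch.softmax(scores, dim=-1)
        z = a @ V                          # (d_char,) weighted word sum
        return h_c + z                     # enhanced character representation
```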

Lexicon Enhanced BERT

Lexicon Enhanced BERT is built from the Lexicon Adapter (LA) and BERT. First, at the input layer ($l=0$), $s_c$ is embedded and fed into the Transformer:

$$G=\text{LN}\big(H^{l-1}+\text{MHAttn}(H^{l-1})\big),\qquad H^{l}=\text{LN}\big(G+\text{FFN}(G)\big)$$

The output of embedding $s_c$ is $H^0$. MHAttn is multi-head attention, FFN is a two-layer feed-forward network, and LN is layer normalization. At layer $k$, the output $H^k$ is combined with the matched-word representations through the LA:

$$\tilde{h}_i^{k}=\text{LA}\big(h_i^{k},\,x_i^{ws}\big)$$

Finally, after the remaining $L-k$ encoder layers, the output $H^L$ is obtained. From there it is the standard BERT recipe: a dimension transformation adapts $H^L$ to the final classification:

$$O=W_o H^L+b_o$$

Then comes the corresponding loss function:

[Equations: the probability of the label sequence and the corresponding negative log-likelihood training loss]
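Finally, a sketch of how the adapter slots between BERT layers (assuming a Hugging Face BertModel iterated layer by layer, batch size 1, no attention mask, and `LexiconAdapter` from the sketch above; none of this is the released LEBERT code):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class LEBERTSketch(nn.Module):
    """Run BERT layer by layer and inject the Lexicon Adapter after layer k."""
    def __init__(self, d_word, k=1, num_labels=10):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        d = self.bert.config.hidden_size
        self.adapter = LexiconAdapter(d, d_word)   # from the adapter sketch above
        self.k = k
        self.classifier = nn.Linear(d, num_labels)

    def forward(self, input_ids, word_embs):
        # word_embs[i]: embeddings of the words matched to character i, (m_i, d_word)
        h = self.bert.embeddings(input_ids)        # H^0, shape (1, seq, d)
        for l, layer in enumerate(self.bert.encoder.layer):
            h = layer(h)[0]                        # H^l
            if l == self.k:                        # lexicon injection at layer k
                h = torch.stack([self.adapter(h[0, i], word_embs[i])
                                 for i in range(h.size(1))]).unsqueeze(0)
        return self.classifier(h)                  # per-character label logits
```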
