Lecture 15: Coreference Resolution
Coreference Resolution
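Idea: Identify all noun phrases that refer to the same entity. # In plain terms, the goal is to work out who or what each noun phrase refers to. For example: John loves his wife. He prepares breakfast for her every day. We know that "his" and "He" both co-refer with John.
Noun phrases refer to entities in the world; many pairs of noun phrases co-refer, and some are nested inside others.
Coreference resolution has applications in machine translation, text understanding, and related areas.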
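To make the example concrete, here is a minimal sketch of one way to store mentions and coreference clusters for the two sentences above. The token-offset representation, the whitespace tokenization, and the helper names `span_text` and `is_nested` are assumptions made for illustration, not part of the lecture.

```python
from typing import List, Tuple

# A mention is a (start, end) pair of token offsets, end exclusive (assumed convention).
Span = Tuple[int, int]

tokens = "John loves his wife . He prepares breakfast for her every day .".split()

# Each inner list is one entity: all mentions in it co-refer.
clusters: List[List[Span]] = [
    [(0, 1), (2, 3), (5, 6)],  # John, his, He
    [(2, 4), (9, 10)],         # his wife, her
]

def span_text(span: Span) -> str:
    """Return the surface string of a mention."""
    start, end = span
    return " ".join(tokens[start:end])

def is_nested(inner: Span, outer: Span) -> bool:
    """True if `inner` lies strictly inside `outer` (same span does not count)."""
    return inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]

cluster_texts = [[span_text(s) for s in cluster] for cluster in clusters]
print(cluster_texts)
print(is_nested((2, 3), (2, 4)))  # "his" sits inside "his wife"
```

Note that "his" and "his wife" overlap but belong to different entities, which is the nesting case mentioned above.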
Evaluation
Precision/Recall
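These two evaluation metrics are very common.
Precision: of all the samples the model labels as positive, the fraction that really are positive (TP = true positives, FP = false positives).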
$$P = \frac{TP}{TP+FP}$$
Recall: of all the samples that really are positive, the fraction the model correctly labels as positive (FN = false negatives).
$$R = \frac{TP}{TP+FN}$$
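As a rough illustration of the two formulas, the sketch below computes precision and recall over sets of predicted and gold coreference links. Scoring at the link level, the function name `precision_recall`, and the toy link sets are all assumptions for this example; dedicated coreference metrics count things differently.

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (earlier mention, later mention); an assumed representation

def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Compute P = TP/(TP+FP) and R = TP/(TP+FN) over sets of links."""
    tp = len(predicted & gold)   # links predicted and correct
    fp = len(predicted - gold)   # links predicted but wrong
    fn = len(gold - predicted)   # correct links that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data: the gold links come from the John example; the predictions are made up.
gold = {("John", "his"), ("John", "He"), ("his", "He"), ("his wife", "her")}
predicted = {("John", "his"), ("John", "He"), ("John", "her")}

p, r = precision_recall(predicted, gold)
print(p, r)
```

Here TP = 2, FP = 1 and FN = 2, so precision is 2/3 and recall is 1/2.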
Kinds of Reference
- Referring expressions
  - John Smith
  - President Smith
  - the president
- Free variables
  - Smith saw his salary increase
- Bound variables
  - The dancer hurt herself
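# A free variable does not necessarily refer to the nearest noun; its referent depends on the context. In the example above, "his salary" may be Smith's, but if we add a preceding sentence such as "John has been working hard recently", the salary could just as well be John's. A bound variable, by contrast, is unambiguous: "herself" in the example strictly depends on the noun "dancer" mentioned earlier in the same sentence.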
- Not all NPs are referring
  - No dancer twisted her knee.
  - It is raining.
Coreference: two mentions refer to the same entity
Anaphora: a term (the anaphor) refers to another term (the antecedent), and the interpretation of the anaphor is in some way determined by the interpretation of the antecedent. Traditionally, the antecedent comes first.
Cataphora: the antecedent does not come first. The term that appears first is called the cataphor.
Kinds of Coreference Models
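- Mention Pair models
  - Treat coreference as a set of binary links: make one decision for each pair of mentions, judging whether or not they co-refer.
- Mention ranking models
  - Given a mention, we want to find what it co-refers with. The text may contain several candidate mentions, so we rank them and return the result.
- Entity-Mention models
  - Produce the actual entities rather than links.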
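The difference between the first two model families can be sketched with a toy scorer. Everything below is hypothetical: the hand-written scores in `toy_scores`, the 0.5 threshold, the use of union-find to merge pairwise links into clusters, and the threshold that stands in for a "no antecedent" choice in the ranking variant. A real system would learn the scores from data.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Mentions in document order (strings are assumed unique in this toy example).
mentions = ["John", "his", "He", "her"]

# Hypothetical pairwise scores: (antecedent, mention) -> score in [0, 1].
toy_scores: Dict[Tuple[str, str], float] = {
    ("John", "his"): 0.9, ("John", "He"): 0.8, ("his", "He"): 0.6,
    ("John", "her"): 0.2, ("his", "her"): 0.1, ("He", "her"): 0.3,
}

def score(antecedent: str, mention: str) -> float:
    return toy_scores.get((antecedent, mention), 0.0)

Scorer = Callable[[str, str], float]

def mention_pair_clusters(ms: List[str], scorer: Scorer,
                          threshold: float = 0.5) -> List[List[str]]:
    """Mention-pair style: decide every pair independently, then merge linked pairs."""
    parent = list(range(len(ms)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for j in range(len(ms)):
        for i in range(j):
            if scorer(ms[i], ms[j]) > threshold:  # binary coreferent / not decision
                parent[find(j)] = find(i)

    groups: Dict[int, List[str]] = {}
    for idx, m in enumerate(ms):
        groups.setdefault(find(idx), []).append(m)
    return list(groups.values())

def best_antecedent(j: int, ms: List[str], scorer: Scorer,
                    threshold: float = 0.5) -> Optional[str]:
    """Mention-ranking style: rank earlier mentions, keep only the top-scoring one."""
    candidates = ms[:j]
    if not candidates:
        return None
    best = max(candidates, key=lambda a: scorer(a, ms[j]))
    return best if scorer(best, ms[j]) > threshold else None

pair_clusters = mention_pair_clusters(mentions, score)
ranked = [best_antecedent(j, mentions, score) for j in range(len(mentions))]
print(pair_clusters)
print(ranked)
```

In the pairwise version every link above the threshold is kept and merged transitively, while the ranking version commits to a single antecedent per mention. An entity-mention model would instead compare each mention against whole clusters, which this sketch does not attempt.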