The difference between entity annotation and entity linking

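Entity linking is defined as identifying the mentions in a text and linking them to a knowledge base. It usually consists of two steps: detecting the mentions in the text, and linking each mention to an entity in the knowledge base. Some work assumes that the mentions are provided in advance and focuses instead on entity disambiguation.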

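I recently came across the concept of entity annotation and wondered how it differs from entity linking. After reading the paper "A framework for benchmarking entity-annotation systems", my understanding is that entity annotation serves text representation: it aims to extract the meaningful fragments of a text and link them to unambiguous identifiers. Judging from this definition, entity linking should be contained within entity annotation, but entity annotation goes a step beyond entity linking and also removes entities that are not meaningful, that is, entities that do not help represent the text.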

Classic approaches to document indexing, clustering, classification and retrieval are based on the bag-of-words paradigm. The limitations of this paradigm are well-known to the IR community and in recent years a good deal of work has attempted to move beyond by “grounding” the processed texts with respect to an adequate semantic representation, by designing so-called entity annotators. The key idea is to identify, in the input text, short-and-meaningful sequences of terms (also called mentions) and annotate them with unambiguous identifiers (also called entities) drawn from a catalog. Most recent work adopts anchor texts occurring in Wikipedia as entity mentions and the respective Wikipedia pages as the mentioned entity, because Wikipedia offers today the best trade-off between catalogs with a rigorous structure but low coverage (such as WordNet, CYC, TAP), and a large text collection with wide coverage but unstructured and noisy content (like the whole Web). The process of entity annotation involves three main steps: (1) parsing of the input text, which is the task to detect candidate entity mentions and link each of them to all possible entities they could mention; (2) disambiguation of mentions, which is the task of selecting the most pertinent Wikipedia page (i.e., entity) that best describes each mention; (3) pruning of a mention, which discards a detected mention and its annotated entity if they are considered not interesting or pertinent to the semantic interpretation of the input text.
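The three steps above (parsing, disambiguation, pruning) can be illustrated with a toy Python sketch. This is not the method of the paper or of any real annotator: the catalog, the context words, the word-overlap score and the `min_score` threshold are all hypothetical and chosen only to make each step visible.

```python
# Hypothetical toy catalog: surface form -> candidate entity identifiers.
CATALOG = {
    "jaguar": ["Jaguar_(animal)", "Jaguar_Cars"],
    "amazon": ["Amazon_rainforest", "Amazon_(company)"],
    "today": ["Today_(newspaper)"],
}

# Hypothetical context words associated with each entity.
CONTEXT = {
    "Jaguar_(animal)": {"rainforest", "hunts", "prey", "cat"},
    "Jaguar_Cars": {"car", "engine", "drive"},
    "Amazon_rainforest": {"rainforest", "river", "jaguar", "hunts"},
    "Amazon_(company)": {"shop", "order", "delivery"},
    "Today_(newspaper)": {"newspaper", "editor"},
}


def parse(text):
    """Step 1: detect candidate mentions and list every entity each could refer to."""
    tokens = [t.strip(".,").lower() for t in text.split()]
    candidates = [(t, CATALOG[t]) for t in tokens if t in CATALOG]
    return tokens, candidates


def disambiguate(tokens, candidates):
    """Step 2: for each mention, keep the entity whose context words overlap the text most."""
    words = set(tokens)
    annotations = []
    for mention, entities in candidates:
        others = words - {mention}  # do not let a mention vote for itself

        def overlap(entity):
            return len(CONTEXT[entity] & others)

        best = max(entities, key=overlap)
        annotations.append((mention, best, overlap(best)))
    return annotations


def prune(annotations, min_score=1):
    """Step 3: discard annotations that the surrounding text does not support."""
    return [(m, e) for m, e, score in annotations if score >= min_score]


def annotate(text, min_score=1):
    """Run parsing, disambiguation and pruning in sequence."""
    tokens, candidates = parse(text)
    return prune(disambiguate(tokens, candidates), min_score)
```

For the sentence "The jaguar hunts in the Amazon rainforest today.", the parsing step finds three mentions ("jaguar", "amazon", "today"). An entity linker in the sense described earlier would stop after disambiguation and keep all three, while the pruning step drops "today" because nothing in the sentence supports the newspaper reading. That final filtering is the difference between the two tasks that this post is about.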