[Pre-trained Language Models] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT)
(1) Masked language model (MLM): similar to a cloze test, some tokens in a sentence are masked out and the model must predict the original tokens:
The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context.
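To make the idea concrete, here is a minimal Python sketch of the masking step (the helper `mask_tokens` is hypothetical, not the paper's preprocessing code). It only replaces selected tokens with [MASK]; the actual BERT recipe selects 15% of token positions and, among those, uses [MASK] 80% of the time, a random token 10% of the time, and keeps the original token 10% of the time.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    # Select roughly mask_prob of the positions, replace each with [MASK],
    # and remember the original token as the prediction target.
    corrupted = list(tokens)
    targets = {}  # position -> original token the model must predict
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            targets[i] = tok
            corrupted[i] = mask_token
    return corrupted, targets

tokens = ["the", "man", "went", "to", "the", "store"]
corrupted, targets = mask_tokens(tokens)
print(corrupted)  # e.g. ['the', '[MASK]', 'went', 'to', 'the', 'store']
print(targets)    # e.g. {1: 'man'}
```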
3.1 Input Representation
BERT's input sequence can be either a single sentence (e.g., for sequence labeling or text classification tasks) or a pair of sentences (e.g., for QA tasks). The input representation consists of:
(1) Token embeddings: WordPiece embeddings over a 30,000-token vocabulary; the first token of every sequence is the special classification token [CLS], and sentence pairs are separated by [SEP].
(2) Segment embeddings: a learned embedding indicating whether a token belongs to sentence A or sentence B.
(3) Position embeddings: learned positional embeddings.
The input representation of each token is the sum of its token, segment, and position embeddings.
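As a minimal sketch of how such an input sequence is assembled (the function `build_bert_input` is hypothetical, not from the paper), a sentence pair becomes [CLS] A [SEP] B [SEP], with segment ids distinguishing the two sentences and position ids feeding the position embeddings:

```python
def build_bert_input(tokens_a, tokens_b=None):
    # [CLS] + sentence A + [SEP] (+ sentence B + [SEP]);
    # segment ids mark whether a token belongs to sentence A (0) or B (1).
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b is not None:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    position_ids = list(range(len(tokens)))  # one position index per token
    return tokens, segment_ids, position_ids

tokens, seg, pos = build_bert_input(["my", "dog", "is", "cute"],
                                    ["he", "likes", "play", "##ing"])
print(tokens)  # ['[CLS]', 'my', 'dog', 'is', 'cute', '[SEP]', 'he', 'likes', 'play', '##ing', '[SEP]']
print(seg)     # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The model then looks up a token embedding, a segment embedding, and a position embedding for each position and sums the three vectors to form the input to the Transformer encoder.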