How to Provide Training Data for a Chatbot

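What Is a Conversation, Really?

Our lives are full of conversations: chatting with a boyfriend while making dinner, ordering a roast duck from a fast-food restaurant, presenting a summary of the company's quarterly sales. Conversation is everywhere. Conversations differ in length, topic, importance, and setting, but we rarely stop to ask: why am I having this conversation? What is my goal?

In this article, we look at conversation as coordinating joint action. Conversation is dynamic, full of signals and interaction. We can start a conversation with a plan in mind, but we often cannot guarantee where it will end. "Chatbot" and "conversation" are nouns, but to understand them well, it helps to think of them as verbs. How do we interact with other people? How do we make sure a conversation moves in the direction we want?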

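About Conversation

A large part of any conversation is its context. Imagine you are at a dance and want to ask someone to dance. You may only need to walk over, nod, and say, "May I?" Your partner will understand that you are inviting her to dance. But if you asked a stranger on the street the same thing, she would probably be confused, unsure what you were requesting, or take it as just a friendly greeting. That is a conversation that does not fit its context.

The same holds for chatbots. When you chat with a travel agency's or an airline's chatbot, the context implies that it should be able to help you book a hotel or change your flight. Do not expect it to discuss political news or calculus in depth. Context is essential to a successful conversation, but what is conversation actually about? We can usually break it into four main components: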

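1. Greeting: This part is easy to understand. Whether you say "Good morning" or the other person says "Have you eaten?", both are greetings. There are many ways to express a greeting, but the goal is the same: establishing rapport.

2. Exchanging information: When you ask "What are you doing tonight?", you expect a corresponding answer. Conversation is a process of asking questions and giving answers.

3. Prompting action: When you say "Let's go shopping tomorrow" or "Could you grab my laptop for me?", that part of the conversation is about making plans and requesting that something be done.

4. Establishing opinions: When you say "I agree, grapes taste better than apples," you are establishing your position.

Of course, conversations cannot be forced into four sharply bounded categories. But when you build a chatbot, it is important to think about classifying utterances this way: your AI needs to understand these four distinct parts and how they relate to one another. Most important, remember that the key to conversation is coordinating joint action. A chatbot needs to take what the other party says as input and coordinate what happens next and what action to take. Is your company being asked for a refund? Does the person need to supply more information before the chatbot acts? Does a human need to step in to resolve the issue? A chatbot has to make these judgments before it acts.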

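Why Build a Chatbot?

If you think chatbots are a bit rigid, that is understandable. Chatbots and intelligent assistants have been heavily hyped, and you have reason to wonder how useful a chatbot really is. After all, before a business commits people to building one, it should understand why building one is worthwhile.

Here are a few prominent reasons: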

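Most businesses first explore chatbots for customer service. A chatbot can help a prospective customer confirm whether the right clothing size will arrive next week, rebook a hotel, or even handle more complex and sensitive financial questions.

When chat becomes an important part of customer service, a business can greatly reduce the cost of serving each customer. After all, a chat agent can serve several customers at once, at only about 30% of the cost of a phone interaction. A chatbot improves efficiency much further, because in a given context most questions are repetitive and predictable (a hotel booking site handles countless questions about cancellations, room upgrades, or check-in times). In many specific contexts, a chatbot can handle most of the problems customers run into, which frees your customer service staff for more complex work that truly requires a human.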

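Stanford professor Chris Potts has a maxim: routine events can be handled with routine language, while non-routine events require non-routine language. Chatbots generally handle routine events without trouble; it is the non-routine events that they struggle with.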

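Those non-routine events are exactly what human agents can and should resolve. By assigning different conversation scenarios to different channels (phone, chat window, chatbot), you let your agents focus on the harder problems, and you no longer need as many agents, which saves money. In addition, chatbots work 24 hours a day. They work through the Spring Festival, never call in sick, and reliably show up. That sounds pretty good.

Now for another fact: messaging apps have become very popular in recent years. In fact, according to business surveys, messaging apps are now more popular than social networks. In other words, messaging app users are your customers, and chatbots can be added to those apps seamlessly. If your customers spend more time on WhatsApp and Messenger than on Twitter and LinkedIn, why not take advantage of that? Chatbots belong there. Chat apps let your customers reach you easily, without downloading your company's own chat software or calling you. So why not choose a chatbot? It frees your agents from repetitive, tedious questions, and it lets your customers interact with your company at any time in the popular chat apps they already use.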

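All Right, Let's Build a Chatbot!

The first question to consider is why you are building a chatbot at all. Chatbots are often used for customer service: helping consumers make decisions, booking hotels during a trip, answering questions for a large SaaS vendor, or any scenario where many employees interact with customers to solve problems. Remember that the goal you are pursuing and what you are building matter a great deal. You do not need to build a chatbot like Siri, and if you have ever asked Siri about a niche business scenario, you know it cannot give a satisfying answer either. You also need to decide how your chatbot will be scored: customers served per hour, NPS, or some other metric. You need to monitor these metrics, and a smart chatbot can give you continuous feedback on many of the measures that matter. In this article, we use building a chatbot for an airline as an example to outline how to make your chatbot smarter, more agile, and more robust, so that it ultimately meets your business needs.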
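
To make one of the scoring options above concrete: NPS (Net Promoter Score) is conventionally computed from 0-10 survey answers as the percentage of promoters (9 or 10) minus the percentage of detractors (0 through 6). The sketch below is only an illustration; the function name and the sample scores are made up, not taken from any real deployment.

```python
def net_promoter_score(scores):
    """Return the NPS (between -100 and 100) for a list of 0-10 survey scores."""
    if not scores:
        raise ValueError("need at least one survey score")
    promoters = sum(1 for s in scores if s >= 9)   # scores of 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # scores of 0 through 6
    return 100.0 * (promoters - detractors) / len(scores)


# Made-up post-chat survey answers, for illustration only.
sample_scores = [10, 9, 8, 7, 6, 3, 10, 9]
print(net_promoter_score(sample_scores))
```

Tracking a number like this before and after a chatbot change is one simple way to check whether the change helped.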

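There are all kinds of chatbots around us. They can tell you tomorrow's weather, send you regular updates on a news story, schedule your meetings, manage your finances, or, if you like, talk things over with you and become your friend. But today we are talking about what lies at the core of all these chat applications: conversation, and the principles and methods for training a chatbot to converse.

Once you have set your goal, consider what you can learn from everyday interactions. One "trick" is that behind a chatbot there are often many answers that engineers programmed in advance. For example, when you say to Siri, "Tell me a joke," Siri is not really "thinking up" on the spot a joke it heard last weekend. In reality, Siri is consulting a lookup table: Apple's engineers anticipated the question and wrote it into Siri's knowledge base. For most companies, this approach is workable. Recall the maxim above that routine events can be handled with routine language: you can write responses for the questions that come up frequently in a given scenario. This works because we can guess in advance how customers will interact with us, or we know how customers typically behave. Usually your customer service staff know which questions customers ask most often, or you can find out from your ticketing system or other data analysis which questions your customers ask and how they phrase them. So we just need to encode as many cases as possible, right? In practice, a chatbot is not that simple. What we have described so far is only an information retrieval system, or a search system, and a successful chatbot is not a search bar. It needs interaction, it needs conversation, it needs to coordinate joint action. Below we introduce four kinds of training tasks. You will typically get the training data from your own database or from a data service provider, and it is used to teach your chatbot to converse and interact with customers. The four are: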

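    Utterances: How many ways are there to say the same thing? Your chatbot needs to know as many phrasings as possible, or it will always be confused.

    Relevance: Is a particular response relevant to a particular question?

    Intent detection: Does your chatbot understand your customer's intent or goal? If it does not understand what the customer wants to do, it cannot coordinate action.

    Entity extraction: "I really want to eat an apple" and "This apple is really delicious" do not mean the same thing. Entity extraction helps an algorithm understand the fine distinctions in language.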
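
To see why a lookup table alone falls short, here is a minimal sketch of one. The table contents, the `normalize` helper, and the `lookup_reply` function are all hypothetical names invented for this illustration; they are not how Siri or any real product is implemented.

```python
import re

# Hypothetical canned answers, keyed by normalized question text.
LOOKUP_TABLE = {
    "tell me a joke": "Why did the plane get sent to its room? Bad altitude.",
    "can i change my flight": "Yes. Please share your booking reference.",
}


def normalize(text):
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def lookup_reply(text):
    """Return the canned answer, or None when the question is not in the table."""
    return LOOKUP_TABLE.get(normalize(text))


print(lookup_reply("Can I change my flight?"))   # found in the table
print(lookup_reply("I need to move my flight"))  # None: same request, new wording
```

The second call fails even though the customer wants the same thing, which is the gap the four kinds of training data are meant to close.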

Four Language Tasks for Training a Chatbot

1: Utterance, or, How Many Ways Can You Order a Pizza?

To work at all, your chatbot needs to understand what users are asking it to do. And while you can likely easily identify the most frequent, most normal requests from a user, it's tough to come up with every permutation of those core questions on your own. That's what utterance data collection is all about. The idea is simple: set up a task where a bunch of people come up with different ways to ask the same question. What's the question? That's up to you and your team. But you'd be surprised just how many ways there are to ask for the simplest things.

KEY USE CASES

• Transforming FAQ content into a chatbot (you’ve already written answers, but want to match them to lots of different questions)

• Building up voice/text activation for a new feature (how many ways are there to ask for a song to play?)

An example? Reddit's Random Acts of Pizza, where people ask for pizzas and potentially the community responds. If we look at 5,671 requests for pizza, we’ll find that 99.4% of all of them have unique titles. In fact, there were only four repeats at all! Inside the body of the posts themselves, the only repetition that exists over 27,000 sentences is basically just greetings and assorted gratitude:

[Figure: sentences repeated across the posts, mostly greetings and thanks]

This is a good example of the breadth of even simple requests. “Please pass the salt” and “Salt!” are both ways to make a request, after all, but they feel rather different. And while people will interact with chatbots differently than they do with people (think about how you search for shoes or use Google; it's not exactly how you talk to your friends), accruing a database of the ways people ask for things gives your chatbot fuel to answer those requests in kind.
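
If you want to check how varied your own collected utterances are, a rough sketch like the following can help. It is only one possible way to measure this (the share of distinct phrasings after light normalization) and is not the method behind the Reddit figures; the function name and sample utterances are invented for illustration.

```python
from collections import Counter


def uniqueness_report(utterances):
    """Return (share of distinct utterances, sorted list of repeated utterances)."""
    if not utterances:
        return 0.0, []
    # Light normalization: lowercase and collapse runs of whitespace.
    normalized = [" ".join(u.lower().split()) for u in utterances]
    counts = Counter(normalized)
    unique_share = len(counts) / len(normalized)
    repeats = sorted(u for u, n in counts.items() if n > 1)
    return unique_share, repeats


# Made-up utterances for a single request.
collected = [
    "Can I change my flight?",
    "can i change my  flight?",
    "I need to switch flights",
    "Is it possible to rebook?",
]
share, repeated = uniqueness_report(collected)
print(share, repeated)
```

A high share of distinct phrasings suggests contributors are giving you the variety you asked for; a long list of repeats suggests the prompt is steering them toward one wording.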

Now, a section or two ago, we mentioned that we're going to use this eBook to demonstrate how to create the data you need to train a chatbot. We chose to create data around an airline customer service chatbot, but of course, you can do utterance tasks for whatever utterances you want to capture. For our example job, we chose to ask for ways to ask "Can I change my flight?" Again, there are no specifics here (like "I need to change flight 563" or "I have to fly to Vegas instead"), so the pool of utterance data is artificially limited a bit, but here's how you do it:

[Figure: the example utterance collection job for "Can I change my flight?"]

Pretty easy, right? Now, one of the things we pride ourselves on is quality control. But with utterance tasks, that can be tough. You can't come up with the "correct" ways to ask this question (in fact, you're trying to accrue just that data), so you can't use the typical test question format most jobs take. We get around this in a pretty simple way: two different, intertwined jobs.

Next, let’s look at relevance:

2: Relevance, or, Are We Making Sense or Not?

Once you have a set of utterances, you want to be able to match them with answers and actions. Relevance tasks do this by giving you training data you can use to map utterances that users might say to the help pages and action triggers in your database. They are usually of the form, “here’s a question, here’s an answer, how relevant is it?”

In doing this mapping, you are likely to find that certain flavors of questions need longer or shorter responses. The more a response just looks like “the best matching paragraphs” or "an adjacent answer from our FAQ section," the less direct help it offers, the less human it feels, and the less satisfied your user is.

To get a sense of how people know what to say, let’s look at the four maxims Paul Grice developed that people follow when talking. If you flout these maxims, things get weird.

1. Quantity: be as informative as you possibly can and give as much information as is needed, and no more

2. Quality: try to be truthful and don’t give information that is false or that is not supported by evidence

3. Relation: try to be relevant and say things that are pertinent to the discussion

4. Manner: try to be as clear, as brief, and as orderly as you can in what you say and avoid obscurity and ambiguity

We can reduce these even more. For Dan Sperber and Deirdre Wilson, the central thing is “Be relevant.”

The issue for chatbots is that they can have trouble understanding context. They're certainly worse at it than we are. And because of that, some of their responses are, well, irrelevant. And irrelevant responses make for bad conversations. They don't coordinate joint action.

This is one of the reasons it's much simpler to create a chatbot to handle discrete issues (like rescheduling a flight) than one that just wants to talk about any old thing.

You see similar tasks in search relevance projects: given a query, does this result match? Is it relevant? Doing that with chatbot question/answer pairings gives you the tools you need to tweak your models and make them more accurate. It also will show you where your model is falling down and where it's succeeding.
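
As a rough sketch of how relevance judgments might be used once collected, the snippet below averages annotator ratings per question/answer pair and picks the best-rated answer for a question. The 1-4 rating scale, the answer identifiers, and the function names are assumptions made for this example, not a description of any particular platform.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judgments: (question, answer_id, rating on an assumed 1-4 scale).
judgments = [
    ("can i change my flight", "faq_change_flight", 4),
    ("can i change my flight", "faq_change_flight", 3),
    ("can i change my flight", "faq_baggage_fees", 1),
    ("can i change my flight", "faq_baggage_fees", 2),
]


def aggregate_relevance(rows):
    """Average the ratings given to each (question, answer_id) pair."""
    ratings_by_pair = defaultdict(list)
    for question, answer_id, rating in rows:
        ratings_by_pair[(question, answer_id)].append(rating)
    return {pair: mean(ratings) for pair, ratings in ratings_by_pair.items()}


def best_answer(scores, question):
    """Return the highest-rated answer_id for a question, or None if unseen."""
    candidates = {a: s for (q, a), s in scores.items() if q == question}
    return max(candidates, key=candidates.get) if candidates else None


scores = aggregate_relevance(judgments)
print(best_answer(scores, "can i change my flight"))
```

Pairs with low average ratings are the places where the model, or the hard-coded mapping, is "falling down" and needs attention.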

3: Intent, or, What Were You Trying to Do Anyway?

When we’re engaging with people in joint activities like conversation, we are (or become) attuned to their intentions. That’s what’s behind the comedy of something like Lucy and Charlie Brown’s “I know you know I know you know” chains of reasoning. Other minds aren’t entirely opaque to us, even if we tend to fill them in with our own projections.

Much like the last example, you see intent work in information retrieval projects like internal search relevance tasks. Basically: does this output match the intent of what someone wanted? When someone searches for an iPhone and they're presented with an iPhone case, does that match their intent? The same is true for chatbot replies. Given a question from your utterance corpus, how relevant is the answer your model or hardcoded bot returns?

The reality is that relevance isn't quite enough for chatbots. Conversation is simply too complicated for simple relevance to make chatbot responses good enough.

Take the airline customer chatbot we're building. Imagine a customer typing "baggage fees?" What do they actually mean? Are they asking what the baggage fees for a particular flight are? Are they demanding a refund for baggage fees they were recently charged? A chatbot that doesn't understand context and intent might just send the customer to an FAQ about baggage fees. And that customer isn't going to be particularly enthused about that interaction.

Intent and relevance are intrinsically linked. You want to start the process by identifying which flags your chatbot will be able to support. Do you want to handle your top ten issues? Top five? You want to tackle as many permutations of those conversations as possible in your relevance and intent tasks. And keep in mind, these tasks are sometimes even more valuable for tuning your bot after it’s been released or with test conversations you conduct with it. You'll be able to analyze whole conversations, find out where they fall down, and give annotators fuller conversations to understand customer intent.

Because, really, that's an important point here: intent shows itself most clearly in the context of a full conversation. That "baggage fees?" comment means a much different thing depending on the particular conversation it appears in.

Intent tasks often present annotators with conversations (or snippets thereof) and ask them whether the chatbot is understanding the intent of the customer. Where it did not, it's important to understand where and why your bot hit a snag. Once that's understood, you can hone your models or hard-code answers to deal with precisely those issues.

Last thing: remember that point we made about your chatbot's personality? That plays in here. If your chatbot isn't sure it's going to be relevant (essentially, it's unconfident about its output) or is at sea over intent, just ask! Chatbots that deal with requests by asking a series of probing questions to find the exact thing that user is looking to do are far, far more successful than those that make pseudo-guesses where they're not fully confident. When in doubt, your chatbot should aim towards further clarity, not action.
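
The "when in doubt, ask" rule can be sketched as a simple confidence threshold. In this illustration, the intent names, the scores, and the 0.7 threshold are all assumptions; the scores stand in for whatever confidence values your own intent model produces.

```python
def choose_action(intent_scores, threshold=0.7):
    """Act on the top-scoring intent only when it is confident enough.

    intent_scores maps intent name -> confidence in [0, 1].
    Returns ("act", intent) or ("clarify", best_guess_or_None).
    """
    if not intent_scores:
        return ("clarify", None)
    intent, score = max(intent_scores.items(), key=lambda item: item[1])
    if score >= threshold:
        return ("act", intent)
    # Not confident: ask a probing question instead of guessing.
    return ("clarify", intent)


# "baggage fees?" could be a question about prices or a refund demand.
print(choose_action({"baggage_fee_info": 0.48, "baggage_fee_refund": 0.45}))
print(choose_action({"change_flight": 0.92, "cancel_flight": 0.05}))
```

Where to set the threshold is a judgment call: a higher value means more clarifying questions, a lower one means more wrong guesses.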

4: Entity Recognition, or, Which Washington is this Washington?

Entity recognition is the last major training job for your algorithm. Essentially, it involves looking at passages of text and identifying "entities" within. Those might be places, people, product names, you name it, but it generally works best when looking for specific entities that are valid for your particular use case.

Take our example use case of an airline service chatbot. If you tell it that you're looking to go to Washington, what does that mean? Because it could mean any of the following:

[Figure: possible meanings of "Washington" for an airline chatbot]

You get the idea. Now, if you're building a chatbot that's looking to engage over American history, Washington has a totally different meaning. Ditto for a bot looking to give out college sports scores. The list goes on.

For starters, this is why more generic, multi-purpose bots are so difficult and why context is so important for any chatbot. But it's also why you need to work on entity extraction for your chatbot project. In fact, named entity recognition is one of the basic building blocks of natural language processing and it allows your bot to function properly.
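
A toy way to picture the point about context: the same word resolves to a different entity type depending on the bot's domain. The domain names and type labels below are hypothetical, and a real system would learn these distinctions from annotated examples instead of a hand-written table.

```python
# Hypothetical per-domain readings of the same surface string.
DOMAIN_ENTITY_TYPES = {
    "airline": {"washington": "DESTINATION"},
    "us_history": {"washington": "PERSON"},
    "college_sports": {"washington": "TEAM"},
}


def entity_type(token, domain):
    """Return the entity type of a token in a given domain, or None if unknown."""
    return DOMAIN_ENTITY_TYPES.get(domain, {}).get(token.lower())


for domain in ("airline", "us_history", "college_sports"):
    print(domain, entity_type("Washington", domain))
```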

We've created an entity extraction tool that's very similar to a popular one you may have heard of called BRAT. Essentially, on our platform, you provide users with text blocks and they highlight the entities you care about. You can see an example below:

[Figure: screenshot of the entity extraction tool with highlighted entities]

In that screenshot, we're interested in a few salient things to build into our airline chatbot. Note especially that numbers are important here. Is it a flight number? An arrival time? An amount of ounces for carry-on sunscreen? The more examples of named entities your model sees, the more it learns that sometimes people won't type "7:25" and will instead just write "725", and your bot will still understand. That increases your bot's accuracy, its ability to actually converse, and, yes, makes it function in the way it's supposed to: coordinating joint action.
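
For a sense of what handling "725" versus "7:25" involves, here is a small hand-written sketch that normalizes a token to a clock time when it can be read as one. It is a rule-based stand-in for what a trained model would learn from annotated examples, and the pattern and function name are assumptions made for this example. Note that it cannot tell a time from a flight number on its own; that still takes surrounding context.

```python
import re

# One or two hour digits, an optional colon, then exactly two minute digits.
TIME_PATTERN = re.compile(r"^(\d{1,2}):?(\d{2})$")


def normalize_time(token):
    """Return 'H:MM' for tokens such as '7:25' or '725', else None."""
    match = TIME_PATTERN.match(token.strip())
    if not match:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        return None  # e.g. '563' would mean 5:63, so it is not a clock time
    return f"{hour}:{minute:02d}"


for token in ("7:25", "725", "1930", "563"):
    print(token, normalize_time(token))
```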

CONCLUSION

Nice as it would be, you can't just buy chatbot software out of a box and simply deploy it. You need to test, tune, and train your chatbot. Hopefully, this eBook gave you an understanding of how that's actually done. But we do want to highlight a few of the key takeaways we'd love to leave you with now that we're finished:

• Conversations are about coordinating joint action. The best chatbots have real conversations and, thus, coordinate real joint actions.

• When in doubt, make sure your chatbot is curious. A curious chatbot understands what a user really wants before acting. And people are much more willing to answer a few extra questions than deal with bad outcomes.

• There are four major chatbot data projects. Each is important. They are:

    • Utterance tasks: How many ways are there to say a thing?

    • Relevance tasks: Does this response even make sense?

    • Intent tasks: What did the user want to happen here?

    • Entity extraction: What are these particular words exactly?
