天天看点

Detecting Extraneous Content in Podcasts (cs.CL)

Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect such content in podcast descriptions and audio transcripts. We demonstrate that our models are effective by evaluating them on the downstream task of podcast summarization and show that we can substantively improve ROUGE scores and reduce the extraneous content generated in the summaries.
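To make the evaluation idea concrete, here is a minimal sketch of why removing extraneous sentences can raise a ROUGE score. It is not the authors' method: the paper's classifiers learn from textual and listening patterns, whereas the `AD_CUES` keyword list, the `is_extraneous` helper, and the sample sentences below are hypothetical stand-ins chosen only for illustration. The score is a simplified ROUGE-1 F1 (clipped unigram overlap), not a full ROUGE implementation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # per-token minimum of the counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical cue phrases standing in for a trained classifier.
AD_CUES = ("sponsored by", "promo code", "subscribe", "follow us")


def is_extraneous(sentence):
    """Flag a sentence as extraneous if it contains any cue phrase."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in AD_CUES)


def strip_extraneous(sentences):
    """Keep only the sentences that are not flagged as extraneous."""
    return [s for s in sentences if not is_extraneous(s)]


# Invented example data, not taken from the paper.
description = [
    "We discuss the history of jazz in New Orleans.",
    "This episode is sponsored by Acme, use promo code JAZZ.",
]
reference = "A discussion of the history of jazz in New Orleans."

raw_score = rouge1_f1(" ".join(description), reference)
clean_score = rouge1_f1(" ".join(strip_extraneous(description)), reference)
print(f"raw={raw_score:.3f} cleaned={clean_score:.3f}")
```

In this toy case the advertisement sentence adds tokens that do not appear in the reference, which lowers precision; filtering it out leaves recall unchanged and raises the F1 score.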

2103.02585.pdf