Title: Detecting Extraneous Content in Podcasts
Abstract: Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect such content in podcast descriptions and audio transcripts. We demonstrate that our models are effective by evaluating them on the downstream task of podcast summarization and show that we can substantively improve ROUGE scores and reduce the extraneous content generated in the summaries.
2103.02585.pdf