
Interspeech Paper Interpretation | Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition

INTERSPEECH 2019, the 20th Annual Conference of the International Speech Communication Association, will be held in Graz, Austria, from September 15 to 19, 2019. Interspeech is the world's largest and most comprehensive top-tier conference in the speech field; nearly 2,000 front-line practitioners from industry and academia will take part in keynote speeches, tutorials, paper sessions, and the main exhibition. Alibaba has eight papers accepted this year. This article covers the paper "Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition" by Shengkui Zhao, Chongjia Ni, Rong Tong, and Bin Ma.

Download the paper

Paper Interpretation

Automatic speech recognition (ASR) systems have a wide range of real-world applications, but ambient noise and reverberation often make their output erroneous and unstable. Improving the robustness of ASR systems is therefore a key issue for their wider adoption. To address it, speech enhancement front-ends and model-adaptation training have been studied for a long time. Recently, multi-task joint-learning schemes that train noise reduction and speech recognition simultaneously in a unified modeling framework have shown encouraging progress, but model training still depends heavily on paired clean and noisy data. To overcome this limitation, researchers have introduced generative adversarial networks (GANs) and adversarial training into acoustic-model training; since no complex front-end design or paired training data is required, this greatly simplifies the training process and its requirements. Although GANs have developed rapidly in computer vision, so far only regular GANs have been introduced for robust ASR, with limited model-training experiments, and regular GANs suffer from mode collapse, which frequently causes training to fail.
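
For intuition, such a unified framework is usually trained with a weighted combination of the two criteria. As a rough illustrative sketch (our notation, not taken from the paper), the joint objective can be written as

$$\mathcal{L}_{\text{joint}} = \mathcal{L}_{\text{ASR}} + \lambda\,\mathcal{L}_{\text{enh}},$$

where $\mathcal{L}_{\text{ASR}}$ is the recognition loss (e.g., cross-entropy over acoustic-model targets), $\mathcal{L}_{\text{enh}}$ is the noise-reduction loss on the enhanced features, and $\lambda$ is a weighting hyperparameter.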

In this work, we adopt the more advanced cycle-consistency generative adversarial network (CycleGAN) to address the training failures caused by mode collapse in regular GANs. In addition, combining it with the recently popular deep residual networks (ResNets), we further extend the multi-task learning scheme into a multi-task multi-network joint-learning scheme, achieving stronger noise reduction and model-adaptation training. A sketch of the CycleGAN objective this builds on follows below.
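
The following is a minimal sketch of CycleGAN-style losses for mapping between noisy and clean feature domains, assuming generators G (noisy-to-clean) and F_gen (clean-to-noisy) with discriminators D_clean and D_noisy; all names, the least-squares adversarial form, and the default weight are illustrative assumptions, not the paper's released code:

```python
# Illustrative sketch only: a CycleGAN-style generator loss for mapping
# between noisy and clean feature domains. G, F_gen, D_clean, D_noisy,
# and lambda_cyc are hypothetical names, not from the paper's code.
import torch
import torch.nn.functional as nnF

def cyclegan_generator_loss(G, F_gen, D_clean, D_noisy,
                            noisy_feats, clean_feats, lambda_cyc=10.0):
    # Forward cycle: noisy -> pseudo-clean -> reconstructed noisy
    fake_clean = G(noisy_feats)
    rec_noisy = F_gen(fake_clean)
    # Backward cycle: clean -> pseudo-noisy -> reconstructed clean
    fake_noisy = F_gen(clean_feats)
    rec_clean = G(fake_noisy)

    # Adversarial terms (least-squares GAN form): push generated
    # features toward the target domain distribution.
    adv_g = torch.mean((D_clean(fake_clean) - 1.0) ** 2)
    adv_f = torch.mean((D_noisy(fake_noisy) - 1.0) ** 2)

    # Cycle-consistency (L1) terms: the two generators must remain
    # approximate inverses, which rules out the degenerate many-to-one
    # mappings behind mode collapse in regular GANs.
    cyc = (nnF.l1_loss(rec_noisy, noisy_feats)
           + nnF.l1_loss(rec_clean, clean_feats))

    return adv_g + adv_f + lambda_cyc * cyc
```

In the paper's multi-task multi-network scheme, terms like these would be optimized jointly with the ResNet acoustic-model objective rather than in isolation; the exact weighting and network details are given in the paper.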


Experimental results on monaural ASR with CHiME-4 show that, compared with state-of-the-art joint-learning approaches, our proposed method significantly improves the noise robustness of the ASR system, achieving much lower word error rates (WERs).


基于循環一緻性對抗性生成網絡,我們提出的多任務多網絡聯合學習方案較好的解決了模式崩潰問題。


Abstract

Robustness of automatic speech recognition (ASR) systems is a critical issue due to noise and reverberations. Speech enhancement and model adaptation have been studied for a long time to address this issue. Recently, the developments of multi-task joint-learning schemes that address noise reduction and ASR criteria in a unified modeling framework show promising improvements, but the model training highly relies on paired clean-noisy data. To overcome this limit, the generative adversarial networks (GANs) and the adversarial training method are deployed, which have greatly simplified the model training process without the requirements of complex front-end design and paired training data. Despite the fast developments of GANs for computer vision, only regular GANs have been adopted for robust ASR. In this work, we adopt a more advanced cycle-consistency GAN (CycleGAN) to address the training failure problem due to mode collapse of regular GANs. Using deep residual networks (ResNets), we further expand the multi-task scheme to a multi-task multi-network joint-learning scheme for more robust noise reduction and model adaptation. Experiment results on CHiME-4 show that our proposed approach significantly improves the noise robustness of the ASR system by achieving much lower word error rates (WERs) than the state-of-the-art joint-learning approaches.

Index Terms: Robust speech recognition, convolutional neural networks, acoustic model, generative adversarial networks

Compiled by the Alibaba Cloud Developer Community
