arXiv Daily Picks, 4.8: Speech/Audio Daily Paper Digest

Also published on the WeChat official account arXiv每日學術速遞 (arXiv Daily Academic Digest)

【1】 SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement

Authors: Robert Rehr, Timo Gerkmann

Link: https://arxiv.org/abs/2004.03512

【2】 Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition

Authors: Yi Zheng, Xuyong Dang

Link: https://arxiv.org/abs/2004.03437

【3】 Learning to fool the speaker recognition

Authors: Jiguo Li, Wen Gao

Comments: Accepted by ICASSP 2020

Link: https://arxiv.org/abs/2004.03434

【4】 Universal Adversarial Perturbations Generative Network for Speaker Recognition

Authors: Jiguo Li, Wen Gao

Comments: Accepted by ICME 2020

Link: https://arxiv.org/abs/2004.03428

【5】 Direct Speech-to-image Translation

Authors: Jiguo Li, Wen Gao

Link: https://arxiv.org/abs/2004.03413

【6】 Multi-Scale Aggregation Using Feature Pyramid Module for Text-Independent Speaker Verification

Authors: Youngmoon Jung, Hoirin Kim

Comments: Submitted to Interspeech 2020

Link: https://arxiv.org/abs/2004.0319

【7】 Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System

Authors: Gwenaelle Cunha Sergio, Minho Lee

Link: https://arxiv.org/abs/2004.02113

【8】 Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Authors: Cunhang Fan, Zhengqi Wen

Link: https://arxiv.org/abs/2004.0242

【9】 Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection

Authors: Tharindu Fernando, Clinton Fookes

Link: https://arxiv.org/abs/2004.01546

【10】 Towards democratizing music production with AI-Design of Variational Autoencoder-based Rhythm Generator as a DAW plugin

Author: Nao Tokui

Link: https://arxiv.org/abs/2004.01525

【11】 Can Machine Learning Be Used to Recognize and Diagnose Coughs?

Authors: Charles Bales, Ali Imran

Link: https://arxiv.org/abs/2004.01495

【12】 AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App

Authors: Ali Imran, Muhammad Nabeel

Link: https://arxiv.org/abs/2004.01275

【13】 Towards Relevance and Sequence Modeling in Language Recognition

Authors: Bharat Padi, Sriram Ganapathy

Link: https://arxiv.org/abs/2004.0122

【14】 Multi-Modal Video Forensic Platform for Investigating Post-Terrorist Attack Scenarios

Authors: Alexander Schindler, Ross King

Link: https://arxiv.org/abs/2004.01023

【15】 Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model

Authors: Wei Zhou, Hermann Ney

Comments: Accepted at ICASSP 2020

Link: https://arxiv.org/abs/2004.00967

【16】 The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

Authors: Wei Zhou, Hermann Ney

Comments: Accepted at ICASSP 2020

Link: https://arxiv.org/abs/2004.00960

【17】 iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

Authors: Haoyu Li, Junichi Yamagishi

Comments: 5 pages, submitted to INTERSPEECH 2020

Link: https://arxiv.org/abs/2004.00932

【18】 Improving auditory attention decoding performance of linear and non-linear methods using state-space model

Authors: Ali Aroudi, Simon Doclo

Link: https://arxiv.org/abs/2004.0091

【19】 Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset

Authors: Lee Callender, Jesse Engel

Link: https://arxiv.org/abs/2004.00188

【20】 AM-MobileNet1D: A Portable Model for Speaker Recognition

Authors: João Antônio Chagas Nunes, Cleber Zanchettin

Link: https://arxiv.org/abs/2004.00132

【21】 VaPar Synth – A Variational Parametric Model for Audio Synthesis

Authors: Krishna Subramani, Alexandre D’Hooge

Comments: Accepted in ICASSP 2020

Link: https://arxiv.org/abs/2004.0000

【22】 Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

Authors: Chao-Han Huck Yang, Chin-Hui Lee

Comments: The first draft was finished in August 2019. Accepted to IEEE ICASSP 2020

Link: https://arxiv.org/abs/2003.1391
