arXiv Daily Picks 4.8: Daily Speech/Audio Paper Digest

Also published on the WeChat official account arXiv每日學術速遞 (arXiv Daily Academic Digest)

【1】 SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement

Authors: Robert Rehr, Timo Gerkmann

Link: https://arxiv.org/abs/2004.03512

【2】 Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition

Authors: Yi Zheng, Xuyong Dang

Link: https://arxiv.org/abs/2004.03437

【3】 Learning to fool the speaker recognition

Authors: Jiguo Li, Wen Gao

Comments: Accepted by ICASSP 2020

Link: https://arxiv.org/abs/2004.03434

【4】 Universal Adversarial Perturbations Generative Network for Speaker Recognition

Authors: Jiguo Li, Wen Gao

Comments: Accepted by ICME 2020

Link: https://arxiv.org/abs/2004.03428

【5】 Direct Speech-to-image Translation

Authors: Jiguo Li, Wen Gao

Link: https://arxiv.org/abs/2004.03413

【6】 Multi-Scale Aggregation Using Feature Pyramid Module for Text-Independent Speaker Verification

Authors: Youngmoon Jung, Hoirin Kim

Comments: Submitted to Interspeech 2020

Link: https://arxiv.org/abs/2004.0319

【7】 Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System

Authors: Gwenaelle Cunha Sergio, Minho Lee

Link: https://arxiv.org/abs/2004.02113

【8】 Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Authors: Cunhang Fan, Zhengqi Wen

Link: https://arxiv.org/abs/2004.0242

【9】 Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection

Authors: Tharindu Fernando, Clinton Fookes

Link: https://arxiv.org/abs/2004.01546

【10】 Towards democratizing music production with AI - Design of Variational Autoencoder-based Rhythm Generator as a DAW plugin

Authors: Nao Tokui

Link: https://arxiv.org/abs/2004.01525

【11】 Can Machine Learning Be Used to Recognize and Diagnose Coughs?

Authors: Charles Bales, Ali Imran

Link: https://arxiv.org/abs/2004.01495

【12】 AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App

Authors: Ali Imran, Muhammad Nabeel

Link: https://arxiv.org/abs/2004.01275

【13】 Towards Relevance and Sequence Modeling in Language Recognition

Authors: Bharat Padi, Sriram Ganapathy

Link: https://arxiv.org/abs/2004.0122

【14】 Multi-Modal Video Forensic Platform for Investigating Post-Terrorist Attack Scenarios

Authors: Alexander Schindler, Ross King

Link: https://arxiv.org/abs/2004.01023

【15】 Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model

Authors: Wei Zhou, Hermann Ney

Comments: Accepted at ICASSP 2020

Link: https://arxiv.org/abs/2004.00967

【16】 The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

Authors: Wei Zhou, Hermann Ney

Comments: Accepted at ICASSP 2020

Link: https://arxiv.org/abs/2004.00960

【17】 iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

Authors: Haoyu Li, Junichi Yamagishi

Comments: 5 pages, Submitted to INTERSPEECH 2020

Link: https://arxiv.org/abs/2004.00932

【18】 Improving auditory attention decoding performance of linear and non-linear methods using state-space model

Authors: Ali Aroudi, Simon Doclo

Link: https://arxiv.org/abs/2004.0091

【19】 Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset

Authors: Lee Callender, Jesse Engel

Link: https://arxiv.org/abs/2004.00188

【20】 AM-MobileNet1D: A Portable Model for Speaker Recognition

Authors: João Antônio Chagas Nunes, Cleber Zanchettin

Link: https://arxiv.org/abs/2004.00132

【21】 VaPar Synth – A Variational Parametric Model for Audio Synthesis

Authors: Krishna Subramani, Alexandre D’Hooge

Comments: Accepted in ICASSP 2020

Link: https://arxiv.org/abs/2004.0000

【22】 Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

Authors: Chao-Han Huck Yang, Chin-Hui Lee

Comments: The first draft was finished in August 2019. Accepted to IEEE ICASSP 2020

Link: https://arxiv.org/abs/2003.1391
