arXiv Daily Recommendations 5.9: Speech/Audio Daily Paper Digest

2023-06-24 13:24:44

Also published on the WeChat official account (arXiv每日学术速递)

【1】 The Perceptimatic English Benchmark for Speech Perception Models

Authors: Juliette Millet, Ewan Dunbar

Note: Accepted to CogSci Conference 2020

Link: https://arxiv.org/abs/2005.03418

【2】 Crop Aggregating for short utterances speaker verification using raw waveforms

Authors: Seung-bin Kim, Ha-Jin Yu

Link: https://arxiv.org/abs/2005.03329

【3】 Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data

Authors: Seung-won Park, Myun-chul Joe

Note: Submitted to Interspeech 2020

Link: https://arxiv.org/abs/2005.03295

【4】 ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Authors: Wei Han, Yonghui Wu

Link: https://arxiv.org/abs/2005.0319

【5】 Study of human phonation in a full body domain

Authors: Shakti Saurabh, Daniel Bodony

Link: https://arxiv.org/abs/2005.02168

【6】 End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Layer-wise Transfer Learning

Authors: Heng-Jui Chang, Lin-shan Lee

Note: Submitted to INTERSPEECH 2020

Link: https://arxiv.org/abs/2005.0197
