TräumerAI: Dreaming Music with StyleGAN
Dasaem Jeong, Seungheon Doh, Taegyun Kwon
The goal of this paper is to generate a visually appealing video that responds to music with a neural network, so that each frame of the video reflects the musical characteristics of the corresponding audio clip. To achieve this goal, we propose TräumerAI, a neural music visualizer that directly maps deep music embeddings to the style embeddings of StyleGAN. It consists of a music auto-tagging model based on a short-chunk CNN and a StyleGAN2 model pre-trained on the WikiArt dataset. Rather than establishing an objective metric between musical and visual semantics, we labeled the pairs manually and subjectively: an annotator listened to 100 music clips, each 10 seconds long, and selected the image that best suits each clip from among 200 StyleGAN-generated examples. Based on the collected data, we trained a simple transfer function that converts an audio embedding into a style embedding. The generated examples show that the mapping between audio and video achieves a certain level of intra-segment similarity and inter-segment dissimilarity.
https://arxiv.org/abs/2102.04680
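The transfer function described in the abstract can be pictured as a small regression model from the audio embedding space to StyleGAN2's w-space. Below is a minimal sketch, not the authors' released code: the 512-dimensional sizes for both embeddings, the single linear layer, the MSE loss, and all training hyperparameters are assumptions for illustration; the short-chunk CNN encoder and the StyleGAN2 generator themselves are treated as external black boxes.

```python
# Hypothetical sketch of a TräumerAI-style transfer function:
# fit a linear map from audio embeddings to StyleGAN2 style vectors
# using manually paired (music clip, chosen image) data.
import torch
import torch.nn as nn

AUDIO_DIM = 512   # assumed size of the music-tagging embedding
STYLE_DIM = 512   # assumed StyleGAN2 w-space dimensionality


class AudioToStyle(nn.Module):
    """Linear transfer from an audio embedding to a w-space style vector."""

    def __init__(self, audio_dim: int = AUDIO_DIM, style_dim: int = STYLE_DIM):
        super().__init__()
        self.linear = nn.Linear(audio_dim, style_dim)

    def forward(self, audio_emb: torch.Tensor) -> torch.Tensor:
        return self.linear(audio_emb)


def train_transfer(audio_embs: torch.Tensor, style_vecs: torch.Tensor,
                   epochs: int = 500, lr: float = 1e-3) -> AudioToStyle:
    """Fit the mapping on the ~100 annotated (audio, style) pairs."""
    model = AudioToStyle(audio_embs.shape[1], style_vecs.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(audio_embs), style_vecs)
        loss.backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Toy stand-ins for the 100 annotated pairs described in the abstract.
    pairs_audio = torch.randn(100, AUDIO_DIM)
    pairs_style = torch.randn(100, STYLE_DIM)
    model = train_transfer(pairs_audio, pairs_style)
    # At inference, each audio chunk's embedding maps to a style vector,
    # which a pre-trained StyleGAN2 would decode into one video frame.
    w = model(torch.randn(1, AUDIO_DIM))
    print(w.shape)  # torch.Size([1, 512])
```

A single linear layer is the simplest choice consistent with the abstract's "simple transfer function"; with only about 100 labeled pairs, a deeper mapping would likely overfit, and a smooth low-capacity map also helps consecutive audio chunks land on nearby style vectors, supporting the intra-segment visual similarity the paper reports.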