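1. Background
The model implemented in this post comes from the paper "Dynamic Memory Networks for Visual and Textual Question Answering".
After two months of miscellaneous tasks, it is time for me to start running experiments, so I am beginning by learning from other people's work. This is the first post in my series of visual question answering experiments.
The experiment needs a fair amount of data: the images come from COCO, the text annotations come from VQA 1.0, and VGG16 is also used, so there is a lot to prepare.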
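To make the data preparation a bit more concrete, here is a minimal sketch of how VQA 1.0 questions and annotations might be paired with COCO image files. It assumes the standard VQA 1.0 JSON fields (`question_id`, `image_id`, `question`, `multiple_choice_answer`) and the COCO 2014 file naming pattern. The helper names `coco_image_filename` and `join_vqa` are hypothetical, and the tiny in-memory dictionaries stand in for the real JSON files; this is not necessarily how the original implementation loads its data.

```python
def coco_image_filename(image_id, split="train2014"):
    """Build a COCO 2014 image file name from a numeric image id.

    Assumes the usual pattern COCO_<split>_<12-digit zero-padded id>.jpg.
    """
    return f"COCO_{split}_{image_id:012d}.jpg"


def join_vqa(questions, annotations, split="train2014"):
    """Pair each question with its image file and most common answer.

    `questions` and `annotations` are dicts shaped like the parsed
    VQA 1.0 question and annotation JSON files (assumed layout).
    """
    # Index answers by question_id so the join is a dictionary lookup.
    answers = {
        a["question_id"]: a["multiple_choice_answer"]
        for a in annotations["annotations"]
    }
    samples = []
    for q in questions["questions"]:
        samples.append({
            "image_file": coco_image_filename(q["image_id"], split),
            "question": q["question"],
            # None when a question has no annotation (e.g. a test split).
            "answer": answers.get(q["question_id"]),
        })
    return samples


# Made-up stand-ins for the parsed JSON files, for illustration only.
questions = {"questions": [
    {"question_id": 90, "image_id": 9, "question": "What is in the bowl?"},
]}
annotations = {"annotations": [
    {"question_id": 90, "image_id": 9, "multiple_choice_answer": "fruit"},
]}

samples = join_vqa(questions, annotations)
print(samples[0])
```

With the real files, the two dictionaries would come from `json.load` on the downloaded question and annotation files, and each `image_file` would then be passed through VGG16 to obtain image features.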
2. Paper Overview
The paper can be downloaded from: https://arxiv.org/pdf/1603.01417.pdf
The paper's abstract is as follows:
Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown wheth