Visual Question Answering

Published: 09 Oct 2015 Category: deep_learning

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

VQA: Visual Question Answering

Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

Exploring Models and Data for Image Question Answering

Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

Teaching Machines to Read and Comprehend

Neural Module Networks

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Neural Generative Question Answering

Stacked Attention Networks for Image Question Answering

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

Simple Baseline for Visual Question Answering

MovieQA: Understanding Stories in Movies through Question-Answering

Deeper LSTM+ normalized CNN for Visual Question Answering

A Neural Network for Factoid Question Answering over Paragraphs

Learning to Compose Neural Networks for Question Answering

Generating Natural Questions About an Image

Question Answering on Freebase via Relation Extraction and Textual Evidence

Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus

Character-Level Question Answering with Attention

A Focused Dynamic Attention Model for Visual Question Answering

Visual Question Answering Literature Survey

The DIY Guide to Visual Question Answering

Question Answering via Integer Programming over Semi-Structured Knowledge

Hierarchical Question-Image Co-Attention for Visual Question Answering

Multimodal Residual Learning for Visual QA

Simple Question Answering by Attentive Convolutional Neural Network

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

Simple and Effective Question Answering with Recurrent Neural Networks

Analyzing the Behavior of Visual Question Answering Models

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Deep Language Modeling for Question Answering using Keras

Interpreting Visual Question Answering Models

The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering

Tutorial on Answering Questions about Images with Deep Learning

  • intro: The tutorial was presented at ‘2nd Summer School on Integrating Vision and Language: Deep Learning’ in Malta, 2016
  • arxiv: https://arxiv.org/abs/1610.01076

Hadamard Product for Low-rank Bilinear Pooling

Open-Ended Visual Question-Answering

Deep Learning for Question Answering

Dual Attention Networks for Multimodal Reasoning and Matching

Leveraging Video Descriptions to Learn Video Question Answering

Dynamic Coattention Networks For Question Answering

State of the art deep learning model for question answering

Zero-Shot Visual Question Answering

Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation

Question Answering through Transfer Learning from Large Fine-grained Supervision Data

Question Answering from Unstructured Text by Retrieval and Comprehension

Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering

Learning to Reason: End-to-End Module Networks for Visual Question Answering

TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering

Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks

Learning Convolutional Text Representations for Visual Question Answering

Compact Tensor Pooling for Visual Question Answering

  • arxiv: https://arxiv.org/abs/1706.06706

DeepStory: Video Story QA by Deep Embedded Memory Networks

Long-Term Memory Networks for Question Answering

Video Question Answering via Attribute-Augmented Attention Network Learning

Bottom-Up and Top-Down Attention for Image Captioning and VQA

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Structured Attentions for Visual Question Answering

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

  • intro: Winner of the 2017 Visual Question Answering (VQA) Challenge at CVPR
  • intro: The University of Adelaide & Australian National University & Microsoft Research
  • arxiv: https://arxiv.org/abs/1708.02711

MemexQA: Visual Memex Question Answering

Automatic Question-Answering Using A Deep Similarity Neural Network

Question Dependent Recurrent Entity Network for Question Answering

Visual Question Generation as Dual Task of Visual Question Answering

  • arxiv: https://arxiv.org/abs/1709.07192

A Read-Write Memory Network for Movie Story Understanding

iVQA: Inverse Visual Question Answering

  • arxiv: https://arxiv.org/abs/1710.03370

DCN+: Mixed Objective and Deep Residual Coattention for Question Answering

High-Order Attention Models for Visual Question Answering

Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards

Visual Question Answering as a Meta Learning Task

  • arxiv: https://arxiv.org/abs/1711.08105

Embodied Question Answering

Learning by Asking Questions

  • arxiv: https://arxiv.org/abs/1712.01238

Interpretable Counting for Visual Question Answering

  • arxiv: https://arxiv.org/abs/1712.08697

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering

DVQA: Understanding Data Visualizations via Question Answering

Object-based reasoning in VQA

Dual Recurrent Attention Units for Visual Question Answering

  • arxiv: https://arxiv.org/abs/1802.00209

Differential Attention for Visual Question Answering

Question Type Guided Attention in Visual Question Answering

  • arxiv: https://arxiv.org/abs/1804.02088

Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents

Projects

VQA Demo: Visual Question Answering Demo on a pretrained model

deep-qa: Implementation of the Convolutional Neural Network for factoid QA on the answer sentence selection task

YodaQA: A Question Answering system built on top of the Apache UIMA framework

insuranceQA-cnn-lstm: TensorFlow and Theano CNN code for insurance QA (question-answer matching)

TensorFlow Implementation of Deeper LSTM+ normalized CNN for Visual Question Answering

Visual Question Answering with Keras

Deep Learning Models for Question Answering with Keras

GuessWhat?! Visual object discovery through multi-modal dialogue

Deep QA: Using deep learning to answer Aristo’s science questions

Visual Question Answering in PyTorch

  • github: https://github.com/Cadene/vqa.pytorch

Dataset

Visual7W: Grounded Question Answering in Images

Resources

Awesome Visual Question Answering