Ask me anything: free-form visual question answering based on knowledge from external sources

Wu, Q.; Wang, P.; Shen, C.; Dick, A.; Van Den Hengel, A.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/108079

Type:	Conference paper
Title:	Ask me anything: free-form visual question answering based on knowledge from external sources
Author:	Wu, Q.; Wang, P.; Shen, C.; Dick, A.; Van Den Hengel, A.
Citation:	Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, vol. 2016-December, pp. 4622-4630
Publisher:	IEEE
Issue Date:	2016
Series/Report no.:	IEEE Conference on Computer Vision and Pattern Recognition
ISBN:	9781467388511
ISSN:	1063-6919
Conference Name:	29th IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (26 Jun 2016 - 1 Jul 2016 : Las Vegas, NV)
Statement of Responsibility:	Qi Wu, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel
Abstract:	We propose a method for visual question answering which combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. This allows more complex questions to be answered using the predominant neural network-based approach than has previously been possible. It particularly allows questions to be asked about the contents of an image, even when the image itself does not contain the whole answer. The method constructs a textual representation of the semantic content of an image and merges it with textual information sourced from a knowledge base to develop a deeper understanding of the scene viewed. Priming a recurrent neural network with this combined information and the submitted question leads to a very flexible visual question answering approach. We are specifically able to answer questions posed in natural language that refer to information not contained in the image. We demonstrate the effectiveness of our model on two publicly available datasets, Toronto COCO-QA [23] and VQA [1], and show that it produces the best reported results in both cases.
Rights:	© 2016 IEEE
DOI:	10.1109/CVPR.2016.500
Published version:	http://dx.doi.org/10.1109/cvpr.2016.500
Appears in Collections:	Aurora harvest 8 Computer Science publications
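
The data flow outlined in the abstract (image content turned into text, merged with knowledge-base text, then combined with the question before being passed to a recurrent network) can be loosely illustrated as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: the names `TOY_KB`, `describe_image`, `lookup_knowledge`, and `build_priming_text`, the text format, and the toy knowledge entries are all hypothetical, and the recurrent network itself is omitted.

```python
from typing import Dict, List

# Hypothetical toy knowledge base mapping an image attribute to a short text.
# The paper draws on a general knowledge base; this dict is only a stand-in.
TOY_KB: Dict[str, str] = {
    "umbrella": "An umbrella is a canopy that protects against rain or sunlight.",
    "dog": "The dog is a domesticated descendant of the wolf.",
}


def describe_image(attributes: List[str]) -> str:
    """Turn predicted image attributes into a textual description (assumed format)."""
    return "image contains: " + ", ".join(attributes)


def lookup_knowledge(attributes: List[str], kb: Dict[str, str]) -> List[str]:
    """Collect knowledge-base text for each attribute that has an entry."""
    return [kb[a] for a in attributes if a in kb]


def build_priming_text(attributes: List[str], question: str, kb: Dict[str, str]) -> str:
    """Merge image text, knowledge text, and the question into one sequence.

    Combined text of this kind is what would be fed to a recurrent network;
    the network is not modelled in this sketch.
    """
    parts = [describe_image(attributes)]
    parts += lookup_knowledge(attributes, kb)
    parts.append("question: " + question)
    return " | ".join(parts)


if __name__ == "__main__":
    print(build_priming_text(["dog", "frisbee"], "What is this animal?", TOY_KB))
```

Attributes without a knowledge-base entry (here, `frisbee`) simply contribute no extra text, so the question can still be posed when the external source has nothing to add.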

Files in This Item:

File	Description	Size	Format
RA_hdl_108079.pdf	Restricted Access	1.69 MB	Adobe PDF

Adelaide Research & Scholarship