Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/113587
Type: Theses
Title: Mid-level representations for action recognition and zero-shot learning
Author: Qiao, Ruizhi
Issue Date: 2017
School/Discipline: School of Computer Science
Abstract: Compared with low-level features, mid-level representations of visual objects contain more discriminative and interpretable information, which benefits classification performance and allows learned information to be shared across object categories. These benefits have drawn tremendous attention from the computer vision community, and mid-level representations have driven breakthroughs across a variety of vision tasks. In this thesis, we focus on three problems regarding mid-level representations: 1) how to extract discriminative mid-level representations from local features; 2) how to suppress noisy components in mid-level representations; and 3) how to address the visual-semantic discrepancy in mid-level representations. We address the first problem in the task of action recognition and the other two in the task of zero-shot learning. For the first problem, we devise a representation suitable for characterising human actions from a sequence of pose estimates generated by an RGB-D sensor. We show that discriminative sequences of poses typically occur over a short time window, and we therefore propose a simple but effective local descriptor, called a trajectorylet, to capture the static and kinematic information within this interval. We further show that state-of-the-art recognition results can be achieved by encoding each trajectorylet with a discriminative trajectorylet detector set, selected from a large number of candidate detectors trained through exemplar-SVMs. The mid-level representation is obtained by pooling the trajectorylet encodings. For the second problem, we turn to zero-shot learning and focus on classifying a visual concept merely from its associated online textual source, such as a Wikipedia article. We go further and consider one important factor: the textual representation, as a mid-level representation, is usually too noisy for zero-shot learning tasks.
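The trajectorylet pipeline described above can be illustrated with a minimal NumPy sketch. The exact descriptor layout, window length, and detector training in the thesis may differ; this only shows the general idea of concatenating static and kinematic information over a short pose window, then scoring with a detector bank and max-pooling:

```python
import numpy as np

def trajectorylet(joints):
    """Descriptor from a short window of pose estimates.
    joints: (T, J, 3) array of T consecutive frames of J 3-D joint positions.
    Static part: joint positions; kinematic part: first and second temporal
    differences (velocities and accelerations). Illustrative sketch only."""
    static = joints.reshape(joints.shape[0], -1)   # (T, 3J) flattened poses
    vel = np.diff(static, n=1, axis=0)             # (T-1, 3J) velocities
    acc = np.diff(static, n=2, axis=0)             # (T-2, 3J) accelerations
    return np.concatenate([static.ravel(), vel.ravel(), acc.ravel()])

def encode(descriptors, detectors):
    """Score each trajectorylet with a bank of linear detectors (e.g. trained
    as exemplar-SVMs) and max-pool the responses over the sequence to obtain
    a fixed-length mid-level representation for the whole video."""
    scores = descriptors @ detectors.T             # (N, D) detector responses
    return scores.max(axis=0)                      # max pooling over time
```

The resulting vector has one entry per detector, so videos of different lengths map to representations of the same dimension.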
We design a simple yet effective zero-shot learning method that is capable of suppressing noise in the text. Specifically, we propose an ℓ₂,₁-norm based objective function which simultaneously suppresses the noisy signal in the text and learns a function to match the text document with visual features. We also develop an optimization algorithm to solve the resulting problem efficiently. For the third problem, we observe that distributed word embeddings, a popular mid-level representation for zero-shot learning due to their easy accessibility, are designed to reflect semantic similarity rather than visual similarity, and using them in zero-shot learning therefore often leads to inferior performance. To overcome this visual-semantic discrepancy, we re-align the distributed word embedding with visual information by learning a neural network that maps it into a new representation called the visually aligned word embedding (VAWE). We further design an objective function that encourages the neighbourhood structure of VAWEs to mirror that of the visual domain. This strategy gives more freedom in learning the mapping function and allows the learned mapping to generalize across zero-shot learning methods and different visual features.
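A common way to realise an ℓ₂,₁-regularised matching objective of this kind is proximal gradient descent with row-wise soft-thresholding: rows of the mapping matrix corresponding to noisy text dimensions are driven exactly to zero. The formulation, variable names, and solver below are an illustrative sketch, not necessarily the thesis's exact method:

```python
import numpy as np

def l21_norm(W):
    # Sum of the l2 norms of the rows: penalises entire rows,
    # encouraging row-level sparsity (whole text dimensions dropped).
    return np.sum(np.linalg.norm(W, axis=1))

def prox_l21(W, t):
    # Row-wise soft-thresholding: proximal operator of t * ||W||_{2,1}.
    # Rows with norm <= t are zeroed; larger rows are shrunk toward zero.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

def fit(T, V, lam=0.5, lr=1e-3, iters=1000):
    """Proximal gradient descent for  min_W ||T W - V||_F^2 + lam * ||W||_{2,1}.
    T: documents x text-dims, V: documents x visual-dims,
    W: text-dims x visual-dims. Zeroed rows of W correspond to suppressed
    (noisy) text dimensions. Hypothetical formulation for illustration."""
    W = np.zeros((T.shape[1], V.shape[1]))
    for _ in range(iters):
        grad = 2.0 * T.T @ (T @ W - V)       # gradient of the squared loss
        W = prox_l21(W - lr * grad, lr * lam)  # gradient step, then prox
    return W
```

Because the prox step zeroes whole rows rather than individual entries, noise suppression operates at the level of text-feature dimensions, which is the behaviour an ℓ₂,₁ penalty is chosen for.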
Advisor: Shen, Chunhua
Liu, Lingqiao
van den Hengel, Anton John
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2018
Keywords: Image classification
action recognition
zero-shot learning
attribute
word embedding
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at http://www.adelaide.edu.au/legals
Appears in Collections: Research Theses

Files in This Item:
File: Qiao2018_PhD.pdf (2.28 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.