|Title:||Deep learning based RGB-D vision tasks|
|School/Discipline:||School of Computer Science|
|Abstract:||Depth is an important source of information in computer vision. However, it is discarded in most vision tasks. In this thesis, we study two tasks: estimating depth from single monocular images, and incorporating depth into object detection and semantic segmentation. Deep convolutional neural networks (CNNs) have recently brought a significant number of breakthroughs to the vision community, and all of the algorithms in this thesis are built upon them. The first part of this thesis addresses the task of incorporating depth into object detection and semantic segmentation. The aim is to improve the performance of vision tasks that otherwise rely only on RGB data. Two approaches for object detection and two approaches for semantic segmentation are presented. These approaches are based on existing depth estimation, object detection and semantic segmentation algorithms. The second part of this thesis addresses the task of depth estimation. Depth estimation is often formulated as a regression task because depth values are continuous. Deep CNNs for depth estimation are trained by iteratively minimizing regression errors between predicted and ground-truth depths. A drawback of regression is that it predicts depths without an associated confidence. In this thesis, we propose to formulate depth estimation as a classification task, which naturally predicts depths with confidence. The confidence can be used during training and post-processing. We also propose to exploit ordinal depth relationships from stereo videos to improve the performance of metric depth estimation. To this end, we introduce a Relative Depth in Stereo (RDIS) dataset that is densely annotated with relative depths.|
|Dissertation Note:||Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2018|
|Provenance:||This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals|
|Appears in Collections:||Research Theses|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.