Data-driven 3D Scene Understanding
| | |
|---|---|
| Title | Data-driven 3D Scene Understanding |
| Year of Publication | 2020 |
Among all digital representations we have for real physical objects, 3D is arguably the most expressive encoding. 3D representations allow storage and manipulation of high-level information as well as low-level features. However, it is still not clear how humans innately perceive 3D data. Replicating these capabilities in a vision-based agent is a challenging problem that can have an impact on several applications such as autonomous driving, robotics, and augmented/virtual reality. Scene understanding, which can be described as the act of analyzing a scene by considering its geometric and semantic properties, is a long-standing problem in computer vision. With the rise of big data and deep learning, this problem has gained a lot of attention over the past few years. The main challenge lies in how to collect and annotate 3D data effectively, and in how to use these data to learn a good representation of the world.
In this thesis, a series of topics on data-driven 3D scene understanding is discussed. Under the guiding principle of leveraging large-scale data for scene understanding, my efforts have led to several state-of-the-art algorithms and the creation of multiple large-scale datasets. First, we describe the process of collecting and annotating an indoor scene dataset. Next, we discuss a method for learning local cross-domain features, which can be applied to several low-level scene understanding tasks. We then examine the problems of progressive semantic segmentation and joint semantic-instance segmentation for indoor scenes. Finally, we venture into the field of outdoor scene understanding by presenting the creation of an autonomous driving dataset.