Anisotropic Convolutional Networks for 3D Semantic Scene Completion
Jie Li¹   Kai Han²   Peng Wang³*   Yu Liu⁴   Xia Yuan¹
1Nanjing University of Science and Technology, China
2University of Oxford, United Kingdom
3University of Wollongong, Australia
4The University of Adelaide, Australia
Paper [CVPR 2020]    Code [PyTorch]




Abstract

As a voxel-wise labeling task, semantic scene completion (SSC) aims to simultaneously infer the occupancy and semantic labels of a scene from a single depth and/or RGB image. The key challenge for SSC is how to effectively exploit 3D context to model objects and stuff that vary severely in shape, layout, and visibility. To handle such variations, we propose a novel module called anisotropic convolution, which offers flexibility and modeling power that competing methods, such as standard 3D convolution and some of its variants, cannot provide. In contrast to standard 3D convolution, which is limited to a fixed 3D receptive field, our module models dimensional anisotropy voxel-wise. The basic idea is to enable an anisotropic 3D receptive field by decomposing a 3D convolution into three consecutive 1D convolutions, with the kernel size of each 1D convolution adaptively determined on the fly. By stacking multiple such anisotropic convolution modules, the voxel-wise modeling capability can be further enhanced while keeping the number of model parameters under control. Extensive experiments on two SSC benchmarks, NYU-Depth-v2 and NYUCAD, show the superior performance of the proposed method.
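To make the decomposition concrete, below is a minimal PyTorch sketch of one anisotropic convolution module as described above: a 3D convolution is replaced by three consecutive 1D convolutions, one per spatial axis, and each 1D convolution softly selects among several candidate kernel sizes via learned, voxel-wise weights. The class names (AxialConv1D, AnisotropicConv), the candidate kernel sizes, and the softmax-based selector are illustrative assumptions, not the authors' released implementation; see the linked PyTorch code for the reference version.

# Minimal sketch of an anisotropic convolution module (illustrative, not the
# official AIC-Net code). A 3D conv is decomposed into three consecutive 1D
# convs, each choosing its kernel size adaptively per voxel.

import torch
import torch.nn as nn


class AxialConv1D(nn.Module):
    """1D convolution along one axis of a 5D tensor (N, C, D, H, W),
    with a soft, voxel-wise selection over candidate kernel sizes."""

    def __init__(self, channels, axis, kernel_sizes=(3, 5, 7)):
        super().__init__()
        convs = []
        for k in kernel_sizes:
            # Build a k x 1 x 1 (or 1 x k x 1 / 1 x 1 x k) kernel on the chosen axis.
            size, pad = [1, 1, 1], [0, 0, 0]
            size[axis], pad[axis] = k, k // 2
            convs.append(nn.Conv3d(channels, channels, tuple(size), padding=tuple(pad)))
        self.convs = nn.ModuleList(convs)
        # Predicts one selection weight per candidate kernel at every voxel.
        self.selector = nn.Conv3d(channels, len(kernel_sizes), kernel_size=1)

    def forward(self, x):
        weights = torch.softmax(self.selector(x), dim=1)  # (N, K, D, H, W)
        out = 0
        for i, conv in enumerate(self.convs):
            # Weight each kernel's response voxel-wise, then sum.
            out = out + weights[:, i:i + 1] * conv(x)
        return out


class AnisotropicConv(nn.Module):
    """Three consecutive 1D convolutions, one per spatial axis, emulating a
    3D convolution with a voxel-wise anisotropic receptive field."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.axial = nn.Sequential(
            AxialConv1D(channels, axis=0, kernel_sizes=kernel_sizes),  # along D
            AxialConv1D(channels, axis=1, kernel_sizes=kernel_sizes),  # along H
            AxialConv1D(channels, axis=2, kernel_sizes=kernel_sizes),  # along W
        )

    def forward(self, x):
        return x + self.axial(x)  # residual connection keeps stacking stable


if __name__ == "__main__":
    block = AnisotropicConv(channels=16)
    voxels = torch.randn(1, 16, 32, 32, 32)  # (N, C, D, H, W)
    print(block(voxels).shape)  # torch.Size([1, 16, 32, 32, 32])

Because each module preserves the channel count and spatial size, several such blocks can be stacked, and the parameter count grows only with the number of 1D kernels rather than with full k³ 3D kernels.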


BibTex

    @inproceedings{Li2020aicnet,
      author     = {Jie Li and Kai Han and Peng Wang and Yu Liu and Xia Yuan},
      title      = {Anisotropic Convolutional Networks for 3D Semantic Scene Completion},
      booktitle  = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year       = {2020},
    }


Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants 61773210 and 61603184 and the EPSRC Programme Grant Seebibyte EP/M013774/1.

Related

DDRNet (CVPR 2019): RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion
PALNet (RA-L 2019): Depth Based Semantic Scene Completion with Position Importance Aware Loss


Webpage template borrowed from Split-Brain Autoencoders, CVPR 2017.