[1] A. Brock, T. Lim, J. M. Ritchie, and N. Weston. Generative and discriminative voxel modeling with convolutional neural networks. CoRR, abs/1608.04236, 2016.
[2] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], 2015.
[3] R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011.
[4] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li. Imagenet: A large scale hierarchical image database. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pages 248-255, 2009.
[5] G. Georgakis, M. A. Reza, A. Mousavian, P. Le, and J. Kosecka. Multiview RGB-D dataset for object instance detection. In Fourth International Conference on 3D Vision, 3DV 2016, Stanford, CA, USA, October 25-28, 2016, pages 426-434, 2016.
[6] R. B. Girshick. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 1440-1448, 2015.
[7] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pages 580-587, 2014.
[8] S. Gupta, R. B. Girshick, P. A. Arbelaez, and J. Malik. Learning rich features from RGB-D images for object detection and segmentation. In Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VII, pages 345-360, 2014.
[9] V. Hegde and R. Zadeh. Fusionnet: 3d object classification using multiple data representations. CoRR, abs/1607.05695, 2016.
[10] A. Janoch. The berkeley 3d object dataset. Master's thesis, 2012.
[11] A. Janoch, S. Karayev, Y. Jia, J. T. Barron, M. Fritz, K. Saenko, and T. Darrell. A category-level 3-d object dataset: Putting the kinect to work. In IEEE International Conference on Computer Vision Workshops, Barcelona, Spain, November 6-13, 2011, pages 1168-1174, 2011.
[12] K. Lai, L. Bo, and D. Fox. Unsupervised feature learning for 3d scene labeling. In 2014 IEEE International Conference on Robotics and Automation, ICRA 2014, Hong Kong, China, May 31 - June 7, 2014, pages 3050-3057, 2014.
[13] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Intelligent Signal Processing, pages 306-351. IEEE Press, 2001.
[14] J. J. Lim, H. Pirsiavash, and A. Torralba. Parsing IKEA Objects: Fine Pose Estimation. ICCV, 2013.
[15] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg. SSD: single shot multibox detector. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I, pages 21-37, 2016.
[16] D. Maturana and S. Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2015, Hamburg, Germany, September 28 - October 2, 2015, pages 922-928, 2015.
[17] C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas. Volumetric and multi-view cnns for object classification on 3d data. CoRR, 2016.
[18] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 779-788, 2016.
[19] S. Ren, K. He, R. B. Girshick, and J. Sun. Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28, pages 91-99, 2015.
[20] Z. Ren and E. B. Sudderth. Three-dimensional object detection and layout prediction using clouds of oriented gradients. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[21] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In Computer Vision - ECCV 2012 - 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V, pages 746-760, 2012.
[22] R. Socher, B. Huval, B. P. Bath, C. D. Manning, and A. Y. Ng. Convolutional recursive deep learning for 3d object classification. In Advances in Neural Information Processing Systems 25, pages 665-673, 2012.
[23] S. Song, S. P. Lichtenberg, and J. Xiao. Sun rgb-d: A rgb-d scene understanding benchmark suite. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[24] S. Song and J. Xiao. Sliding shapes for 3d object detection in depth images. In Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VI, pages 634-651, 2014.
[25] H. Su, S. Maji, E. Kalogerakis, and E. G. Learned-Miller. Multi-view convolutional neural networks for 3d shape recognition. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 945-953, 2015.
[26] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d shapenets: A deep representation for volumetric shapes. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1912-1920, 2015.
[27] Y. Xiang, W. Kim, W. Chen, J. Ji, C. Choy, H. Su, R. Mottaghi, L. Guibas, and S. Savarese. Objectnet3d: A large scale database for 3d object recognition. In European Conference Computer Vision (ECCV), 2016.
[28] Y. Xiang, R. Mottaghi, and S. Savarese. Beyond pascal: A benchmark for 3d object detection in the wild. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2014.
[29] J. Xiao, A. Owens, and A. Torralba. SUN3D: A database of big spaces reconstructed using sfm and object labels. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1-8, 2013, pages 1625-1632, 2013.