3D geometry

 

Detail Preserving Depth Estimation from a Single Image using Attention Guided Networks

Appeared in 3DV 2018

Zhixiang Hao, Yu Li, Shaodi You and Feng Lu


Convolutional Neural Networks have demonstrated superior performance on single image depth estimation in recent years. These works usually use stacked spatial pooling or strided convolution to obtain high-level information, which is a common practice in classification tasks. However, depth estimation is a dense prediction problem, and low-resolution feature maps always generate blurred depth maps, which is undesirable in applications. In order to produce high-quality depth maps, i.e., clean and accurate, we propose a network consisting of a Dense Feature Extractor (DFE) and a Depth Map Generator (DMG). The DFE combines ResNet and dilated convolutions. It extracts multi-scale information from the input image while keeping the feature maps dense. As for the DMG, we use an attention mechanism to fuse the multi-scale features produced by the DFE. Our network is trained end-to-end and does not need any post-processing. Hence, it runs fast and can predict depth maps at about 15 fps. Experimental results show that our method is competitive with the state-of-the-art in quantitative evaluation, while preserving better structural details of the scene depth.
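The attention-based fusion in the DMG can be illustrated with a minimal sketch: per-pixel softmax weights select among multi-scale feature maps. This is a hypothetical simplification, not the paper's architecture; in the paper the attention logits come from a learned convolutional branch, whereas here they are supplied directly.

```python
import numpy as np

def attention_fuse(features, scores):
    """Fuse multi-scale feature maps with per-pixel attention weights.

    features: list of (H, W, C) arrays, one per scale (already upsampled
              to a common resolution).
    scores:   (H, W, S) array of unnormalized attention logits, one
              channel per scale.
    """
    stack = np.stack(features, axis=-1)             # (H, W, C, S)
    # Softmax over the scale axis -> per-pixel fusion weights.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = e / e.sum(axis=-1, keepdims=True)           # (H, W, S)
    return (stack * w[:, :, None, :]).sum(axis=-1)  # (H, W, C)

# Toy example: two scales with uniform logits reduce to a simple average.
f1 = np.ones((4, 4, 3))
f2 = 3 * np.ones((4, 4, 3))
fused = attention_fuse([f1, f2], np.zeros((4, 4, 2)))
print(fused[0, 0, 0])  # 2.0 (equal weights average the two scales)
```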

Paper
Webpage

 

Multi-view Rectification of Folded Documents

IEEE TPAMI, 2017. DOI: 10.1109/TPAMI.2017.2675980

Shaodi You, Yasuyuki Matsushita, Sudipta Sinha, Yusuke Bou and Katsushi Ikeuchi


Digitally unwrapping paper sheets is a crucial step for document scanning and accurate text recognition. This paper presents a method for automatically rectifying curved or folded paper sheets from a small number of images captured from different viewpoints. Unlike previous techniques that require either an expensive 3D scanner or an over-simplified parametric representation of the deformations, our method uses only a few images and is based on a general developable surface model that can represent diverse deformations of paper sheets. By exploiting the geometric properties of developable surfaces, we develop a robust rectification method based on ridge-aware 3D reconstruction of the paper sheet and L1 conformal mapping. We evaluate the proposed technique quantitatively and qualitatively on a wide variety of input documents, such as receipts, book pages and letters.
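At the heart of rectification is the fact that a developable surface can be flattened isometrically. The toy sketch below illustrates this on a single cross-section: a curled 3D polyline is unrolled onto a line of identical arc length. It is only a stand-in for the paper's full ridge-aware reconstruction and L1 conformal mapping; `unroll_section` is a hypothetical helper.

```python
import numpy as np

def unroll_section(pts3d):
    """Flatten a 3D cross-section polyline of a developable sheet onto the
    x-axis, preserving arc length (the core isometry behind rectification)."""
    seg = np.linalg.norm(np.diff(pts3d, axis=0), axis=1)  # segment lengths
    return np.concatenate(([0.0], np.cumsum(seg)))        # 1D coordinates

t = np.linspace(0.0, np.pi, 201)                 # semicircular curl, radius 1
section = np.column_stack([np.cos(t), np.zeros_like(t), np.sin(t)])
flat = unroll_section(section)
print(flat[-1])  # close to pi: flattening recovers the true arc length
```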

Paper (3.0MB)
Webpage

 

 

Waterdrop Stereo

Shaodi You, Robby T. Tan, Rei Kawakami, Yasuhiro Mukaigawa and Katsushi Ikeuchi


This paper introduces depth estimation from water drops. The key idea is that a single water drop adhered to window glass is totally transparent and convex, and thus optically acts like a fisheye lens. If we have more than one water drop in a single image, then through each of them we can see the environment from a different viewpoint, similar to stereo. To realize this idea, we need to rectify every water drop's imagery so that radially distorted planar surfaces look flat. For this rectification, we consider two physical properties of water drops: (1) A static water drop has constant volume, and its convex geometric shape is determined by the balance between tension force and gravity. This implies that the 3D geometric shape can be obtained by minimizing the overall potential energy, which is the sum of the tension energy and the gravitational potential energy. (2) The imagery inside a water drop is determined by the water drop's 3D shape and by total reflection at its boundary. This total reflection generates a dark band commonly observed in any adherent water drop. Hence, once the 3D shapes of the water drops are recovered, we can rectify the water drop images through backward ray tracing and subsequently compute depth using stereo. In addition to depth estimation, we can also apply image refocusing. Experiments on real images and a quantitative evaluation show the effectiveness of our proposed method. To the best of our knowledge, adherent water drops have never before been used to estimate depth.

Paper (3.5MB)

 

 

Think Locally, Fit Globally: Robust and Fast 3D Shape Matching via Adaptive Algebraic Fitting

Appeared in Neurocomputing, DOI: 10.1016/j.neucom.2016.06.086

Shaodi You


We propose a novel 3D free-form surface matching method based on a novel key-point detector and a novel feature descriptor. The proposed detector is based on algebraic surface fitting. Through global smooth fitting, our detector achieves high computational efficiency and robustness against non-rigid deformations. For the feature descriptor, we provide algorithms to compute a 3D critical net, which imposes a meaningful structure on otherwise standalone local keypoints. A scale-invariant and deformation-robust Dual Spin Image descriptor is built on top of the 3D critical net. Our method is supported by rigorous mathematical proofs, and intensive quantitative experiments demonstrate its robustness, efficiency and accuracy.
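A minimal version of algebraic surface fitting is a least-squares quadric fit over a local neighborhood, with the fitting residual usable as a saliency score. This sketch assumes a height-field quadric model, which may differ from the paper's exact formulation.

```python
import numpy as np

def fit_quadric(points):
    """Least-squares fit of z = a x^2 + b xy + c y^2 + d x + e y + f to a
    local neighborhood of 3D points. Returns coefficients and residual."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    residual = np.linalg.norm(A @ coeffs - z)   # candidate saliency score
    return coeffs, residual

# Points sampled from the paraboloid z = x^2 + y^2 are fit exactly.
xs, ys = np.meshgrid(np.linspace(-1, 1, 7), np.linspace(-1, 1, 7))
pts = np.column_stack([xs.ravel(), ys.ravel(), (xs**2 + ys**2).ravel()])
coeffs, res = fit_quadric(pts)
print(np.round(coeffs, 6))  # ~ [1, 0, 1, 0, 0, 0]: paraboloid recovered
```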

Paper

 

 

Manifold Topological Multi-Resolution Analysis Method

Appeared in Pattern Recognition

Shaodi You and Huimin Ma

In this paper, two significant weaknesses of locally linear embedding (LLE) applied to computer vision are addressed: "intrinsic dimension" and "eigenvector meanings". "Topological embedding" and "multi-resolution nonlinearity capture" are introduced based on mathematical analysis of topological manifolds and LLE. The manifold topological analysis (MTA) method, based on "topological embedding", is a more robust way to determine the "intrinsic dimension" of a manifold with typical topology, which is important for tracking and perception understanding. The manifold multi-resolution analysis (MMA) method, based on "multi-resolution nonlinearity capture", defines LLE eigenvectors as features for pattern recognition and dimension reduction. Both MTA and MMA are proved mathematically, and several examples are provided. Applications to 3D object recognition and 3D object viewpoint space partitioning are also described.
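For reference, standard LLE, the method being analyzed, can be sketched in a few lines: solve for local reconstruction weights, then embed using the bottom nontrivial eigenvectors (the eigenvectors whose meaning MMA investigates). This is a plain textbook implementation, not MTA or MMA themselves.

```python
import numpy as np

def lle(X, k=8, d=2, reg=1e-3):
    """Minimal locally linear embedding (illustration only)."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(dists)[1:k + 1]       # k nearest neighbors
        Z = X[nbrs] - X[i]                      # centered neighborhood
        G = Z @ Z.T
        G += reg * np.trace(G) * np.eye(k)      # regularize the Gram matrix
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                # reconstruction weights
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                     # skip the constant eigenvector

# A helix is a 1D manifold in 3D; LLE recovers a 1D parameterization.
t = np.linspace(0.0, 3.0, 60)
helix = np.column_stack([np.cos(t), np.sin(t), t])
emb = lle(helix, k=6, d=1)
print(emb.shape)  # (60, 1)
```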

Paper (3.8MB)

 

 

A solution for efficient viewpoint space partition in 3D object recognition

Oral presentation at ICIG 2009

Xiao Yu, Huimin Ma, Shaodi You and Ze Yuan

Viewpoint space partition based on the aspect graph is one of the core techniques of 3D object recognition. Projection images obtained from critical viewpoints following this approach can efficiently provide topological information about an object. Computational complexity has been a major challenge in obtaining the representative viewpoints used in 3D recognition. In this paper, we discuss the inefficiency of calculation caused by redundant, nonexistent visual events and propose a systematic criterion for selecting the edges involved in EEE events. A pruning algorithm based on concave-convex properties is demonstrated, and we further introduce intersection relations into the pruning algorithm. These two methods not only make the calculation of EEE events tractable, but can also be applied before viewpoint calculation, thereby realizing a view-independent pruning algorithm. Analysis on simple representative models supports the effectiveness of our methods, and further investigation on the Princeton models, including airplanes, automobiles, etc., shows a two-orders-of-magnitude reduction in the number of EEE events on average.
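The concave-convex pruning relies on classifying shared edges by their dihedral geometry. A minimal classification, assuming outward face normals (the paper's full criterion for EEE edge triples involves more), could look like:

```python
import numpy as np

def edge_convexity(n1, c1, n2, c2, tol=1e-9):
    """Classify the edge shared by two faces as 'convex', 'concave' or
    'planar', given outward face normals (n1, n2) and centroids (c1, c2).
    The edge is convex if face 2 bends away from face 1's half-space."""
    s = np.dot(np.asarray(n1), np.asarray(c2) - np.asarray(c1))
    if s < -tol:
        return "convex"
    if s > tol:
        return "concave"
    return "planar"

# Two adjacent faces of a cube centered at the origin (side length 2):
top_n, top_c = np.array([0., 0., 1.]), np.array([0., 0., 1.])
side_n, side_c = np.array([1., 0., 0.]), np.array([1., 0., 0.])
print(edge_convexity(top_n, top_c, side_n, side_c))  # convex
```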

Paper (0.5MB)

 

 

 

Full-View Light-Field Optical Flow Computation over Light-Field Super-pixels

arXiv preprint

Hao Zhu, Xiaoming Sun, Qi Zhang, Qing Wang, Antonio Robles-Kelly, Hongdong Li and Shaodi You


In this paper, we present a multi-view optical flow estimation method for plenoptic imaging. Our method employs the structure delivered by the 4D light-field over multiple views, making use of super-pixels. These super-pixels are four-dimensional in nature and can be used to represent the objects in the scene as a set of slanted planes in 3D space so as to recover a piecewise rigid depth estimate. Taking advantage of these super-pixels and the corresponding slanted planes, we recover the optical flow and depth maps using a two-step optimization scheme where the flow is propagated from the central view to the other views in the imagery. We illustrate the utility of our method for depth and flow estimation making use of a dataset of synthetically generated image sequences and real-world imagery captured using a Lytro Illum camera. We also compare our results with those yielded by a number of alternatives elsewhere in the literature.
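The slanted-plane representation per super-pixel can be sketched as a least-squares plane fit of disparity over the super-pixel's pixel coordinates; `fit_slanted_plane` is a hypothetical helper, not the paper's optimizer.

```python
import numpy as np

def fit_slanted_plane(us, vs, disp):
    """Fit d = a*u + b*v + c over one super-pixel's pixels (least squares)."""
    A = np.column_stack([us, vs, np.ones_like(us)])
    (a, b, c), *_ = np.linalg.lstsq(A, disp, rcond=None)
    return a, b, c

# Synthetic super-pixel whose disparity lies exactly on a slanted plane.
us, vs = np.meshgrid(np.arange(10), np.arange(8))
us, vs = us.ravel().astype(float), vs.ravel().astype(float)
disp = 0.5 * us - 0.2 * vs + 3.0
a, b, c = fit_slanted_plane(us, vs, disp)
print(round(a, 6), round(b, 6), round(c, 6))  # 0.5 -0.2 3.0
```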

Paper
Webpage

 

 

 

Artworks