Perception Based Vision Perception-based vision aims to model and understand human perception, and then to propose computer vision and machine learning algorithms that incorporate this understanding. Specifically, I have been working on the following topics: prosthetic vision, decolourization, saliency and manifold learning.
Deep Texture and Structure Aware Filtering Network for Image Smoothing Appeared in ECCV2018 Kaiyue Lu, Shaodi You and Nick Barnes Image smoothing is a fundamental task in computer vision that aims to retain salient structures and remove insignificant textures. In this paper, we aim to address the fundamental shortcomings of existing image smoothing methods, which cannot properly distinguish textures and structures with similar low-level appearance. While deep learning approaches have started to explore the preservation of structure through image smoothing, existing work does not yet properly address textures. To this end, we generate a large dataset by blending natural textures with clean structure-only images, and then build a texture prediction network (TPN) that predicts the location and magnitude of textures. We then combine the TPN with a semantic structure prediction network (SPN) so that the final texture and structure aware filtering network (TSAFN) is able to identify the textures to remove ("texture-awareness") and the structures to preserve ("structure-awareness"). The proposed model is easy to understand and implement, and shows excellent performance on real images in the wild as well as on our generated dataset. Paper
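To make the dataset-generation step above concrete, here is a minimal sketch, assuming a tiled texture patch and a fixed blending weight (both assumptions of this illustration, not the paper's procedure), of how a natural texture might be blended into a clean structure-only image together with a texture ground-truth mask:

```python
import numpy as np

def blend_texture(structure_img, texture_patch, alpha=0.3):
    """Blend a tiled natural texture into a clean structure-only image.

    structure_img : float array in [0, 1], shape (H, W)
    texture_patch : float array in [0, 1], smaller than structure_img
    alpha         : blending magnitude (illustrative; texture strength could vary per sample)
    """
    H, W = structure_img.shape
    th, tw = texture_patch.shape
    # Tile the texture patch so it covers the whole image.
    reps = (int(np.ceil(H / th)), int(np.ceil(W / tw)))
    tiled = np.tile(texture_patch, reps)[:H, :W]
    # Zero-mean the texture so it perturbs the structure without shifting brightness.
    tiled = tiled - tiled.mean()
    blended = np.clip(structure_img + alpha * tiled, 0.0, 1.0)
    # A simple texture "ground truth": wherever the perturbation is non-negligible.
    texture_mask = (np.abs(alpha * tiled) > 0.02).astype(np.float32)
    return blended, texture_mask
```

Pairs of (blended image, texture mask) generated this way could then serve as supervision for a texture prediction network such as the TPN described above.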
Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features Appeared in CVPR2018 Xiang Wang, Shaodi You, Huimin Ma and Xi Li Weakly-supervised semantic segmentation under image-level supervision is a challenging task, as it directly associates high-level semantics with low-level appearance. To bridge this gap, in this paper we propose an iterative … Paper
A Frequency Domain Neural Network for Fast Image Super-resolution Appeared in IJCNN2018 Junxuan Li, Shaodi You and Antonio Robles-Kelly We present a frequency domain neural network for image super-resolution. The network employs the convolution theorem so as to cast convolutions in the spatial domain as products in the frequency domain. Moreover, the non-linearity in deep nets, often achieved by a rectifier unit, … Paper
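The convolution theorem that the network above relies on is easy to verify numerically; the following small NumPy check (purely illustrative, not code from the paper) confirms that circular convolution in the spatial domain equals element-wise multiplication of Fourier transforms in the frequency domain:

```python
import numpy as np

def circ_conv2d(x, k):
    """Brute-force 2-D circular convolution: (x * k)[n] = sum_m x[m] k[(n - m) mod N]."""
    H, W = x.shape
    out = np.zeros_like(x)
    for n1 in range(H):
        for n2 in range(W):
            out[n1, n2] = sum(x[m1, m2] * k[(n1 - m1) % H, (n2 - m2) % W]
                              for m1 in range(H) for m2 in range(W))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # toy "image"
k = rng.standard_normal((8, 8))   # toy "filter", already padded to the image size

spatial = circ_conv2d(x, k)
frequency = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))
assert np.allclose(spatial, frequency)   # the two routes agree
```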
Semi-supervised and Weakly-supervised Road Detection based on Generative Adversarial Networks Appeared in IEEE Signal Processing Letters Xiaofeng Han, Jianfeng Lu, Chunxia Zhao, Shaodi You and Hongdong Li Road detection is a key component of autonomous driving; however, most fully supervised road detection methods suffer from either insufficient training data or the high cost of manual annotation. To overcome these problems, we propose a semi-supervised learning (SSL) road detection method based on generative adversarial networks (GANs) and a weakly supervised learning (WSL) method based on conditional GANs. Specifically, in our SSL method, the generator produces the road detection results of labeled and unlabeled images, which are then fed into the discriminator; the discriminator assigns a label to each input to judge whether it is labeled. Additionally, in the WSL method we add another network to predict the road shapes of input images and use them in both the generator and the discriminator to constrain the learning process. By training under these frameworks, the discriminators can guide a latent annotation process on the unlabeled data; therefore, the networks can learn better representations of road areas and leverage the feature distributions of both labeled and unlabeled data. The experiments are carried out on the KITTI ROAD benchmark, and the results show that our methods achieve state-of-the-art performance.
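A minimal sketch of the SSL training signal described above, assuming hypothetical `seg_net` and `disc_net` modules and an assumed adversarial weight of 0.1 (none of these names or values come from the paper):

```python
import torch
import torch.nn.functional as F

def ssl_losses(seg_net, disc_net, labeled_img, labeled_gt, unlabeled_img):
    """One training step of the SSL idea sketched above (hypothetical networks).

    seg_net  : maps an image to a road probability map in [0, 1]
    disc_net : maps (image, road map) to a scalar logit: "was this pair labeled?"
    """
    pred_l = seg_net(labeled_img)      # road maps for labeled images
    pred_u = seg_net(unlabeled_img)    # road maps for unlabeled images

    # Supervised loss on the labeled subset only.
    sup_loss = F.binary_cross_entropy(pred_l, labeled_gt)

    # Discriminator learns to tell labeled pairs (target 1) from unlabeled ones (target 0).
    d_l = disc_net(labeled_img, pred_l.detach())
    d_u = disc_net(unlabeled_img, pred_u.detach())
    disc_loss = (F.binary_cross_entropy_with_logits(d_l, torch.ones_like(d_l)) +
                 F.binary_cross_entropy_with_logits(d_u, torch.zeros_like(d_u)))

    # The generator (segmentation net) tries to make unlabeled predictions
    # indistinguishable from labeled ones, which propagates supervision.
    adv_logit = disc_net(unlabeled_img, pred_u)
    adv_loss = F.binary_cross_entropy_with_logits(adv_logit, torch.ones_like(adv_logit))
    gen_loss = sup_loss + 0.1 * adv_loss   # 0.1 is an assumed weighting
    return gen_loss, disc_loss
```

The key point is that the discriminator only sees (image, prediction) pairs, so pushing unlabeled pairs towards the "labeled" decision transfers supervision onto the unlabeled data.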
Single Image Action Recognition using Semantic Body Part Actions Appeared in ICCV 2017 Zhichen Zhao, Huimin Ma, Shaodi You In this paper, we propose a novel single image action recognition algorithm based on the idea of semantic part actions. Unlike existing part-based methods, we argue that there exists a mid-level semantic, the semantic part action, and that human action is a combination of semantic part actions.
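As a toy illustration of the "action as a combination of semantic part actions" idea (the shapes and the linear fusion below are assumptions made for illustration, not the paper's model):

```python
import numpy as np

def fuse_part_actions(part_action_probs, part_to_action_weights):
    """Fuse per-part action predictions into a whole-body action distribution.

    part_action_probs      : (num_parts, num_part_actions) probabilities per body part
    part_to_action_weights : (num_parts, num_part_actions, num_actions) fusion weights
    """
    # Linearly combine part-action evidence into whole-body action scores.
    scores = np.einsum('pa,pak->k', part_action_probs, part_to_action_weights)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()   # softmax over whole-body action classes
```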
Edge Preserving and Multi-Scale Contextual Neural Network for Salient Object Detection Appeared in TIP 2017 Xiang Wang, Huimin Ma, Shaodi You and Xiaozhi Chen We propose a novel edge preserving and multi-scale contextual neural network for salient object detection. The proposed framework aims to address two limitations of existing CNN-based methods. First, region-based … pdf (4.9MB)
Local Background Enclosure for RGB-D Salient Object Detection Appeared in CVPR 2016, Spotlight presentation David Feng, Nick Barnes, Shaodi You, Chris McCarthy Recent work in salient object detection has considered the incorporation of depth cues from RGB-D images. In most cases, absolute depth or depth contrast is used as the main feature. However, regions of high contrast in the background cause false positives for such methods, as the background frequently contains regions that are highly variable in depth. Here, we propose a novel RGB-D saliency feature. Local background enclosure captures the spread of angular directions that are background with respect to the candidate region and the object it is part of. We show that our feature improves over state-of-the-art RGB-D saliency approaches as well as RGB methods on the RGBD1000 and NJUS2000 datasets.
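Purely to illustrate the background-enclosure idea (this is a simplified toy, not the paper's definition; the sampling scheme, radius and threshold are assumptions), one could measure the fraction of angular directions around a candidate region that contain deeper, background pixels:

```python
import numpy as np

def local_background_enclosure(depth, region_mask, n_dirs=32, radius=40, thresh=0.05):
    """Toy enclosure score: fraction of angular directions around a candidate region
    that contain background (deeper) pixels. region_mask is a boolean array.
    """
    H, W = depth.shape
    ys, xs = np.nonzero(region_mask)
    cy, cx = ys.mean(), xs.mean()
    region_depth = depth[region_mask].mean()

    enclosed = 0
    for a in np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False):
        dy, dx = np.sin(a), np.cos(a)
        found_background = False
        # Walk outwards from the region centre and look for deeper pixels.
        for r in range(1, radius):
            y, x = int(round(cy + r * dy)), int(round(cx + r * dx))
            if not (0 <= y < H and 0 <= x < W) or region_mask[y, x]:
                continue
            if depth[y, x] > region_depth + thresh:   # further away than the region
                found_background = True
                break
        enclosed += found_background
    return enclosed / n_dirs   # 1.0 means the region is fully enclosed by background
```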
Perceptually Consistent Color-to-Gray Image Conversion Shaodi You, Nick Barnes and Janine Walker We propose a color-to-grayscale image conversion algorithm (C2G) that aims to preserve the perceptual properties of the color image as much as possible. To this end, we propose measures for two perceptual properties based on contemporary research in vision science: brightness and multi-scale contrast. The brightness measure is based on the idea that the brightness of a grayscale image will affect the perceived probability of color information. The contrast measure is based on the idea that the contrast of a given pixel to its surroundings can be measured as a linear combination of color contrast at different scales. Based on these measures, we propose a graph-based optimization framework to balance the brightness and contrast measures. To solve the optimization, an L1-norm based method is provided which converts color discontinuities to brightness discontinuities. To validate our method, we evaluate against the existing Cadik and Color250 datasets, and against NeoColor, a new dataset that improves over existing C2G datasets. NeoColor contains around 300 images from typical C2G scenarios, including commercial photographs, printing, books, magazines, masterpiece artworks and computer-designed graphics. We show improvements in performance metrics, and further, through a user study, we validate the performance of both the algorithm and the metric.
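The multi-scale contrast measure lends itself to a short sketch; the Gaussian surround, the scales and the uniform weights below are assumptions of this illustration rather than the paper's exact formulation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_contrast(channel, sigmas=(1, 2, 4, 8), weights=None):
    """Contrast of each pixel to its surround at several scales, combined linearly.

    channel : 2-D array (e.g. lightness or one opponent-color channel)
    sigmas  : surround sizes (illustrative values)
    weights : per-scale weights; uniform by default (an assumption)
    """
    chan = channel.astype(np.float64)
    weights = weights if weights is not None else [1.0 / len(sigmas)] * len(sigmas)
    contrast = np.zeros_like(chan)
    for w, s in zip(weights, sigmas):
        surround = gaussian_filter(chan, sigma=s)   # local surround estimate
        contrast += w * np.abs(chan - surround)     # centre-surround difference
    return contrast
```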
Learning RGB-D Salient Object Detection using background enclosure, depth contrast, and top-down features Appeared in ICCV Workshop on Mutual Benefit of Cognitive and Computer Vision, 2017 Riku Shigematsu, David Feng, Shaodi You and Nick Barnes In human visual saliency, top-down and bottom-up information are combined as a basis of visual attention. Recently, deep Convolutional Neural Networks (CNNs) have demonstrated strong performance on RGB salient object detection, providing an effective mechanism for combining top-down semantic information with low-level features. Although depth information has been shown to be important for human perception of salient objects, the use of top-down information and the exploration of CNNs for RGB-D salient object detection remains limited. Here we propose a novel deep CNN architecture for RGB-D salient object detection that utilizes both top-down and bottom-up cues. In order to produce such an architecture, we present novel depth features that capture the ideas of background enclosure, depth contrast and histogram distance in a manner that is suitable for a learned approach. We show improved results compared to state-of-the-art RGB-D salient object detection methods. We also show that the low-level and mid-level depth features both contribute to improvements in results. In particular, the F-score of our method is 0.848 on RGBD1000, which is 10.7% better than the current best. Paper (2.1MB)
DSD: Depth Structural Descriptor for Edge-Based Assistive Navigation Appeared in ICCV Workshop on Assistive Computer Vision and Robotics David Feng, Shaodi You and Nick Barnes Structural edge detection is the task of finding edges between significant surfaces in a scene. This can underpin many computer vision tasks such as sketch recognition and 3D scene understanding, and is important for conveying scene structure for navigation with assistive vision. Identifying structural edges from a depth image can be challenging because the surface structure that differentiates edges is not well represented in this format. We derive a depth input encoding, the Depth Surface Descriptor (DSD), that captures the first-order properties of surfaces, allowing for improved classification of surface geometry that corresponds to structural edges. We apply the DSD feature to salient edge detection on RGB-D images using a fully convolutional neural network with deep supervision. We evaluate our method on both a new RGB-D dataset containing prosthetic vision scenarios and the SUNRGBD dataset, and show that our approach improves performance over existing methods by 4%. Paper (10.2MB)
HOSO: Histogram Of Surface Orientation for RGB-D Salient Object Detection Appeared in The International Conference on Digital Image Computing: Techniques and Applications, DICTA 2017 Best Paper Award David Feng, Nick Barnes and Shaodi You Salient object detection using RGB-D data is an emerging field in computer vision. Salient regions are often characterized by an unusual surface orientation profile with respect to the surroundings. To capture such a profile, we introduce the histogram of surface orientation (HOSO) feature to measure surface orientation distribution contrast for RGB-D saliency. We propose a new unified model that integrates surface orientation distribution contrast with depth and color contrast across multiple scales. This model is implemented in a multi-stage saliency computation approach that performs contrast estimation using a kernel density estimator (KDE), estimates object positions from the low-level saliency map, and finally refines the estimated object positions with a graph-cut based approach. Our method is evaluated on two RGB-D salient object detection databases, achieving superior performance to previous state-of-the-art methods. Paper (6.2MB)
Double-Guided Filtering: Image Smoothing with Structure and Texture Guidance Appeared in The International Conference on Digital Image Computing: Techniques and Applications, DICTA 2017 Kaiyue Lu, Shaodi You, Nick Barnes Image smoothing is a fundamental technique which aims to preserve image structure and remove insignificant texture. Balancing the trade-off between preserving structure and suppressing texture, however, is not a trivial task. This is because existing methods rely on only one guidance to infer structure or texture and assume the other is dependent. However, in many cases, textures are composed of repetitive structures and are difficult to distinguish using only one guidance. In this paper, we aim to better solve the trade-off by applying two independent guidances for structure and texture. Specifically, we adopt semantic edge detection as structure guidance and texture decomposition as texture guidance. Based on this, we propose a kernel-based image smoothing method called the double-guided filter (DGF). In the paper, for the first time, we introduce the concept of texture guidance, and DGF, the first kernel-based method that leverages structure and texture guidance at the same time to be both "structure-aware" and "texture-aware". We present a number of experiments to show the effectiveness of the proposed filter. Paper (3.8 MB)
Manifold Topological Multi-Resolution Analysis Method Appeared in Pattern Recognition Shaodi You and Huimin Ma In this paper, two significant weaknesses of locally linear embedding (LLE) applied to computer vision are addressed: "intrinsic dimension" and "eigenvector meanings". "Topological embedding" and "multi-resolution nonlinearity capture" are introduced based on mathematical analysis of topological manifolds and LLE. The manifold topological analysis (MTA) method is described and is based on "topological embedding". MTA is a more robust method to determine the "intrinsic dimension" of a manifold with typical topology, which is important for tracking and perception understanding. The manifold multi-resolution analysis (MMA) method is based on "multi-resolution nonlinearity capture". MMA defines LLE eigenvectors as features for pattern recognition and dimension reduction. Both MTA and MMA are proved mathematically, and several examples are provided. Applications in 3D object recognition and 3D object viewpoint space partitioning are also described. Paper (3.8MB)
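As a quick, hedged illustration of the setting the MMA idea operates in (using scikit-learn's LLE on toy data, not anything from the paper), the embedding coordinates returned by LLE, i.e. its eigenvectors, can be used directly as low-dimensional features:

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Embed a noisy 1-D manifold (a circle) living in 3-D and recover a
# low-dimensional representation whose coordinates serve as features.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 500)
X = np.stack([np.cos(t), np.sin(t), 0.05 * rng.standard_normal(500)], axis=1)

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)          # rows of Y are per-sample eigenvector coordinates
print(Y.shape, lle.reconstruction_error_)
```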
Automatic Generation of Grounded Visual Questions Appeared in IJCAI 2017 Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang and Jiawan Zhang In this paper, we propose the first model able to generate diverse visually grounded questions for the same image. Visual question generation is an emerging topic which links textual questions with visual input. To the best of our knowledge, automatic methods to generate varied and reasonable questions for the same visual input are lacking; so far, almost all textual questions, as well as the corresponding answers, have been generated manually. To this end, we propose a system that automatically generates visually grounded questions. First, the visual input is analyzed with a deep captioning model. Second, the captions, along with VGG-16 features, are used as input for our proposed question generator to generate visually grounded questions. Finally, to enable the generation of versatile questions, a question type selection module is provided, which selects reasonable question types and provides them as parameters for question generation. This is done using a hybrid LSTM with both visual and answer input. Our system is trained on the VQA and Visual7W datasets and shows reasonable results on automatically generating new visual questions. We also propose a quantitative metric for automatic evaluation of question quality. Paper (1.4 MB)
A solution for efficient viewpoint space partition in 3D object recognition Oral presentation in ICIG2009 Xiao Yu, Huimin Ma, Shaodi You and Ze Yuan Viewpoint space partition based on the aspect graph is one of the core techniques of 3D object recognition. Projection images obtained from critical viewpoints following this approach can efficiently provide topological information about an object. Computational complexity has been a huge challenge for obtaining the representative viewpoints used in 3D recognition. In this paper, we discuss the inefficiency of the calculation due to redundant non-existent visual events, and propose a systematic criterion for the edge selection involved in EEE events. A pruning algorithm based on concave-convex properties is demonstrated. We further introduce intersection relations into our pruning algorithm. These two methods not only enable the calculation of EEE events, but can also be applied before viewpoint calculation, hence realizing a view-independent pruning algorithm. Finally, analysis of simple representative models supports the effectiveness of our methods. Further investigation of Princeton models, including airplanes, automobiles, etc., shows a two orders of magnitude reduction in the number of EEE events on average. Paper (0.5MB)
Deep Clustering for Weakly-Supervised Semantic Segmentation in Complex Object Clustered Scenes arXiv preprint Xiang Wang, Huimin Ma and Shaodi You Weakly-supervised semantic segmentation under image-tag supervision has drawn a lot of attention in recent years. In simple scenes, such as the Pascal VOC dataset, only one or a few objects are present in each image, so the class labels contain significant information for localizing objects. However, in complex real-world scenes, such as autonomous driving scenes, the problem becomes much more challenging. Almost all object classes are present in every single image, and thus the class labels contain hardly any information for supervising networks. To address this issue, in this paper, first, we propose to take advantage of the ImageNet dataset to train a discriminative classification network and apply it to our complex scenes to produce initial object localizations. Second, in autonomous driving scenes, though the images are much more complex, we argue that objects within the same class have more similarities, as all im… Paper
Differentiating Objects by Motion: Joint Detection and Tracking of Small Flying Objects arXiv preprint Ryota Yoshihashi, Tinh Tuan, Rei Kawakami, Shaodi You, Makoto Iida and Takeshi Naemura While generic object detection has achieved large improvements with rich feature hierarchies from deep nets, detecting small objects with poor visual cues remains challenging. Motion cues from multiple frames may be more informative for detecting such hard-to-distinguish objects in each frame. However, how to encode discriminative motion patterns, such as deformations and pose changes that characterize objects, has remained an open question. To learn them and thereby realize small object detection, we present a neural model called the Recurrent Correlational Network, where detection and tracking are jointly performed over a multi-frame representation learned through a single, trainable, end-to-end network. A convolutional long short-term memory network is utilized for learning informative appearance change for detection, while the learned representation is shared with tracking to enhance its performance. In experiments with datasets containing images of scenes with small flying objects, such as birds and unmanned aerial vehicles, the proposed method yielded consistent improvements in detection performance over deep single-frame detectors and existing motion-based detectors. Furthermore, our network performs as well as state-of-the-art generic object trackers when evaluated as a tracker on the bird dataset. Paper
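For intuition about the correlation step that lets detection and tracking share a representation (a simplification of the network above; the feature extractor and tensor shapes are assumed for this sketch):

```python
import torch
import torch.nn.functional as F

def correlation_response(template_feat, search_feat):
    """Cross-correlate a template feature with a search-region feature map.

    template_feat : (C, h, w) features of the tracked object from a previous frame
    search_feat   : (C, H, W) features of the current frame
    Returns a (H - h + 1, W - w + 1) similarity map whose peak locates the object.
    """
    # conv2d expects input (N, C, H, W) and weights (out_channels, C, h, w).
    response = F.conv2d(search_feat.unsqueeze(0), template_feat.unsqueeze(0))
    return response[0, 0]
```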
Top-down Bottom-up Supervision Enhancement for Edge Detection arXiv preprint David Feng, Shaodi You and Nick Barnes The performance of learned edge detectors depends heavily on both the quality and quantity of the training data. Existing high quality edge datasets are small, while larger datasets have inaccurate boundaries. We present a top-down bottom-up model for enhancing a noisy edge supervisory signal during training. Our approach applies a novel loss function for correcting a location-inaccurate supervisory signal using low-level image information. This loss function preserves the top-down information from the annotation while using bottom-up information to infer the exact location of the visual edge. We also introduce a strict supervision paradigm to explicitly enforce network layers to conform to the supervisory signal given the predominantly bottom-up nature of the task. Our approach enables existing edge detection systems to more effectively use the annotations from larger datasets with rough labels, obtaining improved results compared to state-of-the-art methods on the NYUD dataset, as well as showing effective cross-validation on SUNRGBD. Paper
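One way to picture the "keep the top-down decision, refine the bottom-up location" idea is the following toy label-correction step; the window size, gradient threshold and snapping rule are assumptions of this sketch, not the paper's loss function:

```python
import numpy as np

def relocate_edge_labels(noisy_edges, grad_mag, window=3, grad_thresh=0.2):
    """Keep the annotation's decision about *which* edges exist (top-down), but move
    each labelled edge pixel to the strongest image gradient within a small window
    (bottom-up) before it is used as supervision.
    """
    H, W = noisy_edges.shape
    corrected = np.zeros_like(noisy_edges)
    ys, xs = np.nonzero(noisy_edges)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - window), min(H, y + window + 1)
        x0, x1 = max(0, x - window), min(W, x + window + 1)
        local = grad_mag[y0:y1, x0:x1]
        dy, dx = np.unravel_index(np.argmax(local), local.shape)
        if local[dy, dx] >= grad_thresh:
            corrected[y0 + dy, x0 + dx] = 1      # snap to the nearby image edge
        else:
            corrected[y, x] = 1                  # no strong gradient: keep the original label
    return corrected
```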
Salient Structure: Validation through Eye-tracking and a Benchmark for Salient Structure Detection arXiv preprint Weixuan Sun, Shaodi You, Janine Walker, Kunming Li and Nick Barnes When humans look at a scene, they focus on some regions more than others. In general, humans pay attention to objects that stand out (salient objects) and also to important structures (salient structures). Salient structures play a key role in helping humans understand the environment, especially in navigation tasks. Recent saliency models have focused on detecting the most salient object, while the detection of salient structures has attracted less attention. Thus, in this paper, we propose the concept of salient structure. To validate this concept, we conduct a comprehensive eye-tracking study which indicates that salient structure plays an important role in visual saliency. In addition, we collect the first available dataset for salient structure detection in computer vision; the dataset contains 200 images. Finally, a benchmark is introduced to assist in measuring the performance of future salient structure detection models. Paper
Cross-connected Networks for Multi-task Learning of Detection and Segmentation arXiv preprint Seiichiro Fukuda, Ryota Yoshihashi, Rei Kawakami, Shaodi You, Makoto Iida and Takeshi Naemura Multi-task learning improves generalization performance by sharing knowledge among related tasks. Existing models handle task combinations annotated on the same dataset, while there are cases where multiple datasets are available, one for each task. How to utilize the knowledge of successful single-task CNNs trained on separate datasets has been explored less than multi-task learning with a single dataset. We propose a cross-connected CNN, a new architecture that connects single-task CNNs through convolutional layers, which transfer useful information to the counterpart branch. We evaluate the proposed architecture on a combination of detection and segmentation using two datasets. Experiments on pedestrians show that our CNN achieves higher detection performance than baseline CNNs while maintaining high segmentation quality. To our knowledge, this is the first attempt to tackle multi-task learning with different training datasets for detection and segmentation. Experiments with wild birds demonstrate how our CNN learns general representations from limited datasets. Paper
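A minimal sketch of the cross-connection pattern described above (layer counts, channel widths and the additive fusion are assumptions of this illustration, not the paper's architecture):

```python
import torch
import torch.nn as nn

class CrossConnectedCNN(nn.Module):
    """Two single-task branches (detection and segmentation) that exchange
    features through 1x1 convolutional cross-connections."""

    def __init__(self, channels=32):
        super().__init__()
        self.det1 = nn.Conv2d(3, channels, 3, padding=1)
        self.seg1 = nn.Conv2d(3, channels, 3, padding=1)
        # Cross-connections transfer features between the two branches.
        self.det_to_seg = nn.Conv2d(channels, channels, 1)
        self.seg_to_det = nn.Conv2d(channels, channels, 1)
        self.det2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.seg2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        d = torch.relu(self.det1(x))
        s = torch.relu(self.seg1(x))
        # Each branch consumes its own features plus the counterpart's.
        d2 = torch.relu(self.det2(d + self.seg_to_det(s)))
        s2 = torch.relu(self.seg2(s + self.det_to_seg(d)))
        return d2, s2
```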