VISAPP 2017 Abstracts


Area 1 - Image Formation and Preprocessing

Full Papers
Paper Nr: 38
Title:

Specialization of a Generic Pedestrian Detector to a Specific Traffic Scene by the Sequential Monte-Carlo Filter and the Faster R-CNN

Authors:

Ala Mhalla, Thierry Chateau, Sami Gazzah and Najoua Essoukri Ben Amara

Abstract: The performance of a generic pedestrian detector decreases significantly when it is applied to a specific scene, due to the large variation between the source dataset used to train the generic detector and the samples in the target scene. In this paper, we suggest a new approach that automatically specializes a generic detector into a scene-specific pedestrian detector for video surveillance, without manually labeling any samples, under a novel transfer learning framework. The main idea is to consider a deep detector as a function that generates realizations from the probability distribution of the pedestrians to be detected in the target scene. Our contribution is to approximate this target probability distribution with a set of samples and an associated specialized deep detector estimated in a sequential Monte Carlo filter framework. The effectiveness of the proposed framework is demonstrated through experiments on two public surveillance datasets. Compared with a generic pedestrian detector and state-of-the-art methods, our framework presents encouraging results.
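The sequential Monte Carlo (particle filter) machinery behind this kind of specialization can be illustrated by its resampling step, where high-weight samples are duplicated and low-weight ones discarded. A minimal sketch, not the authors' detector pipeline (the function name and weights are illustrative):

```python
import numpy as np

def systematic_resample(weights, n, rng):
    """Systematic resampling: draw n particle indices with probability
    proportional to the weights, using one uniform offset and evenly
    spaced positions (low-variance compared to multinomial sampling)."""
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions)

# A sample with weight 0.7 should survive in roughly 70% of the draws.
weights = np.array([0.7, 0.1, 0.1, 0.1])
idx = systematic_resample(weights, 1000, np.random.default_rng(0))
```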

Paper Nr: 49
Title:

Fast Scalable Coding based on a 3D Low Bit Rate Fractal Video Encoder

Authors:

Vitor de Lima, Thierry Moreira, Helio Pedrini and William Robson Schwartz

Abstract: Video transmissions usually occur at a fixed or at a small number of predefined bit rates. This can lead to several problems in communication channels whose bandwidth can vary along time (e.g. wireless devices). This work proposes a video encoding method for solving such problems through a fine rate control that can be dynamically adjusted with low overhead. The encoder uses fractal compression and a simple rate distortion heuristic to preprocess the content in order to speed up the process of switching between different bit rates. Experimental results show that the proposed approach can accurately transcode a preprocessed video sequence into a large range of bit rates with a small computational overhead.

Paper Nr: 62
Title:

A Robust Chessboard Detector for Geometric Camera Calibration

Authors:

Mathis Hoffmann, Andreas Ernst, Tobias Bergen, Sebastian Hettenkofer and Jens-Uwe Garbas

Abstract: We introduce an algorithm that detects chessboard patterns in images precisely and robustly for application in camera calibration. Because of the low requirements on the calibration images, our solution is particularly suited for endoscopic camera calibration. It successfully copes with strong lens distortions, partially occluded patterns, image blur, and image noise. Our detector initially uses a sparse sampling method to find some connected squares of the chessboard pattern in the image. A pattern-growing strategy iteratively locates adjacent chessboard corners with a region-based corner detector. The corner detector examines entire image regions with the help of the integral image to handle poor image quality. We show that it outperforms recent solutions in terms of detection rates and performs at least equally well in terms of accuracy.
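The integral image the corner detector relies on lets any rectangular region sum be evaluated in constant time, which is what makes examining entire image regions cheap. A minimal sketch of the idea, independent of the paper's implementation:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row/column prepended,
    so region lookups need no special cases at the border."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def region_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from four table lookups,
    independent of the region size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```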

Paper Nr: 75
Title:

Hierarchical Techniques to Improve Hybrid Point Cloud Registration

Authors:

Ferran Roure, Xavier Lladó, Joaquim Salvi, Tomislav Pribanić and Yago Diez

Abstract: Reconstructing 3D objects by gathering information from multiple spatial viewpoints is a fundamental problem in a variety of applications ranging from heritage reconstruction to industrial image processing. A central issue is known as the "point set registration" or "matching" problem, where the two sets being considered are to be rigidly aligned. This is a complex problem with a huge search space that suffers from high computational costs or requires expensive and bulky hardware to be added to the scanning system. To address these issues, a hybrid hardware-software approach was presented in (Pribanic et al., 2016), allowing for fast software registration by using commonly available (smartphone) sensors. In this paper we present hierarchical techniques to improve the performance of this algorithm. Additionally, we compare the performance of our algorithm against other approaches. Experimental results using real data show that the presented algorithm greatly improves the running time of the previous algorithm and performs best among all studied algorithms.

Paper Nr: 81
Title:

Development of Real-time HDTV-to-8K TV Upconverter

Authors:

Seiichi Gohshi, Shinichiro Nakamura and Hiroyuki Tabata

Abstract: Recent reports show that 4K and 8K TV systems are expected to replace HDTV in the near future. 4K TV broadcasting has begun commercially, and 8K TV broadcasting is projected to begin by 2018. However, the availability of content for 8K TV is still insufficient, a situation similar to that of HDTV in the 1990s. Upconverting analogue content was then important to supplement the insufficient HDTV content; such upconverted content was also important for news coverage, as HDTV equipment was heavy and bulky. The current situation for 4K and 8K TV is similar, in that covering news with 8K TV equipment is very difficult because this equipment is much heavier and bulkier than that required for HDTV in the 1990s. The HDTV content available currently is sufficient, and the equipment has evolved to facilitate news coverage; therefore, an HDTV-to-8K TV upconverter can be a solution to the problems described above. However, upconversion from interlaced HDTV to 8K TV enlarges the images by a factor of 32, making the upconverted images very blurry. In this study, an upconverter with super resolution is proposed to fix this issue.

Paper Nr: 119
Title:

Fast Intra Prediction Algorithm with Enhanced Sampling Decision for H.265/HEVC

Authors:

Sio-Kei Im, Mohammad Mahdi Ghandi and Ka-Hou Chan

Abstract: H.265/HEVC is the latest video coding standard, offering superior compression performance over H.264/AVC at the cost of greater complexity in its encoding process. In the intra coding of HEVC, a Coding Unit (CU) is recursively divided into a quad-tree-based structure from the Largest Coding Unit (LCU). At each level, up to 35 potential intra modes must be checked; examining all these modes is very time-consuming. In this paper, an intra mode decision algorithm is proposed that reduces the required computations while having a negligible effect on Rate-Distortion (RD) performance. A rough mode decision method based on image component sampling is proposed to reduce the number of candidate modes for rough mode decision and RD optimization. To balance quality and performance, the decision to reduce the full search is made with a threshold that is dynamically updated based on the Quantization Parameter (QP) and the CU size at each recursive step. Experiments show that our algorithm achieves a reasonable trade-off between encoding quality and efficiency: the saving in encoding time is between 30.0% and 45.0%, while BD-RATE may increase by up to 0.5%, for the H.265/HEVC reference software HM 16.9 under the all-intra configuration.

Paper Nr: 120
Title:

Pushing the Limits for View Prediction in Video Coding

Authors:

Jens Ogniewski and Per-Erik Forssén

Abstract: More and more devices have depth sensors, making RGB+D video data increasingly common. Depth images have also been considered for 3D and free-viewpoint video coding. This depth data can be used to render a given scene from different viewpoints, thus making it a useful asset in e.g. view prediction for video coding. In this paper we evaluate a multitude of algorithms for scattered data interpolation, in order to optimize the performance of frame prediction for video coding. Our evaluation uses the depth extension of the Sintel datasets. Using ground-truth sequences is crucial for such an optimization, as it ensures that all errors and artifacts are caused by the prediction itself rather than noisy or erroneous data. We also present a comparison with the commonly used mesh-based projection.

Paper Nr: 125
Title:

Dehazing using Non-local Regularization with Iso-depth Neighbor-Fields

Authors:

Incheol Kim and Min H. Kim

Abstract: Removing haze from a single image is a severely ill-posed problem due to the lack of scene information. General dehazing algorithms first estimate airlight using natural image statistics and then propagate the incompletely estimated airlight to build a dense transmission map, yielding a haze-free image. Propagating haze differs from other regularization problems, as haze is strongly correlated with depth according to the physics of light transport in participating media. However, since no depth information is available in single-image dehazing, traditional regularization methods with a common grid random field often suffer from haze isolation artifacts caused by abrupt changes in scene depth. In this paper, to overcome the haze isolation problem, we propose a non-local regularization method that combines Markov random fields (MRFs) with nearest-neighbor fields (NNFs), based on our observation that the NNFs searched in a hazy image associate patches at similar depths, as local haze in the atmosphere is proportional to its depth. We validate that the proposed method can regularize haze effectively to restore a variety of natural landscape images, as demonstrated in the results. The proposed regularization method can be used with any other dehazing algorithm to enhance haze regularization.

Paper Nr: 134
Title:

Combining Different Reconstruction Kernel Responses as Preprocessing Step for Airway Tree Extraction in CT Scan

Authors:

Samah Bouzidi, Fabien Baldacci, Chokri ben Amar and Pascal Desbarats

Abstract: In this paper, we propose a new preprocessing procedure that combines the responses of different Computed Tomography (CT) reconstruction kernels in order to improve the segmentation of the airway tree. These filters are available in all commercial CT scanners. A broad range of preprocessing techniques has been proposed, but all of them operate on images reconstructed using a single reconstruction filter. In this work, the new preprocessing approach is based on a fusion of images reconstructed using different reconstruction kernels and can be included as a preprocessing stage in any segmentation pipeline. Our approach has been applied to various CT scans, and an experimental comparison of state-of-the-art segmentation approaches run on processed and unprocessed data has been made. Results show that the fusion process improves segmentation results and removes false positives.

Paper Nr: 186
Title:

Specularity, Shadow, and Occlusion Removal for Planar Objects in Stereo Case

Authors:

Irina Nurutdinova, Ronny Hänsch, Vincent Mühler, Stavroula Bourou, Alexandra I. Papadaki and Olaf Hellwich

Abstract: Specularities, shadows, and occlusions are phenomena that commonly occur in images and cause a loss of information. This paper addresses the task to detect and remove all these phenomena simultaneously in order to obtain a corrected image with all information visible and recognizable. The proposed (semi-)automatic algorithm utilizes two input images that depict a planar object. The images can be acquired without special equipment (such as flash systems) or restrictions on the spatial camera layout. Experiments were performed for various combinations of objects, phenomena occurring, and capturing conditions. The algorithm perfectly detects and removes specularities in all examined cases. Shadows and occlusions are satisfactorily detected and removed with minimal user intervention in the majority of the performed experiments.

Paper Nr: 200
Title:

CUDA Accelerated Visual Egomotion Estimation for Robotic Navigation

Authors:

Safa Ouerghi, Remi Boutteau, Xavier Savatier and Fethi Tlili

Abstract: Egomotion estimation is a fundamental issue in structure from motion and autonomous navigation for mobile robots. Several camera motion estimation methods working from a variable number of image correspondences have been proposed. Five-point methods, which use the minimal number of correspondences required to estimate the essential matrix, have raised special interest for their application in a hypothesize-and-test framework. This algorithm allows relative pose recovery at the expense of a much higher computational time when dealing with high outlier ratios. To solve this problem with a certain amount of speedup, we propose in this work a CUDA-based solution for essential matrix estimation using the Gröbner basis version of the five-point algorithm, complemented with robust estimation. The hardware-specific implementation considerations as well as the parallelization methods employed are described in detail. A performance analysis against an existing CPU implementation is also given, showing a 4x speedup over the CPU for an outlier ratio e = 0.5, common for essential matrix estimation from automatically computed point correspondences. Larger speedups were observed when dealing with higher outlier ratios.

Paper Nr: 213
Title:

Automatic Separation of Basal Cell Carcinoma from Benign Lesions in Dermoscopy Images with Border Thresholding Techniques

Authors:

Nabin K. Mishra, Ravneet Kaur, Reda Kasmi, Serkan Kefel, Pelin Guvenc, Justin G. Cole, Jason R. Hagerty, Hemanth Y. Aradhyula, Robert LeAnder, R. Joe Stanley, Randy H. Moss and William V. Stoecker

Abstract: Basal cell carcinoma (BCC), with an incidence in the US exceeding 2.7 million cases/year, exacts a significant toll in morbidity and financial costs. Earlier BCC detection via automatic analysis of dermoscopy images could reduce the need for advanced surgery. In this paper, automatic diagnostic algorithms are applied to images segmented by five thresholding segmentation routines. Experimental results for five new thresholding routines are compared to expert-determined borders. Logistic regression analysis shows that thresholding segmentation techniques yield diagnostic accuracy that is comparable to that obtained with manual borders. The experimental results obtained with algorithms applied to automatically segmented lesions demonstrate significant potential for the new machine vision techniques.

Paper Nr: 217
Title:

Global Patch Search Boosts Video Denoising

Authors:

Thibaud Ehret, Pablo Arias and Jean-Michel Morel

Abstract: With the increasing popularity of mobile imaging devices and the emergence of HDR video surveillance, the need for fast and accurate denoising algorithms has also increased. Patch-based methods, which are currently state-of-the-art in image and video denoising, search for similar patches in the signal. This search is generally performed locally around each target patch for obvious complexity reasons. We propose here a new and efficient approximate patch search algorithm. It permits, for the first time, an evaluation of the impact of a global search on video denoising performance. A global search is particularly justified in video denoising, where strong temporal redundancy is often available. We first verify that the patches found by our new approximate search are far more concentrated than those obtained by exact local search, and are obtained in comparable time. To demonstrate the potential of the global search in video denoising, we take two patch-based image denoising algorithms and apply them to video. While with a classical local search their performance is poor, with the proposed global search they even improve on the latest state-of-the-art video denoising methods.
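The difference between local and global patch search can be made concrete with the exact brute-force baseline: score every patch position in every frame against the query. A sketch of this costly exact search (not the authors' approximate algorithm; names are illustrative):

```python
import numpy as np

def global_patch_search(frames, query, k):
    """Exact top-k nearest-patch search over every position in every frame,
    i.e. the brute force that approximate search methods try to avoid.
    Returns (frame_index, row, col) triples sorted by SSD to the query."""
    p = query.shape[0]
    candidates = []
    for f, frame in enumerate(frames):
        # All p x p windows of the frame as a (rows, cols, p, p) view, no copies.
        wins = np.lib.stride_tricks.sliding_window_view(frame, (p, p))
        ssd = ((wins - query) ** 2).sum(axis=(2, 3))
        for (r, c), d in np.ndenumerate(ssd):
            candidates.append((d, f, r, c))
    candidates.sort(key=lambda t: t[0])
    return [(f, r, c) for _, f, r, c in candidates[:k]]
```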

Paper Nr: 235
Title:

Color Edge Detection using Quaternion Convolution and Vector Gradient

Authors:

Nadia BenYoussef and Aicha Bouzid

Abstract: In this paper, a quaternion-based method is proposed for color image edge detection. A pair of quaternion masks is used for horizontal and vertical filtering, since quaternion convolution is not commutative. The detection procedure consists of two steps: quaternion convolution for edge detection and a vector gradient to enhance edge structures. Experimental results demonstrate the method's capabilities on natural color images.
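The vector-gradient step can be illustrated without the quaternion machinery: apply a Sobel filter per color channel and combine the squared responses into one magnitude. A simplified sketch only (the paper's quaternion convolution is not reproduced here):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def conv2_valid(img, k):
    """Naive 'valid' cross-correlation, sufficient for small kernels."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (img[r:r + kh, c:c + kw] * k).sum()
    return out

def color_gradient_magnitude(rgb):
    """Vector gradient magnitude: squared per-channel Sobel responses,
    summed over channels, for the horizontal and vertical directions."""
    gx2 = sum(conv2_valid(rgb[..., ch], SOBEL_X) ** 2 for ch in range(3))
    gy2 = sum(conv2_valid(rgb[..., ch], SOBEL_X.T) ** 2 for ch in range(3))
    return np.sqrt(gx2 + gy2)
```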

Short Papers
Paper Nr: 57
Title:

Denoising of Noisy and Compressed Video Sequences

Authors:

A. Buades and J. L. Lisani

Abstract: A novel denoising algorithm is presented for video sequences. The proposed approach takes advantage of the self-similarity and redundancy of adjacent frames. The algorithm automatically estimates a signal-dependent noise model for each level of a multi-scale pyramid. A variance stabilization transform is applied at each scale and a novel sequence denoising algorithm is used. Experiments show that the new algorithm is able to correctly remove highly correlated noise from dark and compressed movie sequences. In particular, we illustrate the performance on indoor and low-light scenes acquired with mobile phones.
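A common choice of variance stabilization transform for signal-dependent, Poisson-like noise is the Anscombe transform; whether the authors use this exact transform is an assumption, but it illustrates the step:

```python
import numpy as np

def anscombe(x):
    """Anscombe transform: maps Poisson-distributed data to data with
    approximately unit variance, so a Gaussian denoiser can be applied."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

def inverse_anscombe(y):
    """Direct algebraic inverse (slightly biased; unbiased closed-form
    inverses exist in the literature)."""
    return (y / 2.0) ** 2 - 3.0 / 8.0
```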

Paper Nr: 88
Title:

A Novel 2.5D Feature Descriptor Compensating for Depth Rotation

Authors:

Frederik Hagelskjær, Norbert Krüger and Anders Glent Buch

Abstract: We introduce a novel type of local image descriptor based on Gabor filter responses. Our method operates on RGB-D images. We use the depth information to compensate for perspective distortions caused by out-of-plane rotations. The descriptor contains the responses of a multi-resolution Gabor bank. Contrary to existing methods that rely on a dominant orientation estimate to achieve rotation invariance, we utilize the orientation information in the Gabor bank to achieve rotation invariance during the matching stage. Compared to SIFT and a recent descriptor for RGB-D data that also compensates for projective distortion, our method achieves a significant increase in accuracy when tested on a wide-baseline RGB-D matching dataset.

Paper Nr: 102
Title:

Image Resolution Enhancement based on Curvelet Transform

Authors:

Zehira Haddad, Adrien Chan Hon Tong and Jaime Lopez Krahe

Abstract: We present an image resolution enhancement method based on the Curvelet transform. This transform is used to decompose the input image into different subbands. After this decomposition, a nonlinear function is applied to the Curvelet coefficients in order to enhance the content of the different frequency subbands. These enhanced frequency subbands are then interpolated. We improve the enhancement results by fusing the obtained data with the interpolated input image. An image database is used for the experiments. The visual results show the superiority of the proposed technique over two state-of-the-art image resolution enhancement techniques. These results are confirmed by quantitative image quality metrics.

Paper Nr: 109
Title:

Edge based Blind Single Image Deblurring with Sparse Priors

Authors:

Khouloud Guemri, Fadoua Drira, Rim Walha, Adel M. Alimi and Frank LeBourgeois

Abstract: Blind image deblurring is the estimation of the blur kernel and the latent sharp image from a blurry image. This makes it a significantly ill-posed problem, with various investigations looking for adequate solutions. Recent approaches have turned to image priors to improve final results, and some of the most interesting results are based on data priors. This has been the starting point for the proposed blind image deblurring system. In particular, this study explores the potential of sparse representation, widely known for its efficiency in several reconstruction tasks. We propose a sparse-representation-based iterative deblurring method that exploits sparse constraints on edge-based image patches. This process includes the K-SVD algorithm for the dictionary definition. Our main contributions are (1) the application of a shock filter as a pre-processing step, followed by filter sub-band applications for effective contour detection, (2) the use of online training datasets with elementary patterns to describe edge-based information, and (3) the recourse to adaptive dictionary training. The experimental study illustrates promising results of the proposed deblurring method compared to well-known state-of-the-art methods.

Paper Nr: 146
Title:

Nuclei Segmentation using a Level Set Active Contour Method and Spatial Fuzzy C-means Clustering

Authors:

Ravali Edulapuram, R. Joe Stanley, Rodney Long, Sameer Antani, George Thoma, Rosemary Zuna, William V. Stoecker and Jason Hagerty

Abstract: Digitized histology images are analyzed by expert pathologists in one of several approaches to assess pre-cervical cancer conditions such as cervical intraepithelial neoplasia (CIN). Many image analysis studies focus on detecting nuclei features to classify the epithelium into the CIN grades. The current study focuses on nuclei segmentation based on level set active contour segmentation and fuzzy c-means clustering methods. Logical operations applied to morphological post-processing operations are used to smooth the image and to remove non-nuclei objects. On a 71-image dataset of digitized histology images (where the ground truth is the epithelial mask, which helps eliminate the non-epithelial regions), the algorithm achieved an overall nuclei segmentation accuracy of 96.47%. We propose a simplified fuzzy spatial cost function that may be generally applicable to any n-class clustering problem of spatially distributed objects.

Paper Nr: 163
Title:

3D Video Multiple Description Coding Considering Region of Interest

Authors:

Ehsan Rahimi and Chris Joslin

Abstract: 3D video is becoming a highly popular medium and is attracting researchers to provide robust streaming methods, since packet failure has always been an inseparable characteristic of wired and wireless networks. This paper aims to provide a new multiple description coding (MDC) scheme for 3D video that takes the objects in the scene into account. To this end, a low-complexity algorithm for identifying objects in the 3D scene is provided, and then a non-identical decimation method with respect to those objects is used to produce the descriptions of the MDC approach. Also, for the depth map image, a new non-identical MDC algorithm is introduced to stream the depth map while saving bandwidth without affecting the quality of the decoded video on the receiver side.

Paper Nr: 168
Title:

Application of LSD-SLAM for Visualization Temperature in Wide-area Environment

Authors:

Masahiro Yamaguchi, Hideo Saito and Shoji Yachida

Abstract: In this paper, we propose a method to generate a three-dimensional (3D) thermal map by overlaying thermal images onto a 3D surface reconstructed by a monocular RGB camera. In this method, we capture the target scene by moving an RGB camera and a thermal camera, which are mounted on the same jig. From the RGB image sequence, we reconstruct the 3D structure of the scene using Large-Scale Direct Monocular Simultaneous Localization and Mapping (LSD-SLAM), onto which the temperature distribution captured by the thermal camera is overlaid, thus generating a 3D thermal map. The geometrical relationship between the cameras is calibrated beforehand using a calibration board that can be detected by both cameras. Since we do not use depth cameras such as Kinect, the depth of the target scene is not limited by the measurement range of a depth camera; any depth range can be captured. To demonstrate this technique, we show synthesized 3D thermal maps for both indoor and outdoor scenes.

Paper Nr: 172
Title:

Single Image Dehazing based on Dark Channel Prior with Different Atmospheric Light

Authors:

Sheng Zhang and Wencang Bai

Abstract: Single image dehazing based on the dark channel prior can recover a high-quality haze-free image from a non-sky image. However, it does not perform well in bright regions such as the sky. This paper proposes a novel method for single image dehazing which jointly considers the atmospheric lights of sky regions and the land surface. In this proposal, we divide an image containing sky regions into a bright image (such as sky regions and artificial lights) and a dark image (such as natural outdoor scenery and buildings) according to the image saturation, the intensity of pixels, and Rayleigh scattering theory. In the recovery processing, the bright image and the dark image can be recovered separately with different atmospheric light parameters. The experimental results show that the proposed scheme can obtain a high-quality haze-free image on images that include the sky.
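The dark channel prior the method builds on is easy to state: in haze-free outdoor images, most patches contain some pixel that is dark in at least one color channel. A standard (He et al. style) sketch of the dark channel and transmission estimate, not the paper's sky-aware extension:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Minimum over the RGB channels, then a minimum filter over a
    patch x patch neighborhood (edge-padded)."""
    m = img.min(axis=2)
    pad = patch // 2
    mp = np.pad(m, pad, mode='edge')
    out = np.empty_like(m)
    for r in range(m.shape[0]):
        for c in range(m.shape[1]):
            out[r, c] = mp[r:r + patch, c:c + patch].min()
    return out

def transmission(img, airlight, omega=0.95, patch=15):
    """Standard dark-channel transmission estimate:
    t(x) = 1 - omega * dark_channel(I / A)."""
    return 1.0 - omega * dark_channel(img / airlight, patch)
```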

Paper Nr: 183
Title:

Automatic Calibration of the Optical System in Passive Component Inspection

Authors:

Sungho Suh and Moonjoo Kim

Abstract: A passive component inspection machine obtains an image of a passive component using specific lighting and cameras, and detects defects in the image of the component. It inspects all aspects of the component based on images captured using the lightings and cameras; the number of lightings and cameras is proportional to the number of component aspects. To detect defects effectively, the difference in image quality between cameras should be minimized. Even if the light conditions are calibrated automatically, the average intensities of the images differ because of the influence of the Bayer filter used in the CCD cameras of the inspection machine. Moreover, there is a further problem: the range of the light intensity cannot cover the range of the component reflectance, so the gain value and white balance ratios of the camera sometimes need to be calibrated manually. To solve these problems, we propose an automatic calibration method for the optical system in a passive component inspection machine. The proposed method minimizes the influence of the Bayer filter, does not use any initial camera calibration, and automatically finds the optimal values for the overall gain and the white balance ratios of the red, green, and blue colors. To reduce the influence of the Bayer filter, we iteratively find the optimal color balance ratios and formulate a relation between the overall gain and the white balance ratios to control all the parameters automatically. The proposed method is simple, and the experimental results show that it is faster and more precise than the previous method.

Paper Nr: 216
Title:

Rolling Shutter Camera Synchronization with Sub-millisecond Accuracy

Authors:

Matěj Šmíd and Jiri Matas

Abstract: A simple method for synchronization of video streams with a precision better than one millisecond is proposed. The method is applicable to any number of rolling shutter cameras and when a few photographic flashes or other abrupt lighting changes are present in the video. The approach exploits the rolling shutter sensor property that every sensor row starts its exposure with a small delay after the onset of the previous row. The cameras may have different frame rates and resolutions, and need not have overlapping fields of view. The method was validated on five minutes of four streams from an ice hockey match. The found transformation maps events visible in all cameras to a reference time with a standard deviation of the temporal error in the range of 0.3 to 0.5 milliseconds. The quality of the synchronization is demonstrated on temporally and spatially overlapping images of a fast moving puck observed in two cameras.
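The row-delay property the method exploits gives each sensor row its own exposure-start time, so a flash visible at a known row pins down a sub-frame event time. A minimal sketch of this timing model (parameter names are illustrative, not the paper's notation):

```python
def row_timestamp(frame_index, row, frame_rate, readout_time, rows_total, t0=0.0):
    """Exposure-start time of a sensor row under the rolling shutter model:
    each row starts a fixed delay after the previous one, and the whole
    frame readout takes readout_time seconds."""
    return t0 + frame_index / frame_rate + (row / rows_total) * readout_time
```

Equating such timestamps for an abrupt lighting change seen in two cameras yields one constraint on their relative time offset per flash.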

Paper Nr: 230
Title:

Action Recognition using the Rf Transform on Optical Flow Images

Authors:

Josep Maria Carmona and Joan Climent

Abstract: The objective of this paper is the automatic recognition of human actions in video sequences. The use of spatio-temporal features for action recognition has become very popular in recent literature. Instead of extracting the spatio-temporal features from the raw video sequence, some authors propose to project the sequence to a single template first. As a contribution, we propose the use of several variants of the R transform for projecting the image sequences to templates. The R transform projects the whole sequence to a single image, retaining information concerning movement direction and magnitude. Spatio-temporal features are extracted from the template, combined using a bag-of-words paradigm, and finally fed to an SVM for action classification. The method presented is shown to improve on state-of-the-art results on the standard Weizmann action dataset.

Paper Nr: 256
Title:

Image Super Resolution from Alignment Errors of Image Sensors and Spatial Light Modulators

Authors:

Masaki Hashimoto, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a novel method for obtaining super resolution images by using the alignment errors between an image sensor and a spatial light modulator, such as an LCoS device, in coded imaging systems. Recently, coded imaging systems are often used for obtaining high dynamic range (HDR) images and for removing depth and motion blur. For obtaining accurate HDR images and unblurred images, it is very important to align the spatial light modulator with the camera accurately, so that one-to-one correspondences hold between light modulator pixels and camera image pixels. However, accurate alignment of the light modulator and the image sensor is very difficult in reality. In this paper, we do not adjust the light modulator and the image sensor accurately. Instead, we use the alignment errors between them to obtain high resolution images from low resolution observations on the image sensor.

Paper Nr: 263
Title:

Calibration of a Different Field-of-view Stereo Camera System using an Embedded Checkerboard Pattern

Authors:

Pathum Rathnayaka, Seung-Hae Baek and Soon-Yong Park

Abstract: Knowing the correct relative pose between cameras is the first and most important step in a stereo camera system, and it has been of interest in many computer vision experiments. Much work has been introduced for stereo systems with largely overlapping fields of view, whereas only a few advanced feature-point-based methods have been presented for partially overlapping field-of-view systems. In this paper, we propose a new yet simple method to calibrate a partially overlapping field-of-view heterogeneous stereo camera system using a specially designed embedded planar checkerboard pattern. The embedded pattern is a combination of two differently colored planar patterns with different checker sizes. The heterogeneous camera system comprises a lower-focal-length wide-angle camera and a higher-focal-length conventional narrow-angle camera. The relative pose between the cameras is calculated by multiplying transformation matrices. Our proposed method is a decent alternative to many advanced feature-based techniques. We show the robustness of our method through the re-projection error and by comparing point differences along the Y axis in the image rectification results.

Paper Nr: 270
Title:

Segmentation of the LV Wall with Trabeculations

Authors:

Clément Beitone, Christophe Tilmant and Frédéric Chausse

Abstract: The evaluation of cardiac functional parameters for heart disease diagnosis requires an accurate segmentation result. We propose a method to efficiently and reliably segment both the endocardial and epicardial borders of the left ventricle. We use MR short axis images acquired in SSFP mode. Our framework combines a threshold-based approach that produces an estimate of the shape of the cardiac wall with a level set approach that refines it. We assessed our method on two databases built for two MICCAI challenges. Our results would have placed us third in the 2009 challenge.
Download

Paper Nr: 274
Title:

Medical Image Processing in the Age of Deep Learning - Is There Still Room for Conventional Medical Image Processing Techniques?

Authors:

Jason Hagerty, R. Joe Stanley and William V. Stoecker

Abstract: Deep learning, in particular convolutional neural networks, has increasingly been applied to medical images. Advances in hardware coupled with the availability of increasingly large datasets have fueled this rise, and results have shattered expectations. But it would be premature to cast aside conventional machine learning and image processing techniques: all this deep learning comes at a cost, namely the need for very large datasets. We discuss the role of conventional, manually tuned features combined with deep learning. Fusing conventional image processing techniques with deep learning can yield results superior to those obtained by either learning method in isolation. In this article, we review the rise of deep learning in medical imaging and the recent onset of fused learning methods. We discuss the supervision equilibrium point and the factors that favor fusion methods for histopathology and quasi-histopathology modalities.
Download

Paper Nr: 8
Title:

Oil Portrait Snapshot Classification on Mobile

Authors:

Yan Sun and Xiaomu Niu

Abstract: In recent years, several art museums have developed smartphone applications as e-guides. However, few of them provide instant retrieval and identification of a painting snapshot taken with a mobile phone. In this work, we therefore design and implement an oil portrait classification application for smartphones. Recognition accuracy suffers greatly from aberration, blur, geometric deformation and shrinking due to the unprofessional quality of snapshots; low-megapixel phone cameras are another factor degrading classification performance. After carefully studying the nature of such photos, we adopt the SIPH algorithm (Scale-Invariant feature transform based Image Perceptual Hashing) to extract image features and generate image information digests. Instead of the popular conventional Hamming method, we apply a more effective method to calculate the perceptual distance. Test results show that the proposed method achieves satisfactory robustness and discriminability in portrait snapshot identification and feature indexing.
Download
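As a baseline for the perceptual-distance step discussed in this abstract, the conventional Hamming method compares binary hash digests bit by bit. A minimal sketch follows; the digests are hypothetical, and the paper's actual SIPH digests and its improved distance measure are not reproduced here:

```python
def hamming_distance(h1: int, h2: int) -> int:
    """Count differing bits between two equal-length binary hash digests."""
    return bin(h1 ^ h2).count("1")

# Two hypothetical 16-bit perceptual hash digests of portrait snapshots.
a = 0b1011001110001111
b = 0b1011101110000111
print(hamming_distance(a, b))  # 2
```

A small Hamming distance indicates perceptually similar images; the paper argues this conventional measure can be replaced by a more effective one.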

Paper Nr: 143
Title:

Real-world Pill Segmentation based on Superpixel Merge using Region Adjacency Graph

Authors:

Sudhir Sornapudi, R. Joe Stanley, Jason Hagerty and William V. Stoecker

Abstract: Misidentified or unidentified prescription pills are an increasing challenge for all caregivers, both families and professionals. Errors in pill identification may lead to serious or fatal adverse events. To respond to this challenge, a fast and reliable automated pill identification technique is needed. The first and most critical step in pill identification is segmentation of the pill from the background. The goals of segmentation are to eliminate both false detection of background area and false omission of pill area, since either type of error can corrupt color or shape analysis and lead to pill misidentification. The real-world consumer images used in this research provide significant segmentation challenges due to varied backgrounds and lighting conditions. This paper proposes a color image segmentation algorithm that generates superpixels using the Simple Linear Iterative Clustering (SLIC) algorithm and merges the superpixels by thresholding the region adjacency graph. Post-processing steps are applied to produce accurate pill segmentation. The segmentation accuracy is evaluated by comparing the consumer-quality pill image segmentation masks to high-quality reference pill image masks.
Download
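The superpixel-merging step can be illustrated with a toy sketch: given one mean color per superpixel and the edges of a region adjacency graph, adjacent regions are merged when their color distance falls below a threshold. This is a simplified stand-in assuming Euclidean RGB distance and a hand-picked threshold, not the paper's actual merge criterion:

```python
import math

def merge_superpixels(mean_colors, adjacency, thresh):
    """Merge adjacent superpixels whose mean-color distance is below thresh.

    mean_colors: list of (R, G, B) tuples, one per superpixel.
    adjacency: list of (i, j) edges in the region adjacency graph.
    Returns one label per superpixel; merged regions share a label.
    """
    parent = list(range(len(mean_colors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in adjacency:
        if math.dist(mean_colors[i], mean_colors[j]) < thresh:
            parent[find(i)] = find(j)  # union the two regions

    return [find(k) for k in range(len(mean_colors))]

# Four superpixels: two dark (pill-like) and two bright (background).
colors = [(40, 40, 40), (45, 42, 41), (230, 228, 225), (235, 233, 230)]
edges = [(0, 1), (1, 2), (2, 3)]
labels = merge_superpixels(colors, edges, thresh=30.0)
print(labels)  # superpixels 0-1 share a label, 2-3 share another
```

In practice the superpixels would come from SLIC (e.g. scikit-image's implementation) and the threshold would be tuned to the image statistics.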

Paper Nr: 144
Title:

Color Feature-based Pillbox Image Color Recognition

Authors:

Peng Guo, Ronald J. Stanley, Justin G. Cole, Jason Hagerty and William V. Stoecker

Abstract: Patients, their families and caregivers routinely examine pills for medication identification. Key pill information includes color, shape, size and pill imprint. The pill can then be identified using an online pill database. This process is time-consuming and error-prone, leading researchers to develop techniques for automatic pill identification. Pill color may be the feature that contributes most to automatic pill identification. In this research, we investigate features from two color spaces, red-green-blue (RGB) and hue-saturation-value (HSV), as well as chromaticity and brightness features. Color-based classification is explored using MATLAB over 2140 National Library of Medicine (NLM) Pillbox reference images using 20 feature descriptors. The pill region is extracted using image processing techniques including erosion, dilation and thresholding. Using a leave-one-image-out approach for classifier training/testing, a support vector machine (SVM) classifier yielded an average accuracy over 12 categories as high as 97.90%.
Download
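The kind of color descriptors investigated here can be sketched as mean RGB and mean HSV values over the segmented pill region. This illustrates only six descriptors of the style the paper describes (its full set of 20 is not listed in the abstract), and the pixel values below are hypothetical:

```python
import colorsys

def color_features(pixels):
    """Mean RGB and HSV descriptors for a segmented pill region.

    pixels: list of (r, g, b) tuples with channels in [0, 255].
    Returns a 6-element vector (meanR, meanG, meanB, meanH, meanS, meanV).
    """
    n = len(pixels)
    mr = sum(p[0] for p in pixels) / n
    mg = sum(p[1] for p in pixels) / n
    mb = sum(p[2] for p in pixels) / n
    # Convert each pixel to HSV (channels normalised to [0, 1]).
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    mh = sum(h for h, s, v in hsv) / n
    ms = sum(s for h, s, v in hsv) / n
    mv = sum(v for h, s, v in hsv) / n
    return [mr, mg, mb, mh, ms, mv]

# A hypothetical patch of reddish pill pixels.
patch = [(200, 30, 30), (210, 40, 35), (190, 25, 28)]
print([round(f, 3) for f in color_features(patch)])
```

Such a feature vector would then be fed to an SVM classifier trained on the reference images.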

Paper Nr: 147
Title:

Multi-view ToF Fusion for Object Detection in Industrial Applications

Authors:

Inge Coudron and Toon Goedemé

Abstract: The use of time-of-flight (ToF) cameras in industrial applications has become increasingly popular due to the cameras’ reduced cost and their ability to provide real-time depth information. Still, one of the main drawbacks of these cameras has been their limited field of view. We therefore propose a technique to fuse the views of multiple ToF cameras. By mounting two cameras side by side and pointing them away from each other, the horizontal field of view can be artificially extended. The combined views can then be used for object detection. The main advantages of our technique are that the calibration is fully automatic and that only one shot of the calibration target is needed. Furthermore, no overlap between the views is required.
Download

Paper Nr: 219
Title:

High-speed Motion Detection using Event-based Sensing

Authors:

Jose A. Boluda, Fernando Pardo and Francisco Vegara

Abstract: Event-based vision is emerging as an alternative to conventional full-frame image processing. In event-based systems, a vision sensor delivers visual events asynchronously, typically illumination level changes. The asynchronous nature of these sensors makes it difficult to process the corresponding data stream: there may be few events to process if there are minor changes in the scene, or conversely an untreatable explosion of events if the whole scene changes quickly. A Selective Change-Driven (SCD) sensing system is a special event-based sensor that delivers, synchronously and ordered by the magnitude of their change, only those pixels that have changed most since they were last read out. To prove this concept, a processing architecture for high-speed motion analysis, based on processing the SCD pixel stream, has been developed and implemented in a Field Programmable Gate Array (FPGA). The system measures average distances using a laser line projected onto moving objects. The acquisition, processing and delivery of a distance takes less than 2 µs. Obtaining a similar result with a conventional frame-based camera would require a device working at more than 500 kfps, which is not practical in embedded, resource-limited systems. The implemented system is small enough to be mounted on an autonomous platform.
Download

Paper Nr: 220
Title:

Detecting Non-lambertian Materials in Video

Authors:

Seyed Mahdi Javadi, Yongmin Li and Xiaohui Liu

Abstract: This paper describes a novel method to automatically identify and distinguish shiny and glossy materials in videos. The proposed solution analyzes the logarithm of chromaticity of sample pixels from various materials over a period of time to differentiate between shiny and matte textures. Lambertian materials have a different reflectance model, and the distribution of their chromaticity is not the same as that of non-Lambertian textures; we use this to detect shiny materials. The system has many applications in texture and object recognition and in water leakage and oil spillage detection.
Download

Paper Nr: 227
Title:

Subjective Assessment Method for Multiple Displays with and without Super Resolution

Authors:

Chinatsu Mori and Seiichi Gohshi

Abstract: At present, although 4K TV sets are available on the market, the provision of 4K TV content is still insufficient. Almost all TV content is high-definition television (HDTV) broadcasting, and images/videos with insufficient resolution are up-converted to the resolution of the display. Thus, almost all 4K TV sets are equipped with super-resolution (SR) technology to improve the resolution of the content. However, the performance of SR on TV sets has not been guaranteed, and although the capability of SR needs to be assessed, there has been no standard method for such an assessment. In this paper, a subjective assessment method for multiple displays is proposed, and subjective assessment experiments on displays with and without SR are conducted to confirm the ability of an SR method. Statistical analysis proves the superiority of SR in resolution quality through significant differences that indicate reproducible results. Since reproducible results are obtainable, the proposed method is useful for assessing multiple displays. In this paper, the methodology of the proposed assessment method is described and the experimental results are presented.
Download

Paper Nr: 247
Title:

Exploratory Multimodal Data Analysis with Standard Multimedia Player - Multimedia Containers: A Feasible Solution to Make Multimodal Research Data Accessible to the Broad Audience

Authors:

Julius Schöning, Anna L. Gert, Alper Açık, Tim C. Kietzmann, Gunther Heidemann and Peter König

Abstract: The analysis of multimodal data comprised of images, videos and additional recordings, such as gaze trajectories, EEG, emotional states, and heart rate is presently only feasible with custom applications. Even exploring such data requires compilation of specific applications that suit a specific dataset only. This need for specific applications arises since all corresponding data are stored in separate files in custom-made distinct data formats. Thus accessing such datasets is cumbersome and time-consuming for experts and virtually impossible for non-experts. To make multimodal research data easily shareable and accessible to a broad audience, like researchers from diverse disciplines and all other interested people, we show how multimedia containers can support the visualization and sonification of scientific data. The use of a container format allows explorative multimodal data analyses with any multimedia player as well as streaming the data via the Internet. We prototyped this approach on two datasets, both with visualization of gaze data and one with additional sonification of EEG data. In a user study, we asked expert and non-expert users about their experience during an explorative investigation of the data. Based on their statements, our prototype implementation, and the datasets, we discuss the benefit of storing multimodal data, including the corresponding videos or images, in a single multimedia container. In conclusion, we summarize what is necessary for having multimedia containers as a standard for storing multimodal data and give an outlook on how artificial networks can be trained on such standardized containers.
Download

Paper Nr: 250
Title:

Single Image Marine Snow Removal based on a Supervised Median Filtering Scheme

Authors:

Fahimeh Farhadifard, Martin Radolko and Uwe Freiherr von Lukas

Abstract: Underwater image processing has attracted much attention due to the particular difficulty of capturing clean, high-quality images in this medium. Blur, haze, low contrast and color cast are the main degradations. Noise in an underwater image is mostly considered additive (e.g. sensor noise), although the visibility of underwater scenes is also degraded by another source, termed marine snow. This signal disturbs image processing methods such as enhancement and segmentation; removing marine snow can therefore improve image visibility while helping advanced approaches such as background subtraction yield better results. In this article, we propose a simple but effective filter to eliminate these particles from single underwater images. It consists of several steps that adapt the filter to best fit the characteristics of marine snow. Our experimental results show that our algorithm outperforms existing approaches, effectively removing this phenomenon while preserving edges as much as possible.
Download
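The idea of a median filter that acts only on marine-snow-like pixels can be sketched as a selective 3x3 median: a pixel is replaced only when it is a bright outlier relative to its neighbourhood median, since marine snow appears as small bright specks. The threshold `delta` and the outlier rule are illustrative assumptions, not the paper's tuned scheme:

```python
def remove_marine_snow(img, delta=50):
    """Selective 3x3 median filter for small bright specks.

    img: list of lists of grey values; delta: hypothetical
    brightness-excess threshold above the neighbourhood median.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neigh = sorted(img[y + dy][x + dx]
                           for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                           if not (dy == 0 and dx == 0))
            med = (neigh[3] + neigh[4]) / 2  # median of the 8 neighbours
            if img[y][x] - med > delta:     # bright outlier -> marine snow
                out[y][x] = med
    return out

# A dark 5x5 patch with one bright snow particle in the centre.
img = [[20] * 5 for _ in range(5)]
img[2][2] = 255
clean = remove_marine_snow(img)
print(clean[2][2])  # 20.0
```

Because the filter only touches outliers, edges and non-snow detail are left untouched, which matches the edge-preservation goal stated in the abstract.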

Area 2 - Image and Video Analysis

Full Papers
Paper Nr: 25
Title:

Segmentation-based Multi-scale Edge Extraction to Measure the Persistence of Features in Unorganized Point Clouds

Authors:

Dena Bazazian, Josep R. Casas and Javier Ruiz-Hidalgo

Abstract: Edge extraction has attracted much attention in computer vision. The accuracy of edge extraction in point clouds can be a significant asset in a variety of engineering scenarios. To this end, we propose a segmentation-based multi-scale edge extraction technique. In this approach, different regions of a point cloud are segmented by a global analysis according to geodesic distance. Afterwards, a multi-scale operator is defined over local neighborhoods. By applying this operator at multiple scales of the point cloud, the persistence of features is determined. We illustrate the proposed method by computing a feature weight that measures the likelihood of a point being an edge, then detecting edge points based on that value at both global and local scales. Moreover, we evaluate our method quantitatively and qualitatively. Experimental results show that the proposed approach achieves superior accuracy, and we demonstrate its robustness on noisier real-world datasets.
Download

Paper Nr: 30
Title:

Remote Respiration Rate Determination in Video Data - Vital Parameter Extraction based on Optical Flow and Principal Component Analysis

Authors:

Christian Wiede, Julia Richter, Manu Manuel and Gangolf Hirtz

Abstract: Due to the steadily ageing society, the determination of vital parameters, such as the respiration rate, has come into the focus of research in recent years. The respiration rate is an essential parameter for monitoring a person’s health status. This study presents a robust method to remotely determine a person’s respiration rate with an RGB camera. In our approach, we detect four subregions on a person’s chest, track features over time with optical flow, and apply a principal component analysis (PCA) together with several frequency determination techniques. Furthermore, the method was evaluated in various recorded scenarios. Overall, the results show that it is applicable in the field of Ambient Assisted Living (AAL).
Download
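The final frequency-determination step can be sketched as finding the dominant DFT bin of the tracked chest-motion signal. Here a naive pure-Python DFT on a synthetic 0.25 Hz sine stands in for the paper's pipeline, where the input would be the first principal component of the optical-flow trajectories:

```python
import math

def dominant_frequency(signal, fps):
    """Dominant frequency (Hz) of a 1-D motion signal via a naive DFT,
    ignoring the DC component."""
    n = len(signal)
    mean = sum(signal) / n
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):
        re = sum((signal[t] - mean) * math.cos(2 * math.pi * k * t / n)
                 for t in range(n))
        im = sum((signal[t] - mean) * math.sin(2 * math.pi * k * t / n)
                 for t in range(n))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fps / n  # convert bin index to Hz

# Synthetic chest motion: 0.25 Hz breathing (15 breaths/min) at 10 fps.
fps, seconds = 10, 40
sig = [math.sin(2 * math.pi * 0.25 * t / fps) for t in range(fps * seconds)]
rate_hz = dominant_frequency(sig, fps)
print(round(rate_hz * 60))  # breaths per minute: 15
```

In a real implementation an FFT would replace the quadratic-time DFT, and the signal would first be detrended and band-limited to plausible respiration frequencies.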

Paper Nr: 37
Title:

Towards a Diminished Reality System that Preserves Structures and Works in Real-time

Authors:

Hugo Álvarez, Jon Arrieta and David Oyarzun

Abstract: This paper presents a Diminished Reality system that is able to propagate textures as well as structures at a low computational cost, almost in real-time. An existing inpainting algorithm is optimized to reduce its high computational cost by applying several Computer Vision techniques. Although some of the presented optimizations can be applied directly to a single static image, the global system is mainly oriented to video sequences, where temporal-coherence ideas can be applied. Given that, a novel pipeline is proposed that maintains the visual quality of the reconstructed image area without recalculating everything, despite slow camera motion. To the best of our knowledge, the prototype presented in this paper is the only Diminished Reality system focused on structure propagation that works in near real-time. Apart from the technical description, this paper presents an extensive experimental study of the system, which evaluates the optimizations in terms of time and quality.
Download

Paper Nr: 68
Title:

Color-based and Rotation Invariant Self-similarities

Authors:

Xiaohu Song, Damien Muselet and Alain Tremeau

Abstract: One big challenge in computer vision is to extract robust and discriminative local descriptors. For many applications such as object tracking, image classification or image matching, appearance-based descriptors such as SIFT or learned CNN features provide very good results. But for some other applications such as multimodal image comparison (infrared versus color, color versus depth, ...), these descriptors fail, and one resorts to the spatial distribution of self-similarities. The idea is to describe the similarities between local regions in an image rather than the appearance of these regions at the pixel level. Nevertheless, classical self-similarities are not invariant to rotation in image space, so two rotated versions of a local patch are not considered similar, and we believe much discriminative information is lost because of this weakness. In this paper, we present a method to extract rotation-invariant self-similarities. To this end, we propose to compare color descriptors of the local regions rather than the local regions themselves. Furthermore, since this comparison informs us about the relative orientation of the two local regions, we incorporate this information into the final image descriptor to increase the discriminative power of the system. We show that the self-similarities extracted this way are highly discriminative.
Download

Paper Nr: 80
Title:

Towards a Videobronchoscopy Localization System from Airway Centre Tracking

Authors:

Carles Sánchez, Antonio Esteban Lansaque, Agnès Borràs, Marta Diez-Ferrer, Antoni Rosell and Debora Gil

Abstract: Bronchoscopists use fluoroscopy to guide flexible bronchoscopy to the lesion to be biopsied without any incision. Since fluoroscopy is an X-ray-based imaging technique, it increases the risk of developmental problems and cancer in exposed subjects, so minimizing radiation is crucial. Alternative guiding systems such as electromagnetic navigation require specific equipment, increase the cost of the clinical procedure and still require fluoroscopy. In this paper, we propose an image-based guiding system based on the extraction of airway centres from intra-operative videos. These anatomical landmarks are matched to the airway centreline extracted from a pre-planned CT to indicate the best path to the nodule. We present a feasibility study of our navigation system using simulated bronchoscopic videos and a multi-expert validation of landmark extraction in 3 intra-operative ultrathin explorations.
Download

Paper Nr: 95
Title:

Face Presentation Attack Detection using Biologically-inspired Features

Authors:

Aristeidis Tsitiridis, Cristina Conde, Isaac Martín De Diego and Enrique Cabello

Abstract: A person intentionally concealing or faking their identity from a biometric security system is said to perform a ‘presentation attack’. Efficient presentation attack detection poses a challenging problem in modern biometric security systems. Sophisticated presentation attacks may successfully spoof a person’s face and therefore disrupt accurate biometric authentication in controlled areas. In this work, a presentation attack detection technique that processes biologically-inspired facial features is introduced. The main goal of the proposed method is to provide an alternative foundation for biometric detection systems. In addition, such a system can serve future-generation biometric systems capable of carrying out rapid facial perception tasks in complex and dynamic situations. The newly-developed model was tested against two different databases and classifiers. Presentation attack detection results are promising, exceeding 94% detection accuracy on average for the investigated databases. The proposed model can be enriched with future enhancements to further improve its effectiveness in more diverse situations and against more sophisticated real-world attacks.
Download

Paper Nr: 138
Title:

Artery/vein Classification of Blood Vessel Tree in Retinal Imaging

Authors:

Joaquim de Moura, Jorge Novo, Marcos Ortega, Noelia Barreira and Pablo Charlón

Abstract: Alterations in the retinal microcirculation are signs of relevant diseases such as hypertension, arteriosclerosis or diabetes. Specifically, arterial constriction and narrowing have been associated with early stages of hypertension. Moreover, retinal vasculature abnormalities may be useful indicators of cerebrovascular and cardiovascular diseases. The Arterio-Venous Ratio (AVR), which measures the relation between arteries and veins, is one of the most referenced ways of quantifying changes in the retinal vessel tree. Since these alterations affect arteries and veins differently, a precise characterization of both types of vessels is a key issue in the development of automatic diagnosis systems. In this work, we propose a methodology for automatic vessel classification between arteries and veins in eye fundus images. The proposal was tested and validated on 19 near-infrared reflectance retinographies. The methodology provided satisfactory results in a domain as complex as retinal vessel tree identification and classification.
Download

Paper Nr: 149
Title:

A Robust Descriptor for Color Texture Classification Under Varying Illumination

Authors:

Tamiris Negri, Fang Zhou, Zoran Obradovic and Adilson Gonzaga

Abstract: Classifying color textures under varying illumination sources remains challenging. To address this issue, this paper introduces a new descriptor for color texture classification that is robust to changes in scene illumination. The proposed descriptor, named Color Intensity Local Mapped Pattern (CILMP), incorporates relevant color and texture information from the image in a multiresolution fashion. The CILMP descriptor explores color features by comparing the magnitudes of the color vectors inside the RGB cube. The proposed descriptor is evaluated in nine experiments over 50,048 images of raw food textures acquired under 46 lighting conditions. The experimental results show that CILMP performs better than state-of-the-art methods, reporting an increase (up to 20.79%) in classification accuracy compared to the second-best descriptor. In addition, we conclude from the experimental results that the multiresolution analysis improves the robustness of the descriptor and increases classification accuracy.
Download

Paper Nr: 176
Title:

Volume-based Human Re-identification with RGB-D Cameras

Authors:

Serhan Cosar, Claudio Coppola and Nicola Bellotto

Abstract: This paper presents an RGB-D based human re-identification approach using novel biometric features derived from the body’s volume. Existing work based on RGB images or skeleton features has limitations for real-world robotic applications, most notably in dealing with occlusions and the orientation of the user. Here, we propose novel features that allow re-identification when the person is facing sideways or backward, or is partially occluded. The proposed approach has been tested in various scenarios including different views and occlusion, and on the public BIWI RGBD-ID dataset.
Download

Paper Nr: 207
Title:

A Multiscale Circum-ellipse Area Representation for Planar Shape Retrieval

Authors:

Taha Faidi, Faten Chaieb and Faouzi Ghorbel

Abstract: In this paper, we propose a new Multiscale Circum-ellipse Area Representation (MCAR) for planar contours. The proposed representation is a multiscale shape signature defined from the local area delimited by the contour and the circumscribed ellipse of the triangle formed by three contour points. This shape signature describes, at each scale level, the concavity/convexity at each contour point. Fourier descriptors are then obtained by applying the Fourier transform to the proposed multiscale signature. Thus, the proposed MCAR-based Fourier descriptors capture both local and global shape characteristics. Furthermore, the representation is invariant to affine transformations and robust to local deformations. The performance of our proposed method was evaluated through precision-recall and bull’s-eye tests on two well-known databases (MCD and MPEG7-setB). The obtained results indicate that our method outperforms the shape-signature-based Fourier descriptors proposed in the literature.
Download

Paper Nr: 246
Title:

Change Detection in Crowded Underwater Scenes - Via an Extended Gaussian Switch Model Combined with a Flux Tensor Pre-segmentation

Authors:

Martin Radolko, Fahimeh Farhadifard and Uwe von Lukas

Abstract: In this paper, a new approach for change detection in videos of crowded scenes is proposed: the extended Gaussian Switch Model combined with a Flux Tensor pre-segmentation. The extended Gaussian Switch Model improves on the previous method by combining it with the Mixture of Gaussians idea and an intelligent update scheme, which makes it possible to create more accurate background models even for difficult scenes. Furthermore, an integrated foreground model delivers valuable information in the segmentation process. To deal with very crowded areas of the scene, where the background is not visible most of the time, we use the Flux Tensor to create a first coarse segmentation of the current frame and only update areas that are almost motionless and therefore, with high certainty, should be classified as background. To ensure the spatial coherence of the final segmentations, the N2Cut approach is added as a spatial model after the background subtraction step. The evaluation was done on an underwater change detection dataset and showed significant improvements over previous methods, especially in crowded scenes.
Download

Short Papers
Paper Nr: 32
Title:

Thresholding of Histopathological Images of Oral Mucosa for Identification of Precancerous Oral Submucous Fibrosis (OSF) Cells: A Novel Entropy based Approach

Authors:

Saptarashmi Bandyopadhyay, Soumyadeep Basu, R. R. Paul and Ajoy Kumar Ray

Abstract: The problem of early detection of Oral Submucous Fibrosis (OSF) has received paramount importance in recent times. OSF is a chronic, irreversible and high-risk pre-cancerous state of the oral mucosa. This kind of inflammatory and progressive fibrosis of the submucosal tissues is linked to oral cancers, and results from the chewing of areca nut, which is prevalent in large parts of the Indian subcontinent. The current work presents an approach for the analysis of dysplastic epithelial cells in OSF based on the nuclear-cytoplasmic (N:C) ratio, one of the most important morphological features for distinguishing between normal and dysplastic epithelial cells. The proposed approach uses MATLAB to analyse OSF biopsy images. This may help pathologists identify pre-cancer-affected cells and aid in the prevention and treatment of oral cancer. The methodology presented here can also be used to identify epithelial atypia, an important light-microscopic criterion that differentiates between the normal and pre-malignant/malignant status of the oral mucosa.
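As an illustration of entropy-based thresholding of this kind, here is a sketch of Kapur's maximum-entropy criterion, a classical member of this family of methods; the paper's own entropy formulation may differ, and the histogram below is a toy bimodal example, not biopsy data:

```python
import math

def kapur_threshold(hist):
    """Kapur-style maximum-entropy threshold over a grey-level histogram.

    hist[i] is the pixel count for grey level i. Returns the threshold
    maximising the summed entropies of the two resulting classes.
    """
    total = sum(hist)
    p = [h / total for h in hist]
    best_t, best_h = 0, -1.0
    for t in range(len(hist) - 1):
        w0 = sum(p[:t + 1])          # background probability mass
        w1 = 1.0 - w0                # foreground probability mass
        if w0 <= 0 or w1 <= 0:
            continue
        h0 = -sum(q / w0 * math.log(q / w0) for q in p[:t + 1] if q > 0)
        h1 = -sum(q / w1 * math.log(q / w1) for q in p[t + 1:] if q > 0)
        if h0 + h1 > best_h:
            best_t, best_h = t, h0 + h1
    return best_t

# Bimodal toy histogram: dark nuclei near level 2, cytoplasm near level 7.
hist = [0, 5, 30, 5, 0, 0, 6, 40, 6, 0]
t = kapur_threshold(hist)
print(t)  # a level in the gap between the two modes
```

Thresholding at `t` separates the nuclear and cytoplasmic pixel populations, from which the N:C area ratio can be computed.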

Paper Nr: 73
Title:

Improving Bayesian Mixture Models for Colour Image Segmentation with Superpixels

Authors:

Thorsten Wilhelm and Christian Wöhler

Abstract: The large computational demand is a major drawback of Bayesian mixture models in image segmentation tasks. We describe a novel approach that reduces the computational demand in this scenario and increases performance by using superpixels. Superpixels provide a natural way to reduce computational complexity and to build a texture model in the image domain. Instead of relying on a Gaussian mixture model as the segmentation model, we propose a more robust one: a mixture of multiple scaled t-distributions. The parameters of the novel mixture model are estimated with Markov chain Monte Carlo in order to escape local minima during estimation and to gain insight into the uncertainty of the resulting segmentation. Finally, the proposed segmentation is evaluated on the publicly available Berkeley Segmentation database (BSD500) and compared to competing methods, and the benefit of including texture is emphasised.
Download

Paper Nr: 100
Title:

Model-based Segmentation of 3D Point Clouds for Phenotyping Sunflower Plants

Authors:

William Gelard, Michel Devy, Ariane Herbulot and Philippe Burger

Abstract: This article presents a model-based segmentation method applied to 3D data acquired on sunflower plants. Our objective is the quantification of plant growth using observations made automatically by sensors moved around plants. Here, acquisitions are made on isolated plants: a 3D point cloud is computed using Structure from Motion with RGB images acquired all around a plant. The proposed method is then applied to segment and label the plant leaves, i.e. to split the point cloud into regions corresponding to plant organs: stem, petioles, and leaves. Every leaf is then reconstructed with NURBS and its area is computed from the triangular mesh. Our segmentation method is validated by comparing these areas with those measured manually using a planimeter: differences between automatic and manual measurements are shown to be less than 10%. The present results open interesting perspectives in the direction of high-throughput sunflower phenotyping.
Download

Paper Nr: 105
Title:

A Multiclass Anisotropic Mumford-Shah Functional for Segmentation of D-dimensional Vectorial Images

Authors:

J. F. Garamendi and E. Schiavi

Abstract: We present a general model for multi-class segmentation of multi-channel digital images. It is based on the minimization of an anisotropic version of the Mumford-Shah energy functional in the class of piecewise constant functions. In the framework of geometric measure theory, we use the concept of common interphases between regions (classes) and the value of the jump discontinuities of the (weak) solution between adjacent regions to define a minimal partition energy functional. The resulting problem is non-smooth and non-convex. Non-smoothness is dealt with by highlighting the relationship of the proposed model with the well-known Rudin, Osher and Fatemi model for image denoising when piecewise constant solutions (i.e. partitions) are considered. Non-convexity is tackled with an optimal threshold of the ROF solution, which we generalize to multi-channel images through probabilistic clustering. The optimal solution is then computed with a fixed-point iteration. The resulting algorithm is described and results are presented showing the successful application of the method to Light Field (LF) images.
Download
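For reference, the Rudin, Osher and Fatemi (ROF) model invoked in this abstract denoises an image $f$ over the domain $\Omega$ by minimising total variation plus a quadratic fidelity term with weight $\lambda$:

```latex
\min_{u}\; \int_{\Omega} \lvert \nabla u \rvert \, dx
\;+\; \frac{\lambda}{2} \int_{\Omega} (u - f)^2 \, dx
```

The total-variation term favours piecewise constant solutions, which is why thresholding the ROF minimiser yields the partitions used in the proposed multi-class scheme.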

Paper Nr: 106
Title:

LBP Histogram Selection based on Sparse Representation for Color Texture Classification

Authors:

Vinh Truong Hoang, Alice Porebski, Nicolas Vandenbroucke and Denis Hamad

Abstract: In computer vision, LBP histogram selection techniques are mainly applied to reduce the dimensionality of the color texture feature space in order to improve classification performance. This paper proposes a new histogram selection score based on the Jeffrey distance and a sparse similarity matrix obtained by sparse representation. Experimental results on three benchmark texture databases show that the proposed method improves the performance of color texture classification in different color spaces.
Download
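The Jeffrey distance at the heart of the proposed score can be sketched in a few lines (a minimal sketch of one common definition, the symmetrized Kullback-Leibler form; the authors may use a variant, and the function name is hypothetical):

```python
import numpy as np

def jeffrey_distance(p, q, eps=1e-12):
    """Jeffrey divergence between two histograms (one common definition;
    the paper may use a variant). Histograms are normalised internally."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = (p + q) / 2.0  # midpoint distribution
    return float(np.sum(p * np.log((p + eps) / (m + eps))
                        + q * np.log((q + eps) / (m + eps))))

h1 = np.array([4.0, 3.0, 2.0, 1.0])
h2 = np.array([1.0, 2.0, 3.0, 4.0])
d_same = jeffrey_distance(h1, h1)  # ~0 for identical histograms
d_diff = jeffrey_distance(h1, h2)  # > 0 for different histograms
```

The measure is symmetric and well defined even when a bin is empty in one histogram, which is why it is a popular choice for comparing LBP histograms.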

Paper Nr: 108
Title:

Spatio-temporal Road Detection from Aerial Imagery using CNNs

Authors:

Belén Luque, Josep Ramon Morros and Javier Ruiz-Hidalgo

Abstract: The main goal of this paper is to detect roads in aerial imagery recorded by drones. To achieve this, we propose a modification of SegNet, a deep fully convolutional neural network for image segmentation. In order to train this neural network, we have put together a database containing videos of roads from the point of view of a small commercial drone. Additionally, we have developed an image annotation tool based on the watershed technique, in order to perform a semi-automatic labeling of the videos in this database. The experimental results using our modified version of SegNet show a substantial improvement in the performance of the neural network on aerial imagery, obtaining over 90% accuracy.
Download

Paper Nr: 129
Title:

Fast Free Floor Detection for Range Cameras

Authors:

Izaak Van Crombrugge, Luc Mertens and Rudi Penne

Abstract: A robust and fast free floor detection algorithm is indispensable in autonomous or assisted navigation, as it labels the drivable surface and marks obstacles. In this paper, we propose a simple and fast method to segment the free floor surface in range camera data by calculating the Euclidean distance between every measured point of the point cloud and the ground plane. This method is accurate for planar motion, i.e. as long as the camera stays at a fixed height and angle above the ground plane, which is most often the case for mobile platforms driving in an indoor environment. Given this condition, the ground plane stays invariant in camera coordinates. Obstacles as low as 40 mm are reliably detected. The detection works correctly even when 'multipath' errors are present, a typical phenomenon of distance overestimation in corners when using time-of-flight range cameras. To demonstrate the application of our segmentation method, we implemented it to create a simple but accurate navigation map.
Download
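The point-to-plane test described in the abstract can be sketched as follows (a minimal illustration, not the authors' implementation; the function name and plane representation are assumptions, and the 40 mm figure quoted above becomes the default threshold):

```python
import numpy as np

def label_floor(points, plane, threshold=0.04):
    """Label points within `threshold` (metres) of the plane as floor.

    points : (N, 3) array of camera-frame coordinates.
    plane  : (a, b, c, d) with a*x + b*y + c*z + d = 0.
    """
    normal = np.asarray(plane[:3], dtype=float)
    d = float(plane[3])
    # Perpendicular (Euclidean) distance of every point to the plane.
    dist = np.abs(points @ normal + d) / np.linalg.norm(normal)
    return dist < threshold  # True = drivable floor, False = obstacle candidate

pts = np.array([[0.0, 0.0, 0.0],    # on the plane
                [0.5, 0.02, 1.0],   # 20 mm above: still floor
                [0.3, 0.20, 2.0]])  # 200 mm above: obstacle
ground = (0.0, 1.0, 0.0, 0.0)       # the y = 0 plane
print(label_floor(pts, ground))     # [ True  True False]
```

Because the plane is fixed in camera coordinates under the planar-motion assumption, this test is a single vectorized pass over the point cloud, which is what makes the method fast.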

Paper Nr: 141
Title:

Probabilistic Background Modelling for Sports Video Segmentation

Authors:

Nikolas Ladas, Paris Kaimakis and Yiorgos Chrysanthou

Abstract: This paper introduces a segmentation algorithm based on probabilistic modelling of the background color using a Lambertian formulation of the scene's appearance. Central to our formulation is the computation of the degree of light visibility at the scene location depicted by each pixel. Because our approach specifically models the formation of shadows, segmentation results are of high accuracy. The quality of our results is further boosted by utilizing key observations about scene appearance. A qualitative and quantitative evaluation indicates that the proposed method performs better than commonly used segmentation algorithms, on sports as well as generic datasets.
Download

Paper Nr: 189
Title:

Segmenting High-quality Digital Images of Stomata using the Wavelet Spot Detection and the Watershed Transform

Authors:

Kauê T. N. Duarte, Marco A. G. de Carvalho and Paulo S. Martins

Abstract: Stomata are cells mostly found in plant leaves, stems and other organs. They are responsible for controlling the gas exchange process, i.e. the plant absorbs air and releases water vapor through transpiration. Therefore, stomata characteristics such as size and shape are important parameters to be taken into account. In this paper, we present a method (aiming at improved efficiency) to detect and count stomata based on the analysis of the multi-scale properties of the Wavelet transform, including a spot detection task working in the CIELab colorspace. We also segment stomata images using the Watershed Transform, assigning each initially detected spot as a marker. Experiments with real, high-quality images were conducted and divided into two phases. In the first, the results were compared to both manual enumeration and another recent method from the literature, on the same dataset. In the second, the segmentation results were compared to a gold standard provided by a specialist using the F-Measure. The experimental results demonstrate that the proposed method achieves better effectiveness for both stomata detection and segmentation.
Download

Paper Nr: 196
Title:

Automatic Object Shape Completion from 3D Point Clouds for Object Manipulation

Authors:

Rui Figueiredo, Plinio Moreno and Alexandre Bernardino

Abstract: 3D object representations should be able to model shape at different levels, considering both low-level and high-level shape descriptions. In robotics applications, it is difficult to compute shape descriptors on self-occluded point clouds while solving manipulation tasks. In this paper, we propose an object completion method, based on Principal Component Analysis (PCA), that under some assumptions works well for a large set of kitchenware objects. In addition, object manipulation in robotics must consider not only the shape but also the actions that an agent may perform. Shape-only descriptions are thus limited, because they do not consider where the object is located with respect to others or the type of constraints associated with manipulation actions. In this paper, we define a set of semantic parts (i.e. bounding boxes) that consider the grasping constraints of kitchenware objects, and show how to segment the object into those parts. The semantic parts provide a general representation across object categories, which allows the number of grasping hypotheses to be reduced. Our algorithm is able to find the semantic parts of kitchenware objects in an efficient way.
Download

Paper Nr: 199
Title:

Impact of Feature Extraction and Feature Selection Techniques on Extended Attribute Profile-based Hyperspectral Image Classification

Authors:

Rania Zaatour, Sonia Bouzidi and Ezzeddine Zagrouba

Abstract: Extended multiattribute profiles (EMAPs) are morphological profiles built on the features of a hyperspectral image (HSI) reduced using a Feature Extraction (FE) technique, Principal Component Analysis (PCA) in most cases. In this paper, we propose to replace PCA with other Dimensionality Reduction (DR) techniques. First, we replace it with Local Fisher Discriminant Analysis (LFDA), a supervised locality-preserving DR method. Second, we replace it with two Feature Selection (FS) techniques: ICAbs, an Independent Component Analysis (ICA) based band selection, and a modified version that we propose in this article, which we call mICAbs. In the experimental part of this paper, we compare the accuracies of classifying the sparse representations of the EMAPs built on features obtained using each of the aforementioned DR techniques. Our experiments reveal that LFDA gives the best classification accuracies overall. Moreover, our proposed modification gives comparable or higher accuracies.
Download

Paper Nr: 222
Title:

Estimating Coarse 3D Shape and Pose from the Bounding Contour

Authors:

Paria Mehrani and James H. Elder

Abstract: Single-view reconstruction of a smooth 3D object is an ill-posed problem. Surface cues such as shading and texture provide local constraints on shape, but these cues can be weak, making it a challenge to recover globally correct models. The bounding contour can play an important role in constraining this global integration. Here we focus in particular on information afforded by the overall elongation (aspect ratio) of the bounding contour. We hypothesize that the tendency of objects to be relatively compact and the generic view assumption together induce a statistical dependency between the observed elongation of the object boundary and the coarse 3D shape of the solid object, a dependency that could potentially be exploited by single-view methods. To test this hypothesis we assemble a new dataset of solid 3D shapes and study the joint statistics of ellipsoidal approximations to these shapes and elliptical approximations of their orthographically projected boundaries. Optimal estimators derived from these statistics confirm our hypothesis, and we show that these estimators can be used to generate coarse 3D shape-pose estimates from the bounding contour that are significantly and substantially superior to competing methods.
Download

Paper Nr: 255
Title:

Linear Photometric Stereo using Close Lighting Images based on Intensity Differential

Authors:

Zennichiro Sasaki, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a new linear photometric stereo method for images taken under close light sources. When images are taken under close light sources, we can obtain not only the surface normal but also the shape from the images. However, the relationship between the observed intensity and the object shape is not linear, so non-linear optimization is usually required to estimate the object shape. In order to estimate the object shape by purely linear estimation, we focus not only on the directly observed intensities but also on the differentials of those intensities. By using the set of observed intensities and their differentials, the relationship between object shape and intensities can be represented linearly. With this linear representation, linear estimation of the object shape is achieved even if the images are taken under close light sources. Experimental results show that our proposed method can reconstruct the object shape by linear estimation alone, efficiently and accurately.
Download
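For context, the classical distant-light photometric stereo that this work extends is already linear: each pixel's intensities under known lights give a small least-squares system for the scaled normal. A minimal sketch (an assumed Lambertian model with a hypothetical function name, not the authors' close-light formulation):

```python
import numpy as np

def photometric_stereo(intensities, lights):
    """Classic distant-light photometric stereo baseline:
    solve I = L @ (rho * n) per pixel by least squares.

    intensities : (K,) observations of one pixel under K lights.
    lights      : (K, 3) unit light directions.
    """
    g, *_ = np.linalg.lstsq(np.asarray(lights, dtype=float),
                            np.asarray(intensities, dtype=float), rcond=None)
    rho = np.linalg.norm(g)  # albedo is the magnitude of the solution
    n = g / rho              # unit surface normal is its direction
    return rho, n

# Synthetic Lambertian observations of one pixel under three lights.
L = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
L = L / np.linalg.norm(L, axis=1, keepdims=True)
n_true = np.array([0.0, 0.0, 1.0])
I = 0.8 * (L @ n_true)
rho, n = photometric_stereo(I, L)  # recovers rho = 0.8 and n = (0, 0, 1)
```

Under close lighting the light direction varies per surface point, which breaks this linearity; the paper's contribution is recovering a linear formulation by adding intensity differentials to the observations.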

Paper Nr: 260
Title:

3D Human Shapes Correspondence using the Principal Curvature Fields on a Local Surface Parametrization

Authors:

Ilhem Sboui, Majdi Jribi and Faouzi Ghorbel

Abstract: In this paper, we address the problem of correspondence between 3D non-rigid human shapes. We propose a local surface description around the 3D human body extremities. It is based on the mean values of the principal curvature fields over the intrinsic Darcyan parametrization constructed around these points. The similarity between the resulting descriptors is then measured in the sense of the L2 distance. Experiments on several human shapes from the TOSCA dataset confirm the accuracy of the proposed approach.
Download

Paper Nr: 262
Title:

ThermoFlowScan: Automatic Thermal Flow Analysis of Machines from Infrared Video

Authors:

Arindam Saha, Jitender Maurya, Sushovan Mukherjee and Ranjan Dasgupta

Abstract: Periodic inspection is a basic task in the preventive maintenance of machines. Abnormal heat generation or heat flow is one of the initial indicators of probable future failure. In this paper, we present an autonomous inspection system for monitoring and measuring the heat flow of machines in a non-invasive way. The proposed system uses infrared (IR) imaging to capture the thermal properties of machines that generate heat. Heat sources are segmented, and all segmented regions are tracked as long as heat is present in order to record changes in the heat pattern. Every hot segment is broken into multiple buckets, each aligned towards a specific direction, and heat propagation in each direction is analysed. The outcome of the presented analysis is the rate of thermal flow along each direction. The presented results show that the proposed method is capable of measuring heat flow accurately for different types of machines and that the analysis is usable for the predictive maintenance of machines.
Download

Paper Nr: 265
Title:

Robust Statistical Prior Knowledge for Active Contours - Prior Knowledge for Active Contours

Authors:

Mohamed Amine Mezghich, Ines Sakly, Slim Mhiri and Faouzi Ghorbel

Abstract: We propose in this paper a new method of active contours with a statistical shape prior. The presented approach is able to manage situations where the prior knowledge on shape is not known in advance and has to be constructed from the available training data. Given a set of several shape clusters, we use a set of complete, stable and invariant shape descriptors to represent shape. A Linear Discriminant Analysis (LDA), based on the Patrick-Fischer criterion, is then applied to form distinct clusters in a low-dimensional feature subspace. The feature distribution is estimated using an Expectation-Maximization (EM) algorithm. Given a currently detected front, a Bayesian classifier is used to assign it to the most probable shape cluster. The prior knowledge is then constructed based on the cluster's statistical properties. The shape prior is finally incorporated into level set-based active contours to obtain satisfactory segmentation results in the presence of partial occlusion, low contrast and noise.
Download

Paper Nr: 271
Title:

Saliency Sandbox - Bottom-up Saliency Framework

Authors:

David Geisler, Wolfgang Fuhl, Thiago Santini and Enkelejda Kasneci

Abstract: Saliency maps are used to predict the visual stimulus arising from a certain region in a scene. Most approaches to calculating the saliency of a scene can be divided into three consecutive steps: extraction of feature maps, calculation of activation maps, and the combination of activation maps. In the past two decades, several new saliency estimation approaches have emerged. However, most of these approaches are not freely available as source code, thus requiring researchers and application developers to reimplement them. Others are freely available but use different platforms for their implementation. As a result, employing, evaluating, and combining existing approaches is time consuming, costly, and even error-prone (e.g., when reimplementation is required). In this paper, we introduce the Saliency Sandbox, a framework for the fast implementation and prototyping of saliency maps, which employs a flexible architecture that allows designing new saliency maps by combining existing and new approaches such as Itti & Koch, GBVS, Boolean Maps and many more. The Saliency Sandbox comes with a large set of implemented feature extractors as well as some of the most popular activation approaches. The framework core is written in C++; nonetheless, interfaces for Matlab and Simulink allow for fast prototyping and integration of already existing implementations. Our source code is available at: www.ti.uni-tuebingen.de/perception.
Download

Paper Nr: 24
Title:

Hellinger Kernel-based Distance and Local Image Region Descriptors for Sky Region Detection from Fisheye Images

Authors:

Y. El Merabet, Y. Ruichek, S. Ghaffarian, Z. Samir, T. Boujiha, R. Touahni, R. Messoussi and A. Sbihi

Abstract: Characterizing the GNSS signal reception environment using a fisheye camera oriented towards the sky is one of the relevant approaches proposed to compensate for the lack of GNSS performance in constrained environments (dense urban areas). This solution consists, after classifying the acquired images into two regions (sky and non-sky), in identifying satellites as line-of-sight (LOS) or non-line-of-sight (NLOS) by repositioning the satellites in the classified images. This paper proposes a region-based image classification method using local image region descriptors and a Hellinger kernel-based distance. The objective is to improve on the results previously obtained by a state-of-the-art method. The proposed approach starts by simplifying the acquired image with a suitable combination of a colorimetric invariant and an exponential transform. After that, a segmentation step is performed in order to extract regions of interest from the simplified image using the Statistical Region Merging method. The next step consists of characterizing the obtained regions with local RGB color and a number of local color texture descriptors using image quantization. Finally, the characterized regions are classified into sky and non-sky regions using the supervised MSRC (Maximal Similarity based Region Classification) method with the Hellinger kernel-based distance. Extensive experiments have been performed to prove the effectiveness of the proposed approach.
Download

Paper Nr: 70
Title:

Bark Recognition to Improve Leaf-based Classification in Didactic Tree Species Identification

Authors:

Sarah Bertrand, Guillaume Cerutti and Laure Tougne

Abstract: In this paper, we propose a botanical approach for tree species classification through automatic bark analysis. The proposed method is based on specific descriptors, inspired by the characterization keys used by botanists, that capture visual bark texture criteria. The descriptors and the recognition system are designed to run on a mobile device without any network access. Our results show a recognition rate similar to the state of the art in tree species identification from bark images, while using a small feature vector. Furthermore, we demonstrate that taking bark identification into account significantly improves the performance of tree classification based on leaves alone.
Download

Paper Nr: 76
Title:

Bimodal Model-based 3D Vision and Defect Detection for Free-form Surface Inspection

Authors:

Christophe Simler, Dirk Berndt and Christian Teutsch

Abstract: This paper presents a 3D vision sensor and its algorithms, aimed at automatically detecting a large variety of defects in the context of industrial surface inspection of free-form metallic car parts. Photometric stereo (surface normal vectors) and stereo vision (dense 3D point cloud) are combined in order to detect small and large defects, respectively. Free-form surfaces introduce natural edges which cannot be discriminated from defects. In order to handle this problem, a background subtraction via measurement simulation (point cloud and normal vectors) from the CAD model of the object is suggested. This model-based pre-processing consists in subtracting real and simulated data in order to build two complementary “difference” images, one from photometric stereo and one from stereo vision, highlighting small and large defects respectively. These images are processed in parallel by two algorithms, optimized respectively to detect small and large defects, whose results are merged. These algorithms use geometrical information via image segmentation and geometrical filtering in a supervised region classification scheme.
Download

Paper Nr: 107
Title:

Pattern Width Description through Disk Cover - Application to Digital Font Recognition

Authors:

Nikita Lomov and Leonid Mestetskiy

Abstract: We consider the concept of "the width of a figure" for objects of complex shape, in order to use it as an integral morphological descriptor in image recognition tasks. In this article, we propose a new approach to describing this concept on the basis of covering the figure by disks of a certain size. The area of the disk cover, as a function of the covering disk size, is a shape descriptor. An original method for analytically calculating the area of the disk cover of polygonal shapes is presented. The method is universal, because complex digital binary images and geometric objects with nonlinear boundaries can always be approximated by polygons. The method is based on the medial representation of the polygonal figure as a skeleton and a radial function. Our approach ensures high accuracy and computational efficiency in calculating the area of the disk cover. The effectiveness of the proposed approach is demonstrated for applications in the computer font recognition problem.
Download

Paper Nr: 117
Title:

Image Segmentation using Local Probabilistic Atlases Coupled with Topological Information

Authors:

Gaetan Galisot, Thierry Brouard, Jean-Yves Ramel and Elodie Chaillou

Abstract: Atlas-based segmentation is a widely used method for Magnetic Resonance Imaging (MRI) segmentation, and a very efficient one for the automatic segmentation of brain structures. In this paper, we propose a more adaptive and interactive atlas-based method. The proposed model combines several local probabilistic atlases with a topological graph. Local atlases can provide more precise information about a structure's shape, and the spatial relationships between these atlases are learned and stored in a graph representation. In this way, local registrations need less computational time and image segmentation can be guided by the user in an incremental way. Pixel classification is achieved with the help of a hidden Markov random field that is able to integrate the a priori information with the intensities coming from different modalities. The proposed method was tested on the OASIS dataset, used in the MICCAI'12 challenge for multi-atlas labeling.
Download

Paper Nr: 150
Title:

An Unsupervised Bayesian Approach to Lung Nodule Segmentation in Computed Tomography

Authors:

Matthew Sprague and Suman K. Mitra

Abstract: The volumetric analysis of lung nodules provides the basis both for detecting the presence of lung cancer and for measuring a nodule's response to treatment over time. A new unsupervised method of lung nodule segmentation for computed tomography (CT) is presented in this paper. The method employs a Bayesian sampling-resampling approach to approximate prior probabilities and provide a 3-D intensity-based segmentation. A static morphological opening is used to remove non-nodule structures from the segmentation. The method is tested on 438 CT scans of lung nodules from the publicly available LIDC-IDRI dataset. It attains a segmentation performance of 58.7% overlap with radiologist-provided segmentation truth, comparable to the high performance of previously published methods. The approach is promising in its ability to adaptively provide an initial intensity-based segmentation for further processing without the burden of supervised learning.

Paper Nr: 181
Title:

Robust System for Partially Occluded People Detection in RGB Images

Authors:

Marcos Baptista-Ríos, Marta Marrón-Romera, Cristina Losada-Gutiérrez, José Angel Cruz-Lozano and Antonio del Abril

Abstract: This work presents a robust system for people detection in RGB images. The proposal increases the robustness of previous approaches against partial occlusions, and it is based on a bank of individual detectors whose results are combined using a multimodal association algorithm. Each individual detector is trained for a different body part (full body, half top, half bottom, half left and half right). It consists of two elements: a feature extractor that obtains a Histogram of Oriented Gradients (HOG) descriptor, and a Support Vector Machine (SVM) for classification. Several experimental tests have been carried out in order to validate the proposal, using the INRIA and CAVIAR datasets, which have been widely used by the scientific community. The obtained results show that the association of all the body part detections achieves better accuracy than any of the parts individually. Regarding the body parts, the best results were obtained for the full body and the half top body.
Download

Paper Nr: 194
Title:

Braid Hairstyle Recognition based on CNNs

Authors:

Chao Sun and Won-Sook Lee

Abstract: In this paper, we present a novel braid hairstyle recognition system based on Convolutional Neural Networks (CNNs). We first build a hairstyle patch dataset composed of braid hairstyle patches and non-braid hairstyle patches (straight, curly, and kinky hairstyle patches). We then train our hairstyle recognition system via transfer learning on a pre-trained CNN model in order to extract the features of different hairstyles. Our hairstyle recognition CNN model achieves an accuracy of 92.7% on the image patch dataset. The CNN model is then used to perform braid hairstyle detection and recognition in full-hair images. The experimental results show that the patch-level trained CNN model can successfully detect and recognize braid hairstyles at the image level.
Download

Paper Nr: 195
Title:

Robust People Detection and Tracking from an Overhead Time-of-Flight Camera

Authors:

Alvaro Fernandez-Rincon, David Fuentes-Jimenez, Cristina Losada-Gutierrez, Marta Marron-Romera, Carlos A. Luna, Javier Macias-Guarasa and Manuel Mazo

Abstract: In this paper we describe a system for the robust detection of people in a scene using an overhead Time-of-Flight (ToF) camera. The proposal addresses the problem of robust people detection by three means: a carefully designed algorithm to select regions of interest as candidates to belong to people; the generation of a robust feature vector that efficiently models the human upper body; and a people classification stage, to allow robust discrimination between people and other objects in the scene. The proposal also includes a particle filter tracker to allow people identification and tracking. Two classifiers are evaluated, based on Principal Component Analysis (PCA) and Support Vector Machines (SVM). The evaluation is carried out on a subset of a carefully designed dataset with a broad variety of conditions, providing results comparing the PCA and SVM approaches as well as the performance impact of the tracker, with satisfactory results.
Download

Paper Nr: 198
Title:

Evaluation of Deep Convolutional Neural Network-Based Representations for Cross Dataset Person Re-Identification

Authors:

Alper Ulu and Hazım Kemal Ekenel

Abstract: Video surveillance systems are of great importance for ensuring public safety. Today, these kinds of systems not only capture and distribute video but also support various smart applications, person re-identification being one of the most important. In this work, we have exploited deep convolutional neural network-based representations for the cross-dataset person re-identification problem. We selected well-known deep convolutional neural network models, namely AlexNet, VGG-16, and GoogLeNet, and fine-tuned them on the largest publicly available person re-identification datasets. We employed the cosine similarity metric to calculate the similarity between extracted features. The CUHK03 and Market-1501 datasets were used as the training sets, and the proposed method was tested on the VIPeR dataset. Superior results have been obtained with the proposed method compared to the state-of-the-art methods in the field.
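The cosine similarity matching step described above is straightforward; a minimal sketch (the function name and toy features are hypothetical, and in practice the vectors would come from the fine-tuned CNNs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy gallery of two identity features and one query feature.
gallery = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
query = np.array([0.9, 0.1, 0.0])
scores = [cosine_similarity(query, g) for g in gallery]
best = int(np.argmax(scores))  # index of the most similar gallery identity
```

Re-identification then amounts to ranking gallery identities by this score for each query image.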

Paper Nr: 209
Title:

Segmentation Technique based on Information Redundancy Minimization

Authors:

Dmitry Murashov

Abstract: In this paper, the problem of image segmentation quality is considered. It is viewed as selecting the best segmentation from a set of images generated by a segmentation algorithm at different parameter values. We use the SLIC superpixel algorithm, supplemented with a simple post-processing procedure, to generate a set of partitioned images with different numbers of segments, and propose a technique for selecting the best segmented image. We propose to use an information redundancy measure as the criterion for optimizing segmentation quality, and show that the proposed method for constructing the redundancy measure provides it with extremal properties. A computational experiment was conducted using images from the Berkeley Segmentation Dataset. The experiment confirmed that the segmented image corresponding to a minimum of the redundancy measure has a suitable dissimilarity when compared with the original image. The segmented image selected using the proposed criterion gives the highest similarity with the ground-truth segmentations available in the dataset.
Download

Paper Nr: 215
Title:

RGB-D Sensor based Obstacle Detection and Feedback Strategy for the Visual Impaired

Authors:

Hong Liu, Lizhen Tang, Jun Wang, Xiangdong Wang and Yueliang Qian

Abstract: Obstacle detection and feedback methods are important for the safety of the visually impaired. Most existing obstacle detection methods based on visual sensors consider only the current frame, which makes it difficult to deal with false negatives and false positives caused by noise. This paper proposes a novel obstacle detection method and a feedback strategy based on the depth data of an RGB-D sensor for the visually impaired. First, a fast and robust ground detection method is introduced, based on multiscale-voxel plane segmentation and geometric constraints. Second, the regions other than the ground area are segmented by region growing. Then, an area-division-based obstacle detection algorithm is proposed, which combines the analysis of context information and historical data. Finally, a multi-level voice feedback strategy is proposed to help the user understand the scene ahead, and a wearable prototype system is designed. The experimental results show that the proposed methods can detect obstacles quickly and robustly and feed back information about the scene ahead in a simple and friendly way.

Paper Nr: 234
Title:

Can a Driver Assistance System Determine if a Driver is Perceiving a Pedestrian? - Consideration of the Driver’s Visual Adaptation to Illumination Change

Authors:

Yuki Imaeda, Takatsugu Hirayama, Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide and Hiroshi Murase

Abstract: We propose a method for estimating pedestrian detectability that considers the driver's visual adaptation to illumination changes. Since it is important for driver assistance systems to determine whether a driver is perceiving a pedestrian or not, estimation of the pedestrian detectability by the driver is required. However, previous studies do not consider drastic illumination changes, which degrade the driver's detection performance. We assume that the driver's visual characteristics change in proportion to the adaptation period after an illumination change. We therefore constructed estimators corresponding to different adaptation periods and estimated the pedestrian detectability by switching between them according to the period. To evaluate the proposed method, we built an experimental environment to present a subject with illumination changes and conducted an experiment to measure and estimate the pedestrian detectability as a function of the adaptation period. The results showed that the proposed method can estimate the pedestrian detectability accurately even after the illumination changes drastically.
Download

Paper Nr: 238
Title:

Real-time Vision-based UAV Navigation in Fruit Orchards

Authors:

Dries Hulens, Maarten Vandersteegen and Toon Goedemé

Abstract: Unmanned Aerial Vehicles (UAVs) enable numerous agricultural applications such as terrain mapping, monitoring crop growth, detecting diseased areas, and so on. For these applications a UAV flies above the terrain and has a global view of the plants. When the individual fruits or plants have to be examined, an oblique view is better, e.g. via an inspection camera mounted on expensive all-terrain wheeled robots that drive through the orchard. In this paper, however, we aim to autonomously navigate through the orchard with a low-cost UAV and cheap sensors (e.g. a webcam). Evidently, this is challenging, since every orchard, and even every corridor, looks different. For this we developed a vision-based system that detects the center and end of the corridor to autonomously navigate the UAV towards the end of the orchard without colliding with the trees. Furthermore, extensive experiments were performed to prove that our algorithm is able to navigate through the orchard with high accuracy and in real time, even on embedded hardware. A connection with a ground station is thus unnecessary, which makes the UAV fully autonomous.
Download

Paper Nr: 269
Title:

Transforms of Hough Type in Abstract Feature Space: Generalized Precedents

Authors:

Elena Nelyubina, Vladimir Ryazanov and Alexander Vinogradov

Abstract: In this paper, the role of intrinsic and introduced data structures in constructing efficient data analysis algorithms is analyzed. We investigate the concept of the generalized precedent and, based on it, develop Hough-type transforms for searching for dependencies in data, dimensionality reduction, and improvement of the decision rule.
Download

Area 3 - Image and Video Understanding

Full Papers
Paper Nr: 9
Title:

Detection and Orientation Estimation for Cyclists by Max Pooled Features

Authors:

Wei Tian and Martin Lauer

Abstract: In this work we propose a new kind of HOG feature, built by a max pooling operation over spatial bins and orientation channels at multiple levels, which can efficiently deal with the deformation of objects in images. We demonstrate its invariance against both translation and rotation at the feature level. Experimental results show a great precision gain in detection and orientation estimation for cyclists when applying this new feature in classical cascaded detection frameworks. In combination with a geometric constraint, we also show that our system can achieve real-time performance for simultaneous cyclist detection and orientation estimation.
Download
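The spatial max pooling described in this abstract can be illustrated with a small generic sketch (not the authors' code; shapes and values are hypothetical). A response that shifts within a pooling bin leaves the pooled map unchanged, which is the translation invariance the paper exploits:

```python
import numpy as np

def max_pool(feature_map, pool=2):
    """Max pooling over non-overlapping spatial bins of an H x W x C
    feature map (e.g. HOG orientation channels)."""
    h, w, c = feature_map.shape
    h2, w2 = h // pool, w // pool
    trimmed = feature_map[:h2 * pool, :w2 * pool]  # drop ragged border
    return trimmed.reshape(h2, pool, w2, pool, c).max(axis=(1, 3))

# a response at (0, 0) and the same response shifted to (1, 1)
# land in the same 2x2 bin, so the pooled maps are identical
a = np.zeros((4, 4, 1)); a[0, 0, 0] = 5.0
b = np.zeros((4, 4, 1)); b[1, 1, 0] = 5.0
pooled_a, pooled_b = max_pool(a), max_pool(b)
```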

Paper Nr: 36
Title:

Fully Convolutional Crowd Counting on Highly Congested Scenes

Authors:

Mark Marsden, Kevin McGuinness, Suzanne Little and Noel E. O’Connor

Abstract: In this paper we advance the state-of-the-art for crowd counting in high density scenes by further exploring the idea of a fully convolutional crowd counting model introduced by (Zhang et al., 2016). Producing an accurate and robust crowd count estimator using computer vision techniques has attracted significant research interest in recent years. Applications for crowd counting systems exist in many diverse areas including city planning, retail, and of course general public safety. Developing a highly generalised counting model that can be deployed in any surveillance scenario with any camera perspective is the key objective for research in this area. Techniques developed in the past have generally performed poorly in highly congested scenes with several thousand people in frame (Rodriguez et al., 2011). Our approach, based largely on the work of (Zhang et al., 2016), consists of the following steps: (1) a training set augmentation scheme that minimises redundancy among training samples to improve model generalisation and overall counting performance; (2) a deep, single column, fully convolutional network (FCN); (3) a multi-scale averaging step during inference. The developed technique can analyse images of any resolution or aspect ratio and achieves state-of-the-art counting performance on the Shanghaitech Part_B and UCF_CC_50 datasets as well as competitive performance on Shanghaitech Part_A.
Download

Paper Nr: 44
Title:

Human Activity Recognition using Deep Neural Network with Contextual Information

Authors:

Li Wei and Shishir K. Shah

Abstract: Human activity recognition is an important yet challenging research topic in the computer vision community. In this paper, we propose context features along with a deep model to recognize individual subject activity in videos of real-world scenes. Besides the motion features of the subject, we also utilize context information from multiple sources to improve recognition performance. We introduce scene context features that describe the environment of the subject at global and local levels. We design a deep neural network structure to obtain a high-level representation of human activity combining both motion features and context features. We demonstrate that the proposed context features and deep model improve activity recognition performance through comparison with baseline approaches. We also show that our approach outperforms state-of-the-art methods on the 5-activity and 6-activity versions of the Collective Activities Dataset.
Download

Paper Nr: 45
Title:

Object Detection Oriented Feature Pooling for Video Semantic Indexing

Authors:

Kazuya Ueki and Tetsunori Kobayashi

Abstract: We propose a new feature extraction method for video semantic indexing. Conventional methods extract features densely and uniformly across an entire image, whereas the proposed method exploits an object detector to extract features from image windows with high objectness. This feature extraction method focuses on "objects." Therefore, we can eliminate unnecessary background information and keep useful information such as the position, the size, and the aspect ratio of an object. Since these object-detection-oriented features are complementary to features from entire images, the performance of video semantic indexing can be further improved. Experimental comparisons using the large-scale video dataset of the TRECVID benchmark demonstrated that the proposed method substantially improved the performance of video semantic indexing.
Download

Paper Nr: 60
Title:

Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading

Authors:

Adriana Fernandez-Lopez and Federico M. Sukno

Abstract: Speech is the most common communication method between humans and involves the perception of both auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, but it has been demonstrated that video can provide information that is complementary to the audio. Thus, the study of automatic lip-reading is important and is still an open problem. One of the key challenges is the definition of the visual elementary units (the visemes) and their vocabulary. Many researchers have analyzed the importance of the phoneme-to-viseme mapping and have proposed viseme vocabularies with lengths between 11 and 15 visemes. These viseme vocabularies have usually been manually defined based on their linguistic properties and in some cases using decision trees or clustering techniques. In this work, we focus on the automatic construction of an optimal viseme vocabulary based on the association of phonemes with similar appearance. To this end, we construct an automatic system that uses local appearance descriptors to extract the main characteristics of the mouth region and HMMs to model the statistical relations of both viseme and phoneme sequences. To compare the performance of the system, different descriptors (PCA, DCT and SIFT) are analyzed. We test our system on a Spanish corpus of continuous speech. Our results indicate that we are able to recognize approximately 58% of the visemes, 47% of the phonemes and 23% of the words in a continuous speech scenario, and that the optimal viseme vocabulary for Spanish is composed of 20 visemes.
Download

Paper Nr: 86
Title:

Spontaneous Facial Expression Recognition using Sparse Representation

Authors:

Dawood Al Chanti and Alice Caplier

Abstract: Facial expression is the most natural means for human beings to communicate their emotions. Most facial expression analysis studies consider the case of acted expressions. Spontaneous facial expression recognition is significantly more challenging since each person has a different way of reacting to a given emotion. We consider the problem of recognizing spontaneous facial expressions by learning discriminative dictionaries for sparse representation. Facial images are represented as a sparse linear combination of prototype atoms via the Orthogonal Matching Pursuit algorithm. Sparse codes are then used to train an SVM classifier dedicated to the recognition task. The dictionary that sparsifies the facial images (feature points with the same class labels should have similar sparse codes) is crucial for robust classification. Learning sparsifying dictionaries heavily relies on the initialization process of the dictionary. To improve the performance of dictionaries, a random face feature descriptor based on the Random Projection concept is developed. The effectiveness of the proposed method is evaluated through several experiments on the DynEmo database of spontaneous facial expressions. It is also evaluated on the well-known JAFFE database of acted facial expressions for the purpose of comparison with state-of-the-art methods.
Download
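The sparse coding step this abstract relies on, Orthogonal Matching Pursuit, can be sketched generically (this is not the authors' implementation; the dictionary and signal below are synthetic). OMP greedily picks the atom most correlated with the residual and re-fits the coefficients by least squares:

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy Orthogonal Matching Pursuit: approximate signal x as a
    sparse linear combination of the columns (atoms) of dictionary D."""
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = np.array([])
    for _ in range(n_nonzero):
        # select the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # re-fit coefficients on the selected atoms by least squares
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef

# demo: recover a 2-sparse code over an orthonormal dictionary
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((20, 20)))
x = 3.0 * D[:, 2] - 1.5 * D[:, 7]
code = omp(D, x, n_nonzero=2)
```

In the paper's pipeline the resulting sparse codes would then serve as the input features of an SVM classifier.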

Paper Nr: 111
Title:

Joint Semantic and Motion Segmentation for Dynamic Scenes using Deep Convolutional Networks

Authors:

Nazrul Haque, Dinesh Reddy and K. Madhava Krishna

Abstract: Dynamic scene understanding is a challenging problem and motion segmentation plays a crucial role in solving it. Incorporating semantics and motion enhances the overall perception of the dynamic scene. For applications of outdoor robotic navigation, joint learning methods have not been extensively used for extracting spatiotemporal features or adding different priors into the formulation. The task becomes even more challenging without stereo information being incorporated. This paper proposes an approach to fuse semantic features and motion cues using CNNs to address the problem of monocular semantic motion segmentation. We deduce semantic and motion labels by integrating optical flow as a constraint with semantic features into a dilated convolution network. The pipeline consists of three main stages, i.e., feature extraction, feature amplification, and multi-scale context aggregation, to fuse the semantic and flow features. Our joint formulation shows significant improvements in monocular motion segmentation over state-of-the-art methods on the challenging KITTI tracking dataset.
Download

Paper Nr: 126
Title:

Pedestrian Counting using Deep Models Trained on Synthetically Generated Images

Authors:

Sanjukta Ghosh, Peter Amon, Andreas Hutter and André Kaup

Abstract: Counting pedestrians in surveillance applications is a common scenario. However, it is often challenging to obtain sufficient annotated training data, especially so for creating models using deep learning, which requires a large amount of training data. To address this problem, this paper explores the possibility of training a deep convolutional neural network (CNN) entirely from synthetically generated images for the purpose of counting pedestrians. Nuances of transfer learning are exploited to train models from a base model trained for image classification. A direct approach and a hierarchical approach are used during training to enhance the capability of the model for counting higher numbers of pedestrians. The trained models are then tested on natural images of completely different scenes captured by different acquisition systems not experienced by the model during training. Furthermore, the effectiveness of the cross-entropy cost function and the squared-error cost function is evaluated and analyzed for the scenario where a model is trained entirely using synthetic images. The performance of the trained model for the test images from the target site can be improved by fine-tuning using an image of the background of the target site.
Download

Paper Nr: 162
Title:

Generative vs. Discriminative Deep Belief Network for 3D Object Categorization

Authors:

Nabila Zrira, Mohamed Hannat, El Houssine Bouyakhf and Haris Ahmad khan

Abstract: Object categorization has been an important task of computer vision research in recent years. In this paper, we propose a new approach for representing and learning 3D object categories. First, we extract the Viewpoint Feature Histogram (VFH) descriptor from point clouds and then learn the resulting features using deep learning architectures. We evaluate the performance of both generative and discriminative deep belief network architectures (GDBN/DDBN) for the object categorization task. GDBN trains a sequence of Restricted Boltzmann Machines (RBMs), while DDBN uses a new deep architecture based on RBMs and the joint density model. Our results show the power of the discriminative model for object categorization, outperforming state-of-the-art approaches when tested on the Washington RGBD dataset.
Download

Paper Nr: 182
Title:

Towards View-point Invariant Person Re-identification via Fusion of Anthropometric and Gait Features from Kinect Measurements

Authors:

Athira Nambiar, Alexandre Bernardino, Jacinto C. Nascimento and Ana Fred

Abstract: In this work, we present view-point invariant person re-identification (Re-ID) by multi-modal feature fusion of 3D soft biometric cues. We exploit the MS Kinect sensor v.2 to collect the skeleton points from the walking subjects and leverage both the anthropometric features and the gait features associated with the person. The key proposals of the paper are twofold: First, we conduct an extensive study of the influence of various features, both individually and jointly (by a fusion technique), on person Re-ID. Second, we present an actual demonstration of the view-point invariant Re-ID paradigm by analysing subject data collected in different walking directions. Focusing on the latter, we further analyse three different categories, which we term pseudo, quasi and full view-point invariant scenarios, and evaluate our system performance under these various scenarios. Initial pilot studies were conducted on a new set of 20 people, collected at the host laboratory. We illustrate, for the first time, gait-based person re-identification with truly view-point invariant behaviour, i.e. the walking direction of the probe sample is not represented in the gallery samples.
Download

Paper Nr: 237
Title:

Zero-shot Object Prediction using Semantic Scene Knowledge

Authors:

Rene Grzeszick and Gernot A. Fink

Abstract: This work focuses on the semantic relations between scenes and objects for visual object recognition. Semantic knowledge can be a powerful source of information especially in scenarios with few or no annotated training samples. These scenarios are referred to as zero-shot or few-shot recognition and often build on visual attributes. Here, instead of relying on various visual attributes, a more direct way is pursued: after recognizing the scene that is depicted in an image, semantic relations between scenes and objects are used for predicting the presence of objects in an unsupervised manner. Most importantly, relations between scenes and objects can easily be obtained from external sources such as large scale text corpora from the web and, therefore, do not require tremendous manual labeling efforts. It will be shown that in cluttered scenes, where visual recognition is difficult, scene knowledge is an important cue for predicting objects.
Download

Paper Nr: 273
Title:

Trained 3D Models for CNN based Object Recognition

Authors:

Kripasindhu Sarkar, Kiran Varanasi and Didier Stricker

Abstract: We present a method for 3D object recognition in 2D images which uses 3D models as the only source of training data. Our method is particularly useful when a 3D CAD object or a scan needs to be identified in a catalogue from a given query image, where we significantly cut down the overhead of manual labeling. We take virtual snapshots of the available 3D models using a computer graphics pipeline and fine-tune existing pretrained CNN models for our object categories. Experiments show that our method performs better than the existing local-feature-based recognition system in terms of recognition recall.
Download

Short Papers
Paper Nr: 29
Title:

People Detection in Fish-eye Top-views

Authors:

Meltem Demirkus, Ling Wang, Michael Eschey, Herbert Kaestle and Fabio Galasso

Abstract: Is the detection of people in top views any easier than from the much researched canonical fronto-parallel views (e.g. the Caltech and INRIA pedestrian datasets)? We show that in both cases the variability of people's appearance and false positives in the background limit performance. Additionally, we demonstrate that the use of fish-eye lenses further complicates top-view people detection, since the person viewpoint ranges from nearly frontal, at the periphery of the image, to perfect top views, in the image center, where only the head and shoulder top profiles are visible. We contribute a new top-view fish-eye benchmark, experiment with a state-of-the-art person detector (ACF), and evaluate approaches which balance reduced appearance variability (a grid of classifiers) against the available amount of training data. Our results indicate the importance of data abundance over model complexity and additionally stress the importance of an exact geometric understanding of the problem, which we also contribute here.
Download

Paper Nr: 53
Title:

Wheelchair-user Detection Combined with Parts-based Tracking

Authors:

Ukyo Tanikawa, Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase and Ryo Kawai

Abstract: In recent years, there has been an increasing demand for automatic wheelchair-user detection from surveillance video in order to support wheelchair users. However, it is difficult to detect them due to occlusions by surrounding pedestrians in a crowded scene. In this paper, we propose a detection method for wheelchair users that is robust to such occlusions. Concretely, in case the detector cannot detect a wheelchair user, the proposed method estimates his/her location by parts-based tracking based on the relationship between parts over time. This makes it possible to detect a wheelchair user even when he/she is heavily occluded. As a result of an experiment, the detection of wheelchair users with the proposed method achieved the highest accuracy in crowded scenes compared with alternative methods.
Download

Paper Nr: 56
Title:

Near Real-time Object Detection in RGBD Data

Authors:

Ronny Hänsch, Stefan Kaiser and Olaf Hellwich

Abstract: Most methods for object detection with RGBD cameras set hard constraints on their operational area. They only work with specific objects, in specific environments, or rely on time-consuming computations. In the context of home robotics, such hard constraints cannot be made. Specifically, an autonomous home robot should be equipped with an object detection pipeline that runs in near real time and produces reliable results without restricting object type and environment. For this purpose, a baseline framework that works on RGB data only is extended by suitable depth features that are selected on the basis of a comparative evaluation. The additional depth data is further exploited to reduce the computational cost of the detection algorithm. A final evaluation of the enhanced framework shows significant improvements compared to its original version and state-of-the-art methods in terms of both detection performance and real-time capability.
Download

Paper Nr: 63
Title:

From Depth Data to Head Pose Estimation: A Siamese Approach

Authors:

Marco Venturelli, Guido Borghi, Roberto Vezzani and Rita Cucchiara

Abstract: The correct estimation of the head pose is a problem of great importance for many applications. For instance, it is an enabling technology in automotive for driver attention monitoring. In this paper, we tackle the pose estimation problem through a deep learning network working in a regression manner. Traditional methods usually rely on visual facial features, such as facial landmarks or nose tip position. In contrast, we exploit a Convolutional Neural Network (CNN) to perform head pose estimation directly from depth data. We exploit a Siamese architecture and propose a novel loss function to improve the learning of the regression network layer. The system has been tested on two public datasets, the Biwi Kinect Head Pose and ICT-3DHP databases. The reported results demonstrate the improvement in accuracy with respect to current state-of-the-art approaches and the real-time capabilities of the overall framework.
Download

Paper Nr: 83
Title:

Fast Fingerprint Classification with Deep Neural Networks

Authors:

Daniel Michelsanti, Andreea-Daniela Ene, Yanis Guichi, Rares Stef, Kamal Nasrollahi and Thomas B. Moeslund

Abstract: Reducing the number of comparisons in automated fingerprint identification systems is essential when dealing with a large database. Fingerprint classification makes it possible to achieve this goal by dividing fingerprints into several categories, but it still presents some challenges due to the large intra-class variations and the small inter-class variations. The vast majority of previous methods use global characteristics, in particular the orientation image, as features for a classifier. This makes the feature extraction stage highly dependent on preprocessing techniques and usually computationally expensive. In this work we evaluate the performance of two pre-trained convolutional neural networks fine-tuned on the NIST SD4 benchmark database. The obtained results show that this approach is comparable with other results in the literature, with the advantage of a fast feature extraction stage.
Download

Paper Nr: 84
Title:

Unsupervised Discovery of Normal and Abnormal Activity Patterns in Indoor and Outdoor Environments

Authors:

Dario Dotti, Mirela Popa and Stylianos Asteriadis

Abstract: In this paper we propose an adaptive system for monitoring indoor and outdoor environments using movement patterns. Our system is able to discover normal and abnormal activity patterns in the absence of any prior knowledge. We employ several feature descriptors, extracting both spatial and temporal cues from trajectories over a spatial grid. Moreover, we improve the initial feature vectors by applying sparse autoencoders, which help obtain optimized and compact representations and improve accuracy. Next, activity models are learnt in an unsupervised manner using clustering techniques. The experiments are performed on both indoor and outdoor datasets. The obtained results prove the suitability of the proposed system, achieving an accuracy of over 98% in classifying normal vs. abnormal activity patterns for both scenarios. Furthermore, a semantic interpretation of the most important regions of the scene is obtained without the need for human labels, highlighting the flexibility of our method.
Download

Paper Nr: 90
Title:

Increasing the Stability of CNNs using a Denoising Layer Regularized by Local Lipschitz Constant in Road Understanding Problems

Authors:

Hamed H. Aghdam, Elnaz J. Heravi and Domenec Puig

Abstract: One of the challenges in problems related to road understanding is dealing with noisy images. In particular, recent studies have revealed that ConvNets are sensitive to small perturbations in the input. One solution for dealing with this problem is to generate many noisy images while training a ConvNet. However, this approach is very costly and offers no guarantee. In this paper, we propose an objective function regularized by the local Lipschitz constant and train a ReLU layer for restoring noisy images. Our experiments on the GTSRB and the Caltech-Pedestrian datasets show that this lightweight approach not only increases the accuracy of classification ConvNets on the clean datasets but also increases the stability of the ConvNets against noise. Comparing our method with similar approaches shows that it produces more stable ConvNets while being computationally similar to or more efficient than these methods.
Download

Paper Nr: 91
Title:

Explaining Adversarial Examples by Local Properties of Convolutional Neural Networks

Authors:

Hamed H. Aghdam, Elnaz J. Heravi and Domenec Puig

Abstract: The vulnerability of ConvNets to adversarial examples has mainly been studied by devising solutions for generating adversarial examples. Early studies suggested that the sensitivity of ConvNets to adversarial examples is due to their non-linearity. More recent studies explained that the instability of ConvNets to these examples is instead due to their linear nature. In this work, we analyze some local properties of ConvNets that are directly related to their unreliability on adversarial examples. We show that ConvNets are not locally isotropic and symmetric. Also, we show that the Mantel score of distance matrices in the input and output of a ConvNet is very low, indicating that the topology of points located at a very close distance to a sample might be significantly changed by ConvNets. We also explain that the topology changes in a ConvNet are non-linear because an affine transformation is applied in each layer. Furthermore, we explain that even though the global Lipschitz constant of a ConvNet might be greater than 1, it is locally less than 1 around most adversarial examples.
Download

Paper Nr: 103
Title:

Consistent Optical Flow Maps for Full and Micro Facial Expression Recognition

Authors:

Benjamin Allaert, Ioan Marius Bilasco and Chabane Djeraba

Abstract: A wide variety of face models have been used in the recognition of full or micro facial expressions in image sequences. However, existing methods only address one family of expressions at a time, as micro-expressions are quite different from full expressions in terms of facial movement amplitude and/or texture changes. In this paper we address the detection of micro- and full-expressions with a common facial model characterizing facial movements by means of consistent optical flow estimation. Optical flow extracted from the face is generally noisy, and without specific processing it can hardly cope with expression recognition requirements, especially for micro-expressions. Direction and magnitude statistical profiles are jointly analyzed in order to filter out noise and obtain consistent optical flows, which are fed into a face motion model framework. Experiments on the CK+ and CASME2 facial expression databases for full and micro expression recognition show the benefits brought by the proposed approach in the field of facial expression recognition.
Download

Paper Nr: 110
Title:

Deep Learning with Sparse Prior - Application to Text Detection in the Wild

Authors:

Adleni Mallek, Fadoua Drira, Rim Walha, Adel M. Alimi and Frank LeBourgeois

Abstract: Text detection in the wild remains a very challenging task in computer vision. According to the state of the art, no text detection system that is robust under all circumstances exists to date. For instance, the complexity and the diversity of degradations in natural scenes make traditional text detection methods very limited and inefficient. Recent studies reveal the performance of texture-based approaches, especially those including deep models. Indeed, the main strength of these models is the availability of a learning framework coupling feature extraction and classification. Therefore, this study focuses on developing a new texture-based approach for text detection that takes advantage of deep learning models. In particular, we investigate a sparse prior in the structure of PCANet, a convolutional neural network known for its simplicity and speed and based on a cascaded principal component analysis (PCA). The added value of the sparse coding is the representation of each feature map via coupled dictionaries to migrate from one resolution level to an adequate lower resolution. The specificity of the dictionary is the use of oriented patterns well suited for textual pattern description. The experimental study performed on the standard ICDAR 2003 benchmark proves that the proposed method achieves very promising results.
Download

Paper Nr: 112
Title:

Evaluation of Deep Image Descriptors for Texture Retrieval

Authors:

Bojana Gajic, Eduard Vazquez and Ramon Baldrich

Abstract: The increasing complexity learnt in the layers of a Convolutional Neural Network has proven to be of great help for the task of classification. The topic has received great attention in recently published literature. Nonetheless, just a handful of works study low-level representations, commonly associated with lower layers. In this paper, we explore recent findings which conclude, counterintuitively, that the last layer of the VGG convolutional network is the best at describing a low-level property such as texture. To shed some light on this issue, we propose a psychophysical experiment to evaluate the adequacy of different layers of the VGG network for texture retrieval. The results obtained suggest that, whereas the last convolutional layer is a good choice for the specific task of classification, it might not be the best choice as a texture descriptor, showing very poor performance on texture retrieval. Intermediate layers show the best performance, combining basic filters, as in the primary visual cortex, with a degree of higher-level information to describe more complex textures.
Download

Paper Nr: 115
Title:

Fast Eye Tracking and Feature Measurement using a Multi-stage Particle Filter

Authors:

Radu Danescu, Adrian Sergiu Darabant and Diana Borza

Abstract: Eye trackers – systems that measure the activity of the eyes – are nowadays used in creative ways in a variety of domains: medicine, psychology, the automotive industry, marketing, etc. This paper presents a real-time method for tracking and measuring eye features (iris position, eye contour, blinks) in video frames based on particle filters. We propose a coarse-to-fine approach to solve the eye tracking problem: a first particle filter is used to roughly estimate the position of the iris centers. Next, this estimate is analysed to decide the state of the eyes: open or half-open/closed. If the eyes are open, two independent particle filters are used to determine the contour of each eye. Our algorithm takes less than 11 milliseconds per frame on a regular PC.
Download
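The basic predict/weight/resample cycle underlying such particle filters can be sketched generically (this is not the authors' code; the motion model, likelihood, and target position below are hypothetical stand-ins for the iris tracking problem):

```python
import numpy as np

rng = np.random.default_rng(42)

def particle_filter_step(particles, weights, observation,
                         motion_std=2.0, obs_std=3.0):
    """One bootstrap particle filter cycle for tracking a 2D point
    (e.g. an iris centre)."""
    # predict: random-walk motion model
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # weight: Gaussian likelihood of the observation given each particle
    d2 = np.sum((particles - observation) ** 2, axis=1)
    weights = weights * np.exp(-d2 / (2 * obs_std ** 2))
    weights /= weights.sum()
    # resample (multinomial) to concentrate particles on likely states
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# hypothetical usage: 500 particles converging on a target at (60, 40)
particles = rng.uniform(0, 100, (500, 2))
weights = np.full(500, 1.0 / 500)
for _ in range(10):
    particles, weights = particle_filter_step(
        particles, weights, np.array([60.0, 40.0]))
estimate = particles.mean(axis=0)
```

The coarse-to-fine scheme in the paper chains such filters: one for the iris centers, then, when the eyes are open, one per eye contour.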

Paper Nr: 124
Title:

Video-based Feedback for Assisting Physical Activity

Authors:

Renato Baptista, Michel Antunes, Djamila Aouada and Björn Ottersten

Abstract: In this paper, we explore the concept of providing feedback to a user moving in front of a depth camera so that they are able to replicate a specific template action. This can be used as a home-based rehabilitation system for stroke survivors, where the objective is for patients to practice and improve their daily life activities. Patients are guided in how to correctly perform an action by following feedback proposals. These proposals are presented in a human-interpretable way. In order to align an action that was performed with the template action, we explore two different approaches, namely Subsequence Dynamic Time Warping and Temporal Commonality Discovery. The first method aims to find the temporal alignment directly, while the second one discovers the interval of the subsequence that shares similar content, after which standard Dynamic Time Warping can be used for the temporal alignment. Then, feedback proposals can be provided in order to correct the user with respect to the template action. Experimental results show that both methods have similar accuracy rates and that computational time is the decisive factor, with Subsequence Dynamic Time Warping being faster.
Download
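Subsequence DTW, the faster of the two alignment methods discussed above, can be sketched on 1D sequences (a generic illustration, not the paper's implementation; real inputs would be skeleton feature sequences). The only change from standard DTW is that the first accumulated-cost row is zero, so the match may start anywhere in the longer sequence:

```python
import numpy as np

def subsequence_dtw(query, target):
    """Align a short query against ANY contiguous part of a longer
    target. Returns the best alignment cost and the 0-based index in
    the target where the best match ends."""
    n, m = len(query), len(target)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, :] = 0.0  # free start anywhere in the target
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # insertion
                                   acc[i, j - 1],      # deletion
                                   acc[i - 1, j - 1])  # match
    end = int(np.argmin(acc[n, 1:]))  # best end index in the target
    return acc[n, end + 1], end

# the template motion [1, 2, 3] is found inside a longer recording
cost, end = subsequence_dtw([1.0, 2.0, 3.0],
                            [0.0, 0.0, 1.0, 2.0, 3.0, 0.0])
```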

Paper Nr: 130
Title:

Optimized 4D DPM for Pose Estimation on RGBD Channels using Polisphere Models

Authors:

Enrique Martinez, Oliver Nina, Antonio J. Sánchez and Carlos Ricolfe

Abstract: The Deformable Parts Model (DPM) is a standard method for performing human pose estimation on RGB images (3 channels). Although there has been much work to improve this method, little has been done on applying DPM to other types of imagery such as RGBD data. In this paper, we describe a formulation of the DPM that makes use of a depth information channel in order to improve joint detection and pose estimation using 4 channels. To offset the time complexity and overhead added to the model by the extra channel, we propose an optimization of the algorithm based on solving forward and inverse kinematic equations, which allows us to reduce the number of points of interest and, at the same time, the time complexity. Our results show a significant improvement in pose estimation over the standard DPM on our own RGBD dataset and on the public CAD60 dataset.
Download

Paper Nr: 139
Title:

Parking Space Occupancy Verification - Improving Robustness using a Convolutional Neural Network

Authors:

Troels H. P. Jensen, Helge T. Schmidt, Niels D. Bodin, Kamal Nasrollahi and Thomas B. Moeslund

Abstract: With the number of privately owned cars increasing, the issue of locating an available parking space becomes apparent. This paper deals with the problem of verifying whether a parking space is vacant, using a vision-based system overlooking parking areas. In particular, the paper proposes a binary classifier system, based on a Convolutional Neural Network, that is capable of determining whether a parking space is occupied or not. A benchmark database consisting of images captured from different parking areas, under different weather and illumination conditions, has been used to train and test the system. The system shows promising performance on the database, with an overall accuracy of 99.71%.
Download

Paper Nr: 153
Title:

Mobile Tutoring System in Facial Expression Perception and Production for Children with Autism Spectrum Disorder

Authors:

Sergey Anishchenko, Alexandra Sarelaynen, Konstantin Kalinin, Anastasia Popova, Natalia Malygina-Lastovka and Kira Mesnyankina

Abstract: Children with autism spectrum disorder are impaired in their ability to produce and recognize facial expressions and are unable to interpret the social meaning of facial cues. While human interventionists can effectively train autistic individuals in facial expression perception and production, such individuals may benefit even more from computer-based intervention. In this study, a tablet PC application was developed for learning facial expression perception and production. It uses a newly designed computer vision algorithm for facial expression analysis, which estimates whether the posed expression is correct and guides the user accordingly. A clinical trial was conducted with 19 volunteer subjects aged 6 to 12 years. It was shown that after the intervention, the subjects' emotion recognition skills were improved. The ability to transfer the newly developed skills to the children's everyday life was also investigated. A parental questionnaire administered 6 months after the intervention showed that 10 out of 19 children were able to recognize emotions and change their behavior accordingly in everyday life.
Download

Paper Nr: 166
Title:

Primitive Shape Recognition via Superquadric Representation using Large Margin Nearest Neighbor Classifier

Authors:

Ryo Hachiuma, Yuko Ozasa and Hideo Saito

Abstract: It is known that humans recognize objects using combinations and positional relations of primitive shapes. The first step of such recognition is to recognize 3D primitive shapes. In this paper, we propose a method for primitive shape recognition using superquadric parameters with a metric learning method, large margin nearest neighbor (LMNN). Superquadrics can represent various types of primitive shapes using a single equation with few parameters. These parameters are used as the feature vector for classification. Real objects of primitive shapes were used in our experiment, and the results show the effectiveness of using LMNN for recognition based on superquadrics. Compared to previous methods, which used k-nearest neighbors (76.5%) and Support Vector Machines (73.5%), our LMNN method has the best performance (79.5%).
Download
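As a concrete illustration of the classification step described in the abstract above, the sketch below runs k-nearest-neighbour voting on superquadric parameter vectors under a linear transform `L`. This is a hedged toy version: in the paper the metric would be learned by LMNN, whereas here `L` is simply the identity, and the shape parameters are synthetic illustrative values, not the authors' data.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, L, k=3):
    """k-NN in the transformed space x -> L @ x (a Mahalanobis-type metric)."""
    Zt, Zq = X_train @ L.T, X_test @ L.T
    preds = []
    for z in Zq:
        d = np.linalg.norm(Zt - z, axis=1)           # distances to all training points
        nearest = y_train[np.argsort(d)[:k]]          # labels of the k nearest neighbours
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])         # majority vote
    return np.array(preds)

# Toy superquadric shape exponents (epsilon1, epsilon2): a box-like shape has
# small exponents, a sphere-like shape has exponents near 1 (illustrative only).
X = np.array([[0.1, 0.1], [0.2, 0.15], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])                            # 0 = box, 1 = sphere
L = np.eye(2)                                         # identity = plain Euclidean k-NN
pred = knn_predict(X, y, np.array([[0.15, 0.12], [1.05, 0.95]]), L, k=3)
```

Replacing the identity with an LMNN-learned transform changes only `L`; the voting step stays the same.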

Paper Nr: 170
Title:

Automated Multimodal Volume Registration based on Supervised 3D Anatomical Landmark Detection

Authors:

Rémy Vandaele, François Lallemand, Philippe Martinive, Akos Gulyban, Sébastien Jodogne, Philippe Coucke, Pierre Geurts and Raphaël Marée

Abstract: We propose a new method for automatic 3D multimodal registration based on anatomical landmark detection. Landmark detectors are learned independently in the two imaging modalities using Extremely Randomized Trees and multi-resolution voxel windows. A least-squares fitting algorithm is then used for rigid registration based on the landmark positions predicted by these detectors in the two imaging modalities. Experiments are carried out with this method on a dataset of pelvis CT and CBCT scans from 45 patients. On this dataset, our fully automatic approach yields results very competitive with those of a manually assisted state-of-the-art rigid registration algorithm.
Download

Paper Nr: 187
Title:

Evaluating Deep Convolutional Neural Networks for Material Classification

Authors:

Grigorios Kalliatakis, Georgios Stamatiadis, Shoaib Ehsan, Ales Leonardis, Juergen Gall, Anca Sticlaru and Klaus D. McDonald-Maier

Abstract: Determining the material category of a surface from an image is a demanding task in perception that is drawing increasing attention. Following the recent remarkable results achieved for image classification and object detection utilising Convolutional Neural Networks (CNNs), we empirically study material classification of everyday objects employing these techniques. More specifically, we conduct a rigorous evaluation of how state-of-the-art CNN architectures compare on a common ground over widely used material databases. Experimental results on three challenging material databases show that the best performing CNN architectures can achieve up to 94.99% mean average precision when classifying materials.
Download

Paper Nr: 223
Title:

Studying Stability of Different Convolutional Neural Networks Against Additive Noise

Authors:

Hamed H. Aghdam, Elnaz J. Heravi and Domenec Puig

Abstract: Understanding the internal processes of ConvNets is commonly done using visualization techniques. However, these techniques do not usually provide a tool for estimating the stability of a ConvNet against noise. In this paper, we show how to analyze a ConvNet in the frequency domain. Using frequency domain analysis, we show why a ConvNet might be sensitive to a very low magnitude additive noise. Our experiments on a few ConvNets trained on different datasets reveal that the convolution kernels of a trained ConvNet usually pass most frequencies and are not able to effectively eliminate the effect of high frequencies. They also show that a convolution kernel with a more concentrated frequency response is more stable against noise. Finally, we illustrate that augmenting a dataset with noisy images can compress the frequency response of convolution kernels.
Download

Paper Nr: 225
Title:

Part-driven Visual Perception of 3D Objects

Authors:

Frithjof Gressmann, Timo Lüddecke, Tatyana Ivanovska, Markus Schoeler and Florentin Wörgötter

Abstract: During the last years, approaches based on convolutional neural networks (CNN) have had substantial success in visual object perception. CNNs turned out to be capable of extracting high-level features of objects, which allow for fine-grained classification. However, some object classes exhibit tremendous variance with respect to their instances' appearance. We believe that considering object parts as an intermediate representation could be helpful in these cases. In this work, a part-driven perception of everyday objects with rotation estimation is implemented using deep convolutional neural networks. The network is trained and tested on artificially generated RGB-D data. The approach has the potential to be used for part recognition on realistic sensor recordings in present robot systems.
Download

Paper Nr: 275
Title:

EyeLad: Remote Eye Tracking Image Labeling Tool - Supportive Eye, Eyelid and Pupil Labeling Tool for Remote Eye Tracking Videos

Authors:

Wolfgang Fuhl, Thiago Santini, David Geisler, Thomas Kübler and Enkelejda Kasneci

Abstract: Ground truth data is an important prerequisite for the development and evaluation of many algorithms in the area of computer vision, especially when these are based on convolutional neural networks or other machine learning approaches that unfold their power mostly through supervised learning. This learning relies on ground truth data, which is laborious, tedious, and error-prone for humans to generate. In this paper, we contribute a labeling tool (EyeLad) specifically designed for remote eye-tracking data to enable researchers to leverage machine-learning-based approaches in this field, which is of great interest for automotive, medical, and human-computer interaction applications. The tool is multi-platform and supports a variety of state-of-the-art detection and tracking algorithms, including eye detection, pupil detection, and eyelid coarse positioning. Furthermore, the tool provides six types of point-wise tracking to automatically track the labeled points. The software is openly and freely available at: www.ti.uni-tuebingen.de/perception.

Paper Nr: 277
Title:

Graph Navigation for Exploring Very Large Image Collections

Authors:

Kai Uwe Barthel and Nico Hezel

Abstract: We present a new approach to visually browse very large sets of untagged images. In this paper we describe how to generate high-quality image descriptors/features using transformed activations of a convolutional neural network. These features are used to model image similarities, which in turn are used to build a hierarchical image graph. We show how such an image graph can be constructed efficiently. After investigating several browsing and visualization concepts, we found that the best user experience and ease of use are achieved by projecting sub-graphs onto a regular 2D image map. This allows users to explore the image graph similarly to navigation services.
Download

Paper Nr: 278
Title:

Hyperspectral Terrain Classification for Ground Vehicles

Authors:

Christian Winkens, Florian Sattler and Dietrich Paulus

Abstract: Hyperspectral imaging increases the amount of information incorporated per pixel in comparison to normal RGB color cameras. Conventional spectral cameras, as used in satellite imaging, employ spatial or spectral scanning during acquisition, which is only suitable for static scenes. In dynamic scenarios, such as autonomous driving applications, the acquisition of the entire hyperspectral cube at the same time is mandatory. We investigate the eligibility of a novel snapshot hyperspectral camera, which captures an entire hyperspectral cube without requiring moving parts or line-scanning. The sensor is tested in a driving scenario in rough terrain with dynamic scenes. The captured hyperspectral data is used for terrain classification utilizing machine learning techniques. The multi-class classification is evaluated against a novel hyperspectral ground truth dataset specifically created for this purpose.
Download
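To make the per-pixel setup of the terrain classification above concrete, here is a minimal hedged sketch: each pixel's spectrum is assigned to the class whose mean spectrum is nearest. The paper evaluates proper machine learning classifiers; this nearest-centroid stand-in, with made-up 3-band reflectance values, only illustrates the per-pixel formulation.

```python
import numpy as np

def nearest_centroid(pixels, centroids):
    """pixels: (n, bands); centroids: (classes, bands) -> class index per pixel."""
    d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)

# Toy 3-band reflectance spectra for two terrain classes (illustrative values).
grass = np.array([0.1, 0.8, 0.3])
road = np.array([0.5, 0.5, 0.5])
centroids = np.stack([grass, road])

pixels = np.array([[0.12, 0.75, 0.28],   # close to the grass spectrum
                   [0.48, 0.52, 0.50]])  # close to the road spectrum
labels = nearest_centroid(pixels, centroids)
```

A real hyperspectral camera delivers far more bands per pixel, but the classification interface stays the same: one feature vector per pixel, one label out.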

Paper Nr: 51
Title:

Noise-resistant Unsupervised Object Segmentation in Multi-view Indoor Point Clouds

Authors:

Dmytro Bobkov, Sili Chen, Martin Kiechle, Sebastian Hilsenbeck and Eckehard Steinbach

Abstract: 3D object segmentation in indoor multi-view point clouds (MVPC) is challenged by a high noise level, varying point density and registration artifacts. This severely deteriorates the segmentation performance of state-of-the-art algorithms in concave and highly-curved point set neighborhoods, because concave regions normally serve as evidence for object boundaries. To address this issue, we derive a novel robust criterion to detect and remove such regions prior to segmentation so that noise modelling is not required anymore. Thus, a significant number of inter-object connections can be removed and the graph partitioning problem becomes simpler. After initial segmentation, such regions are labelled using a novel recovery procedure. Our approach has been experimentally validated within a typical segmentation pipeline on multi-view and single-view point cloud data. To foster further research, we make the labelled MVPC dataset public (Bobkov et al., 2017).
Download

Paper Nr: 52
Title:

Can We Detect Pedestrians using Low-resolution LIDAR? - Integration of Multi-frame Point-clouds

Authors:

Yoshiki Tatebe, Daisuke Deguchi, Yasutomo Kawanishi, Ichiro Ide, Hiroshi Murase and Utsushi Sakai

Abstract: In recent years, demand for pedestrian detection using inexpensive low-resolution LIDAR (LIght Detection And Ranging) has been increasing, as it can be used to prevent traffic accidents involving pedestrians. However, it is difficult to detect pedestrians in the low-resolution (sparse) point-cloud obtained by a low-resolution LIDAR. In this paper, we propose multi-frame features calculated by integrating point-clouds over multiple frames to increase the point-cloud resolution, and by extracting their temporal changes. By combining these features, the accuracy of pedestrian detection from low-resolution point-clouds can be improved. We conducted experiments using LIDAR data obtained in actual traffic environments. Experimental results showed that the proposed method could detect pedestrians accurately from low-resolution LIDAR data.
Download

Paper Nr: 54
Title:

Deep Manifold Embedding for 3D Object Pose Estimation

Authors:

Hiroshi Ninomiya, Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase, Norimasa Kobori and Yusuke Nakano

Abstract: Recently, 3D object pose estimation has been attracting attention. The Parametric Eigenspace method is known as one of the fundamental methods for this task. It represents the appearance change of an object caused by pose change with a manifold embedded in a low-dimensional subspace. It obtains features by Principal Component Analysis (PCA), which maximizes the appearance variation. However, it cannot handle a pose change with only slight appearance change, since there is not always a correlation between pose change and appearance change. In this paper, we propose a method that introduces “Deep Manifold Embedding”, which maximizes the pose variation directly. We construct a manifold from features extracted from Deep Convolutional Neural Networks (DCNNs) trained with pose information. Pose estimation with the proposed method achieved the best accuracy in experiments using a public dataset.
Download

Paper Nr: 58
Title:

Geometrical and Visual Feature Quantization for 3D Face Recognition

Authors:

Walid Hariri, Hedi Tabia, Nadir Farah, David Declercq and Abdallah Benouareth

Abstract: In this paper, we present an efficient method for 3D face recognition based on vector quantization of both geometrical and visual properties of the face. The method starts by describing each 3D face using a set of orderless features, and then uses the Bag-of-Features paradigm to construct the face signature. We analyze the performance of three well-known classifiers: the Naïve Bayes classifier, the Multilayer Perceptron and Random Forests. The results reported on the FRGCv2 dataset show the effectiveness of our approach and prove that the method is robust to facial expressions.
Download
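The Bag-of-Features step mentioned above can be sketched in a few lines: each face is a set of orderless local descriptors, each descriptor is quantised to its nearest codebook entry, and the face signature is the resulting normalised histogram. This is a hedged toy version; the 2-D descriptors and the two-word codebook below are illustrative assumptions, and in practice the codebook would be learned (e.g. by k-means) from training faces.

```python
import numpy as np

def bof_signature(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword assignments."""
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = np.argmin(d, axis=1)                      # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])         # two toy visual words
face = np.array([[0.1, 0.0], [0.9, 1.1],              # orderless local descriptors
                 [1.0, 0.9], [0.05, 0.1]])
sig = bof_signature(face, codebook)
```

The fixed-length signature `sig` is what a downstream classifier (Naïve Bayes, MLP, Random Forest) would consume.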

Paper Nr: 121
Title:

Dense Semantic Stereo Labelling Architecture for In-Campus Navigation

Authors:

Jorge Beltrán, Carlos Jaraquemada, Basam Musleh, Arturo De La Escalera and Jose María Armingol

Abstract: Interest in autonomous vehicles has rapidly increased in the last few years, due to recent advances in the field and the appearance of semi-autonomous solutions in the market. In order to reach fully autonomous navigation, a precise understanding of the vehicle's surroundings is required. This paper presents a novel ROS-based architecture for stereo-vision-based semantic scene labelling. The objective is to provide the necessary information to a path planner in order to perform autonomous navigation around the university campus. The output of the algorithm contains the classification of the obstacles in the scene into four different categories: traversable areas, garden, static obstacles, and pedestrians. Validation of the labelling method is accomplished by means of a hand-labelled ground truth, generated from a stereo sequence captured on the university campus. The experimental results show the high performance of the proposed approach.
Download

Paper Nr: 131
Title:

Detection of Human Rights Violations in Images: Can Convolutional Neural Networks Help?

Authors:

Grigorios Kalliatakis, Shoaib Ehsan, Maria Fasli, Ales Leonardis, Juergen Gall and Klaus D. McDonald-Maier

Abstract: After setting the performance benchmarks for image, video, speech and audio processing, deep convolutional networks have been core to the greatest advances in image recognition tasks in recent times. This raises the question of whether there is any benefit in targeting these remarkable deep architectures with the unattempted task of recognising human rights violations through digital images. Under this perspective, we introduce a new, well-sampled human-rights-centric dataset called Human Rights Understanding (HRUN). We conduct a rigorous evaluation on a common ground by combining this dataset with different state-of-the-art deep convolutional architectures in order to achieve recognition of human rights violations. Experimental results on the HRUN dataset have shown that the best performing CNN architectures can achieve up to 88.10% mean average precision. Additionally, our experiments demonstrate that increasing the size of the training samples is crucial for achieving an improvement in mean average precision, particularly when utilising very deep networks.
Download

Paper Nr: 135
Title:

Incorporation of High Level Information in Images Retrieval

Authors:

Farzaneh Saadati, Parvin Razzaghi and Farideh Saadati

Abstract: Content-Based Image Retrieval (CBIR) is one of the active research areas in computer vision. CBIR searches large collections of database images for those that belong to the same category as the query image. CBIR is an unsupervised approach that only uses the visual content of an image to retrieve similar images. The main contribution of this paper is to utilize high-level information as well as low-level information to retrieve images. The proposed approach has two steps: (i) a first retrieval set of similar images is obtained using low-level information; (ii) for the images of the first retrieval set, high-level information is extracted and the images are reordered. To extract high-level knowledge, some candidate objects from each image are obtained. Each candidate object is then described using a CNN. In our approach, to define the similarity measure, corresponding objects between two images are found and the OMDSL distance metric is applied to compute the similarity of the corresponding objects. We used the MSRC-21 and Caltech256 datasets for evaluating the proposed approach. The obtained results show that our approach outperforms comparable state-of-the-art approaches.

Paper Nr: 137
Title:

Skeleton-based Human Action Recognition - A Learning Method based on Active Joints

Authors:

Ahmad K. N. Tehrani, Maryam Asadi Aghbolaghi and Shohreh Kasaei

Abstract: A novel method for human action recognition from sequences of skeletal data is presented in this paper. The proposed method is based on the idea that some body joints are inactive and do not carry any physical meaning while an action is performed. In other words, regardless of the subject performing an action, only a certain set of joints is meaningfully involved in each action. Consequently, extracting features from inactive joints is a time-consuming task. To cope with this problem, in this paper, only the dynamics of the active joints are modeled. To consider local temporal information, a sliding window is used to divide the trajectories of the active joints into consecutive windows. Feature extraction is then applied to all windows of the active joints' trajectories, and all features are quantized using K-means clustering. Since each action has its own active joints, a one-vs-all classification strategy is exploited. Finally, to take global motion information into account, the consecutive quantized features of the samples of an action are fed into the hidden Markov model (HMM) of that action. The experimental results show that using active joints achieves 96% of the maximum accuracy reachable by using all joints.
Download
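Two of the preprocessing steps described above, selecting active joints and windowing their trajectories, can be sketched as follows. This is a hedged illustration: the motion-energy criterion, threshold, window size and step are assumptions for the sketch, not the paper's actual values.

```python
import numpy as np

def active_joints(trajs, thresh=0.1):
    """trajs: (n_joints, n_frames, 3). A joint counts as active if its total
    frame-to-frame displacement exceeds thresh (a toy activity criterion)."""
    motion = np.abs(np.diff(trajs, axis=1)).sum(axis=(1, 2))
    return np.where(motion > thresh)[0]

def sliding_windows(traj, size=4, step=2):
    """Split one joint trajectory (n_frames, 3) into overlapping windows."""
    return [traj[i:i + size] for i in range(0, len(traj) - size + 1, step)]

T = 8
still = np.zeros((T, 3))                            # an inactive joint
moving = np.cumsum(np.full((T, 3), 0.5), axis=0)    # a steadily moving joint
trajs = np.stack([still, moving])

act = active_joints(trajs)                          # only the moving joint survives
wins = sliding_windows(trajs[1])                    # overlapping 4-frame windows
```

Features extracted from `wins` would then be quantized with K-means and fed to the per-action HMM, as the abstract describes.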

Paper Nr: 184
Title:

Evaluation of Hardware Oriented MRCoHOG using Logic Simulation

Authors:

Yuta Yamasaki, Shiryu Ooe, Akihiro Suzuki, Kazuhiro Kuno, Hideo Yamada, Shuichi Enokida and Hakaru Tamukoh

Abstract: Human detection requires high-speed and high-accuracy processing. One high-performance detection technique is the multi-resolution co-occurrence histogram of oriented gradients (MRCoHOG). Since the calculation of co-occurrence requires a huge amount of processing resources, it is difficult to realize real-time human detection with MRCoHOG. Accordingly, hardware implementation is considered to be effective. In this paper, a hardware-oriented MRCoHOG is proposed. In the proposed method, we simplify complicated calculations such as multiplications and the square root operation for efficient hardware implementation. Experimental results show that the proposed method achieves a better human detection rate than the ordinary method. Moreover, MRCoHOG is implemented in a digital circuit with the proposed method. According to logic simulation of the proposed circuit, the processing speed of the hardware implementation is 466 times higher than that of the software implementation.
Download
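To illustrate the kind of hardware-oriented simplification the abstract mentions (removing multiplications and the square root), a common trick in HOG-style pipelines is to replace the exact gradient magnitude sqrt(gx² + gy²) with the L1 approximation |gx| + |gy|, which needs no multiplier or square-root unit. The specific simplification used by the authors may differ; this sketch only demonstrates the principle and its bounded error.

```python
import numpy as np

def grad_magnitude_exact(gx, gy):
    """Exact L2 gradient magnitude: needs two multiplies and a sqrt per pixel."""
    return np.sqrt(gx.astype(float) ** 2 + gy.astype(float) ** 2)

def grad_magnitude_hw(gx, gy):
    """Hardware-friendly L1 approximation: adds and absolute values only.
    Over-estimates the L2 norm by at most a factor of sqrt(2)."""
    return np.abs(gx) + np.abs(gy)

gx = np.array([3, 0, -1])
gy = np.array([4, 2, 1])
exact = grad_magnitude_exact(gx, gy)
approx = grad_magnitude_hw(gx, gy)
```

Because the histogram binning that follows is tolerant to a bounded monotone distortion of the magnitude, this kind of substitution typically costs little accuracy while saving substantial circuit area.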

Paper Nr: 208
Title:

3D Region Proposals For Selective Object Search

Authors:

Sheetal Reddy, Vineet Gandhi and Madhava Krishna

Abstract: The advent of indoor personal mobile robots has clearly demonstrated their utility in assisting humans at various places such as workshops, offices, homes, etc. One of the most important cases in such autonomous scenarios is where the robot has to search for certain objects in large rooms. Exploring the whole room would prove to be extremely expensive in terms of both computing power and time. To address this issue, we demonstrate a fast algorithm to reduce the search space by identifying possible object locations as two classes, namely Support Structures and Clutter. Support Structures are plausible object containers in a scene, such as tables, chairs, sofas, etc. Clutter refers to places where there seem to be several objects that cannot be clearly distinguished; it can also be identified as unorganized regions that can be of interest for tasks such as robot grasping, fetching and placing objects. The primary contribution of this paper is to quickly identify potential object locations using a Support Vector Machine (SVM) learnt over features extracted from the depth map and the RGB image of the scene, which further culminates in a densely connected Conditional Random Field (CRF) formulated over the image of the scene. The inference over the CRF leads to the assignment of one of the labels - support structure, clutter, other - to each pixel. The outcomes are reliable even in challenging scenarios, such as when the support structures are far from the robot. The experiments demonstrate the efficacy and speed of the algorithm irrespective of changes in camera angle, appearance, lighting and distance from the locations.
Download

Paper Nr: 233
Title:

On Efficient Computation of Tensor Subspace Kernels for Multi-dimensional Data

Authors:

Bogusław Cyganek and Michał Woźniak

Abstract: In pattern classification problems, kernel-based methods and multi-dimensional methods have shown many advantages. However, since the well-known kernel functions are defined over one-dimensional vector spaces, it is not straightforward to join these two domains. Nevertheless, there are attempts to develop kernel functions which can directly operate on multi-dimensional patterns, such as the recently proposed kernels operating on Grassmannian manifolds. These are based on the concept of the principal angles between orthogonal spaces rather than simple distances between vectors. An example is the chordal kernel operating on the subspaces obtained after tensor unfolding. However, a real problem with these methods is their high computational demand. In this paper we address the problem of efficient implementation of the chordal kernel for operation on tensors in classification tasks of real computer vision problems. The paper extends our previous works in this field. The proposed method was tested on object recognition problems in computer vision. The experiments show good accuracy and accelerated performance.
Download

Paper Nr: 239
Title:

Human Skeleton Detection from Semi-constrained Environment Video

Authors:

Palwasha Afsar, Paulo Cortez and Henrique Santos

Abstract: The correct classification of the human skeleton from video is a key issue for the recognition of human actions and behavior. In this paper, we present a computational system for passive detection of the human star skeleton from raw video. The overall system is based on two main modules: segmentation and star skeleton detection. For each module, several computer vision methods were adjusted and tested in a comparative analysis that used a challenging video dataset (e.g., different daylight and weather conditions). The obtained results show that our system is capable of detecting human skeletons in most situations.
Download

Paper Nr: 240
Title:

Hierarchical Feature Extraction using Partial Least Squares Regression and Clustering for Image Classification

Authors:

Ryoma Hasegawa and Kazuhiro Hotta

Abstract: In this paper, we propose an image classification method using Partial Least Squares regression (PLS) and clustering. PLSNet is a simple network using PLS for image classification that obtained high accuracies on the MNIST and CIFAR-10 datasets. It crops a large number of local regions from training images as explanatory variables, and their class labels are used as objective variables. PLS is then applied to those variables, and a set of filters is obtained. However, there is a variety of local regions in each class, and the intra-class variance is large. Therefore, we consider that the local regions in each class should be divided and handled separately. In this paper, we apply clustering to the local regions in each class and form a set from one cluster of each class, so that the number of sets equals the number of clusters. We then apply PLSNet to each set. Through this process, we obtain several feature vectors per image. Finally, we train an SVM for each feature vector and classify the images by voting over the SVM results. Our PLSNet obtained 82.42% accuracy on the CIFAR-10 dataset. This accuracy is 1.69% higher than PLSNet without clustering and an attractive result among methods that do not use CNNs.
Download
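The core PLS step above — relating local patches (explanatory variables) to class labels (objective variables) — can be sketched with a single NIPALS-style component. This hedged toy version computes only the first PLS weight vector on synthetic data; in PLSNet such directions serve as convolution filters, and the real method extracts many components from many patches.

```python
import numpy as np

def first_pls_direction(X, y):
    """Unit weight vector maximising the covariance between X-scores and y
    (the first PLS component, one NIPALS-style step)."""
    Xc = X - X.mean(axis=0)          # centre the explanatory variables
    yc = y - y.mean()                # centre the objective variable
    w = Xc.T @ yc                    # direction of maximal covariance with y
    return w / np.linalg.norm(w)

# Synthetic "patches": feature 0 correlates with the class label,
# feature 1 carries no information (illustrative values only).
X = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = first_pls_direction(X, y)        # points along the informative feature
```

Unlike PCA, which would pick the direction of maximal variance regardless of the labels, the PLS direction is pulled toward features that covary with the class label.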

Paper Nr: 243
Title:

Improving Open Source Face Detection by Combining an Adapted Cascade Classification Pipeline and Active Learning

Authors:

Steven Puttemans, Can Ergun and Toon Goedemé

Abstract: Computer vision has almost solved the issue of in-the-wild face detection, using complex techniques like convolutional neural networks. In contrast, many open source computer vision frameworks like OpenCV have not yet made the switch to these complex techniques and tend to depend on well-established algorithms for face detection, like the cascade classification pipeline suggested by Viola and Jones. The accuracy of these basic face detectors on public datasets like FDDB stays rather low, mainly due to the high number of false positive detections. We propose several adaptations to the existing face detection model training pipeline of OpenCV. We improve the training sample generation and annotation procedure, and apply an active learning strategy. These changes boost the accuracy of in-the-wild face detection on the FDDB dataset drastically, closing the gap towards the accuracy gained by CNN-based face detectors. The proposed changes allow us to provide an improved face detection model to OpenCV, achieving a remarkably high precision at an acceptable recall - two critical requirements for further processing pipelines like person identification, etc.
Download

Area 4 - Applications and Services

Full Papers
Paper Nr: 35
Title:

Measuring Human-made Corner Structures with a Robotic Total Station using Support Points, Lines and Planes

Authors:

Christoph Klug, Dieter Schmalstieg and Clemens Arth

Abstract: Measuring non-planar targets with a total station in reflectorless mode is a challenging and error-prone task. Any accurate 3D point measurement requires a fully reflected laser beam of the electronic distance meter and proper orientation of the pan-tilt unit. Prominent structures like corners and edges often cannot fulfill these requirements and cannot be measured reliably. We present three algorithms and user interfaces for simple and efficient construction-site measurement corrections of the systematic error, using additional measurements close to the non-measurable target. Post-processing of single-point measurements is not required with our methods, and our experiments prove that using a 3D point, a 3D line or a 3D plane as support can lower the systematic error by almost an order of magnitude.
Download

Paper Nr: 79
Title:

Towards an Electronic Orientation Table: Using Features Extracted From the Image to Register Digital Elevation Model

Authors:

Leo Nicolle, Julien Bonneton, Hubert Konik, Damien Muselet and Laure Tougne

Abstract: The generation of a virtual representation of bones and bone fragments is a step required in order to obtain helpful models to work with in a simulation. Nowadays, the Marching Cubes algorithm is a de facto standard for the generation of geometric models from medical images. However, bone fragment models generated by Marching Cubes are huge and contain many unconnected geometric elements inside the bone due to the trabecular tissue. The development of new methods to generate geometrically simple 3D models from CT image stacks that preserve the original information extracted from them would be of great interest. To achieve that, a preliminary study for the development of a new method to generate triangle meshes from segmented medical images is presented. The method does not modify the points extracted from CT images, and avoids generating triangles inside the bone. The aim of this initial study is to analyse whether a spatial decomposition may help in the process of generating a triangle mesh using a divide-and-conquer approach. The method is under development, and therefore this paper only presents some initial results and discusses the detected issues to be improved.
Download

Paper Nr: 118
Title:

Automatic Generation and Detection of Visually Faultless Facial Morphs

Authors:

Andrey Makrushin, Tom Neubert and Jana Dittmann

Abstract: This paper introduces an approach to the automatic generation of visually faultless facial morphs, along with a proposal for how such morphs can be automatically detected. The morphs are created so that they cannot be recognized as such with the naked eye, while a reference automatic face recognition (AFR) system produces high similarity scores when matching a morph against faces of the persons who participated in the morphing. Automatic generation of morphs allows for creating abundant experimental data, which is essential (i) for evaluating the ability of AFR systems to reject morphs and (ii) for training forensic systems to detect morphs. Our first experiment shows that human performance in distinguishing between morphed and genuine face images is close to random guessing. In our second experiment, the reference AFR system verified 11.78% of morphs against any of the genuine images at the decision threshold of a 1% false acceptance rate. These results indicate that facial morphing is a serious threat to access control systems aided by AFR and establish the need for morph detection approaches. Our third experiment shows that the distribution of Benford features extracted from quantized DCT coefficients of JPEG-compressed morphs is substantially different from that of genuine images, enabling the automatic detection of morphs.
Download

Paper Nr: 252
Title:

Dynamic Subtitle Placement Considering the Region of Interest and Speaker Location

Authors:

Wataru Akahori, Tatsunori Hirai and Shigeo Morishima

Abstract: This paper presents a subtitle placement method that reduces unnecessary eye movements. Although methods that vary the position of subtitles have been discussed in previous studies, subtitles may overlap the region of interest (ROI). Therefore, we propose a dynamic subtitling method that utilizes eye-tracking data to prevent the subtitles from overlapping with important regions. The proposed method calculates the ROI based on the eye-tracking data of multiple viewers. By positioning subtitles immediately under the ROI, the subtitles do not overlap the ROI. Furthermore, we detect speakers in a scene based on audio and visual information to help viewers recognize the speaker by positioning subtitles near the speaker. Experimental results show that the proposed method enables viewers to watch the ROI and the subtitles for a longer duration than with traditional subtitles, and is effective in enhancing the comfort and utility of the viewing experience.
Download

Short Papers
Paper Nr: 2
Title:

Fully Automated Lung Volume Assessment from MRI in a Population-based Child Cohort Study

Authors:

Tatyana Ivanovska, Pierluigi Ciet, Adria Perez-Rovira, Anh Nguyen, Harm Tiddens, Liesbeth Duijts, Marleen de Bruijne and Florentin Wöergöetter

Abstract: In this work, a framework for fully automated lung extraction from magnetic resonance imaging (MRI) inspiratory data acquired within an ongoing epidemiological child cohort study is presented. The method's main steps are intensity inhomogeneity correction, denoising, clustering, airway extraction and lung region refinement. The presented approach produces highly accurate results (Dice coefficients ≥ 95%) when compared to semi-automatically obtained masks, and has the potential to be applied to the whole study data.
Download

Paper Nr: 31
Title:

Skin Temperature Measurement based on Human Skeleton Extraction and Infra-red Thermography - An Application of Sensor Fusion Methods in the Field of Physical Training

Authors:

Julia Richter, Christian Wiede, Sascha Kaden, Martin Weigert and Gangolf Hirtz

Abstract: Skin temperature measurements play a vital role in the diagnosis of diseases. This topic is also increasingly investigated for applications in the field of physical training. One of the limitations of state-of-the-art methods is the manual, time-consuming way the temperature is measured. Moreover, the existing literature gives little insight into skin temperature behaviour after training. The aim of this study was to design an automatic method to measure the skin temperature during and after training sessions for the biceps brachii. For this purpose, we fused thermal images and skeleton data to locate this muscle. We could successfully demonstrate the working principle and observed a temperature increase even several minutes after the end of the training. This study therefore contributes to the automation of skin temperature measurements. A transfer of our approach to other application fields, such as medical diagnostics, could be beneficial as well.
Download

Paper Nr: 47
Title:

User Calibration-free Method using Corneal Surface Image for Eye Tracking

Authors:

Sara Suda, Kenta Yamagishi and Kentaro Takemura

Abstract: Various calibration methods to determine the point-of-regard have been proposed for eye tracking. Although user calibration can be performed for experiments carried out in the laboratory, it is unsuitable when applying an eye-tracker in user interfaces and in public displays. Therefore, we propose a novel calibration-free approach for users that is based on the use of the corneal surface image. As the environmental information is reflected on the corneal surface, we extracted the unwarped image around the point-of-regard from the cornea. The point-of-regard is estimated on the screen by using the unwarped image, and the regression formula is solved using these points without user calibration. We implemented the framework of the algorithm, and we confirmed the feasibility of the proposed method through experiments.
Download

Paper Nr: 231
Title:

EyeRecToo: Open-source Software for Real-time Pervasive Head-mounted Eye Tracking

Authors:

Thiago Santini, Wolfgang Fuhl, David Geisler and Enkelejda Kasneci

Abstract: Head-mounted eye tracking offers remarkable opportunities for research and applications regarding pervasive health monitoring, mental state inference, and human computer interaction in dynamic scenarios. Although a plethora of software for the acquisition of eye-tracking data exists, it often exhibits critical issues when pervasive eye tracking is considered, e.g., closed source, costly eye tracker hardware dependencies, and requiring a human supervisor for calibration. In this paper, we introduce EyeRecToo, an open-source software for real-time pervasive head-mounted eye tracking. Out of the box, EyeRecToo offers multiple real-time state-of-the-art pupil detection and gaze estimation methods, which can be easily replaced by user-implemented algorithms if desired. A novel calibration method that allows users to calibrate the system without the assistance of a human supervisor is also integrated. Moreover, this software supports multiple head-mounted eye-tracking hardware, records eye and scene videos, and stores pupil and gaze information, which are also available as a real-time stream. Thus, EyeRecToo serves as a framework to quickly enable pervasive eye-tracking research and applications. Available at: www.ti.uni-tuebingen.de/perception.
Download

Paper Nr: 82
Title:

Evaluation of the Degree of Malignancy of Lung Nodules in Computed Tomography Images

Authors:

L. Gonçalves, J. Novo, A. Cunha and A. Campilho

Abstract: In lung cancer diagnosis, the design of robust Computer Aided Diagnosis (CAD) systems requires an adequate differentiation of benign from malignant nodules. This paper presents a CAD system for the classification of lung nodules in chest Computed Tomography (CT) scans as a way to support lung cancer diagnosis. The proposed method measures a set of 295 heterogeneous characteristics, including morphology, intensity and texture features, which were used as input to different KNN and SVM classifiers. The system was modeled and trained using a ground truth provided by specialists, taken from a public lung image dataset, the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI). This image dataset includes chest CT scans with lung nodule locations together with information about the degree of malignancy, among other properties, provided by multiple expert clinicians. In particular, the computed degree of malignancy tries to follow the manual labeling by the different radiologists. Promising results were obtained with a first-order SVM with an exponential kernel, achieving an area under the receiver operating characteristic curve of 96.2 ± 0.5% when compared with the ground truth provided in the public CT lung image dataset.
Download
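One of the two classifier families named in the abstract above, KNN, can be sketched generically in a few lines. This is a toy illustration only: the paper's 295-feature pipeline, parameter tuning and SVM variants are not reproduced, and the feature vectors and labels used below are made up.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Plain k-nearest-neighbour majority vote over feature vectors.

    train -- list of (feature_vector, label) pairs
    query -- feature vector to classify
    """
    # Sort training samples by Euclidean distance to the query.
    nearest = sorted(train, key=lambda fv: math.dist(fv[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In a nodule-classification setting the feature vectors would hold the measured morphology, intensity and texture characteristics, and the labels the radiologists' malignancy ratings.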

Paper Nr: 174
Title:

Open Implementation of DICOM for Whole-Slide Microscopic Imaging

Authors:

Sébastien Jodogne, Éric Lenaerts, Lara Marquet, Charlotte Erpicum, Roland Greimers, Pierre Gillet, Roland Hustinx and Philippe Delvenne

Abstract: This paper introduces an open implementation of DICOM for whole-slide microscopic imaging, following Supplement 145 of the DICOM standard. The software is divided into two parts: (a) a command-line tool to convert a whole-slide image to the DICOM format, and (b) a zero-footprint Web interface to display such DICOM images. The software architecture leverages the DICOM server Orthanc. The entire framework is available as free and open-source software. The existence of this software supports the development of digital pathology and telepathology in clinical environments, featuring a smooth integration with existing EHR and PACS solutions.
Download

Paper Nr: 206
Title:

Online Eye Status Detection in the Wild with Convolutional Neural Networks

Authors:

Essa R. Anas, Pedro Henriquez and Bogdan J. Matuszewski

Abstract: A novel eye status detection method is proposed. Contrary to most previous methods, this new method is not based on an explicit eye appearance model. Instead, the detection is based on a deep learning methodology, where the discriminant function is learned from a large set of exemplar images of eyes in different states, appearances, and 3D positions. The technique is based on the Convolutional Neural Network (CNN) architecture. To assess the performance of the proposed method, it has been tested against two techniques, namely SVM with a SURF bag of features and AdaBoost with HOG and LBP features. It has been shown that the proposed method outperforms these by a considerable margin on a two-class problem, with the two classes defined as “opened” and “closed”. Subsequently, the CNN architecture was further optimised on a three-class problem with “opened”, “closed”, and “partially-opened” classes. It has been demonstrated that it is possible to implement real-time eye status detection that works with a large variability of head poses, appearances and illumination conditions. Additionally, it has been shown that eye blinking estimation based on the proposed technique is at least comparable with the current state-of-the-art on standard eye blinking datasets.
Download

Paper Nr: 266
Title:

Investigating Natural Interaction in Augmented Reality Environments using Motion Qualities

Authors:

Manuela Chessa and Nicoletta Noceti

Abstract: The evaluation of the user experience when interacting with virtual environments is a challenging task in Human-Machine Interaction. Its relevance is expected to grow further in the near future, when the availability of low-cost portable virtual reality tools will favour a shift – already started – from conventional interaction controllers to a larger use of Natural User Interfaces, where people use their own body to interact with the device. In this paper, we propose the use of motion qualities to analyze reaching movements as indicators of the naturalness of users' actions in augmented reality scenarios. Using such an approach, we compare three different interaction modalities with virtual scenarios, with the goal of identifying the solution that most closely resembles interaction in a real-world environment.
Download

Area 5 - Motion, Tracking and Stereo Vision

Full Papers
Paper Nr: 23
Title:

Trade-off Between GPGPU based Implementations of Multi Object Tracking Particle Filter

Authors:

Petr Jecmen, Frederic Lerasle and Alhayat Ali Mekonnen

Abstract: In this work, we present the design, analysis and implementation of a decentralized particle filter (DPF) for multiple object tracking (MOT) on a graphics processing unit (GPU). We investigate two variants of the implementation, along with their advantages and caveats in terms of scaling to larger particle numbers and performance on several datasets. First, we compare the precision of our GPU implementation with a standard CPU version. Next, we compare the performance of the GPU variants under different scenarios. The results show that the GPU variant leads to a fivefold speedup on average (in the best cases the speedup reaches a factor of 18) over the CPU variant while keeping similar tracking accuracy and precision.
Download

Paper Nr: 33
Title:

Confidence-Aware Probability Hypothesis Density Filter for Visual Multi-Object Tracking

Authors:

Nicolai Wojke and Dietrich Paulus

Abstract: The Probability Hypothesis Density (PHD) filter is an efficient recursive multi-object state estimator that systematically deals with data association uncertainty. In this paper, we apply the PHD filter in a tracking-by-detection framework. In order to mimic state-dependent false alarms, we introduce an adapted PHD recursion that defines clutter generators in state space. Further, we integrate detector confidence scores into the measurement likelihood. This extension is effective yet simple: it requires few changes to the original PHD recursion, has the same computational complexity, and introduces few parameters that must be adapted to the individual tracking scenario. Our evaluation on a popular pedestrian tracking dataset demonstrates results that are competitive with the state-of-the-art.
Download

Paper Nr: 34
Title:

Ensemble Kalman Filter based on the Image Structures

Authors:

Dominique Béréziat, Isabelle Herlin and Yann Lepoittevin

Abstract: One major limitation of the motion estimation methods available in the literature concerns the availability of the uncertainty of the result. This is, however, provided by a number of filtering methods, such as the ensemble Kalman filter (EnKF). The paper consequently discusses the use of a description of the displayed structures in an ensemble Kalman filter, which is applied to estimating motion on image acquisitions. An example of such a structure is a cloud in meteorological satellite acquisitions. Compared to the Kalman filter, the EnKF does not require propagating the error covariance matrix associated with the estimation in time, resulting in reduced computational requirements. However, the EnKF is also known for exhibiting a shrinking effect when taking the observations of the studied system into account at the analysis step. Methods are available in the literature for correcting this shrinking effect, but they do not involve the spatial content of images, and more specifically the structures displayed in them. Two solutions are described and compared in the paper: first a dedicated localization function, and second an adaptive domain decomposition. Both methods proved to be well suited for fluid flow images, but only the domain decomposition is suitable for an operational setting. In the paper, the two methods are applied to synthetic data and to satellite images of the atmosphere, and the results are displayed and evaluated.
Download
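The key property of the EnKF noted in the abstract above — that the error covariance never needs to be propagated explicitly, because it is estimated from the ensemble spread — can be seen in a textbook scalar analysis step. This is a generic stochastic EnKF sketch, not the paper's structure-aware variant.

```python
import random

def enkf_analysis(ensemble, obs, obs_var):
    """One EnKF analysis step for a scalar state with direct observation.

    ensemble -- list of scalar state samples (the forecast ensemble)
    obs      -- the observed value
    obs_var  -- observation noise variance
    """
    n = len(ensemble)
    mean = sum(ensemble) / n
    # Forecast error variance estimated from the ensemble itself.
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    gain = var / (var + obs_var)  # Kalman gain from sample statistics
    # Perturbed observations keep the analysis ensemble spread consistent.
    return [x + gain * (obs + random.gauss(0.0, obs_var ** 0.5) - x)
            for x in ensemble]
```

With a very precise observation the gain approaches one and every member is pulled onto the observed value; with a noisy observation the ensemble moves only part of the way, retaining its spread — the shrinking of that spread over repeated analyses is the effect the paper's localization and domain-decomposition corrections address.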

Paper Nr: 41
Title:

Nonlocal Regularizing Constraints in Variational Optical Flow

Authors:

Joan Duran and Antoni Buades

Abstract: Optical flow methods try to estimate a dense correspondence field describing the motion of the objects in an image sequence. We introduce novel nonlocal regularizing constraints for variational optical flow computation. While the use of similarity weights has been restricted to the regularization term so far, the proposed data terms permit to implicitly use the image geometry in order to regularize the flow and better locate motion discontinuities. The experimental results illustrate the superiority of the new constraints with respect to the classical brightness constancy assumption as well as to nonlocal regularization strategies.
Download

Paper Nr: 66
Title:

3D Plane Labeling Stereo Matching with Content Aware Adaptive Windows

Authors:

Luis Horna and Robert B. Fisher

Abstract: In this paper, we present an algorithm that exploits both the underlying 3D structure and image entropy to generate an adaptive matching window. The presented algorithm estimates real-valued disparity maps by smartly exploring a 3D search space using a novel hypothesis generation approach that acts like a propagation scheduler. The proposed approach is among the top performing entries when evaluated on the Middlebury and KITTI 2015 benchmarks.
Download

Paper Nr: 72
Title:

Joint Large Displacement Scene Flow and Occlusion Variational Estimation

Authors:

Roberto P. Palomares, Gloria Haro and Coloma Ballester

Abstract: This paper presents a novel variational approach for the joint estimation of scene flow and occlusions. Our method does not assume that a depth sensor is available. Instead, we use a stereo sequence and exploit the fact that points that are occluded at a given time might be visible from the other view, so that the 3D geometry can be densely reinforced in an appropriate manner through a simultaneous motion-occlusion characterization. Moreover, large displacements are correctly captured thanks to an optimization strategy that uses a set of sparse image correspondences to guide the minimization process. We include qualitative and quantitative experimental results on several datasets, illustrating that both proposals help to improve the baseline results.
Download

Paper Nr: 78
Title:

Restoration of Temporal Image Sequence from a Single Image Captured by a Correlation Image Sensor

Authors:

Kohei Kawade, Akihiro Wakita, Tastuya Yokota, Hidekata Hontani and Shigeru Ando

Abstract: We propose a method that restores a temporal image sequence, which describes how a scene temporally changed during the exposure period, from a given still image captured by a correlation image sensor (CIS). The restored images have higher temporal resolution than the original still image, and the restored temporal sequence would be useful for motion analysis in applications such as landmark tracking and video labeling. The CIS is different from conventional image sensors because each pixel of the CIS can directly measure the Fourier coefficients of the temporal change of the light intensity observed during the exposure period. Given a single image captured by the CIS, one can hence restore the temporal image sequence by computing the Fourier series of the temporal change of the light intensity at each pixel. Through this temporal sequence restoration, one can also reduce motion blur. The proposed method improves the performance of motion blur reduction by estimating the Fourier coefficients of frequencies higher than the measured ones. In this work, we show that the Fourier coefficients of the higher frequencies can be estimated based on the optical flow constraint. Some experimental results with images captured by the CIS are demonstrated.
Download
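The per-pixel restoration step the abstract describes — evaluating a Fourier series from the coefficients the sensor measures — can be sketched for a single pixel as follows. The sensor model and the paper's higher-frequency estimation are not reproduced; this only shows the series evaluation, with a hypothetical coefficient layout.

```python
import math

def restore_sequence(a0, coeffs, period, times):
    """Reconstruct the temporal light-intensity signal at one pixel
    from its Fourier coefficients over the exposure period.

    a0     -- mean intensity over the exposure period
    coeffs -- list of (a_k, b_k) pairs for harmonics k = 1, 2, ...
    period -- exposure period length
    times  -- sample instants at which to evaluate the series
    """
    out = []
    for t in times:
        val = a0
        for k, (a, b) in enumerate(coeffs, start=1):
            w = 2 * math.pi * k / period  # angular frequency of harmonic k
            val += a * math.cos(w * t) + b * math.sin(w * t)
        out.append(val)
    return out
```

Evaluating the series at many instants within the exposure period yields the restored temporal sequence; truncating it at the measured harmonics is what limits the achievable deblurring, motivating the paper's estimation of higher-frequency coefficients.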

Paper Nr: 85
Title:

Long-term Correlation Tracking using Multi-layer Hybrid Features in Dense Environments

Authors:

Nathanael L. Baisa, Deepayan Bhowmik and Andrew Wallace

Abstract: Tracking a target of interest in crowded environments is a challenging problem, not yet successfully addressed in the literature. In this paper, we propose a new long-term algorithm, learning a discriminative correlation filter and using an online classifier, to track a target of interest in dense video sequences. First, we learn a translational correlation filter using a multi-layer hybrid of convolutional neural networks (CNN) and traditional hand-crafted features. We combine the advantages of both the lower convolutional layer which retains better spatial detail for precise localization, and the higher convolutional layer which encodes semantic information for handling appearance variations. This is integrated with traditional features formed from a histogram of oriented gradients (HOG) and color-naming. Second, we include a re-detection module for overcoming tracking failures due to long-term occlusions by training an incremental (online) SVM on the most confident frames using hand-engineered features. This re-detection module is activated only when the correlation response of the object is below some pre-defined threshold to generate high score detection proposals. Finally, we incorporate a Gaussian mixture probability hypothesis density (GM-PHD) filter to temporally filter high score detection proposals generated from the learned online SVM to find the detection proposal with the maximum weight as the target position estimate by removing the other detection proposals as clutter. Extensive experiments on dense data sets show that our method significantly outperforms state-of-the-art methods.
Download

Paper Nr: 96
Title:

An Integrated System based on Binocular Learned Receptive Fields for Saccade-vergence on Visually Salient Targets

Authors:

Daniele Re, Agostino Gibaldi, Silvio P. Sabatini and Michael W. Spratling

Abstract: The human visual system uses saccadic and vergence eye movements to foveate interesting objects with both eyes, thus exploring the visual scene. To mimic this biological behavior in active vision, we propose a bio-inspired integrated system able to learn a functional sensory representation of the environment, together with the motor commands for binocular eye coordination, directly by interacting with the environment itself. The proposed architecture, rather than sequentially combining different functionalities, is a robust integration of different modules that rely on a front-end of learned binocular receptive fields to specialize on different sub-tasks. The resulting modular architecture is able to detect salient targets in the scene and perform precise binocular saccadic and vergence movements on them. The performance of the proposed approach has been tested on the iCub Simulator, providing a quantitative evaluation of the computational potential of the learned sensory and motor resources.
Download

Paper Nr: 104
Title:

Occlusion Robust Symbol Level Fusion for Multiple People Tracking

Authors:

Nyan Bo Bo, Peter Veelaert and Wilfried Philips

Abstract: In single-view visual target tracking, occlusion is one of the most challenging problems, since the target's features are partially or fully covered by other targets when an occlusion occurs. Instead of a limited single view, a target can be observed from multiple viewpoints using a network of cameras to mitigate the occlusion problem. However, information coming from different views must be fused by relying less on views with heavy occlusion and more on views with little or no occlusion. To address this need, we propose a new fusion method which fuses the positions of a person locally estimated by smart cameras observing from different viewpoints, while taking into account the occlusion in each view. The genericity and scalability of the proposed fusion method are high, since it needs only the position estimates from the smart cameras. The uncertainty of each local estimate is computed in a fusion center from a simulated occlusion assessment based on the camera's projective geometry. These uncertainties, together with the local estimates, are used to model the probabilistic distributions required for the Bayesian fusion of the local estimates. The performance evaluation on three challenging video sequences shows that our method achieves higher accuracy than the local estimates as well as the tracking results using a classical triangulation method. Our method outperforms two state-of-the-art trackers on a publicly available multi-camera video sequence.
Download

Paper Nr: 116
Title:

P-HAF: Homography Estimation using Partial Local Affine Frames

Authors:

Daniel Barath

Abstract: We propose an algorithm, called P-HAF, to estimate planar homographies using partially known local affine transformations. This general theory is able to exploit, in a real-time-capable way, the affine components obtained by commonly used partially affine covariant detectors such as SIFT or SURF. As a minimal solver, P-HAF can estimate the homography from two SIFT correspondences; moreover, it can deal with any number of point pairs as an overdetermined system. It is validated on both synthesized and publicly available datasets that exploiting all information leads to more accurate estimates and makes multi-homography estimation less ambiguous.
Download

Paper Nr: 123
Title:

Coupled 2D and 3D Analysis for Moving Objects Detection with a Moving Camera

Authors:

Marie-Neige Chapel, Erwan Guillou and Saida Bouakaz

Abstract: The detection of moving objects in the video stream of a moving camera is a complex task, since static objects also appear to move in the video stream. It is thus difficult to identify the motions that belong to moving objects, because they are hidden by those of static objects. To detect moving objects, we propose a novel geometric constraint based on 2D and 3D information. A sparse reconstruction of the visible part of the scene is performed in order to detect motions in 3D space, where the perception of the scene is not deformed by the camera motion. A first labeling estimation is performed in 3D space, and the apparent motions in the video stream of the moving camera are then used to validate the estimation. Labels are computed from confidence values which are updated at each frame according to the geometric constraint. Our method can detect several moving objects in complex scenes with high parallax.
Download

Paper Nr: 128
Title:

Anatomical Landmark Tracking by One-shot Learned Priors for Augmented Active Appearance Models

Authors:

Oliver Mothes and Joachim Denzler

Abstract: For animal bipedal locomotion analysis, an immense amount of recorded image data has to be evaluated by biological experts. During this time-consuming evaluation single anatomical landmarks have to be annotated in each image. In this paper we reduce this effort by automating the annotation with a minimum level of user interaction. Recent approaches, based on Active Appearance Models, are improved by priors based on anatomical knowledge and an online tracking method, requiring only a single labeled frame. However, the limited search space of the online tracker can lead to a template drift in case of severe self-occlusions. In contrast, we propose a one-shot learned tracking-by-detection prior which overcomes the shortcomings of template drifts without increasing the number of training data. We evaluate our approach based on a variety of real-world X-ray locomotion datasets and show that our method outperforms recent state-of-the-art concepts for the task at hand.
Download

Paper Nr: 159
Title:

Line-based SLAM Considering Directional Distribution of Line Features in an Urban Environment

Authors:

Kei Uehara, Hideo Saito and Kosuke Hara

Abstract: In this paper, we propose a line-based SLAM from an image sequence captured by a vehicle, taking into consideration the directional distribution of line features detected in an urban environment. The proposed SLAM is based on line segments detected from objects in an urban environment, for example road markings and buildings, which are conspicuous enough to be detected. We use additional constraints on the line segments so that we can improve the accuracy of the SLAM. We assume that the angle between the direction of a line segment and the vehicle's direction of travel conforms to a four-component Gaussian mixture distribution. We define a new cost function considering this distribution and optimize the relative camera pose, position, and the 3D line segments by bundle adjustment. In addition, we build digital maps from the detected line segments. Our method increases the accuracy of localization and corrects tilted lines in the digital maps. We implement our method in both a single-camera system and a multi-camera system. The SLAM that uses a single-camera system with our constraint works just as well as a method that uses a multi-camera system without our constraint.
Download

Paper Nr: 179
Title:

W-PnP Method: Optimal Solution for the Weak-Perspective n-Point Problem and Its Application to Structure from Motion

Authors:

Levente Hajder

Abstract: Camera calibration has been a key problem in 3D computer vision since the late 1980s. Most calibration methods deal with the (perspective) pinhole camera model. This is not a simple goal: the problem is nonlinear due to the perspectivity. The strategy of these methods is to estimate the intrinsic camera parameters first; the extrinsic ones are then computed by a so-called PnP method. Finally, the accurate camera parameters are obtained by slow numerical optimization. In this paper, we show that the weak-perspective camera model can be optimally calibrated without numerical optimization if the L2 norm is used. The solution is given by a closed-form formula, and the estimation is thus very fast. We call this method the Weak-Perspective n-Point (W-PnP) algorithm. Its advantage is that it simultaneously estimates the two intrinsic weak-perspective camera parameters and the extrinsic ones. We show that the proposed calibration method can be utilized as the solution for a subproblem of 3D reconstruction with missing data. An alternating least squares method is also defined that optimizes the camera motion using the proposed optimal calibration method.
Download

Paper Nr: 188
Title:

Deep Part Features Learning by a Normalised Double-Margin-Based Contrastive Loss Function for Person Re-Identification

Authors:

María José Gómez-Silva, José María Armingol and Arturo de la Escalera

Abstract: The selection of discriminative features that properly define a person's appearance is one of the current challenges for person re-identification. This paper presents a three-dimensional representation to compare person images, based on the similarity independently measured between the head, upper body, and legs of two images. Three deep Siamese neural networks have been implemented to automatically find salient features for each body part. One of the main problems in the learning of features for re-identification is the presence of intra-class variations and inter-class ambiguities. This paper proposes a novel normalized double-margin-based contrastive loss function for the training of Siamese networks, which not only improves the robustness of the learned features against the mentioned problems but also reduces the training time. A comparative evaluation on the challenging PRID 2011 dataset has been conducted, resulting in a remarkable enhancement of the single-shot re-identification performance thanks to the use of our descriptor based on deeply learned features, in comparison with the employment of low-level features. The obtained results also show the improvements generated by our normalized double-margin-based function with respect to the traditional contrastive loss function.
Download
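The double-margin idea behind the loss in the abstract above can be sketched generically: positive pairs are penalized only once their embedding distance exceeds a lower margin, and negative pairs only once it falls below an upper margin, leaving a "don't care" band between them. This is an illustrative variant, not the paper's normalized formulation, and the margin values are made-up defaults.

```python
def double_margin_contrastive(dist, same, m_pos=0.5, m_neg=1.5):
    """Generic double-margin contrastive loss for one pair.

    dist  -- distance between the two embeddings
    same  -- True if the pair shows the same person
    m_pos -- margin below which positive pairs incur no loss
    m_neg -- margin above which negative pairs incur no loss
    """
    if same:
        # Tolerate intra-class variation up to m_pos.
        return max(0.0, dist - m_pos) ** 2
    # Push ambiguous negative pairs apart until they reach m_neg.
    return max(0.0, m_neg - dist) ** 2
```

Compared with the single-margin contrastive loss, the lower margin stops the network from collapsing all positive pairs to zero distance, which is one way such a formulation can ease training.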

Paper Nr: 202
Title:

Pose Interpolation for Rolling Shutter Cameras using Non Uniformly Time-Sampled B-splines

Authors:

Bertrand Vandeportaele, Philippe-Antoine Gohard, Michel Devy and Benjamin Coudrin

Abstract: Rolling Shutter (RS) cameras are predominant in the tablet and smartphone market due to their low cost and small size. However, these cameras require specific geometric models when either the camera or the scene is in motion, to account for the sequential exposure of the different lines of the image. This paper proposes to improve a state-of-the-art model for RS cameras through the use of Non Uniformly Time-Sampled B-splines. This allows the pose of the camera to be interpolated taking into account the varying dynamics of the motion, by adding more control points where needed while keeping a low number of control points where the motion is smooth. Two methods are proposed to determine adequate distributions of the control points, using either an IMU sensor or an iterative reprojection-error minimization. Results on simple synthetic datasets are shown to prove the concept, and future work is introduced that should lead to the integration of our model in a SLAM algorithm.
Download

Paper Nr: 203
Title:

Towards Non-rigid Reconstruction - How to Adapt Rigid RGB-D Reconstruction to Non-rigid Movements?

Authors:

Oliver Wasenmüller, Benjamin Schenkenberger and Didier Stricker

Abstract: Human body reconstruction is a very active field in recent Computer Vision research. The challenge is that the human body moves while being captured, even when trying to avoid it. Thus, algorithms which explicitly cope with non-rigid movements are indispensable. In this paper, we propose a novel algorithm to extend existing rigid RGB-D reconstruction pipelines to handle non-rigid transformations. The idea is to store, in addition to the model, the non-rigid transformation nrt of the current frame as a sparse warp field in image space. We propose an algorithm to incrementally update this transformation nrt. In the evaluation we show that the novel algorithm provides accurate reconstructions and can cope with non-rigid movements of up to 5 cm.
Download

Paper Nr: 210
Title:

Efficient Resource Allocation for Sparse Multiple Object Tracking

Authors:

Rui Figueiredo, João Avelino, Atabak Dehban, Alexandre Bernardino, Pedro Lima and Helder Araújo

Abstract: In this work we address the multiple person tracking problem with resource constraints, which plays a fundamental role in the deployment of efficient mobile robots for real-time applications involving Human Robot Interaction. We pose multiple target tracking as a selective attention problem in which the perceptual agent tries to optimize the overall expected tracking accuracy. More specifically, we propose a resource-constrained Partially Observable Markov Decision Process (POMDP) formulation that allows for real-time on-line planning. Using a transition model, we predict the true state from the current belief over a finite horizon, and take actions to maximize future expected belief-dependent rewards. These rewards are based on the anticipated observation qualities, which are provided by an observation model that accounts for detection errors due to the discrete nature of a state-of-the-art pedestrian detector. Finally, a Monte Carlo Tree Search method is employed to solve the planning problem in real time. The experiments show that directing the attentional foci to relevant image sub-regions allows for large detection speed-ups and improvements in tracking precision.
Download
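As a toy illustration of the attention-allocation idea in the abstract above, the following sketch reduces the planning problem to a single step: attend to the image sub-region whose expected observation quality yields the largest expected reduction in target uncertainty. All names, numbers, and the one-step reward are hypothetical simplifications, not the paper's POMDP/MCTS formulation.

```python
# Hypothetical one-step attention selection for multi-target tracking:
# pick the sub-region whose expected detection most reduces uncertainty.

def expected_reward(uncertainty, detect_prob):
    # If the detector fires (probability detect_prob), the target's
    # uncertainty is (optimistically) resolved; otherwise nothing changes.
    return detect_prob * uncertainty

def select_region(targets):
    # targets: list of (name, uncertainty, detection_probability) tuples.
    return max(targets, key=lambda t: expected_reward(t[1], t[2]))[0]

targets = [
    ("near_target", 0.2, 0.95),      # low uncertainty, easy to detect
    ("far_target", 0.9, 0.60),       # high uncertainty, harder to detect
    ("occluded_target", 0.8, 0.20),  # high uncertainty, rarely detected
]
best = select_region(targets)
```

The full method in the paper plans over a finite horizon with belief-dependent rewards; this greedy one-step rule is only the degenerate horizon-one case.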

Paper Nr: 249
Title:

Reference Plane based Fisheye Stereo Epipolar Rectification

Authors:

Nobuyuki Kita and Yasuyo Kita

Abstract: When a humanoid robot walks through or performs a task in a very narrow space, it sometimes touches the environment with its hand or arm to retain its balance. To do this the robot must identify a flat surface of appropriate size with which it can make sufficient contact; the surface must also be within reach of the robot's upper body. Using fisheye stereo vision, it is possible to obtain image information for a field of view wider than that of a hemisphere whose central axis is the optical axis; thus, three-dimensional distances to the possible contact spaces can be evaluated at a glance. To realize this, stereo correspondence is crucial. However, the short distance between the stereo cameras and the target space causes differences in the apparent shapes of the targets in the left and right images, which can make stereo correspondence difficult. Therefore, we propose a novel method which rectifies stereo images so that the targets have the same apparent shapes in the left and right images when the targets are close to a reference plane. Actual fisheye stereo image pairs were rectified, and three-dimensional measurements were performed. Better results were obtained using the proposed rectification method than using other rectification methods.
Download

Paper Nr: 279
Title:

Detection and Classification of Holes in Point Clouds

Authors:

Nader H. Aldeeb and Olaf Hellwich

Abstract: Structure from Motion (SfM) is the most popular technique behind 3D image reconstruction. It is mainly based on matching features between multiple views of the target object and therefore gives good results only if the target object has enough texture on its surface; if not, virtual holes appear in the estimated models. However, not all holes that appear in the estimated model are virtual, i.e., correspond to a failure of the reconstruction: there could be a real physical hole in the structure of the object being reconstructed. This creates ambiguity when applying a hole-filling algorithm, namely, which holes should be filled and which must be left as they are. In this paper, we first propose a simple approach for the detection of holes in point sets. Then we investigate two different measures for the automatic classification of these detected holes. To the best of our knowledge, hole classification has not been addressed before. Experiments showed that all holes in the 3D models are accurately identified and classified.
Download

Short Papers
Paper Nr: 5
Title:

A Convex Approach for Non-rigid Structure from Motion Via Sparse Representation

Authors:

Junjie Hu and Terumasa Aoki

Abstract: This paper presents a convex solution for simultaneously recovering 3D non-rigid structures and camera motions from 2D image sequences based on sparse representation. Most existing methods rely on a low-rank assumption; however, this leads to poor reconstruction for objects with strong local deformation. Also, when camera motion is unknown, there is no convex solution for non-rigid structure from motion (NRSfM). To solve this problem, we estimate non-rigid structures by sparse representation. We estimate camera motions through a sparse spectral-norm minimization approach, and then a fast l1-norm minimization algorithm is introduced to reconstruct the 3D structures. Both steps are convex; therefore, our method gives a global optimum. Our method can handle objects with strong local deformation and does not need a low-rank prior. Experimental results show that our method achieves state-of-the-art reconstruction performance on the CMU benchmark dataset.
Download

Paper Nr: 7
Title:

Visual Odometry from Two Point Correspondences and Initial Automatic Camera Tilt Calibration

Authors:

Mårten Wadenbäck, Martin Karlsson, Anders Heyden, Anders Robertsson and Rolf Johansson

Abstract: Ego-motion estimation is an important step towards fully autonomous mobile robots. In this paper we propose the use of an initial but automatic camera tilt calibration, which transforms the subsequent motion estimation into a 2D rigid body motion problem. This transformed problem is solved $\ell_2$-optimally using RANSAC and a two-point method for rigid body motion. The method is experimentally evaluated using a camera mounted on a mobile platform. The results are compared to measurements from a highly accurate external camera positioning system, which are used as a gold standard. The experiments show promising results on real data.
Download
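To illustrate the two-point method for planar rigid body motion mentioned in the abstract above, here is a minimal Python sketch (function names, the point set, and the RANSAC parameters are hypothetical, not taken from the paper): two correspondences fix the rotation angle from the direction change of the segment between them, and the translation then follows from either point.

```python
import math
import random

def two_point_rigid(p1, p2, q1, q2):
    """Estimate a planar rotation angle and translation mapping p -> q
    from two point correspondences (the minimal sample for 2D rigid motion)."""
    a1 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    a2 = math.atan2(q2[1] - q1[1], q2[0] - q1[0])
    # Normalize the angle difference into (-pi, pi].
    theta = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi
    c, s = math.cos(theta), math.sin(theta)
    tx = q1[0] - (c * p1[0] - s * p1[1])
    ty = q1[1] - (s * p1[0] + c * p1[1])
    return theta, (tx, ty)

def transform(theta, t, p):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

def ransac_rigid(src, dst, iters=200, thresh=0.05, seed=0):
    """Hypothesize from random two-point samples, keep the largest consensus."""
    rng = random.Random(seed)
    best, best_inl = None, -1
    for _ in range(iters):
        i, j = rng.sample(range(len(src)), 2)
        theta, t = two_point_rigid(src[i], src[j], dst[i], dst[j])
        inl = sum(1 for p, q in zip(src, dst)
                  if math.dist(transform(theta, t, p), q) < thresh)
        if inl > best_inl:
            best, best_inl = (theta, t), inl
    return best
```

For example, with points rotated by 0.3 rad and shifted by (1, 2) plus one outlier, the consensus step recovers the true motion.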

Paper Nr: 50
Title:

Cost Adaptive Window for Local Stereo Matching

Authors:

J. Navarro and A. Buades

Abstract: We present a novel stereo block-matching algorithm that uses adaptive windows. The shape of the window is selected to minimize the matching cost; such a window is likely to be the least distorted by the disparity function and thus optimal for matching. Moreover, we introduce a coarse-to-fine strategy to limit the number of ambiguous matches and reduce the computational cost. The proposed approach performs on par with state-of-the-art local matching methods.
Download
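The cost-adaptive-window idea above can be sketched in one dimension: for each candidate disparity, evaluate the matching cost over several window shapes and keep the cheapest, so that near a discontinuity a window leaning away from the boundary can win. This toy scanline version (shapes, SAD cost, and bounds handling are all illustrative assumptions, not the paper's method) shows the principle:

```python
def sad(left, right, x, d, offsets):
    # Sum of absolute differences over a 1D window shape given by offsets.
    return sum(abs(left[x + o] - right[x + o - d]) for o in offsets)

def best_disparity(left, right, x, max_d):
    # Three candidate window shapes: leaning left, centred, leaning right.
    # The (cost, disparity) pair with the smallest cost over all shapes wins,
    # mimicking a window that avoids straddling a disparity discontinuity.
    shapes = [(-2, -1, 0), (-1, 0, 1), (0, 1, 2)]
    costs = [(sad(left, right, x, d, s), d)
             for d in range(max_d + 1) if x - d - 2 >= 0
             for s in shapes]
    return min(costs)[1]
```

For a right scanline containing a small blob and a left scanline that is the same signal shifted by two pixels, the recovered disparity at the blob is 2.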

Paper Nr: 55
Title:

Simultaneous Estimation of Optical Flow and Its Boundaries based on the Dynamical System Model

Authors:

Yuya Michishita, Noboru Sebe, Shuichi Enokida and Eitaku Nobuyama

Abstract: Optical flow is a velocity vector field that represents the motion of objects in video images. Optical flow estimation is difficult in the neighborhood of flow boundaries. To resolve this problem, Sasagawa (2014) proposed a modified dynamical system model which assumes that, in the neighborhood of flow boundaries, the brightness flows in the perpendicular direction, and considers the resulting corrections to the brightness constancy constraint. However, in that model the correction occurs even where the flow is continuous. We propose a new model that switches between the conventional model and the model of Sasagawa (2014). As a result, we expect improved estimation accuracy in regions where the flow is continuous. We conduct numerical experiments to investigate the improvements the proposed model yields in the estimation accuracy of optical flow.
Download

Paper Nr: 69
Title:

3D Reconstruction of Indoor Scenes using a Single RGB-D Image

Authors:

Panagiotis-Alexandros Bokaris, Damien Muselet and Alain Trémeau

Abstract: The three-dimensional reconstruction of a scene is essential for the interpretation of an environment. In this paper, a novel and robust method for the 3D reconstruction of an indoor scene using a single RGB-D image is proposed. First, the layout of the scene is identified and then a new approach for isolating the objects in the scene is presented. Its fundamental idea is the segmentation of the whole image into planar surfaces and the merging of those that belong to the same object. Finally, a cuboid is fitted to each segmented object by a new RANSAC-based technique. The method is applied to various scenes and is able to provide a meaningful interpretation of these scenes even in cases with strong clutter and occlusion. In addition, a new ground truth dataset, on which the proposed method is further tested, was created. The results imply that the present work outperforms recent state-of-the-art approaches not only in accuracy but also in robustness and time complexity.
Download

Paper Nr: 74
Title:

Real-time Stereo Vision System at Tunnel

Authors:

Yuquan Xu, Seiichi Mita, Hossein Tehrani and Kazuhisa Ishimaru

Abstract: Although stereo vision has made great progress in recent years, few works estimate disparity for challenging scenes such as tunnels. In such scenes, owing to the low light conditions and fast camera movement, the images are severely degraded by motion blur, which limits the performance of standard stereo vision algorithms. To address this issue, in this paper we combine stereo vision with image deblurring algorithms to improve the disparity result. The proposed algorithm consists of three phases: a PSF estimation phase, an image restoration phase, and a stereo vision phase. In the PSF estimation phase, we introduce three methods to estimate the blur kernel: an optical-flow-based algorithm, a cepstrum-based algorithm, and a simple constant-kernel algorithm. In the image restoration phase, we propose a fast non-blind image deblurring algorithm to recover the latent image. In the last phase, we propose a multi-scale multi-path Viterbi algorithm to compute the disparity given the deblurred images. The advantages of the proposed algorithm are demonstrated by experiments on data sequences acquired in a tunnel.
Download

Paper Nr: 87
Title:

Matching of Line Segment for Stereo Computation

Authors:

O. Martorell, A. Buades and B. Coll

Abstract: A stereo algorithm based on the matching of line segments between two images is proposed. We extract several characteristics of the segments that permit their matching across the two images. A depth ordering computed from the line segments of the reference image allows us to attribute the matched disparity to the correct pixels. This depth sketch is computed by joining close line segments and identifying T-junctions and convexity points. The disparity computed for segments is then extrapolated to the rest of the image by means of a diffusion process. The performance of the proposed algorithm is illustrated by applying the procedure to synthetic stereo pairs.
Download

Paper Nr: 93
Title:

LiDAR-based 2D Localization and Mapping System using Elliptical Distance Correction Models for UAV Wind Turbine Blade Inspection

Authors:

Ivan Nikolov and Claus Madsen

Abstract: The wind energy sector faces a constant need for annual inspections of wind turbine blades for damage, erosion and cracks. These inspections are an important part of the wind turbine life cycle and can be very costly and hazardous to specialists. This has led to the use of automated drone inspections and the need for accurate, robust and inexpensive systems for localization of drones relative to the blade. Due to the lack of visual and geometrical features on the wind turbine blade, conventional SLAM algorithms are of limited use. We propose a cost-effective, easy to implement and extend system for on-site outdoor localization and mapping in low-feature environments using the inexpensive RPLIDAR and a 9-DOF IMU. Our algorithm geometrically simplifies the wind turbine blade's 2D cross-section to an elliptical model and uses it for distance and shape correction. We show that the proposed algorithm yields localization errors between 1 and 20 cm, depending on the position of the LiDAR relative to the blade, and a maximum mapping error of 4 cm at distances between 1.5 and 3 meters from the blade. These results are satisfactory for positioning and capturing the overall shape of the blade.
Download

Paper Nr: 94
Title:

Gait Recognition with Compact Lidar Sensors

Authors:

Bence Gálai and Csaba Benedek

Abstract: In this paper, we present a comparative study on gait and activity analysis using LiDAR scanners with different resolutions. Previous studies showed that gait recognition methods based on the point clouds of a Velodyne HDL-64E Rotating Multi-Beam LiDAR can be used for people re-identification in outdoor surveillance scenarios. However, the high cost and weight of that sensor are a bottleneck for its wide application in surveillance systems. The contribution of this paper is to show that the proposed Lidar-based Gait Energy Image descriptor can be efficiently adapted to the measurements of the compact and significantly cheaper Velodyne VLP-16 LiDAR scanner, which produces point clouds with a nearly four times lower vertical resolution than the HDL-64. On the other hand, due to the sparsity of the data, the VLP-16 sensor proves to be less efficient for the purpose of activity recognition if the events are mainly characterized by fine hand movements. The evaluation is performed on five test scenarios with multiple walking pedestrians, which have been recorded by both sensors in parallel.
Download

Paper Nr: 97
Title:

Explicit Image Quality Detection Rules for Functional Safety in Computer Vision

Authors:

Johann Thor Mogensen Ingibergsson, Dirk Kraft and Ulrik Pagh Schultz

Abstract: Computer vision has applications in a wide range of areas, from surveillance to safety-critical control of autonomous robots. Despite the potentially critical nature of the applications and continuous progress, the focus on safety in relation to compliance with standards has been limited. As an example, field robots are typically dependent on a reliable perception system to sense and react to a highly dynamic environment. The perception system thus introduces significant complexity into the safety-critical path of the robotic system. This complexity is often argued to increase safety by improving performance; however, the safety claims are not supported by compliance with any standards. In this paper, we present rules that enable low-level detection of quality problems in images and demonstrate their applicability on an agricultural image database. We hypothesise that low-level, primitive image analysis driven by explicit rules facilitates compliance with safety standards, which improves the real-world applicability of existing proposed solutions. The rules are simple independent image analysis operations focused on determining the quality and usability of an image.
Download
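Rules of the kind described above are simple per-image statistics compared against thresholds. The following sketch shows the flavour with a few common checks (the rule names and all threshold values are hypothetical, not the rules from the paper):

```python
def quality_checks(img, min_mean=30, max_mean=225, min_std=10, max_sat=0.05):
    """Rule-based usability check for an 8-bit grayscale image given as a
    flat list of pixel values. Thresholds are illustrative assumptions."""
    n = len(img)
    mean = sum(img) / n
    std = (sum((v - mean) ** 2 for v in img) / n) ** 0.5
    saturated = sum(1 for v in img if v in (0, 255)) / n
    return {
        "too_dark": mean < min_mean,          # underexposed frame
        "too_bright": mean > max_mean,        # overexposed frame
        "low_contrast": std < min_std,        # flat, uninformative frame
        "saturated": saturated > max_sat,     # clipped pixel fraction too high
    }
```

Each rule is independent, so a safety argument can address every check separately, which is the point the abstract makes about explicit, primitive rules.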

Paper Nr: 99
Title:

Simultaneous Camera Calibration and Temporal Alignment of 2D and 3D Trajectories

Authors:

Joni Herttuainen, Tuomas Eerola, Lasse Lensu and Heikki Kälviäinen

Abstract: In this paper, we present an automatic method that, given the 2D and 3D motion trajectories recorded with a camera and a 3D sensor, calibrates the camera with respect to the 3D sensor coordinates and aligns the trajectories in time. The method utilizes a modified Random Sample Consensus (RANSAC) procedure that iteratively selects two points from both trajectories, uses them to calculate the scale and translation parameters for the temporal alignment, computes point correspondences, and estimates the camera matrix. We demonstrate the approach with a setup consisting of a standard web camera and a Leap Motion sensor. We further propose the object tracking and trajectory filtering procedures necessary to produce proper trajectories with this setup. The results show that the proposed method achieves an over 96% success rate on a test set of complex trajectories.
Download
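The two-point temporal-alignment step described above reduces to solving T = s·t + b from two timestamp correspondences and scoring the hypothesis by consensus. A minimal sketch (names, tolerance, and the scoring rule are illustrative assumptions, not the paper's exact procedure):

```python
def align_from_two(t1, T1, t2, T2):
    """Minimal sample: solve T = s*t + b from two time correspondences,
    mapping camera timestamps t onto 3D-sensor timestamps T."""
    s = (T2 - T1) / (t2 - t1)
    b = T1 - s * t1
    return s, b

def count_inliers(ts_cam, ts_3d, s, b, tol=0.02):
    """Consensus score used to keep the best (s, b) hypothesis in RANSAC."""
    return sum(1 for t, T in zip(ts_cam, ts_3d) if abs(s * t + b - T) < tol)
```

In the full method this minimal solve sits inside a RANSAC loop that also computes point correspondences and the camera matrix; the sketch covers only the temporal part.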

Paper Nr: 122
Title:

Optical Flow Refinement using Reliable Flow Propagation

Authors:

Tan Khoa Mai, Michèle Gouiffes and Samia Bouchafa

Abstract: This paper shows how to improve optical flow estimation by combining a neighborhood consensus strategy with a reliable flow propagation method. Propagation takes advantage of reliability measures that are available from local low-level image features. In this paper we focus on color, but our method could easily be generalized by also considering texture or gradient features. We investigate the conditions for estimating accurate optical flow and correctly managing flow discontinuities by proposing a variant of the well-known Kanade-Lucas-Tomasi (KLT) approach. Starting from this classical approach, a consensual flow is estimated locally, while two additional criteria are proposed to evaluate its reliability. Propagation of reliable flow throughout the image is then performed using a specific distance criterion based on color and proximity. Experiments conducted on the Middlebury database show better results than the classic KLT and even global methods such as the well-known Horn and Schunck or Black and Anandan approaches.
Download
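The propagation step above, filling unreliable pixels from reliable ones weighted by color similarity and spatial proximity, can be sketched on a single scanline (the Gaussian weighting and the parameter values are illustrative assumptions; the paper's actual distance criterion may differ):

```python
import math

def propagate_flow(flows, colors, reliable, sigma_c=20.0, sigma_x=3.0):
    """Fill unreliable pixels with a color- and proximity-weighted average
    of reliable flow values (1D toy version of the propagation step)."""
    out = list(flows)
    for i in range(len(flows)):
        if reliable[i]:
            continue  # keep reliable estimates untouched
        wsum = fsum = 0.0
        for j in range(len(flows)):
            if not reliable[j]:
                continue
            # Bilateral-style weight: similar color AND nearby position.
            w = math.exp(-((colors[i] - colors[j]) ** 2) / (2 * sigma_c ** 2)
                         - ((i - j) ** 2) / (2 * sigma_x ** 2))
            wsum += w
            fsum += w * flows[j]
        if wsum > 0:
            out[i] = fsum / wsum
    return out
```

Because the weights respect color, propagation stops at color edges, which is how flow discontinuities are preserved.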

Paper Nr: 151
Title:

Multiple Target, Multiple Type Visual Tracking using a Tri-GM-PHD Filter

Authors:

Nathanael L. Baisa and Andrew Wallace

Abstract: We propose a new framework that extends the standard Probability Hypothesis Density (PHD) filter for multiple targets having three different types, taking into account not only background false positives (clutter), but also confusion between detections of different target types, which are in general different in character from background clutter. Our framework extends the existing Gaussian Mixture (GM) implementation of the PHD filter to create a tri-GM-PHD filter based on Random Finite Set (RFS) theory. The methodology is applied to real video sequences containing three types of multiple targets in the same scene, two football teams and a referee, using separate detections. Subsequently, Munkres’s variant of the Hungarian assignment algorithm is used to associate tracked target identities between frames. This approach is evaluated and compared to both raw detections and independent GM-PHD filters using the Optimal Sub-pattern Assignment (OSPA) metric and discrimination rate. This shows the improved performance of our strategy on real video sequences.
Download
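The identity-association step mentioned above (Munkres's variant of the Hungarian algorithm) finds the minimum-cost one-to-one matching between tracked targets across frames. For small matrices the optimum can be found by exhaustive search, which makes a compact stand-in for illustration (this brute-force solver is not the Munkres algorithm itself, only an equivalent for tiny inputs):

```python
from itertools import permutations

def optimal_assignment(cost):
    """Exhaustive stand-in for the Hungarian/Munkres algorithm on a small
    square cost matrix: returns the column assigned to each row."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)
```

The real Munkres algorithm solves the same problem in polynomial time; brute force over permutations is only viable for a handful of targets.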

Paper Nr: 214
Title:

Moving Object Detection by Connected Component Labeling of Point Cloud Registration Outliers on the GPU

Authors:

Michael Korn, Daniel Sanders and Josef Pauli

Abstract: Using a depth camera, the KinectFusion with Moving Objects Tracking (KinFu MOT) algorithm permits tracking the camera pose and building a dense 3D reconstruction of the environment which can also contain moving objects. The GPU processing pipeline allows this simultaneously and in real-time. During the reconstruction, as-yet-untracked moving objects are detected and new models are initialized. The original approach to detecting unknown moving objects is not very precise and may include wrong vertices. This paper describes an improvement of the detection based on connected component labeling (CCL) on the GPU. To achieve this, three CCL algorithms are compared, and their migration into KinFu MOT is described. The migration incorporates the 3D structure of the scene, and three plausibility criteria refine the detection. In addition, the potential benefits to the CCL runtime of CUDA Dynamic Parallelism and of skipping termination condition checks are investigated. Finally, the enhancement of the detection performance and the reduction of response time and computational effort are shown.
Download
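For readers unfamiliar with connected component labeling (CCL), the operation at the heart of the abstract above, here is the sequential CPU baseline that the paper's GPU variants parallelize: a BFS flood fill assigning one label per 4-connected region of a binary mask (this sketch is the textbook algorithm, not any of the three GPU implementations compared in the paper):

```python
from collections import deque

def label_components(mask):
    """4-connected component labeling of a binary 2D grid via BFS.
    Returns (labels, component_count); labels are 1-based, 0 = background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not labels[sy][sx]:
                count += 1
                labels[sy][sx] = count
                q = deque([(sy, sx)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            q.append((ny, nx))
    return labels, count
```

In the paper's setting the mask holds registration outliers, and each labeled component is a candidate moving object to be checked against the plausibility criteria.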

Paper Nr: 257
Title:

Recovering 3D Structure of Multilayer Transparent Objects from Multi-view Ray Tracing

Authors:

Atsunori Maeda, Fumihiko Sakaue and Jun Sato

Abstract: 3D reconstruction of object shape is one of the most important problems in the field of computer vision. Although many methods have been proposed up to now, the 3D reconstruction of transparent objects is still a very difficult unsolved problem. In particular, if the transparent objects have multiple layers with different refraction properties, recovering their 3D structure is quite difficult. In this paper, we propose a method for recovering the 3D structure of multilayer transparent objects. For this objective we introduce a new representation of 3D space using voxels with refraction properties, and recover the refraction properties of each voxel by ray tracing. The efficiency of the proposed method is shown by some preliminary experiments.
Download

Paper Nr: 10
Title:

Practical Scheduling of Computer Vision Functions

Authors:

Adrien Chan-Hon-Tong and Stephane Herbin

Abstract: A plug-and-play scheduler adapted to the computer vision context could boost the development of robotic platforms embedding a large variety of computer vision functions. In this paper, we take a step toward such a scheduler by offering a framework particularly adapted to time-constrained image classification. The relevance of our framework is established by experiments on real-life computer vision datasets and scenarios.
Download

Paper Nr: 39
Title:

Collaborative Contributions for Better Annotations

Authors:

Priyam Bakliwal, Guruprasad M. Hegde and C. V. Jawahar

Abstract: We propose an active learning based solution for efficient, scalable and accurate annotation of objects in video sequences. Recent computer vision solutions use machine learning, and their effectiveness relies on the availability of large amounts of accurately annotated data. In this paper, we focus on reducing the human annotation effort while simultaneously increasing tracking accuracy, to get precise, tight bounding boxes around an object of interest. We use a novel combination of two different tracking algorithms to track an object through a whole video sequence. We propose a sampling strategy to select the most informative frame, which is given for human annotation; this newly annotated frame is then used to update the previous annotations. Thus, through the collaborative efforts of both human and system, we obtain accurate annotations with minimal effort. Using the proposed method, user effort can be reduced by half without compromising annotation accuracy. We have quantitatively and qualitatively validated the results on eight different datasets.
Download

Paper Nr: 48
Title:

Regularised Energy Model for Robust Monocular Ego-motion Estimation

Authors:

Hsiang-Jen Chien and Reinhard Klette

Abstract: For two decades, ego-motion estimation has been an actively developing topic in computer vision and robotics. Existing motion estimation techniques rely on minimising an energy function based on re-projection errors. In this paper we augment such an energy function with an epipolar-geometry-derived regularisation term. The experiments show that, by taking these soft constraints into account, more reliable motion estimation is achieved. They also show that the implementation presented in this paper achieves remarkable accuracy, comparable to stereo vision approaches, with overall drift maintained under 2% over hundreds of metres.
Download

Paper Nr: 67
Title:

Parallelized Flight Path Prediction using a Graphics Processing Unit

Authors:

Maximilian Götzinger, Martin Pongratz, Amir M. Rahmani and Axel Jantsch

Abstract: Summarized under the term Transport-by-Throwing, robotic arms throwing objects to each other are a visionary system intended to complement the conventional, static conveyor belt. Despite much research and many novel approaches, no fully satisfactory solution to catching a ball with a robotic arm has been developed so far. A new approach based on memorized trajectories is currently being researched. This paper presents an algorithm for real-time image processing and flight prediction. Object detection and flight path prediction can be done fast enough for visual input data at a frame rate of 130 FPS (frames per second). Our experiments show that the average execution time for all necessary calculations on an NVidia GTX 560 TI platform is less than 7.7 ms; the maximum times of up to 11.7 ms require a small buffer for frame rates over 85 FPS. The results demonstrate that the use of a GPU (Graphics Processing Unit) considerably accelerates the entire procedure and can lead to execution rates 3.5 to 7.2 times faster than on a CPU. Prediction, which was the main focus of this research, is accelerated by a factor of 9.5 by executing the devised parallel algorithm on a GPU. Based on these results, further research could be carried out to examine the prediction system's reliability and limitations (compare (Pongratz, 2016)).
Download
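A common core of flight path prediction for a thrown ball is fitting a parabola z(t) = c + b·t + a·t² to the observed positions and extrapolating it. This pure-Python sketch (function name, sample data, and the normal-equations solver are illustrative; the paper's memorized-trajectory approach is different and GPU-parallelized) shows the least-squares fit:

```python
def fit_parabola(ts, zs):
    """Least-squares fit of z(t) = c + b*t + a*t**2 via 3x3 normal equations.
    Returns [c, b, a]."""
    # Normal equations: S[i][j] = sum t^(i+j), r[i] = sum z * t^i.
    S = [[sum(t ** (i + j) for t in ts) for j in range(3)] for i in range(3)]
    r = [sum(z * t ** i for t, z in zip(ts, zs)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(S[k][col]))
        S[col], S[piv] = S[piv], S[col]
        r[col], r[piv] = r[piv], r[col]
        for k in range(col + 1, 3):
            f = S[k][col] / S[col][col]
            for j in range(col, 3):
                S[k][j] -= f * S[col][j]
            r[k] -= f * r[col]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        coef[i] = (r[i] - sum(S[i][j] * coef[j] for j in range(i + 1, 3))) / S[i][i]
    return coef  # [c, b, a]
```

Once [c, b, a] is known, the catch point follows from evaluating or solving the parabola; the per-sample sums also parallelize naturally, which is where a GPU pays off.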

Paper Nr: 133
Title:

A Multi Patch Warping Approach for Improved Stereo Block Matching

Authors:

Mircea Paul Muresan, Sergiu Nedevschi and Radu Danescu

Abstract: Stereo cameras are a suitable solution for reconstructing the 3D information of observed scenes and, because of their low price and ease of setup and operation, they can be used in a wide range of applications, from autonomous driving to advanced driver assistance systems and robotics. Due to the high quality of their results, energy-based reconstruction methods like semi-global matching have gained a lot of popularity in recent years; their disadvantages are a large memory footprint and high computational complexity. In contrast, window-based matching methods have lower complexity and are leaner with respect to memory consumption, but they are more error prone, especially on surfaces which are not parallel to the image plane. In this paper we present a novel block matching scheme that improves the quality of local stereo correspondence algorithms. The first contribution of the paper is an original method for reliably reconstructing the environment on slanted surfaces. The second is a set of local constraints that filter out possible outlier disparity values. The third and final contribution is a refinement technique which improves the resulting disparity map. The proposed stereo correspondence approach has been validated on the KITTI stereo dataset.
Download

Paper Nr: 156
Title:

Sampling Density Criterion for Circular Structured Light 3D Imaging

Authors:

Deokwoo Lee and Hamid Krim

Abstract: 3D reconstruction work in computer vision has chiefly focused on the accuracy of reconstruction results, while efficient functional 3D camera systems have also become of interest in the mobile camera field. In this paper we propose the optimal sampling density, referred to as the minimum sampling rate, for 3D or high-dimensional signal reconstruction. There have been many research activities aiming to develop adaptive sampling theorems beyond the Shannon-Nyquist sampling theorem in signal processing, but a sampling theorem for 3D imaging or reconstruction remains an open and challenging topic and is a crucial part of our contribution in this paper. We hence propose an approach to determining the sampling rate (lower/upper bound) needed to recover 3D objects (surfaces) represented by a set of circular light patterns, where the criterion for the sampling rate is formulated using geometric characteristics of the light patterns overlaid on the surface. The proposed method is, in a sense, a foundation for a sampling theorem applied to 3D image processing, establishing a relationship between the frequency components and the geometric information of a surface.
Download

Paper Nr: 158
Title:

InLiDa: A 3D Lidar Dataset for People Detection and Tracking in Indoor Environments

Authors:

Cristina Romero-González, Álvaro Villena, Daniel González-Medina, Jesus Martínez-Gómez, Luis Rodríguez-Ruiz and Ismael García-Varea

Abstract: The objective evaluation of people detectors and trackers is essential for developing high-performance and general-purpose solutions to these problems. This evaluation can be easily done thanks to annotated datasets, but some combinations of sensors and scopes have not been extensively explored. Namely, the application of long-range 3D sensors in indoor environments for people detection purposes has been sparsely studied. To fill this gap, we propose InLiDa, a dataset that consists of six different sequences acquired in two different large indoor environments. The dataset is released with a set of tools for its use as a benchmark for people detection and tracking proposals. Baseline results obtained with state-of-the-art techniques for people detection and tracking are also presented.
Download

Paper Nr: 175
Title:

Multi Target Tracking by Linking Tracklets with a Convolutional Neural Network

Authors:

Yosra Dorai, Frederic Chausse, Sami Gazzah and Najoua Essoukri Ben Amara

Abstract: The computer vision community has developed many multi-object tracking methods for various fields. Here the focus is put on traffic scenes and video-surveillance applications, where tracking object features is challenging: objects can be partially or totally occluded and can appear differently over time. Usual detection methods generally fail under these conditions. To deal with this, a framework for multi-object tracking based on the linking of tracklets (mini-trajectories) is proposed. Despite the errors (false positives or missing detections) made by the Faster R-CNN detector, short-term similarities between Faster R-CNN detections are tracked, with the goal of obtaining tracklets over a given number of frames. We then associate tracklets and apply an update function to correct the trajectories. The experiments show that, on the one hand, our approach outperforms the detector in finding undetected objects, and on the other hand, the developed method eliminates false positives, demonstrating the effectiveness of the tracking.
Download

Paper Nr: 228
Title:

Pedestrian Tracking using a Generalized Potential Field Approach

Authors:

Florian Particke, Lucila Patiño-Studencki, Jörn Thielecke and Christian Feist

Abstract: Mobile robots and autonomous cars operate in an environment shared with pedestrians. In order to avoid accidents, it is important to track and predict human trajectories in an optimal way. In this paper, a generalized potential field approach for characterizing pedestrian movements is proposed, which goes beyond the well-known social force model. Its goal is to provide a generalized architecture for improving the tracking accuracy of pedestrians in surveillance situations. In comparison to other fusion approaches, the number of parameters is reduced and the parameters can be intuitively understood. For a simple scenario in a forum, the trajectories of pedestrians are predicted with a configured parameter set using the proposed model, and the predicted trajectories are compared to the real trajectories of the pedestrians. First results regarding the accuracy of the approach are presented.
Download
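The basic building block of a potential field model like the one above is a repulsive term whose negative gradient pushes a pedestrian away from obstacles (or other agents). A minimal sketch of one such term (the exponential potential, parameter names, and values are illustrative assumptions, not the paper's generalized model):

```python
import math

def repulsive_force(ped, obstacle, strength=2.0, radius=1.0):
    """Force on a pedestrian at 2D position `ped` from an exponentially
    decaying repulsive potential centered on `obstacle`. The force points
    away from the obstacle and weakens with distance."""
    dx, dy = ped[0] - obstacle[0], ped[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        return (0.0, 0.0)  # degenerate: no defined direction
    mag = strength * math.exp(-d / radius)
    return (mag * dx / d, mag * dy / d)
```

Summing such terms over obstacles and other pedestrians, plus an attractive goal term, gives a social-force-style prediction of the next velocity; the paper's contribution is generalizing this architecture with fewer, more interpretable parameters.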

Paper Nr: 253
Title:

Quantitative Comparison of Affine Invariant Feature Matching

Authors:

Zoltán Pusztai and Levente Hajder

Abstract: A key problem in computer vision is to apply accurate feature matchers between images; the comparison of such matchers is therefore essential. There are several survey papers in the field, and this study extends one of them: the aim of this paper is to compare competitive techniques on ground truth (GT) data generated by our structured-light 3D scanner with a rotating table. The discussed quantitative comparison is based on real images of six rotating 3D objects. The rival detectors in the comparison are: Harris-Laplace, Hessian-Laplace, Harris-Affine, Hessian-Affine, IBR, EBR, SURF, and MSER.
Download