VISAPP 2016 Abstracts


Area 1 - Image Formation and Preprocessing

Full Papers
Paper Nr: 31
Title:

Cornea-reflection-based Extrinsic Camera Calibration without a Direct View

Authors:

Kosuke Takahashi, Dan Mikami, Mariko Isogawa and Akira Kojima

Abstract: In this paper, we propose a novel method to extrinsically calibrate a camera to a 3D reference object that is not directly visible from the camera. We use the spherical human cornea as a mirror and calibrate the extrinsic parameters from the reflections of the reference points. The main contribution of this paper is to present a cornea-reflection-based calibration algorithm with a minimal configuration: five reference points on a single plane and one mirror pose. We derive a linear equation and obtain a closed-form solution of extrinsic calibration by introducing two key ideas. The first is to model the cornea as a virtual sphere, which enables us to estimate the center of the cornea sphere from its projection. The second is to use basis vectors to represent the positions of the reference points, which enables us to handle the 3D information of the reference points compactly. Moreover, to make our method robust to observation noise, we minimize the reprojection error while maintaining the valid 3D geometry of the solution based on the derived linear equation. We demonstrate the advantages of the proposed method with qualitative and quantitative evaluations using synthesized and real data.
Download

Paper Nr: 84
Title:

High-dimensional Guided Image Filtering

Authors:

Shu Fujita and Norishige Fukushima

Abstract: We present high-dimensional filtering that extends guided image filtering. Guided image filtering is an edge-preserving filter whose computational time is constant with respect to the size of the filtering kernel; this constant-time property is essential for edge-preserving filtering. When the kernel radius is large, however, guided image filtering suffers from noise because the local linear model, the key assumption of the filter, is violated. Unexpected noise and complex textures often violate this model. We therefore propose high-dimensional guided image filtering to avoid these problems. Our experimental results show that high-dimensional guided image filtering works robustly and efficiently for various image processing tasks.
Download
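For context, the classic single-channel guided filter that this paper extends fits a local linear model q = a*I + b in each window. The following is a minimal sketch of that standard formulation (not the authors' high-dimensional variant); the radius `r` and regularizer `eps` are illustrative defaults:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r=4, eps=1e-3):
    """Single-channel guided filter: output approximates a*I + b per window."""
    mean = lambda x: uniform_filter(x, size=2 * r + 1, mode='reflect')
    m_I, m_p = mean(I), mean(p)
    cov_Ip = mean(I * p) - m_I * m_p      # covariance of guide and input
    var_I = mean(I * I) - m_I * m_I       # variance of the guide
    a = cov_Ip / (var_I + eps)            # local linear coefficients
    b = m_p - a * m_I
    return mean(a) * I + mean(b)          # average coefficients, apply model
```

Note that every step is a box filter, which is what makes the cost independent of the kernel radius.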

Paper Nr: 102
Title:

Guided Filtering using Reflected IR Image for Improving Quality of Depth Image

Authors:

Takahiro Hasegawa, Ryoji Tomizawa, Yuji Yamauchi, Takayoshi Yamashita and Hironobu Fujiyoshi

Abstract: We propose the use of a reflected IR image as a guide image to improve the quality of depth images. Guided filtering is a technique that can quickly remove noise from a depth image by using a guide image. However, when an RGB image is used as the guide, the quality of the depth image is not improved if the RGB image contains texture information (such as surface patterns and shadows). In this study, our aim is to obtain a higher-quality depth image by using a guide image derived from a reflected IR image, which has less texture information and a high correlation with the depth image. Using a reflected IR image, it is possible to perform filtering while retaining edge information between objects of different materials, without being affected by textures on the surfaces of these objects. In evaluation experiments, we confirmed that a guide image based on a reflected IR image produces better denoising effects than an RGB guide image. From the results of upsampling tests, we also confirmed that the proposed IR-based guided filtering achieves a higher PSNR than RGB-based guided filtering.
Download

Paper Nr: 145
Title:

Super-resolution based on Edge-aware Sparse Representation Via Multiple Dictionaries

Authors:

Muhammad Haris and Hajime Nobuhara

Abstract: In this paper, we propose a new edge-aware super-resolution algorithm based on sparse representation via multiple dictionaries. The algorithm creates multiple pairs of dictionaries based on selective sparse representation. The dictionaries are clustered by edge orientation into five clusters: 0°, 45°, 90°, 135°, and non-directional. The proposed method is able to reduce blurring, blocking, and ringing artifacts in edge areas compared with other methods. The experiments use 900 natural grayscale images taken from the USC SIPI database and confirm that our proposed method is better than current state-of-the-art algorithms. For the evaluation, we use four indexes: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), the feature similarity (FSIM) index, and execution time. In 3x magnification experiments, our proposed method outperforms the other methods by 11%, 14%, and 6% in terms of PSNR, SSIM, and FSIM, respectively. It also has a shorter execution time than the compared methods.
Download

Paper Nr: 182
Title:

Affine Invariant Self-similarity for Exemplar-based Inpainting

Authors:

Vadim Fedorov, Pablo Arias, Gabriele Facciolo and Coloma Ballester

Abstract: This paper presents a new method for exemplar-based image inpainting using transformed patches. We build upon a recent affine invariant self-similarity measure which automatically transforms patches to compare them in an appropriate manner. As a consequence, it intrinsically extends the set of available source patches to copy information from. When comparing two patches, instead of searching for the appropriate patch transformation in a high-dimensional parameter space, our approach allows us to determine a single transformation from the texture content of both patches. We incorporate the affine invariant similarity measure in a variational formulation for inpainting and present an algorithm together with experimental results illustrating this approach.
Download

Paper Nr: 219
Title:

VOPT: Robust Visual Odometry by Simultaneous Feature Matching and Camera Calibration

Authors:

Rafael F. V. Saracchini, Carlos Catalina, Rodrigo Minetto and Jorge Stolfi

Abstract: In this paper we describe VOPT, a robust algorithm for visual odometry. It tracks features of the environment with known positions in space, which can be acquired through monocular or RGBD SLAM mapping algorithms. The main idea of VOPT is to jointly optimize the matching of feature projections on successive frames, the camera's extrinsic matrix, the photometric correction parameters, and the weight of each feature by a multi-scale iterative procedure. VOPT uses GPU acceleration to achieve real-time performance, and includes robust procedures for automatic initialization and recovery without user intervention. Our tests show that VOPT outperforms the PTAMM algorithm on challenging, publicly available videos.
Download

Paper Nr: 242
Title:

Non-local Means using Adaptive Weight Thresholding

Authors:

Asif Khan and Mahmoud R. El-Sakka

Abstract: Non-local means (NLM) is a popular image denoising scheme for reducing additive Gaussian noise. It uses a patch-based approach to find similar regions within a search neighborhood and estimates the denoised pixel as the weighted average of all pixels in the neighborhood. All weights are considered for averaging, irrespective of their values. This paper proposes an improved variant of the original NLM scheme that thresholds the weights of the pixels within the search neighborhood and uses the thresholded weights in the averaging step. The threshold value is adapted to the noise level of the given image. The proposed method is applied in a two-step approach to image denoising: the first step generates a basic estimate of the denoised image, and the second step applies the proposed method once more with a different smoothing strength. Experiments show that the denoising performance of the proposed method is better than that of the original NLM scheme and its variants. It also outperforms the state-of-the-art image denoising scheme BM3D, but only at low noise levels (sigma <= 80).
Download
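The weight-thresholding idea can be sketched with a naive NLM implementation. In this illustrative version the threshold `tau` is a fixed fraction of the maximum weight, a simplified stand-in for the paper's noise-adaptive threshold; all parameter names are ours:

```python
import numpy as np

def nlm_thresholded(img, h=10.0, tau=0.1, f=1, r=3):
    """Non-local means where weights below tau * max_weight are discarded
    before the weighted average. f: patch radius, r: search radius."""
    H, W = img.shape
    pad = f + r
    padded = np.pad(img.astype(float), pad, mode='reflect')
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad, j + pad
            ref = padded[ci - f:ci + f + 1, cj - f:cj + f + 1]
            w, v = [], []
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    patch = padded[ci + di - f:ci + di + f + 1,
                                   cj + dj - f:cj + dj + f + 1]
                    d2 = np.mean((ref - patch) ** 2)   # patch distance
                    w.append(np.exp(-d2 / h ** 2))
                    v.append(padded[ci + di, cj + dj])
            w, v = np.array(w), np.array(v)
            w[w < tau * w.max()] = 0.0                 # hard-threshold weights
            out[i, j] = np.sum(w * v) / np.sum(w)
    return out
```

Since the center patch always has weight 1, at least one weight survives the threshold and the average is well defined.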

Short Papers
Paper Nr: 5
Title:

High-level Performance Evaluation of Object Detection based on Massively Parallel Focal-plane Acceleration Requiring Minimum Pixel Area Overhead

Authors:

Eloy Parra-Barrero, Jorge Fernández-Berni, Fernanda D. V. R. Oliveira, Ricardo Carmona-Galán and Ángel Rodríguez-Vázquez

Abstract: Smart CMOS image sensors can leverage the inherent data-level parallelism and regular computational flow of early vision by incorporating elementary processors at the pixel level. However, this comes at the cost of extra area, which has a strong impact on sensor sensitivity, resolution and image quality. In this scenario, the fundamental challenge is to devise new strategies capable of boosting the performance of the targeted vision pipeline while minimally affecting the sensing function itself. Such strategies must also feature enough flexibility to accommodate particular application requirements. From these high-level specifications, we propose a focal-plane processing architecture tailored to speed up object detection via the Viola-Jones algorithm. This architecture requires only two extra transistors per pixel and simple peripheral digital circuitry that jointly make up a massively parallel reconfigurable processing lattice. A performance evaluation of the proposed scheme in terms of accuracy and acceleration for face detection is reported.
Download

Paper Nr: 16
Title:

Iterative Fuzzy Segmentation for Breast Skin-line Detection

Authors:

Asma Touil and Karim Kalti

Abstract: In mammographic images, extracting the different anatomical structures and tissue types is a critical requirement for breast cancer diagnosis. For instance, separating the breast and background regions increases the accuracy and efficiency of mammographic processing algorithms. In this paper, we propose a new region-based method to properly segment the breast and background regions in mammographic images. These regions are estimated by an Iterative Fuzzy Breast Segmentation (IFBS) method. Based on the Fuzzy C-Means (FCM) algorithm, the IFBS method iteratively increases the precision of an initially extracted breast region. The proposal is evaluated on the MIAS database. Experimental results show high accuracy and reliability in breast extraction when compared with ground-truth (GT) images.

Paper Nr: 61
Title:

Angular Uncertainty Refinement and Image Reconstruction Improvement in Cryo-electron Tomography

Authors:

Hmida Rojbani, Étienne Baudrier, Benoît Naegel, Loïc Mazo and Atef Hamouda

Abstract: In the field of cryo-electron tomography (cryo-ET), numerous approaches have been proposed to tackle the difficulties of the three-dimensional reconstruction problem, namely (1) missing and noisy data in the collected projections, (2) errors in the projection images due to acquisition problems, and (3) the need to process large data sets and to parameterize the contrast function of the electron microscope. In this paper, we present a novel approach for dealing with angular uncertainty in cryo-ET. To accomplish this task, we propose a cost function and minimize it with the nonlinear conjugate gradient optimization algorithm. We test the efficiency of our algorithm on both simulated and real data.
Download

Paper Nr: 66
Title:

Denoising 3D Computed Tomography Images using New Modified Coherence Enhancing Diffusion Model

Authors:

Feriel Romdhane, Faouzi Benzarti and Hamid Amiri

Abstract: The denoising step for Computed Tomography (CT) images is an important challenge in medical image processing, as these images are degraded by low resolution and noise. In this paper, we propose a new method for 3D CT denoising based on a coherence enhancing diffusion model. Quantitative measures such as PSNR, SSIM and RMSE are computed on a phantom CT image to assess the efficiency of our proposed model compared with a number of denoising algorithms. Furthermore, experimental results on real 3D CT data show that this approach is effective and promising in removing noise while preserving details.
Download

Paper Nr: 93
Title:

Stitching Grid-wise Atomic Force Microscope Images

Authors:

Mathias Vestergaard, Stefan Bengtson, Malte Pedersen, Christian Rankl and Thomas B. Moeslund

Abstract: Atomic Force Microscopes (AFM) are able to capture images with a resolution at the nanometre scale. Due to this high resolution, the area covered per image is relatively small, which can be problematic when surveying a sample. A system able to stitch AFM images has been developed to solve this problem. The images exhibit tilt, offset and scanner bow, which are counteracted by subtracting a polynomial from each scan line. To stitch the images properly, template selection is done by analyzing texture and using a voting scheme. Grids of 3x3 images have been successfully leveled and stitched.
Download

Paper Nr: 110
Title:

Variable Exposure Time Imaging for Obtaining HDR Images

Authors:

Saori Uda, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a novel imaging method called variable exposure time imaging for obtaining HDR images from a single image capture. In this method, we control the exposure time pixel by pixel, so each pixel in an image taken by this method is obtained under a different exposure time. We call such an image a variable exposure image. From the variable exposure image, we can synthesize a high dynamic range image efficiently, since we can optimize the exposure time pixel by pixel according to the input intensity at each pixel. Experimental results show the efficiency of the proposed imaging method.
Download
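Assuming a linear sensor response, the per-pixel HDR reconstruction reduces to dividing each recorded value by its own exposure time, with saturated pixels masked out. The following sketch illustrates that idea; the function and parameter names are ours, not the paper's:

```python
import numpy as np

def radiance_from_variable_exposure(img, t, saturation=255.0):
    """Recover per-pixel radiance from a variable exposure image.
    img: recorded intensities; t: per-pixel exposure times (same shape).
    Saturated pixels carry no radiance information and become NaN."""
    img = img.astype(float)
    return np.where(img < saturation, img / t, np.nan)
```

In the actual method the per-pixel exposure times would themselves be chosen so that few pixels saturate or underexpose.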

Paper Nr: 116
Title:

Virtual Correction of Eyesight using Visual Illusions

Authors:

Midori Aoki, Fumihiko Sakaue and Jun Sato

Abstract: Degradation of eyesight is a serious problem, and the number of weak-sighted people has increased rapidly in recent years because of the spread of tablets and smartphones. Weak-sighted people often wear glasses or contact lenses to correct their eyesight; however, these corrective devices can be painful for them. Thus, in this paper, we propose a novel method for displaying visual information that lets weak-sighted people see rectified images on displays. In particular, we show that visual illusions in the human visual system can be used efficiently for correcting eyesight. By using our method, weak-sighted people can see clear images on the display without wearing glasses or contact lenses. The efficiency of the proposed method is tested using synthetic signals and real images.
Download

Paper Nr: 141
Title:

Detection of Abnormal Gait from Skeleton Data

Authors:

Meng Meng, Hassen Drira, Mohamed Daoudi and Jacques Boonaert

Abstract: Human gait analysis has become of special interest to the computer vision community in recent years, and recently developed commodity depth sensors bring new opportunities in this domain. In this paper, we study human gait using non-intrusive sensors (Kinect 2) in order to classify normal and abnormal human gaits. We propose the evolution of inter-joint distances as a spatio-temporal intrinsic feature that has the advantage of being robust to location. We achieve 98% accuracy in classifying normal and abnormal gaits and show some relevant features that are able to distinguish them.
Download

Paper Nr: 147
Title:

Parameter Estimation for HOSVD-based Approximation of Temporally Coherent Mesh Sequences

Authors:

Michał Romaszewski and Przemysław Głomb

Abstract: This paper is focused on the problem of parameter selection for the approximation of animated 3D meshes (Temporally Coherent Mesh Sequences, TCMS) using Higher Order Singular Value Decomposition (HOSVD). The main application of this approximation is data compression. Traditionally, the approximation was done using matrix decomposition, but recently proposed tensor methods (e.g. HOSVD) promise to be more effective. However, parameter selection for tensor-based methods is more complex and difficult than for matrix decomposition. We focus on the key parameter, the value of the N-rank, which has a major impact on the data reduction rate and the approximation error. We present the effect of the N-rank choice on approximation performance in the form of a rate-distortion curve, and show how to create this curve quickly by estimating the reconstruction error resulting from the N-rank approximation of TCMS data. We also inspect the reliability of the resulting estimator. The proposed method improves the performance of HOSVD in practical TCMS approximation.
Download
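To make the N-rank parameter concrete, here is a minimal truncated HOSVD in NumPy. It is purely illustrative: a TCMS would first be arranged as a 3rd-order tensor (e.g. vertices × coordinates × frames), and the chosen N-ranks control the trade-off between compression and reconstruction error:

```python
import numpy as np

def mode_dot(T, M, n):
    """Multiply tensor T by matrix M along mode n."""
    shp = list(T.shape)
    unfolded = np.moveaxis(T, n, 0).reshape(shp[n], -1)   # mode-n unfolding
    rest = [shp[i] for i in range(T.ndim) if i != n]
    return np.moveaxis((M @ unfolded).reshape([M.shape[0]] + rest), 0, n)

def hosvd_approx(T, ranks):
    """Truncated HOSVD: project onto the leading rank-n singular vectors
    of each mode-n unfolding, then reconstruct."""
    Us = []
    for n in range(T.ndim):
        U, _, _ = np.linalg.svd(np.moveaxis(T, n, 0).reshape(T.shape[n], -1),
                                full_matrices=False)
        Us.append(U[:, :ranks[n]])
    core = T
    for n, U in enumerate(Us):
        core = mode_dot(core, U.T, n)      # compress into the core tensor
    approx = core
    for n, U in enumerate(Us):
        approx = mode_dot(approx, U, n)    # decompress
    return approx
```

Plotting the relative error ||T - approx|| / ||T|| against the stored coefficient count for a range of N-ranks traces exactly the kind of rate-distortion curve the paper studies.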

Paper Nr: 166
Title:

ZMP Trajectory from Human Body Locomotion Dynamics Evaluated by Kinect-based Motion Capture System

Authors:

Igor Danilov, Bulat Gabbasov, Ilya Afanasyev and Evgeni Magid

Abstract: This article presents methods for zero moment point (ZMP) trajectory evaluation of human locomotion by processing biomechanical data recorded with a Kinect-based motion capture (MoCap) system. Our MoCap system consists of four Kinect 2 sensors and uses the commercial iPi Soft markerless tracking and visualization technology. We apply the iPi Mocap Studio software to multi-depth-sensor video recordings, acquiring visual and biomechanical human gait data, including linear and angular coordinates, velocities, accelerations and the center of mass (CoM) position of each joint. Finally, we compute the ZMP and ground projection of the CoM (GCoM) trajectories from human body dynamics in MATLAB by two methods, in which the human body is treated as (1) a single mass point or (2) multiple mass points (with subsequent ZMP calculation via the inertia tensor). A further objective of our research is to reproduce a human-like gait with the Russian biped robot AR-601M.
Download
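For the single-mass-point case, the ZMP follows from the CoM trajectory via the standard cart-table relation p_x = x - z*ẍ/(z̈ + g). A minimal sketch in Python (the paper uses MATLAB; finite-difference accelerations, names are ours):

```python
import numpy as np

def zmp_point_mass(com, dt, g=9.81):
    """ZMP ground coordinates from a CoM trajectory, treating the body
    as a single point mass. com: (T, 3) positions; dt: sample period."""
    acc = np.gradient(np.gradient(com, dt, axis=0), dt, axis=0)
    denom = acc[:, 2] + g                       # vertical acceleration + gravity
    px = com[:, 0] - com[:, 2] * acc[:, 0] / denom
    py = com[:, 1] - com[:, 2] * acc[:, 1] / denom
    return np.stack([px, py], axis=1)
```

For a stationary CoM the horizontal accelerations vanish, so the ZMP coincides with the ground projection of the CoM (GCoM), as expected.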

Paper Nr: 168
Title:

A New Efficient Robustness Evaluation Approach for Video Watermarking based on Crowdsourcing

Authors:

Asma Kerbiche, Saoussen Ben Jabra, Ezzeddine Zagrouba and Vincent Charvillat

Abstract: Signature robustness is the most important criterion that a watermarking approach must satisfy. However, existing watermarking evaluation protocols test simple attacks such as rotation, cropping, and compression, but do not consider several dangerous attacks such as camcording, which is increasingly used against videos. In this paper, a new robustness evaluation approach for video watermarking is proposed. It is based on an online attack game using crowdsourcing. The proposed game is provided to different users, who try to destroy an embedded signature by applying one or several combined attacks on a given marked video. Based on the users' choices, the most important attacks can be selected. Moreover, users must not destroy the visual quality of the marked video in order to evaluate the tested watermarking approach. Experimental results show that the proposed approach efficiently evaluates the robustness of any video watermarking method. In addition, the obtained results confirm that the camcording attack is very important in the video watermarking evaluation process.
Download

Paper Nr: 230
Title:

Performance of Interest Point Descriptors on Hyperspectral Images

Authors:

Przemysław Głomb and Michał Cholewa

Abstract: Interest point descriptors (e.g. Scale Invariant Feature Transform, SIFT, or Speeded-Up Robust Features, SURF) are often used both for classic image processing tasks (e.g. mosaic generation) and for higher-level machine learning tasks (e.g. segmentation or classification). Hyperspectral images are gaining popularity as a potent data source for scene analysis, material identification, anomaly detection and process state estimation. The structure of hyperspectral images is much more complex than that of traditional color or monochrome images, as they comprise a large number of bands, each corresponding to a narrow range of frequencies. Because image properties vary across bands, applying interest point descriptors to them is not straightforward. To the best of our knowledge, there has been no study to date of the performance of interest point descriptors on hyperspectral images that simultaneously integrates a number of methods and uses a dataset with significant geometric transformations. Here, we study four popular methods (SIFT, SURF, BRISK, ORB) applied to a complex scene recorded from several viewpoints. We present experimental results by observing how well the methods estimate the 3D camera positions, which we propose as a general performance measure.
Download

Paper Nr: 247
Title:

Spatio-temporal Center-symmetric Local Derivative Patterns for Objects Detection in Video Surveillance

Authors:

Marwa Jmal, Wided Souidene and Rabah Attia

Abstract: Nowadays, more attention is being focused on background subtraction methods because of their importance in many computer vision applications. Most of the proposed approaches are classified as pixel-based due to their low complexity and processing speed; others are considered spatio-temporal, as they take the surroundings of each analyzed pixel into account. In this context, we propose a new texture descriptor that is suitable for this task. We benefit from the advantages of local binary pattern variants to introduce a novel spatio-temporal center-symmetric local derivative pattern (STCS-LDP). Several improvements and restrictions are introduced at the neighboring-pixel comparison level to make the descriptor less sensitive to noise while maintaining robustness to illumination changes. We also present a simple background subtraction algorithm based on our STCS-LDP descriptor. Experiments on multiple video sequences show that our method is efficient and produces results comparable to the state of the art.
Download

Paper Nr: 86
Title:

Gender Recognition using Hog with Maximized Inter-Class Difference

Authors:

M. E. Yildirim, O. F. Ince, Y. B. Salman, J. Kwan Song, J. Sik Park and B. Woo Yoon

Abstract: Several methods and features have been proposed for the gender recognition problem. The histogram of oriented gradients (HOG) is a widely used feature in image processing. This study proposes a gender recognition method using full-body features: human bodies seen from the side and front views are represented by HOG. Using all bins of the histogram requires a long training time, so the descriptor size should be reduced to decrease the computation time. The inter-class difference was obtained as a vector and sorted in descending order, and the bins with the largest values were selected from this vector. Random forest and AdaBoost classifiers were used for recognition. In both tests, the classifier using the first 100 bins with maximum difference gave the optimum performance in terms of accuracy and computation time. Although AdaBoost performed faster, random forest was more accurate for full-body gender recognition.
Download
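The bin-selection step described above reduces to ranking histogram bins by the absolute difference of the per-class mean descriptors. A minimal sketch (function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def top_discriminative_bins(X, y, k=100):
    """Return indices of the k HOG bins whose inter-class mean difference
    is largest. X: (samples, bins) descriptors; y: binary labels (0/1)."""
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    return np.argsort(diff)[::-1][:k]          # bins sorted by decreasing gap
```

Training then proceeds on `X[:, bins]` only, shrinking the descriptor from the full histogram to k values.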

Paper Nr: 164
Title:

A Novel Key Frame Extraction Approach for Video Summarization

Authors:

Hana Gharbi, Sahbi Bahroun and Ezzeddine Zagrouba

Abstract: Video summarization is a principal task in video analysis and indexing algorithms. In this paper, we present a new algorithm for video key frame extraction, one of the basic procedures for video retrieval and summarization. Our approach is based on interest point description and a repeatability measurement. Before key frame extraction, the video is segmented into shots. Then, for each shot, we detect interest points in all frames and compute the shot's repeatability matrix. Finally, we apply PCA and hierarchical agglomerative clustering (HAC) to extract the key frames.
Download

Paper Nr: 165
Title:

VAR: a New Metric of Cryo-electron Tomography Resolution

Authors:

Hmida Rojbani and Atef Hamouda

Abstract: Motivated by the goal of better understanding biological cells, scientists use the Transmission Electron Microscope (TEM) to investigate their inner structures. Cryo-electron tomography (cryo-ET) offers the possibility of reconstructing the 3D structure of a cell slice by tilting it to different angles. The resolution limit is the biggest challenge in cryo-ET. The two phases involved in increasing the resolution are the acquisition phase and the reconstruction phase; in this work we focus on the latter, as biologists deal with acquisition issues during the acquisition phase itself. The resolution of a reconstruction depends on many factors, such as (1) noisy and missing information in the collected projection data, (2) the capacity to process large data sets, (3) the parameterization of the contrast function of the microscope, and (4) errors in the tilt angles used for the projections. In this paper, we present a new method to evaluate the resolution of a reconstruction algorithm. The proposed method is then used to show the relation between errors in the tilt angles used for projection and the degradation of the resolution. The resolution evaluation tests are made with different reconstruction methods (analytic and algebraic) applied to synthetic and real data.
Download

Paper Nr: 193
Title:

Focus-aid Signal for Ultra High Definition Cameras

Authors:

Seiichi Gohshi and Hidetoshi Ito

Abstract: 4K and 8K systems are very promising media and offer highly realistic images. Such high-resolution video systems provide completely different impressions than HDTV. However, it is difficult, even for a professional cameraman, to adjust the focus of a 4K/8K camera using only the small viewfinder on the camera. Indeed, it is sometimes difficult even to focus an HDTV camera with such a small viewfinder, and since 4K has four times the resolution of HDTV, it is almost impossible to adjust the focus by eye with a viewfinder the same size as that of an HDTV camera. Therefore, in content creation, large monitors are generally used to adjust the focus; however, large monitors are bulky and do not fit practical requirements, which means that technical assistance is required. A possible solution to this problem is to detect the sharp edges created by high-frequency elements in well-focused images and superimpose those edges on the image; the cameraman can then adjust the focus by maximizing the superimposed edges. However, conventional edge detection technologies are vulnerable to noise, which limits this technique to environments with good lighting conditions. This paper introduces a novel signal processing method that enables cameramen to adjust a 4K camera focus using their eyes.
Download

Paper Nr: 216
Title:

Exploiting the Kinematic of the Trajectories of the Local Descriptors to Improve Human Action Recognition

Authors:

Adel Saleh, Miguel Angel Garcia, Farhan Akram, Mohamed Abdel-Nasser and Domenec Puig

Abstract: This paper presents a video representation that exploits the properties of the trajectories of local descriptors in human action videos. We use the spatio-temporal information carried by the trajectories to extract kinematic properties: the tangent vector, normal vector, bi-normal vector and curvature. The results show that the proposed method provides results comparable to state-of-the-art methods while outperforming them in terms of time complexity.
Download

Paper Nr: 222
Title:

Low Resolution Sparse Binary Face Patterns

Authors:

Swathikiran Sudhakaran and Alex Pappachen James

Abstract: Automated recognition of low resolution face images from thumbnails represents a challenging image recognition problem. We propose the sequential fusion of wavelet transform computation, local binary patterns and sparse coding of images to accurately extract facial features from thumbnail images. A minimum distance classifier with Shepard's similarity measure is used as the classifier. The proposed method shows robust recognition performance when tested on face datasets (Yale B, AR and PUT) and compared against benchmark techniques for very low resolution (i.e. less than 45x45 pixels) face image recognition. Possible applications of the proposed thumbnail recognition include contextual search, intelligent image/video sorting and grouping, and face image clustering.
Download

Paper Nr: 229
Title:

Taxonomy of 3D Sensors - A Survey of State-of-the-Art Consumer 3D-Reconstruction Sensors and their Field of Applications

Authors:

Julius Schöning and Gunther Heidemann

Abstract: Sensors used for 3D-reconstruction determine both the quality of the results and the nature of the reconstruction algorithms. The spectrum of such sensors ranges from expensive to low cost, from highly specialized to off-the-shelf, and from stereo to mono sensors. The list of available sensors has been growing steadily and is becoming difficult to manage, even in the consumer sector. We provide a survey of existing consumer 3D sensors and a taxonomy for their assessment. This taxonomy provides information about recent developments, application domains and functional criteria. The focus of this survey is on low cost 3D sensors at an accessible price. Prototypes developed in academia are also very interesting, but the price of such sensors cannot easily be estimated. We try to provide an unbiased basis for decision-making for specific 3D sensors. In addition to the assessment of existing technologies, we provide a list of preferable features for 3D reconstruction sensors. We close with a discussion of common problems in available sensor systems and discuss common fields of application, as well as areas which could benefit from the application of such sensors.
Download

Paper Nr: 243
Title:

BM3D Image Denoising using Learning-based Adaptive Hard Thresholding

Authors:

Farhan Bashar and Mahmoud R. El-Sakka

Abstract: Block Matching and 3D Filtering (BM3D) is considered the current state-of-the-art algorithm for additive image denoising. However, this algorithm uses a fixed hard threshold to attenuate noise in a 3D block. Experiments show that this fixed hard thresholding deteriorates the performance of BM3D because it does not consider the context of the corresponding blocks. We propose a learning-based adaptive hard thresholding method to solve this problem and find an excellent improvement over the original BM3D. In addition, the BM3D algorithm requires the noise level of the input image as an input, which is not practical in real applications, so we also add a noise level estimation method to our algorithm without degrading its performance. Experimental results demonstrate that our proposed algorithm outperforms BM3D in both objective and subjective fidelity criteria.
Download

Area 2 - Image and Video Analysis

Full Papers
Paper Nr: 4
Title:

On the Contribution of Saliency in Visual Tracking

Authors:

Iman Alikhani, Hamed R.-Tavakoli, Esa Rahtu and Jorma Laaksonen

Abstract: Visual target tracking is a long-standing problem in the domain of computer vision, and numerous methods have been proposed over the years. A recent trend in visual tracking has been target representation and tracking using saliency models inspired by the attentive mechanism of the human visual system. Motivated to investigate the usefulness of such target representation schemes, we study several target representation techniques for the mean-shift tracking framework, where the feature space can include color, texture, saliency, and gradient orientation information. In particular, we study the usefulness of the joint distributions of color-texture, color-saliency, and color-orientation in comparison with the color distribution alone. The performance is evaluated using the visual object tracking (VOT) 2013 benchmark, which provides a systematic mechanism and a database for the assessment of tracking algorithms. We summarize the results in terms of accuracy and robustness, and discuss the usefulness of saliency-based target tracking.
Download
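The kind of joint target representation compared in this abstract can be sketched as a two-channel joint histogram plus the Bhattacharyya coefficient that mean-shift tracking maximises between the target and candidate models. The bin counts and the pairing of channels (e.g. colour with saliency) are illustrative choices, not the paper's exact configuration.

```python
import math

def joint_histogram(chan_a, chan_b, bins_a=8, bins_b=8):
    """Normalised joint histogram of two per-pixel channels scaled to [0, 1)."""
    hist = [[0.0] * bins_b for _ in range(bins_a)]
    for a, b in zip(chan_a, chan_b):
        ia = min(int(a * bins_a), bins_a - 1)
        ib = min(int(b * bins_b), bins_b - 1)
        hist[ia][ib] += 1.0
    n = float(len(chan_a))
    return [[v / n for v in row] for row in hist]

def bhattacharyya(p, q):
    """Similarity in [0, 1] between two normalised joint histograms."""
    return sum(math.sqrt(pv * qv) for prow, qrow in zip(p, q)
               for pv, qv in zip(prow, qrow))
```

A candidate region whose joint histogram matches the target model perfectly yields a coefficient of 1.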

Paper Nr: 123
Title:

Classification-driven Active Contour for Dress Segmentation

Authors:

Lixuan Yang, Helena Rodriguez, Michel Crucianu and Marin Ferecatu

Abstract: In this work we propose a dedicated object extractor for dress segmentation in fashion images, combining local information with prior learning. First, a person detector is applied to localize sites in the image that are likely to contain the object. Then, an intra-image two-stage learning process is developed to roughly separate foreground pixels from the background. Finally, the object is finely segmented by employing an active contour algorithm that takes into account the previous segmentation and injects specific knowledge about local curvature into the energy function. The method is validated on a database of manually segmented images. We show examples of both successful segmentations and difficult cases, quantitatively analyze each component, and compare with the well-known GrabCut foreground extraction method.
Download

Paper Nr: 125
Title:

Facial Asymmetry Assessment from 3D Shape Sequences: The Clinical Case of Facial Paralysis

Authors:

Paul Audain Desrosiers, Yasmine Bennis, Boulbaba Ben Amor, Mohamed Daoudi and Pierre Guerreschi

Abstract: This paper addresses the problem of quantifying facial asymmetry from dynamic 3D data. We investigate here the role of 4D (i.e. 3D+time) data in revealing the amount of both static and dynamic asymmetry, in a clinical use-case of facial paralysis. The final goal is to provide clinicians with tools and solutions for facial paralysis assessment and monitoring that offer both qualitative and quantitative evaluations. To this end, the approach proposed here considers 3D facial sequences and adopts a recently-developed Riemannian approach for facial deformation analysis. After a preprocessing step, each frame of a given 3D sequence is approximated by an indexed collection of elastic radial curves. Riemannian shape analysis of the obtained curves and their symmetrical counterparts, both elements of the same shape space, gives rise to a feature vector, called Dense Scalar Fields (DSFs). The use of these DSFs reveals the amount of bilateral asymmetry of the face when conveying expressions. That is, a given 3D frame is first reflected with respect to the YZ-plane, then compared to the obtained reflection using the DSFs. To exemplify the use of the proposed approach, a new dataset of patients has been collected before and after injecting Botulinum Toxin (BT) into the related facial muscles. Experimental results obtained on this dataset show that the proposed approach allows clinicians to evaluate the facial asymmetry before and after the medical treatment.
Download
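The reflection step described above reduces to mirroring the 3D face about the YZ-plane (negating the x coordinate) and comparing each point with its mirrored counterpart. The toy sketch below shows only that step with a plain mean-distance score; the paper's actual features (Dense Scalar Fields over elastic radial curves) are far richer, and the point-matching is assumed to be given.

```python
import math

def reflect_yz(points):
    """Mirror a list of 3D points about the YZ-plane (x -> -x)."""
    return [(-x, y, z) for (x, y, z) in points]

def asymmetry_score(points, matched_reflection):
    """Mean Euclidean distance between points and their mirrored matches;
    zero for a perfectly bilaterally symmetric shape."""
    d = [math.dist(p, q) for p, q in zip(points, matched_reflection)]
    return sum(d) / len(d)
```

For a perfectly symmetric point set, matching each point to its mirror image yields a score of exactly zero.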

Paper Nr: 138
Title:

Simultaneous Surface Segmentation and BRDF Estimation via Bayesian Methods

Authors:

Malte Lenoch, Thorsten Wilhelm and Christian Wöhler

Abstract: We present a novel procedure that achieves segmentation of an arbitrary surface relying on the maximum a posteriori estimation of its reflectance parameters. The number of surface segments is computed by the algorithm without user intervention. We employ Markov Chain Monte Carlo algorithms to compute the probability distributions associated with the model parameters of a Blinn reflectance model based on the input images. The fact that parameters are treated as probability distributions enables us to directly draw additional information about the certainty of the estimation from the results, for both the parameters and the segmentation borders. Reversible jump MCMC allows us to include an unspecified number of change points in the computation, such that the algorithm explores model and parameter space at the same time and derives a segmentation of the surface from the input data. To accomplish this, we extend the existing concept of change points to two dimensions, introducing a number of necessary new regulations and properties. The performance of the segmentation and reflectance estimation is evaluated on a synthetic and a real-world dataset.
Download

Paper Nr: 149
Title:

Leveraging Gabor Phase for Face Identification in Controlled Scenarios

Authors:

Yang Zhong and Haibo Li

Abstract: Gabor features have been widely employed in solving face recognition problems in controlled scenarios. To construct discriminative face features from the complex Gabor space, the amplitude information is commonly preferred, while the phase is not well utilized due to its sensitivity to spatial shifts. In this paper, we address the problem of face recognition in controlled scenarios. Our focus is on the selection of a suitable signal representation and the development of a better strategy for face feature construction. We demonstrate that, through our Block Matching scheme, Gabor phase information is powerful enough to improve the performance of face identification. Compared to state-of-the-art Gabor filtering based approaches, the proposed algorithm has much lower algorithmic complexity, mainly because our Block Matching scheme enables the use of high-definition Gabor phase, so that a single-scale Gabor frequency band is sufficient for discrimination. Furthermore, no learning process is involved in the facial feature construction, which avoids the risk of building a database-dependent algorithm. Benchmark evaluations show that the proposed learning-free algorithm outperforms state-of-the-art Gabor approaches and is even comparable to Deep Learning solutions.
Download

Paper Nr: 179
Title:

Robust Real-time Tracking Guided by Reliable Local Features

Authors:

Marcos D. Zuniga and Cristian M. Orellana

Abstract: This work presents a new light-weight approach for robust real-time tracking in difficult environments, in situations including occlusion and varying illumination. The method increases the robustness of tracking using reliability measures from the segmentation phase, improving the selection and tracking of reliable local features for overall object tracking. The local descriptors are characterised by colour, structural and segmentation features to provide robust detection, while their reliability is characterised by descriptor distance, spatial-temporal coherence, contrast, and illumination criteria. These reliability measures are utilised to weight the contribution of the local features in the decision process for estimating the real position of the object. The proposed method can be adapted to any visual system that performs an initial segmentation phase based on background subtraction, followed by multi-target tracking using dynamic models. First, we present how to extract pixel-level reliability measures from algorithms based on background modelling. Then, we present how to use these measures to derive feature-level reliability measures for mobile objects. Finally, we describe the process of utilising this information to track an object in different environmental conditions. Preliminary results show a good capability of the approach for improving object localisation in the presence of low illumination.
Download

Short Papers
Paper Nr: 14
Title:

A Multiresolution 3D Morphable Face Model and Fitting Framework

Authors:

Patrik Huber, Guosheng Hu, Rafael Tena, Pouria Mortazavian, Willem P. Koppen, William J. Christmas, Matthias Rätsch and Josef Kittler

Abstract: 3D Morphable Face Models are a powerful tool in computer vision. They consist of a PCA model of face shape and colour information and allow the reconstruction of a 3D face from a single 2D image. 3D Morphable Face Models are used for 3D head pose estimation, face analysis, face recognition, and, more recently, facial landmark detection and tracking. However, they are not as widely used as 2D methods, since the process of building and using a 3D model is much more involved. In this paper, we present the Surrey Face Model, a multi-resolution 3D Morphable Model that we make available to the public for non-commercial purposes. The model contains different mesh resolution levels and landmark point annotations as well as metadata for texture remapping. Accompanying the model is a lightweight open-source C++ library designed with simplicity and ease of integration as its foremost goals. In addition to basic functionality, it contains pose estimation and face frontalisation algorithms. With the tools presented in this paper, we aim to close two gaps. First, by offering different model resolution levels and fast fitting functionality, we enable the use of a 3D Morphable Model in time-critical applications like tracking. Second, the software library makes it easy for the community to adopt the 3D Morphable Face Model in their research, and it offers a public place for collaboration.
Download
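The core equation behind any PCA-based morphable model is that a face shape is the mean shape plus a linear combination of basis vectors, s = mean + B * alpha. The sketch below illustrates only this generic equation; the vertex layout and basis values are placeholders, not data from the Surrey Face Model.

```python
def morphable_shape(mean, basis, alpha):
    """Reconstruct a shape as mean + sum_i alpha[i] * basis[i].
    mean: flat list of vertex coordinates; basis: list of PCA components
    (each the same length as mean); alpha: per-component coefficients."""
    shape = list(mean)
    for a, component in zip(alpha, basis):
        for i, v in enumerate(component):
            shape[i] += a * v
    return shape
```

Fitting a morphable model to an image amounts to searching for the coefficients `alpha` (plus pose and camera parameters) that best explain the observed 2D face.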

Paper Nr: 15
Title:

A New Face Beauty Prediction Model based on Blocked LBP

Authors:

Guangming Lu, Xihua Xiao and Fangmei Chen

Abstract: In recent years, many scholars have used machine learning methods to analyze facial beauty and achieved good results, but some problems remain: for instance, the face beauty degrees are not widely distributed, and previous works emphasized face geometry features rather than texture features. This paper proposes a novel face beauty prediction model based on Blocked Local Binary Patterns (BLBP). First, we obtain the face area with an ASM model; then, the BLBP algorithm is applied to extract texture features. Finally, we use the Pearson correlation coefficient between the facial beauty scores output by our algorithm and the subjective judgments of human raters for evaluation. Experimental results show that the method can predict the beauty of face images automatically and effectively.
Download
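Two textbook ingredients named in this abstract can be sketched directly: the basic 3x3 LBP code (computed per image block in a blocked-LBP scheme) and the Pearson correlation used to compare predicted scores with human ratings. These are standard formulations, not the paper's exact implementation.

```python
import math

def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c); img is a 2D list of intensities.
    Each neighbour >= centre contributes one bit to an 8-bit code."""
    centre = img[r][c]
    neigh = [img[r-1][c-1], img[r-1][c], img[r-1][c+1], img[r][c+1],
             img[r+1][c+1], img[r+1][c], img[r+1][c-1], img[r][c-1]]
    return sum((1 << i) for i, v in enumerate(neigh) if v >= centre)

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In a blocked scheme, the image is divided into a grid, an LBP histogram is computed per block, and the concatenated histograms form the texture feature.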

Paper Nr: 19
Title:

Joint Color and Depth Segmentation based on Region Merging and Surface Fitting

Authors:

Giampaolo Pagnutti and Pietro Zanuttigh

Abstract: The recent introduction of consumer depth cameras has opened the way to novel segmentation approaches exploiting depth data together with color information. This paper proposes a region merging segmentation scheme that jointly exploits the two clues. First, a set of multi-dimensional vectors is built considering the 3D spatial position, the surface orientation and the color data associated with each scene sample. Normalized cuts spectral clustering is applied to the obtained vectors in order to over-segment the scene into a large number of small segments. Then an iterative merging procedure is used to recombine the segments into the regions corresponding to the various objects and surfaces. The proposed algorithm tries to combine close compatible segments and uses a NURBS surface fitting scheme on the considered segments in order to understand whether the regions that are candidates for merging correspond to a single surface. The comparison with state-of-the-art methods shows how the proposed method provides an accurate and reliable scene segmentation.
Download

Paper Nr: 33
Title:

Benchmarking RGB-D Segmentation: Toy Dataset of Complex Crowded Scenes

Authors:

Aleksi Ikkala, Joni Pajarinen and Ville Kyrki

Abstract: In this paper we present a new RGB-D dataset captured with the Kinect sensor. The dataset is composed of typical children’s toys and contains a total of 449 RGB-D images along with their annotated ground truth images. Compared to existing RGB-D object segmentation datasets, the objects in our proposed dataset have more complex shapes and less texture. The images are also crowded and thus highly occluded. Three state-of-the-art segmentation methods are benchmarked on the dataset. These methods attack the problem of object segmentation from different starting points, providing a comprehensive view of the properties of the proposed dataset as well as of state-of-the-art performance. The results are mostly satisfactory, but there remains plenty of room for improvement. This novel dataset thus poses the next challenge in the area of RGB-D object segmentation.
Download

Paper Nr: 34
Title:

Development of Defect Verification System of IC Lead Frame Surface using a Ring-lighting

Authors:

Yoshiharu Nakamura and Shuichi Enokida

Abstract: Defect inspection is especially needed for the IC lead frames used in the manufacture of semiconductors, which require both high quality and miniaturization. To this end, automatic defect detection systems based on image processing methods have been proposed. This paper focuses in particular on methods that use the surface normal direction to detect deformations in flat parts. Since most of these methods use a fixed parameter, the risk of missing a defect in industrial parts becomes a problem. In this paper, a new defect detection method is proposed for detecting various defect sizes and defect types. The method determines the appropriate block size based on the median value of the luminance dispersions calculated for several block sizes, learning from samples whose defect points were detected beforehand. We used 105 samples in our experiments. Our experimental results show that the proposed method selects suitable parameters and, with learning, identifies defect areas of several sizes more accurately.
Download

Paper Nr: 51
Title:

Wavelet-based Defect Detection System for Grey-level Texture Images

Authors:

Gintarė Vaidelienė and Jonas Valantinas

Abstract: In this study, a new wavelet-based approach (system) to the detection of defects in grey-level texture images is presented. This new approach explores the space localization properties of the discrete wavelet transform (DWT) and generates statistically-based parameterized defect detection criteria. The introduced system parameter provides the user with the possibility to control the percentage of actually defect-free images detected as defective and/or actually defective images detected as defect-free, in the class of texture images under investigation. The developed defect detection system was implemented using the discrete Haar and Le Gall wavelet transforms. For the experimental part, samples of ceramic tiles, as well as glass samples, taken from a real factory environment, were used.
Download

Paper Nr: 96
Title:

Robust Facial Landmark Detection and Face Tracking in Thermal Infrared Images using Active Appearance Models

Authors:

Marcin Kopaczka, Kemal Acar and Dorit Merhof

Abstract: Long wave infrared (LWIR) imaging is an imaging modality currently gaining increasing attention. Facial images acquired with LWIR sensors can be used for illumination-invariant person recognition and the contactless extraction of vital signs such as respiratory rate. In order to work properly, these applications require a precise detection of faces and regions of interest such as the eyes or nose. Most current facial landmark detectors in the LWIR spectrum localize single salient facial regions by thresholding. These approaches are not robust against out-of-plane rotation and occlusion. To address this problem, we introduce an LWIR face tracking method based on an active appearance model (AAM). The model is trained on a manually annotated database of thermal face images. Additionally, we evaluate the effect of different methods for AAM generation and image preprocessing on the fitting performance. The method is evaluated on a set of still images and a video sequence. Results show that AAMs are a robust method for the detection and tracking of facial landmarks in the LWIR spectrum.
Download

Paper Nr: 108
Title:

Global Hybrid Registration for 3D Constructed Surfaces using Ray-casting and Improved Self Adaptive Differential Evolution Algorithm

Authors:

Tao Ngoc Linh, Hasegawa Hiroshi and Tam Bui

Abstract: As a fundamental task in computer vision, registration has been a solution for many applications such as world modeling, part inspection and manufacturing, object recognition, pose estimation, robotic navigation, and reverse engineering. Given two images, the aim is to find the best possible homogeneous transformation, resulting in a more complete view of the objects or scenarios. This paper presents a novel algorithm for registering structured point-cloud surfaces using a fast ray-casting based closest point method integrated with a newly developed global optimization method, Improved Self-Adaptive Differential Evolution (ISADE). The ray-casting based L2 error calculation enables the algorithm to find local minima effectively, while ISADE exploits the search boundary to find the global minimum. The new algorithm is evaluated on structured images captured by a Kinect camera to show the superiority in quality and robustness of ISADE over state-of-the-art search methods, and the accuracy of the new method over a well-known registration algorithm, KinectFusion.
Download

Paper Nr: 118
Title:

HBD: Hexagon-Based Binary Descriptors

Authors:

Yuan Liu and J. Paul Siebert

Abstract: In this paper, two new rotationally invariant hexagon-based binary descriptors (HBD), i.e., HexIDB and HexLDB, are proposed in order to obtain better feature discriminability while encoding less redundant information. Our new descriptors are generated from a hexagonal grouping structure that improves upon the HexBinary descriptor we reported previously. The third-level descriptors of HexIDB and HexLDB have 270 and 99 fewer bits, respectively, than SHexBinary, due to sampling 61% fewer fields. Using learned parameters, HBD demonstrates better performance when matching the majority of the images in Mikolajczyk and Schmid’s standard benchmark dataset, as compared to existing benchmark descriptors. Moreover, HBD also achieves a promising level of performance when applied to pose estimation on the ALOI dataset, achieving approximately 0.5 pixels mean pose error, only slightly inferior to fixed-scale SIFT but around 1.5 pixels better than standard SIFT.
Download

Paper Nr: 144
Title:

Efficient Marble Slab Classification using Simple Features

Authors:

Mert Kilickaya, Umut Cinar and Sinan Ugurluoglu

Abstract: Marble makes up a large part of many buildings and is widely used. However, the manufacturing process for marble is time-consuming and inefficient: human experts assign inconsistent labels to different marble classes, causing a big loss of time and money. This raises the need for an automatic method of classifying marbles. In this paper we present a novel method which utilizes color, structural and textural representations of a marble slab. Once the representation is combined with an accurate segmentation step, it achieves an accuracy of 94% on a newly collected dataset of 1000 images. We suggest the best settings for an automatic marble classification system which is simple and fast enough to be used in a real-life environment such as marble factories.
Download

Paper Nr: 158
Title:

Optimal Feature-set Selection Controlled by Pose-space Location

Authors:

Klaus Müller, Ievgen Smielik and Klaus-Dieter Kuhnert

Abstract: In this paper a novel feature subset selection method for model-based 3D-pose recovery is introduced. Many different kinds of features have been applied to correspondence-based pose recovery tasks. Every single feature has advantages and disadvantages based on the object’s properties, like shape, texture or size. For that reason it is worthwhile to select features with special attention to the object’s properties. This selection process has been the topic of several publications in the past. Since the objects are not static but can rotate and even deform, their properties change depending on their pose configuration. In consequence, the feature selection process produces different results when the pose configuration changes. That is the point where the proposed method comes into play: it selects and combines features according to the object's pose-space location and creates several different feature subsets. An exemplary test run at the end of the paper shows that the method decreases the runtime and increases the accuracy of the matching process.
Download

Paper Nr: 172
Title:

A New Parametric Description for Line Structures in 3D Medical Images by Means of a Weighted Integral Method

Authors:

Hidetoshi Goto, Takumi Naito and Hidekata Hontani

Abstract: The authors propose a method that describes line structures in given 3D medical images by estimating the values of model parameters: a Gaussian function is employed as the model function, and the parameter values are estimated by means of a weighted integral method, in which they are obtained by solving a system of linear equations derived from differential equations satisfied by the Gaussian model function. Unlike many other model-based methods for this description, the proposed method requires no parameter sweep and hence can estimate the parameter values efficiently. Once the parameter values are estimated, the location, the orientation and the scale of line structures in the given 3D images can be described. Experimental results with artificial 3D images and with clinical X-ray CT ones demonstrate the estimation performance of the proposed method.
Download

Paper Nr: 180
Title:

Detecting People in Large Crowded Spaces using 3D Data from Multiple Cameras

Authors:

João Carvalho, Manuel Marques, João Paulo Costeira and Pedro Mendes Jorge

Abstract: Real-time monitoring of large infrastructures has human detection as a core task. Since people's anonymity is a hard constraint in these scenarios, video cameras cannot be used. This paper presents a low-cost solution for real-time people detection in large crowded environments using multiple depth cameras. In order to detect people, binary classifiers (person/not-person) are proposed based on different sets of features. It is shown that good classification performance can be achieved by choosing a small set of simple features.
Download

Paper Nr: 195
Title:

Skin Surface Reconstruction and 3D Vessels Segmentation in Speckle Variance Optical Coherence Tomography

Authors:

Marco Manfredi, Costantino Grana and Giovanni Pellacani

Abstract: In this paper we present a method for in vivo surface reconstruction and 3D vessel segmentation from Speckle-Variance Optical Coherence Tomography imaging, applied to dermatology. This novel technology allows capturing motion underneath the skin surface, revealing the presence of blood vessels. Standard OCT visualization techniques are inappropriate for this new source of information, which is crucial in early skin cancer diagnosis. We investigate 3D reconstruction techniques for better visualization of both the external and internal structure of skin lesions, as a tool to help clinicians in the task of qualitative tumor evaluation.
Download

Paper Nr: 206
Title:

Towards the Rectification of Highly Distorted Texts

Authors:

Stefania Calarasanu, Séverine Dubuisson and Jonathan Fabrizio

Abstract: A frequent challenge for many Text Understanding Systems is to tackle the variety of text characteristics in born-digital and natural scene images, to which current OCRs are not well adapted. For example, texts in perspective are frequently present in real-world images, but despite the ability of some detectors to accurately localize such text objects, the recognition stage fails most of the time. Indeed, most OCRs are not designed to handle text strings in perspective but rather expect horizontal texts in a parallel-frontal plane to provide a correct transcription. In this paper, we propose a rectification procedure that can correct highly distorted texts subject to rotation, shearing and perspective deformations. The method is based on an accurate estimation of the quadrangle bounding the deformed text, in order to compute a homography that transforms this quadrangle (and its content) into a horizontal rectangle. The rectification is validated on the dataset proposed during the ICDAR 2015 Competition on Scene Text Rectification.
Download
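The rectification step this abstract describes (quadrangle to horizontal rectangle via a homography) is a standard four-point direct linear transform. The sketch below is an illustrative pure-Python version with a plain Gaussian-elimination solver; a production system would use a library routine such as OpenCV's `getPerspectiveTransform`.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def homography(src, dst):
    """3x3 homography (row-major, h22 fixed to 1) mapping 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def apply_h(h, pt):
    """Apply a homography to a 2D point (with perspective division)."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Warping every pixel of the text region through this homography produces the horizontal, fronto-parallel text an OCR expects.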

Paper Nr: 221
Title:

Region Extraction of Multiple Moving Objects with Image and Depth Sequence

Authors:

Katsuya Sugawara, Ryosuke Tsuruga, Toru Abe and Takuo Suganuma

Abstract: This paper proposes a novel method for extracting the regions of multiple moving objects from an image and depth sequence. In addition to image features, diverse types of features, such as depth and image-depth-derived 3D motion, have been used in existing methods to improve the accuracy and robustness of object region extraction. Most of the existing methods determine individual object regions according to the spatial-temporal similarities of such features, i.e., they regard a spatial-temporal area of uniform features as a region sequence corresponding to the same object. Consequently, the depth features in a moving object region, where the depth varies across frames, and the motion features in a nonrigid or articulated object region, where the motion varies across parts, cannot be used effectively for object region extraction. To deal with these difficulties, our proposed method extracts the region sequences of individual moving objects according to depth feature similarity adjusted by each object's movement and motion feature similarity computed only in adjacent parts. Through experiments on scenes where a person moves a box, we demonstrate the effectiveness of the proposed method in extracting the regions of multiple moving objects.
Download

Paper Nr: 1
Title:

An Image Impairment Assessment Procedure using the Saliency Map Technique

Authors:

Hayato Teranaka and Minoru Nakayama

Abstract: An automated mechanical assessment procedure is required to evaluate image quality and impairment. This paper proposes a procedure for image impairment assessment using visual attention, such as saliency maps of the impaired images. To evaluate the performance of this image assessment procedure, an experiment was conducted to study viewers' subjective evaluations of impaired images, and the relationships between the viewers' ratings and a previously developed set of values were then analyzed. The limitations of the developed procedure are also discussed in order to improve assessment performance, and the use of image features and frequency-domain representation values for the test images is proposed.
Download

Paper Nr: 26
Title:

The Challenges and Advantages with a Parallel Implementation of Feature Matching

Authors:

Anders Hast and Andrea Marchetti

Abstract: The number of cores per CPU is predicted to double every second year. Therefore, the opportunity to parallelise the algorithms currently used in computer vision and image processing needs to be addressed sooner rather than later. A parallel feature matching approach is proposed and evaluated in Matlab. The key idea is to use different interest point detectors so that each core can work on its own subset independently of the others. However, since the image pairs are the same, the homography will be essentially the same and can therefore be distributed by the process that first finds a solution. Nevertheless, the speedup is not linear, and the reasons why are discussed.
Download

Paper Nr: 37
Title:

Automatic Image Colorization based on Feature Lines

Authors:

Van Nguyen, Vicky Sintunata and Terumasa Aoki

Abstract: Automatic image colorization is one of the attractive research topics in image processing. The most crucial task in this field is to design an algorithm that selects appropriate colors from the reference image(s) to propagate to the target image. In other words, we need to determine whether two pixels in the reference and target images have similar color. Many previous approaches are based on local feature matching algorithms; however, they still have some shortcomings and are time-consuming. In this paper, we present a novel automatic image colorization method based on Feature Lines. Feature Lines is our new concept, which extends the concept of Color Lines: it represents the distribution of each pixel's feature vector as elongated around lines, so that pixels with similar features can be assembled into one feature line. By introducing this new technique, pixel matching between reference and target images can be performed precisely. The experimental results show that our proposed method achieves smoother, more even and more natural color assignment than the previous methods.
Download

Paper Nr: 49
Title:

A Benchmark of Computational Models of Saliency to Predict Human Fixations in Videos

Authors:

Shoaib Azam, Syed Omer Gilani, Moongu Jeon, Rehan Yousaf and Jeong Bae Kim

Abstract: In many applications of computer graphics and design, robotics and computer vision, there is a need to predict where humans look in a scene. However, this is still a challenging task, since it is not known exactly how the human visual system works. A number of computational models have been designed using different approaches to estimate the human visual system. Most of these models have been tested on images, and performance is calculated on this basis; a benchmark on images exists for a direct comparison between the models. However, there is no such benchmark on videos. To alleviate this problem, we have created a benchmark of six computational models on 12 videos which were viewed by 15 observers in a free viewing task. Furthermore, a weighting scheme (both manual and automatic) is designed and implemented on the videos using these six models, which improved the area under the ROC curve. We have found that Graph Based Visual Saliency (GBVS) and the Random Centre Surround model outperformed the other models.
Download

Paper Nr: 99
Title:

Adaptive Neuro-Fuzzy Inference System for Echoes Classification in Radar Images

Authors:

Leila Sadouki and Boualem Haddad

Abstract: In order to remove the undesirable clutter which reduces radar performance and causes significant errors in rainfall estimation, we implement in this paper an algorithm for the classification of radar echoes. The radar images studied are those recorded in Sétif (Algeria) every 15 minutes. We use a combination of a textural approach, based on grey-level co-occurrence matrices, and a grid-partition based fuzzy inference system, named ANFIS-GRID. We use two parameters, namely energy and local homogeneity, which are considered the most effective in discriminating between precipitation echoes and clutter. These parameters are used as inputs for ANFIS-GRID, while the output of the system is the radar echo type. Based on the best mean rate of correct recognition and using two different optimization methods, the structure with 2 inputs, 4 membership functions, 16 rules and 1 output was selected as the most efficient ANFIS-GRID. This method gives a mean rate of correct recognition of echoes of 93.52% (91.30% for precipitation echoes and 95.60% for clutter). In addition, the proposed approach has a maximum processing time of less than 90 seconds, which allows the filtering of the images in real time.
Download
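The two texture features used as inputs here, energy and local homogeneity, are standard statistics of a normalised grey-level co-occurrence matrix. The sketch below builds a GLCM for a single horizontal offset as an example; the offsets, grey-level quantisation and window size used in the paper are not specified here.

```python
def glcm(img, levels):
    """Normalised grey-level co-occurrence matrix for the offset (0, 1),
    i.e. each pixel paired with its right neighbour."""
    m = [[0.0] * levels for _ in range(levels)]
    count = 0
    for row in img:
        for a, b in zip(row, row[1:]):
            m[a][b] += 1.0
            count += 1
    return [[v / count for v in r] for r in m]

def energy(m):
    """Sum of squared GLCM entries; high for uniform textures."""
    return sum(v * v for row in m for v in row)

def homogeneity(m):
    """Inverse-difference-moment; high when mass lies near the diagonal."""
    return sum(v / (1.0 + (i - j) ** 2)
               for i, row in enumerate(m) for j, v in enumerate(row))
```

These two scalars, computed per image window, would then be the inputs to the fuzzy inference system.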

Paper Nr: 139
Title:

Estimating Spectral Sensitivity of Human Observer for Multiplex Image Projection

Authors:

Koji Muramatsu, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose an efficient method for estimating the spectral sensitivity of the human retina. The proposed method is intended for multiplex image projection using multi-band projectors. Multiplex image projection can present different images to different observers by exploiting the differences between the observers' spectral sensitivities. Although a precise spectral sensitivity is required for multiplex image projection, ordinary estimation methods require a lot of time. In this paper, we therefore propose an efficient method for estimating the spectral sensitivity of the human eye. The experimental results show the efficiency of the proposed method.
Download

Paper Nr: 187
Title:

An Entropy-based Model for a Fast Computation of SSIM

Authors:

Vittoria Bruni and Domenico Vitulano

Abstract: The paper presents a model for assessing image quality from a subset of pixels. It is based on the fact that human beings do not explore the whole image information when quantifying its degree of distortion. Hence, the vision process can be seen as being in agreement with the Asymptotic Equipartition Property, which assures the existence of a subset of sequences of image blocks able to describe the whole image source with a prefixed and small error. Specifically, the well-known Structural SIMilarity index (SSIM) has been considered. Its entropy has been used to define a method for selecting those image pixels that enable SSIM estimation with sufficient precision. Experimental results show that the proposed selection method is able to reduce the number of operations required by SSIM by a factor of about 200, with an estimation error of less than 8%.
Download

Paper Nr: 215
Title:

Palm Vein Recognition based on Nonsubsampled Contourlet Transform Features

Authors:

Amira Oueslati, Nadia Feddaoui and Kamel Hamrouni

Abstract: This paper presents a novel approach to person recognition from palm vein texture images based on the Nonsubsampled Contourlet Transform (NSCT). Our approach consists of four steps. First, we reduce noise and enhance contrast in order to produce a better-quality palm vein image, then we localize the texture in the ROI. Next, the texture of the enhanced image is analyzed by NSCT, and the obtained features are encoded to generate a signature of 676 bytes. Finally, we compute the Hamming distance between signatures to make the matching decision. The experiments are performed on the CASIA Multi-Spectral Palmprint Image database. The method is evaluated in both verification and identification scenarios, and the experimental results are compared with other methods. The results prove the effectiveness and robustness of the NSCT method in extracting discriminative features of palm vein texture.
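A minimal sketch of the Hamming-distance matching step the abstract describes; the content of the binary signature and the decision threshold are assumptions, not the paper's values:

```python
import numpy as np

def hamming_distance(sig_a: bytes, sig_b: bytes) -> float:
    """Fraction of differing bits between two equal-length binary signatures."""
    a = np.unpackbits(np.frombuffer(sig_a, dtype=np.uint8))
    b = np.unpackbits(np.frombuffer(sig_b, dtype=np.uint8))
    return float(np.count_nonzero(a != b)) / a.size

def verify(probe: bytes, gallery: dict, threshold=0.35):
    """Return the closest gallery identity if its distance is below the threshold."""
    best = min(gallery, key=lambda name: hamming_distance(probe, gallery[name]))
    d = hamming_distance(probe, gallery[best])
    return (best, d) if d < threshold else (None, d)
```

In the paper's setting the signatures would be the 676-byte NSCT codes; any bit encoding of the features works the same way here.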
Download

Paper Nr: 244
Title:

How Important is Scale in Galaxy Image Classification?

Authors:

AbdulWahab Kabani and Mahmoud R. El-Sakka

Abstract: In this paper, we study the importance of scale in galaxy image classification. Galaxy image classification involves performing morphological analysis to determine the shape of the galaxy. Traditionally, morphological analysis is carried out by trained experts. However, as the number of images of galaxies is increasing, there is a desire to come up with a more scalable approach to classification. In this paper, we pre-process the images to obtain three different scales. Then, we train the same neural network for a small number of epochs (passes over the data) on each of these three scales. After that, we report the performance of the neural network on each scale. There are two main contributions in this paper. First, we show that scale plays a major role in the performance of the neural network. Second, we show that normalizing the scale of the galaxy image produces better results. Such normalization can be extended to any image classification task with characteristics similar to those of galaxy images, where there is no background clutter.
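A toy sketch of the kind of scale normalization the paper argues for: rescale each image so the object of interest spans a fixed number of pixels. The galaxy radius would come from a hypothetical detection step, and nearest-neighbour resampling stands in for a proper interpolator:

```python
import numpy as np

def rescale(img, factor):
    """Nearest-neighbour rescale of a 2D image by a given factor."""
    h, w = img.shape
    ys = (np.arange(int(h * factor)) / factor).astype(int)
    xs = (np.arange(int(w * factor)) / factor).astype(int)
    return img[np.ix_(ys, xs)]

def normalize_scale(img, galaxy_radius_px, target_radius_px=64):
    """Rescale so the galaxy spans a fixed pixel radius regardless of original scale."""
    return rescale(img, target_radius_px / galaxy_radius_px)
```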
Download

Paper Nr: 246
Title:

Facial Paresis Index Prediction by Exploiting Active Appearance Models for Compact Discriminative Features

Authors:

Luise Modersohn and Joachim Denzler

Abstract: In the field of otorhinolaryngology, dysfunction of the facial nerve is a common disease which results in a paresis of, usually, one half of the patient's face. The grade of paralysis is measured by physicians with rating scales, e.g. the Stennert Index or the House-Brackmann scale. In this work, we propose a method to analyse and predict the severity of facial paresis on the basis of single images. We combine feature extraction methods based on a generative approach (Active Appearance Models) with a fast non-linear classifier (Random Decision Forests) in order to predict the patient's grade of facial paresis. In our proposed framework, we make use of highly discriminative features based on the fitting parameters of the Active Appearance Model, Action Units and landmark distances. We show in our experiments that it is possible to correctly predict the grade of facial paresis in many cases, although the visual appearance varies strongly. The presented method creates new opportunities to objectively document the patient's progress in therapy.
Download

Paper Nr: 249
Title:

A Robust Pixel ECC based Algorithm for Occluded Image Alignment

Authors:

Nefeli Lamprinou and Emmanouil Z. Psarakis

Abstract: The alignment of occluded images constitutes a common and difficult problem. In this paper we propose a new method, based on the ECC algorithm and tailored to the occluded image alignment problem, which enjoys a simple closed-form solution with low computational cost. Moreover, we propose the use of a proper subset of the region of interest that limits the impact of outliers on the estimation of the parameters. The use of this set seems to make the proposed method insensitive to whether the occluded image is used as the template or as the warped image in the alignment process. The proposed method is compared against two well-known Gradient Correlation based methods through its application to several image alignment problems, and in all cases it outperforms its rivals in terms of accuracy and percentage of convergence.
Download

Area 3 - Image and Video Understanding

Full Papers
Paper Nr: 2
Title:

Detecting Fine-grained Sitting Affordances with Fuzzy Sets

Authors:

Viktor Seib, Malte Knauf and Dietrich Paulus

Abstract: Recently, object affordances have moved into the focus of researchers in computer vision. Affordances describe how an object can be used by a specific agent. This additional information on the purpose of an object is used to augment the classification process. With the herein proposed approach we aim at bringing affordances and object classification closer together by proposing fine-grained affordances. We present an algorithm that detects fine-grained sitting affordances in point clouds by iteratively transforming a human model into the scene. This approach enables us to distinguish object functionality on a finer-grained scale, thus more closely resembling the different purposes of similar objects. For instance, traditional methods suggest that a stool, chair and armchair all afford sitting. This is also true for our approach, but additionally we distinguish sitting without backrest, with backrest and with armrests. This fine-grained affordance definition closely resembles individual types of sitting and better reflects the purposes of different chairs. We experimentally evaluate our approach and provide fine-grained affordance annotations in a dataset from our lab.
Download

Paper Nr: 12
Title:

Combining Contextual and Modal Action Information into a Weighted Multikernel SVM for Human Action Recognition

Authors:

Jordi Bautista-Ballester, Jaume Vergés-Llahí and Domenec Puig

Abstract: Understanding human activities is one of the most challenging modern topics for robots. Whether for imitation or anticipation, robots must recognize which action is performed by humans when they operate in a human environment. Action classification using a Bag of Words (BoW) representation has shown computational simplicity and good performance, but the increasing number of categories, including actions with high confusion, and the addition, especially in human-robot interactions, of significant contextual and multimodal information has led most authors to focus their efforts on the combination of image descriptors. In this field, we propose the Contextual and Modal MultiKernel Learning Support Vector Machine (CMMKL-SVM). We introduce contextual information (objects directly related to the performed action, by calculating the codebook from a set of points belonging to objects) and multimodal information (features from depth and 3D images, resulting in two extra modalities of information in addition to RGB images). We code the action videos using a BoW representation with both contextual and modal information and feed them to the optimal SVM kernel as a linear combination of single kernels weighted by learning. Experiments have been carried out on two action databases, CAD-120 and HMDB. Our approach attains the same results as other similar state-of-the-art approaches on highly constrained databases, and performs increasingly better the more realistic the database is, reaching a performance improvement of 14.27% for HMDB.
Download

Paper Nr: 21
Title:

Using Motion Blur to Recognize Hand Gestures in Low-light Scenes

Authors:

Daisuke Sugimura, Yusuke Yasukawa and Takayuki Hamamoto

Abstract: We propose a method for recognizing hand gestures in low-light scenes. In such scenes, hand gesture images are significantly deteriorated because of heavy noise; therefore, previous methods may not work well. In this study, we exploit a single color image constructed by temporally integrating a hand gesture sequence. In general, the temporal integration of images improves the signal-to-noise (S/N) ratio; it enables us to capture sufficient appearance information of the hand gesture sequence. The key idea of this study is to exploit a motion blur, which is produced when integrating a hand gesture sequence temporally. The direction and the magnitude of motion blur are discriminative characteristics that can be used for differentiating hand gestures. In order to extract these features of motion blur, we analyze the gradient intensity and the color distributions of a single motion-blurred image. In particular, we encode such image features to self-similarity maps, which capture pairwise statistics of spatially localized features within a single image. The use of self-similarity maps allows us to represent invariant characteristics to the individual variations in the same hand gestures. Using self-similarity maps, we construct a classifier for hand gesture recognition. Our experiments demonstrate the effectiveness of the proposed method.
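The S/N argument in the abstract can be illustrated numerically: averaging N frames of zero-mean noise reduces the noise standard deviation by a factor of sqrt(N). The image size, noise level and frame count below are arbitrary choices for the sketch:

```python
import numpy as np

def integrate_frames(frames):
    """Temporal integration: average a burst of frames into one image."""
    return np.mean(np.stack(frames, axis=0), axis=0)

rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)                       # flat stand-in 'scene'
frames = [clean + rng.normal(0.0, 20.0, clean.shape)   # sigma = 20 per frame
          for _ in range(16)]
integrated = integrate_frames(frames)
# residual noise std is roughly 20 / sqrt(16) = 5
```

In the paper's setting, the moving hand additionally leaves a motion-blur trace in the integrated image, whose direction and magnitude become the recognition cue.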
Download

Paper Nr: 45
Title:

Knowing What You Don’t Know - Novelty Detection for Action Recognition in Personal Robots

Authors:

Thomas Moerland, Aswin Chandarr, Maja Rudinac and Pieter Jonker

Abstract: Novelty detection is essential for personal robots to continuously learn and adapt in open environments. This paper specifically studies novelty detection in the context of action recognition. To detect unknown (novel) human action sequences we propose a new method called background models, which is applicable to any generative classifier. Our closed-set action recognition system consists of a new skeleton-based feature combined with a Hidden Markov Model (HMM)-based generative classifier, which has shown good earlier results in action recognition. Subsequently, novelty detection is approached from both a posterior likelihood and hypothesis testing view, which is unified as background models. We investigate a diverse set of background models: sum over competing models, filler models, flat models, anti-models, and some reweighted combinations. Our standard recognition system has an inter-subject recognition accuracy of 96% on the Microsoft Research Action 3D dataset. Moreover, the novelty detection module combining anti-models with flat models has 78% accuracy in novelty detection, while maintaining 78% standard recognition accuracy as well. Our methodology can increase robustness of any current HMM-based action recognition system against open environments, and is a first step towards an incrementally learning system.
Download

Paper Nr: 46
Title:

Modeling Human Motion for Predicting Usage of Hospital Operating Room

Authors:

Ilyes Sghir and Shishir Shah

Abstract: In this paper, we present a system that exploits existing video streams from a hospital operating room (OR) to infer OR usage states. We define OR states that are relevant for assessing OR usage efficiency. We adopt a holistic approach that involves the combination of two meaningful human motion features: gestures or upper body movements computed using optical flow and whole body movements computed through motion trajectories. The two features are independently modeled for each of the defined OR usage states and eventually fused to obtain a final decision. Our approach is tested on a large collection of videos and the results show that the combination of both human motion features provide significant discriminative power in understanding usage of an OR.
Download

Paper Nr: 48
Title:

Real-time Scale-invariant Object Recognition from Light Field Imaging

Authors:

Séverine Cloix, Thierry Pun and David Hasler

Abstract: We present a novel light field dataset along with a real-time and scale-invariant object recognition system. Our method is based on bag-of-visual-words and codebook approaches. Its evaluation was carried out on a subset of our dataset of unconventional images. We show that the low variance in scale inferred from the specificities of a plenoptic camera allows high recognition performance. With one training image per object to recognise, recognition rates greater than 90 % are demonstrated despite a scale variation of up to 178 %. Our versatile light-field image dataset, CSEM-25, is composed of five classes of five instances captured with the recent industrial Raytrix R5 camera at different distances with several poses and backgrounds. We make it available for research purposes.
Download

Paper Nr: 52
Title:

Hough Parameter Space Regularisation for Line Detection in 3D

Authors:

Manuel Jeltsch, Christoph Dalitz and Regina Pohle-Fröhlich

Abstract: The Hough transform is a well known technique for detecting lines or other parametric shapes in point clouds. When it is used for finding lines in a 3D-space, an appropriate line representation and quantisation of the parameter space is necessary. In this paper, we address the problem that a straightforward quantisation of the optimal four-parameter representation of a line after Roberts results in an inhomogeneous tessellation of the geometric space that introduces bias with respect to certain line orientations. We present a discretisation of the line directions via tessellation of an icosahedron that overcomes this problem whenever one parameter in the Hough space represents a direction in 3D (e.g. for lines or planes). The new method is applied to the detection of ridges and straight edges in laser scan data of buildings, where it performs better than a straightforward quantisation.
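A sketch of the direction-discretisation idea: the 12 vertices of a regular icosahedron give a near-uniform set of 3D direction bins, avoiding the orientation bias of a naive angular grid. (For undirected lines, antipodal vertices would be merged; that step is omitted here.)

```python
import numpy as np

def icosahedron_directions():
    """Vertices of a regular icosahedron, normalized to unit direction vectors."""
    phi = (1 + 5 ** 0.5) / 2  # golden ratio
    v = []
    for a in (-1, 1):
        for b in (-phi, phi):
            # cyclic permutations of (0, +/-1, +/-phi)
            v += [(0, a, b), (a, b, 0), (b, 0, a)]
    v = np.array(v, dtype=float)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def nearest_direction_bin(d, dirs):
    """Quantize a unit vector to the index of the closest icosahedron direction."""
    return int(np.argmax(dirs @ d))
```

Subdividing each icosahedron face (not shown) refines the tessellation while keeping the bins nearly uniform, which is the property the paper exploits.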
Download

Paper Nr: 55
Title:

3D Descriptor for an Oriented-human Classification from Complete Point Cloud

Authors:

Kyis Essmaeel, Cyrille Migniot and Albert Dipanda

Abstract: In this paper we present a new 3D descriptor for human classification. It is applied over a complete point cloud (i.e., a 360° view) acquired with a multi-Kinect system. The proposed descriptor is derived from the Histogram of Oriented Gradients (HOG) descriptor: surface normal vectors are employed instead of gradients, 3D points are expressed in a cylindrical space, and 3D orientation quantization is computed by projecting the normal vectors onto a regular polyhedron. Our descriptor is utilized through a Support Vector Machine (SVM) classifier. The SVM classifier is trained using an original database composed of data acquired by our multi-Kinect system. The evaluation of the proposed 3D descriptor over a set of candidates shows very promising results. The descriptor can efficiently discriminate human from non-human candidates and provides the frontal direction of the human with high precision. The comparison with a well-known descriptor demonstrates significant improvements in results.
Download

Paper Nr: 72
Title:

3D Human Poses Estimation from a Single 2D Silhouette

Authors:

Fabrice Dieudonné Atrevi, Damien Vivet, Florent Duculty and Bruno Emile

Abstract: This work focuses on the problem of automatically extracting human 3D poses from a single 2D image. By pose we mean the configuration of human bones needed to reconstruct a 3D skeleton representing the 3D posture of the detected human. This problem is highly non-linear in nature and confounds standard regression techniques. Our approach combines prior learned correspondences between silhouettes and skeletons extracted from 3D human models. In order to match detected silhouettes with simulated silhouettes, we use the Krawtchouk geometric moment as a shape descriptor. We provide quantitative results for image retrieval across different actions and subjects, captured from differing viewpoints. We show that our approach gives promising results for 3D pose extraction from a single silhouette.
Download

Paper Nr: 120
Title:

Analyzing the Stability of Convolutional Neural Networks against Image Degradation

Authors:

Hamed Habibi Aghdam, Elnaz Jahani Heravi and Domenec Puig

Abstract: Understanding the underlying process of Convolutional Neural Networks (ConvNets) is usually done through visualization techniques. However, these techniques do not provide accurate information about the stability of ConvNets. In this paper, our aim is to analyze the stability of ConvNets through different techniques. First, we propose a new method for finding the minimum noisy image, which lies at the minimum distance from the decision boundary yet is misclassified by its ConvNet. Second, we exploratively and quantitatively analyze the stability of the ConvNets trained on the CIFAR10, MNIST and GTSRB datasets. We observe that the ConvNets might make mistakes when Gaussian noise with σ = 1 (barely perceivable by human eyes) is added to the clean image. This suggests that the inter-class margin of the feature space obtained from a ConvNet is slim. Our second finding is that augmenting the clean dataset with many noisy images does not increase the inter-class margin. Consequently, a ConvNet trained on a dataset augmented with noisy images might incorrectly classify images degraded with low-magnitude noise. The third finding reveals that even though an ensemble improves the stability, its performance is considerably reduced by a noisy dataset.
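For reference, the σ = 1 degradation the abstract refers to is simply additive Gaussian noise on 8-bit intensities; a sketch (the clipping and rounding choices are assumptions):

```python
import numpy as np

def degrade(img, sigma=1.0, seed=0):
    """Add zero-mean Gaussian noise (sigma in 8-bit intensity units) and re-quantize."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```

With σ = 1 the perturbation is at most a few grey levels per pixel, which is what makes the reported misclassifications notable.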
Download

Paper Nr: 137
Title:

Using Apache Lucene to Search Vector of Locally Aggregated Descriptors

Authors:

Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro and Lucia Vadicamo

Abstract: Surrogate Text Representation (STR) is a profitable solution for efficient similarity search on metric spaces using conventional text search engines, such as Apache Lucene. This technique is based on comparing the permutations of some reference objects in place of the original metric distance. However, the Achilles heel of the STR approach is the need to reorder the result set of the search according to the metric distance. This forces the use of a support database to store the original objects, which requires efficient random I/O on a fast secondary memory (such as flash-based storage). In this paper, we propose to extend the Surrogate Text Representation to specifically address a class of visual metric objects known as Vectors of Locally Aggregated Descriptors (VLAD). This approach is based on representing the individual sub-vectors forming the VLAD vector with the STR, providing a finer representation of the vector and enabling us to get rid of the reordering phase. The experiments on a publicly available dataset show that the extended STR outperforms the baseline STR, achieving satisfactory performance close to that obtained with the original VLAD vectors.
Download

Paper Nr: 140
Title:

Structured Edge Detection for Improved Object Localization using the Discriminative Generalized Hough Transform

Authors:

Eric Gabriel, Ferdinand Hahmann, Gordon Böer, Hauke Schramm and Carsten Meyer

Abstract: Automatic localization of target objects in digital images is an important task in Computer Vision. The Generalized Hough Transform (GHT) and its variant, the Discriminative Generalized Hough Transform (DGHT), are model-based object localization algorithms which determine the most likely object position based on accumulated votes in the so-called Hough space. Many automatic localization algorithms - including the GHT and the DGHT - operate on edge images, using e.g. the Canny or the Sobel Edge Detector. However, if the image contains many edges not belonging to the object of interest (e.g. from other objects, background clutter, noise etc.), these edges cause misleading votes which increase the probability of localization errors. In this paper we investigate the effect of a more sophisticated edge detection algorithm, called Structured Edge Detector, on the performance of a DGHT-based object localization approach. This method utilizes information on the shape of the target object to substantially reduce the amount of non-object edges. Combining this technique with the DGHT leads to a significant localization performance improvement for automatic pedestrian and car detection.
Download

Paper Nr: 154
Title:

Robust Background Modeling and Foreground Detection using Dynamic Textures

Authors:

M. Sami Zitouni, Harish Bhaskar and Mohammed Al-Mualla

Abstract: In this paper, a dynamic background modeling and hence foreground detection technique using a Gaussian Mixture Model (GMM) of spatio-temporal patches of dynamic texture (DT) is proposed. Existing methods for background modeling cannot adequately distinguish the movements in both background and foreground that usually characterize any dynamic scene. Therefore, in most of these methods, the separation of the background from the foreground requires precise tuning of parameters or an a priori model of the foreground. The proposed method aims to differentiate global from local motion by describing the video using spatio-temporal patches of DT modeled within a typical GMM framework. In addition to alleviating the aforementioned limitations, the proposed method can cope with complex dynamic scenes without the need for training or parameter tuning. Qualitative and quantitative analysis of the method against competing baselines has demonstrated its superiority and its robustness against dynamic variations in the background.
Download

Paper Nr: 160
Title:

Transductive Transfer Learning to Specialize a Generic Classifier Towards a Specific Scene

Authors:

Houda Maâmatou, Thierry Chateau, Sami Gazzah, Yann Goyat and Najoua Essoukri Ben Amara

Abstract: In this paper, we tackle the problem of domain adaptation for object-classification and detection tasks in video surveillance, starting from a generic trained detector. Precisely, we put forward a new transductive transfer learning framework based on a sequential Monte Carlo filter to specialize a generic classifier towards a specific scene. The proposed algorithm iteratively approximates the target distribution as a set of samples (selected from both source and target domains) which feed the learning step of a specialized classifier. The output classifier is applied to pedestrian detection in a traffic scene. We have demonstrated through many experiments, on the CUHK Square Dataset and the MIT Traffic Dataset, that the specialized classifier outperforms the generic classifier and that the suggested algorithm presents encouraging results.
Download

Paper Nr: 163
Title:

Information Fusion for Action Recognition with Deeply Optimised Hough Transform Paradigm

Authors:

Geoffrey Vaquette, Catherine Achard and Laurent Lucat

Abstract: Automatic human action recognition is a challenging and largely explored domain. In this work, we focus on action segmentation with the Hough Transform paradigm, and more precisely with the Deeply Optimised Hough Transform (DOHT). First, we apply DOHT on video sequences using the well-known dense trajectories features; then, we propose to extend the method to efficiently merge information coming from various sensors. We introduce three different ways to perform fusion, depending on the level at which information is merged. Advantages and disadvantages of these solutions are presented both from the performance point of view and in terms of ease of use. Notably, one of the fusion levels has the advantage of remaining usable even if one or more sensors are out of order or disturbed.
Download

Paper Nr: 170
Title:

Monocular Depth Ordering using Perceptual Occlusion Cues

Authors:

Babak Rezaeirowshan, Coloma Ballester and Gloria Haro

Abstract: In this paper we propose a method to estimate a global depth order between the objects of a scene using information from a single image coming from an uncalibrated camera. The method we present stems from early vision cues such as occlusion and convexity and uses them to infer both a local and a global depth order. Monocular occlusion cues, namely T-junctions and convexities, contain information suggesting a local depth order between neighbouring objects. A combination of these cues is most suitable because, while the information conveyed by T-junctions is perceptually stronger, they are not as prevalent as convexity cues in natural images. We propose a novel convexity detector that also establishes a local depth order. The partial order at T-junctions is extracted using a curvature-based multi-scale feature. Finally, a global depth order is computed, i.e., a full order of all shapes that is as consistent as possible with the computed partial orders and can tolerate conflicting ones. An integration scheme based on a Markov chain approximation of the rank aggregation problem is used for this purpose. The experiments conducted show that the proposed method compares favorably with the state of the art.
Download

Paper Nr: 189
Title:

Action-centric Polar Representation of Motion Trajectories for Online Action Recognition

Authors:

Fabio Martinez, Antoine Manzanera, Michèle Gouiffès and Thanh Phuong Nguyen

Abstract: This work introduces a novel action descriptor that represents activities instantaneously in each frame of a video sequence for action recognition. The proposed approach first characterizes the video by computing kinematic primitives along trajectories obtained by semi-dense point tracking in the video. Then, a frame-level characterization is achieved by computing a spatial action-centric polar representation from the computed trajectories. This representation quantizes the image space and groups the trajectories within radial and angular regions. Motion histograms are then temporally aggregated in each region to form a kinematic signature from the current trajectories. Histograms with several time depths can be computed to obtain different versions of the motion characterization. These motion histograms are updated at each time step to reflect the kinematic trend of the trajectories in each region. The action descriptor is then defined as the collection of motion histograms from all the regions in a specific frame. Classic support vector machine (SVM) models are used to carry out the classification for each time depth. The proposed approach is easy to implement, very fast, and the representation is able to code a broad variety of actions thanks to a multi-level representation of motion primitives. The proposed approach was evaluated on different public action datasets, showing competitive results (94% and 88.7% accuracy on the KTH and UT datasets, respectively) and an efficient computation time.
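An illustrative sketch of the action-centric polar partition described above; the numbers of radial, angular and speed bins, the radius bound and the speed bound are all assumptions:

```python
import numpy as np

def polar_bin(points, center, n_radial=3, n_angular=8, r_max=100.0):
    """Map 2D trajectory points to radial x angular region indices around the center."""
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2.0 * np.pi)
    r_bin = np.minimum((r / r_max * n_radial).astype(int), n_radial - 1)
    a_bin = (theta / (2.0 * np.pi) * n_angular).astype(int) % n_angular
    return r_bin * n_angular + a_bin

def motion_histograms(points, speeds, center, n_radial=3, n_angular=8,
                      n_speed=4, r_max=100.0, v_max=10.0):
    """Accumulate a per-region histogram of point speeds (one kinematic primitive)."""
    regions = polar_bin(points, center, n_radial, n_angular, r_max)
    s_bin = np.minimum((np.asarray(speeds) / v_max * n_speed).astype(int), n_speed - 1)
    hists = np.zeros((n_radial * n_angular, n_speed))
    for reg, s in zip(regions, s_bin):
        hists[reg, s] += 1
    return hists
```

Concatenating the per-region histograms then yields one frame-level descriptor of the kind the abstract feeds to an SVM.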
Download

Paper Nr: 248
Title:

A Holistic Method to Recognize Characters in Natural Scenes

Authors:

Muhammad Ali and Hassan Foroosh

Abstract: Local features like Histograms of Gradients (HoG), Shape Contexts (SC), etc. are normally used by the research community concerned with text recognition in natural scene images. The main issue with this approach is the ad hoc rasterization of the feature vector, which can disturb global structural and spatial correlations while constructing the feature vector. Moreover, such approaches in general do not take into account the rotational invariance property, which often leads to failed recognition in cases where characters occur in rotated positions in scene images. To address the local feature dependency and rotation problems, we propose a novel holistic feature based on the active contour model, a.k.a. snakes. Our feature vector is based on two variables, direction and distance, cumulatively traversed by each point as the initial circular contour evolves under the force field induced by the image. The initial contour design, in conjunction with a cross-correlation based similarity metric, enables us to account for rotational variance in the character image. We use various datasets, including synthetic and natural scene character datasets such as Chars74K-Font, Chars74K-Image, and ICDAR2003, to compare the results of our approach with several baseline methods, and show better performance than methods based on local features (e.g. HoG). Our leave-random-one-out cross-validation yields even better recognition performance, justifying our approach of using holistic character recognition.
Download

Short Papers
Paper Nr: 18
Title:

Activity Prediction using a Space-Time CNN and Bayesian Framework

Authors:

Hirokatsu Kataoka, Yoshimitsu Aoki, Kenji Iwata and Yutaka Satoh

Abstract: We present a technique to address the new challenge of activity prediction in the computer vision field. In activity prediction, we infer the next human activity through "classified activities" and "activity data analysis". Moreover, the prediction should be processed in real time to avoid dangerous or anomalous activities. The combination of space-time convolutional neural networks (ST-CNN) and improved dense trajectories (iDT) is able to effectively understand human activities in image sequences. After categorizing human activities, we insert activity tags into an activity database in order to sample a distribution of human activity. A naive Bayes classifier allows us to achieve real-time activity prediction, because only three elements are needed for parameter estimation. The contributions of this paper are: (i) activity prediction within a Bayesian framework and (ii) ST-CNN and iDT features for activity recognition. Moreover, human activity prediction in real scenes is achieved with 81.0% accuracy.
Download

Paper Nr: 28
Title:

Robust Pallet Detection for Automated Logistics Operations

Authors:

Robert Varga and Sergiu Nedevschi

Abstract: A pallet detection system is presented which is designed for automated forklifts for logistics operations. The system performs stereo reconstruction and pallets are detected using a sliding window approach. In this paper we propose a candidate generation method and we introduce feature descriptors for grayscale images that are tailored to the current task. The features are designed to be invariant to certain types of illumination changes and are called normalized pair differences because of the formula involved in their calculation. Experimental results validate our approach on extensive real world data.
Download

Paper Nr: 30
Title:

Recognizing Human Actions based on Extreme Learning Machines

Authors:

Grégoire Lefebvre and Julien Cumin

Abstract: In this paper, we tackle the challenge of action recognition by building robust models from Extreme Learning Machines (ELM). Applying this approach to reduced, preprocessed feature vectors on the Microsoft Research Cambridge-12 (MSRC-12) Kinect gesture dataset outperforms the state-of-the-art results, with an average correct classification rate of 0.953 over 20 runs when splitting the 6,244 action instances into two equal subsets for training and testing. This ELM-based proposal, using a multi-quadric radial basis activation function, is compared to other classical classification strategies such as Support Vector Machines (SVM) and Multi-Layer Perceptrons (MLP), and improvements are also presented in terms of execution time.
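A minimal sketch of the general ELM recipe (random hidden layer, closed-form output weights). Here tanh stands in for the multi-quadric RBF activation used in the paper, and the hidden-layer size is an assumption:

```python
import numpy as np

def elm_train(X, y, n_hidden=64, seed=0):
    """Extreme Learning Machine: random hidden weights, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    T = np.eye(int(y.max()) + 1)[y]               # one-hot targets
    beta = np.linalg.pinv(H) @ T                  # closed-form output weights
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```

The single pseudoinverse solve is what gives ELMs the fast training times the abstract compares against SVM and MLP.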
Download
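The appeal of an ELM is that the hidden layer is random and fixed, so training reduces to one least-squares solve. A minimal numpy sketch with a multiquadric radial basis activation follows; the data dimensions, targets, and hyperparameters are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for the preprocessed MSRC-12 feature vectors;
# sizes and labels here are invented for the sketch.
X = rng.normal(size=(100, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
T = np.eye(2)[y]                      # one-hot targets

# ELM: hidden-layer parameters are drawn at random and never trained;
# only the output weights are solved for in closed form.
n_hidden = 50
centers = rng.normal(size=(n_hidden, 8))
bias = rng.uniform(0.5, 1.5, size=n_hidden)

def hidden(X):
    # Multiquadric radial basis activation: sqrt(||x - c||^2 + b^2).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2 + bias ** 2)

H = hidden(X)
beta = np.linalg.pinv(H) @ T          # least-squares output weights

pred = np.argmax(hidden(X) @ beta, axis=1)
print("training accuracy:", (pred == y).mean())
```

The single pseudo-inverse solve is what gives ELMs their favorable training time compared to iteratively trained SVM or MLP baselines.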

Paper Nr: 53
Title:

Person Re-identification based on Human Query on Soft Biometrics using SVM Regression

Authors:

Athira Nambiar, Alexandre Bernardino and Jacinto C. Nascimento

Abstract: We propose a novel methodology for person re-identification (Re-ID) based on the biometric description of the upper-torso region of the human body. The proposed methodology leverages soft biometrics via Support Vector Regression (SVR) and Shape Context (SC) features obtained from the upper-torso silhouette of the human body. First, mappings from the upper-torso Shape Context to soft biometrics are learned from virtual avatars rendered by computer graphics engines, to circumvent the need for time-consuming manual labelling of human datasets. Second, it is possible to formulate a human query of a given suspect against a gallery of previously stored soft biometrics; at this point, the proposed system is able to provide a ranked list of persons based on the given description. Third, an extensive study of the different regression methodologies used to achieve the above-mentioned mappings is carried out. We also conduct real-time Re-ID experiments on an existing Re-ID dataset, and promising results are reported.
Download

Paper Nr: 57
Title:

A Priori Data and A Posteriori Decision Fusions for Human Action Recognition

Authors:

Julien Cumin and Grégoire Lefebvre

Abstract: In this paper, we tackle the challenge of human action recognition using multiple data sources by mixing a priori data fusion and a posteriori decision fusion. Our strategy, applying three main classifiers (Dynamic Time Warping, Multi-Layer Perceptron and Siamese Neural Network) with several decision fusion methods (Voting, Stacking, Dempster-Shafer Theory and Possibility Theory) on two databases (MHAD (Ofli et al., 2013) and ChAirGest (Ruffieux et al., 2013)), outperforms state-of-the-art results with best average correct classification rates of 99.85%±0.53 and 96.40%±3.37 respectively, under a leave-one-subject-out protocol.
Download

Paper Nr: 58
Title:

Superpixels in Pedestrian Detection from Stereo Images in Urban Traffic Scenarios

Authors:

Ion Giosan and Sergiu Nedevschi

Abstract: Pedestrian detection is a common task in every driving assistance system. The main goal is to obtain high detection accuracy in a reasonable amount of processing time. This paper proposes a novel method for superpixel-based pedestrian hypothesis generation and validation through feature classification. We analyze the possibility of using superpixels in pedestrian detection by investigating both the execution time and the accuracy of the results. Urban traffic images are acquired by a stereo camera system. A multi-feature superpixel-based method is used for obstacle segmentation and pedestrian hypothesis selection. Histogram of Oriented Gradients features are extracted for each hypothesis, both on the raw 2D intensity image and on the superpixel mean intensity image. Principal Component Analysis is also employed for selecting the relevant features. Support Vector Machine and AdaBoost classifiers are trained on initial features and on selected features, extracted from both the raw 2D intensity image and the mean superpixel intensity image. The comparative results show that superpixel-based pedestrian detection clearly reduces the execution time while the quality of the results is only slightly decreased.
Download

Paper Nr: 62
Title:

Unsupervised Framework for Interactions Modeling between Multiple Objects

Authors:

Ali Al-Raziqi and Joachim Denzler

Abstract: Extracting compound interactions involving multiple objects is a challenging task in computer vision due to issues such as mutual occlusions between objects, varying group sizes, and problems arising from the tracker. Additionally, single-object activities are uncommon compared with activities performed by two or more objects, e.g., gathering, fighting, running, etc. The purpose of this paper is to address the problem of interaction recognition among multiple objects based on dynamic features in an unsupervised manner. Our main contribution is twofold. First, a combined framework using tracking-by-detection for trajectory extraction and HDPs for latent interaction extraction is introduced. Another important contribution is the introduction of a new dataset, the Cavy dataset. The Cavy dataset contains about six dominant interactions performed several times by two or three cavies at different locations. The cavies interact in complicated and unexpected ways, which leads to many interactions being performed in a short time. This makes working on this dataset more challenging. The experiments in this study are performed not only on the Cavy dataset but also on the benchmark Behave dataset. The experiments on these datasets demonstrate the effectiveness of the proposed method. Although our approach is completely unsupervised, we achieved satisfactory results with a clustering accuracy of up to 68.84% on the Behave dataset and up to 45% on the Cavy dataset.
Download

Paper Nr: 73
Title:

Head Yaw Estimation using Frontal Face Detector

Authors:

José Mennesson, Afifa Dahmane, Taner Danisman and Ioan Marius Bilasco

Abstract: Accurately detecting head orientation is an important task in systems relying on face analysis. The estimation of the horizontal rotation of the head (yaw rotation) is a key step in detecting the orientation of the face. The purpose of this paper is to use a well-known frontal face detector in order to estimate the head yaw angle. Our approach consists in simulating 3D head rotations and detecting the face using a frontal face detector. Indeed, the head yaw angle can be estimated by determining the angle at which the 3D head must be rotated to be frontal. This approach is model-free and unsupervised (except for the generic learning step of the VJ algorithm). The method is evaluated and compared with state-of-the-art approaches using continuous and discrete protocols on two well-known databases: FacePix and Pointing04.
Download

Paper Nr: 81
Title:

Introducing FoxFaces: A 3-in-1 Head Dataset

Authors:

Amel Aissaoui, Afifa Dahmane, Jean Martinet and Ioan Marius Bilasco

Abstract: We introduce a new test collection named FoxFaces, dedicated to researchers in face recognition and analysis. The creation of this dataset was motivated by a gap in the existing 3D/4D datasets. FoxFaces contains 3 face datasets obtained with several devices. Faces are captured with different changes in pose, expression and illumination. The presented collection is unique in two aspects: the acquisition is performed using three minimally constrained devices offering 2D, depth and stereo information on faces, and it contains both still images and videos, allowing static and dynamic face analysis. Hence, our dataset can be an interesting resource for the evaluation of 2D, 3D and bimodal algorithms for face recognition under adverse conditions, as well as facial expression recognition and pose estimation algorithms in static and dynamic domains (images and videos). Stereo, color, and range images and videos of 64 adult human subjects are acquired. Acquisitions are accompanied by information about the subjects' identity, gender, facial expression, approximate pose orientation and the coordinates of some manually located facial fiducial points.
Download

Paper Nr: 90
Title:

Fast and Accurate Face Orientation Measurement in Low-resolution Images on Embedded Hardware

Authors:

Dries Hulens, Kristof Van Beeck and Toon Goedemé

Abstract: In a lot of applications it is important to collect information about the gaze orientation or head angle of a person. Consider measuring the alertness of a car driver to see if he is still awake, or the attentiveness of people crossing a street to see if they noticed the cars driving by. In our own application we want to apply cinematographic rules (e.g. the rule of thirds, where a face should be positioned left or right in the frame depending on the gaze direction) to images taken from a UAV. Nowadays these applications should run on embedded hardware so they can be easily attached to, e.g., a car or a UAV. This implies that the head angle detection algorithm should run in real time on minimal hardware. We therefore developed two approaches that run in real time on embedded hardware while achieving excellent performance. We demonstrate these approaches on both a publicly available face dataset and our own dataset recorded by a UAV.
Download

Paper Nr: 104
Title:

Hand Waving Gesture Detection using a Far-infrared Sensor Array with Thermo-spatial Region of Interest

Authors:

Chisato Toriyama, Yasutomo Kawanishi, Tomokazu Takahashi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase, Tomoyoshi Aizawa and Masato Kawade

Abstract: We propose a method for hand waving gesture detection using a far-infrared sensor array. The far-infrared sensor array captures the spatial distribution of temperature as a thermal image by detecting far-infrared waves emitted from heat sources. The advantage of the sensor is that it can capture human position and movement while protecting the privacy of the target individual. In addition, it works even at night-time without any light source. However, it is difficult to detect a gesture from a thermal image sequence captured by the sensor due to its low resolution and noise. The problem is that noise can appear as a pattern similar to the gesture. Therefore, we introduce a “Spatial Region of Interest (SRoI)” to focus on the region with motion. Also, to suppress the influence of other heat sources, we introduce a “Thermal Region of Interest (TRoI)” to focus on the range of human body temperature. In this paper, we demonstrate the effectiveness of the method through an experiment and discuss its results.
Download
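The two region-of-interest ideas compose naturally as boolean masks over a thermal frame: a temperature band (TRoI) intersected with a frame-difference motion mask (SRoI). The synthetic frames and threshold values below are invented for the sketch, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 8x8 thermal frames: ~20 degC background with sensor noise,
# plus a warm "hand" blob that appears in the current frame.
frame_prev = np.full((8, 8), 20.0) + rng.normal(0, 0.3, (8, 8))
frame_cur = frame_prev.copy()
frame_cur[2:5, 3:6] = 33.0            # human-temperature blob moving in

# TRoI: keep only pixels inside a human body-temperature band
# (band limits here are illustrative, not the paper's values).
troi = (frame_cur > 28.0) & (frame_cur < 38.0)

# SRoI: keep only pixels showing frame-to-frame motion.
sroi = np.abs(frame_cur - frame_prev) > 2.0

roi = troi & sroi                     # candidate hand-gesture region
print(roi.sum())                      # 9 pixels: the 3x3 warm moving blob
```

Intersecting the two masks is what suppresses static heat sources (warm but motionless) and moving cold objects alike, which is the robustness argument the abstract makes.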

Paper Nr: 106
Title:

Semi-automatic Hand Annotation Making Human-human Interaction Analysis Fast and Accurate

Authors:

Stijn De Beugher, Geert Brône and Toon Goedemé

Abstract: The detection of human hands is of great importance in a variety of domains including research on human-computer interaction, human-human interaction, sign language and physiotherapy. Within this field of research one is interested in relevant items in recordings, such as faces, human bodies or hands. However, nowadays this annotation is mainly done manually, which makes the task extremely time consuming. In this paper, we present a semi-automatic alternative to the manual labeling of recordings. Our system automatically searches for hands in images and asks for manual intervention if the confidence of a detection is too low. Most existing approaches rely on complex and computationally intensive models to achieve accurate hand detections, while our approach is based on segmentation techniques, smart tracking mechanisms and knowledge of human pose context. This makes our approach substantially faster than existing approaches. In this paper we apply our semi-automatic hand detection to the annotation of mobile eye-tracker recordings of human-human interaction. Our system makes the analysis of such data tremendously faster (244×) while maintaining an average accuracy of 93.68% on the tested datasets.
Download

Paper Nr: 112
Title:

Evaluating the Effects of Convolutional Neural Network Committees

Authors:

Fran Jurišić, Ivan Filković and Zoran Kalafatić

Abstract: Many high performing deep learning models for image classification put their base models in a committee as a final step to gain competitive edge. In this paper we focus on that aspect, analyzing how committee size and makeup of models trained with different preprocessing methods impact final performance. Working with two datasets, representing both rigid and non-rigid object classification in German Traffic Sign Recognition Benchmark (GTSRB) and CIFAR-10, and two preprocessing methods in addition to original images, we report performance improvements and compare them. Our experiments cover committees trained on just one dataset variation as well as hybrid ones, unreliability of small committees of low error models and performance metrics specific to the way committees are built. We point out some guidelines to predict committee behavior and good approaches to analyze their impact and limitations.
Download

Paper Nr: 117
Title:

How Effective Are Aggregation Methods on Binary Features?

Authors:

Giuseppe Amato, Fabrizio Falchi and Lucia Vadicamo

Abstract: During the last decade, various local features have been proposed and used to support Content Based Image Retrieval and object recognition tasks. Local features allow effective matching of local structures between images, but the cost of extraction and pairwise comparison of the local descriptors becomes a bottleneck when mobile devices and/or large databases are used. Two major directions have been followed to improve the efficiency of local-feature-based approaches. On one hand, the cost of extracting, representing and matching local visual descriptors has been reduced by defining binary local features. On the other hand, methods for quantizing or aggregating local features have been proposed to scale up image matching to very large collections. In this paper, we performed an extensive comparison of state-of-the-art aggregation methods applied to ORB binary descriptors. Our results show that the use of aggregation methods on binary local features is generally effective even if, as expected, there is a loss of performance compared to the same approaches applied to non-binary features. However, aggregation of binary features represents a worthwhile option when one needs to use devices with very low CPU and memory resources, such as mobile and wearable devices.
Download
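One reason binary descriptors are cheap to aggregate is that assignment to codewords uses Hamming distance, computable with XOR and a popcount. A minimal BoW-style aggregation over toy packed descriptors might look like this (the descriptor values and the tiny codebook are invented; real pipelines cluster a training set to build the codebook):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for 256-bit ORB descriptors, packed as 32 uint8 bytes each.
descs = rng.integers(0, 256, size=(100, 32), dtype=np.uint8)
centroids = descs[:8]                  # a tiny illustrative "codebook"

def hamming(a, B):
    """Hamming distance from one packed descriptor to each row of B."""
    return np.unpackbits(np.bitwise_xor(a, B), axis=1).sum(axis=1)

# Aggregate one image's local binary features into a codeword histogram
# (the BoW variant; VLAD/Fisher-style aggregations are built similarly).
words = np.array([np.argmin(hamming(d, centroids)) for d in descs])
hist = np.bincount(words, minlength=len(centroids)) / len(descs)
print(hist)                            # one global vector per image
```

The resulting fixed-length histogram is what gets indexed, replacing pairwise descriptor matching at query time.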

Paper Nr: 134
Title:

ACTIVE, an Extensible Cataloging Platform for Automatic Indexing of Audiovisual Content

Authors:

Maurizio Pintus, Maurizio Agelli, Felice Colucci, Nicola Corona, Alessandro Sassu and Federico Santamaria

Abstract: The cost of manual metadata production is high, especially for audiovisual content, where a time-consuming inspection is usually required in order to identify the most appropriate annotations. There is a growing need from digital content industries for solutions capable of automating such a process. In this work we present ACTIVE, a platform for indexing and cataloging audiovisual collections through the automatic recognition of faces and speakers. Adopted algorithms are described and our main contributions on people clustering and caption-based people identification are presented. Results of experiments carried out on a set of TV shows and audio files are reported and analyzed. An overview of the whole architecture is presented as well, with a focus on chosen solutions for making the platform easily extensible (plug-ins) and for distributing CPU-intensive calculations across a network of computers.
Download

Paper Nr: 135
Title:

Robust Face Identification with Small Sample Sizes using Bag of Words and Histogram of Oriented Gradients

Authors:

Mahir Faik Karaaba, Olarik Surinta, L. R. B. Schomaker and Marco A. Wiering

Abstract: Face identification under small sample conditions is currently an active research area. In the case of very few reference samples, optimally exploiting the training data to build a model with low generalization error is an important challenge in creating a robust face identification algorithm. In this paper we propose to combine the histogram of oriented gradients (HOG) and the bag of words (BOW) approach to use few training examples for robust face identification. In this HOG-BOW method, many sub-images are first randomly cropped from every image and given to the HOG feature extractor to compute many different feature vectors. These feature vectors are then given to a K-means clustering algorithm to compute the centroids which serve as a codebook. This codebook is used by a sliding window to compute feature vectors for all training and test images. Finally, the feature vectors are fed into an L2 support vector machine to learn a linear model that classifies the test images. To show the efficiency of our method, we also experimented with two other feature extraction algorithms: HOG and the scale invariant feature transform (SIFT). All methods are compared on two well-known face image datasets with one to three training examples per person. The experimental results show that the HOG-BOW algorithm clearly outperforms the other methods.
Download
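The crop-describe-cluster-encode pipeline can be sketched compactly in numpy. The descriptor below is a single orientation histogram per patch, a deliberate simplification of real HOG (which uses cells, blocks and normalization), and for brevity the encoding step reuses random crops rather than the paper's sliding window; image data and all sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_orient_hist(patch, bins=8):
    """HOG-like descriptor: one gradient-orientation histogram per patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def random_crops(img, n=50, size=16):
    """Randomly cropped sub-images, as in the training stage."""
    H, W = img.shape
    ys = rng.integers(0, H - size, n)
    xs = rng.integers(0, W - size, n)
    return [img[y:y + size, x:x + size] for y, x in zip(ys, xs)]

def kmeans(X, k=4, iters=20):
    """Plain k-means: the centroids serve as the codebook."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

# Toy random "training images" stand in for face images.
imgs = [rng.random((64, 64)) for _ in range(4)]
descs = np.array([grad_orient_hist(p) for im in imgs for p in random_crops(im)])
codebook = kmeans(descs)

def encode(img):
    """Represent an image as a histogram of nearest codewords (BOW step)."""
    D = np.array([grad_orient_hist(p) for p in random_crops(img)])
    words = np.argmin(((D[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
    return np.bincount(words, minlength=len(codebook)) / len(D)

print(encode(imgs[0]))  # fixed-length codeword histogram for one image
```

The fixed-length histograms produced by `encode` are what a linear L2-SVM would then be trained on.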

Paper Nr: 142
Title:

Low Latency Action Recognition with Depth Information

Authors:

Ali Seydi Keceli and Ahmet Burak Can

Abstract: In this study, an approach for low-latency action recognition is proposed. Low-latency action recognition aims to recognize actions without observing the whole action sequence. In the proposed approach, a skeletal model is obtained from depth images. Features extracted from the skeletal model are treated as time series and histograms. To classify actions, an AdaBoost M1 classifier is utilized with an SVM kernel. The trained classifiers are tested with different action observation ratios and compared with studies in the literature. The model produces promising results without observing the whole action sequence.
Download

Paper Nr: 143
Title:

A Prototype Application for Long-time Behavior Modeling and Abnormal Events Detection

Authors:

Nicoletta Noceti and Francesca Odone

Abstract: In this work we present a prototype application for modelling common behaviours from long-time observations of a scene. The core of the system is based on the method proposed in (Noceti and Odone, 2012), an adaptive technique for profiling patterns of activities on temporal data, coupling a string-based representation with an unsupervised learning strategy, and for detecting anomalies, i.e., dynamic events diverging from the usual dynamics. We propose an engineered framework where the method is adopted to perform an online analysis over very long time intervals (weeks of activity). The behaviour models are updated to accommodate new patterns and cope with physiological scene variations. We provide a thorough experimental assessment to show the robustness of the application in capturing the evolution of the scene dynamics.
Download

Paper Nr: 150
Title:

Multiple 3D Object Recognition using RGB-D Data and Physical Consistency for Automated Warehousing Robots

Authors:

Shuichi Akizuki and Manabu Hashimoto

Abstract: In this research, we propose a method to recognize multiple objects on the shelves of automated warehouses. The purpose of this research is to enhance the reliability of the Hypothesis Verification (HV) method that simultaneously recognizes the layout of multiple objects. The proposed method employs not only the RGB-D consistency between the input scene and the scene hypothesis but also physical consistency. By considering the physical consistency of the scene hypothesis, the proposed HV method can efficiently reject false ones. Experimental results on objects used at the Amazon Picking Challenge 2015 confirm that the recognition success rate of the proposed method is higher than that of the previous HV method.
Download

Paper Nr: 181
Title:

A New Labeled Quadtree-based Distance for Medical Image Retrieval

Authors:

Mahaman Sani Chaibou and Karim Kalti

Abstract: We present in this paper a quadtree-based approach for image retrieval by visual content. Images are represented by quadtrees to take advantage of spatial information and fast indexing. Quadtree nodes represent parts of the image. They are labeled according to a predefined set of labels, which depends on the application domain; most often, labels correspond to objects in images of the considered domain. The image database is therefore converted to and indexed as a labeled quadtree database. To search for the images most similar to a query, we compare the quadtrees of images by means of a peer-nodes-based two-term distance. The first term compares images according to the objects they contain, to ensure content type correspondence. The second term compares images according to the shapes of their objects. The approach is applied to generic medical image databases. Experimental results on a database containing six types of medical images show the potential of the proposed CBIR approach.

Paper Nr: 204
Title:

Shot, Scene and Keyframe Ordering for Interactive Video Re-use

Authors:

Lorenzo Baraldi, Costantino Grana, Guido Borghi, Roberto Vezzani and Rita Cucchiara

Abstract: This paper presents a complete system for shot and scene detection in broadcast videos, as well as a method to select the best representative key-frames, which could be used in new interactive interfaces for accessing large collections of edited videos. The final goal is to enable an improved access to video footage and the re-use of video content with the direct management of user-selected video-clips.
Download

Paper Nr: 217
Title:

Feature-augmented Trained Models for 6DOF Object Recognition and Camera Calibration

Authors:

Kripasindhu Sarkar, Alain Pagani and Didier Stricker

Abstract: In this paper we address the offline stage of 3D modelling in feature-based object recognition. While the online stage of recognition (feature matching and pose estimation) has been refined several times over the past decade, incorporating filters and heuristics for robust and scalable recognition, the offline stage of creating feature-based models has remained unchanged. In this work we take advantage of easily available 3D scanners and 3D model databases like 3D-warehouse, and use them as our source of 3D CAD models of real objects. We process the CAD models to produce feature-augmented trained models which can be used by any online recognition stage of object recognition. These trained models can also be used directly as a calibration rig for performing camera calibration from a single image. The evaluation shows that our fully automatically created feature-augmented trained models perform better in terms of recognition recall than the baseline, the tedious manual way of creating feature models. When used as a calibration rig, our feature-augmented models achieve accuracy comparable to popular camera calibration techniques, making them an easy and quick way of performing camera calibration.
Download

Paper Nr: 76
Title:

Facial Emotion Recognition from Kinect Data – An Appraisal of Kinect Face Tracking Library

Authors:

Tanwi Mallick, Palash Goyal, Partha Pratim Das and Arun Kumar Majumdar

Abstract: Facial expression classification and emotion recognition from gray-scale or colour images or videos have been extensively explored over the last two decades. In this paper we address the emotion recognition problem using Kinect 1.0 data and the Kinect Face Tracking Library (KFTL). A generative approach based on facial muscle movements is used to classify emotions. We detect various Action Units (AUs) of the face from the feature points extracted by KFTL and then recognize emotions by Artificial Neural Networks (ANNs) based on the detected AUs. We use six emotions, namely, Happiness, Sadness, Fear, Anger, Surprise and Neutral for our work and appraise the strengths and weaknesses of KFTL in terms of feature extraction, AU computations, and emotion detection. We compare our work with earlier studies on emotion recognition from Kinect 1.0 data.
Download

Paper Nr: 161
Title:

Feature Selection for Emotion Recognition based on Random Forest

Authors:

Sonia Gharsalli, Bruno Emile, Hélène Laurent and Xavier Desquesnes

Abstract: Automatic facial emotion recognition is a challenging problem. Robustness of emotion recognition systems is particularly difficult to achieve, as the similarity of some emotional expressions induces confusion between them. Facial representation needs feature extraction and feature selection. This paper presents a selection method incorporated into an emotion recognition system. Appearance features are first extracted by a Gabor filter bank, and the huge feature size is reduced by a pretreatment step. Then, an iterative selection method based on the Random Forest (RF) feature importance measure is applied. Emotions are finally classified by an SVM. The proposed approach is evaluated on the Cohn-Kanade database with seven expressions (anger, happiness, fear, disgust, sadness, surprise and the neutral expression). The emotion recognition rate reaches 95.2% after feature selection, and an improvement of 22% for sadness recognition is noticed. PCA is also used to select features and is compared to the RF-based feature selection method, and a comparison with emotion recognition methods from the literature that use a feature selection step is carried out.
Download

Paper Nr: 231
Title:

Person Re-identification based on Intelligence Fusion of Color and Depth Information

Authors:

Bohui Zhu, Lijuan Hong and Jin Huang

Abstract: Person re-identification, identifying specific persons across a multi-camera network, is a challenging task due to large visual appearance variations caused by changes in view angle, lighting conditions, background and occlusion. To address these challenges, we propose a novel Intelligence Fusion of Color and Depth Information (IFCDI) model which uses not only color images but also depth images. In our approach, we introduce a novel human detection method to extract human content from the depth image. Different from most existing human detection methods, our method can not only detect the human position but also extract exact human body pixels from the background. We then select six salient human body parts by utilizing depth information, and extract each part’s appearance features from the corresponding color image to construct an individual signature. In this way, our model achieves robustness and distinctiveness against variations in viewpoint, background clutter, occlusion and human pose. Experiments demonstrate that our method achieves state-of-the-art performance.

Paper Nr: 232
Title:

Quantifying the Specificity of Near-duplicate Image Classification Functions

Authors:

Richard Connor and Franco Alberto Cardillo

Abstract: There are many published methods for detecting similar and near-duplicate images. Here, we consider their use in the context of unsupervised near-duplicate detection, where the task is to find a (relatively small) near-duplicate intersection of two large candidate sets. Such scenarios are of particular importance in forensic near-duplicate detection. The essential properties of such a function are performance, sensitivity, and specificity. We show that, as collection sizes increase, specificity becomes the most important of these, as without very high specificity huge numbers of false positive matches will be identified. This makes even very fast, highly sensitive methods completely useless. Until now, to our knowledge, no attempt has been made to measure the specificity of near-duplicate finders, or even to compare them with each other. Recently, a benchmark set of near-duplicate images has been established which allows such assessment by giving a near-duplicate ground truth over a large general image collection. Using this, we establish a methodology for calculating specificity. A number of the most likely candidate functions are compared with each other and accurate measurements of sensitivity vs. specificity are given. We believe these are the first such figures to be calculated for any such function.
Download
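The scale argument in the abstract is easy to make concrete: over quadratically many candidate pairs, a false-positive rate that looks negligible still yields thousands of spurious matches. The collection sizes and rate below are illustrative, not the paper's measurements:

```python
# Specificity = TN / (TN + FP): the fraction of true non-duplicate pairs
# that a near-duplicate finder correctly rejects.
def specificity(tn, fp):
    return tn / (tn + fp)

# Illustrative numbers (not from the paper): two collections of 10^5 images
# give ~10^10 candidate pairs, almost all of which are true negatives.
pairs = 10**10
fp = pairs // 10**6        # i.e. a false-positive rate of 10^-6 ...
tn = pairs - fp

print(specificity(tn, fp)) # specificity looks superb (~0.999999) ...
print(fp)                  # ... yet 10,000 spurious matches must be inspected
```

This is why the abstract argues that specificity, not speed or sensitivity, dominates as collections grow.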

Paper Nr: 234
Title:

Human Recognition in RGBD Combining Object Detectors and Conditional Random Fields

Authors:

Konstantinos Amplianitis, Ronny Hänsch and Ralf Reulke

Abstract: This paper addresses the problem of detecting and segmenting human instances in a point cloud. Both fields have been well studied during the last decades, showing impressive results not only in accuracy but also in computational performance. With the rapid adoption of depth sensors, the need to improve existing state-of-the-art algorithms by integrating depth information as an additional constraint has become more evident. Current challenges involve combining RGB and depth information for reasoning about the location and spatial extent of the object of interest. We make use of an improved deformable part model algorithm, allowing the individual parts to deform across multiple scales, to approximate the location of the person in the scene, and a conditional random field energy function to specify the object’s spatial extent. Our proposed energy function models up to pairwise relations defined in the RGBD domain, enforcing label consistency for regions sharing similar unary and pairwise measurements. Experimental results show that our proposed energy function provides a fairly precise segmentation even when the resulting detection box is imprecise. Reasoning about the detection algorithm could potentially enhance the quality of the detection box, allowing the object of interest to be captured as a whole.
Download

Paper Nr: 238
Title:

Human Detection from Aerial Imagery for Automatic Counting of Shellfish Gatherers

Authors:

Mathieu Laroze, Luc Courtrai and Sébastien Lefèvre

Abstract: Automatic human identification from aerial image time series or video sequences is a challenging issue. We propose here a complete processing chain that operates in the context of counting recreational shellfish gatherers in a coastal environment (the Gulf of Morbihan, South Brittany, France). It starts from a series of aerial photographs and builds a mosaic in order to prevent multiple occurrences of the same objects in the overlapping parts of aerial images. To do so, several stitching techniques are reviewed and discussed in the context of large aerial scenes. People detection is then addressed through a sliding window analysis combining the HOG descriptor and a supervised classifier. Several classification methods are compared, including SVM, Random Forests, and AdaBoost. Experimental results show the interest of the proposed approach and provide directions for future research.
Download

Paper Nr: 245
Title:

An Efficient Dual Dimensionality Reduction Scheme of Features for Image Classification

Authors:

Hai-Xia Long, Li Zhou, Qiang Zhang, Jing Zhang and Xiao-Guang Li

Abstract: The statistical property of the Bag of Words (BoW) model and the spatial property of Spatial Pyramid Matching (SPM) are usually used to improve the distinguishing ability of features for image classification by adding redundant information. But the resulting increase in image feature dimension causes the “curse of dimensionality” problem. To address this issue, a dual dimensionality reduction scheme that combines Locality Preserving Projection (LPP) with Principal Component Analysis (PCA) is proposed in this paper. First, LPP is used to reduce the feature dimension of each SPM level, and the dimensionality-reduced feature vectors are cascaded into a global vector. After that, the dimension of the global vector is reduced by PCA. Experimental results on four standard image classification databases show that, compared with the benchmark ScSPM (Sparse-coding-based Spatial Pyramid Matching), when the dimension of the image features is reduced to only 5% of that of the baseline scheme, the classification performance of the proposed dual dimensionality reduction scheme can still be improved by about 5%.
Download
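The two-stage reduction described in the abstract (per-level reduction, cascade into a global vector, then a second global reduction) can be sketched as below. This is a structural illustration only: for brevity PCA stands in for the per-level LPP step as well (LPP requires a graph Laplacian and a generalized eigenproblem), and the level dimensions are made up.

```python
import numpy as np

def pca_fit_transform(X, k):
    """Project rows of X onto the top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# toy data: 100 images, SPM with 3 pyramid levels of feature dims 200/400/800
rng = np.random.default_rng(1)
levels = [rng.random((100, d)) for d in (200, 400, 800)]

# stage 1: per-level reduction (the paper uses LPP here; PCA is a stand-in)
reduced = [pca_fit_transform(X, 20) for X in levels]

# cascade the reduced per-level vectors into one global descriptor per image
global_vec = np.hstack(reduced)             # shape (100, 60)

# stage 2: global PCA down to the final feature dimension
final = pca_fit_transform(global_vec, 10)   # shape (100, 10)
```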

Area 4 - Applications and Services

Full Papers
Paper Nr: 42
Title:

Camera Placement Optimization Conditioned on Human Behavior and 3D Geometry

Authors:

Pranav Mantini and Shishir K. Shah

Abstract: This paper proposes an algorithm to optimize the placement of surveillance cameras in a 3D infrastructure. The key differentiating feature in the algorithm design is the incorporation of human behavior within the infrastructure for optimization. Infrastructures, depending on their geometries, may exhibit regions with dominant human activity. In the absence of observations, this paper presents a method to predict this human behavior and identify such regions to deploy an effective surveillance scenario. Domain knowledge regarding the infrastructure was used to predict the possible human motion trajectories in the infrastructure. These trajectories were used to identify areas with dominant human activity. Furthermore, a metric that quantifies the position and orientation of a camera based on the observable space, activity in the space, pose of objects of interest within the activity, and their image resolution in camera view was defined for optimization. The method was compared with state-of-the-art algorithms and the results are shown with respect to the amount of observable space, human activity, and face detection rate per camera in a configuration of cameras.
Download

Paper Nr: 59
Title:

A Smart Visual Information Tool for Situational Awareness

Authors:

Marco Vernier, Manuela Farinosi and Gian Luca Foresti

Abstract: In recent years, social media have grown in popularity, with millions of users who every day produce and share digital content online. This practice proves particularly useful in extraordinary contexts, such as during a disaster, when the data posted by people can be integrated with traditional emergency management tools and used for event detection and hyperlocal situational awareness. In this contribution, we present SVISAT, an innovative visualization system for Twitter data mining, expressly conceived for signaling a given event in real time through the uploading and sharing of visual information (i.e., photos). Using geodata, it displays on a map the wide area where the event is happening, showing at the same time the most popular hashtags adopted by people to spread the tweets, and the most relevant images/photos describing the event itself.
Download

Paper Nr: 132
Title:

Infinite 3D Modelling Volumes

Authors:

E. Funk and A. Börner

Abstract: Modern research in mobile robotics proposes to combine localization and perception in order to recognize previously visited locations and thus to improve both localization and object recognition recursively. A crucial issue is to perform updates of the scene geometry when novel observations become available, since practical applications often require a system to model large 3D environments at a resolution that exceeds the capacity of local memory. This work presents an optimized volume data structure for infinite 3D environments which facilitates i) successive world model updates without the need to recompute the full dataset, ii) a very fast in-memory data access scheme enabling the integration of high-resolution 3D sensors in real time, and iii) efficient level-of-detail handling for visualization and coarse geometry updates. The technique is finally demonstrated on real-world application scenarios which underpin the feasibility of the research outcomes.
Download

Paper Nr: 156
Title:

Probability-based Scoring for Normality Map in Brain MRI Images from Normal Control Population

Authors:

Thach-Thao Duong

Abstract: The increasing availability of MRI brain data opens up a research direction for abnormality detection, which is necessary for timely detection of impairment and early diagnosis. The paper proposes scores based on z-score transformation and kernel density estimation (KDE), which rely respectively on a Gaussian assumption and on nonparametric modeling, to detect abnormality in MRI brain images. The methodologies are applied to the gray-matter-based score of Voxel-based Morphometry (VBM) and the sparse-based score of Sparse-based Morphometry (SBM). Experiments on well-designed normal control (CN) and Alzheimer's disease (AD) subsets extracted from the MRI data set of the Alzheimer's Disease Neuroimaging Initiative (ADNI) are conducted with threshold-based classification. An analysis of the abnormality percentage of the AD and CN populations is carried out to validate the robustness of the proposed scores. Further cross-validation with Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) classification between AD and CN shows significant accuracy rates, revealing the potential of statistical modeling to measure abnormality from a population of normal subjects.
Download
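The two scoring ideas in the abstract, a voxel-wise z-score against the normal-control population and a nonparametric KDE-based likelihood, can be sketched as below. This is a generic illustration on synthetic 1D "voxel" data, not the paper's VBM/SBM pipeline; the bandwidth and threshold values are arbitrary.

```python
import numpy as np

def zscore_map(subject, normals):
    """Voxel-wise z-score of a subject against a normal-control population.
    normals: (n_subjects, n_voxels), subject: (n_voxels,)"""
    mu = normals.mean(axis=0)
    sd = normals.std(axis=0, ddof=1) + 1e-9
    return (subject - mu) / sd

def kde_score_map(subject, normals, bandwidth=0.5):
    """Voxel-wise negative log-density under a Gaussian KDE of the normals;
    high values mean 'unlikely under the normal population'."""
    k = np.exp(-0.5 * ((normals - subject) / bandwidth) ** 2)
    dens = k.mean(axis=0) / (bandwidth * np.sqrt(2 * np.pi))
    return -np.log(dens + 1e-12)

rng = np.random.default_rng(2)
normals = rng.normal(0.0, 1.0, size=(40, 1000))   # 40 controls, 1000 voxels
patient = rng.normal(0.0, 1.0, size=1000)
patient[:50] += 4.0                               # inject an "abnormal" region

z = zscore_map(patient, normals)
abnormal = np.abs(z) > 2.5                        # threshold-based classification
kde = kde_score_map(patient, normals)
```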

Short Papers
Paper Nr: 10
Title:

Hubless 3D Medical Image Bundle Registration

Authors:

Rémi Agier, Sébastien Valette, Laurent Fanton, Pierre Croisille and Rémy Prost

Abstract: We propose a hubless medical image registration scheme, able to conjointly register massive amounts of images. Exploiting 3D points of interest combined with global optimization, our algorithm allows partial matches, does not need any prior information (full body image as a central patient model) and exhibits very good robustness by exploiting inter-volume relationships. We show the efficiency of our approach with the rigid registration of 400 CT volumes, and we provide an eye-detection application as a first step to patient image anonymization.
Download

Paper Nr: 22
Title:

Non-linear Distance-based Semi-supervised Multi-class Gesture Recognition

Authors:

Husam Al-Behadili, Arne Grumpe and Christian Wöhler

Abstract: The automatic recognition of gestures is important in a variety of applications, e.g. human-machine interaction. Commonly, different individuals execute gestures in slightly different manners, and thus a fully labelled dataset is not available, while unlabelled data may be acquired from an on-line stream. Consequently, gesture recognition systems should be able to be trained in a semi-supervised learning scenario. Additionally, real-time systems and large-scale data require a dimensionality reduction of the data to reduce the processing time. This is commonly achieved by linear subspace projections. Most gesture data sets, however, are non-linearly distributed. Hence, linear subspace projection fails to separate the classes. We propose an extension of linear subspace projection that applies a non-linear transformation to a higher-dimensional space after the linear subspace projection. This mapping, however, is not explicitly evaluated but implicitly used by a kernel function. The kernel nearest class mean (KNCM) classifier is shown to handle the non-linearity as well as the semi-supervised learning scenario. The computational expense of the non-linear kernel function is compensated by the dimensionality reduction of the preceding linear subspace projection. The method is applied to a gesture dataset comprised of 3D trajectories acquired using the Kinect sensor. The results of the semi-supervised learning show high accuracies that approach the accuracy of a fully supervised scenario already for small subspace dimensions and small training sets. The accuracy of the semi-supervised KNCM exceeds the accuracy of the original nearest class mean classifier in all cases.
Download
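A kernel nearest class mean classifier of the kind named in the abstract can be sketched with the standard kernel trick: the squared distance to a class mean in the implicit feature space expands into kernel evaluations only. This is a minimal generic sketch with an RBF kernel on toy 2D data, not the paper's semi-supervised system (no subspace projection, no unlabelled-data updates).

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelNCM:
    """Nearest class mean in the implicit kernel feature space:
    ||phi(x) - m_c||^2 = k(x,x) - 2*mean_i k(x, x_ci) + mean_ij k(x_ci, x_cj);
    for the RBF kernel, k(x,x) = 1."""
    def fit(self, X, y, gamma=1.0):
        self.gamma = gamma
        self.classes = np.unique(y)
        self.parts = [X[y == c] for c in self.classes]
        # per-class constant mean_ij k(x_ci, x_cj), precomputed once
        self.const = [rbf(P, P, gamma).mean() for P in self.parts]
        return self
    def predict(self, X):
        d2 = np.stack([1.0 - 2.0 * rbf(X, P, self.gamma).mean(axis=1) + c
                       for P, c in zip(self.parts, self.const)], axis=1)
        return self.classes[np.argmin(d2, axis=1)]

# toy usage: two well-separated clusters
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(2.0, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
model = KernelNCM().fit(X, y, gamma=1.0)
pred = model.predict(X)
```

Because the mapping is only accessed through the kernel, the higher-dimensional space is never materialized, which is the point the abstract makes about implicit evaluation.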

Paper Nr: 23
Title:

Neural Network based Novelty Detection for Incremental Semi-supervised Learning in Multi-class Gesture Recognition

Authors:

Husam Al-Behadili, Arne Grumpe and Christian Wöhler

Abstract: The problems of infinitely long data streams and their concept drift, as well as non-linearly separable classes and the possible emergence of “novel classes”, are topics of high relevance for automatic gesture recognition systems based on data streams. To address these problems we apply a semi-supervised learning technique using a neural network in combination with an incremental update rule. Neural networks have been shown to handle non-linearly separable data, and the incremental update ensures that the parameters of the classifier follow the “concept drift” without the necessity of an increased training set. Since a semi-supervised learning technique is sensitive to false labels, we apply an outlier detection method based on extreme value theory and confidence band intervals. The proposed algorithm uses the extreme learning machine, which is easily updated and works with multiple classes. A comparison with an auto-encoder neural network shows that the proposed algorithm has superior properties. In particular, the processing time is greatly reduced.
Download

Paper Nr: 27
Title:

Visual Target Tracking in Clay Pigeon Shooting Sports: Estimation of Flight Parameters and Throwing Range

Authors:

Franz Andert, Simon Freudenthal and Stefan Levedag

Abstract: This paper presents a method to estimate the trajectory and the flight distance of thrown clay pigeons. The basic principle is to measure the beginning of the flight with a camera system in order to forecast the remaining flight down to the ground impact. The demand for such advanced measuring methods arises from sporting clays competition regulations, where the launching machines have to be adjusted towards specific throwing angles and ranges. The presented method uses a wide-baseline stereo camera system (32 m camera distance) to measure the 3D clay disc positions, and the flight parameters are then identified by aerodynamic and kinematic considerations. This allows the whole path and the throwing distance to be estimated without any need to measure the ground impact itself. Applied to sporting clays facilities, the launching machines can be adjusted more easily and precisely, which is advantageous especially for competitions. Additionally, it becomes possible to obtain the theoretical throwing distance on small sports areas bounded by nets or walls where a ground impact is not measurable.
Download

Paper Nr: 35
Title:

The Role of Machine Learning in Medical Data Analysis. A Case Study: Flow Cytometry

Authors:

Paolo Rota, Florian Kleber, Michael Reiter, Stefanie Groeneveld-Krentz and Martin Kampel

Abstract: In recent years automated medical data analysis has turned out to be one of the frontiers of Machine Learning. Medical operators are still reluctant to rely completely on automated solutions at the diagnosis stage. However, Machine Learning researchers have focused their attention on this field, proposing valuable methods often with an outcome comparable to human evaluation. In this paper we give a brief overview of the role of Computer Vision and Machine Learning in solving medical problems in an automatic (supervised or unsupervised) fashion; we then consider a case study of Flow Cytometry data analysis for MRD assessment in Acute Lymphoblastic Leukemia. The clinical evaluation procedure for this type of data consists of a time-consuming manual labeling that can be performed only after intensive training, and different levels of experience may lead to different opinions. We therefore propose two different approaches: the first is a generative semi-supervised approach based on a Gaussian Mixture Model, the second a discriminative semi-supervised approach based on Deep Learning.
Download

Paper Nr: 54
Title:

Chinese Character Images Recognition and Its Application in Mobile Platform

Authors:

Gang Gu, Jiangqin Wu, Tianjiao Mao and Pengcheng Gao

Abstract: Chinese characters are profound and polysemantic. Reading a Chinese character is a procedure of image understanding, if the Chinese character is captured as an image. Due to the structural complexity and sheer number of Chinese characters, readers regularly encounter unfamiliar characters in books, so it would be helpful to provide a tool that helps users understand the meaning of unknown characters. We propose a method that combines global and local features (i.e., GIST and SIFT features) to recognize Chinese character images captured by a mobile camera. Three schemes are investigated based on practical considerations. First, the GIST and SIFT descriptors extracted from Chinese character images are adopted purely as features. Then, the SIFT feature points of similar Chinese character images are filtered based on the GIST feature. Finally, the storage of the GIST and SIFT descriptors is compressed to accommodate the mobile platform using the Similarity Sensitive Coding (SSC) algorithm. At the recognition stage, the top 2k Chinese characters are first retrieved by Hamming distance in the GIST feature space, and the selected characters are then reordered by SIFT feature to produce the final result. We build an Android app that implements the recognition algorithm. Experiments show satisfactory recognition results of our proposed application compared to other Android apps.
Download

Paper Nr: 71
Title:

Welding Groove Mapping - Implementation and Evaluation of Image Processing Algorithms on Shiny Surfaces

Authors:

Cristiano Rafael Steffens, Bruno Quaresma Leonardo, Sidnei Carlos da Silva Filho, Valquiria Huttner, Vagner Santos da Rosa and Silvia Silva da Costa Botelho

Abstract: Electric arc welding is a key process in the heavy steel industries. It is a very complex task that demands a high degree of control in order to meet the international standards for fusion welding. We propose a Vision-Based Measurement (VBM) system and evaluate how different algorithms impact the results. The proposed system joins hardware and software to image the welding plates using a single CMOS camera, run computer vision algorithms and control the welding equipment. A complete prototype, using a commercial linear welding robot, is presented. The evaluation of the system as groove mapping equipment, considering different processing algorithms combined with noise removal and line segment detection techniques, allows us to define the appropriate approach for shop floor operation, combining low asymptotic cost and measurement quality.
Download

Paper Nr: 79
Title:

Fast Gait Recognition from Kinect Skeletons

Authors:

Tanwi Mallick, Ankit Khedia, Partha Pratim Das and Arun Kumar Majumdar

Abstract: Recognizing persons from gait has attracted attention in computer vision research for over a decade and a half. To extract the motion information in gait, researchers have either used wearable markers or RGB videos. Markers naturally offer good accuracy and reliability but have the disadvantage of being intrusive and expensive. RGB images, on the other hand, need high processing time to achieve good accuracy. The advent of low-cost depth data from Kinect 1.0 and its human-detection and skeleton-tracking abilities has opened new opportunities in gait recognition. With skeleton data it becomes cheaper and easier to obtain the body-joint information that can provide critical clues to gait-related motions. In this paper, we attempt to use the skeleton stream from Kinect 1.0 for gait recognition. Various types of gait features are extracted from the joint points in the stream and appropriate classifiers are used to compute effective matching scores. To test our system and compare performance, we create a benchmark data set of 5 walks each for 29 subjects and implement a state-of-the-art gait recognizer for RGB videos. Tests show a moderate accuracy of 65% for our system. This is low compared to the accuracy of the RGB-based method (which achieved 83% on the same data set) but high compared to similar skeleton-based approaches (usually below 50%). Further, we compare the execution time of various parts of our system to highlight the efficiency advantages of our method and its potential as a real-time recogniser if an optimized implementation can be done.
Download

Paper Nr: 151
Title:

Automated Soft Contact Lens Detection using Gradient based Information

Authors:

Balender Kumar, Aditya Nigam and Phalguni Gupta

Abstract: Personal identification numbers (PINs), credit card numbers, email passwords etc. have something in common: all of them can easily be guessed or stolen. Consequently, users have been encouraged to adopt stronger authentication based on biometric traits like fingerprint, palmprint and iris. Among biometric techniques, iris recognition can be considered one of the best known and most accurate, but it can be spoofed very easily using plastic eyeballs, printed irises and contact lenses. Attacks using soft contact lenses are more challenging because their transparent texture can blur the iris texture. In this paper a robust algorithm is proposed to detect soft contact lenses by working on a small ring-like area near the outer edge of the limbus boundary and calculating the gradient of candidate points along the lens perimeter. Experiments are conducted on the IIITD-Vista, IIITD-Cogent, UND 2010 and our indigenous databases. The experimental results indicate that our method outperforms previous soft lens detection techniques in terms of False Rejection Rate and False Acceptance Rate.
Download

Paper Nr: 153
Title:

Finger-Knuckle-Print ROI Extraction using Curvature Gabor Filter for Human Authentication

Authors:

Aditya Nigam and Phalguni Gupta

Abstract: Biometric-based human recognition is the most obvious method for automatically resolving personal identity with high reliability. In this paper we present a novel finger-knuckle-print ROI extraction algorithm. The basic Gabor filter is modified into a Curvature Gabor Filter (CGF) to obtain the central knuckle line and central knuckle point, which are further used to extract the FKP ROI image. The largest public FKP database, consisting of 7,920 images collected from 660 different fingers, is used for testing. The results have been compared with the only other existing ROI extraction algorithm, Convex Direction Coding (CDC). It has been observed that the proposed algorithm achieves better performance, with an EER drop of more than 20% in all experiments. This suggests that the proposed CGF algorithm extracts the ROI more consistently than CDC and hence can facilitate any finger-knuckle-print based biometric system.
Download

Paper Nr: 224
Title:

Emotion Recognition through Body Language using RGB-D Sensor

Authors:

Lilita Kiforenko and Dirk Kraft

Abstract: This paper presents results on automatic non-acted human emotion recognition using full standing body movements and postures. The focus of this paper is to show that it is possible to classify emotions using a consumer depth sensor in an everyday scenario. The features for classification are body joint rotation angles and meta-features that are fed into a Support Vector Machines classifier. The work of Gaber-Barron and Si (2012) is used as inspiration and many of their proposed meta-features are reimplemented or modified. In this work we try to identify “basic” human emotions that are triggered by various visual stimuli. We present the emotion dataset that is recorded using the Microsoft Kinect for Windows sensor, with body joint rotation angles extracted using the Microsoft Kinect Software Development Kit 1.6. The classified emotions are curiosity, confusion, joy, boredom and disgust. We show that real human emotions can be classified using body movements and postures with a classification accuracy of 55.62%.
Download

Paper Nr: 226
Title:

Safeguarding Privacy by Reliable Automatic Blurring of Faces in Mobile Mapping Images

Authors:

Steven Puttemans, Stef Van Wolputte and Toon Goedemé

Abstract: When capturing images in the wild containing pedestrians, privacy issues remain a major concern for industrial applications. Our application, collecting cycloramic mobile mapping data in crowded environments, is an example of this. If the data is processed and accessed by third parties, privacy of pedestrians must be ensured. This is where pedestrian detectors come into play, used to detect individuals and privacy mask them through blurring. Undesired false positive detections, typical for pedestrian detectors and unavoidable, would still leave unrelated areas of the images blurred. We tackled this problem using application-specific scene constraints, modelled by a height-position mapping based on scene-specific pedestrian annotation data, combined with reducing the field of interest and case-specific false positive elimination classifiers. We applied a soft blurring technique to avoid the artificial look of simply applying Gaussian blurring to the found detections, which results in an effective fully-automated masking pipeline for privacy safeguarding in mobile mapping images. We show that pre-trained pedestrian detection models can be used, and that by collecting a limited amount of application-specific annotations and exploiting scene-specific constraints, we are able to boost the detection accuracy enormously.
Download
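The soft-blurring idea mentioned in the abstract, blending a blurred copy into the image through a feathered alpha mask instead of pasting hard Gaussian-blurred rectangles, can be sketched as below. This is a generic numpy-only illustration (separable Gaussian blur, made-up sigma and feather values), not the authors' pipeline.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur using 1D convolutions (numpy only)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    # convolve rows, then columns ('same' keeps the image size)
    img = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, img)

def soft_blur_detections(img, boxes, sigma=4.0, feather=3.0):
    """Blend a blurred copy into the image inside each detection box,
    feathering the alpha mask to avoid hard blur boundaries."""
    blurred = gaussian_blur(img, sigma)
    mask = np.zeros_like(img)
    for (x, y, w, h) in boxes:
        mask[y:y + h, x:x + w] = 1.0
    alpha = gaussian_blur(mask, feather)     # soften the mask edges
    return alpha * blurred + (1.0 - alpha) * img

rng = np.random.default_rng(3)
frame = rng.random((80, 80))                 # grayscale toy frame
out = soft_blur_detections(frame, boxes=[(10, 10, 20, 30)])
```

Far from any detection box the alpha mask is exactly zero, so untouched regions are passed through unchanged.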

Paper Nr: 11
Title:

Image Transformation of Eye Areas for Synthesizing Eye-contacts in Video Conferencing

Authors:

Takuya Inoue, Tomokazu Takahashi, Takatsugu Hirayama, Yasutomo Kawanishi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase, Takayuki Kurozumi and Kunio Kashino

Abstract: Recently, the spread of Web cameras has facilitated video-conferencing. Since a Web camera is usually located outside the display while the user looks at his/her partner in the display, there is a problem that they cannot establish eye contact with each other. Various methods have been proposed to solve this problem, but most of them required specific sensors. In this paper, we propose a method that transforms the eye areas to synthesize eye contact using a single camera that is commonly implemented in laptop computers and mobile phones. Concretely, we implemented a system which transforms the user’s eye areas in an image to his/her eye image with a straight gaze to the camera only when the user’s gaze falls in a range that the partner would perceive eye contact.
Download

Paper Nr: 60
Title:

Patient Distraction and Entertainment System for Magnetic Resonance Imaging using Visual Effects Synchronized to the Scanner Acoustic Noise

Authors:

Refaat E. Gabr and Ponnada A. Narayana

Abstract: Acoustic noise is a major source of discomfort for patients undergoing magnetic resonance imaging (MRI) examination. Loud noise is generated from fast gradient switching during MRI scanning. The noise level is reduced by wearing hearing protection devices, but the noise cannot be entirely avoided. Patient distraction techniques can shift the attention away from the annoying noise. We implemented a simple and low-cost system for patient distraction using visual effects that are synchronized to the gradient acoustic noise. This multisensory approach to patient distraction was implemented on a 3.0T scanner and tested in six healthy adult volunteers. After the scan was completed, the volunteers were asked about their scan experience with visualization, rating their preference on a 0-10 scale. The images were visually inspected for any artifacts. All volunteers indicated improved experience with the proposed visualization system, with an average score of 6.3. The image quality was not affected by visualization.
Download

Paper Nr: 77
Title:

Image Encryption using Improved Keystream Generator of Achterbahn-128

Authors:

Aissa Belmeguenai, Oulaya Berrak and Khaled Mansouri

Abstract: Image transmission has become more and more widely used in everyday life, yet it is known to be vulnerable to interception and unauthorized access, so securing it has become necessary. In this paper an improved version of Achterbahn-128 for image encryption and decryption is proposed. The proposed design is based on seventeen binary nonlinear feedback shift registers (NLFSRs) whose polynomials are primitive, and a nonlinear Boolean function. The outputs of the seventeen registers are combined by the nonlinear Boolean function to produce the keystream sequence. The proposed scheme is compared to Achterbahn-128. The results of several experiments, statistical analyses and sensitivity analyses show that the proposed image encryption scheme performs better than Achterbahn-128 and provides an efficient and secure way for image encryption and transmission.
Download
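The general scheme the abstract describes, a shift-register keystream XORed with the image pixels, can be sketched with a toy example. To be clear, this is NOT Achterbahn-128 (which combines seventeen primitive NLFSRs through a Boolean function); it is a single made-up 12-bit NLFSR used only to show why XOR stream encryption of an image is symmetric.

```python
import numpy as np

def nlfsr_keystream(state, n_bytes):
    """Toy nonlinear feedback shift register keystream (illustrative only;
    the taps and the AND term below are invented for this sketch)."""
    state = list(state)
    out = bytearray()
    for _ in range(n_bytes):
        byte = 0
        for _ in range(8):
            bit = state[0]
            # small nonlinear feedback: XOR taps plus one AND term
            fb = state[0] ^ state[5] ^ (state[2] & state[9])
            state = state[1:] + [fb]
            byte = (byte << 1) | bit
        out.append(byte)
    return np.frombuffer(bytes(out), dtype=np.uint8)

def xor_image(img, key_state):
    """Encrypt/decrypt by XORing pixels with the keystream (symmetric)."""
    ks = nlfsr_keystream(key_state, img.size)
    return (img.ravel() ^ ks).reshape(img.shape)

key = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]       # 12-bit toy initial state
rng = np.random.default_rng(4)
img = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
cipher = xor_image(img, key)
plain = xor_image(cipher, key)                   # same operation decrypts
```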

Paper Nr: 127
Title:

Structural Synthesis based on PCA: Methodology and Evaluation

Authors:

Sriniwas Chowdhary Maddukuri, Wolfgang Heidl, Christian Eitzinger and Andreas Pichler

Abstract: In recent surface inspection systems, interactive training of fault classification is becoming state of the art. While being most informative for both training and explanation, fault samples at the decision boundary are rare in production datasets. Therefore, augmenting the dataset with synthesized samples at the decision boundary could greatly accelerate the training procedure. Traditionally, synthesis methods have proven useful for computer graphics applications and have only been applied to generating samples with stochastic and regular texture patterns. Presently, state-of-the-art synthesis methods assume that the test sample is available and are feature independent. In the context of surface inspection systems, incoming samples are often classified into several defect classes after the feature extraction stage. Therefore, the goal of this work is to perform the synthesis for a new feature vector such that the resulting synthesized image visualizes the decision boundary. This paper presents a methodology for structural synthesis based on principal component analysis. The methodology takes the samples of the training set as input. It renders the synthesized form of the input samples through eigenimages and their computed coefficients by solving a linear regression problem. The methodology has been evaluated on an industrial dataset to validate its performance.
Download
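The rendering idea in the abstract, mapping a feature vector to eigenimage coefficients by linear regression and reconstructing an image from them, can be sketched as follows. All data here is synthetic and the feature extractor is a random stand-in; this shows the PCA/regression structure only, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(5)
# toy training set: 50 "images" of 16x16 pixels, flattened to rows
imgs = rng.random((50, 256))
mean = imgs.mean(axis=0)
U, S, Vt = np.linalg.svd(imgs - mean, full_matrices=False)
eig = Vt[:10]                          # top-10 eigenimages (rows)
coeffs = (imgs - mean) @ eig.T         # PCA coefficients of the training set

# toy feature vectors for the same samples (stand-in for inspection features)
feats = rng.random((50, 8))

# linear regression: map a feature vector (plus bias) to PCA coefficients
A = np.column_stack([feats, np.ones(50)])
W, *_ = np.linalg.lstsq(A, coeffs, rcond=None)

def synthesize(f):
    """Render the image implied by a new feature vector f."""
    c = np.append(f, 1.0) @ W          # predicted eigenimage coefficients
    return (mean + c @ eig).reshape(16, 16)

new_img = synthesize(rng.random(8))
```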

Paper Nr: 191
Title:

A Content-based Watermarking Scheme based on Clifford Fourier Transform

Authors:

Maroua Affes, Malek Sellami Meziou, Yassine Lehiani, Marius Preda and Faouzi Ghorbel

Abstract: In this paper, we propose a new watermarking method based on Harris interest points and the Clifford Fourier transform. We employ the Harris detector to select robust interest points and to generate non-overlapping circular interest regions. Each region is transformed into the Clifford Fourier domain and the watermark is embedded into the magnitude of the Clifford Fourier coefficients. Experimental results show the robustness of the proposed method against JPEG compression.
Download

Paper Nr: 192
Title:

Video Watermarking Algorithm based on Combined Transformation and Digital Holography

Authors:

De Li, XueZhe Jin and Jongweon Kim

Abstract: Recent research on robust digital watermarking is focused on processing methods in the frequency domain. Most such algorithms are limited to a single transform domain or combine different kinds of transform domains. In this paper, a new video watermarking algorithm is proposed, applied in 2D DWT and 3D DCT combined with Discrete Fractional Random Transform (DFRNT) encryption and digital holography respectively. The embedded information is a 2D barcode obtained from a hologram. Extraction of the watermark uses holographic reconstruction. Experimental results show that the proposed watermarking algorithm performs well in terms of robustness and security.
Download

Paper Nr: 196
Title:

EyeRec: An Open-source Data Acquisition Software for Head-mounted Eye-tracking

Authors:

Thiago Santini, Wolfgang Fuhl, Thomas Kübler and Enkelejda Kasneci

Abstract: Head-mounted eye tracking offers remarkable opportunities for human computer interaction in dynamic scenarios (e.g., driving assistance). Although a plethora of proprietary software for the acquisition of such eye-tracking data exists, all of it is plagued by a critical underlying issue: the source code is not available to the end user. Thus, a researcher is left with few options when facing a scenario in which the proprietary software does not perform as expected. In such a case, the researcher is either forced to change the experimental setup (which is undesirable) or invest a considerable amount of time and money in a different eye-tracking system (which may also underperform). In this paper, we introduce EyeRec, an open-source data acquisition software for head-mounted eye-tracking. Out of the box, EyeRec offers real-time state-of-the-art pupil detection and gaze estimation, which can be easily replaced by user-implemented algorithms if desired. Moreover, this software supports multiple head-mounted eye-tracking hardware, records eye and scene videos, and stores pupil and gaze information, which are also available as a real-time stream. Thus, EyeRec can be an efficient means towards facilitating gaze-based human computer interaction research and applications. Available at: www.perception.uni-tuebingen.de
Download

Paper Nr: 209
Title:

A Part based Modeling Approach for Invoice Parsing

Authors:

Enes Aslan, Tugrul Karakaya, Ethem Unver and Yusuf Sinan AKGUL

Abstract: Automated invoice processing and information extraction has attracted remarkable interest from business and academic circles. Invoice processing is a very critical and costly operation for participation banks because the credit authorization process must be linked with real trade activity via invoices. Classical invoice processing systems first assign the invoice to an invoice class, but any error in the document class decision will cause the invoice parsing to be invalid. This paper proposes a new invoice-class-free parsing method that uses a two-phase structure. The first phase uses individual invoice part detectors and the second phase employs an efficient part-based modeling approach. In the first phase, we employ different methods such as SVM, maximum entropy and HOG to produce candidates for the various types of invoice parts. In the second phase, the basic idea is to parse an invoice by parts arranged in a deformable composition, similar to face or human body detection from digital images. The main advantage of the part-based modeling (PBM) approach is that the system can handle any type of invoice, a crucial functionality for business processes at participation banks. The proposed system is tested with real invoices and experimental results confirm the effectiveness of the proposed approach.
Download

Area 5 - Motion, Tracking and Stereo Vision

Full Papers
Paper Nr: 8
Title:

Relative Pose Estimation from Straight Lines using Parallel Line Clustering and its Application to Monocular Visual Odometry

Authors:

Naja von Schmude, Pierre Lothe and Bernd Jähne

Abstract: This paper tackles the problem of relative pose estimation between two monocular camera images in textureless scenes. Due to a lack of point matches, point-based approaches such as the 5-point algorithm often fail when used in these scenarios. Therefore we investigate relative pose estimation from line observations. We propose a new approach in which the relative pose estimation from lines is extended by a 3D line direction estimation step. The estimated line directions serve to improve the robustness and the efficiency of all processing phases: they enable us to guide the matching of line features and allow an efficient calculation of the relative pose. First, we describe in detail the novel 3D line direction estimation from a single image by clustering of parallel lines in the world. Secondly, we propose an innovative guided matching in which only clusters of lines with corresponding 3D line directions are considered. Thirdly, we introduce the new relative pose estimation based on 3D line directions. Finally, we combine all steps to a visual odometry system. We evaluate the different steps on synthetic and real sequences and demonstrate that in the targeted scenarios we outperform the state-of-the-art in both accuracy and computation time.
Download

Paper Nr: 29
Title:

Novel Ways to Estimate Homography from Local Affine Transformations

Authors:

Daniel Barath and Levente Hajder

Abstract: State-of-the-art 3D reconstruction methods usually apply point correspondences in order to compute the 3D geometry of objects represented by dense point clouds. However, objects with relatively large and flat surfaces can be most accurately reconstructed if the homographies between the corresponding patches are known. Here we show how the homography between patches on a stereo image pair can be estimated. We show that the proposed estimators are more accurate than the widely used point correspondence-based techniques because the latter consider only the last column (the translation) of the affine transformations, whereas the new algorithms use all the affine parameters. Moreover, we prove that affine invariance is equivalent to perspective invariance in the case of known epipolar geometry. Three homography estimators are proposed. The first one calculates the homography if at least two point correspondences and the related affine transformations are known. The second one computes the homography from only one point pair, if the epipolar geometry is estimated beforehand. These methods are solved by linearization of the original equations, and the refinements can be carried out by numerical optimization. Finally, a hybrid homography estimator is proposed that uses both point correspondences and photo-consistency between the patches. The presented methods have been quantitatively validated on synthetic tests. We also show that the proposed methods are applicable to real-world images and perform better than the state-of-the-art point correspondence-based techniques.
Download

Paper Nr: 36
Title:

Geometric Eye Gaze Tracking

Authors:

Adam Strupczewski, Błażej Czupryński, Jacek Naruniec and Kamil Mucha

Abstract: This paper presents a novel eye gaze estimation method based on calculating the gaze vector in a geometric approach. There have been many publications on the topic of eye gaze estimation, but most rely on dedicated infrared equipment and corneal glints. The presented approach, on the other hand, assumes that only an RGB input image of the user’s face is available. Furthermore, it requires no calibration but only a simple one-frame initialization. In comparison to other systems presented in the literature, our method has better accuracy. The presented method relies on determining the 3D location of the face and eyes in the initialization frame, tracking these locations in each consecutive frame, and using this knowledge to estimate the gaze vector and the point where the user is looking. The algorithm runs in real time on mobile devices.
Download

Paper Nr: 74
Title:

Regularization Terms for Motion Estimation - Links with Spatial Correlations

Authors:

Yann Lepoittevin and Isabelle Herlin

Abstract: Motion estimation from image data has been widely studied in the literature. Due to the aperture problem (one equation with two unknowns), a Tikhonov regularization is usually applied, which constrains the estimated motion field. The paper demonstrates that the use of regularization functions is equivalent to the definition of correlations between pixels, and the formulation of the corresponding correlation matrices is given. This equivalence makes it possible to better understand the impact of the regularization through a display of the correlation values as images. Such equivalence is of major interest in the context of image assimilation, as these methods are based on the minimization of errors that are correlated on the space-time domain. It also makes it possible to characterize the role of the errors during the assimilation process.
Download

Paper Nr: 75
Title:

Vision-based Robotic System for Object Agnostic Placing Operations

Authors:

Nikolaos Rofalis, Lazaros Nalpantidis, Nils Axel Andersen and Volker Krüger

Abstract: Industrial robots are part of almost all modern factories. Although industrial robots nowadays manipulate a huge variety of objects in different environments, exact knowledge about both the objects and the environment is generally assumed. The aim of this work is to investigate the ability of a robotic system to operate within an unknown environment, manipulating unknown objects. The developed system detects objects, finds matching compartments in a placing box, and ultimately grasps and places the objects there. The system exploits 3D sensing and visual feature extraction. No prior knowledge is provided to the system, neither for the objects nor for the placing box. The experimental evaluation of the developed robotic system shows that a combination of seemingly simple modules and strategies can provide an effective solution to the targeted problem.
Download

Paper Nr: 87
Title:

Direct Stereo Visual Odometry based on Lines

Authors:

Thomas Holzmann, Friedrich Fraundorfer and Horst Bischof

Abstract: We propose a novel stereo visual odometry approach, which is especially suited for poorly textured environments. We introduce a novel, fast line segment detector and matcher, which detects vertical lines supported by an IMU. The patches around lines are then used to directly estimate the pose of consecutive cameras by minimizing the photometric error. Our algorithm outperforms state-of-the-art approaches in challenging environments. Our implementation runs in real-time and is therefore well suited for various robotics and augmented reality applications.
Download

Paper Nr: 94
Title:

Assessing Facial Expressions in Virtual Reality Environments

Authors:

Catarina Runa Miranda and Verónica Costa Orvalho

Abstract: Humans rely on facial expressions to transmit information, like mood and intentions, usually not provided by the verbal communication channels. The recent advances in Virtual Reality (VR) at consumer level (Oculus VR 2014) created a shift in the way we interact with each other and with digital media. Today, we can enter a virtual environment and communicate through a 3D character. Hence, to reproduce the users’ facial expressions in VR scenarios, we need on-the-fly animation of the embodied 3D characters. However, current facial animation approaches based on Motion Capture (MoCap) are disabled by the persistent partial occlusions produced by VR headsets. The only solution available for this occlusion problem is not suitable for consumer-level applications, as it depends on complex hardware and calibrations. In this work, we propose consumer-level methods for facial MoCap in VR environments. We start by deploying an occlusion-support method for generic facial MoCap systems. Then, we extract facial features to train Random Forest algorithms that accurately estimate emotions and movements in occluded facial regions. Through our novel methods, MoCap approaches are able to track non-occluded facial movements and estimate movements in occluded regions, without additional hardware or tedious calibrations. We deliver and validate solutions to facilitate face-to-face communication through facial expressions in VR environments.
Download

Paper Nr: 113
Title:

A Turntable-based Approach for Ground Truth Tracking Data Generation

Authors:

Zoltán Pusztai and Levente Hajder

Abstract: Quantitative evaluation of feature trackers can lead to significant improvements in accuracy. There are widely used ground truth databases in the field; one of the most popular is the Middlebury database for comparing optical flow algorithms. However, that database does not contain rotating 3D objects. This paper proposes a turntable-based approach that fills this gap. The key challenge here is to calibrate the applied camera, projector, and turntable very accurately. We show that this is possible even if just a simple chessboard plane is used for the calibration. The proposed approach is validated on 3D reconstruction and ground-truth tracking-data generation for real-world objects.
Download

Paper Nr: 155
Title:

Enhanced Depth Estimation using a Combination of Structured Light Sensing and Stereo Reconstruction

Authors:

Andreas Wittmann, Anas Al-Nuaimi, Eckehard Steinbach and Georg Schroth

Abstract: We present a novel approach for depth sensing that combines structured light scanning and stereo reconstruction. High-resolution disparity maps are derived in an iterative-upsampling process that jointly optimizes measurements from graph cuts based stereo reconstruction and structured light sensing using an accelerated α-expansion algorithm. Different from previously proposed fusion approaches, the disparity estimation is initialized using the low-resolution structured light prior. This results in a dense disparity map that can be computed very efficiently and which serves as an improved prior for subsequent iterations at higher resolutions. The advantages of the proposed fusion approach over the sole use of stereo are threefold. First, for pixels that exhibit prior knowledge from structured lighting, a reduction of the disparity search range to the uncertainty interval of the prior allows for a significant reduction of ambiguities. Second, the resulting limited search range greatly reduces the runtime of the algorithm. Third, the structured light prior enables a dynamic tuning of the smoothness constraint to allow for a better depth estimation for inclined surfaces.
Download

Paper Nr: 162
Title:

A Robust Particle Filtering Approach with Spatially-dependent Template Selection for Medical Ultrasound Tracking Applications

Authors:

Marco Carletti, Diego Dall'Alba, Marco Cristani and Paolo Fiorini

Abstract: Tracking moving organs captured by ultrasound imaging techniques is of fundamental importance in many applications, from image-guided radiotherapy to minimally invasive surgery. Due to operative constraints, tracking has to be carried out on-line, facing classic computer vision problems that are still unsolved in the community. One of them is the update of the template, which is necessary to avoid drifting phenomena in the case of template-based tracking. In this paper, we offer an innovative and robust solution to this problem, exploiting a simple yet important aspect which often holds in biomedical scenarios: in many cases, the target (a blood vessel, cyst or localized lesion) exists in a semi-static operative field, where the only motion is due to organs that are subjected to quasi-periodic movements. This leads the target to occupy certain areas of the scene at certain times, exhibiting particular visual layouts. Our solution exploits this scenario, and consists of a template-based particle filtering strategy equipped with a spatially localized vocabulary, which in practice suggests to the tracker the most suitable template to use among a set of available ones, depending on the proposal distribution. Experiments have been performed on the MICCAI CLUST 2015 benchmark, reaching an accuracy (i.e. mean tracking error) of 1.11 mm and a precision of 1.53 mm. These results amply satisfy the clinical requirements imposed by image-guided surgical procedures and encourage future developments.
Download

Paper Nr: 169
Title:

Shape and Reflectance from RGB-D Images using Time Sequential Illumination

Authors:

Matis Hudon, Adrien Gruson, Paul Kerbiriou, Rémi Cozot and Kadi Bouatouch

Abstract: In this paper we propose a method for recovering the shape (geometry) and the diffuse reflectance from an image (or video) using a hybrid setup consisting of a depth sensor (Kinect), a consumer camera and a partially controlled illumination (using a flash). The objective is to show how combining RGB-D acquisition with sequential illumination is useful for shape and reflectance recovery. A pair of images is captured: a non-flashed one (under ambient illumination) and a flashed one. A pure flash image is computed by subtracting the non-flashed image from the flashed image. We propose a novel, near real-time algorithm, based on a local illumination model of our flash and the pure flash image, to enhance the geometry (from the noisy depth map) and recover reflectance information.
Download

Paper Nr: 210
Title:

Automatic and Generic Evaluation of Spatial and Temporal Errors in Sport Motions

Authors:

Marion Morel, Richard Kulpa, Anthony Sorel, Catherine Achard and Séverine Dubuisson

Abstract: Automatically evaluating and quantifying the performance of a player is a complex task since the important motion features to analyze depend on the type of action performed. Above all, this complexity is due to the variability of morphologies and styles of both the experts who perform the reference motions and the novices. Based only on a database of experts' motions, with no additional knowledge, we propose an innovative 2-level DTW (Dynamic Time Warping) approach to temporally and spatially align the motions and extract the imperfections of the novice's performance for each joint. In this study, we applied our method to the tennis serve, but since it is automatic and morphology-independent, it can be applied to any individual motor performance.
Download

Paper Nr: 211
Title:

Selective Use of Appropriate Image Pairs for Shape from Multiple Motions based on Gradient Method

Authors:

Norio Tagawa and Syouta Tsukada

Abstract: For gradient-based shape from motion, relative motions with various directions at each 3-D point on a target object are generally effective for accurate shape recovery. On the other hand, a proper motion size exists for each 3-D point, since the intensity pattern and depth vary from point to point: a too-large motion causes a large error in depth recovery as an aliasing problem, and a too-small motion is inappropriate from the viewpoint of SNR. The application of random camera rotations imitating the involuntary eye movements of a human eyeball has been proposed, which can generate multiple image pairs. In this study, in order to realize accurate shape recovery, we improve the gradient method based on multiple image pairs by selecting the appropriate image pairs to be used. Its effectiveness is verified through experiments using the actual camera system that we developed.
Download

Paper Nr: 228
Title:

A Semi-local Surface Feature for Learning Successful Grasping Affordances

Authors:

Mikkel Tang Thomsen, Dirk Kraft and Norbert Krüger

Abstract: We address the problem of vision based grasp affordance learning and prediction on novel objects by proposing a new semi-local shape-based descriptor, the Sliced Pineapple Grid Feature (SPGF). The primary characteristic of the feature is the ability to encode semantically distinct surface structures, such as “walls”, “edges” and “rims”, that show particular potential as a primer for grasp affordance learning and prediction. When the SPGF feature is used in combination with a probabilistic grasp affordance learning approach, we are able to achieve grasp success-rates of up to 84% for a varied object set of three classes and up to 96% for class specific objects.
Download

Paper Nr: 240
Title:

Watch Where You’re Going! - Pedestrian Tracking Via Head Pose

Authors:

Sankha S. Mukherjee, Rolf H. Baxter and Neil M. Robertson

Abstract: In this paper we improve pedestrian tracking using robust, real-time human head pose estimation in low resolution RGB data without any smoothing motion priors such as direction of motion. This paper presents four principal novelties. First, we train a deep convolutional neural network (CNN) for head pose classification with data from various sources ranging from high to low resolution. Second, this classification network is then fine-tuned on the continuous head pose manifold for regression based on a subset of the data. Third, we attain state-of-the-art performance on public low resolution surveillance datasets. Finally, we present improved tracking results using a Kalman filter based intentional tracker. The tracker fuses the instantaneous head pose information into the motion model to improve tracking based on the predicted future location. Our implementation computes head pose for a head image in 1.2 milliseconds on commercial hardware, making it real-time and highly scalable.
Download

Short Papers
Paper Nr: 40
Title:

Two View Geometry Estimation by Determinant Minimization

Authors:

Lorenzo Sorgi and Andrey Bushnevskiy

Abstract: Two view geometry estimation, the task of inferring the relative pose between two cameras using only the image content, is one of the fundamental and most studied problems in Computer Vision. In this paper we present a new approach for two view geometry estimation, based on the minimization of an objective function given by the overall volume of the tetrahedrons identified in 3D space by pairs of corresponding feature points. This error measure is equivalent to the determinant of a real-valued square matrix, a function of the point match coordinates in the camera space, and we show how to minimize it taking advantage of the Perturbation Theorem. Tests performed on synthetic and real datasets confirm an increased estimation accuracy compared to the state of the art.
Download

Paper Nr: 91
Title:

Multiple People Tracking in Smart Camera Networks by Greedy Joint-Likelihood Maximization

Authors:

Nyan Bo Bo, Francis Deboeverie, Peter Veelaert and Wilfried Philips

Abstract: This paper presents a new method to track multiple people reliably using a network of calibrated smart cameras. The task of tracking multiple persons is very difficult due to the non-rigid nature of the human body, occlusions and environmental changes. Our proposed method recursively updates the positions of all persons based on the observed foreground images from all smart cameras and the previously known location of each person. The performance of our proposed method is evaluated on indoor video sequences containing person–person/object–person occlusions and sudden illumination changes. The results show that our method performs well, with a Multiple Object Tracking Accuracy as high as 100% and a Multiple Object Tracking Precision as high as 86%. A performance comparison to a state-of-the-art tracking system shows that our method outperforms it.
Download

Paper Nr: 101
Title:

RGB-D and Thermal Sensor Fusion - Application in Person Tracking

Authors:

Ignacio Rocco Spremolla, Michel Antunes, Djamila Aouada and Björn Ottersten

Abstract: Many systems combine RGB cameras with other sensor modalities for fusing visual data with complementary environmental information in order to achieve improved sensing capabilities. This article explores the possibility of fusing a commodity RGB-D camera and a thermal sensor. We show that using traditional methods, it is possible to accurately calibrate the complete system and register the three RGB-D-T data sources. We propose a simple person tracking algorithm based on particle filters, and show how to combine the mapped pixel information from the RGB-D-T data. Furthermore, we use depth information to adaptively scale the tracked target area when radial displacements from the camera occur. Experimental results provide evidence that this allows for a significant tracking performance improvement in situations with large radial displacements, when compared to using only a tracker based on RGB or RGB-T data.
Download

Paper Nr: 105
Title:

One Shot Photometric Stereo from Reflectance Classification

Authors:

Toshiya Kawabata, Fumihiko Sakaue and Jun Sato

Abstract: 3D reconstruction of object shape is one of the most important problems in the field of computer vision. In particular, estimation of the normal orientation of an object surface is useful for photo-realistic image rendering. For this estimation, photometric stereo is often used. However, it requires multiple images taken under different lighting conditions in the same pose, and thus cannot be applied to moving objects in general. In this paper, we propose a one-shot photometric stereo method for estimating the surface normals of moving objects with arbitrary textures. In our method, we estimate surface orientation and reflectance properties simultaneously. For this objective, a reflectance dataset is used to decrease the DoF (degrees of freedom) of the estimation. In addition, we classify the reflectance properties of an input image into a limited number of classes. By using this prior knowledge, our method can estimate surface orientation and reflectance properties even if the input information alone is not sufficient for the estimation.
Download

Paper Nr: 109
Title:

Robust Matching of Occupancy Maps for Odometry in Autonomous Vehicles

Authors:

Martin Dimitrievski, David Van Hamme, Peter Veelaert and Wilfried Philips

Abstract: In this paper we propose a novel real-time method for SLAM in autonomous vehicles. The environment is mapped using a probabilistic occupancy map model, and ego-motion is estimated within the same environment by using a feedback loop. Thus, we simplify the pose estimation from 6 to 3 degrees of freedom, which greatly improves the robustness and accuracy of the system. Input data is provided via a rotating laser scanner as 3D measurements of the current environment, which are projected onto the ground plane. The local ground plane is estimated in real time from the actual point cloud data using a robust plane fitting scheme based on the RANSAC principle. Then the computed occupancy map is registered against the previous map using phase correlation in order to estimate the translation and rotation of the vehicle. Experimental results demonstrate that the method produces high quality occupancy maps, and the measured translation and rotation errors of the trajectories are lower compared to other 6DOF methods. The entire SLAM system runs on a mid-range GPU and keeps up with the data from the sensor, which leaves more computational power for the other tasks of the autonomous vehicle.
Download

Paper Nr: 119
Title:

Abnormal Event Detection using Scene Partitioning by Regional Activity Pattern Analysis

Authors:

Jongmin Yu, Jeonghwan Gwak, Seongjong Noh and Moongu Jeon

Abstract: This paper presents a method for detecting abnormal events based on scene partitioning. To develop a practical application for abnormal event detection, the proposed method focuses on handling various activity patterns caused by diverse moving objects and geometric conditions such as camera angles and distances between the camera and objects. We divide a frame into several blocks and group the blocks with similar motion patterns. Then, the proposed method constructs normal-activity models for local regions by using the grouped blocks. These regional models allow the detection of unusual activities in complex surveillance scenes by considering region-specific local activity patterns. We construct a new dataset, called the GIST YouTube dataset, using YouTube videos to evaluate performance in practical scenes. In the experiments, we used the dataset of the University of Minnesota and our dataset. The experimental study verified that the proposed method is effective in complex scenes containing various activity patterns.
Download

Paper Nr: 124
Title:

Lane-level Positioning based on 3D Tracking Path of Traffic Signs

Authors:

Sung-ju Kim and Soon-Yong Park

Abstract: Lane-level vehicle positioning is an important task for enhancing the accuracy of in-vehicle navigation systems and the safety of autonomous vehicles. GPS (Global Positioning System) or DGPS (Differential GPS) techniques are generally used in lane-level positioning systems, but they only provide an accuracy level of up to 2-3 m. In this paper, we introduce a vision based lane-level positioning technique that provides more accurate prediction results. The proposed method predicts the current driving lane of the vehicle by tracking the 3D locations of the traffic signs at the side of the road using a stereo camera. Several experiments are conducted to analyse the feasibility of the proposed method for driving lane prediction. According to the experimental results, the proposed method achieves 90.9% accuracy.
Download

Paper Nr: 129
Title:

Motion based Segmentation for Robot Vision using Adapted EM Algorithm

Authors:

Wei Zhao and Nico Roos

Abstract: Robots operate in a dynamic world in which objects are often moving. The movement of objects may help the robot to segment the objects from the background. The result of the segmentation can subsequently be used to identify the objects. This paper investigates the possibility of segmenting objects of interest from the background for the purpose of identification based on motion. It focuses on two approaches to represent the movements: one based on optical flow estimation and the other based on SIFT features. The segmentation is based on the expectation-maximization algorithm. A support vector machine, which classifies the segmented objects, is used to evaluate the result of the segmentation.
Download

Paper Nr: 178
Title:

Evaluation of Foveated Stereo Matching for Robotic Cloth Manipulation

Authors:

Tian Xu and Paul Cockshott

Abstract: Due to the recent development of robotic techniques, cloth manipulation has become an important task. Stereo matching forms a crucial part of robotic vision and aims to derive depth information from the image pairs captured by the stereo cameras. However, processing high resolution images to capture sufficient detail while running in real time is very challenging. In addition to accelerating with current multi-core GPU infrastructure, in this work we utilize a foveated matching algorithm to improve efficiency. To study the effect of the foveated matching algorithm on two common robotic manipulation tasks, cloth grasping and flattening, we first create a "garment with wrinkle" dataset that includes depth-map ground truth for garments, which is to our knowledge not available in the research community. Secondly, using this dataset, we find that foveated matching is effective in trading off accuracy for efficiency in stereo matching. Finally, by assuming the robotic behavior of previous work for both the cloth grasping and flattening tasks, we demonstrate that foveated matching can achieve the same level of accuracy for completing both tasks with a two- to three-fold acceleration.
Download

Paper Nr: 200
Title:

Towards High-Quality Parallel Stabilization

Authors:

Abdelrahman Ahmed and Mohamed S. Shehata

Abstract: With the widespread use of handheld devices and unmanned aerial vehicles (UAVs) that are able to record video sequences, digital video stabilization becomes more important, as these sequences are usually shaky, undermining the visual quality of the video. Digital video stabilization has been studied for decades, yielding an extensive amount of literature in the field; however, most approaches are highly sequential. In this paper, we present a new parallel technique that exploits the parallel architecture found in modern-day devices. The algorithm divides the frame into blocks and estimates a camera path for each block to better estimate the transformation needed to adjust for the shakiness of the video.
Download

Paper Nr: 202
Title:

Time-to-Contact from Underwater Images

Authors:

Laksmita Rahadianti, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a method for estimating time-to-contact (TTC) of moving objects and cameras in underwater environments. The time-to-contact is useful for navigating moving vehicles and for avoiding collisions in the 3D space. The existing methods calculate time-to-contact from geometric features of objects such as corners and edges. However, if the cameras and objects are in scattering media, such as fog and water, the degradation of image intensity caused by light scattering makes it difficult to find geometric features in images. Thus, in this paper we propose a method for estimating time-to-contact in scattering media by using the change in image intensity caused by the camera motion.
Download

Paper Nr: 208
Title:

A Novel Technique for Point-wise Surface Normal Estimation

Authors:

Daniel Barath and Ivan Eichhardt

Abstract: Nowadays multi-view stereo reconstruction algorithms can achieve impressive results using many views of the scene. Our primary objective is to robustly extract more information about the underlying surface from fewer images. We present a method for point-wise surface normal and tangent plane estimation in the stereo case to reconstruct real-world scenes. The proposed algorithm works for a general camera model; however, we choose the pinhole camera in order to demonstrate its efficiency. The presented method uses particle swarm optimization under geometric and epipolar constraints in order to achieve suitable speed and quality. An oriented point cloud is generated using a single point correspondence for each oriented 3D point and a cost function based on photo-consistency. The method can straightforwardly be extended to multi-view reconstruction. Our method is validated in both synthetic and real tests. The proposed algorithm is compared to one of the state-of-the-art patch-based multi-view reconstruction algorithms.
Download

Paper Nr: 25
Title:

Visual Navigation with Street View Image Matching

Authors:

Chih-Hung Hsu and Huei-Yung Lin

Abstract: The vision based navigation approach is a key to success for driving assistance technology. In this work, we present a visual navigation assistance system based on the geographic information of the vehicle and image matching between online and pre-established data. With the rough GPS coordinates, we utilize image retrieval algorithms to find the most similar image in the panoramic image database. The search results are then compared with the input image via feature matching to find the landmarks in the panoramic image. By using the 360-degree field of view of the panoramic images, the camera’s heading can be calculated from the matching results. Finally, the landmark information is identified by the markers on the Google map as visual guidance and assistance.
Download

Paper Nr: 63
Title:

Absolute Localization using Visual Data for Autonomous Vehicles

Authors:

Safa Ouerghi, Rémi Boutteau, Pierre Merriaux, Nicolas Ragot, Xavier Savatier and Pascal Vasseur

Abstract: In this paper, we propose an algorithm for estimating the absolute pose of a vehicle using visual data. Our method works in two steps: first we construct a visual map of geolocalized landmarks, then we localize the vehicle using this map. The main advantages of our method are that the localization of the vehicle is absolute and that it requires only a monocular camera and a low-cost GPS. We firstly outline our method, then we present our experimental results on real images using a reference database: the KITTI Vision Benchmark Suite.
Download

Paper Nr: 205
Title:

Towards a Tracking Algorithm based on the Clustering of Spatio-temporal Clouds of Points

Authors:

Andrea Cavagna, Chiara Creato, Lorenzo Del Castello, Stefania Melillo, Leonardo Parisi and Massimiliano Viale

Abstract: The interest in 3D dynamical tracking is growing in fields such as robotics, biology and fluid dynamics. Recently, a major source of progress in 3D tracking has been the study of collective behaviour in biological systems, where the trajectories of individual animals moving within large and dense groups need to be reconstructed to understand the behavioural interaction rules. Experimental data in this field are generally noisy and at low spatial resolution, so that individuals appear as small featureless objects and trajectories must be retrieved by making use of epipolar information only. Moreover, optical occlusions often occur: in a multicamera system one or more objects become indistinguishable in one view, leading to potential loss of identity over long-time trajectories. The most advanced 3D tracking algorithms overcome optical occlusions by making use of set-cover techniques, which however have to solve NP-hard optimization problems. Moreover, current methods are not able to cope with occlusions arising from actual physical proximity of objects in 3D space. Here, we present a new method designed to work directly on (3D + 1) clouds of points representing the full spatio-temporal evolution of the moving targets. We can then use a simple connected components labeling routine, which is linear in time, to solve optical occlusions, hence lowering the complexity of the problem from NP to P. Finally, we use normalized cut spectral clustering to tackle 3D physical proximity.
Download

Paper Nr: 235
Title:

3D Thermal Monitoring and Measurement using Smart-phone and IR Thermal Sensor

Authors:

Arindam Saha, Keshaw Dewangan and Ranjan Dasgupta

Abstract: Continuous, on-the-fly heat monitoring in industries such as manufacturing and chemical processing is a compelling research topic nowadays. Recent advances in IR thermal sensors unfold the possibility of fusing thermal information with other low-cost sensors (such as optical cameras) to perform area or volumetric heat measurement of any heated object. The recent development of an affordable handheld mobile thermal sensor as a smart-phone attachment by FLIR has encouraged researchers to develop thermal monitoring systems as smart-phone applications. In pursuit of this goal, we present a lightweight system combining optical and thermal sensors to create a dense thermal 3D model, along with area/volume measurement of the heated zones, on a smart-phone. Our proposed pipeline captures RGB and thermal images simultaneously using the FLIR thermal attachment, estimates the poses of the RGB and depth images, and generates 3D models by tracking features across the RGB images. Back-projection is used to colour the 3D points so that they represent both RGB appearance and estimated surface temperature. The final output of the system is the detected hot region with an area/volumetric measurement. Experimental results demonstrate that this cost-effective system is capable of measuring hot areas accurately and is usable in everyday settings.
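The back-projection colouring step described in this abstract can be sketched with a minimal pinhole-camera model: a reconstructed 3D point is projected into the thermal image and the temperature at that pixel is sampled. The intrinsics, image values, and function names here are illustrative assumptions, not the authors' implementation or FLIR's API.

```python
# Sample a surface temperature for a 3D point by projecting it into a thermal image.

def project_point(X, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (camera coordinates) to pixel coordinates."""
    x, y, z = X
    if z <= 0:
        return None  # behind the camera
    u = fx * x / z + cx
    v = fy * y / z + cy
    return int(round(u)), int(round(v))

def sample_temperature(thermal, X, fx, fy, cx, cy):
    """thermal: 2D list of temperatures [row][col]; returns a temperature or None."""
    uv = project_point(X, fx, fy, cx, cy)
    if uv is None:
        return None
    u, v = uv
    if 0 <= v < len(thermal) and 0 <= u < len(thermal[0]):
        return thermal[v][u]
    return None  # projects outside the thermal image

# toy 4x4 thermal image (degrees C) with one hot pixel at (row=2, col=2)
thermal = [[20.0] * 4 for _ in range(4)]
thermal[2][2] = 85.0

# a point on the optical axis projects to the principal point (cx, cy) = (2, 2)
print(sample_temperature(thermal, (0.0, 0.0, 1.0), fx=2.0, fy=2.0, cx=2.0, cy=2.0))
# -> 85.0
```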
Download

Paper Nr: 241
Title:

Stereo Vision-based Local Occupancy Grid Map for Autonomous Navigation in ROS

Authors:

Pablo Marín-Plaza, Jorge Beltrán, Ahmed Hussein, Basam Musleh, David Martín, Arturo de la Escalera and José María Armingol

Abstract: Autonomous navigation for unmanned ground vehicles has gained significant interest in the mobile robotics research community. This increased attention comes from its noteworthy role in the field of Intelligent Transportation Systems (ITS). In order to achieve autonomous navigation for ground vehicles, a detailed model of the environment is required as the input map. This paper presents a novel approach that recognizes static obstacles by means of an on-board stereo camera and builds a local occupancy grid map within a Robot Operating System (ROS) architecture. The output maps include information about the environment's 3D structures derived from stereo vision. These maps can enhance the global grid map with further details for obstacles undetected by the laser rangefinder. In order to evaluate the proposed approach, several experiments are performed in different scenarios. The output maps are precisely compared to the corresponding global map segment and to the equivalent satellite image. The obtained results indicate the high performance of the approach in numerous situations.
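The mapping step described in this abstract can be sketched as binning a stereo point cloud into a local 2D grid centred on the vehicle. This is a minimal illustration, not the authors' method: the cell size, ground-height threshold, and hit count are assumed parameters.

```python
# Build a local 2D occupancy grid from a stereo point cloud.
# Points are (x, y, z) in metres; the vehicle sits at the grid centre.

def build_occupancy_grid(points, cell_size, grid_dim, min_hits=1):
    """Return a grid_dim x grid_dim grid: 1 = occupied, 0 = free/unknown."""
    half = grid_dim * cell_size / 2.0
    hits = [[0] * grid_dim for _ in range(grid_dim)]
    for x, y, z in points:
        if z < 0.2:          # discard ground-level returns (assumed threshold)
            continue
        col = int((x + half) / cell_size)
        row = int((y + half) / cell_size)
        if 0 <= row < grid_dim and 0 <= col < grid_dim:
            hits[row][col] += 1
    # a cell is occupied once it collects enough hits
    return [[1 if h >= min_hits else 0 for h in row] for row in hits]

# three points of a wall one metre ahead, plus one ground return that is filtered out
cloud = [(0.0, 1.0, 1.5), (0.1, 1.0, 1.2), (-0.1, 1.0, 0.9), (0.0, 0.5, 0.0)]
grid = build_occupancy_grid(cloud, cell_size=0.5, grid_dim=8)
```

In a ROS system such a grid would typically be published as a `nav_msgs/OccupancyGrid` message; the sketch above keeps only the binning logic.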
Download