Monday 17 January 2022
IS&T Welcome & PLENARY: Quanta Image Sensors: Counting Photons Is the New Game in Town
07:00 – 08:10
The Quanta Image Sensor (QIS) was conceived as a different image sensor—one that counts photoelectrons one at a time using millions or billions of specialized pixels read out at high frame rate, with computational imaging used to create grayscale images. QIS devices have been implemented in a baseline room-temperature CMOS image sensor (CIS) technology without using avalanche multiplication, and also with SPAD arrays. This plenary details the QIS concept, how it has been implemented in CIS and in SPADs, and what the major differences are. Applications that can be disrupted or enabled by this technology are also discussed, including smartphones, where CIS-QIS technology could be deployed within just a few years.
Eric R. Fossum, Dartmouth College (United States)
Eric R. Fossum is best known for the invention of the CMOS image sensor “camera-on-a-chip” used in billions of cameras. He is a solid-state image sensor device physicist and engineer, and his career has included academic and government research, and entrepreneurial leadership. At Dartmouth he is a professor of engineering and vice provost for entrepreneurship and technology transfer. Fossum, along with three others, received the 2017 Queen Elizabeth Prize for Engineering, considered by many to be the Nobel Prize of engineering, from HRH Prince Charles “for the creation of digital imaging sensors.” He was inducted into the National Inventors Hall of Fame, and elected to the National Academy of Engineering among other honors including a recent Emmy Award. He has published more than 300 technical papers and holds more than 175 US patents. He co-founded several startups and co-founded the International Image Sensor Society (IISS), serving as its first president. He is a Fellow of IEEE and OSA.
08:10 – 08:40 EI 2022 Welcome Reception
Wednesday 19 January 2022
IS&T Awards & PLENARY: In situ Mobility for Planetary Exploration: Progress and Challenges
07:00 – 08:15
This year saw exciting milestones in planetary exploration with the successful landing of the Perseverance Mars rover, followed by its operation and the successful technology demonstration of the Ingenuity helicopter, the first heavier-than-air aircraft ever to fly on another planetary body. This plenary highlights new technologies used in this mission, including precision landing for Perseverance, a vision coprocessor, new algorithms for faster rover traverse, and the ingredients of the helicopter. It concludes with a survey of challenges for future planetary mobility systems, particularly for Mars, Earth’s moon, and Saturn’s moon, Titan.
Larry Matthies, Jet Propulsion Laboratory (United States)
Larry Matthies received his PhD in computer science from Carnegie Mellon University (1989), before joining JPL, where he has supervised the Computer Vision Group for 21 years, the past two coordinating internal technology investments in the Mars office. His research interests include 3-D perception, state estimation, terrain classification, and dynamic scene analysis for autonomous navigation of unmanned vehicles on Earth and in space. He has been a principal investigator in many programs involving robot vision and has initiated new technology developments that impacted every US Mars surface mission since 1997, including visual navigation algorithms for rovers, map matching algorithms for precision landers, and autonomous navigation hardware and software architectures for rotorcraft. He is a Fellow of the IEEE and was a joint winner in 2008 of the IEEE’s Robotics and Automation Award for his contributions to robotic space exploration.
Image Processing: Algorithms and Systems XX Posters
08:20 – 09:20
EI Symposium
Poster interactive session for all conference authors and attendees.
IPAS-191
P-08: Class specific biased extrapolation of images in latent space for imbalanced image classification, Suhyeon Jeong and Seungkyu Lee, Kyung Hee University (Republic of Korea)
In this work, we study the effectiveness of prior re-sampling approaches for imbalanced image classification. We propose to investigate inter-class and within-class characteristics and conduct class specific extrapolation re-sampling for optimal imbalanced learning.
IPAS-192
P-09: Computer vision-based classification of schizophrenia patients from retinal imagery, Diana Joseph, Adriann Lai, Steven Silverstein, Rajeev Ramchandran, and Edgar Bernal, University of Rochester (United States)
Changes in retinal structure have been documented in patients with chronic schizophrenia using optical coherence tomography (OCT) metrics, but these studies were limited by the measurements provided by OCT machines. In this paper, we leverage machine and deep learning techniques to analyze OCT images and train algorithms to differentiate between schizophrenia patients and healthy controls. In order to address data scarcity issues, we use intermediate representations extracted from ReLayNet, a pretrained convolutional neural network designed to segment macula layers from OCT images. Experimental results show that classifiers trained on deep features and OCT-machine provided metrics can reliably distinguish between chronic schizophrenia patients and an age-matched control population. Further, we present what is to our knowledge the first reported empirical evidence showing that separation can be achieved between first-episode schizophrenia patients and their age-matched control group by leveraging deep image features extracted from OCT imagery.
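The classifiers and ReLayNet features used in the paper are not reproduced here; the following Python sketch only illustrates the general pipeline of training a classical classifier on deep features extracted from OCT scans, with a small untrained encoder standing in for the pretrained segmentation network and randomly generated scans and labels as placeholder data.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in feature extractor: the paper uses intermediate activations of the
# pretrained ReLayNet segmentation network; here a small untrained encoder plays
# that role so the pipeline runs end to end.
encoder = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())

# Dummy OCT B-scans and labels (1 = patient, 0 = control)
scans = torch.rand(40, 1, 128, 256)
labels = np.random.randint(0, 2, size=40)

with torch.no_grad():
    deep_features = encoder(scans).numpy()

# Classical classifier trained on the deep features, evaluated by cross-validation
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, deep_features, labels, cv=5).mean())
```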
IPAS-193
P-10: Optimal parameters selection of the Frost filter based on despeckling efficiency prediction for Sentinel SAR images, Oleksii S. Rubel1, Andrii S. Rubel1, Vladimir Lukin1, and Karen Egiazarian2; 1National Aerospace University (Ukraine) and 2Tampere University (Finland)
Synthetic aperture radar (SAR) images have found numerous applications. However, further analysis of SAR images, including interpretation, classification, segmentation, etc., is an extremely challenging task due to the presence of intense speckle noise. Therefore, image despeckling is one of the main stages in preliminary SAR data processing. Over the past decades, a large number of image despeckling techniques have been proposed, ranging from local-statistics filters to deep learning based ones. In this study, we analyze one of the best-known and most widely used local-statistics filters, the Frost filter. The despeckling efficiency of the Frost filter depends significantly on the sliding window size and the tuning (also called damping) factor. Here, we present a method for optimal parameter selection of the Frost filter for a given image based on despeckling efficiency prediction. The prediction is carried out using a set of statistical and spectral input parameters and a multilayer neural network. It is shown that such a prediction can be performed with high accuracy before applying the despeckling, and that it is faster than the despeckling itself. Both simulated speckled images and real-life Sentinel-1 SAR images have been used for extensive evaluation of the proposed method.
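As background for the two parameters being optimized, the sketch below implements a basic Frost filter with an explicit sliding window size and damping factor; it is not the authors' prediction-based selection method, and the window size, damping value, and synthetic speckle model are illustrative assumptions.

```python
import numpy as np

def frost_filter(img, window=7, damping=2.0):
    """Basic Frost despeckling filter: each output pixel is a weighted average of
    its neighborhood, with weights decaying exponentially with distance, scaled by
    the local coefficient of variation and the damping factor."""
    img = img.astype(np.float64)
    pad = window // 2
    padded = np.pad(img, pad, mode='reflect')
    yy, xx = np.mgrid[-pad:pad + 1, -pad:pad + 1]
    dist = np.sqrt(xx ** 2 + yy ** 2)          # distance from the window center

    out = np.empty_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            patch = padded[r:r + window, c:c + window]
            mean = patch.mean()
            cv_sq = patch.var() / (mean ** 2 + 1e-12)   # local speckle strength
            weights = np.exp(-damping * cv_sq * dist)
            out[r, c] = np.sum(weights * patch) / np.sum(weights)
    return out

# Example: despeckle a synthetic image corrupted by multiplicative gamma speckle
clean = np.full((64, 64), 100.0)
speckled = clean * np.random.gamma(shape=4.0, scale=0.25, size=clean.shape)
filtered = frost_filter(speckled, window=7, damping=2.0)
```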
IPAS-194
P-11: Simulation-based virtual reality training for firefighters, Mohamed Saifeddine Hadj Sassi1, Federica Battisti2, and Marco Carli1; 1Roma Tre University and 2University of Padova (Italy)
Simulation-based training is used to improve learners' skills and enhance their knowledge. Recently, virtual reality technology has been exploited in simulation, mainly for training purposes, to enable learning while performing activities that are dangerous or even impossible to reproduce in the real world. In this context, we present simulation-based firefighter training for an earthquake scenario, developed in collaboration with the national Italian firefighter rescue units (the Italian "Istituto Superiore Antincendi"). The proposed training model is based on a virtual reality solution and foresees a novel interaction and game model developed specifically for training first responders. The trainee wears a head-mounted display, interacts with objects in the simulated environment, and performs specific tasks. The performed tests show that the use of virtual reality can improve the effectiveness of training. Indeed, trainees show a better perception of the scene, which is reflected in a faster response in the real situation. The proposed training system can help firefighters by providing adequate information on how to deal with risks.
Tuesday 25 January 2022
IS&T Awards & PLENARY: Physics-based Image Systems Simulation
07:00 – 08:00
Three quarters of a century ago, visionaries in academia and industry saw the need for a new field called photographic engineering and formed what would become the Society for Imaging Science and Technology (IS&T). Thirty-five years ago, IS&T recognized the massive transition from analog to digital imaging and created the Symposium on Electronic Imaging (EI). IS&T and EI continue to evolve by cross-pollinating electronic imaging in the fields of computer graphics, computer vision, machine learning, and visual perception, among others. This talk describes open-source software and applications that build on this vision. The software combines quantitative computer graphics with models of optics and image sensors to generate physically accurate synthetic image data for devices that are being prototyped. These simulations can be a powerful tool in the design and evaluation of novel imaging systems, as well as for the production of synthetic data for machine learning applications.
Joyce Farrell, Stanford Center for Image Systems Engineering, Stanford University, CEO and Co-founder, ImagEval Consulting (United States)
Joyce Farrell is a senior research associate and lecturer in the Stanford School of Engineering and the executive director of the Stanford Center for Image Systems Engineering (SCIEN). Joyce received her BS from the University of California at San Diego and her PhD from Stanford University. She was a postdoctoral fellow at NASA Ames Research Center, New York University, and Xerox PARC, before joining the research staff at Hewlett Packard in 1985. In 2000 Joyce joined Shutterfly, a startup company specializing in online digital photofinishing, and in 2001 she formed ImagEval Consulting, LLC, a company specializing in the development of software and design tools for image systems simulation. In 2003, Joyce returned to Stanford University to develop the SCIEN Industry Affiliates Program.
PANEL: The Brave New World of Virtual Reality
08:00 – 09:00
Advances in electronic imaging, computer graphics, and machine learning have made it possible to create photorealistic images and videos. In the future, one can imagine that it will be possible to create a virtual reality that is indistinguishable from real-world experiences. This panel discusses the benefits of this brave new world of virtual reality and how we can mitigate the risks that it poses. The goal of the panel discussion is to showcase state-of-the-art synthetic imagery, learn how this progress benefits society, and discuss how we can mitigate the risks the technology also poses. After brief demos of the state of the art, the panelists will discuss creating photorealistic avatars, Project Shoah, and digital forensics.
Panel Moderator: Joyce Farrell, Stanford Center for Image Systems Engineering, Stanford University, CEO and Co-founder, ImagEval Consulting (United States)
Panelist: Matthias Nießner, Technical University of Munich (Germany)
Panelist: Paul Debevec, Netflix, Inc. (United States)
Panelist: Hany Farid, University of California, Berkeley (United States)
Image Filtering, Enhancement, and Object Detection
Session Chair:
Karen Egiazarian, Tampere University (Finland)
09:15 – 10:20
Green Room
09:15
Conference Introduction
09:20 IPAS-344
Contrast enhancement: Cross-modal learning approach for medical images, Rabia Naseem1, Akib J. Islam1,2, Faouzi Alaya Cheikh1, and Azeddine Beghdadi3; 1Norwegian University of Science and Technology (Norway), 2University Jean Monnet Saint-Etienne (France), and 3University Sorbonne Paris Nord (France)
Contrast is a key perceptual attribute of image quality. In medical images, poor quality, and specifically low contrast, inhibits precise interpretation of the image. Contrast enhancement is therefore applied not merely to improve the visual quality of images but also to facilitate further processing tasks. In this paper, we propose a contrast enhancement approach based on cross-modal learning. A Cycle-GAN (Generative Adversarial Network) is used for this purpose, where a UNet augmented with global features acts as the generator. In addition, individual batch normalization is used so that the generators adapt specifically to their input distributions. The proposed method accepts low-contrast T2-weighted (T2-w) magnetic resonance images (MRI) and uses the corresponding high-contrast T1-w MRI to learn the global contrast characteristics. The experiments were conducted on the publicly available IXI dataset. Comparison with recent CE methods and quantitative assessment using two widely used metrics, FSIM and BRISQUE, validate the superior performance of the proposed method.
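As a rough illustration of the cross-modal training signal (not the authors' full Cycle-GAN with a global-feature UNet, individual batch normalization, and adversarial losses), the following sketch computes only a cycle-consistency loss between two toy generators; the tiny convolutional generators and the dummy T1-w/T2-w batches are placeholders.

```python
import torch
import torch.nn as nn

def tiny_generator():
    # Stand-in for the UNet generator with global features described in the abstract
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 1, 3, padding=1))

G = tiny_generator()  # low-contrast T2-w  -> T1-w-like (high contrast)
F = tiny_generator()  # T1-w-like          -> T2-w-like

l1 = nn.L1Loss()
t2_batch = torch.rand(4, 1, 64, 64)  # dummy low-contrast T2-w slices
t1_batch = torch.rand(4, 1, 64, 64)  # dummy high-contrast T1-w slices

# Cycle consistency: translating to the other modality and back should recover
# the input, which lets unpaired T1-w images teach the global contrast behaviour.
cycle_loss = l1(F(G(t2_batch)), t2_batch) + l1(G(F(t1_batch)), t1_batch)
cycle_loss.backward()
```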
09:40 IPAS-345
Rapid circle detection through fusion of summative statistics of edge components, Scott A. Craver and Pheona Anjoy, Binghamton University (United States)
Circle detection in edge images can involve significant time and memory, particularly if the circles have unknown radii over a large range. We describe an algorithm that processes an edge image in a single linear pass, compiling statistics of connected components that can be used by two distinct least-squares methods. Because the compiled statistics are all sums, these components can be quickly merged without any further examination of image pixels. Fusing multiple circle detectors allows more powerful circle detection. The resulting algorithm is of linear complexity in the number of image pixels and quadratic complexity in a much smaller number of cluster statistics.
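The paper's specific component statistics and merging rules are not reproduced here, but the sketch below illustrates the underlying idea that an algebraic (Kasa-style) least-squares circle fit can be computed entirely from additive sums, so two edge components can be merged by simply adding their statistics; the variable names and two-arc example are illustrative.

```python
import numpy as np

def circle_sums(xs, ys):
    """Accumulate the additive statistics needed for a Kasa-style circle fit."""
    xs = np.asarray(xs, dtype=np.float64)
    ys = np.asarray(ys, dtype=np.float64)
    z = xs ** 2 + ys ** 2
    return {"n": xs.size,
            "sx": xs.sum(), "sy": ys.sum(),
            "sxx": (xs * xs).sum(), "syy": (ys * ys).sum(), "sxy": (xs * ys).sum(),
            "sxz": (xs * z).sum(), "syz": (ys * z).sum(), "sz": z.sum()}

def merge(a, b):
    """Merging two edge components only requires adding their sums."""
    return {k: a[k] + b[k] for k in a}

def fit_circle(s):
    """Solve the Kasa normal equations; returns (center_x, center_y, radius)."""
    A = np.array([[s["sxx"], s["sxy"], s["sx"]],
                  [s["sxy"], s["syy"], s["sy"]],
                  [s["sx"],  s["sy"],  s["n"]]])
    b = -np.array([s["sxz"], s["syz"], s["sz"]])
    D, E, F = np.linalg.solve(A, b)
    cx, cy = -D / 2.0, -E / 2.0
    return cx, cy, np.sqrt(cx ** 2 + cy ** 2 - F)

# Example: two arcs of the same circle, fitted after merging their statistics
t1 = np.linspace(0.0, np.pi, 50)
t2 = np.linspace(np.pi, 2.0 * np.pi, 50)
arc1 = circle_sums(10 + 5 * np.cos(t1), 20 + 5 * np.sin(t1))
arc2 = circle_sums(10 + 5 * np.cos(t2), 20 + 5 * np.sin(t2))
print(fit_circle(merge(arc1, arc2)))   # approximately (10.0, 20.0, 5.0)
```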
10:00 IPAS-346
Training decision trees to guide feature selection for infrared image pre-screening algorithms, Dawne Deaver1 and Nader Namazi2; 1US Army DEVCOM C5ISR and 2The Catholic University of America (United States)
This research explores a fresh approach to the selection and weighting of classical image features for infrared object detection and target-like clutter rejection. Traditional statistical techniques are used to calculate individual features, while modern supervised machine learning techniques are used to rank-order the predictive value of each feature. This paper describes the use of decision trees to determine which features have the highest value in predicting the correct binary target/non-target class. This work is unique in that it focuses on infrared imagery and exploits interpretable machine learning techniques for the selection of hand-crafted features integrated into a pre-screening algorithm.
Multi-dimensional and Multimodal Image Processing Algorithms I
Session Chair:
Karen Egiazarian, Tampere University (Finland)
10:45 – 11:45
Green Room
10:45 IPAS-354
On properties of visual quality metrics in remote sensing applications, Oleg Ieremeiev1, Vladimir Lukin1, Krzysztof Okarma2, Karen Egiazarian3, and Benoit Vozel4; 1National Aerospace University (Ukraine), 2West Pomeranian University of Technology (Poland), 3Tampere University (Finland), and 4University of Rennes 1 (France)
Visual quality is important for remote sensing data presented as grayscale, color, or pseudo-color images. Although several visual quality metrics (VQMs) have been used to characterize such data, only a limited analysis of their applicability in remote sensing applications has been done so far. In this paper, we study correlations for a wide set of VQMs on color images with distortion types typical for remote sensing. It is demonstrated that many metrics have very high Spearman rank order correlation with each other, e.g. PSNR-based and SSIM-based metrics. Meanwhile, there are also metrics that are practically uncorrelated with the others. A detailed analysis of the VQMs that have the largest SROCC values and belong to different groups is presented in this paper.
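For readers unfamiliar with the correlation measure used throughout the paper, the following sketch computes pairwise Spearman rank-order correlation between hypothetical metric scores; the metric names and score values are made up for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical quality scores produced by three metrics on the same six distorted images
scores = {
    "PSNR":    np.array([32.1, 28.4, 25.0, 35.7, 30.2, 22.8]),
    "SSIM":    np.array([0.94, 0.88, 0.79, 0.97, 0.91, 0.71]),
    "MetricX": np.array([0.40, 0.55, 0.31, 0.62, 0.12, 0.47]),
}

# Pairwise SROCC: highly correlated metrics form a group, while values near zero
# indicate metrics that behave almost independently of the others
names = list(scores)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        rho, _ = spearmanr(scores[a], scores[b])
        print(f"SROCC({a}, {b}) = {rho:+.3f}")
```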
11:05 IPAS-355
Face detection and recognition in organic video: A comparative study for sport celebrities database, Yigit O. Akbay and Mihai Mitrea, Institut Mines-Telecom (France)
The present paper reports on an experimental study carried out in the applicative field of organic video processing, and relates to the possibility of identifying sport celebrities (soccer players) in video content. In contrast to common state-of-the-art studies, special attention is paid to cases in which the face is not completely included in the frame (lateral views, partial occlusions, etc.) and/or in which arbitrary lighting conditions occur. To this aim, we consider two conventional face detection algorithms (the Haar cascade classifier and MMOD, max-margin object detection) coupled with two conventional face recognition models (LBPH, local binary pattern histogram, and a CNN-based pruned ResNet). The experimental work consists of evaluating the end-to-end performance of the four possible combinations of the two face detection and two face recognition methods. A database of 20 video sequences of about 3 minutes each is organized. As an overall conclusion, we bring to light that MMOD coupled with a pruned ResNet model seems to better suit the organic video processing use-case constraints, being able to reach a recognition rate of 98%.
11:25 IPAS-356
Volumetric segmentation for integral microscopy with Fourier plane recording, Sergio Moreschini, Robert Bregovic, and Atanas Gotchev, University of Tampere (Finland)
Light Field (LF) microscopy has emerged as a fast-growing field of interest in recent years due to its capacity to capture in-vivo samples from multiple perspectives. In this work, we present a framework for volume reconstruction from LF images created following the setup of a Fourier Integral Microscope (FIMic). In our approach we do not use real images; instead, we use a dataset generated in Blender which mimics the capturing process of a FIMic. The resulting images have been used to create a Focal Stack (FS) of the LF, from which Epipolar Plane Images (EPIs) have been extracted. The FS and the EPIs have been used to train three different deep neural networks based on the classic U-Net architecture. The volumetric reconstruction is obtained by averaging the probabilities produced by these networks.
Multi-dimensional and Multimodal Image Processing Algorithms II
Session Chair:
Sos Agaian, College of Staten Island and the Graduate Center, CUNY (United States)
15:00 – 16:00
Green Room
15:00 IPAS-365
A frame level rate allocation algorithm based on temporal dependency model for AV1, Cheng Chen, Jingning Han, Paul Wilkins, and Yaowu Xu, Google Inc. (United States)
Rate control is an essential module in video coding. Rate control strategies strive to deliver a stable playback experience as well as high compression efficiency for modern video applications, constrained by restricted bandwidth and buffer limits. The difficulty of rate control often lies in the ability of the underlying algorithm to adapt to the variability of content and the temporal correlation across frames. In this paper, we present a rate allocation algorithm that models distortion propagation in the hierarchical coding structure, premised on a frame-level temporal dependency model. Our experiments show that with the information collected from the temporal dependency model, the proposed rate allocation algorithm significantly improves coding efficiency over the AV1 baseline on a varied set of user-generated video clips.
15:20 IPAS-366
Alignment and fusion of visible and infrared images based on gradient-domain processing, Ayaka Tanihata, Masayuki Tanaka, and Masatoshi Okutomi, Tokyo Institute of Technology (Japan)
Fusion of images from different modalities, such as visible and far-infrared images, is an important image processing technique because different modalities can compensate for each other. Many existing image fusion algorithms assume that the different modal images are perfectly aligned. However, that assumption is not satisfied in many practical situations. In this paper, we propose an image alignment and fusion algorithm based on gradient-domain processing. First, we extract gradient maps from both modality images. Then, assuming disparities between the two gradient maps, candidate gradient maps for the target fused image are generated by selecting, pixel by pixel, the gradient with the larger power from the two modality images. A key observation is as follows: if the assumed disparity is wrong, the fused image includes ghost edges; if the assumed disparity is correct, the single edge is preserved without a ghost edge in the fused image. Therefore, we evaluate the gradient power in the region of interest of the fused image for different disparities. We can then align the images using the disparity associated with the minimum gradient power. Finally, we apply gradient-based image fusion to the aligned image pairs. We experimentally validate that the proposed approach can effectively align and fuse visible and far-infrared images.
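A minimal sketch of the alignment idea described above follows, assuming a purely horizontal integer disparity and using the whole image rather than a region of interest; it selects the disparity that minimizes the fused gradient power and is not the authors' implementation.

```python
import numpy as np

def gradients(img):
    """Vertical and horizontal gradients of a grayscale image."""
    gy, gx = np.gradient(img.astype(np.float64))
    return gx, gy

def fused_gradient_power(vis, ir, disparity):
    """Shift the infrared image horizontally by 'disparity', fuse the gradients
    pixel by pixel (keep the stronger one), and return the total gradient power.
    A wrong disparity produces ghost edges and therefore a larger power."""
    ir_shifted = np.roll(ir, disparity, axis=1)
    vx, vy = gradients(vis)
    ix, iy = gradients(ir_shifted)
    fused_power = np.maximum(vx ** 2 + vy ** 2, ix ** 2 + iy ** 2)
    return fused_power.sum()

def estimate_disparity(vis, ir, candidates=range(-16, 17)):
    """Pick the disparity that minimizes the fused gradient power."""
    powers = [fused_gradient_power(vis, ir, d) for d in candidates]
    return list(candidates)[int(np.argmin(powers))]

# Synthetic example: a bright square seen by both modalities, with the infrared
# view shifted 5 pixels to the right
vis = np.zeros((64, 64))
vis[20:40, 20:40] = 1.0
ir = np.roll(vis, 5, axis=1)
print(estimate_disparity(vis, ir))   # expected: -5 (shift that re-aligns the views)
```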
15:40 IPAS-367
Deep reinforcement learning approach to predict head movement in 360° videos, Tanmay Ambadkar and Pramit Mazumdar, IIIT Vadodara (India)
The popularity of 360° videos has grown immensely in the last few years. A typical 360° video seen through a Head Mounted Display (HMD) gives an immersive feeling, where the viewer feels like standing within the real environment on a virtual platform. Simulating real-life behaviour, within an HMD a viewer can view only a particular region of the media and not the entire 360° content. The portion visible within the HMD at a given time is popularly known as the Field-of-View (FOV). The viewer may move their head or perform a physical movement (walk) to explore the entire content. Due to their large volume, 360° media face challenges during transmission, and it is becoming more important to use adaptive compression following the viewing behaviour of a user. In this work, we propose a model to estimate the FOV of a user viewing a 360° video using an HMD, popularly known as Virtual Cinematography. Saliency estimation has proven to be a good indicator for modelling the attention of a viewer to a particular scene. Therefore, the proposed model to estimate the FOV exploits a reinforcement learning framework that uses carefully designed reward functions and perceptual saliency as the driving feature extractor.
Wednesday 26 January 2022
Signal and Image Classification I
Session Chair:
Atanas Gotchev, Tampere University (Finland)
07:00 – 08:00
Green Room
07:00 IPAS-381
Machine learning with blind imbalanced domains, Hiroshi Kuwajima1, Masayuki Tanaka2, and Masatoshi Okutomi2; 1DENSO Corporation and 2Tokyo Institute of Technology (Japan)
Machine learning is now used in various applications and has shown considerable success. Machine learning is good at learning the overall characteristics of massive training data. However, for real-world applications, training data often include multiple domains, and some domains have higher importance or risk. In this paper, we first propose a new problem setting: machine learning with blind imbalanced domains. In the proposed problem, the domain assignment of samples is unknown and imbalanced in the training data, and the performance is evaluated per domain in the test data. Second, we propose an approach for this problem in classification tasks. The proposed approach combines center loss and weighted mini-batch sampling based on distances between samples and centroids in the deep feature space. Experiments on one-minor-domain and two-minor-domain settings using three handwritten digit databases (MNIST, EMNIST, and USPS) show that our proposed approach outperforms possible solutions using related methods. Remarkably, our approach improves the accuracy in the minor domain by more than 1% on average. Furthermore, it can be inductively estimated that our proposed approach works on multiple domains, given the successful results on one and two minor domains.
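A minimal PyTorch sketch of a center-loss term of the kind mentioned above follows; the feature dimension, class count, loss weight, and dummy tensors are illustrative assumptions, and the weighted mini-batch sampling component of the approach is not shown.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Center loss: pulls each deep feature toward the learned centroid of its class.
    Distances between samples and these centroids could then drive weighted
    mini-batch sampling, as outlined in the abstract (not shown here)."""

    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        # Squared Euclidean distance between each feature and its class center
        diffs = features - self.centers[labels]
        return 0.5 * (diffs ** 2).sum(dim=1).mean()

# Illustrative use: combine with the usual cross-entropy classification loss
feat_dim, num_classes, batch = 64, 10, 32
features = torch.randn(batch, feat_dim)    # deep features from a CNN
logits = torch.randn(batch, num_classes)   # classifier outputs
labels = torch.randint(0, num_classes, (batch,))

center_loss = CenterLoss(num_classes, feat_dim)
loss = nn.functional.cross_entropy(logits, labels) + 0.1 * center_loss(features, labels)
```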
07:20 IPAS-382
Real-time defect detection and classification on wood surfaces using deep learning, Mazhar Mohsin, Oluwafemi Samson Balogun, Keijo Haataja, and Pekka Toivanen, University of Eastern Finland (Finland)
This paper proposes a novel method for automatic real-time defect detection and classification on wood surfaces. Our method uses a deep convolutional neural network (CNN) approach, Faster R-CNN (region-based CNN), as the detector, with MobileNetV3 as the backbone network for feature extraction. The key difference of our approach from existing methods is that it detects knots and other types of defects efficiently and performs the classification in real time on the input video frames. Speed and accuracy are the main focus of our work. For industrial quality control and inspection tasks such as defect detection, detection and classification need to be done in real time on computationally limited processing units or commodity processors. Our trained model is lightweight and can even be deployed on systems such as mobile and edge devices. We pre-trained MobileNetV3 on a large image dataset for feature extraction and use Faster R-CNN for detection and classification of defects. The system performs real-time detection and classification at an average of 37 frames per second on input video frames, using a low-cost, low-memory GPU (graphics processing unit). Our method achieves an overall accuracy of 99% in detecting and classifying defects.
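The sketch below shows how a Faster R-CNN detector with a MobileNetV3 backbone can be instantiated and run on a single frame using torchvision; the three-class label set and confidence threshold are illustrative assumptions rather than the paper's configuration.

```python
import torch
from torchvision.models.detection import fasterrcnn_mobilenet_v3_large_fpn

# Faster R-CNN with a MobileNetV3-Large FPN backbone, as provided by torchvision.
# The label set (background, knot, other defect) is an illustrative assumption.
model = fasterrcnn_mobilenet_v3_large_fpn(num_classes=3)
model.eval()

# One frame from the input video stream (dummy tensor here); real frames would
# come from e.g. cv2.VideoCapture, converted to an RGB float tensor in [0, 1].
frame = torch.rand(3, 480, 640)
with torch.no_grad():
    detections = model([frame])[0]

# Each detection carries a bounding box, a class label, and a confidence score.
for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.5:  # illustrative confidence threshold
        print(int(label), float(score), box.tolist())
```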
07:40 IPAS-383
Hair color digitization through imaging and deep inverse graphics, Robin Kips1,2, Panagiotis-Alexandros Bokaris1, Matthieu Perrot1, Pietro Gori2, and Isabelle Bloch3; 1L'Oréal Research and Innovation, 2LTCI, Telecom Paris, Institut Polytechnique de Paris, and 3Sorbonne Universite CNRS (France)
Hair appearance is a complex phenomenon due to hair geometry and how light bounces on different hair fibers. For this reason, reproducing a specific hair color in a rendering environment is a challenging task that requires manual work and expert knowledge in computer graphics to tune the result visually. While current hair capture methods focus on hair shape estimation, many applications could benefit from an automated method for capturing the appearance of a physical hair sample, from augmented/virtual reality to hair dye development. Building on recent advances in inverse graphics and material capture using deep neural networks, we introduce a novel method for hair color digitization. Our proposed pipeline captures the color appearance of a physical hair sample and renders synthetic images of hair with a similar appearance, simulating different hair styles and/or lighting environments. Since rendering realistic hair images requires path tracing, the conventional inverse graphics approach based on differentiable rendering is intractable. Our method is based on the combination of a controlled imaging device, a path-tracing renderer, and an inverse graphics model based on self-supervised machine learning, which does not require differentiable rendering to be trained. We illustrate the performance of our hair digitization method on both real and synthetic images and show that our approach can accurately capture and render hair color.
Signal and Image Classification II
Session Chair:
Atanas Gotchev, Tampere University (Finland)
08:30 – 09:30
Green Room
08:30 IPAS-390
Deep learning based udder classification for cattle traits analysis, Hina Afridi1,2, Mohib Ullah1, Øyvind Nordbø2, and Faouzi Alaya Cheikh1; 1Norwegian University of Science and Technology and 2GENO SA (Norway)
To improve genetic gain in cattle breeding, deep learning based automatic methods can be developed to streamline and increase the precision of registrations. However, prior to this process, relevant data need to be collected and classified for further processing. For this purpose, we explore a convolutional neural network (CNN), namely the VGG-16 model, for udder classification using cattle data collected on Norwegian dairy cattle farms. The analysis of udder images is challenging due to variations in the captured images of this non-rigid organ, image capture in the farm environment, and disturbances in the form of irrelevant segments of other cattle parts. We manually annotate the data with udder and non-udder information to construct a dataset for our deep learning analysis. We demonstrate that the VGG-16 model used as the backbone can efficiently achieve acceptable performance, with training and validation accuracy of 97% and 93%, respectively, on our custom dataset.
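A minimal transfer-learning sketch in the spirit of the description above follows, replacing the final VGG-16 layer with a two-class head; the frozen backbone, optimizer settings, and dummy batch are illustrative assumptions rather than the authors' training setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# VGG-16 backbone with its last fully connected layer replaced by a two-class
# head (udder vs. non-udder). Freezing the convolutional features and the chosen
# hyperparameters are illustrative assumptions.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for param in model.features.parameters():
    param.requires_grad = False           # reuse the pretrained convolutional backbone
model.classifier[6] = nn.Linear(4096, 2)  # new two-class output layer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of annotated image crops
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```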
08:50 IPAS-392
Expert training: Enhancing AI resilience to image coding artifacts, Alban Marie, Karol Desnos, Luce Morin, and Lu Zhang, Institut National des Sciences Appliquées de Rennes (France)
In the machine-to-machine (M2M) transmission context, there is a great need to reduce the amount of transmitted information using lossy compression. However, commonly used image compression methods are designed for human perception, not for the performance of Artificial Intelligence (AI) algorithms. These compression distortions are known to affect many deep learning based architectures on several computer vision tasks. In this paper, we focus on the classification task and propose a new approach, named expert training, to enhance the resilience of Convolutional Neural Networks (CNNs) to compression distortions. We validated our approach using the MnasNet and ResNet50 architectures, against image compression distortions introduced by three commonly used methods (JPEG, J2K, and BPG), on the ImageNet dataset. The results show better robustness of these two architectures against the tested coding artifacts when using the proposed expert training approach. Once the paper is accepted, our code will be publicly available at https://github.com/foo-bar-anonym/expert_training.
09:10 IPAS-419
Accuracy evaluation of methods for pose estimation from fiducial markers, Ugurcan Budak1, Olli Suominen1, Emilio Ruiz Morales2, and Atanas Gotchev1; 1Tampere University (Finland) and 2Fusion for Energy (F4E) (Spain)
Estimating pose from fiducial markers is a widely researched topic with practical importance for computer vision, robotics, and photogrammetry. In this paper, we aim to quantify the accuracy of pose estimation in real-world scenarios. More specifically, we investigate six factors that impact the accuracy of pose estimation: number of points, depth offset, planar offset, manufacturing error, detection error, and constellation size. Their influence is quantified for four non-iterative pose estimation algorithms, employing the direct linear transform, direct least squares, robust perspective-n-point, and infinitesimal plane-based pose estimation, respectively. We present empirical results that are instructive for selecting a well-performing pose estimation method and for rectifying the factors that cause errors and degrade the rotational and translational accuracy of pose estimation.
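As an illustration of one of the algorithm families compared in the paper, the sketch below estimates the pose of a planar square marker with OpenCV's non-iterative IPPE solver; the marker size, detected corner positions, and camera intrinsics are hypothetical values chosen only to make the example runnable.

```python
import numpy as np
import cv2

# Square fiducial marker with a 40 mm side; 3D corner coordinates in the marker
# frame (planar constellation, z = 0), ordered consistently with the detections.
half = 0.020  # meters
object_points = np.array([[-half,  half, 0.0],
                          [ half,  half, 0.0],
                          [ half, -half, 0.0],
                          [-half, -half, 0.0]])

# Hypothetical detected corner positions (pixels) and pinhole camera intrinsics
image_points = np.array([[310.2, 180.5],
                         [388.7, 182.1],
                         [386.9, 259.8],
                         [308.4, 257.6]])
camera_matrix = np.array([[800.0,   0.0, 320.0],
                          [  0.0, 800.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)

# Non-iterative pose estimation for a planar target (IPPE, one of the algorithm
# families compared in the paper)
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix,
                              dist_coeffs, flags=cv2.SOLVEPNP_IPPE)
if ok:
    R, _ = cv2.Rodrigues(rvec)  # marker-to-camera rotation matrix and translation
    print("rotation:\n", R, "\ntranslation (m):", tvec.ravel())
```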
KEYNOTE: Perception and Image Quality
Session Chair: Atanas Gotchev, Tampere University (Finland)
10:00 – 11:00
Green Room
IPAS-399
KEYNOTE: Perception-guided image quality measurements: Principles and future trends [PRESENTATION-ONLY], Sos S. Agaian, College of Staten Island and the Graduate Center, CUNY (United States)
Bio-inspired image processing is about learning image algorithms from computational neuroscience, cognitive science, and biology and applying them to the design of real-world image processing based systems. More specifically, this field gives computers the ability to "see" just as humans do. Recently, many useful image processing algorithms have been developed with varying degrees of correspondence to biological vision studies. This is natural, since a biological system can provide a source of inspiration for new computationally efficient and robust vision models and measurements. Simultaneously, image processing tools may give new insights for understanding biological visual systems. Digital images are subject to various distortions during acquisition, processing, transmission, compression, storage, and reproduction. How can we automatically and quantitatively predict perceived image quality? In this talk, we present visual-perception-driven image quality measurements originating in visual perception studies: principles, future trends, and applications. We will also present our recent research and a synopsis of the current state-of-the-art results in image quality measurements, and discuss future trends in these technologies and the associated commercial impact and opportunities.
Sos S. Agaian is a distinguished professor of computer science at CSI and the Graduate Center, CUNY. Dr. Agaian was a Peter T. Flawn Professor at the University of Texas at San Antonio. His research sponsors include DARPA, NSF, the US Department of Transportation, the US Department of Energy, NIJ, and private industry. Dr. Agaian's research interests are in big and small data analytics, computational vision and sensing, machine learning and urban computing, multimodal biometrics and digital forensics, information processing and fusion, and fast algorithms. He has special interests in finding meaning in visual content (examining images for faces, text, objects, actions, scenes, and other content) and in the development of scientific systems and architectures in the theory and practice of engineering and computer science (emphasizing complex digital data processing, information sciences, and systems technologies for military, medical, and industrial information processing centers). Dr. Agaian has developed applications in healthcare, biomedical data mining, object recognition, signal processing, computer-aided food quality inspection, 3D imaging with visible and thermal sensors, computational photography, multimedia security, needs-driven medical and biomedical technology, finance, and other related areas. He has published 750 articles, 10 books, and 19 book chapters, and holds more than 56 issued or pending US and foreign patents/disclosures. Several of Dr. Agaian's IP assets are commercially licensed. He is an associate editor for several journals, including the IEEE Transactions on Image Processing and the IEEE Transactions on Cybernetics. He is a fellow of IS&T, SPIE, AAAS, IEEE, and AAI. Dr. Agaian has given more than 15 plenary/keynote speeches and 50+ invited talks.