Autonomous Vehicles and Machines 2023
Monday 16 January 2023
10:20 – 10:50 AM Coffee Break
12:30 – 2:00 PM Lunch
Monday 16 January PLENARY: Neural Operators for Solving PDEs
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Deep learning surrogate models have shown promise in modeling complex physical phenomena such as fluid flows, molecular dynamics, and material properties. However, standard neural networks assume finite-dimensional inputs and outputs, and hence cannot handle a change in resolution or discretization between training and testing. We introduce Fourier neural operators that can learn operators, which are mappings between infinite-dimensional spaces. They are independent of the resolution or grid of the training data and allow for zero-shot generalization to higher-resolution evaluations. When applied to weather forecasting, neural operators capture fine-scale phenomena and have skill comparable to gold-standard numerical weather models for predictions up to a week or longer, while being 4-5 orders of magnitude faster.
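A minimal sketch (PyTorch assumed; illustrative only, not the speaker's implementation) of the building block behind this idea, a one-dimensional Fourier spectral convolution layer: the input is transformed to the frequency domain, learned weights act on a fixed number of low modes, and the result is transformed back, so the same layer can be evaluated on inputs of any resolution.

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Spectral convolution: learned mixing of low Fourier modes (sketch)."""
    def __init__(self, in_channels, out_channels, modes):
        super().__init__()
        self.modes = modes  # number of low-frequency modes retained
        scale = 1.0 / (in_channels * out_channels)
        self.weights = nn.Parameter(
            scale * torch.randn(in_channels, out_channels, modes, dtype=torch.cfloat)
        )

    def forward(self, x):              # x: (batch, in_channels, grid_points)
        x_ft = torch.fft.rfft(x)       # to the frequency domain
        out_ft = torch.zeros(
            x.shape[0], self.weights.shape[1], x_ft.shape[-1],
            dtype=torch.cfloat, device=x.device,
        )
        # mix channels mode-by-mode on the retained low frequencies only
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weights
        )
        return torch.fft.irfft(out_ft, n=x.shape[-1])  # back to the grid
```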
Anima Anandkumar, Bren professor, California Institute of Technology, and senior director of AI Research, NVIDIA Corporation (United States)
Anima Anandkumar is a Bren Professor at Caltech and Senior Director of AI Research at NVIDIA. She is passionate about designing principled AI algorithms and applying them to interdisciplinary domains. She has received several honors, including the IEEE Fellowship, Alfred P. Sloan Fellowship, NSF CAREER Award, and faculty fellowships from Microsoft, Google, Facebook, and Adobe. She is part of the World Economic Forum's Expert Network. Anandkumar received her BTech from the Indian Institute of Technology Madras and her PhD from Cornell University, completed postdoctoral research at MIT, and was an assistant professor at the University of California, Irvine.
3:00 – 3:30 PM Coffee Break
EI 2023 Highlights Session
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
3:30 – 5:00 PM
Cyril Magnin II
Join us for a session that celebrates the breadth of what EI has to offer with short papers selected from EI conferences.
NOTE: The EI-wide "EI 2023 Highlights" session is concurrent with Monday afternoon COIMG, COLOR, IMAGE, and IQSP conference sessions.
IQSP-309
Evaluation of image quality metrics designed for DRI tasks with automotive cameras, Valentine Klein, Yiqi LI, Claudio Greco, Laurent Chanas, and Frédéric Guichard, DXOMARK (France) [view abstract]
Driving assistance is increasingly used in new car models. Most driving assistance systems are based on automotive cameras and computer vision. Computer Vision, regardless of the underlying algorithms and technology, requires the images to have good image quality, defined according to the task. This notion of good image quality is still to be defined in the case of computer vision as it has very different criteria than human vision: humans have a better contrast detection ability than image chains. The aim of this article is to compare three different metrics designed for detection of objects with computer vision: the Contrast Detection Probability (CDP) [1, 2, 3, 4], the Contrast Signal to Noise Ratio (CSNR) [5] and the Frequency of Correct Resolution (FCR) [6]. For this purpose, the computer vision task of reading the characters on a license plate will be used as a benchmark. The objective is to check the correlation between the objective metric and the ability of a neural network to perform this task. Thus, a protocol to test these metrics and compare them to the output of the neural network has been designed and the pros and cons of each of these three metrics have been noted.
SD&A-224
Human performance using stereo 3D in a helmet mounted display and association with individual stereo acuity, Bonnie Posselt, RAF Centre of Aviation Medicine (United Kingdom) [view abstract]
Binocular Helmet Mounted Displays (HMDs) are a critical part of the aircraft system, allowing information to be presented to the aviator with stereoscopic 3D (S3D) depth, potentially enhancing situational awareness and improving performance. The utility of S3D in an HMD may be linked to an individual’s ability to perceive changes in binocular disparity (stereo acuity). Though minimum stereo acuity standards exist for most military aviators, current test methods may be unable to characterise this relationship. This presentation will investigate the effect of S3D on performance when used in a warning alert displayed in an HMD. Furthermore, any effect on performance, ocular symptoms, and cognitive workload shall be evaluated in regard to individual stereo acuity measured with a variety of paper-based and digital stereo tests.
IMAGE-281
Smartphone-enabled point-of-care blood hemoglobin testing with color accuracy-assisted spectral learning, Sang Mok Park1, Yuhyun Ji1, Semin Kwon1, Andrew R. O’Brien2, Ying Wang2, and Young L. Kim1; 1Purdue University and 2Indiana University School of Medicine (United States) [view abstract]
We develop an mHealth technology for noninvasively measuring blood Hgb levels in patients with sickle cell anemia, using the photos of peripheral tissue acquired by the built-in camera of a smartphone. As an easily accessible sensing site, the inner eyelid (i.e., palpebral conjunctiva) is used because of the relatively uniform microvasculature and the absence of skin pigments. Color correction (color reproduction) and spectral learning (spectral super-resolution spectroscopy) algorithms are integrated for accurate and precise mHealth blood Hgb testing. First, color correction using a color reference chart with multiple color patches extracts absolute color information of the inner eyelid, compensating for smartphone models, ambient light conditions, and data formats during photo acquisition. Second, spectral learning virtually transforms the smartphone camera into a hyperspectral imaging system, mathematically reconstructing high-resolution spectra from color-corrected eyelid images. Third, color correction and spectral learning algorithms are combined with a spectroscopic model for blood Hgb quantification among sickle cell patients. Importantly, single-shot photo acquisition of the inner eyelid using the color reference chart allows straightforward, real-time, and instantaneous reading of blood Hgb levels. Overall, our mHealth blood Hgb tests could potentially be scalable, robust, and sustainable in resource-limited and homecare settings.
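As an illustration of the spectral-learning step described above, the following hedged sketch (NumPy; the data shapes and ridge-regression choice are assumptions, not the authors' model) fits a linear mapping from color-corrected RGB values to high-resolution spectra and applies it to new pixels.

```python
import numpy as np

def fit_spectral_mapping(rgb_train, spectra_train, ridge=1e-3):
    """rgb_train: (N, 3) color-corrected RGB; spectra_train: (N, B) reference spectra."""
    X = np.hstack([rgb_train, np.ones((rgb_train.shape[0], 1))])  # add bias term
    # ridge-regularized least squares: W = (X^T X + lambda*I)^-1 X^T Y
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ spectra_train)
    return W

def reconstruct_spectra(rgb, W):
    """Reconstruct (N, B) spectra from (N, 3) color-corrected RGB values."""
    X = np.hstack([rgb, np.ones((rgb.shape[0], 1))])
    return X @ W
```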
AVM-118
Designing scenes to quantify the performance of automotive perception systems, Zhenyi Liu1, Devesh Shah2, Alireza Rahimpour2, Joyce Farrell1, and Brian Wandell1; 1Stanford University and 2Ford Motor Company (United States) [view abstract]
We implemented an end-to-end simulation for perception systems, based on cameras, that are used in automotive applications. The open-source software creates complex driving scenes and simulates cameras that acquire images of these scenes. The camera images are then used by a neural network in the perception system to identify the locations of scene objects, providing the results as input to the decision system. In this paper, we design collections of test scenes that can be used to quantify the perception system’s performance under a range of (a) environmental conditions (object distance, occlusion ratio, lighting levels), and (b) camera parameters (pixel size, lens type, color filter array). We are designing scene collections to analyze performance for detecting vehicles, traffic signs and vulnerable road users in a range of environmental conditions and for a range of camera parameters. With experience, such scene collections may serve a role similar to that of standardized test targets that are used to quantify camera image quality (e.g., acuity, color).
VDA-403
Visualizing and monitoring the process of injection molding, Christian A. Steinparz1, Thomas Mitterlehner2, Bernhard Praher2, Klaus Straka1,2, Holger Stitz1,3, and Marc Streit1,3; 1Johannes Kepler University, 2Moldsonics GmbH, and 3datavisyn GmbH (Austria) [view abstract]
In injection molding machines the molds are rarely equipped with sensor systems. The availability of non-invasive ultrasound-based in-mold sensors provides better means for guiding operators of injection molding machines throughout the production process. However, existing visualizations are mostly limited to plots of temperature and pressure over time. In this work, we present the result of a design study created in collaboration with domain experts. The resulting prototypical application uses real-world data taken from live ultrasound sensor measurements for injection molding cavities captured over multiple cycles during the injection process. Our contribution includes a definition of tasks for setting up and monitoring the machines during the process, and the corresponding web-based visual analysis tool addressing these tasks. The interface consists of a multi-view display with various levels of data aggregation that is updated live for newly streamed data of ongoing injection cycles.
COIMG-155
Commissioning the James Webb Space Telescope, Joseph M. Howard, NASA Goddard Space Flight Center (United States) [view abstract]
Astronomy is arguably in a golden age, where current and future NASA space telescopes are expected to contribute to this rapid growth in understanding of our universe. The most recent addition to our space-based telescopes dedicated to astronomy and astrophysics is the James Webb Space Telescope (JWST), which launched on 25 December 2021. This talk will discuss the first six months in space for JWST, which were spent commissioning the observatory with many deployments, alignments, and system and instrumentation checks. These engineering activities help verify the proper working of the telescope prior to commencing full science operations. For the session: Computational Imaging using Fourier Ptychography and Phase Retrieval.
HVEI-223
Critical flicker frequency (CFF) at high luminance levels, Alexandre Chapiro1, Nathan Matsuda1, Maliha Ashraf2, and Rafal Mantiuk3; 1Meta (United States), 2University of Liverpool (United Kingdom), and 3University of Cambridge (United Kingdom) [view abstract]
The critical flicker fusion (CFF) is the frequency of changes at which a temporally periodic light will begin to appear completely steady to an observer. This value is affected by several visual factors, such as the luminance of the stimulus or its location on the retina. With new high dynamic range (HDR) displays, operating at higher luminance levels, and virtual reality (VR) displays, presenting at wide fields-of-view, the effective CFF may change significantly from values expected for traditional presentation. In this work we use a prototype HDR VR display capable of luminances up to 20,000 cd/m^2 to gather a novel set of CFF measurements for never before examined levels of luminance, eccentricity, and size. Our data is useful to study the temporal behavior of the visual system at high luminance levels, as well as setting useful thresholds for display engineering.
HPCI-228
Physics guided machine learning for image-based material decomposition of tissues from simulated breast models with calcifications, Muralikrishnan Gopalakrishnan Meena1, Amir K. Ziabari1, Singanallur Venkatakrishnan1, Isaac R. Lyngaas1, Matthew R. Norman1, Balint Joo1, Thomas L. Beck1, Charles A. Bouman2, Anuj Kapadia1, and Xiao Wang1; 1Oak Ridge National Laboratory and 2Purdue University (United States) [view abstract]
Material decomposition of Computed Tomography (CT) scans using projection-based approaches, while highly accurate, poses a challenge for medical imaging researchers and clinicians due to limited or no access to projection data. We introduce a deep learning image-based material decomposition method guided by physics and requiring no access to projection data. The method is demonstrated to decompose tissues from simulated dual-energy X-ray CT scans of virtual human phantoms containing four materials: adipose, fibroglandular, calcification, and air. The method uses a hybrid unsupervised and supervised learning technique to tackle the material decomposition problem. We take advantage of the unique X-ray absorption rate of calcium compared to body tissues to perform a preliminary segmentation of calcification from the images using unsupervised learning. We then perform supervised material decomposition using a deep-learned UNET model which is trained using GPUs on the high-performance computing systems at the Oak Ridge Leadership Computing Facility. The method is demonstrated on simulated breast models to decompose calcification, adipose, fibroglandular, and air.
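A hedged sketch of what the preliminary unsupervised calcification segmentation could look like (scikit-learn assumed; the clustering method and the low/high-energy feature pair are illustrative assumptions, not the authors' method):

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_calcification(low_kev_img, high_kev_img, n_clusters=4):
    """Cluster voxels on their dual-energy attenuation pair and pick the
    cluster with the highest mean attenuation, exploiting calcium's
    distinctly strong X-ray absorption."""
    features = np.stack([low_kev_img.ravel(), high_kev_img.ravel()], axis=1)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    means = [features[labels == k].mean() for k in range(n_clusters)]
    calc_cluster = int(np.argmax(means))          # most attenuating cluster
    return (labels == calc_cluster).reshape(low_kev_img.shape)
```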
3DIA-104
Layered view synthesis for general images, Loïc Dehan, Wiebe Van Ranst, and Patrick Vandewalle, Katholieke University Leuven (Belgium) [view abstract]
We describe a novel method for monocular view synthesis. The goal of our work is to create a visually pleasing set of horizontally spaced views based on a single image. This can be applied in view synthesis for virtual reality and glasses-free 3D displays. Previous methods produce realistic results on images that show a clear distinction between a foreground object and the background. We aim to create novel views in more general, crowded scenes in which there is no clear distinction. Our main contribution is a computationally efficient method for realistic occlusion inpainting and blending, especially in complex scenes. Our method can be effectively applied to any image, which is shown both qualitatively and quantitatively on a large dataset of stereo images. Our method performs natural disocclusion inpainting and maintains the shape and edge quality of foreground objects.
ISS-329
A self-powered asynchronous image sensor with independent in-pixel harvesting and sensing operations, Ruben Gomez-Merchan, Juan Antonio Leñero-Bardallo, and Ángel Rodríguez-Vázquez, University of Seville (Spain) [view abstract]
A new self-powered asynchronous sensor with a novel pixel architecture is presented. Pixels are autonomous and can harvest or sense energy independently. During image acquisition, pixels toggle to a harvesting operation mode once they have sensed their local illumination level. With the proposed pixel architecture, the most illuminated pixels provide an early contribution to powering the sensor, while less illuminated ones spend more time sensing their local illumination. Thus, the equivalent frame rate is higher than that offered by conventional self-powered sensors that harvest and sense illumination in independent phases. The proposed sensor uses a Time-to-First-Spike readout that allows trading off image quality against data and bandwidth consumption. The sensor has HDR operation with a dynamic range of 80 dB. Pixel power consumption is only 70 pW. In the article, we describe the sensor's and pixel's architectures in detail. Experimental results are provided and discussed. Sensor specifications are benchmarked against the state of the art.
COLOR-184
Color blindness and modern board games, Alessandro Rizzi1 and Matteo Sassi2; 1Università degli Studi di Milano and 2consultant (Italy) [view abstract]
The board game industry is experiencing strong renewed interest. In the last few years, about 4000 new board games have been designed and distributed each year. The gender balance among board game players is approaching equality, though males remain a slight majority. This means that (at least) around 10% of board game players are color blind. How does the board game industry deal with this? Recently, awareness has started to rise in board game design, but so far there is a big gap compared with, for example, the computer game industry. This paper presents some data about the current situation, discussing exemplary cases of successful board games.
5:00 – 6:15 PM EI 2023 All-Conference Welcome Reception (in the Cyril Magnin Foyer)
Tuesday 17 January 2023
Sensors (T1)
Session Chair:
Brian Deegan, National University of Ireland, Galway (Ireland)
9:05 – 9:50 AM
Cyril Magnin I
9:05
Conference Welcome
9:30 AVM-122
How much depth information can radar contribute to a depth estimation model?, Chen-Chou Lo and Patrick Vandewalle, Katholieke University Leuven (Belgium) [view abstract]
Recently, many works have proposed to fuse radar data as an additional perceptual signal into monocular depth estimation models because radar data is robust against various light and weather conditions. Although positive results were reported in prior works, it is still hard to tell how much depth information radar can contribute to a depth estimation model. In this paper, we propose radar inference and supervision experiments to investigate the intrinsic depth capability of radar data using state-of-the-art depth estimation models on the nuScenes dataset. In the inference experiment, the model predicts depth by taking only radar as input to demonstrate the inference capability of radar data. In the supervision experiment, a monocular depth estimation model is trained under radar supervision to show the intrinsic depth information that radar can contribute. Our experiments demonstrate that the model with only sparse radar input can detect the shape of surroundings to a certain extent in the predicted depth. Furthermore, the monocular depth estimation model supervised by preprocessed radar achieves a good performance compared to the baseline model trained with sparse lidar supervision.
10:00 AM – 7:30 PM Industry Exhibition - Tuesday (in the Cyril Magnin Foyer)
10:20 – 10:40 AM Coffee Break
Camera Performance Evaluation (T2)
Session Chair:
Patrick Denny, University of Limerick (Ireland)
10:40 AM – 12:40 PM
Cyril Magnin I
10:40 AVM-123
Update on progress of IEEE P2020 Automotive Image Quality Working Group, The IEEE P2020 Working Group1, Uwe Artmann2, and Darryl Perks3; 1IEEE Standards Association - P2020 Automotive Image Quality Working Group (United States), 2presenter (Image Engineering GmbH & Co KG) (Germany), and 3presenter (onsemi) (United Kingdom) [view abstract]
The IEEE P2020 Automotive Image Quality Working Group was established to fill a gap in image quality evaluation created by the unique challenges of automotive imaging. These include external factors, such as the environments the cameras must perform in (extreme weather conditions, high dynamic range scenes, quickly changing lighting, low light, etc.), as well as the camera systems themselves, which often include fisheye or wide-angle lenses, HDR sensors, and multi-camera systems. Although other image quality standards exist and are extensively leveraged in P2020, they are not able to address the demands of an automotive system. Substantial work has been completed by the P2020 Working Group and a pre-release document is now available to the public from IEEE. The group is continuing to test and refine the proposed methodologies and encourages participation from new members and feedback on the pre-release from the general public. Once validation testing is completed, the group plans to release the final standard for IEEE balloting in 2023. This presentation will provide an overview of P2020 and a brief technical introduction to each metric and how it addresses the needs of the automotive industry.
11:00 AVM-124
An investigation into the impact of image compression on image quality prior to image signal processing, Jordan Cahill1, Brian Deegan2, Patrick Denny3, Enda Ward4, Martin Glavin1, and Edward Jones1; 1University of Galway, 2National University of Ireland, Galway, 3University of Limerick, and 4Valeo Vision Systems (Ireland) [view abstract]
An Image Signal Processor (ISP) is an important aspect of the image acquisition process. It is responsible for converting the information captured at the sensor readout level into an image designed either to be viewed by a human or to be used for computer vision applications. Traditionally, the resultant image is compressed to remove redundant information and save on storage costs. However, to date, little study has been carried out on the effect that compressing an image before it is processed by the ISP has on image quality. In this study, we look at the impact of raw image compression on subjective and objective measures of image quality for human viewing applications.
11:20 AVM-125
Modulation-transfer function as performance indicator for AI algorithms?, Patrick Müller1 and Alexander Braun2; 1Hochschule Düsseldorf, University of Applied Sciences Düsseldorf and 2Düsseldorf University of Applied Sciences (Germany) [view abstract]
The modulation-transfer function (MTF) is a fundamental optical metric to measure the optical quality of an imaging system. In the automotive industry it is used to qualify camera systems for ADAS/AD. Each modern ADAS/AD system includes evaluation algorithms for environment perception and decision making that are based on AI/ML methods and neural networks. The performance of these AI algorithms is measured by established metrics like Average Precision (AP) or precision-recall curves. In this article we investigate the robustness of the link between the optical quality metric and the AI performance metric. A series of numerical experiments was performed with object detection and instance segmentation algorithms (cars, pedestrians) evaluated on image databases with varying optical quality. These demonstrate that for strong optical aberrations a distinct performance loss is apparent, but that for subtle optical quality differences – as might arise from production tolerances – the link does not exhibit a satisfactory correlation. This calls into question how reliable the current industry practice is, where a produced camera is tested end-of-line (EOL) with the MTF and fixed MTF thresholds are used to qualify the performance of the camera-under-test.
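A minimal sketch of this style of experiment (the Gaussian-blur degradation and the `detector` and `average_precision` helpers are illustrative assumptions, not the authors' setup):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_sweep(images, annotations, detector, average_precision,
               sigmas=(0.5, 1.0, 2.0, 4.0)):
    """images: list of (H, W, 3) arrays; detector/average_precision: user-supplied."""
    results = []
    for sigma in sigmas:
        blurred = [gaussian_filter(im, sigma=(sigma, sigma, 0)) for im in images]
        ap = average_precision(detector(blurred), annotations)
        # MTF of a Gaussian blur at frequency u (cycles/pixel) is exp(-2*(pi*sigma*u)^2);
        # evaluate it at u = 0.25 as a single sharpness number for the sweep
        mtf_proxy = float(np.exp(-2.0 * (np.pi * sigma * 0.25) ** 2))
        results.append((sigma, mtf_proxy, ap))
    return results  # correlate mtf_proxy against ap across the sweep
```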
11:40 AVM-126
The influence of image capture and processing on MTF for end of line test and validation, Brian Deegan, Martin Glavin, and Edward Jones, University of Galway (Ireland) [view abstract]
Slanted edge MTF measurement as per ISO12233 is the de facto standard for measuring camera sharpness at manufacturing end of line. MTF measured by slanted edge has a number of advantages for measuring sharpness, being scale invariant, and relatively robust to geometric distortion. However, slanted edge MTF measurement is known to be affected by image processing algorithms, including demosaic, edge enhancement, and denoise algorithms. To avoid these confounding factors, it is increasingly common to measure MTF directly from the raw sensor image. This approach is logical if you are assessing the optomechanical lens-imager alignment and focus. However, end-of-line production testing has specific requirements, including speed of execution, repeatability and reproducibility. These requirements are typically not considered when configuring a camera for end-of-line MTF measurement. In this study, the execution time, repeatability and reproducibility of MTF measurement for multiple image capture and image processing combinations are examined.
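For context, the core of a slanted-edge style MTF computation reduces to differentiating an oversampled edge-spread function (ESF) into a line-spread function (LSF) and taking the magnitude of its Fourier transform; a minimal sketch follows (NumPy; the ISO 12233 edge-angle estimation and pixel binning that produce the oversampled ESF are omitted).

```python
import numpy as np

def mtf_from_esf(esf, oversampling=4):
    """esf: 1-D edge-spread function sampled at `oversampling` samples per pixel."""
    lsf = np.gradient(esf)                    # ESF -> LSF
    lsf = lsf * np.hanning(len(lsf))          # window to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(lsf))
    mtf = spectrum / spectrum[0]              # normalize so MTF(0) = 1
    # frequency axis in cycles per pixel of the original (non-oversampled) grid
    freqs = np.fft.rfftfreq(len(lsf), d=1.0 / oversampling)
    return freqs, mtf
```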
12:00 AVM-127
Comprehensive stray light (flare) testing: Lessons learned, Jackson S. Knappen, Imatest LLC (United States) [view abstract]
Stray light (also called flare) is any light that reaches the detector (i.e., the image sensor) other than through the designed optical path. Depending on the mechanism causing stray light, it can introduce phantom objects (ghosts) within the scene, reduce contrast over portions of the image, and effectively reduce system dynamic range. These factors can adversely affect the application performance of the camera and, therefore, stray light measurement is to be included in the upcoming IEEE-P2020 standard for measuring automotive image quality. The stray light of a camera can be measured by capturing images of a bright light source positioned at different angles in (or outside of) the camera’s field of view and then processing those captured images into metric images with associated summary statistics. However, the setup and light source can have a significant impact on the measurement. In this paper, we present lessons learned and various technical elements to consider for stray light (flare) testing of digital imaging systems. These elements include the radiometric (e.g., brightness) and geometric (e.g., size) qualities of the light source and setup. Results are to be presented at the conference.
12:20 AVM-128
Optical flow for autonomous driving: applications, challenges and improvements, Shihao Shen1, Louis Kerofsky2, and Senthil Yogamani3; 1Carnegie Mellon University (United States), 2Qualcomm Technologies Inc. (United States), and 3QT Technologies Ireland Limited (Ireland) [view abstract]
Estimating optical flow presents unique challenges in AV applications: large translational motion, wide variations in depth of important objects, strong lens distortion in commonly used fisheye cameras and rolling shutter artefacts in dynamic scenes. Even simple translational motion can produce complicated optical flow fields. Lack of ground truth data also creates a challenge. We evaluate recent optical flow methods on fisheye imagery found in AV applications. We explore various training techniques in challenging scenarios and domain adaptation for transferring models trained on synthetic data where ground truth is available to real-world data. We propose novel strategies that facilitate learning robust representations efficiently to address low-light degeneracies. Finally, we discuss the main challenges and open problems in this problem domain.
12:40 – 2:00 PM Lunch
Tuesday 17 January PLENARY: Embedded Gain Maps for Adaptive Display of High Dynamic Range Images
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Images optimized for High Dynamic Range (HDR) displays have brighter highlights and more detailed shadows, resulting in an increased sense of realism and greater impact. However, a major issue with HDR content is the lack of consistency in appearance across different devices and viewing environments. There are several reasons, including varying capabilities of HDR displays and the different tone mapping methods implemented across software and platforms. Consequently, HDR content authors can neither control nor predict how their images will appear in other apps.
We present a flexible system that provides consistent and adaptive display of HDR images. Conceptually, the method combines both SDR and HDR renditions within a single image and interpolates between the two dynamically at display time. We compute a Gain Map that represents the difference between the two renditions. In the file, we store a Base rendition (either SDR or HDR), the Gain Map, and some associated metadata. At display time, we combine the Base image with a scaled version of the Gain Map, where the scale factor depends on the image metadata, the HDR capacity of the display, and the viewing environment.
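A minimal sketch of the display-time combination described above (NumPy; parameter names are hypothetical and a log2-encoded gain map with an SDR base is assumed; the actual file format and metadata handling are not reproduced here):

```python
import numpy as np

def apply_gain_map(base_linear, gain_map_log2, display_headroom_stops,
                   map_min_stops=0.0, map_max_stops=4.0):
    """base_linear: SDR base rendition in linear light, shape (H, W, 3).
    gain_map_log2: per-pixel log2(HDR/SDR) ratio, shape (H, W).
    display_headroom_stops: headroom of the display/viewing environment above SDR white."""
    # weight in [0, 1]: 0 -> show the base rendition, 1 -> show the full HDR rendition
    denom = max(map_max_stops - map_min_stops, 1e-6)
    w = np.clip((display_headroom_stops - map_min_stops) / denom, 0.0, 1.0)
    # scale the base by a weighted version of the gain map, in linear light
    return base_linear * np.exp2(w * gain_map_log2)[..., np.newaxis]
```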
Eric Chan, Fellow, Adobe Inc. (United States)
Eric Chan is a Fellow at Adobe, where he develops software for editing photographs. Current projects include Photoshop, Lightroom, Camera Raw, and Digital Negative (DNG). When not writing software, Chan enjoys spending time at his other keyboard, the piano. He is an enthusiastic nature photographer and often combines his photo activities with travel and hiking.
Paul M. Hubel, director of Image Quality in Software Engineering, Apple Inc. (United States)
Paul M. Hubel is director of Image Quality in Software Engineering at Apple. He has worked on computational photography and the image quality of photographic systems for many years, across all aspects of the imaging chain, particularly for iPhone. He trained in optical engineering at the University of Rochester, Oxford University, and MIT, and has more than 50 patents on color imaging and camera technology. Hubel is active on the ISO-TC42 committee Digital Photography, where this work is under discussion, and is currently a VP on the IS&T Board. Outside work he enjoys photography, travel, cycling, and coffee roasting, and plays trumpet in several Bay Area ensembles.
3:00 – 3:30 PM Coffee Break
5:30 – 7:00 PM EI 2023 Symposium Demonstration Session (in the Cyril Magnin Foyer)
Wednesday 18 January 2023
10:00 AM – 3:30 PM Industry Exhibition - Wednesday (in the Cyril Magnin Foyer)
10:20 – 10:50 AM Coffee Break
End-to-end Systems (W2)
Session Chair:
Patrick Denny, University of Limerick (Ireland)
11:10 AM – 12:30 PM
Cyril Magnin I
11:10 AVM-110
tRANSAC: Dynamic feature accumulation across time for stable online RANSAC model estimation in automotive applications, Shimiao Li1, Yang Song2, Ruijiang Luo1, Zhongyang Huang1, and Chengming Liu1; 1OmniVision Technologies (Singapore) and 2OmniVision Technologies Inc. (United States) [view abstract]
RANdom SAmple Consensus (RANSAC) is widely used in computer vision and automotive-related applications. It is an iterative method to estimate the parameters of a mathematical model from a set of observed data that contains outliers. In computer vision, such observed data is usually a set of features (such as feature points or line segments) extracted from images. In automotive-related applications, RANSAC can be used to estimate the lane vanishing point, camera rotation angles, the ground plane, etc. In such applications, the changing content of the road scene makes stable online model estimation very difficult. In this paper, we propose a framework called tRANSAC to dynamically accumulate features across time so that online RANSAC model estimation can be performed stably. Feature accumulation across time is done dynamically: when RANSAC tends to perform robustly and stably, accumulated features are discarded quickly so that fewer redundant features are used for RANSAC estimation; when RANSAC tends to perform poorly, accumulated features are discarded slowly so that more features can be used for better RANSAC estimation. Experimental results on a road scene dataset for camera angle estimation show that the proposed method gives a more stable and accurate model than the baseline method in online RANSAC estimation.
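A hedged sketch of the accumulate-and-discard idea (the `ransac_fit` callable and the thresholds are illustrative assumptions, not the authors' implementation):

```python
def transac_step(feature_pool, new_features, ransac_fit,
                 fast_keep=0.3, slow_keep=0.9, good_inlier_ratio=0.6):
    """feature_pool: features accumulated from earlier frames (list, oldest first).
    new_features: features extracted from the current frame.
    ransac_fit: callable returning (model, inlier_ratio) for a list of features."""
    feature_pool.extend(new_features)
    model, inlier_ratio = ransac_fit(feature_pool)
    # stable fit -> discard accumulated features quickly (keep only the newest);
    # poor fit -> keep more history so the next estimate has more support
    keep_fraction = fast_keep if inlier_ratio >= good_inlier_ratio else slow_keep
    n_keep = int(len(feature_pool) * keep_fraction)
    del feature_pool[:len(feature_pool) - n_keep]   # drop the oldest features
    return model, inlier_ratio
```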
11:30 AVM-111
End-to-end evaluation of practical video analytics systems for face detection and recognition, Praneet Singh, Edward J. Delp, and Amy R. Reibman, Purdue University (United States) [view abstract]
Practical video analytics systems that are deployed in bandwidth constrained environments like autonomous vehicles perform computer vision tasks such as face detection and recognition. In an end-to-end face analytics system, inputs are first compressed using popular video codecs like HEVC and then passed onto modules that perform face detection, alignment, and recognition sequentially. Previously, the modules of these systems have been evaluated independently using task-specific imbalanced datasets that can misconstrue performance estimates. In this paper, we perform a thorough end-to-end evaluation of a face analytics system using a driving-specific dataset, which enables meaningful interpretations. We demonstrate how independent task evaluations and dataset imbalances can overestimate system performance. We propose strategies to balance the evaluation dataset and to make its annotations consistent across multiple analytics tasks and scenarios. We then evaluate the end-to-end system performance sequentially to account for task interdependencies. Our experiments show that our approach provides a true estimate of the end-to-end performance for critical real-world systems.
11:50 AVM-112
Orchestration of co-operative and adaptive multi-core deep learning engines, Mihir Mody1, Kumar Desappan1, Pramod Swami1, David Smith1, Shyam Jagannathan1, Kevin Lavery1, Gregory Shultz1, Jason Jones1, and Jesse Villarreal2; 1Texas Instruments India Ltd (India) and 2Texas Instruments (United States) [view abstract]
Deep learning (DL)-based algorithms are used in many integral modules of ADAS and Automated Driving Systems. Camera-based perception, driver monitoring, driving policy, and radar and lidar perception are a few examples built using DL algorithms in such systems. These real-time DL applications require huge compute, up to 250 TOPS, to realize them on an edge device. To meet these needs efficiently in terms of cost and power, silicon vendors provide complex SoCs with multiple DL engines. These SoCs also come with the system resources, such as L2/L3 on-chip memory, a high-speed DDR interface, and a PMIC, needed to feed data and power to these DL engines so that their compute can be used efficiently. These system resources would otherwise scale linearly with the number of DL engines in the system. This paper proposes solutions to optimize these system resources and provide a cost- and power-efficient solution: (1) cooperative and adaptive asynchronous scheduling of DL engines to optimize peak resource usage across multiple vectors such as memory size, throughput, and power/current; and (2) orchestration of cooperative and adaptive multi-core DL engines to achieve synchronous execution and maximum utilization of all resources. The proposed solution achieves up to 30% power saving, or reduces overhead by 75%, in a 4-core configuration providing 32 TOPS.
12:10 AVM-113
opTIFlow – An optimized end-to-end dataflow for accelerating deep learning workloads on heterogeneous SoCs, Shyam Jagannathan1, Vijay Pothukuchi2, Jesse Villarreal2, Kumar Desappan1, Manu Mathew1, Rahul Ravikumar1, Aniket Limaye1, Mihir Mody1, Pramod Swami1, Piyali Goswami1,3, Carlos Rodriguez3, Emmanuel Madrigal3, and Marco Herrera3; 1Texas Instruments India Ltd (India), 2Texas Instruments (United States), and 3RidgeRun (United States) [view abstract]
A typical edge compute SoC capable of handling deep learning workloads at low power is usually heterogeneous by design. It typically comprises multiple initiators such as real-time IPs for capture and display, hardware accelerators for ISP, computer vision, deep learning engines and codecs, DSP or ARM cores for general compute, and a GPU for 2D/3D visualization. Every participating initiator transacts with common resources such as the L3/L4/DDR memory systems to seamlessly exchange data. Careful orchestration of this dataflow is important to keep every producer/consumer at full utilization without causing any drop in real-time performance, which is critical for automotive applications. The software stack for such complex workflows can be quite intimidating for customers to bring up and often acts as an entry barrier to even evaluating the device for performance. In this paper we propose techniques developed on TI's latest TDA4V-Mid SoC, targeted at ADAS and autonomous applications, which are designed around ease of use while ensuring entitlement-class device performance, using open standards such as DL runtimes, OpenVX, and GStreamer.
12:30 – 2:00 PM Lunch
Wednesday 18 January PLENARY: Bringing Vision Science to Electronic Imaging: The Pyramid of Visibility
Session Chair: Andreas Savakis, Rochester Institute of Technology (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Electronic imaging depends fundamentally on the capabilities and limitations of human vision. The challenge for the vision scientist is to describe these limitations to the engineer in a comprehensive, computable, and elegant formulation. Primary among these limitations are visibility of variations in light intensity over space and time, of variations in color over space and time, and of all of these patterns with position in the visual field. Lastly, we must describe how all these sensitivities vary with adapting light level. We have recently developed a structural description of human visual sensitivity that we call the Pyramid of Visibility, that accomplishes this synthesis. This talk shows how this structure accommodates all the dimensions described above, and how it can be used to solve a wide variety of problems in display engineering.
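One commonly cited written form of the Pyramid of Visibility (stated here as background, not as a quotation from the talk) models log contrast sensitivity as approximately linear in spatial frequency, temporal frequency, and log adapting luminance, away from the lowest frequencies:

```latex
% Pyramid of Visibility, linear approximation (background sketch):
% S = contrast sensitivity, f = spatial frequency, w = temporal frequency,
% L = adapting luminance; c_f and c_w are negative, c_L is positive.
\log_{10} S(f, w, L) \approx c_0 + c_f\, f + c_w\, w + c_L \log_{10} L
```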
Andrew B. Watson, chief vision scientist, Apple Inc. (United States)
Andrew Watson is Chief Vision Scientist at Apple, where he leads the application of vision science to technologies, applications, and displays. His research focuses on computational models of early vision. He is the author of more than 100 scientific papers and 8 patents. He has 21,180 citations and an h-index of 63. Watson founded the Journal of Vision, and served as editor-in-chief 2001-2013 and 2018-2022. Watson has received numerous awards including the Presidential Rank Award from the President of the United States.
3:00 – 3:30 PM Coffee Break
Simulation Methods (W3)
Session Chair:
Alexander Braun, Düsseldorf University of Applied Sciences (Germany)
3:30 – 5:10 PM
Cyril Magnin I
3:30 AVM-114
Simulation standards and their impact on the quantification of simulation quality, Marius Dupuis, ASAM e.V. (Germany) [view abstract]
Simulation is key to exposing automated driving systems to a large variety of situations in order to verify their correct functioning within their Operational Design Domain. This must be done from the early design phases through actual verification and system validation and, at some point, might even have to be done for certification itself. Standards for data files and communication protocols enable data creators and consumers to work independently and match their offerings and requirements as needed. However, standard adoption is patchy and sometimes inconsistent. This issue needs to be addressed. Quantification criteria, not only for standard adoption but also for the quality of simulation tools, seem ever more important. This presentation aims to shed light on the current situation and possible solutions from a standardization body's perspective. It will also look beyond standards and into various aspects of simulation for automated driving itself.
3:50 AVM-116
Design and validation of a rain model for a realistic automotive simulation environment, Tim Brophy1, Brian Deegan1, Martin Glavin1, Javier Salado2, Ángel Tena2, Patrick Denny3, Enda Ward4, Jonathan Horgan4, and Edward Jones1; 1University of Galway (Ireland), 2Anyverse (Spain), 3University of Limerick (Ireland), and 4Valeo (Ireland) [view abstract]
This paper presents the design of an accurate rain model for the commercially available Anyverse automotive simulation environment. The model incorporates the key physical properties of rain and is validated against real rain. Due to the high computational complexity of ray tracing through a particle-based model, a second, more computationally efficient model is proposed. For the second model, the rain is modelled using a combination of a particle model and an attenuation field. The attenuation field is fine-tuned against the particle-only model to minimize the difference between the models. Finally, the impact of rain on image quality is examined using a series of calibration charts. The charts have been configured in the simulation environment to maintain constant spatial resolution over different distances. Scaling the charts appropriately allows comparable IQ metrics to be measured across a range of distances.
4:10 AVM-117
Simulating motion blur and exposure time and evaluating its effect on image quality and object detection performance, Hao Lin, University of Galway (Ireland) [view abstract]
Optimizing exposure time for low-light scenarios involves a trade-off between motion blur and signal-to-noise ratio. A method for defining the optimum exposure time for a given function has not been described in the literature. This paper presents the design of a simulation of motion blur and exposure time from the perspective of a real-world camera. The model incorporates characteristics of real-world cameras, including the light level (quanta), shot noise, and lens distortion. In our simulation, an image quality target chart, the Siemens star chart, is used, and the simulation outputs a blurred image as if captured by a camera with a set exposure and a set movement speed. The resulting image is then processed in Imatest, from which image quality readings are extracted, and consequently the relationship between exposure time, motion blur, and the image quality metrics can be evaluated.
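A minimal sketch of the blur/noise trade-off being simulated (NumPy; simplified horizontal motion and Poisson shot noise only, with illustrative parameter names rather than the paper's):

```python
import numpy as np

def simulate_exposure(scene_photon_rate, exposure_s, speed_px_per_s, steps=32):
    """scene_photon_rate: expected photons per pixel per second, shape (H, W).
    exposure_s: exposure time in seconds.
    speed_px_per_s: horizontal image-plane motion during the exposure."""
    accum = np.zeros_like(scene_photon_rate, dtype=float)
    for i in range(steps):
        # position of the scene at this sub-interval of the exposure
        shift = int(round(speed_px_per_s * exposure_s * i / steps))
        accum += np.roll(scene_photon_rate, shift, axis=1) * (exposure_s / steps)
    # accum holds the expected photon count per pixel; shot noise is Poisson
    return np.random.poisson(accum).astype(float)
```

Longer exposures raise the mean photon count (improving SNR) but spread it over more shifted positions (increasing blur), which is the trade-off the paper evaluates with Imatest metrics.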
4:30 AVM-118
Designing scenes to quantify the performance of automotive perception systems, Zhenyi Liu1, Devesh Shah2, Alireza Rahimpour2, Joyce Farrell1, and Brian Wandell1; 1Stanford University and 2Ford Motor Company (United States) [view abstract]
We implemented an end-to-end simulation for perception systems, based on cameras, that are used in automotive applications. The open-source software creates complex driving scenes and simulates cameras that acquire images of these scenes. The camera images are then used by a neural network in the perception system to identify the locations of scene objects, providing the results as input to the decision system. In this paper, we design collections of test scenes that can be used to quantify the perception system’s performance under a range of (a) environmental conditions (object distance, occlusion ratio, lighting levels), and (b) camera parameters (pixel size, lens type, color filter array). We are designing scene collections to analyze performance for detecting vehicles, traffic signs and vulnerable road users in a range of environmental conditions and for a range of camera parameters. With experience, such scene collections may serve a role similar to that of standardized test targets that are used to quantify camera image quality (e.g., acuity, color).
4:50 AVM-119
Design of an automotive platform for computer vision research, Dominik Schörkhuber1, Roman Popp2, Oleksandr Chistov3, Fabian Windbacher4, Michael Hödlmoser4, and Margrit Gelautz1; 1Vienna University of Technology, 2ZKW Lichtsysteme, 3ZKW Group GmbH, and 4emotion3d (Austria) [view abstract]
The goal of our work is to design an automotive platform for AD/ADAS data acquisition and to demonstrate its application to behavior analysis of vulnerable road users. We present a novel data capture platform mounted on a Mercedes GLC vehicle. The car is equipped with an array of sensors and recording hardware including multiple RGB cameras, Lidar, GPS and IMU. For subsequent research on human behavior analysis in traffic scenes, we have conducted two kinds of data recordings. Firstly, we have designed a range of artificial test cases which we recorded on a safety regulated proving ground with stunt persons to capture rare events in traffic scenes in a predictable and structured way. Secondly, we have recorded data on public streets of Vienna, Austria, showing unconstrained pedestrian behavior in an urban setting, while also considering European General Data Protection Regulation (GDPR) requirements. We describe the overall framework including data acquisition and ground truth annotation, and demonstrate its applicability for the implementation and evaluation of selected deep learning models for pedestrian behavior prediction.
5:30 – 7:00 PM EI 2023 Symposium Interactive (Poster) Paper Session (in the Cyril Magnin Foyer)
5:30 – 7:00 PM EI 2023 Meet the Future: A Showcase of Student and Young Professionals Research (in the Cyril Magnin Foyer)