Imaging Sensors and Systems 2023
Monday 16 January 2023
10:20 – 10:50 AM Coffee Break
12:30 – 2:00 PM Lunch
Monday 16 January PLENARY: Neural Operators for Solving PDEs
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Deep learning surrogate models have shown promise in modeling complex physical phenomena such as fluid flows, molecular dynamics, and material properties. However, standard neural networks assume finite-dimensional inputs and outputs, and hence, cannot withstand a change in resolution or discretization between training and testing. We introduce Fourier neural operators that can learn operators, which are mappings between infinite dimensional spaces. They are independent of the resolution or grid of training data and allow for zero-shot generalization to higher resolution evaluations. When applied to weather forecasting, neural operators capture fine-scale phenomena and have similar skill as gold-standard numerical weather models for predictions up to a week or longer, while being 4-5 orders of magnitude faster.
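For illustration, a minimal sketch of the spectral-convolution layer at the heart of a Fourier neural operator. This is not the speaker's implementation; it uses NumPy with randomly initialized, untrained weights purely to show why the same learned parameters apply at any sampling resolution.

```python
# Minimal sketch of one Fourier-neural-operator spectral layer (illustration only;
# weights are random, not trained, and a real FNO composes several such layers
# with pointwise linear maps and nonlinearities).
import numpy as np

def spectral_conv_1d(u, weights, n_modes):
    """Apply a learned multiplier to the lowest Fourier modes of a 1D signal u."""
    u_hat = np.fft.rfft(u)                          # transform to Fourier space
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = weights * u_hat[:n_modes]   # act only on the retained modes
    return np.fft.irfft(out_hat, n=u.size)          # back to physical space

n_modes = 16
rng = np.random.default_rng(0)
weights = rng.standard_normal(n_modes) + 1j * rng.standard_normal(n_modes)

u_coarse = np.sin(2 * np.pi * np.linspace(0, 1, 128, endpoint=False))
u_fine = np.sin(2 * np.pi * np.linspace(0, 1, 512, endpoint=False))

# The same weights apply at either resolution: the layer is discretization-independent.
y_coarse = spectral_conv_1d(u_coarse, weights, n_modes)
y_fine = spectral_conv_1d(u_fine, weights, n_modes)
```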
Anima Anandkumar, Bren professor, California Institute of Technology, and senior director of AI Research, NVIDIA Corporation (United States)
Anima Anandkumar is a Bren Professor at Caltech and Senior Director of AI Research at NVIDIA. She is passionate about designing principled AI algorithms and applying them to interdisciplinary domains. She has received several honors, such as the IEEE Fellowship, the Alfred P. Sloan Fellowship, the NSF CAREER Award, and faculty fellowships from Microsoft, Google, Facebook, and Adobe. She is part of the World Economic Forum's Expert Network. Anandkumar received her BTech from the Indian Institute of Technology Madras and her PhD from Cornell University, did postdoctoral research at MIT, and held an assistant professorship at the University of California, Irvine.
3:00 – 3:30 PM Coffee Break
EI 2023 Highlights Session
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
3:30 – 5:00 PM
Cyril Magnin II
Join us for a session that celebrates the breadth of what EI has to offer with short papers selected from EI conferences.
NOTE: The EI-wide "EI 2023 Highlights" session is concurrent with Monday afternoon COIMG, COLOR, IMAGE, and IQSP conference sessions.
IQSP-309
Evaluation of image quality metrics designed for DRI tasks with automotive cameras, Valentine Klein, Yiqi LI, Claudio Greco, Laurent Chanas, and Frédéric Guichard, DXOMARK (France) [view abstract]
Driving assistance is increasingly used in new car models. Most driving assistance systems are based on automotive cameras and computer vision. Computer vision, regardless of the underlying algorithms and technology, requires the images to have good image quality, defined according to the task. This notion of good image quality has yet to be defined for computer vision, as its criteria differ considerably from those of human vision: humans have a better contrast detection ability than imaging chains. The aim of this article is to compare three different metrics designed for the detection of objects with computer vision: the Contrast Detection Probability (CDP) [1, 2, 3, 4], the Contrast Signal to Noise Ratio (CSNR) [5], and the Frequency of Correct Resolution (FCR) [6]. For this purpose, the computer vision task of reading the characters on a license plate is used as a benchmark. The objective is to check the correlation between each objective metric and the ability of a neural network to perform this task. Thus, a protocol to test these metrics and compare them to the output of the neural network has been designed, and the pros and cons of each of the three metrics have been noted.
SD&A-224
Human performance using stereo 3D in a helmet mounted display and association with individual stereo acuity, Bonnie Posselt, RAF Centre of Aviation Medicine (United Kingdom) [view abstract]
Binocular Helmet Mounted Displays (HMDs) are a critical part of the aircraft system, allowing information to be presented to the aviator with stereoscopic 3D (S3D) depth, potentially enhancing situational awareness and improving performance. The utility of S3D in an HMD may be linked to an individual’s ability to perceive changes in binocular disparity (stereo acuity). Though minimum stereo acuity standards exist for most military aviators, current test methods may be unable to characterise this relationship. This presentation will investigate the effect of S3D on performance when used in a warning alert displayed in an HMD. Furthermore, any effect on performance, ocular symptoms, and cognitive workload shall be evaluated in regard to individual stereo acuity measured with a variety of paper-based and digital stereo tests.
IMAGE-281
Smartphone-enabled point-of-care blood hemoglobin testing with color accuracy-assisted spectral learning, Sang Mok Park1, Yuhyun Ji1, Semin Kwon1, Andrew R. O’Brien2, Ying Wang2, and Young L. Kim1; 1Purdue University and 2Indiana University School of Medicine (United States) [view abstract]
We develop an mHealth technology for noninvasively measuring blood Hgb levels in patients with sickle cell anemia, using the photos of peripheral tissue acquired by the built-in camera of a smartphone. As an easily accessible sensing site, the inner eyelid (i.e., palpebral conjunctiva) is used because of the relatively uniform microvasculature and the absence of skin pigments. Color correction (color reproduction) and spectral learning (spectral super-resolution spectroscopy) algorithms are integrated for accurate and precise mHealth blood Hgb testing. First, color correction using a color reference chart with multiple color patches extracts absolute color information of the inner eyelid, compensating for smartphone models, ambient light conditions, and data formats during photo acquisition. Second, spectral learning virtually transforms the smartphone camera into a hyperspectral imaging system, mathematically reconstructing high-resolution spectra from color-corrected eyelid images. Third, color correction and spectral learning algorithms are combined with a spectroscopic model for blood Hgb quantification among sickle cell patients. Importantly, single-shot photo acquisition of the inner eyelid using the color reference chart allows straightforward, real-time, and instantaneous reading of blood Hgb levels. Overall, our mHealth blood Hgb tests could potentially be scalable, robust, and sustainable in resource-limited and homecare settings.
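A rough sketch of the color-correction step described in this abstract: a 3x3 matrix mapping device RGB to reference RGB is fit by least squares from color-chart patches. The patch values below are synthetic stand-ins, not study data, and the authors' pipeline may differ in detail.

```python
# Hypothetical illustration of chart-based color correction: estimate a 3x3
# correction matrix from reference patches by least squares.
import numpy as np

rng = np.random.default_rng(1)
reference_rgb = rng.uniform(0, 1, size=(24, 3))       # known chart patch colors
device_mix = np.array([[0.9, 0.05, 0.05],
                       [0.1, 0.80, 0.10],
                       [0.0, 0.10, 0.90]])
captured_rgb = reference_rgb @ device_mix.T + 0.01 * rng.standard_normal((24, 3))

# Least-squares fit of C such that captured_rgb @ C approximates reference_rgb
C, *_ = np.linalg.lstsq(captured_rgb, reference_rgb, rcond=None)
corrected = captured_rgb @ C                          # device-independent eyelid colors
```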
AVM-118
Designing scenes to quantify the performance of automotive perception systems, Zhenyi Liu1, Devesh Shah2, Alireza Rahimpour2, Joyce Farrell1, and Brian Wandell1; 1Stanford University and 2Ford Motor Company (United States) [view abstract]
We implemented an end-to-end simulation for perception systems, based on cameras, that are used in automotive applications. The open-source software creates complex driving scenes and simulates cameras that acquire images of these scenes. The camera images are then used by a neural network in the perception system to identify the locations of scene objects, providing the results as input to the decision system. In this paper, we design collections of test scenes that can be used to quantify the perception system’s performance under a range of (a) environmental conditions (object distance, occlusion ratio, lighting levels), and (b) camera parameters (pixel size, lens type, color filter array). We are designing scene collections to analyze performance for detecting vehicles, traffic signs and vulnerable road users in a range of environmental conditions and for a range of camera parameters. With experience, such scene collections may serve a role similar to that of standardized test targets that are used to quantify camera image quality (e.g., acuity, color).
VDA-403
Visualizing and monitoring the process of injection molding, Christian A. Steinparz1, Thomas Mitterlehner2, Bernhard Praher2, Klaus Straka1,2, Holger Stitz1,3, and Marc Streit1,3; 1Johannes Kepler University, 2Moldsonics GmbH, and 3datavisyn GmbH (Austria) [view abstract]
In injection molding machines the molds are rarely equipped with sensor systems. The availability of non-invasive ultrasound-based in-mold sensors provides better means for guiding operators of injection molding machines throughout the production process. However, existing visualizations are mostly limited to plots of temperature and pressure over time. In this work, we present the result of a design study created in collaboration with domain experts. The resulting prototypical application uses real-world data taken from live ultrasound sensor measurements for injection molding cavities captured over multiple cycles during the injection process. Our contribution includes a definition of tasks for setting up and monitoring the machines during the process, and the corresponding web-based visual analysis tool addressing these tasks. The interface consists of a multi-view display with various levels of data aggregation that is updated live for newly streamed data of ongoing injection cycles.
COIMG-155
Commissioning the James Webb Space Telescope, Joseph M. Howard, NASA Goddard Space Flight Center (United States) [view abstract]
Astronomy is arguably in a golden age, where current and future NASA space telescopes are expected to contribute to this rapid growth in understanding of our universe. The most recent addition to our space-based telescopes dedicated to astronomy and astrophysics is the James Webb Space Telescope (JWST), which launched on 25 December 2021. This talk will discuss the first six months in space for JWST, which were spent commissioning the observatory with many deployments, alignments, and system and instrumentation checks. These engineering activities help verify the proper working of the telescope prior to commencing full science operations. For the session: Computational Imaging using Fourier Ptychography and Phase Retrieval.
HVEI-223
Critical flicker frequency (CFF) at high luminance levels, Alexandre Chapiro1, Nathan Matsuda1, Maliha Ashraf2, and Rafal Mantiuk3; 1Meta (United States), 2University of Liverpool (United Kingdom), and 3University of Cambridge (United Kingdom) [view abstract]
The critical flicker fusion (CFF) is the frequency of changes at which a temporally periodic light will begin to appear completely steady to an observer. This value is affected by several visual factors, such as the luminance of the stimulus or its location on the retina. With new high dynamic range (HDR) displays, operating at higher luminance levels, and virtual reality (VR) displays, presenting at wide fields-of-view, the effective CFF may change significantly from values expected for traditional presentation. In this work we use a prototype HDR VR display capable of luminances up to 20,000 cd/m^2 to gather a novel set of CFF measurements for never before examined levels of luminance, eccentricity, and size. Our data is useful to study the temporal behavior of the visual system at high luminance levels, as well as setting useful thresholds for display engineering.
HPCI-228
Physics guided machine learning for image-based material decomposition of tissues from simulated breast models with calcifications, Muralikrishnan Gopalakrishnan Meena1, Amir K. Ziabari1, Singanallur Venkatakrishnan1, Isaac R. Lyngaas1, Matthew R. Norman1, Balint Joo1, Thomas L. Beck1, Charles A. Bouman2, Anuj Kapadia1, and Xiao Wang1; 1Oak Ridge National Laboratory and 2Purdue University (United States) [view abstract]
Material decomposition of Computed Tomography (CT) scans using projection-based approaches, while highly accurate, poses a challenge for medical imaging researchers and clinicians due to limited or no access to projection data. We introduce a deep learning image-based material decomposition method that is guided by physics and requires no access to projection data. The method is demonstrated by decomposing tissues from simulated dual-energy X-ray CT scans of virtual human phantoms containing four materials: adipose, fibroglandular, calcification, and air. The method uses a hybrid unsupervised and supervised learning technique to tackle the material decomposition problem. We take advantage of the unique X-ray absorption rate of calcium compared to body tissues to perform a preliminary segmentation of calcification from the images using unsupervised learning. We then perform supervised material decomposition using a deep-learned UNet model, which is trained using GPUs on the high-performance systems at the Oak Ridge Leadership Computing Facility. The method is demonstrated on simulated breast models to decompose calcification, adipose, fibroglandular, and air.
3DIA-104
Layered view synthesis for general images, Loïc Dehan, Wiebe Van Ranst, and Patrick Vandewalle, Katholieke University Leuven (Belgium) [view abstract]
We describe a novel method for monocular view synthesis. The goal of our work is to create a visually pleasing set of horizontally spaced views based on a single image. This can be applied in view synthesis for virtual reality and glasses-free 3D displays. Previous methods produce realistic results on images that show a clear distinction between a foreground object and the background. We aim to create novel views in more general, crowded scenes in which there is no clear distinction. Our main contributions are a computationally efficient method for realistic occlusion inpainting and blending, especially in complex scenes. Our method can be effectively applied to any image, which is shown both qualitatively and quantitatively on a large dataset of stereo images. Our method performs natural disocclusion inpainting and maintains the shape and edge quality of foreground objects.
ISS-329
A self-powered asynchronous image sensor with independent in-pixel harvesting and sensing operations, Ruben Gomez-Merchan, Juan Antonio Leñero-Bardallo, and Ángel Rodríguez-Vázquez, University of Seville (Spain) [view abstract]
A new self-powered asynchronous sensor with a novel pixel architecture is presented. Pixels are autonomous and can harvest or sense energy independently. During image acquisition, pixels toggle to a harvesting operation mode once they have sensed their local illumination level. With the proposed pixel architecture, the most illuminated pixels provide an early contribution to powering the sensor, while dimly illuminated ones spend more time sensing their local illumination. Thus, the equivalent frame rate is higher than that offered by conventional self-powered sensors that harvest and sense illumination in independent phases. The proposed sensor uses a Time-to-First-Spike readout that allows trading off image quality against data and bandwidth consumption. The sensor has HDR operation with a dynamic range of 80 dB. Pixel power consumption is only 70 pW. In the article, we describe the sensor's and pixels' architectures in detail. Experimental results are provided and discussed. Sensor specifications are benchmarked against the state of the art.
COLOR-184
Color blindness and modern board games, Alessandro Rizzi1 and Matteo Sassi2; 1Università degli Studi di Milano and 2consultant (Italy) [view abstract]
The board game industry is experiencing a strong renewed interest. In the last few years, about 4,000 new board games have been designed and distributed each year. The gender balance among board game players is approaching parity, although males currently remain a slight majority. This means that (at least) around 10% of board game players are color blind. How does the board game industry deal with this? Recently, awareness has begun to rise in board game design, but so far there is a big gap compared with, for example, the computer game industry. This paper presents some data about the current situation, discussing exemplary cases of successful board games.
5:00 – 6:15 PM EI 2023 All-Conference Welcome Reception (in the Cyril Magnin Foyer)
Tuesday 17 January 2023
Sensor Design I (T1)
Session Chairs:
Jon McElvain, Dolby Laboratories (United States) and Min-Woong Seo, Samsung Electronics (Republic of Korea)
9:05 – 10:10 AM
Powell I/II
9:05
Conference Welcome
9:10 ISS-328
Simulation and design of a burst mode 20Mfps global shutter high conversion gain CMOS image sensor in a standard 180nm CMOS image sensor process using sequential transfer gates, Xin Yue and Eric R. Fossum, Dartmouth College (United States) [view abstract]
A sequential transfer-gate and photodiode optimization method for CMOS image sensors is described in this paper, which enables the design of large-scale, ultra-high-speed, burst-mode CMOS image sensors in a low-cost standard CMOS image sensor process without the need for process customization or advanced process technology. The sequential transfer gates also show a clear advantage in minimizing the floating diffusion capacitance and improving image sensor conversion gain in large-scale pixels.
9:30 ISS-329
A self-powered asynchronous image sensor with independent in-pixel harvesting and sensing operations, Ruben Gomez-Merchan, Juan Antonio Leñero-Bardallo, and Ángel Rodríguez-Vázquez, University of Seville (Spain) [view abstract]
A new self-powered asynchronous sensor with a novel pixel architecture is presented. Pixels are autonomous and can harvest or sense energy independently. During image acquisition, pixels toggle to a harvesting operation mode once they have sensed their local illumination level. With the proposed pixel architecture, the most illuminated pixels provide an early contribution to powering the sensor, while dimly illuminated ones spend more time sensing their local illumination. Thus, the equivalent frame rate is higher than that offered by conventional self-powered sensors that harvest and sense illumination in independent phases. The proposed sensor uses a Time-to-First-Spike readout that allows trading off image quality against data and bandwidth consumption. The sensor has HDR operation with a dynamic range of 80 dB. Pixel power consumption is only 70 pW. In the article, we describe the sensor's and pixels' architectures in detail. Experimental results are provided and discussed. Sensor specifications are benchmarked against the state of the art.
9:50 ISS-330
Highly sensitive mutual-capacitive fingerprint sensor with reference electrode, Junghoon Yang, Sarawut Siracosit, and Sang-Hee Ko Park, Korea Advanced Institute of Science and Technology (Republic of Korea) [view abstract]
The sensitivity of existing fingerprint sensors (FPSs) can decrease considerably owing to environmental factors and parasitic capacitance. To overcome this limitation, this paper proposes a highly sensitive 300 dpi mutual-capacitive fingerprint sensor (FPS) with uniquely designed reference lines for device security. Specifically, the reference lines of the FPS induce noise cancellation. Images of fingertips under dry, wet, and oily surface conditions were obtained in the presence and absence of the reference lines. The results showed that the fingerprints were significantly distorted in anomalous surface environments when the reference lines were not used. However, when the reference lines were used, the sensitivity improved irrespective of the environmental conditions. Furthermore, the proposed FPS exhibited a 165% increase in the signal-to-noise ratio (SNR), which significantly improved the sensing capability. Therefore, we believe the proposed FPS can increase device security owing to its excellent performance.
10:00 AM – 7:30 PM Industry Exhibition - Tuesday (in the Cyril Magnin Foyer)
10:20 – 10:50 AM Coffee Break
KEYNOTE: Innovative Imaging Systems (T2)
Session Chairs: Francisco Imai, Apple Inc. (United States) and Kevin Matherson, Microsoft (United States)
10:50 AM – 12:30 PM
Powell I/II
10:50 ISS-331
KEYNOTE: Metaphotonic routers for solid-state imaging: Making every photon count, Peter B. Catrysse, Stanford University (United States) [view abstract]
Dr. Peter B. Catrysse is a Senior Research Scientist in the E. L. Ginzton Laboratory at Stanford University. He holds a PhD and an MSc in Electrical Engineering from Stanford University. With his doctoral research, he pioneered the integration of subwavelength metal optics for color filtering in standard deep-submicron CMOS technology. His recent work focuses on metaphotonics at the interface between fundamental physics and imaging applications. Dr. Catrysse has published more than 120 peer-reviewed papers, presented over 40 invited talks, and has been awarded 8 patents. He was named one of the "50 Tech Pioneers" by the Belgian Financial Times (2017) and is featured among the top 1% of leading Engineering and Technology Scientists on the academic portal Research (2022). Dr. Catrysse is a Fellow of the Optical Society (Optica), a Fellow of SPIE, a Senior Member of the IEEE, and a Hoover Foundation Brussels Fellow of the BAEF.
Solid-state imaging relies on multiple optical functionalities, which are ideally photon efficient. Color, for example, is an important functionality in visible imaging. Achieving color functionality without loss of photons, however, represents a long-standing challenge in integrated imaging systems. The standard approach uses absorbing color filters in a color filter array, which is very photon inefficient. We recently introduced the concept of a metaphotonic color router that overcomes this long-standing challenge. A color router exploits the large number of degrees of freedom available when the optical stack region above the pixel photodetectors is nanopatterned with dielectric materials. It is a lossless device that routes all incident light directly to the photodetectors based on color content, i.e., without any additional propagation. As a result, the color router can achieve color functionality without loss of photons, with a broadband, polarization-independent, and angularly robust response. In this talk, I will describe the color router as well as additional opportunities for metaphotonic routers.
11:30 ISS-332
DiffuserCam: Multi-dimensional lensless imaging (Invited), Laura Waller, University of California, Berkeley (United States) [view abstract]
We describe a computational camera that enables single-shot multi-dimensional imaging with simple hardware and scalable software for easy reproducibility. We demonstrate compact hardware and compressed sensing reconstructions for 3D fluorescence measurements with high resolution across a large volume, hyper-spectral imaging, and temporal super-resolution – recovering a video from a single-shot capture. Our inverse algorithms are based on large-scale nonlinear non-convex optimization combined with unrolled neural networks. Applications demonstrated include whole organism bioimaging and neural activity tracking in vivo.
11:50 ISS-333
Wide-viewing-zone light-field capturing using Turtleback convex reflector (JIST-first), Hiroaki Yano and Tomohiro Yendo, Nagaoka University of Technology (Japan) [view abstract]
Light field cameras have been used for 3D geometrical measurement and for refocusing captured photos. In this paper, we propose a light-field acquisition method using a spherical mirror array. By employing the mirror array and two cameras, a virtual camera array that captures an object from all around it can be generated. Since a large number of virtual cameras can be constructed from two real cameras, an affordable high-density camera array can be achieved with this method. Furthermore, the spherical mirrors enable the capturing of larger objects than previous methods. We conducted simulations to capture the light field and then synthesized arbitrary-viewpoint images of the object observed from 360 degrees around it. The ability of this system to refocus assuming a large aperture is also confirmed. We have also built a prototype that approximates the proposal and conducted a capturing experiment to verify the system's feasibility.
12:10 ISS-334
Digital camera obscuras, Henry G. Dietz, University of Kentucky (United States) [view abstract]
A camera obscura is a darkened chamber in which an image of the scene outside the chamber is projected by a pinhole or other optic onto a screen within the chamber. Early obscuras used pinhole optics, but by the 16th century obscuras with lenses became popular as aids for drawing or painting scenes with the correct perspective. By the late 19th century, the screen had largely been replaced with photo-sensitive materials, and film cameras replaced obscuras. Over the last few decades, digital cameras using electronic sensors have replaced those using film. However, large projections can have significantly different properties from small projections, and it is very difficult to build a large digital image sensor. Thus, there is interest in using a small-sensor digital camera to photograph the large screen of an obscura. For example, it is relatively easy to obtain much shallower depth of field using a large screen, and a small sensor photographing the screen essentially copies that depth of field, so obscuras have often been used as “bokeh adapters” for small-sensor digital cameras. The current work is an experimentally grounded exploration of the issues that arise in the construction of digital camera obscuras, their use, and the exposure control and postprocessing of digital images captured using a small sensor to photograph the image projected on an obscura's screen.
12:30 – 2:00 PM Lunch
Tuesday 17 January PLENARY: Embedded Gain Maps for Adaptive Display of High Dynamic Range Images
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Images optimized for High Dynamic Range (HDR) displays have brighter highlights and more detailed shadows, resulting in an increased sense of realism and greater impact. However, a major issue with HDR content is the lack of consistency in appearance across different devices and viewing environments. There are several reasons, including varying capabilities of HDR displays and the different tone mapping methods implemented across software and platforms. Consequently, HDR content authors can neither control nor predict how their images will appear in other apps.
We present a flexible system that provides consistent and adaptive display of HDR images. Conceptually, the method combines both SDR and HDR renditions within a single image and interpolates between the two dynamically at display time. We compute a Gain Map that represents the difference between the two renditions. In the file, we store a Base rendition (either SDR or HDR), the Gain Map, and some associated metadata. At display time, we combine the Base image with a scaled version of the Gain Map, where the scale factor depends on the image metadata, the HDR capacity of the display, and the viewing environment.
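A minimal sketch of the display-time blend described above, under assumed names and an assumed weighting rule (linear in stops of headroom); the exact metadata fields and math used by the authors may differ.

```python
# Illustrative gain-map blend: combine a base rendition with a scaled gain map.
import numpy as np

def display_render(base, gain_map_log2, hdr_headroom_stops, map_max_stops):
    """base: SDR rendition (linear RGB); gain_map_log2: per-pixel log2 gain.

    hdr_headroom_stops: stops of headroom above SDR white on the target display.
    map_max_stops: largest gain encoded in the map (assumed to come from metadata).
    """
    weight = np.clip(hdr_headroom_stops / map_max_stops, 0.0, 1.0)
    return base * np.exp2(weight * gain_map_log2)     # SDR at weight=0, full HDR at 1

base = np.random.default_rng(0).random((4, 4, 3))
gain = np.full((4, 4, 1), 2.0)                        # +2 stops where HDR allows
out_sdr = display_render(base, gain, 0.0, 2.0)        # reproduces the base rendition
out_hdr = display_render(base, gain, 2.0, 2.0)        # full HDR rendition
```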
Eric Chan, Fellow, Adobe Inc. (United States)
Eric Chan is a Fellow at Adobe, where he develops software for editing photographs. Current projects include Photoshop, Lightroom, Camera Raw, and Digital Negative (DNG). When not writing software, Chan enjoys spending time at his other keyboard, the piano. He is an enthusiastic nature photographer and often combines his photo activities with travel and hiking.
Paul M. Hubel, director of Image Quality in Software Engineering, Apple Inc. (United States)
Paul M. Hubel is director of Image Quality in Software Engineering at Apple. He has worked on computational photography and the image quality of photographic systems for many years, across all aspects of the imaging chain, particularly for iPhone. He trained in optical engineering at the University of Rochester, Oxford University, and MIT, and has more than 50 patents on color imaging and camera technology. Hubel is active on the ISO TC42 Digital Photography committee, where this work is under discussion, and is currently a VP on the IS&T Board. Outside work he enjoys photography, travel, cycling, and coffee roasting, and plays trumpet in several Bay Area ensembles.
3:00 – 3:30 PM Coffee Break
Image Processing (T3)
Session Chairs:
Jon McElvain, Dolby Laboratories (United States) and Nitin Sampat, Edmund Optics, Inc (United States)
3:30 – 4:50 PM
Powell I/II
3:30 ISS-335
Panoramic Photoacoustic Computed Tomography (PACT): From small-animal wholebody imaging to human breast cancer diagnosis (Invited), Lei Li, Rice University (United States) [view abstract]
We recently developed a dream machine, demonstrating that a stand-alone single-impulse panoramic photoacoustic computed tomography (SIP-PACT) achieves high spatiotemporal resolution, deep penetration, anatomical and functional contrasts, and full-view fidelity. SIP-PACT has imaged in vivo whole-body dynamics of small animals in real time, mapped whole-brain functional connectivity, and tracked circulating tumor cells without labeling. Next, we scaled up SIP-PACT and developed the 2nd-generation panoramic PACT, termed single-breath-hold PACT (SBH-PACT), to reveal detailed angiographic structures in human breasts. By scanning the entire breast (4 cm in depth) within a single breath hold (~15 s), a volumetric image can be acquired with negligible breathing-induced motion artifacts. SBH-PACT clearly reveals tumors by observing the higher blood vessel densities associated with tumors, showing early promise for high sensitivity in radiographically dense breasts. The high imaging speed enables photoacoustic elastography in the breast, identifying tumors by showing less compliance. We imaged breast cancer patients with breast sizes ranging from B cup to DD cup and skin pigmentations ranging from light to dark. Panoramic PACT provides a promising tool for future clinical use, including screening and diagnostic studies to determine the extent of disease, to assist in surgical treatment planning, and to assess responses to neoadjuvant chemotherapy.
3:50 ISS-336
Array camera image fusion using physics-aware transformers (JIST-first), Qian Huang, Minghao Hu, and David J. Brady, The University of Arizona (United States) [view abstract]
We demonstrate a physics-aware transformer for feature-based data fusion from cameras with diverse resolution, color spaces, focal planes, focal lengths, and exposure. We also demonstrate a scalable solution for synthetic training data generation for the transformer using open-source computer graphics software. We demonstrate image synthesis on arrays with diverse spectral responses, instantaneous field of view and frame rate.
4:10 ISS-337
Self-supervised intensity-event stereo matching (JIST-first), Jinjin Gu1, Jinan Zhou2, Ringo S. Chu3, Yan Chen4, Jiawei Zhang4, Xuanye Cheng4, Song Zhang4, and Jimmy S. Ren3,5,6; 1The University of Sydney (Australia), 2The Chinese University of Hong Kong (Hong Kong), 3Sensetime Research HK (Hong Kong), 4SenseTime Research (China), 5Qing Yuan Research Institute (Hong Kong), and 6Shanghai Jiao Tong University (China) [view abstract]
Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes with microsecond accuracy, a high dynamic range, and low power consumption. Despite these advantages, event cameras cannot be directly applied to computational imaging tasks due to the inability to obtain high-quality intensity and events simultaneously. This paper aims to connect a standalone event camera and a modern intensity camera so that applications can take advantage of both sensors. We establish this connection through a multi-modal stereo matching task. We first convert events to a reconstructed image and extend the existing stereo networks to this multi-modality condition. We propose a self-supervised method to train the multi-modal stereo network without using ground truth disparity data. The structure loss calculated on image gradients is used to enable self-supervised learning on such multi-modal data. Exploiting the internal stereo constraint between views with different modalities, we introduce general stereo loss functions, including a disparity cross-consistency loss and an internal disparity loss, leading to improved performance and robustness compared to existing approaches. The experiments demonstrate the effectiveness of the proposed method, especially the proposed general stereo loss functions, on both synthetic and real datasets. Finally, we shed light on employing the aligned events and intensity images in downstream tasks, e.g., video interpolation applications.
4:30 ISS-340
Improvement of a facial recognition system based on one shot camera, Médégnonmi E. Houssou1,2, Amadou Tidjani Sanda Mahama1,2, Pierre Gouton1, and Guy Degla2; 1University of Burgundy (France) and 2University of Abomey-Calavi (Benin) [view abstract]
In recent years, one-shot cameras that integrate Multispectral Filter Arrays (MSFA) have been used to acquire multispectral images. In a previous paper, we proposed a multispectral image recognition system based on this type of camera. The images acquired with these cameras are then demosaiced. Multispectral facial images acquired with our MSFA one-shot camera present information redundancy, which leads to a strong correlation between bands. Dimensionality reduction is necessary to reduce this redundancy. Dimensionality reduction is a set of techniques that project an initial image of dimension n into a final image of dimension p while preserving its relevant information. This paper proposes an improvement of the facial recognition system using the MSFA one-shot camera: a dimensionality reduction module has been added to the system. A comparison of the performance of different dimensionality reduction methods, based on the eigenvalues and on VGG19 classification results, is conducted, as shown in the sketch after this abstract. Experimental results on the EXIST database, built with our camera, indicate good decorrelation of the bands, allowing a reduction from eight bands to three with the Karhunen-Loève transform, an accuracy of 100% with VGG19, and a 15% gain in processing time.
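For context, a rough sketch of the Karhunen-Loève (PCA) band reduction mentioned above, applied here to a synthetic 8-band cube rather than the EXIST data.

```python
# Reduce an 8-band multispectral image to its 3 principal components.
import numpy as np

rng = np.random.default_rng(0)
h, w, bands = 64, 64, 8
cube = rng.random((h, w, bands))              # stand-in multispectral image

x = cube.reshape(-1, bands)
x = x - x.mean(axis=0)                        # center each band
cov = np.cov(x, rowvar=False)                 # 8x8 band covariance
eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
top3 = eigvecs[:, ::-1][:, :3]                # three strongest components
reduced = (x @ top3).reshape(h, w, 3)         # decorrelated 3-band image
```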
5:30 – 7:00 PM EI 2023 Symposium Demonstration Session (in the Cyril Magnin Foyer)
Wednesday 18 January 2023
KEYNOTE: Processing at the Edge (W1)
Session Chairs: Stanley Chan, Purdue University (United States) and Boyd Fowler, OmniVision Technologies (United States)
8:45 – 10:20 AM
Market Street
This session is jointly sponsored by: Computational Imaging XXI, Imaging Sensors and Systems 2023, and the International Image Sensor Society (IISS).
8:45
COIMG/ISS Joint Sessions Welcome
8:50 COIMG-177
KEYNOTE: Deep optics: Learning cameras and optical computing systems, Gordon Wetzstein, Stanford University (United States) [view abstract]
Gordon Wetzstein is an Associate Professor of Electrical Engineering and, by courtesy, of Computer Science at Stanford University. He is the leader of the Stanford Computational Imaging Lab and a faculty co-director of the Stanford Center for Image Systems Engineering. At the intersection of computer graphics and vision, artificial intelligence, computational optics, and applied vision science, Prof. Wetzstein's research has a wide range of applications in next-generation imaging, wearable computing, and neural rendering systems. Prof. Wetzstein is a Fellow of Optica and the recipient of numerous awards, including an NSF CAREER Award, an Alfred P. Sloan Fellowship, an ACM SIGGRAPH Significant New Researcher Award, a Presidential Early Career Award for Scientists and Engineers (PECASE), an SPIE Early Career Achievement Award, an Electronic Imaging Scientist of the Year Award, an Alain Fournier Ph.D. Dissertation Award as well as many Best Paper and Demo Awards.
Neural networks excel at a wide variety of imaging and perception tasks, but their high performance also comes at a high computational cost and their success on edge devices is often limited. In this talk, we explore hybrid optical-electronic strategies to computational imaging that outsource parts of the algorithm into the optical domain or into emerging in-pixel processing capabilities. Using such a co-design of optics, electronics, and image processing, we can learn application-domain-specific cameras using modern artificial intelligence techniques or compute parts of a convolutional neural network in optics with little to no computational overhead. For the session: Processing at the Edge (joint with ISS).
9:40 COIMG-178
Computational photography on a smartphone, Michael Polley, Samsung Research America (United States) [view abstract]
Many of the recent advances in smartphone camera quality and features can be attributed to computational photography. However, the increased computational requirements must be balanced with cost, power, and other practical concerns. In this talk, we look at the embedded signal processing currently applied, including new AI-based solutions in the signal chain. By taking advantage of increasing computational performances of traditional processor cores, and additionally tapping into the exponentially increasing capabilities of the new compute engines such as neural processing units, we are able to deliver on-device computational imaging. For the session: Processing at the Edge (joint with ISS).
10:00 COIMG-179
Analog in-memory computing with multilevel RRAM for edge electronic imaging application, Glenn Ge, TetraMem Inc. (United States) [view abstract]
Conventional digital processors based on the von Neumann architecture have an intrinsic bottleneck in data transfer between processing and memory units. This constraint increasingly limits performance as data sets continue to grow exponentially for various applications, especially electronic imaging applications at the edge, for instance, AR/VR wearables and automotive applications. TetraMem addresses this issue by delivering state-of-the-art in-memory computing using our proprietary non-volatile computing devices. This talk will discuss how TetraMem's solution brings several orders of magnitude improvement in computing throughput and energy efficiency, ideal for AI fusion sensing applications at the edge. For the session: Processing at the Edge (joint with ISS).
10:00 AM – 3:30 PM Industry Exhibition - Wednesday (in the Cyril Magnin Foyer)
10:20 – 10:50 AM Coffee Break
Processing at the Edge (W2.1)
Session Chairs:
Stanley Chan, Purdue University (United States) and Boyd Fowler, OmniVision Technologies (United States)
10:50 – 11:50 AM
Market Street
This session is jointly sponsored by: Computational Imaging XXI, Imaging Sensors and Systems 2023, and the International Image Sensor Society (IISS).
10:50 COIMG-180
Processing of real time, bursty and high compute iToF data on the edge (Invited), Cyrus Bamji, Microsoft Corporation (United States) [view abstract]
In indirect time of flight (iToF), a depth frame is computed from multiple image captures (often 6-9 captures) that are composed together and processed using nonlinear filters. iToF sensor output bandwidth is high, and special-purpose DSP hardware inside the camera significantly improves power, cost, and the shuffling around of large amounts of data. Usually only a small percentage of depth frames need application-specific processing and the highest-quality depth data, both of which are difficult to compute within the camera's limited hardware resources. Due to the sporadic nature of these compute requirements, hardware utilization is improved by offloading this bursty compute to outside the camera. Many applications in the industrial and commercial space have real-time requirements and may even use multiple cameras that need to be synchronized. These real-time requirements, coupled with the high bandwidth from the sensor, make offloading the compute purely into the cloud difficult. Thus, in many cases the compute edge can provide a Goldilocks zone for this bursty, high-bandwidth, real-time processing requirement. For the session: Processing at the Edge (joint with ISS).
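For readers unfamiliar with iToF, a sketch of the standard four-phase depth computation (a common textbook formulation, not necessarily the pipeline discussed in the talk); the modulation frequency below is an assumed example value.

```python
# Four-phase iToF: correlation samples at 0/90/180/270 degrees give phase,
# which maps to distance within the unambiguous range.
import numpy as np

C = 299_792_458.0      # speed of light, m/s
F_MOD = 100e6          # assumed modulation frequency, Hz

def depth_from_phases(a0, a90, a180, a270):
    """a*: per-pixel correlation samples (arrays) at the four phase offsets."""
    phase = np.arctan2(a270 - a90, a0 - a180)     # [-pi, pi]
    phase = np.mod(phase, 2 * np.pi)              # wrap to [0, 2*pi)
    return C * phase / (4 * np.pi * F_MOD)        # unambiguous range: C / (2 * F_MOD)
```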
11:10 COIMG-181
A distributed on-sensor compute system in AR/VR devices and neural architecture search (NAS) framework for optimal workload distribution (Invited), Chiao Liu1, Xin Dong2, Ziyun Li1, Barbara De Salvo3, and H. T. Kung2; 1Reality Labs, 2Harvard University, and 3Meta (United States) [view abstract]
Augmented Reality (AR) will be the next great wave of human-oriented computing, dominating our relationship with the digital world for the next 50 years. The combined requirements of lowest power, best performance, and minimal form factor make AR sensors the new frontier. Previously we presented a digital pixel sensor (DPS) that could be the optimal sensor architecture for AR applications. We further presented a distributed on-sensor compute architecture, coupled with new 3-layer sensor-stacking technologies, to enable the system to distribute computation between the sensors and the main SoC in an AR system. In this talk, we study a deep neural network (DNN) as a workload example and the network's optimal splitting-layer location to meet system performance requirements such as inference accuracy and latency under a given hardware resource constraint. We designed a split-aware neural architecture search (NAS) framework, SplitNets, to conduct model design, splitting, and communication reduction simultaneously. We validated SplitNets on ImageNet and show that the SplitNets framework achieves state-of-the-art (SOTA) performance and system latency compared with existing approaches. For the session: Processing at the Edge (joint with ISS).
11:30 ISS-182
A 2.2um three-wafer stacked back side illuminated voltage domain global shutter CMOS image sensor, Shimpei Fukuoka, OmniVision (Japan) [view abstract]
Due to the emergence of machine vision, augmented reality (AR), virtual reality (VR), and automotive connectivity in recent years, the necessity for chip miniaturization has grown. These emerging, next-generation applications, which are centered on user experience and comfort, require their constituent chips, devices, and parts to be smaller, lighter, and more accessible. AR/VR applications especially demand smaller components due to their primary use in wearable technology, where the user experience would be negatively impacted by large features and bulk. Therefore, chips and devices intended for next-generation consumer applications must be small and modular, to support module miniaturization and promote user comfort. To enable the chip miniaturization required for technological advancement and innovation, we developed a 2.2μm-pixel-pitch Back Side Illuminated (BSI) Voltage Domain Global Shutter (VDGS) image sensor with three-wafer stacked technology. The wafers are connected by a Stacked Pixel Level Connection (SPLC), and the middle and logic wafers are connected using a Backside Through Silicon Via (BTSV). Separating the sensing, charge storage, and logic functions onto different wafers allows process optimization in each wafer, improving overall chip performance. The peripheral circuit region is reduced by 75% compared to the previous product without degrading image sensor performance. For the session: Processing at the Edge (joint with COIMG).
12:30 – 2:00 PM Lunch
Wednesday 18 January PLENARY: Bringing Vision Science to Electronic Imaging: The Pyramid of Visibility
Session Chair: Andreas Savakis, Rochester Institute of Technology (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Electronic imaging depends fundamentally on the capabilities and limitations of human vision. The challenge for the vision scientist is to describe these limitations to the engineer in a comprehensive, computable, and elegant formulation. Primary among these limitations are visibility of variations in light intensity over space and time, of variations in color over space and time, and of all of these patterns with position in the visual field. Lastly, we must describe how all these sensitivities vary with adapting light level. We have recently developed a structural description of human visual sensitivity that we call the Pyramid of Visibility, that accomplishes this synthesis. This talk shows how this structure accommodates all the dimensions described above, and how it can be used to solve a wide variety of problems in display engineering.
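A hedged paraphrase of the Pyramid of Visibility's core approximation, as this editor understands it from Watson and Ahumada's published formulation: away from its peak, log contrast sensitivity falls off roughly linearly in spatial and temporal frequency and rises with log adapting luminance.

```latex
% Approximate Pyramid of Visibility model (notation assumed here):
% S = contrast sensitivity, w = spatial frequency, t = temporal frequency,
% L = adapting luminance; c_w, c_t < 0 and c_l > 0 are fitted constants.
\log_{10} S(w, t, L) \approx c_0 + c_w\, w + c_t\, t + c_l \log_{10} L
```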
Andrew B. Watson, chief vision scientist, Apple Inc. (United States)
Andrew Watson is Chief Vision Scientist at Apple, where he leads the application of vision science to technologies, applications, and displays. His research focuses on computational models of early vision. He is the author of more than 100 scientific papers and 8 patents. He has 21,180 citations and an h-index of 63. Watson founded the Journal of Vision, and served as editor-in-chief 2001-2013 and 2018-2022. Watson has received numerous awards including the Presidential Rank Award from the President of the United States.
3:00 – 3:30 PM Coffee Break
KEYNOTE: Sensor Design II (W3)
Session Chairs: Min-Woong Seo, Samsung Electronics (Republic of Korea) and Hari Tagat, Casix (United States)
3:30 – 5:10 PM
Powell I/II
3:30 ISS-341
KEYNOTE: Event camera noise and denoising, Tobi Delbrück, Institute of Neuroinformatics, University of Zurich and ETH Zurich (Switzerland) [view abstract]
Tobi Delbrück (IEEE M'99, SM'06, F'13) received his BSc in physics from the University of California in 1986 and his PhD from Caltech in 1993 as the first student in the Computation and Neural Systems program, with PhD supervisor Carver Mead. He is an ETH Honorary Professor of Physics and Electrical Engineering, and has been with the Institute of Neuroinformatics, University of Zurich and ETH Zurich, since 1998. The Sensors Group that he co-directs together with Prof. Shih-Chii Liu works on a broad range of topics covering device physics to computer vision and control, with a theme of efficient neuromorphic processing in hardware. He co-organizes the Telluride Neuromorphic Engineering workshop and has organized live demonstration sessions at ISCAS, NeurIPS, and AICAS, and two conference sessions at ISCAS. Delbrück is past Chair of the IEEE CAS Sensory Systems Technical Committee. He worked on electronic imaging at Arithmos, Synaptics, National Semiconductor, and Foveon and has co-founded 3 companies (Inilabs, Insightness, and Inivation). His papers have been awarded 13 IEEE awards, and he was named a Fellow of the IEEE Circuits and Systems Society for his work on neuromorphic sensors and processing. He likes to read storybooks, play tennis, and sometimes tries card magic.
Event cameras, like Dynamic Vision Sensors, mimic biology’s eyes. They output sparse, quick events rather than regular Nyquist samples, enabling vision systems that can respond quickly at low average power consumption, so that they can beat the usual power-latency tradeoff of frame-based vision. They have significant amounts of noise. How this noise arises and how to remove the noise without removing the signal is an interesting subject on which we have focused our research.
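As background, a minimal version of the classic spatiotemporal-correlation ("background activity") filter that is widely used as a DVS denoising baseline; this is a generic sketch, not the specific algorithms covered in the keynote.

```python
# Keep an event only if a pixel in its 3x3 neighborhood fired recently.
import numpy as np

def filter_events(events, width, height, dt_us=10_000):
    """events: time-ordered iterable of (t_us, x, y, polarity) tuples."""
    last_ts = np.full((height + 2, width + 2), -np.inf)   # padded timestamp map
    kept = []
    for t, x, y, p in events:
        neighborhood = last_ts[y:y + 3, x:x + 3]           # 3x3 support around (x, y)
        if (t - neighborhood).min() <= dt_us:              # recent nearby activity?
            kept.append((t, x, y, p))
        last_ts[y + 1, x + 1] = t                          # update this pixel's timestamp
    return kept
```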
4:10 ISS-342
Quantum efficiency of various miniaturized backside illuminated CMOS pixels under ultraviolet illumination, Nour Fassi1,2, Jean-Pierre Carrère1, Pierre Magnan2, Magali Estribeau2, and Vincent Goiffon2; 1STMicroelectronics and 2ISAE-SUPAERO (France) [view abstract]
Up to now, backside-illuminated (BSI) miniaturized CMOS pixels have been used and manufactured for visible (Vis) and/or near-infrared (NIR) light. This work focuses on the performance under UV light, in the range of [200 nm, 400 nm], of such BSI miniaturized CMOS pixels, initially developed with the Vis and NIR spectrum in mind, which have been understudied until now. This performance evaluation is based on quantum efficiency (QE) measurements of various pixel types to examine how the good signal-to-noise ratio (SNR) in the Vis is modified in the UV spectrum. The pixels measured in this campaign are all miniaturized backside-illuminated (BSI) CMOS because they have better light-to-charge conversion than their frontside-illuminated counterparts: the architecture offers the advantage of direct access to the Si substrate and a variety of possible thinner antireflection coating (ARC) stacks, thanks to the evolution of CMOS BSI passivation techniques. Optical simulations have been performed to identify the key parameters that could play a role in improving the pixel's response in the UV. Despite the lack of any process optimization, we observe a significant response of the sensor under UV.
4:30 ISS-343
Color performance of 0.8 um CMOS image sensor with CMY color filters, An-Li Kuo, Pohsiang Wang, Hao-Wei Liu, William Tsai, Chia-Ning Hsu, Chien-Wen Lai, Yu C. Chang, Ching-Chiang Wu, and Ken Wu, VisEra Technologies (Taiwan) [view abstract]
Modern digital cameras capture images with a subsampling method via a mosaic color filter array (CFA). Of particular interest is the CFA with Cyan-Magenta-Yellow (CMY) color filters. Despite the improvement in sensitivity, images reconstructed from a CMY CFA usually suffer from lower color fidelity compared to conventional Red-Green-Blue (RGB) color filters. In this paper, we propose a CMY CFA with novel spectral sensitivities (CMY2.0), carefully designed to overcome the shortcomings of the previous CMY CFA (CMY1.0) [1]. A CMY CMOS image sensor (CIS) with these optimized spectral sensitivities is then realized in order to evaluate its color performance and signal-to-noise ratio (SNR). As a result, the camera equipped with the CMY CFA with the proposed spectral sensitivities (CMY2.0) features both improved sensitivity and high color fidelity, which is suitable for a wide range of applications, such as low-light photography, under-screen cameras, and automotive cameras.
4:50 ISS-344
Reset noise reduction method in 3-T pixels, Kaitlin M. Anagnost, Xin Yue, and Eric R. Fossum, Dartmouth College (United States) [view abstract]
A reset noise reduction method using a feedback amplifier that results in an 80% noise reduction in 3-transistor (3-T) pixels is presented. 3-T pixels are useful for non-visible imaging applications because they have fewer post-processing issues than 4-T pixels and do not require charge transfer. They suffer from reset noise because correlated-double sampling cannot be realized without additional memory. Analysis of the experimental power spectral density indicates potential for further noise cancellation in future devices.
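For context, a back-of-the-envelope kTC (reset) noise calculation; the 80% reduction reported above comes from the authors' feedback amplifier, which this simple formula does not model, and the 2 fF capacitance below is only an example value.

```python
# kTC reset noise of a floating diffusion: v_rms = sqrt(kT/C), or sqrt(kTC)/q electrons.
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # electron charge, C

def ktc_noise(c_fd_farads, temp_k=300.0):
    v_rms = math.sqrt(K_B * temp_k / c_fd_farads)              # RMS reset noise voltage
    electrons_rms = math.sqrt(K_B * temp_k * c_fd_farads) / Q_E
    return v_rms, electrons_rms

v, e = ktc_noise(2e-15)                                        # example: 2 fF at 300 K
print(f"{v*1e3:.2f} mV RMS, {e:.1f} e- RMS")                   # roughly 1.4 mV and ~18 e-
```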
Imaging Sensors and Systems 2023 Interactive (Poster) Paper Session
5:30 – 7:00 PM
Cyril Magnin Foyer
The following works will be presented at the EI 2023 Symposium Interactive (Poster) Paper Session.
ISS-345
DevCAM: An open-source multi-camera development system for embedded vision, Meher Akhil Birlangi, Dominique E. Meyer, and Falko Kuester, University of California, San Diego (United States) [view abstract]
Computer vision algorithms are often burdensome to implement on embedded hardware due to integration time and system complexity. Many commercial systems prevent low-level image processing customization and hardware optimization due to the largely proprietary nature of the algorithms and architectures, hindering research development by the larger community. This work presents DevCAM, an open-source multi-camera environment targeted at hardware-software research for vision systems, specifically for co-located multi-sensor processor systems. The objective is to facilitate the integration of multiple latest-generation sensors, abstract away the difficulties of interfacing with high-bandwidth sensors, enable user-defined hybrid processing architectures on FPGA, CPU, and GPU, and unite multi-module systems with networking and high-speed storage. The system architecture can accommodate up to six 4-lane MIPI sensor modules that are electronically synchronized, alongside support for an RTK-GPS receiver and a 9-axis IMU. We demonstrate a number of available configurations that can be achieved for stereo, quadnocular, 360, and light-field image acquisition tasks. The development framework includes mechanical, PCB, FPGA, and software components for rapid integration into any system. System capabilities are demonstrated with a focus on opening new research frontiers such as distributed edge processing, inter-system synchronization, sensor synchronization, and hybrid hardware acceleration of image processing tasks.
ISS-346
Development of DVS evaluation methods from user perspective, Raeyoung Kim, Jun-seok Kim, Junhyuk Park, Paul K.J. Park, Jaeha Park, Chunghwan Park, Inchun Lim, Seongwook Song, and Juhyun Ko, Samsung (Republic of Korea) [view abstract]
We report measurement methods and metrics for the evaluation of dynamic vision sensor (DVS) pixels. In particular, we developed automated test environments and test metrics that can quantify the sensitivity, latency, and background noise of DVS pixels. For sensitivity measurements, response probabilities of pixels were analyzed under various conditions, such as base light intensity and the region of interest of a sensor. Pixel latency was measured by varying the duty cycle of a light pulse, and noise levels were also characterized at different light intensities. We expect the developed methods and metrics can help clarify the performance of DVS pixels from the user's point of view.
ISS-347
Implementation of EMVA 1288 Standard Release 4.0 for Characterization of Image Sensors, Megan E. Borek, Imatest, LLC (United States) [view abstract]
The EMVA 1288 Standard offers a unified method for the objective measurement and analysis of specification parameters for image sensors, particularly those used in the computer vision industry. Models for both linear and non-linear sensor responses are presented in the version 4.0 release of the standard, and are applied in the characterization of a commercial DSLR camera sensor. From image capture to analysis, this paper details the equipment, methodologies, and analyses used in the implementation of the latest standard in a controlled lab setting, serving as both a proof of concept and an evaluation of the presentation and comprehensibility of the standard from a user perspective. Measurements and analyses are made to quantify linearity, sensitivity, noise, nonuniformity, and dark current of the chosen sensor, according to the methods laid out in the EMVA 1288 standard. This paper details the realistic implementation of these processes in a controlled lab environment and discusses potential flaws and difficulties in the standard, as well as complications introduced by nonideal experimental variables.
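A simplified photon-transfer computation in the spirit of the EMVA 1288 linear model: the system gain is the slope of dark-corrected temporal variance versus dark-corrected mean signal. Synthetic numbers stand in for real flat-field captures, and the full standard prescribes many more measurements than this sketch shows.

```python
# Estimate system gain K (DN per electron) from the photon-transfer curve.
import numpy as np

K_TRUE, DARK_MEAN, READ_VAR = 0.25, 100.0, 4.0        # DN/e-, DN, DN^2 (synthetic)

exposures = np.linspace(0.1, 1.0, 10)
mean_e = 20_000 * exposures                           # mean photo-electrons
mu_y = DARK_MEAN + K_TRUE * mean_e                    # mean signal in DN
var_y = READ_VAR + K_TRUE**2 * mean_e                 # shot + read noise in DN^2

mu_corr = mu_y - DARK_MEAN                            # subtract dark mean
var_corr = var_y - READ_VAR                           # subtract dark variance
K_est = np.polyfit(mu_corr, var_corr, 1)[0]           # slope of the photon-transfer curve
signal_e = mu_corr[-1] / K_est                        # electrons at the brightest exposure
```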
ISS-348
On quantization of convolutional neural networks for image signal processor, Youngil Seo, Dongpan Lim, Jeongguk Lee, and Seongwook Song, Samsung Electronics (Republic of Korea) [view abstract]
Recently, many deep learning applications have been deployed on mobile platforms. To deploy them on a mobile platform, the networks should be quantized. The quantization of computer vision networks has been studied well, but there have been few studies on the quantization of image restoration networks. In this paper, we study the effect of activation quantization on image quality for a deep learning network, following a previous study on weight quantization. The study addresses quantization for raw RGBW image demosaicing of 10-bit images while fixing the weight bit-depth at 8 bits. Experimental results show that 11-bit activation quantization can sustain image quality at a level similar to the floating-point network. Although activation bit-depth can be very small in computer vision applications, image restoration tasks like demosaicing require many more bits. An 11-bit depth may not fit general-purpose hardware such as NPUs, GPUs, or CPUs, but for custom hardware it is very important to reduce hardware area, power, and memory size.
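An illustrative uniform (affine) quantize-dequantize of activations to a given bit-depth, the basic operation whose bit-width (for example, 8 versus 11 bits) such a study sweeps; the authors' exact quantization scheme may differ.

```python
# Uniform activation quantization: error shrinks roughly with the step size.
import numpy as np

def quantize_dequantize(x, n_bits):
    qmax = 2 ** n_bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, qmax)   # integer codes
    return q * scale + lo                              # dequantized approximation

acts = np.random.default_rng(0).random((1, 64, 64, 16)).astype(np.float32)
err_8 = np.abs(quantize_dequantize(acts, 8) - acts).mean()
err_11 = np.abs(quantize_dequantize(acts, 11) - acts).mean()   # ~8x smaller error
```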
5:30 – 7:00 PM EI 2023 Symposium Interactive (Poster) Paper Session (in the Cyril Magnin Foyer)
5:30 – 7:00 PM EI 2023 Meet the Future: A Showcase of Student and Young Professionals Research (in the Cyril Magnin Foyer)