Human Vision and Electronic Imaging 2023
Monday 16 January 2023
10:20 – 10:50 AM Coffee Break
12:30 – 2:00 PM Lunch
Monday 16 January PLENARY: Neural Operators for Solving PDEs
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Deep learning surrogate models have shown promise in modeling complex physical phenomena such as fluid flows, molecular dynamics, and material properties. However, standard neural networks assume finite-dimensional inputs and outputs, and hence cannot withstand a change in resolution or discretization between training and testing. We introduce Fourier neural operators, which learn operators: mappings between infinite-dimensional function spaces. They are independent of the resolution or grid of the training data and allow zero-shot generalization to higher-resolution evaluations. When applied to weather forecasting, neural operators capture fine-scale phenomena and achieve skill comparable to gold-standard numerical weather models for predictions up to a week or longer, while being four to five orders of magnitude faster.
Anima Anandkumar, Bren professor, California Institute of Technology, and senior director of AI Research, NVIDIA Corporation (United States)
Anima Anandkumar is a Bren Professor at Caltech and Senior Director of AI Research at NVIDIA. She is passionate about designing principled AI algorithms and applying them to interdisciplinary domains. She has received several honors, including an IEEE Fellowship, an Alfred P. Sloan Fellowship, an NSF CAREER Award, and faculty fellowships from Microsoft, Google, Facebook, and Adobe. She is part of the World Economic Forum's Expert Network. Anandkumar received her BTech from the Indian Institute of Technology Madras and her PhD from Cornell University, completed postdoctoral research at MIT, and served as an assistant professor at the University of California, Irvine.
3:00 – 3:30 PM Coffee Break
EI 2023 Highlights Session
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
3:30 – 5:00 PM
Cyril Magnin II
Join us for a session that celebrates the breadth of what EI has to offer with short papers selected from EI conferences.
NOTE: The EI-wide "EI 2023 Highlights" session is concurrent with Monday afternoon COIMG, COLOR, IMAGE, and IQSP conference sessions.
IQSP-309
Evaluation of image quality metrics designed for DRI tasks with automotive cameras, Valentine Klein, Yiqi LI, Claudio Greco, Laurent Chanas, and Frédéric Guichard, DXOMARK (France) [view abstract]
Driving assistance is increasingly used in new car models. Most driving assistance systems are based on automotive cameras and computer vision. Computer vision, regardless of the underlying algorithms and technology, requires images of good quality, where "good" is defined according to the task. This notion of good image quality is still to be defined for computer vision, as its criteria differ markedly from those of human vision: humans, for instance, have better contrast detection ability than imaging chains. The aim of this article is to compare three metrics designed for object detection with computer vision: the Contrast Detection Probability (CDP) [1, 2, 3, 4], the Contrast Signal to Noise Ratio (CSNR) [5], and the Frequency of Correct Resolution (FCR) [6]. For this purpose, the computer vision task of reading the characters on a license plate is used as a benchmark. The objective is to check the correlation between each objective metric and the ability of a neural network to perform this task. A protocol to test these metrics and compare them to the output of the neural network has been designed, and the pros and cons of each of the three metrics are noted.
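As an illustration of the contrast-versus-noise trade-off that such metrics formalize, a generic contrast-to-noise computation can be sketched as follows. This is a minimal illustration only; it is not the exact CDP, CSNR, or FCR definition from the cited references, and all names and values are hypothetical.

```python
import numpy as np

def contrast_to_noise(roi_object, roi_background):
    """Generic contrast-to-noise ratio between an object patch and its
    background: absolute mean difference divided by the pooled noise std."""
    mu_obj = roi_object.mean()
    mu_bg = roi_background.mean()
    sigma = np.sqrt(0.5 * (roi_object.var() + roi_background.var()))
    return abs(mu_obj - mu_bg) / sigma if sigma > 0 else float("inf")

# Hypothetical example: a bright plate-character patch vs. a darker surround
rng = np.random.default_rng(0)
obj = 0.7 + 0.05 * rng.standard_normal((32, 32))
bg = 0.3 + 0.05 * rng.standard_normal((32, 32))
print(contrast_to_noise(obj, bg))  # roughly 8 for these patches
```

A detection-oriented metric would then map such a ratio to a task-relevant quantity, e.g., a probability that a detector resolves the character.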
SD&A-224
Human performance using stereo 3D in a helmet mounted display and association with individual stereo acuity, Bonnie Posselt, RAF Centre of Aviation Medicine (United Kingdom) [view abstract]
Binocular Helmet Mounted Displays (HMDs) are a critical part of the aircraft system, allowing information to be presented to the aviator with stereoscopic 3D (S3D) depth, potentially enhancing situational awareness and improving performance. The utility of S3D in an HMD may be linked to an individual's ability to perceive changes in binocular disparity (stereo acuity). Though minimum stereo acuity standards exist for most military aviators, current test methods may be unable to characterise this relationship. This presentation will investigate the effect of S3D on performance when used in a warning alert displayed in an HMD. Furthermore, any effects on performance, ocular symptoms, and cognitive workload will be evaluated with respect to individual stereo acuity measured with a variety of paper-based and digital stereo tests.
IMAGE-281
Smartphone-enabled point-of-care blood hemoglobin testing with color accuracy-assisted spectral learning, Sang Mok Park1, Yuhyun Ji1, Semin Kwon1, Andrew R. O’Brien2, Ying Wang2, and Young L. Kim1; 1Purdue University and 2Indiana University School of Medicine (United States) [view abstract]
We develop an mHealth technology for noninvasively measuring blood Hgb levels in patients with sickle cell anemia, using the photos of peripheral tissue acquired by the built-in camera of a smartphone. As an easily accessible sensing site, the inner eyelid (i.e., palpebral conjunctiva) is used because of the relatively uniform microvasculature and the absence of skin pigments. Color correction (color reproduction) and spectral learning (spectral super-resolution spectroscopy) algorithms are integrated for accurate and precise mHealth blood Hgb testing. First, color correction using a color reference chart with multiple color patches extracts absolute color information of the inner eyelid, compensating for smartphone models, ambient light conditions, and data formats during photo acquisition. Second, spectral learning virtually transforms the smartphone camera into a hyperspectral imaging system, mathematically reconstructing high-resolution spectra from color-corrected eyelid images. Third, color correction and spectral learning algorithms are combined with a spectroscopic model for blood Hgb quantification among sickle cell patients. Importantly, single-shot photo acquisition of the inner eyelid using the color reference chart allows straightforward, real-time, and instantaneous reading of blood Hgb levels. Overall, our mHealth blood Hgb tests could potentially be scalable, robust, and sustainable in resource-limited and homecare settings.
AVM-118
Designing scenes to quantify the performance of automotive perception systems, Zhenyi Liu1, Devesh Shah2, Alireza Rahimpour2, Joyce Farrell1, and Brian Wandell1; 1Stanford University and 2Ford Motor Company (United States) [view abstract]
We implemented an end-to-end simulation of camera-based perception systems used in automotive applications. The open-source software creates complex driving scenes and simulates the cameras that acquire images of those scenes. The camera images are then used by a neural network in the perception system to identify the locations of scene objects, and the results are provided as input to the decision system. In this paper, we design collections of test scenes that can be used to quantify the perception system's performance under a range of (a) environmental conditions (object distance, occlusion ratio, lighting levels) and (b) camera parameters (pixel size, lens type, color filter array). We are designing scene collections to analyze performance in detecting vehicles, traffic signs, and vulnerable road users across a range of environmental conditions and camera parameters. With experience, such scene collections may serve a role similar to that of the standardized test targets used to quantify camera image quality (e.g., acuity, color).
VDA-403
Visualizing and monitoring the process of injection molding, Christian A. Steinparz1, Thomas Mitterlehner2, Bernhard Praher2, Klaus Straka1,2, Holger Stitz1,3, and Marc Streit1,3; 1Johannes Kepler University, 2Moldsonics GmbH, and 3datavisyn GmbH (Austria) [view abstract]
In injection molding machines the molds are rarely equipped with sensor systems. The availability of non-invasive ultrasound-based in-mold sensors provides better means for guiding operators of injection molding machines throughout the production process. However, existing visualizations are mostly limited to plots of temperature and pressure over time. In this work, we present the result of a design study created in collaboration with domain experts. The resulting prototypical application uses real-world data taken from live ultrasound sensor measurements for injection molding cavities captured over multiple cycles during the injection process. Our contribution includes a definition of tasks for setting up and monitoring the machines during the process, and the corresponding web-based visual analysis tool addressing these tasks. The interface consists of a multi-view display with various levels of data aggregation that is updated live for newly streamed data of ongoing injection cycles.
COIMG-155
Commissioning the James Webb Space Telescope, Joseph M. Howard, NASA Goddard Space Flight Center (United States) [view abstract]
Astronomy is arguably in a golden age, where current and future NASA space telescopes are expected to contribute to this rapid growth in understanding of our universe. The most recent addition to our space-based telescopes dedicated to astronomy and astrophysics is the James Webb Space Telescope (JWST), which launched on 25 December 2021. This talk will discuss the first six months in space for JWST, which were spent commissioning the observatory with many deployments, alignments, and system and instrumentation checks. These engineering activities help verify the proper working of the telescope prior to commencing full science operations. For the session: Computational Imaging using Fourier Ptychography and Phase Retrieval.
HVEI-223
Critical flicker frequency (CFF) at high luminance levels, Alexandre Chapiro1, Nathan Matsuda1, Maliha Ashraf2, and Rafal Mantiuk3; 1Meta (United States), 2University of Liverpool (United Kingdom), and 3University of Cambridge (United Kingdom) [view abstract]
The critical flicker fusion (CFF) is the frequency of changes at which a temporally periodic light begins to appear completely steady to an observer. This value is affected by several visual factors, such as the luminance of the stimulus or its location on the retina. With new high dynamic range (HDR) displays operating at higher luminance levels, and virtual reality (VR) displays presenting at wide fields of view, the effective CFF may change significantly from the values expected for traditional presentation. In this work we use a prototype HDR VR display capable of luminances up to 20,000 cd/m^2 to gather a novel set of CFF measurements at previously unexamined levels of luminance, eccentricity, and size. Our data are useful for studying the temporal behavior of the visual system at high luminance levels, as well as for setting thresholds for display engineering.
HPCI-228
Physics guided machine learning for image-based material decomposition of tissues from simulated breast models with calcifications, Muralikrishnan Gopalakrishnan Meena1, Amir K. Ziabari1, Singanallur Venkatakrishnan1, Isaac R. Lyngaas1, Matthew R. Norman1, Balint Joo1, Thomas L. Beck1, Charles A. Bouman2, Anuj Kapadia1, and Xiao Wang1; 1Oak Ridge National Laboratory and 2Purdue University (United States) [view abstract]
Material decomposition of Computed Tomography (CT) scans using projection-based approaches, while highly accurate, poses a challenge for medical imaging researchers and clinicians due to limited or no access to projection data. We introduce a physics-guided, deep learning, image-based material decomposition method that requires no access to projection data. The method is demonstrated by decomposing tissues from simulated dual-energy X-ray CT scans of virtual human phantoms containing four materials: adipose, fibroglandular, calcification, and air. The method uses a hybrid unsupervised and supervised learning technique to tackle the material decomposition problem. We take advantage of the unique X-ray absorption rate of calcium compared to body tissues to perform a preliminary segmentation of calcification from the images using unsupervised learning. We then perform supervised material decomposition using a deep-learned U-Net model, trained using GPUs on the high-performance systems at the Oak Ridge Leadership Computing Facility. The method is demonstrated on simulated breast models to decompose calcification, adipose, fibroglandular, and air.
3DIA-104
Layered view synthesis for general images, Loïc Dehan, Wiebe Van Ranst, and Patrick Vandewalle, Katholieke University Leuven (Belgium) [view abstract]
We describe a novel method for monocular view synthesis. The goal of our work is to create a visually pleasing set of horizontally spaced views based on a single image. This can be applied in view synthesis for virtual reality and glasses-free 3D displays. Previous methods produce realistic results on images that show a clear distinction between a foreground object and the background. We aim to create novel views in more general, crowded scenes in which there is no clear distinction. Our main contribution is a computationally efficient method for realistic occlusion inpainting and blending, especially in complex scenes. Our method can be effectively applied to any image, which is shown both qualitatively and quantitatively on a large dataset of stereo images. Our method performs natural disocclusion inpainting and maintains the shape and edge quality of foreground objects.
ISS-329
A self-powered asynchronous image sensor with independent in-pixel harvesting and sensing operations, Ruben Gomez-Merchan, Juan Antonio Leñero-Bardallo, and Ángel Rodríguez-Vázquez, University of Seville (Spain) [view abstract]
A new self-powered asynchronous sensor with a novel pixel architecture is presented. Pixels are autonomous and can harvest energy or sense independently. During image acquisition, pixels toggle to a harvesting operation mode once they have sensed their local illumination level. With the proposed pixel architecture, the most illuminated pixels provide an early contribution to powering the sensor, while poorly illuminated ones spend more time sensing their local illumination. Thus, the equivalent frame rate is higher than that offered by conventional self-powered sensors, which harvest and sense illumination in independent phases. The proposed sensor uses a Time-to-First-Spike readout that allows trading off image quality against data and bandwidth consumption. The sensor offers HDR operation with a dynamic range of 80 dB. Pixel power consumption is only 70 pW. In the article, we describe the sensor and pixel architectures in detail. Experimental results are provided and discussed, and sensor specifications are benchmarked against the state of the art.
COLOR-184
Color blindness and modern board games, Alessandro Rizzi1 and Matteo Sassi2; 1Università degli Studi di Milano and 2consultant (Italy) [view abstract]
The board game industry is experiencing a strong renewed interest. In the last few years, about 4000 new board games have been designed and distributed each year. The gender balance among board game players is approaching parity, though men remain a slight majority. This means that (at least) around 10% of board game players are color blind. How does the board game industry deal with this? Awareness has recently begun to rise in board game design, but so far there is a big gap compared with, e.g., the computer game industry. This paper presents some data about the current situation, discussing exemplary cases of successful board games.
5:00 – 6:15 PM EI 2023 All-Conference Welcome Reception (in the Cyril Magnin Foyer)
Tuesday 17 January 2023
KEYNOTE: Perceptual Video Quality 1 (T1)
Session Chairs: Lukáš Krasula, Netflix, Inc. (United States) and Mohamed Chaker Larabi, Université de Poitiers (France)
9:05 – 10:10 AM
Cyril Magnin III
This session is jointly sponsored by: Human Vision and Electronic Imaging 2023, and Image Quality and System Performance XX.
Joint Conference Welcome
HVEI-258
KEYNOTE: Bringing joy to Netflix members through perceptual encoding optimization, Anne Aaron, Netflix, Inc. (United States) [view abstract]
As Director of Encoding Technologies, Anne Aaron leads the team responsible for media processing and encoding at Netflix. Her team works on video, audio, images, and timed-text, from analysis to processing, encoding, packaging, and DRM. On the streaming side, they strive to deliver a compelling viewing experience for millions of Netflix members worldwide, no matter where, how, and what they watch. For the Netflix studio, they build media technologies that can improve content production. In her previous role at Netflix, Aaron led the Video Algorithms team, which researched and deployed innovations in the video encoding space (per-title encoding, video quality assessment and perceptual metrics, shot-based encoding, HDR, next-generation codecs) that benefited Netflix members and influenced the rest of the industry. Recent recognitions include the SMPTE 2019 Workflow Systems Medal, Forbes' 2018 America's Top Women in Tech, and Business Insider's 2017 Most Powerful Female Engineers in US Tech.
Audio and video compression are immensely important to Netflix, as well as to internet service providers (ISPs). It has been estimated that our codec optimization efforts, together with the Open Connect program, saved ISPs over 1 billion dollars in 2021 alone. The keynote will discuss the importance of perceptual models and optimization for delivering hits such as Stranger Things, Squid Game, or Red Notice at the highest quality while being mindful of internet traffic. It will cover recent advances in audio and video encoding, innovations in the subjective and objective assessment of quality, and immediate and future challenges in this area.
10:00 AM – 7:30 PM Industry Exhibition - Tuesday (in the Cyril Magnin Foyer)
10:20 – 10:50 AM Coffee Break
Perceptual Video Quality 2 (T2)
Session Chairs: Lukáš Krasula, Netflix, Inc. (United States) and Mohamed Chaker Larabi, Université de Poitiers (France)
10:50 AM – 12:30 PM
Cyril Magnin III
This session is jointly sponsored by: Human Vision and Electronic Imaging 2023, and Image Quality and System Performance XX.
10:50 HVEI-259
Video quality of video professionals for Video Assisted Referee (VAR), Kjell Brunnström1,2, Anders Djupsjöbacka1, Johsan Billingham3, Katharina Wistel3, Börje Andrén1, Oskars Ozolins1,4, and Nicolas Evans3; 1RISE Research Institutes of Sweden AB (Sweden), 2Mid Sweden University (Sweden), 3Fédération Internationale de Football Association (FIFA) (Switzerland), and 4KTH (Royal Institute of Technology) (Sweden) [view abstract]
Changes in the footballing world's approach to technology and innovation contributed to the decision by the International Football Association Board (IFAB) to introduce Video Assistant Referees (VAR). The change meant that, under strict protocols, referees could use video replays to review decisions in the event of a "clear and obvious error" or a "serious missed incident". This created the need for the Fédération Internationale de Football Association (FIFA) to develop methods for quality control of VAR systems, which was done in collaboration with RISE Research Institutes of Sweden AB. One of the important aspects is video quality. The novelty of this study is that it specifically targets video experts: it measures the quality perceived by professionals whose main occupation is video production. An experiment was performed involving 25 video experts. In addition, six video quality models were benchmarked against the user data and evaluated to show which could best predict perceived quality for this application. The Video Quality Metric for variable frame delay (VQM_VFD) had the best performance for both formats, followed by Video Multimethod Assessment Fusion (VMAF) and the VQM General model.
11:10 HVEI-260
User perception for dynamic video resolution change using VVC, Sachin G. Deshpande and Philip Cowan, Sharp (United States) [view abstract]
We define experiments that measure user perception when video resolution changes dynamically. The Versatile Video Coding (VVC) standard was recently finalized, and it includes a reference picture resampling (RPR) tool. VVC RPR supports changing the spatial resolution in a coded video sequence on a per-picture basis and defines the downsampling and upsampling filters to be used when changing resolution. This paper provides results from a subjective evaluation in which VVC RPR is used for part of the video sequence to dynamically change resolution. The experiments use different QP values (or bitrates), different RPR scale factors, and different highest original spatial resolutions. The results compare how users perceive video coded with VVC RPR for some pictures against an anchor that does not use RPR. In addition to the subjective results, we also describe the performance of various metrics, including PSNR, VMAF, and MS-SSIM. Our results can help choose the highest RPR scale factor that achieves or maintains a given perceived quality when using RPR (for example, for bitrate reduction). The study also confirms that MS-SSIM and VMAF match the subjective test results more closely than PSNR.
11:30 IQSP-261
Proposing more ecologically-valid experiment protocol using YouTube platform, Gabriela Wielgus, Lucjan Janowski, Kamil Koniuch, Mikolaj Leszczuk, and Rafal Figlus, AGH University of Science and Technology (Poland) [view abstract]
Video streaming is becoming increasingly popular, and on platforms like YouTube users do not watch video passively but seek, pause, and read the comments. The popularity of video services is made possible by the development of compression and quality prediction algorithms. However, those algorithms are developed based on classic experiments, which are not ecologically valid and therefore do not mimic real user interaction. Further development of quality and compression algorithms depends on results from ecologically valid experiments, which we aim to propose here. Nevertheless, proposing a new experimental protocol is difficult, especially when there is no limitation on content selection and control of the video; this freedom makes data analysis more challenging. In this paper, we present an ecologically valid experimental protocol in which subjects assessed quality while freely using YouTube. To achieve this goal, we developed a Chrome extension that collects objective data and allows network manipulation. Our in-depth data analysis shows a correlation between MOS and objectively measured results such as resolution, which demonstrates that the ecologically valid test works. Moreover, we found significant differences between subjects, allowing a more detailed understanding of how quality influences interaction with the service.
11:50 IQSP-262
Evaluation of motion blur image quality in video frame interpolation, Hai Dinh, Fangwen Tu, Qinyi Wang, Brett Frymire, and Bo Mu, Omnivision Technology (United States) [view abstract]
While slow motion has become a standard feature in mainstream cell phones, a fast approach for assessing slow-motion video quality that does not rely on specific training datasets is not available. Conventionally, researchers evaluate their algorithms with the peak signal-to-noise ratio (PSNR) or the structural similarity index measure (SSIM) between ground-truth and reconstructed frames. But both are global evaluation indices and are more sensitive to noise or distortion introduced by the interpolation. For video interpolation, especially for fast-moving objects, motion blur and ghosting artifacts matter more to the audience's subjective judgment. How to properly evaluate the Video Frame Interpolation (VFI) task thus remains an open problem.
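The global nature of PSNR is easy to see in a sketch: because the error is averaged over the whole frame, a small but visually salient ghosting region barely moves the score. The example below uses hypothetical values and is only an illustration of the standard PSNR formula, not the authors' evaluation code.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB. The mean squared error is
    averaged over the whole frame, so a localized artifact contributes
    little to the score."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((64, 64), 128.0)
ghost = ref.copy()
ghost[:8, :8] += 40.0  # corrupt only a small 8x8 region (a localized "ghost")
print(psnr(ref, ghost))  # still above 34 dB despite the visible artifact
```

A severe local artifact covering 1.6% of the frame thus still scores in a range usually read as "good quality", which is exactly the mismatch with subjective judgment the abstract points out.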
12:10 IQSP-263
Subjective video quality for 4K HDR-WCG content using a browser-based approach for “at-home” testing, Lukáš Krasula1, Anustup Choudhury2, Scott Daly2, Zhi Li1, Robin Atkins2, Ludovic Malfait2, and Aditya Mavlankar1; 1Netflix, Inc. and 2Dolby Laboratories, Inc. (United States) [view abstract]
A subjective quality study of 4K HDR-WCG (3840 x 2160, High Dynamic Range, Wide Color Gamut) video content was performed in an at-home scenario. No datasets on such content are available, yet they are crucial for developing and testing objective quality metrics. While at-home testing generally implies a lack of calibration, we sought to maximize calibration by limiting the displays to a specific model of TV that we had calibrated in our lab and for which we found unit-to-unit deviations to be small. Moreover, we performed the experiment in the Dolby Vision mode (in which the TV's various enhancements are turned OFF by default). In addition, we asked subjects to follow procedures to ensure a standard viewing distance of 1.6 picture heights, and to eliminate ambient lighting effects on display contrast by viewing in dark or dim conditions. A browser-based approach was used that took control of the TV and ensured the content was viewed at the TV's native resolution (i.e., dot-on-dot mode). Particular care was given to content selection to probe specific challenge cases for display behavior as well as human vision (e.g., complex motion effects on eye tracking). Further, several clips were selected to represent the highest quality possible with 2021 technology. We found that subject response variability was similar to that of lab-based experiments, suggesting that the noise in the results due to display variability and lack of unit-to-unit calibration was less than the within-subject variability due to personal physiology or preferences. Several statistical models and subject-rejection strategies will be compared, and the usefulness of the data for objective metrics will be presented.
12:30 – 2:00 PM Lunch
Tuesday 17 January PLENARY: Embedded Gain Maps for Adaptive Display of High Dynamic Range Images
Session Chair: Robin Jenkin, NVIDIA Corporation (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Images optimized for High Dynamic Range (HDR) displays have brighter highlights and more detailed shadows, resulting in an increased sense of realism and greater impact. However, a major issue with HDR content is the lack of consistency in appearance across different devices and viewing environments. There are several reasons for this, including the varying capabilities of HDR displays and the different tone mapping methods implemented across software and platforms. Consequently, HDR content authors can neither control nor predict how their images will appear in other apps.
We present a flexible system that provides consistent and adaptive display of HDR images. Conceptually, the method combines both SDR and HDR renditions within a single image and interpolates between the two dynamically at display time. We compute a Gain Map that represents the difference between the two renditions. In the file, we store a Base rendition (either SDR or HDR), the Gain Map, and some associated metadata. At display time, we combine the Base image with a scaled version of the Gain Map, where the scale factor depends on the image metadata, the HDR capacity of the display, and the viewing environment.
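The display-time combination described above can be sketched as follows. This is a minimal illustration of the interpolation idea only, assuming a per-pixel multiplicative gain applied in linear light and a weight derived from display headroom in log space; the actual file format, metadata handling, and weighting function follow the gain map specification, and all names and values here are hypothetical.

```python
import numpy as np

def apply_gain_map(base_linear, gain_map, weight):
    """Blend between the SDR rendition (weight=0) and the full HDR
    rendition (weight=1) by scaling the base with a weighted gain map.
    The interpolation is done in log space, matching the multiplicative
    nature of the gain."""
    return base_linear * np.power(gain_map, weight)

def display_weight(display_headroom, hdr_headroom):
    """Map the display's available HDR headroom (ratio of peak luminance
    to SDR white) to an interpolation weight in [0, 1]."""
    if hdr_headroom <= 1.0:
        return 1.0
    w = np.log2(max(display_headroom, 1.0)) / np.log2(hdr_headroom)
    return float(np.clip(w, 0.0, 1.0))

base = np.array([0.2, 0.5, 0.9])   # SDR base rendition (linear light)
gain = np.array([1.0, 2.0, 4.0])   # per-pixel HDR/SDR ratio (the Gain Map)
w = display_weight(display_headroom=2.0, hdr_headroom=4.0)  # -> 0.5
print(apply_gain_map(base, gain, w))  # partway between SDR and HDR
```

On an SDR display (headroom 1) the weight is 0 and the base rendition is shown unchanged; on a display matching the content's full headroom the weight reaches 1 and the full HDR rendition is reconstructed.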
Eric Chan, Fellow, Adobe Inc. (United States)
Eric Chan is a Fellow at Adobe, where he develops software for editing photographs. Current projects include Photoshop, Lightroom, Camera Raw, and Digital Negative (DNG). When not writing software, Chan enjoys spending time at his other keyboard, the piano. He is an enthusiastic nature photographer and often combines his photo activities with travel and hiking.
Paul M. Hubel, director of Image Quality in Software Engineering, Apple Inc. (United States)
Paul M. Hubel is director of Image Quality in Software Engineering at Apple. He has worked on computational photography and image quality of photographic systems for many years on all aspects of the imaging chain, particularly for iPhone. He trained in optical engineering at University of Rochester, Oxford University, and MIT, and has more than 50 patents on color imaging and camera technology. Hubel is active on the ISO-TC42 committee Digital Photography, where this work is under discussion, and is currently a VP on the IS&T Board. Outside work he enjoys photography, travel, cycling, coffee roasting, and plays trumpet in several bay area ensembles.
3:00 – 3:30 PM Coffee Break
Computational Models of Vision (T3)
Session Chair: Rafal Mantiuk, University of Cambridge (United Kingdom)
3:30 – 4:50 PM
Cyril Magnin I
3:30 HVEI-246
Modelling contrast sensitivity of discs, Maliha Ashraf1, Rafal Mantiuk2, and Alexandre Chapiro3; 1University of Liverpool (United Kingdom), 2University of Cambridge (United Kingdom), and 3Meta (United States) [view abstract]
Spatial and temporal contrast sensitivity is typically measured using different stimuli. Gabor patterns are used to measure spatial contrast sensitivity and flickering discs are used for temporal contrast sensitivity. The data from both types of studies is difficult to compare as there is no well-established relationship between the sensitivity to disc and Gabor patterns. The goal of this work is to propose a model that can predict the contrast sensitivity of a disc using the more commonly available data and models for Gabors. To that end, we measured the contrast sensitivity for discs of different sizes, shown at different luminance levels, and for both achromatic and chromatic (isoluminant) contrast. We used this data to compare 6 different models, each of which tested a different hypothesis on the detection and integration mechanisms of disc contrast. The results indicate that multiple detectors contribute to the perception of disc stimuli, and each can be modelled either using an energy model, or the peak spatial frequency of the contrast sensitivity function.
3:50 HVEI-247
An intrinsic image network evaluated as a model of human lightness perception, Richard F. Murray1, David H. Brainard2, Alban Flachot1, and Jaykishan Y. Patel1; 1York University (Canada) and 2University of Pennsylvania (United States) [view abstract]
We evaluate a recent artificial neural network architecture (InverseRenderNet) in a lightness matching task. We use supervised learning to train the network to map luminance to albedo, using 100,000 images of scenes of cluttered geometric objects, rendered in Blender. Using Thouless ratios to quantify lightness constancy, we find that the network has human-like levels of partial constancy (Thouless ratios around 0.70). Also like human observers, the network's log reflectance matches are a linear function of log illuminance. To provide context, we evaluate three other current computational models of lightness/brightness on the same tasks (ODOG, Dakin-Bex, and retinex). All three models show much lower levels of lightness constancy (Thouless ratios around 0.10), and largely match luminance instead of albedo. Thus we find interesting similarities between InverseRenderNet's behaviour and human lightness perception, and advantages over competing computational models. We discuss potential obstacles and future directions for using neural networks as models of human lightness perception.
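For context, a Thouless ratio of the kind used here can be read as a log-space interpolation index between a pure luminance match (ratio 0) and a perfect albedo match (ratio 1). The sketch below shows one common formulation with hypothetical values, not necessarily the exact computation used by the authors.

```python
import math

def thouless_ratio(log_match, log_luminance_match, log_constancy_match):
    """Index of lightness constancy: 0 means the observer matched raw
    luminance; 1 means perfect albedo (reflectance) constancy."""
    return (log_match - log_luminance_match) / (
        log_constancy_match - log_luminance_match
    )

# Hypothetical observer whose match lies 70% of the way (in log space)
# from the luminance match (albedo 0.2) to the constancy match (albedo 0.5)
log_match = 0.3 * math.log(0.2) + 0.7 * math.log(0.5)
print(thouless_ratio(log_match, math.log(0.2), math.log(0.5)))  # ~0.7
```

On this scale, InverseRenderNet's reported ratios around 0.70 indicate human-like partial constancy, while the competing models' ratios around 0.10 indicate matching that is close to raw luminance.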
4:10 HVEI-248
Are unique hues defined by complementary color pairings rather than opponent processes?, Christopher W. Tyler, The Smith-Kettlewell Eye Research Institute (United States) [view abstract]
The current consensus is that there are four unique hues (red, green, blue, and yellow) and that their opponent pairings define the metric of color space. This construct, however, has the problem that the neutral point of the R/G opponency is yellow while that of the B/Y opponency is white, making them a complementary pair. The complementary pairings for the red and green extrema in CIE (linear) color space are cyan and magenta, respectively, as confirmed by color afterimage settings. The extended gamuts of current visual displays make it clear that these latter two colors have the distinctness properties of unique hues to about the same degree as yellow. It is therefore proposed that these two colors join yellow as three secondary unique hues, with the three extremes of the CIE color space (red, green, and blue) defining the primary unique hues of which they are the complements. Thus, this revised scheme recognizes six unique hues, corresponding to a meld of the RGB and CMYK color primaries.
4:30 HVEI-249
Natural scene statistics and distance perception: ground surface and non-ground objects (JPI-first), Xavier Morin Duchesne and Michael Langer, McGill University (Canada) [view abstract]
Both natural scene statistics and ground surfaces have been shown to play important roles in visual perception, in particular, in the perception of distance. Yet, there have been surprisingly few studies looking at the natural statistics of distances to the ground, and the studies that have been done used a loose definition of ground. Additionally, perception studies investigating the role of the ground surface typically use artificial scenes containing perfectly flat ground surfaces with relatively few non-ground objects present, whereas ground surfaces in natural scenes are typically non-planar and have a large number of non-ground objects occluding the ground. Our study investigates the distance statistics of a large number of natural scenes across three datasets, with the goal of separately analyzing the ground surface and non-ground objects. We used a recent filtering method to partition LiDAR-acquired 3D point clouds into ground points and non-ground points. We then examined the way in which distance distributions depend on distance, viewing elevation angle, and simulated viewing height. We found, first, that the distance distribution of ground points shares some similarities with that of a perfectly flat plane, namely with a sharp peak at a near distance that depends on viewing height, but also some differences. Second, we also found that the distribution of non-ground points is flatter and did not vary with viewing height. Third, we found that the proportion of non-ground points increases with viewing elevation angle. Our findings provide further insight into the statistical information available for distance perception in natural scenes, and suggest that studies of distance perception should consider a broader range of ground surfaces and object distributions than what has been used in the past in order to better reflect the statistics of natural scenes.
DISCUSSION: Tuesday End of Day (T4)
Session Chair: Damon Chandler, Ritsumeikan University (Japan)
4:50 – 5:30 PM
Cyril Magnin I
Please join us for a lively discussion of today's presentations. Participate in an interactive, moderated discussion, where key topics and questions are discussed from many perspectives, reflecting the diverse HVEI community.
5:30 – 7:00 PM EI 2023 Symposium Demonstration Session (in the Cyril Magnin Foyer)
Wednesday 18 January 2023
KEYNOTE: AR/VR Special Session 1 (W1)
Session Chair: Alexandre Chapiro, Meta (United States)
9:05 – 10:10 AM
Cyril Magnin II
This session is jointly sponsored by: Engineering Reality of Virtual Reality 2023, Human Vision and Electronic Imaging 2023, and Stereoscopic Displays and Applications XXXIV.
Joint Conference Welcome
HVEI-219
KEYNOTE: Display consideration for AR/VR systems, Ajit Ninan, Reality Labs at Meta (United States) [view abstract]
Ajit Ninan is a display industry veteran who led the way to the industry's adoption of HDR. His inventions and innovations are manifest in millions of shipped HDR TVs and consumer electronics from multiple companies. He holds 400+ granted patents in imaging and display technology and now works in imaging related to AR/VR at Meta as Senior Director of Applied Perceptual Science and Image Quality. His work spans multiple subjects ranging from displays, imaging, color, video, and compression to audio and networking, and his career spans early start-ups to public companies. Ninan is the inventor of the locally dimmed quantum dot TV and led the way to the industry adoption of quantum dot displays by working with Vizio, Nanosys, and 3M to release the first-of-its-kind R-series QD TV with HDR. He also led the effort with the JPEG committee to standardize JPEG-XT to enable JPEG HDR images. Ninan was inducted as a SMPTE Fellow for his contributions to imaging and standards. The display that caused the world to adopt HDR, the "Pulsar," built by Ninan and his team in 2010 and capable of 4,000 nits down to 0.005 nits with P3 color, enabled the development of Dolby Vision, received many awards including the Advanced Imaging Society's Lumiere award, and earned Ninan an Emmy.
AR and VR displays must take into consideration human perception and image quality factors that are required for a product. At Meta, we study these perceptual factors and determine what quality targets and requirements are needed. This talk will discuss some of these aspects and highlight examples of our process that help us set direction. The presenter, Ajit Ninan, is the director of Engineering, Display and Optics, at Meta.
10:00 AM – 3:30 PM Industry Exhibition - Wednesday (in the Cyril Magnin Foyer)
10:20 – 10:50 AM Coffee Break
AR/VR Special Session 2 (W2)
Session Chairs: Nicko Caluya, Ritsumeikan University (Japan) and Alexandre Chapiro, Meta (United States)
10:50 AM – 12:30 PM
Cyril Magnin II
This session is jointly sponsored by: Engineering Reality of Virtual Reality 2023, Human Vision and Electronic Imaging 2023, and Stereoscopic Displays and Applications XXXIV.
10:50 HVEI-220
Comparison of AR and VR memory palace quality in second-language vocabulary acquisition (Invited), Xiaoyang Tian, Nicko Caluya, and Damon M. Chandler, Ritsumeikan University (Japan) [view abstract]
The method of loci (memory palace technique) is a learning strategy that uses visualizations of spatial environments to enhance memory. One particularly popular use of the method of loci is for language learning, in which the method can help long-term memory of vocabulary by allowing users to associate location and other spatial information with particular words/concepts, thus making use of spatial memory to assist memory typically associated with language. Augmented reality (AR) and virtual reality (VR) have been shown to potentially provide even better memory enhancement due to their superior visualization abilities. However, a direct comparison of the two techniques in terms of language-learning enhancement has not yet been investigated. In this presentation, we present the results of a study designed to compare AR and VR when using the method of loci for learning vocabulary from a second language.
11:10 HVEI-221
Projection mapping for enhancing the perceived deliciousness of food (Invited), Yuichiro Fujimoto, Nara Institute of Science and Technology (Japan) [view abstract]
The perceived deliciousness of a food item is highly related to its appearance. Image processing has been widely used to make food images more appealing to the public, such as when capturing and posting images on social networking sites. In this research, I propose a system to enhance the degree of subjective deliciousness of food visually perceived by a person by automatically changing its appearance with a spatial augmented reality (SAR) technique in a real environment. The relationship between the degree of subjective deliciousness and four appearance features for each food category is modeled using data gathered via a crowdsourcing-based questionnaire. Using this model, the system generates the appropriate projection image to increase the deliciousness of the food. Experiments verify that the system can actually change and improve the impression of the target food's deliciousness.
11:30 HVEI-222
Real-time image processing for low-vision users, Yang Cai, CMU (United States) [view abstract]
We have developed an assistive technology for people with vision disabilities of central field loss (CFL) and low contrast sensitivity (LCS). Our technology includes a pair of holographic AR glasses with enhanced image magnification and contrast, for example, highlighting objects and detecting signs and words. In contrast to prevailing AR technologies, which project either mixed-reality objects or virtual objects onto the glasses, our solution fuses real-time sensory information and enhances images from reality. The AR glasses technology has two advantages. First, it is relatively "fail-safe": if the battery dies or the processor crashes, the glasses still function because they are transparent. Second, the glasses can be transformed into a VR or AR simulator by overlaying virtual objects such as pedestrians or vehicles onto the glasses for simulation. The real-time visual enhancement and alert information are overlaid on the transparent glasses. The visual enhancement modules include zooming, Fourier filters, contrast enhancement, and contour overlay. Our preliminary tests with low-vision patients show that the AR glasses indeed improved patients' vision and mobility, for example, from 20/80 to 20/25 or 20/30.
11:50 HVEI-223
Critical flicker frequency (CFF) at high luminance levels, Alexandre Chapiro1, Nathan Matsuda1, Maliha Ashraf2, and Rafal Mantiuk3; 1Meta (United States), 2University of Liverpool (United Kingdom), and 3University of Cambridge (United Kingdom) [view abstract]
The critical flicker fusion (CFF) is the frequency of changes at which a temporally periodic light will begin to appear completely steady to an observer. This value is affected by several visual factors, such as the luminance of the stimulus or its location on the retina. With new high dynamic range (HDR) displays, operating at higher luminance levels, and virtual reality (VR) displays, presenting at wide fields-of-view, the effective CFF may change significantly from values expected for traditional presentation. In this work we use a prototype HDR VR display capable of luminances up to 20,000 cd/m^2 to gather a novel set of CFF measurements for never before examined levels of luminance, eccentricity, and size. Our data is useful to study the temporal behavior of the visual system at high luminance levels, as well as setting useful thresholds for display engineering.
12:10 HVEI-253
A multichannel LED-based lighting approach to improve color discrimination for low vision people, Linna Yang1, Éric Dinet1, Pichayada Katemake2, Alain Trémeau1, and Philippe Colantoni1; 1University Jean Monnet Saint-Etienne (France) and 2Chulalongkorn University (Thailand) [view abstract]
The population of low vision people increases continuously as society ages. As reported by the WHO, most of this population is over the age of 50, and 81% had no visual problem before. A visual deficiency can dramatically affect the quality of life and challenge the preservation of a safe, independent existence. This study presents an LED-based lighting approach to assist people facing an age-related visual impairment. The research procedure is based on psychophysical experiments consisting of the ordering of standard color samples. Volunteers wearing low vision simulation goggles performed such an ordering under different illumination conditions produced by a 24-channel multispectral lighting system. A filtering technique using color rendering indices, coupled with color measurements, allowed us to objectively determine the lighting conditions providing the best scores in terms of color discrimination. Experimental results were used to combine three channels to produce white light inducing stronger color perception in a low vision context than the white LEDs currently available for general lighting. Even if further studies will be required, these first results give hope for the design of smart lighting devices that adapt to the visual needs of the visually impaired.
12:30 – 2:00 PM Lunch
Wednesday 18 January PLENARY: Bringing Vision Science to Electronic Imaging: The Pyramid of Visibility
Session Chair: Andreas Savakis, Rochester Institute of Technology (United States)
2:00 PM – 3:00 PM
Cyril Magnin I/II/III
Electronic imaging depends fundamentally on the capabilities and limitations of human vision. The challenge for the vision scientist is to describe these limitations to the engineer in a comprehensive, computable, and elegant formulation. Primary among these limitations are visibility of variations in light intensity over space and time, of variations in color over space and time, and of all of these patterns with position in the visual field. Lastly, we must describe how all these sensitivities vary with adapting light level. We have recently developed a structural description of human visual sensitivity that we call the Pyramid of Visibility, that accomplishes this synthesis. This talk shows how this structure accommodates all the dimensions described above, and how it can be used to solve a wide variety of problems in display engineering.
Andrew B. Watson, chief vision scientist, Apple Inc. (United States)
Andrew Watson is Chief Vision Scientist at Apple, where he leads the application of vision science to technologies, applications, and displays. His research focuses on computational models of early vision. He is the author of more than 100 scientific papers and 8 patents. He has 21,180 citations and an h-index of 63. Watson founded the Journal of Vision, and served as editor-in-chief 2001-2013 and 2018-2022. Watson has received numerous awards including the Presidential Rank Award from the President of the United States.
3:00 – 3:30 PM Coffee Break
PANEL: AR/VR Special Session (W3.1)
Session Chairs: Nicko Caluya, Ritsumeikan University (Japan) and Alexandre Chapiro, Meta (United States)
Panelists: Alexandre Chapiro, Meta (United States); Yuichiro Fujimoto, Nara Institute of Science and Technology (Japan); Nicolas Holliman, King's College London (United Kingdom); and Ajit Ninan, Reality Labs at Meta (United States)
3:30 – 4:50 PM
Cyril Magnin II
This session is jointly sponsored by: Engineering Reality of Virtual Reality 2023, Human Vision and Electronic Imaging 2023, and Stereoscopic Displays and Applications XXXIV.
DISCUSSION: Wednesday End of Joint Sessions (W3.2)
Session Chair: Damon Chandler, Ritsumeikan University (Japan)
4:50 – 5:30 PM
Cyril Magnin II
This session is jointly sponsored by: Engineering Reality of Virtual Reality 2023, Human Vision and Electronic Imaging 2023, and Stereoscopic Displays and Applications XXXIV.
Please join us for a lively discussion of today's presentations. Participate in an interactive, moderated discussion, where key topics and questions are discussed from many perspectives, reflecting the diverse HVEI community.
5:30 – 7:00 PM EI 2023 Symposium Interactive (Poster) Paper Session (in the Cyril Magnin Foyer)
5:30 – 7:00 PM EI 2023 Meet the Future: A Showcase of Student and Young Professionals Research (in the Cyril Magnin Foyer)
BANQUET: 2023 Friends of HVEI (W5)
Session Chairs: Damon Chandler, Ritsumeikan University (Japan) and Rafal Mantiuk, University of Cambridge (United Kingdom)
7:00 – 10:00 PM
MISSION I/II/III
Join us for a wonderful evening of conversations, a banquet dinner, and an enlightening speaker. This banquet is associated with the Human Vision and Electronic Imaging Conference (HVEI), but everyone interested in research at the intersection of human perception/cognition, imaging technologies, and art is welcome. Banquet registration required, online or at the registration desk. Location will be provided with registration.
HVEI-250
KEYNOTE: How to let your pictures shine! The impact of high dynamic range imaging on photography, Timo Kunkel, Dolby Laboratories, Inc. (United States) [view abstract]
Dr. Timo Kunkel is director of image technology & standards in the CTO office of Dolby Laboratories, Inc. His fields of expertise include image processing, color science, high dynamic range imaging, color appearance modeling, and advanced display technologies. Kunkel is engaged in developing color management models for both professional and consumer displays (dynamic range and gamut mapping concepts). This involves active research, code development, and QA, as well as applying metrological and psychophysical concepts for verification, including picture quality assessment and tuning for several display technologies from customers all over the world. Additionally, he has experience in neuroscience and psychological concepts related to the human visual system (signal processing in the retina and higher visual cortex), and has been involved in developing the core concepts of what is now Dolby Vision. Kunkel is also actively involved with international standards work, serving as technical expert and member of IEC TC100 (Audio, video and multimedia systems and equipment) and TC110 (Electronic displays), the International Color Consortium (ICC), as well as the SID International Committee of Display Metrology (ICDM). Further, Kunkel has a background in physical geosciences (remote sensing and geospatial image processing, GIS, vegetation and ecosystem modeling) and has worked in these fields with research departments at Lund University in Sweden, Lincoln University in New Zealand, and the University of Dar es Salaam in Tanzania. This work is supported by more than 20 years of experience as a freelance landscape and architecture photographer for clients in Europe and the US, winning several prizes with images combining HDR and computational photography aspects. Kunkel served as president of the Bristol Chapter of ACM SIGGRAPH, 2006 - 2008, and was co-founder of the Bruder & Bär publishing company (Germany), serving there as Art Director, 2003 - 2006.
Kunkel holds a PhD in computer science from University of Bristol, United Kingdom, and a MSc from University of Freiburg, Germany.
High-dynamic range imaging, better known by its acronym “HDR”, has established itself as a foundational component when looking at the aspects defining today’s image fidelity. Together with the availability of wide color gamut (WCG) approaches, HDR has influenced and shaped both the technical tools and the creative means of photography. This talk will touch on the intersection of HDR technologies and the artistic expression it enables, from scene lighting and composition via camera capture and processing, to print and display.
Thursday 19 January 2023
Creative Intent and Perception in Visualization and Displays (R1)
Session Chair: Damon Chandler, Ritsumeikan University (Japan)
9:30 – 10:10 AM
Mission I/II
9:30 HVEI-251
Am I safe? An examination of how everyday people interpret covid data visualizations, Bernice Rogowitz1 and Paul Borrel2; 1Visual Perspectives (United States) and 2consultant (France) [view abstract]
During these past years, international COVID data have been collected by several reputable organizations and made available to the worldwide community. This has resulted in a wellspring of different visualizations. Many different measures can be selected (e.g., cases, deaths, hospitalizations). And for each measure, designers and policy makers can make a myriad of different choices of how to represent the data. Data from individual countries may be presented on linear or log scales, daily, weekly, or cumulative, alone or in the context of other countries, scaled to a common grid, or scaled to their own range, raw or per capita, etc. It is well known that the data representation can influence the interpretation of data. But what visual features in these different representations affect our judgments? To explore this idea, we conducted an experiment where we asked participants to look at time-series data plots and assess how safe they would feel if they were traveling to one of the countries represented, and how confident they were in their judgment. Observers rated 48 visualizations of the same data, rendered differently along 6 controlled dimensions. Our initial results provide insight into how characteristics of the visual representation affect human judgments of time series data. We also discuss how these results could impact how public policy and news organizations choose to represent data to the public.
9:50 HVEI-254
Biosensors for landing creative intent, Scott Daly, Evan Gitterman, Dan Darcy, and Shane Ruggieri, Dolby Laboratories, Inc. (United States) [view abstract]
The motivation for using biosensors in audiovisual media is established by highlighting the problem of signal loss due to wide variability in playback devices. We describe a metadata system that allows creatives to steer signal modifications as a function of audience emotion and cognition, as determined by biosensor analysis.
10:20 – 10:50 AM Coffee Break
EEG/fMRI/Retina (R2)
Session Chair: Bernice Rogowitz, Visual Perspectives (United States)
10:50 – 11:50 AM
Mission I/II
10:50 HVEI-255
Self-regulation of attentional stance facilitates induction of meditative states, Glenn Hartelius1,2, Lora T. Likova3, and Christopher W. Tyler3; 1Alef Trust, 2Naropa University, and 3The Smith-Kettlewell Eye Research Institute (United States) [view abstract]
This study is focused on the novel concept of the origin or seat of the attentional spotlight, the bodily location at which the attended information is felt to impinge. Existing research on the seat of attention, also described as self-location or egocenter, shows that it can be situated in various ways within the experienced body space (Hanley et al., 2020), and that differences in its location have measurable impact on cognitive skill, emotional temperament, and self-construal, as well as social and moral attitudes (Adam et al., 2015; Fetterman et al., 2020; Fetterman & Robinson, 2013). A recent study by Hartelius et al. (2022) showed that this aspect of attention can be volitionally self-regulated into various internal attentional stances, and that these stances are relatively stable as demonstrated by robust within-subject inter-run correlations of EEG-measured patterns of brain activation for each stance; trials with 8 participants showed that most stances were associated with a unique cortical activation pattern in one or more frequency bands. This study also demonstrated that some attentional stances—that is, locations of the seat of attention—can be objectively associated with specific positive emotional states, suggesting that control of attentional stance should provide direct management of specific cognitive and emotional resources. This suggestion is supported by an earlier study with endurance athletes demonstrating that a discrete attentional stance was associated with each of two tasks: a) reading a news story, and b) experiences of a flow state during athletic endurance practice (Hartelius, 2015; Marolt-Sender, 2014).
11:10 HVEI-256
Spatial cognition training rapidly induces cortical plasticity in blind navigation, Lora T. Likova, Zhangziyi Zhou, Michael Liang, and Christopher W. Tyler, The Smith-Kettlewell Eye Research Institute (United States) [view abstract]
Successful navigation requires spatial cognition abilities, primarily the development of an accurate and flexible mental, or cognitive, map of the navigational space and of the route trajectory required to travel to the target location. To train the spatial cognition abilities and spatial memory underlying successful navigation, we translated the power of the Likova Cognitive-Kinesthetic Rehabilitation Training, initially developed for the manual domain of operation, to the domain of navigation. In tasks requiring mentally performed navigational decision planning (planning the shortest or the reversed shortest path between newly specified locations on a just-memorized tactile map) and memory-guided motor execution of these decisions (accurately drawing the respective planned paths), the most significant brain activation increase was found in the lateral anterior regions (DLPFC, insula), in contrast to very little change in the medial posterior regions (occipital V1-V4, the retrosplenial/precuneus) for most of these tasks. By extending our previous findings from the manual to the navigation domain, these results demonstrate the power of a multidisciplinary approach incorporating art, behavioral, and neuroscience methodologies to drive much-needed plasticity in the adult brain.
12:30 – 2:00 PM Lunch
SFMOMA Museum Tour & Casual Dinner (R3)
2:00 – 8:00 PM
OFFSITE - Meet at Registration
Join your HVEI colleagues for an excursion to the SFMOMA after Thursday's lunch recess. Meet and depart from the EI 2023 registration desk at 2:00 pm. Visit SFMOMA 2:30 - 5:00 pm. Gather informally for dinner at 6:00 pm.