Monday 17 January 2022
IS&T Welcome & PLENARY: Quanta Image Sensors: Counting Photons Is the New Game in Town
07:00 – 08:10
The Quanta Image Sensor (QIS) was conceived as a different kind of image sensor: one that counts photoelectrons one at a time using millions or billions of specialized pixels read out at a high frame rate, with computational imaging used to create grayscale images. QIS devices have been implemented in a baseline room-temperature CMOS image sensor (CIS) technology without avalanche multiplication, and also with SPAD arrays. This plenary details the QIS concept, how it has been implemented in CIS and in SPADs, and the major differences between the two. Applications that can be disrupted or enabled by this technology are also discussed, including smartphones, where CIS-QIS technology could be employed within just a few years.
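As a rough illustration of how gray levels can be computed from single-photon counts, the Python sketch below simulates single-bit jots and inverts their Poisson statistics. It is an idealized model chosen here for illustration (Poisson arrivals, no read noise, dark current, or spatial filtering), not the reconstruction method described in the talk, and the function names are hypothetical.

```python
import numpy as np

def simulate_jots(exposure_map, n_frames, rng):
    """Simulate single-bit jots under an idealized model (assumption):
    Poisson photoelectron arrivals, no read noise or dark current.
    exposure_map holds the mean photoelectrons per jot per frame."""
    photons = rng.poisson(exposure_map, size=(n_frames,) + exposure_map.shape)
    # A single-bit jot reports 1 if at least one photoelectron arrived in that frame
    return (photons >= 1).astype(np.uint8)

def reconstruct_exposure(bit_frames):
    """Estimate exposure per jot from the fraction of frames that read 1.
    For Poisson arrivals, P(bit = 1) = 1 - exp(-H), so H = -ln(1 - fraction)."""
    fraction = bit_frames.mean(axis=0)
    # Clip to keep the logarithm finite when every frame read 1
    fraction = np.clip(fraction, 0.0, 1.0 - 1e-9)
    return -np.log1p(-fraction)

rng = np.random.default_rng(0)
truth = np.tile(np.linspace(0.1, 2.0, 8), (8, 1))  # a horizontal gray ramp
bits = simulate_jots(truth, n_frames=4000, rng=rng)
estimate = reconstruct_exposure(bits)
```

Averaging over more frames (or neighboring jots) lowers the estimation noise, which is one reason such sensors are read out at high frame rates.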
Eric R. Fossum, Dartmouth College (United States)
Eric R. Fossum is best known for the invention of the CMOS image sensor “camera-on-a-chip” used in billions of cameras. He is a solid-state image sensor device physicist and engineer whose career has included academic and government research and entrepreneurial leadership. At Dartmouth he is a professor of engineering and vice provost for entrepreneurship and technology transfer. Along with three others, Fossum received the 2017 Queen Elizabeth Prize, considered by many to be the Nobel Prize of Engineering, from HRH Prince Charles “for the creation of digital imaging sensors.” He has been inducted into the National Inventors Hall of Fame and elected to the National Academy of Engineering, among other honors including a recent Emmy Award. He has published more than 300 technical papers and holds more than 175 US patents. He co-founded several startups as well as the International Image Sensor Society (IISS), serving as its first president. He is a Fellow of IEEE and OSA.
08:10 – 08:40 EI 2022 Welcome Reception
Wednesday 19 January 2022
IS&T Awards & PLENARY: In situ Mobility for Planetary Exploration: Progress and Challenges
07:00 – 08:15
This year saw exciting milestones in planetary exploration with the successful landing of the Perseverance Mars rover, followed by its operation and the successful technology demonstration of the Ingenuity helicopter, the first heavier-than-air aircraft ever to fly on another planetary body. This plenary highlights new technologies used in this mission, including precision landing for Perseverance, a vision coprocessor, new algorithms for faster rover traverse, and the ingredients of the helicopter. It concludes with a survey of challenges for future planetary mobility systems, particularly for Mars, Earth’s moon, and Saturn’s moon, Titan.
Larry Matthies, Jet Propulsion Laboratory (United States)
Larry Matthies received his PhD in computer science from Carnegie Mellon University (1989) before joining JPL, where he supervised the Computer Vision Group for 21 years and has spent the past two years coordinating internal technology investments in the Mars office. His research interests include 3-D perception, state estimation, terrain classification, and dynamic scene analysis for autonomous navigation of unmanned vehicles on Earth and in space. He has been a principal investigator in many programs involving robot vision and has initiated new technology developments that impacted every US Mars surface mission since 1997, including visual navigation algorithms for rovers, map matching algorithms for precision landers, and autonomous navigation hardware and software architectures for rotorcraft. He is a Fellow of the IEEE and was a joint winner in 2008 of the IEEE’s Robotics and Automation Award for his contributions to robotic space exploration.
Image Quality and System Performance XIX Posters
08:20 – 09:20
Interactive poster session for all conference authors and attendees.
P-12: Image quality performance of CMOS image sensor equipped with CMY color filter, Sungho Cha, Samsung Electronics Co., Ltd. (Republic of Korea)
Smartphones are now equipped with high-resolution camera modules of 100 million pixels or more, and even higher-resolution modules are expected. However, fitting more pixels into a limited space requires reducing the pixel size. Whereas 1.0 µm pixel sensors were once mainstream, 0.64 µm pixel sensors have now been developed, and sensors with even smaller pixels will follow. This trend faces technical limits: as pixels shrink, each receives less light, and image quality degrades in terms of noise. To overcome this limitation, various approaches to high-sensitivity sensors are being pursued. One of them is the CMY color filter, which has higher sensitivity than RGB and is therefore advantageous for high-sensitivity sensors. In this paper, we introduce a method to evaluate the image quality of CMOS image sensors equipped with CMY color filters in mobile devices.
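The sensitivity argument can be made concrete with an idealized complementary-filter model, assumed here for illustration and not taken from the paper: each CMY filter passes two primaries (C = G + B, M = R + B, Y = R + G), so a CMY pixel collects roughly twice the light of an RGB pixel, but recovering RGB requires a subtraction that amplifies noise. The function names are hypothetical.

```python
import math

def cmy_to_rgb(c, m, y):
    """Invert the idealized complementary model C = G + B, M = R + B, Y = R + G."""
    r = (m + y - c) / 2
    g = (c + y - m) / 2
    b = (c + m - y) / 2
    return r, g, b

def snr_rgb(n, read_noise):
    """SNR of an RGB-filtered pixel collecting n photoelectrons from one primary."""
    return n / math.sqrt(n + read_noise ** 2)

def snr_cmy(n, read_noise):
    """SNR of a primary recovered from C, M, Y samples that each collect 2n
    photoelectrons, assuming independent shot and read noise in the three samples."""
    var_sample = 2 * n + read_noise ** 2
    var_primary = 3 * var_sample / 4  # variance of R = (M + Y - C) / 2
    return n / math.sqrt(var_primary)
```

In this toy model CMY gives the higher SNR only when n < read_noise² / 2, that is, when read noise dominates in low light; in bright, shot-noise-limited scenes RGB wins. Real filter spectra overlap imperfectly, so the crossover is only indicative.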
P-13: Visualization for texture analysis of the Shitsukan Research Database based on luminance information, Norifumi Kawabata, Hokkaido University (Japan)
The term “Shitsukan” has been given different meanings and interpretations, so many studies have asked whether Shitsukan can be evaluated quantitatively. In one of our previous studies, we carried out texture analysis to classify texture types using the Shitsukan Research Database, and observed characteristic relationships between contrast and correlation for each texture type. In this paper, we statistically compare that texture-based classification with luminance information in the Shitsukan Research Database, which is freely available on the Web. We then discuss the characteristics of texture types and the new insights obtained.
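“Contrast” and “correlation” are commonly computed from a gray-level co-occurrence matrix (GLCM). Assuming that is the kind of texture analysis meant here (the abstract does not say so explicitly), a minimal NumPy sketch for one non-negative pixel offset looks like this; the function names are hypothetical.

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for offset (dy, dx).
    img: 2D integer array with values in [0, levels); dx, dy >= 0."""
    h, w = img.shape
    a = img[0:h - dy, 0:w - dx]   # reference pixels
    b = img[dy:h, dx:w]           # neighbors at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    m = m + m.T                   # count each pair in both directions
    return m / m.sum()

def contrast_and_correlation(p):
    """Haralick-style contrast and correlation of a normalized GLCM p."""
    i, j = np.indices(p.shape)
    contrast = np.sum((i - j) ** 2 * p)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    correlation = np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)
    return contrast, correlation
```

For a perfectly uniform patch the standard deviations are zero and the correlation is undefined (the division yields NaN), so such patches need special handling.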
Monday 24 January 2022
High Dynamic Range Quality and Performance
Jonathan Phillips, Imatest, LLC (United States)
07:00 – 08:05
Objective image quality evaluation of HDR videos captured by smartphones, Cyril Lajarge, François-Xavier Thomas, Elodie Souksava, Laurent Chanas, Hoang-Phi Nguyen, and Frédéric Guichard, DXOMARK (France)
High Dynamic Range (HDR) video attracts industry and consumer markets thanks to its ability to reproduce wider color gamuts, higher luminance ranges, and higher contrast. While the cinema and broadcast industries traditionally go through a manual mastering step on calibrated color grading hardware, consumer camera devices capable of capturing HDR video without user intervention are now available. This article reviews the challenges of evaluating cameras that capture and encode video in an HDR format, and improves existing measurement protocols to objectively quantify the video quality these systems produce. The protocols study adaptation to static and dynamic HDR scenes with illuminant changes, as well as the general consistency and readability of the scene’s dynamic range and color attributes. An experimental study comparing HDR video capture with Standard Dynamic Range (SDR) video capture reveals significant differences, often with scene-specific content adaptation similar to that of the human visual system.
New visual noise measurement on a versatile laboratory setup in HDR conditions for smartphone camera testing, Thomas Bourbon, Coraline S. Hillairet, Benoit Pochon, and Frédéric Guichard, DXOMARK (France)
Smartphone photography uses multi-frame stacking strategies to render scenes with high dynamic range. ISO-defined charts for OECF estimation and visual noise measurement are not really designed for these specific use cases. We developed a versatile laboratory setup to evaluate image quality attributes such as autofocus, exposure, and detail preservation. It is tested in various lighting conditions, with dynamic ranges of up to 7 EV, under three different illuminants. The latest visual noise measurements proposed by IEEE P1858 and ISO 15739 did not give satisfactory results on our laboratory scene, owing to differences in the chart, framing, and lighting conditions used. We performed subjective visual experiments to build a quality ruler of noisy grey patches, to serve as ground truth for validating a new visual noise algorithm. In these experiments we also studied the impact of different surrounding conditions of the grey patches, to assess their relevance to our algorithm. We propose a new visual noise measurement that multiplies a luminance sensitivity function by the square root of the weighted sum of the variances of the Lab coordinates of the patches. We then apply a non-linear JND scaling to express visual noise in JND units of noisiness.
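The structure of the proposed measurement (a luminance sensitivity function times the square root of a weighted sum of L*, a*, b* variances, followed by a non-linear JND mapping) can be sketched as below. The weights, the default sensitivity function, and the logarithmic JND mapping are hypothetical placeholders, not the values or functions fitted in the paper.

```python
import numpy as np

def visual_noise(lab_patch, weights=(1.0, 0.5, 0.25), sensitivity=lambda mean_l: 1.0):
    """lab_patch: array of shape (H, W, 3) with the L*, a*, b* values of one grey patch.
    weights and sensitivity are hypothetical placeholders, not the paper's values."""
    var_l, var_a, var_b = (lab_patch[..., k].var() for k in range(3))
    weighted = weights[0] * var_l + weights[1] * var_a + weights[2] * var_b
    # Luminance sensitivity depends on the patch's mean lightness
    return sensitivity(lab_patch[..., 0].mean()) * np.sqrt(weighted)

def to_jnd(vn, scale=10.0, vn0=1.0):
    """Hypothetical non-linear JND mapping, compressive (logarithmic) in vn."""
    return scale * np.log1p(vn / vn0)
```

In practice the sensitivity function and JND mapping would be fitted to the subjective quality-ruler data rather than chosen by hand.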
Combined image flare and dynamic range measurement from two test chart images, Norman Koren, Imatest LLC (United States)
Image flare and dynamic range (DR) are tightly coupled in cameras, especially those with high dynamic range (HDR) sensors, where DR in practical, real-world situations is almost entirely limited by image flare. Yet most flare measurements (such as ISO 18844) use test patterns that don’t resemble real-world images and can’t easily be correlated with DR, and most DR test charts provide little information on flare. We propose a combined image flare/DR measurement that uses two exposures of a DR test chart, one without and one with a mask that covers most of the light areas of the chart. This produces two DR measurements, standard and low-flare, as well as a flare measurement, where image flare (or veiling glare) is proportional to the difference between the mean log pixel levels of the darkest patches in the two images. This measurement can help optimize imaging systems for performance and cost by enabling direct observation of the effects of lens and sensor performance.
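The flare term compares the darkest patches in the two captures. A minimal sketch, assuming one positive, black-level-corrected mean level per patch and base-10 logarithms (neither is specified in the abstract), with hypothetical function names:

```python
import numpy as np

def flare_log_difference(dark_levels_unmasked, dark_levels_masked):
    """Mean log10 level of the darkest patches without the mask minus the same
    quantity with the mask. Levels must be positive (e.g. after black-level
    subtraction); the base-10 logarithm is an assumption."""
    unmasked = np.log10(np.asarray(dark_levels_unmasked, dtype=float))
    masked = np.log10(np.asarray(dark_levels_masked, dtype=float))
    return unmasked.mean() - masked.mean()

def to_stops(log10_difference):
    """Express a log10 difference in photographic stops (EV)."""
    return log10_difference * np.log2(10.0)
```

The result is only proportional to flare; the scale factor that converts it into a veiling-glare figure is not given in the abstract.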
Application-Based Quality Assessment I
Mohamed Chaker Larabi, Université de Poitiers (France)
08:30 – 09:30
Image enhancement dataset for evaluation of image quality metrics, Altynay Kadyrova, Marius Pedersen, Bilal Ahmad, Dipendra J. Mandal, Mathieu Nguyen, and Pauline Hardeberg Zimmermann, Norwegian University of Science and Technology (Norway)
Image enhancement is important in many application areas, such as medical imaging, computer graphics, and military applications. In this paper, we introduce a dataset of enhanced images. The images were enhanced by five end users, and the results were evaluated by observers in an online image quality experiment. The enhancement steps taken by the end users and the subjective results are analyzed in detail. Furthermore, 38 image quality metrics were evaluated on the new dataset to reveal their suitability for measuring image enhancement. The results show that the metrics achieve only low to average performance on the new dataset.
Image quality evaluation of video conferencing solutions with realistic laboratory scenes, Rafael Falcon, Stanislas Brochard-Garnier, Gabriel P. Gouveia, Mauro Patti, Santiago T. Acevedo, Thelma Bergot, Rick Alarcon, Corentin Bomstein, Hervé Macudzinski, Pierre-Yves Maitre, Benoit Pochon, Laurent Chanas, Hoang-Phi Nguyen, and Frédéric Guichard, DXOMARK (France)
Video conferencing has become extremely relevant worldwide in recent years. Traditional image and video quality evaluation techniques prove insufficient to properly assess these systems, since they often include special processing pipelines, for example, to improve face rendering. We propose a suite of equipment, laboratory scenes, and measurements that includes realistic mannequins to simulate a more true-to-life scene, while still reliably measuring image quality in terms of exposure, dynamic range, color and skin tone rendering, texture, and noise. These metrics were first compared on five commercially available external video conferencing cameras, and the comparison was then extended to laptops, smartphones, and smart displays. Our results are consistent with perceptual evaluation and allow an objective comparison of very different systems.
A continuous bitstream-based blind video quality assessment using multi-layer perceptron, Hugo Merly, Alexandre Ninassi, and Christophe Charrier, Université de Caen Basse-Normandie (France)
We propose an efficient, general-purpose, continuous No-Reference Video Quality Assessment (NR-VQA) algorithm, namely CBVQI (Continuous Blind Video Quality Index). Designed for H.264/AVC content, the CBVQI scheme extracts a set of 27 features relevant to perceptual quality prediction directly from the bitstream for each Group of Pictures (GOP). A multilayer perceptron is then trained to score the quality of each GOP, which yields a continuous NR-VQA scheme. To assess the performance of the proposed scheme, the Pearson Correlation Coefficient (PCC) and the Spearman Rank-Order Correlation Coefficient (SROCC) are computed between the predicted values and the Quality Scores (QS) of an in-house CCTV database and one public database. As no mean opinion scores (MOS) are available for the CCTV database, we investigate how well the latest NR-VQA algorithms score quality as a human observer would. It has been shown that negligible bias is introduced when the Video Multi-method Assessment Fusion (VMAF) algorithm is used as ground truth instead of MOS.
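PCC and SROCC are the standard agreement measures between predicted and reference scores. The sketch below computes both with NumPy only, giving tied values their average rank as is conventional for Spearman's coefficient; the helper names are hypothetical.

```python
import numpy as np

def rank_with_ties(x):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def pcc(pred, score):
    """Pearson linear correlation between predictions and reference scores."""
    return float(np.corrcoef(pred, score)[0, 1])

def srocc(pred, score):
    # Spearman's rank-order correlation is the Pearson correlation of the ranks
    return pcc(rank_with_ties(pred), rank_with_ties(score))
```

A monotonic but non-linear relation gives SROCC = 1 with PCC below 1, which is why quality assessment papers usually report both.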
KEYNOTE: Quality and Perception
Session Chair: Mohamed Chaker Larabi, Université de Poitiers (France)
10:00 – 11:00
KEYNOTE: Towards neural representations of perceived visual quality, Sebastian Bosse, Fraunhofer Heinrich Hertz Institute (Germany)
Accurate computational estimation of visual quality as it is perceived by humans is crucial for any visual communication or computing system that has humans as the ultimate receivers. Beyond its practical importance, the problem holds a certain fascination: while it is easy, almost effortless, for a person to assess the visual quality of an image or a video, it is astonishingly difficult to predict it computationally. Consequently, the problem of quality estimation touches on a wide range of disciplines such as engineering, psychology, neuroscience, statistics, computer vision, and, for some years now, machine learning. In this talk, Bosse gives an overview of recent advances in neural network-based approaches to perceptual quality prediction. He examines and compares different concepts of quality prediction, with a special focus on feature extraction and representation. In doing so, Bosse revisits the underlying principles and assumptions, the algorithmic details, and some quantitative results. Based on a survey of the limitations of the state of the art, Bosse discusses challenges, novel approaches, and promising future research directions that might pave the way towards a general representation of visual quality.
Sebastian Bosse is head of the Interactive & Cognitive Systems group at Fraunhofer Heinrich Hertz Institute (HHI), Berlin, Germany. He studied electrical engineering and information technology at RWTH Aachen University, Germany, and Polytechnic University of Catalonia, Barcelona, Spain. Sebastian received the Dr.-Ing. in computer science (with highest distinction) from the Technical University Berlin (2018). During his studies he was a visiting researcher at Siemens Corporate Research, Princeton (United States). In 2014, Sebastian was a guest scientist in the Stanford Vision and Neuro-Development Lab (SVNDL) at Stanford University (United States). After 10 years as a research engineer, first in the Image & Video Compression group and later in the Machine Learning group, he founded the Interactive & Cognitive Systems research group at Fraunhofer HHI in 2020 and has headed it since. Sebastian is a lecturer at the German University in Cairo. He is on the board of the Video Quality Experts Group (VQEG) and on the advisory board of the International AIQT Foundation. Sebastian is an affiliate member of VISTA, York University, Toronto, and serves as an associate editor for the IEEE Transactions on Image Processing. Since 2021, he has served as a chair for the ITU Focus Group on Artificial Intelligence for Agriculture. His current research interests include the modelling of perception and cognition, machine learning, computer vision, and human-machine interaction across a wide range of applications, from multimedia and augmented reality, through medicine, to agriculture and industrial production.
Application-Based Quality Assessment II
Susan Farnand, Rochester Institute of Technology (United States)
15:00 – 16:00
Accuracy and precision of an edge-based modulation transfer function measurement method using a variable oversampling ratio, Kenichiro Masaoka, NHK Science & Technology Research Laboratories (Japan)
The ISO 12233 edge-based method approximates the modulation transfer function (MTF) as a function of horizontal or vertical spatial frequency by analyzing a 1D supersampled edge gradient obtained from the captured image of a near-vertical or near-horizontal bi-tonal edge, respectively. The method involves the slanted projection of pixels in a square array into a linear array of subpixel-wide bins. An ad hoc method is available to accommodate diagonal MTF measurements as a function of spatial frequency perpendicular to the edge, by scaling the spatial frequency of the ISO MTF estimate according to the slant angle. However, a fixed integer oversampling ratio degrades the accuracy and precision of diagonal MTF estimates because of periodic misalignment between the projection paths and the bin array. The edge-based method called OMNI-sine uses a variable bin width that depends on the slant angle, so that the intervals of each column and row of the square sampling grid are aligned with the bin array axis perpendicular to the edge. In this study, computer simulations demonstrate that the OMNI-sine method improves the accuracy and precision of MTF estimates over the full range of slant angles.
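The projection-and-binning step can be sketched as follows, assuming the edge location has already been fitted as x = edge_x0 + slope * row. The default bin width of a quarter pixel corresponds to the fixed 4x oversampling of the ISO method; the angle-dependent bin width of OMNI-sine is not reproduced because its formula is not given in the abstract, so the parameter is simply left open. The function name is hypothetical.

```python
import numpy as np

def project_and_bin(img, edge_x0, slope, bin_width=0.25):
    """Project every pixel of a near-vertical edge ROI onto the axis across the
    edge and average the values that fall into each sub-pixel bin.

    edge_x0, slope: fitted edge location x = edge_x0 + slope * row (pixels).
    bin_width: 0.25 reproduces fixed 4x oversampling; an angle-dependent width
    (as in OMNI-sine) would be supplied by the caller."""
    rows, cols = img.shape
    r, c = np.mgrid[0:rows, 0:cols]
    # Horizontal distance of each pixel centre from the edge, in pixels
    dist = c - (edge_x0 + slope * r)
    idx = np.floor((dist - dist.min()) / bin_width).astype(int).ravel()
    sums = np.bincount(idx, weights=img.ravel().astype(float))
    counts = np.bincount(idx)
    esf = np.full(counts.shape, np.nan)
    filled = counts > 0
    esf[filled] = sums[filled] / counts[filled]  # empty bins stay NaN
    return esf
```

The returned array is the supersampled edge spread function; bins left empty by an unlucky combination of slant and bin width show up as NaN, which is one visible symptom of the misalignment the abstract describes.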
Quality-based video bitrate control for WebRTC-based teleconference services, Masahiro Yokota and Kazuhisa Yamagishi, Nippon Telegraph and Telephone Corporation (Japan)
In this article, we propose a quality-based video bitrate control method for web real-time communication (WebRTC)-based teleconferences. In WebRTC, video bitrate is controlled on the basis of quality of service (QoS) parameters such as delay and packet-loss rate. As a result, the amount of transferred data may increase, because media streams are transmitted at excessive quality levels when QoS conditions are good (e.g., when jitter and packet-loss rate are low). An increase in transferred data leads to higher operational cost (i.e., data transfer cost) and hinders profitable growth. In the proposed method, the quality desired by a service provider is set as TargetQuality, and the video bitrate of each stream is controlled to aim at TargetQuality, thereby suppressing the amount of transferred data while maintaining sufficient quality. The proposed method is implemented in an actual teleconference system and evaluated in terms of how much it reduces the amount of transferred data. The results show that the amount of transferred data can be reduced by more than 40% by setting TargetQuality appropriately.
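The core idea, spending only as much bitrate as TargetQuality requires while never exceeding the QoS-based limit, can be sketched with a toy quality model. The saturating-exponential model and its constants a and b are hypothetical stand-ins for whatever quality estimator the authors use.

```python
import math

def estimated_quality(bitrate_kbps, a=4.0, b=300.0):
    """Hypothetical quality model on a 1-to-5 scale that saturates with bitrate."""
    return 1.0 + a * (1.0 - math.exp(-bitrate_kbps / b))

def bitrate_for_quality(target_quality, a=4.0, b=300.0):
    """Invert the model: lowest bitrate whose estimated quality reaches the target."""
    if target_quality <= 1.0:
        return 0.0          # the floor of the scale needs no bitrate
    if target_quality >= 1.0 + a:
        return math.inf     # target unreachable under this model
    return -b * math.log(1.0 - (target_quality - 1.0) / a)

def control_bitrate(qos_bitrate_kbps, target_quality):
    """Never exceed what QoS-based control allows, but stop at TargetQuality."""
    return min(qos_bitrate_kbps, bitrate_for_quality(target_quality))
```

For example, with a generous QoS limit of 5000 kbps and TargetQuality = 4.0, this toy model caps the stream at about 416 kbps; when the network only allows 200 kbps, QoS control still decides.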
Assessing the impact of image quality on object-detection algorithms, Abhinau K. Venkataramanan, Marius Facktor, Praful Gupta, and Alan C. Bovik, The University of Texas at Austin (United States)
The field of image and video quality assessment has enjoyed rapid development over the last two decades. Several datasets and algorithms have been designed to understand the effects of common distortions on the subjective experiences of human observers. The distortions present in these datasets may be synthetic (applying artificially computed blur, compression, noise, etc.) or authentic (in-capture lens flare, motion blur, under/overexposure, etc.). The goal of quality assessment is often to quantify the loss of visual “naturalness” caused by the distortion(s). We have recently created a new resource called LIVE-RoadImpairs, which is a novel image quality dataset consisting of authentically distorted images of roadways. We use the dataset to develop a no-reference quality assessment algorithm that is able to predict the failure rates of object-detection algorithms. This work was among the overall winners of the PSCR Enhancing Computer Vision for Safety Challenge.
Image Quality Assessment Tools
Stuart Perry, University of Technology Sydney (Australia)
16:15 – 17:15
Generation of reference images using filtered Radon transform and truncated SVD for structural artifacts, Seungwan Jeon, Yukyung Lee, Kundong Kim, Daeil Yu, Sung-Su Kim, and Joonseo Yim, Samsung Electronics Co., Ltd. (Republic of Korea)
Image quality assessment (IQA) is an effective way to evaluate image signal processors (ISPs). Here, we present a singular value decomposition (SVD)-based IQA method to quantitatively evaluate morphological distortion of chessboard patterns. Incorrect ISP tuning parameters can produce suboptimal images with artifacts on the edges of small text or high-frequency patterns. We reproduced those artifacts using a small chessboard pattern and quantitatively evaluated the morphological distortion in the pattern. We then verified our method through a qualitative evaluation survey and Pearson correlation analysis. The scores of the proposed method agreed well with the qualitative evaluation results, with a Pearson correlation coefficient (PCC) of 0.97.
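Truncated SVD suits chessboard patterns because an ideal chessboard is a rank-2 matrix (a constant plus an outer product of two square waves), so a low-rank reconstruction of a captured pattern can act as an artifact-free reference, and whatever it cannot explain reflects structural distortion. The sketch applies SVD directly to pixel values and omits the filtered Radon transform; the score and function names are hypothetical, not the paper's metric.

```python
import numpy as np

def chessboard(n=32, block=4):
    """Ideal chessboard with values 0 and 1 (rank 2: a constant plus an outer product)."""
    s = np.where((np.arange(n) // block) % 2 == 0, 1.0, -1.0)
    return 0.5 + 0.5 * np.outer(s, s)

def truncated_svd_reference(img, k=2):
    """Rank-k approximation of img, used here as a synthetic artifact-free reference."""
    u, sv, vt = np.linalg.svd(img, full_matrices=False)
    return (u[:, :k] * sv[:k]) @ vt[:k]

def structural_distortion(img, k=2):
    """Hypothetical score: relative energy that the rank-k reference cannot explain."""
    ref = truncated_svd_reference(img, k)
    return np.linalg.norm(img - ref) / np.linalg.norm(img)
```

A clean pattern scores essentially zero, while a local artifact such as a smeared corner between squares raises the score because it cannot be expressed by the two dominant singular components.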
Color image distortion assessment based on synthetic ground truth recovery, Jungmin Lee, Seunghyeok June, Jiyun Bang, Sung-Su Kim, and Joonseo Yim, Samsung Electronics Co., Ltd. (Republic of Korea)
This paper proposes a quantitative method to measure the color distortion that can occur in color images. There are two main types of color distortion: false color and decolorization. Traditionally, the demosaicing process that converts Bayer data to RGB can cause color distortion, but recent, more complex color filter arrays (CFAs) can cause even more severe color distortion. Since conventional methods of measuring color distortion require a reference image, it is difficult to measure color distortion when no reference image is available. We have developed a comprehensive method based on recovering the undistorted color components corresponding to the ground truth. Our method uses a chart designed for this purpose and evaluates color distortion based on this chart.
Image distortion inference based on correlation between line pattern and character, Sungho Gil, Ohyeong Kim, Eunji Yong, Sung-Su Kim, and Joonseo Yim, Samsung Electronics Co., Ltd. (Republic of Korea)
Samsung introduced pixel-merging technologies such as Tetrapixel, which enable mobile image sensors to reproduce colors properly depending on lighting conditions. In well-lit scenes, a high-resolution image can be acquired by rearranging the colors of the color filter array into an RGB Bayer pattern. This process, called remosaicing, is based on estimating direction information. It causes artifacts on edges with varying or unclear directions, for example in text images, especially at high spatial frequencies. We focus on such artifacts caused by remosaicing and propose a suitable image quality metric to measure their severity.
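To see why remosaicing requires interpolation, compare a 2x2-binned (quad-Bayer) tile with the target RGGB Bayer tile. The layouts below are assumed for illustration; the abstract does not give the exact Tetrapixel arrangement, and the function name is hypothetical.

```python
import numpy as np

# Assumed 4x4 tiles: quad-Bayer source and RGGB Bayer target
QUAD_BAYER = np.array([list("RRGG"), list("RRGG"), list("GGBB"), list("GGBB")])
BAYER_RGGB = np.array([list("RGRG"), list("GBGB"), list("RGRG"), list("GBGB")])

def pixels_to_reestimate(src=QUAD_BAYER, dst=BAYER_RGGB):
    """Boolean mask of sites whose color differs between the two CFA tiles;
    at these sites the remosaic step must estimate a value from neighbors."""
    return src != dst
```

Under these layouts 10 of the 16 sites change color and must be estimated from nearby pixels of the required color. Choosing which neighbors to use is where direction estimation enters, and where errors appear on edges with unclear direction.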
Tuesday 25 January 2022
IS&T Awards & PLENARY: Physics-based Image Systems Simulation
07:00 – 08:00
Three quarters of a century ago, visionaries in academia and industry saw the need for a new field called photographic engineering and formed what would become the Society for Imaging Science and Technology (IS&T). Thirty-five years ago, IS&T recognized the massive transition from analog to digital imaging and created the Symposium on Electronic Imaging (EI). IS&T and EI continue to evolve by cross-pollinating electronic imaging in the fields of computer graphics, computer vision, machine learning, and visual perception, among others. This talk describes open-source software and applications that build on this vision. The software combines quantitative computer graphics with models of optics and image sensors to generate physically accurate synthetic image data for devices that are being prototyped. These simulations can be a powerful tool in the design and evaluation of novel imaging systems, as well as for the production of synthetic data for machine learning applications.
Joyce Farrell, Stanford Center for Image Systems Engineering, Stanford University, CEO and Co-founder, ImagEval Consulting (United States)
Joyce Farrell is a senior research associate and lecturer in the Stanford School of Engineering and the executive director of the Stanford Center for Image Systems Engineering (SCIEN). Joyce received her BS from the University of California at San Diego and her PhD from Stanford University. She was a postdoctoral fellow at NASA Ames Research Center, New York University, and Xerox PARC, before joining the research staff at Hewlett Packard in 1985. In 2000 Joyce joined Shutterfly, a startup company specializing in online digital photofinishing, and in 2001 she formed ImagEval Consulting, LLC, a company specializing in the development of software and design tools for image systems simulation. In 2003, Joyce returned to Stanford University to develop the SCIEN Industry Affiliates Program.
PANEL: The Brave New World of Virtual Reality
08:00 – 09:00
Advances in electronic imaging, computer graphics, and machine learning have made it possible to create photorealistic images and videos. In the future, one can imagine that it will be possible to create a virtual reality that is indistinguishable from real-world experiences. This panel discusses the benefits of this brave new world of virtual reality and how we can mitigate the risks that it poses. The goal of the panel discussion is to showcase state-of-the-art synthetic imagery and to learn how this progress benefits society. After brief demonstrations of the state of their art, the panelists will discuss creating photorealistic avatars, Project Shoah, and digital forensics.
Panel Moderator: Joyce Farrell, Stanford Center for Image Systems Engineering, Stanford University, CEO and Co-founder, ImagEval Consulting (United States)
Panelist: Matthias Niessner, Technical University of Munich (Germany)
Panelist: Paul Debevec, Netflix, Inc. (United States)
Panelist: Hany Farid, University of California, Berkeley (United States)
Image Capture Performance I
Peter Burns, Rochester Institute of Technology (United States)
09:15 – 10:15
Creation and evolution of ISO 12233, the international standard for measuring digital camera resolution, Ken Parulski¹, Dietmar Wueller², Peter Burns³, and Hideaki Yoshida⁴; ¹aKAP Innovation, LLC (United States), ²Image Engineering GmbH & Co. KG (Germany), ³Burns Digital Imaging (United States), and ⁴Digital Solutions (Japan)
Thirty years ago, a new ISO working group on digital photography, TC42/WG18, began developing a standard to measure the spatial resolution of digital cameras. After several years of proposals, testing, and analysis, consensus was reached on a combination chart with slightly slanted edge features for measuring spatial frequency response (e-SFR) and hyperbolic wedges for measuring visual resolution. First published in 2000, ISO 12233 is used today to measure cameras in a wide range of imaging applications and is referenced in other international standards (e.g., IEC 62676-5, IEEE P2020, ISO 16067). Examples of the challenges of applying ISO 12233 in areas other than photography are described. ISO 12233 was revised in 2014 to define three new charts: a sine-wave modulated target in polar format, a low-contrast e-SFR target, and the CIPA chart with software that computes a “human equivalent visual resolution” value. ISO 12233 is currently being revised to provide improved results in challenging applications. This revision adds an optional non-uniformity compensation method and an acutance calculation that converts SFR measurements into a single number correlated with the visual perception of sharpness. It also includes improvements to the e-SFR measurement algorithm, described in a companion paper by Peter Burns.
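Acutance is typically computed as a contrast sensitivity function (CSF)-weighted average of the SFR. A minimal sketch under that assumption follows; the band-pass CSF is a placeholder, not the function specified in the revised ISO 12233, and the names are hypothetical.

```python
import numpy as np

def hypothetical_csf(nu):
    """Placeholder band-pass contrast sensitivity function of spatial frequency
    nu in cycles/degree; NOT the CSF specified by ISO 12233 or IEEE P1858."""
    return nu * np.exp(-nu / 4.0)

def _trapezoid(y, x):
    """Trapezoidal integral of samples y over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def acutance(nu, sfr, csf=hypothetical_csf):
    """CSF-weighted average of the SFR: integral(SFR * CSF) / integral(CSF).
    nu must already be converted from cycles/pixel to cycles/degree, which
    depends on the assumed viewing distance and display size."""
    weights = csf(nu)
    return _trapezoid(sfr * weights, nu) / _trapezoid(weights, nu)
```

A perfect SFR of 1 at all frequencies yields an acutance of 1, and softer responses score lower; because the CSF weights mid frequencies most, losses there affect the single number more than losses near the resolution limit.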
Estimation of ISO12233 edge spatial frequency response from natural scene derived step-edge data (JIST-first), Oliver van Zwanenberg¹, Sophie Triantaphillidou¹, Robin B. Jenkin², and Alexandra Psarrou¹; ¹University of Westminster (United Kingdom) and ²NVIDIA Corporation (United States)
The Natural Scene derived Spatial Frequency Response (NS-SFR) is a novel camera system performance measure that derives SFRs directly from images of natural scenes and processes them using the ISO 12233 edge-based SFR (e-SFR) algorithm. NS-SFR is a function of both camera system performance and scene content. It is measured directly from captured scenes, thus eliminating the use of test charts and strict laboratory conditions. The effective system e-SFR can subsequently be estimated from NS-SFRs using statistical analysis and a diverse dataset of scenes. This paper first presents the NS-SFR measuring framework, which locates, isolates, and verifies suitable step-edges from captures of natural scenes. It then details a process for identifying the most likely NS-SFRs for deriving the camera system e-SFR. The resulting estimates are comparable to standard e-SFRs derived from test chart inputs, making the proposed method a viable alternative to the ISO technique, with potential for real-time camera system performance measurements.
Analysis of natural scene derived spatial frequency responses for estimating camera ISO12233 slanted-edge performance (JIST-first), Oliver van Zwanenberg¹, Sophie Triantaphillidou¹, Alexandra Psarrou¹, and Robin B. Jenkin²; ¹University of Westminster (United Kingdom) and ²NVIDIA Corporation (United States)
The Natural Scene derived Spatial Frequency Response (NS-SFR) framework automatically extracts suitable step-edges from natural pictorial scenes and processes these edges via the edge-based ISO 12233 (e-SFR) algorithm. Previously, a novel methodology was presented to estimate the standard e-SFR from NS-SFR data. This paper implements this method using diverse natural scene image datasets from three characterized camera systems. Quantitative analysis was carried out on the system e-SFR estimates to validate the accuracy of the method. Both linear and non-linear camera systems were evaluated. To investigate how scene content and dataset size affect system e-SFR estimates, analysis was conducted on entire datasets as well as on subsets of various sizes and scene group types. Results demonstrate that system e-SFR estimates correlate strongly with results from test chart inputs, with accuracy comparable to that of ISO 12233. Further work toward improving and fine-tuning the proposed methodology for practical implementation is discussed.
Image Capture Performance II
Elaine Jin, Rivian Automotive, Inc. (United States)
10:45 – 11:45
Updated camera spatial frequency response for ISO 12233, Peter Burns¹, Kenichiro Masaoka², Ken Parulski³, and Dietmar Wueller⁴; ¹Burns Digital Imaging (United States), ²NHK Science & Technology Research Laboratories (Japan), ³aKAP Innovation, LLC (United States), and ⁴Image Engineering GmbH & Co. KG (Germany)
The edge-based Spatial Frequency Response (e-SFR) method is well established and has been included in the ISO 12233 standard since the first version. A new, fourth version of the standard is in progress, with changes intended to broaden its application and improve reliability. We report results for advanced edge fitting, which was not included in the current standard. The polynomial fitting discussion includes improved statistical estimation based on scaled variables. The application of the e-SFR method to a range of edge-feature angles is also aided by the inclusion of an angle-based correction. In addition, various smoothing windows were investigated, including the current Hamming and Tukey forms. A version of multi-phase binning was also tested and compared with the current method. We present a detailed account of how the testing was completed for a wider range of edge test features than previously addressed by ISO 12233, in particular edges near 0 and 45 degrees.
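A common way to fit the edge is to locate the centroid of the horizontal derivative in each row and fit a polynomial to those centroids. NumPy's Polynomial.fit maps the independent variable onto [-1, 1] before the least-squares solve, which is one form of variable scaling that improves numerical conditioning; whether it matches the scaling studied in the paper is an assumption. Function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

def edge_centroids(img):
    """Per-row centroid of the horizontal derivative of a near-vertical edge."""
    d = np.diff(img.astype(float), axis=1)
    x = np.arange(d.shape[1]) + 0.5  # each difference sits between two pixel centres
    return (d * x).sum(axis=1) / d.sum(axis=1)

def fit_edge(img, deg=1):
    """Fit edge position versus row with a degree-`deg` polynomial."""
    rows = np.arange(img.shape[0])
    # Polynomial.fit scales the row index onto [-1, 1] before solving
    p = Polynomial.fit(rows, edge_centroids(img), deg)
    return p.convert()  # coefficients back in unscaled row units, lowest order first
```

With deg=1 the second coefficient is the edge slope in pixels per row; raising deg lets the same code follow edges bent by lens distortion.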
Temporal MTF evaluation of slow motion mode in mobile phones, Lin Luo, Celalettin Yurdakul, Kaijun Feng, and Bo Mu, OmniVision Technologies Inc. (United States)
While slow motion has become a standard feature in mainstream cell phones, no fast approach is available for assessing slow-motion video quality without relying on specific training datasets. This manuscript proposes a generalized, fast approach based on the modulation transfer function (MTF) for assessing the slow-motion mode of mobile phones. First, a standard chart containing slanted edges is used to capture a slow-motion video. Second, the edge spread function is extracted from a region of interest containing a slanted edge in each slow-motion frame. Then, the line spread function and the MTF are calculated. MTF50, MTF20, and the MTF area are used to quantify the quality of the slow-motion frames, and the final score is obtained by a temporal pooling method. Additionally, when reference frames are provided, the sharpness loss of a slow-motion video can be specified as the difference in MTF scores between the ground truth and the slow-motion video. In the experiment, several mainstream mobile phones are mounted on an upright motorized linear stage at a distance from the chart. Slow-motion videos are then captured by moving the phones at a constant speed while keeping the test chart still. The MTF scores of the different phones are analyzed and compared.
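The per-frame chain described above (edge spread function, then line spread function, then MTF, then MTF50, then temporal pooling) can be sketched as follows. Several details are assumptions not stated in the abstract: the ESF is taken as already extracted and sampled at one sample per pixel (a real e-SFR implementation oversamples it), the LSF is a plain first difference, the MTF is the normalized DFT magnitude computed directly, and temporal pooling is a simple mean. All function names are hypothetical.

```python
import cmath
from typing import List, Optional, Sequence


def lsf_from_esf(esf: Sequence[float]) -> List[float]:
    """Line spread function (LSF) as the first difference of the edge spread function (ESF)."""
    return [esf[i + 1] - esf[i] for i in range(len(esf) - 1)]


def mtf_from_lsf(lsf: Sequence[float]) -> List[float]:
    """DFT magnitude of the LSF from DC up to Nyquist, normalized so that MTF[0] == 1."""
    n = len(lsf)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(lsf))
        mags.append(abs(coeff))
    return [m / mags[0] for m in mags]


def mtf_crossing(mtf: Sequence[float], n: int, level: float = 0.5) -> Optional[float]:
    """Frequency in cycles/sample where the MTF first falls below `level`, linearly
    interpolated between bins; bin k corresponds to k / n, with n the LSF length."""
    for k in range(1, len(mtf)):
        if mtf[k - 1] >= level > mtf[k]:
            frac = (mtf[k - 1] - level) / (mtf[k - 1] - mtf[k])
            return (k - 1 + frac) / n
    return None  # the MTF does not drop below `level` before Nyquist


def frame_mtf50(esf: Sequence[float]) -> Optional[float]:
    """MTF50 of a single frame from its ESF."""
    lsf = lsf_from_esf(esf)
    return mtf_crossing(mtf_from_lsf(lsf), len(lsf), 0.5)


def temporal_pool(frame_scores: Sequence[float]) -> float:
    """Hypothetical temporal pooling: the plain mean of per-frame scores."""
    return sum(frame_scores) / len(frame_scores)


def sharpness_loss(reference_scores: Sequence[float], test_scores: Sequence[float]) -> float:
    """Pooled reference score minus pooled slow-motion score (larger means more loss)."""
    return temporal_pool(reference_scores) - temporal_pool(test_scores)
```

MTF20 is `mtf_crossing(mtf, n, 0.2)` with the same helper. A mean is the simplest pooling choice; a pooling that emphasizes the worst frames (for example a low percentile) would penalize occasional blurry frames more strongly, but the abstract does not say which method the authors used.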
Optimizing modulation transfer function measurement method for video endoscopes, Chinh V. Tran1,2, Josh Pfefer2, Nader Namazi1, and Quanzeng Wang2; 1The Catholic University of America and 2U.S. Food and Drug Administration (United States)
Medical endoscopes are widely used for early cancer detection and disease diagnosis. Innovative endoscope designs are emerging, and component technology is evolving to improve performance and/or reduce cost. The only endoscope resolution standard, ISO 8600-5:2020, recommends a modulation transfer function (MTF) approach for evaluating endoscope resolution. However, this standard applies only to rigid endoscopes without electronic components, and therefore excludes the vast majority of modern video endoscopes, which are based on opto-electronic imaging systems. A new MTF-based method for video endoscopes is needed to facilitate progress in endoscopy. MTF measurement can be affected by many factors, such as illumination conditions (spectrum, intensity, uniformity), target design and quality (pattern, contrast, resolution, background intensity), and the image analysis method (algorithm, size and location of the region of interest). In this study, the effects of these factors on the slanted-edge-based MTF approach were evaluated. The results of this work will help establish best practices for objective, consistent assessment of MTF and facilitate the advancement of high-quality endoscopic technology.
Wednesday 26 January 2022
Learning-Based Quality Assessment
Mylène Farias, University of Brasilia (Brazil)
07:00 – 08:00
Multi-gene genetic programming based predictive models for full-reference image quality assessment (JIST-first), Naima Merzougui and Leila Djerou, University of Biskra (Algeria)
Many objective quality metrics have been developed during the last decade. A simple way to improve the efficiency of visual quality assessment is to fuse several metrics into combined ones. The goal of the fusion approach is to exploit the advantages of the individual metrics while diminishing the influence of their drawbacks. In this paper, a symbolic regression technique using an evolutionary algorithm known as multi-gene genetic programming (MGGP) is applied to predict the subjective scores of images in datasets by combining the objective scores of a set of image quality metrics (IQMs). By learning from image datasets, MGGP selects, from the 21 metrics considered, those whose objective scores serve as predictors in the symbolic regression model, by simultaneously optimizing two competing objectives: the model's goodness of fit to the data and the model's complexity. The six largest publicly available image databases (LIVE, CSIQ, TID2008, TID2013, IVC, and MDID) are used for training and testing the predictive models, following k-fold cross-validation and cross-dataset strategies. The proposed approach is compared against state-of-the-art objective image quality assessment approaches. The comparison shows that the proposed approach outperforms other recently developed state-of-the-art fusion approaches.
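Optimizing goodness of fit and complexity simultaneously means that there is no single best model, only a set of trade-offs. The sketch below shows just that selection step: given candidate fused models already scored on both objectives, it keeps the Pareto-nondominated ones. It is not an implementation of MGGP itself (no expression trees, crossover, or mutation), and the `Candidate` type, the use of RMSE as the error, and node count as the complexity are illustrative assumptions.

```python
from typing import List, NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str
    error: float     # assumed: RMSE against subjective scores (lower is better)
    complexity: int  # assumed: total node count of the expression (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.error <= b.error and a.complexity <= b.complexity
    strictly_better = a.error < b.error or a.complexity < b.complexity
    return no_worse and strictly_better


def pareto_front(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Non-dominated candidates, sorted from simplest to most complex."""
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    return sorted(front, key=lambda c: c.complexity)
```

A final model is then chosen from the front, for instance the simplest one whose error is within some tolerance of the best; that selection rule is a design choice, not something the abstract specifies.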
Learning-based 3D point cloud quality assessment using a support vector regressor, Aladine Chetouani1, Maurice Quach2, Giuseppe Valenzise2, and Frédéric Dufaux2; 1Université d'Orléans and 2L2S, Centrale Supélec, Université Paris-Saclay (France)
Recent advances in capture technologies have increased the production of 3D content in the form of Point Clouds (PCs). The perceived quality of such data can be affected by typical processing stages such as acquisition, compression, transmission, and visualization. In this paper, we propose a learning-based method that efficiently predicts the quality of distorted PCs from a set of features extracted from the reference PC and its degraded version. The quality index is obtained by combining these features using a Support Vector Regression (SVR) model. The performance contribution of each feature and of their combination is compared. We then discuss the experimental results obtained on two publicly available datasets in relation to state-of-the-art methods. We also evaluate the ability of our method to predict the quality of unseen PCs through a cross-dataset evaluation. The results show the relevance of introducing a learning step to merge features for the quality assessment of such data.
Image quality assessment: Learning to rank image distortion level, Shira Faigenbaum-Golovin1 and Or Shimshi2; 1Duke University (United States) and 2Consultant (Israel)
Over the years, various algorithms have been developed that attempt to imitate the Human Visual System (HVS) and evaluate perceptual image quality. However, for certain image distortions, the functioning of the HVS continues to be an enigma, and reproducing its behavior remains a challenge (especially for ill-defined distortions). In this paper, we learn to compare the image quality of two registered images with respect to a chosen distortion. Our method exploits the fact that simulating an image distortion and then evaluating relative image quality is sometimes easier than assessing absolute quality. Thus, given a pair of images, we look for an optimal dimensionality-reduction function that maps each image to a numerical score, so that the scores reflect the image quality relation (i.e., a less distorted image receives a lower score). We seek this mapping in the form of a Deep Neural Network that minimizes violations of the image quality order. Subsequently, we extend the method to order a set of images by the predicted level of the chosen distortion. We demonstrate the validity of our method on Latent Chromatic Aberration and Moire distortions, on synthetic and real datasets.
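"Minimizing the violation of image quality order" is commonly expressed as a pairwise margin (hinge) ranking loss, and ordering a set of images then reduces to sorting by the learned score. The sketch below assumes that formulation; the abstract does not name the exact loss, and the network that produces the scores is left out. It keeps the abstract's convention that a less distorted image receives a lower score. Function names and the default margin of 1.0 are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def pairwise_hinge_loss(score_less: float, score_more: float, margin: float = 1.0) -> float:
    """Zero when the more-distorted image scores at least `margin` above the less-distorted one."""
    return max(0.0, margin - (score_more - score_less))


def order_violation_loss(pairs: Sequence[Tuple[float, float]], margin: float = 1.0) -> float:
    """Mean hinge loss over (less_distorted_score, more_distorted_score) pairs."""
    return sum(pairwise_hinge_loss(a, b, margin) for a, b in pairs) / len(pairs)


def rank_by_distortion(items: Sequence[str], score_fn: Callable[[str], float]) -> List[str]:
    """Order items from least to most distorted using the learned scalar score."""
    return sorted(items, key=score_fn)
```

Because only score differences enter the loss, the network never needs absolute quality labels, only pairs whose relative distortion is known, which matches the observation that simulating a distortion is easier than rating it.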
Immersive Quality of Experience
Sophie Triantaphillidou, University of Westminster (United Kingdom)
08:30 – 09:30
Exploration of comfort factors for virtual reality environments, Thibault Lacharme, Mohamed Chaker Larabi, and Daniel Meneveaux, Université de Poitiers (France)
Virtual reality (VR) is becoming increasingly present in our everyday life (education, industry, gaming, etc.). Using head-mounted displays (HMDs), it allows almost total immersion in the designed environment. However, depending on the content and device used, user comfort may be significantly affected. It is therefore important to understand and model the factors involved in this process in order to improve users' quality of experience (QoE). Many authors have investigated the potential factors leading to sickness or discomfort in immersive environments. However, these studies target a limited number of factors (often a single one) and do not propose a model for assessing their impact on comfort and/or QoE. Our aim is to act upstream in the content production workflow and to predict the level of discomfort that VR content could generate. This requires building a metric, based on psychophysical experiments, that combines the factors having a significant impact on comfort. In this study, we start by identifying the factors responsible for discomfort. Then, we build a reliable and reproducible experimental protocol to characterize their impact on the general comfort of the user. An in-depth statistical analysis of the results reveals the nature of this impact and helps in designing a model to predict it at an early stage of content production for immersive applications.
Designing a user-centric framework for perceptually-efficient streaming of 360° edited videos, Lucas dos Santos Althoff, Myllena Prado, Henrique Garcia, Gabriel Araújo, Israel Nascimento, Dario D. Moraes, Sana Alamgeer, Mylène C. Farias, and Marcelo Carvalho, University of Brasília (Brazil)
In the last few years, the popularity of immersive applications has increased considerably owing to the introduction of powerful imaging and display devices. The most popular immersive media are 360-degree videos, which provide a sensation of immersion. These videos require significantly more data than conventional videos, which is a challenge for streaming applications. In this work, our goal is to design a perceptually efficient streaming protocol based on edited versions of the original content. More specifically, we propose to use visual attention and semantic analysis to perform automatic perceptual editing of 360-degree videos and to design an efficient Adaptive Bit Rate (ABR) streaming scheme. The proposed scheme takes advantage of the fact that movies are made of a sequence of different shots separated by cuts, and cuts can be used to draw the viewer’s attention to important events and objects. In this paper, we report the first stage of this scheme: the content analysis used to select temporal and spatial candidate cuts. For this, we manually selected candidate cuts from a set of 360-degree videos and analyzed the users' quality of experience (QoE). We then computed their salient areas and analyzed whether these areas are good candidates for the video cuts.
Patch-based CNN model for 360 image quality assessment with adaptive pooling strategies, Abderrezzaq Sendjasni1,2, Mohamed Chaker Larabi1, and Faouzi Alaya Cheikh2; 1Université de Poitiers (France) and 2Norwegian University of Science and Technology (Norway)
Deep-neural-network approaches to 360-degree image quality assessment usually follow a multi-channel paradigm that exploits possible viewports. This is mainly due to the high resolution of such images and the unavailability of ground-truth labels (subjective quality scores) for individual viewports. The multi-channel model is hence trained to predict the score of the whole 360-degree image. However, this comes at a high complexity cost, as multiple neural networks run in parallel. In this paper, patch-based training is proposed instead. To account for the non-uniform distribution of quality across a scene, a weighted pooling of the patch scores is applied. This pooling relies on natural scene statistics in addition to perceptual properties related to immersive environments.
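The final step, turning per-patch predictions into one image score, is a weighted mean. The sketch below shows that pooling with an illustrative weight: the cosine of each patch's latitude on the sphere, which down-weights patches near the poles of an equirectangular image. This cosine weight is a stand-in assumption, not the paper's weighting, which the abstract says is derived from natural scene statistics and perceptual properties; the function names are hypothetical.

```python
import math
from typing import Sequence


def latitude_weight(latitude_deg: float) -> float:
    """Hypothetical weight: cosine of the patch-centre latitude (equator = 1, poles = 0)."""
    return math.cos(math.radians(latitude_deg))


def weighted_pool(patch_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-patch quality scores."""
    if len(patch_scores) != len(weights):
        raise ValueError("one weight per patch is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(patch_scores, weights)) / total
```

For example, pooling scores `[3.0, 0.0]` from patches at latitudes 0 and 60 degrees gives weights of about 1 and 0.5, so the image score is about 2.0 rather than the unweighted 1.5. Any other per-patch weight, such as one computed from local scene statistics, can be passed to `weighted_pool` unchanged.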