Monday 17 January 2022
IS&T Welcome & PLENARY: Quanta Image Sensors: Counting Photons Is the New Game in Town
07:00 – 08:10
The Quanta Image Sensor (QIS) was conceived as a different kind of image sensor: one that counts photoelectrons one at a time using millions or billions of specialized pixels read out at a high frame rate, with computational imaging used to create grayscale images. QIS devices have been implemented both in a baseline room-temperature CMOS image sensor (CIS) technology without avalanche multiplication and with SPAD arrays. This plenary details the QIS concept, how it has been implemented in CIS and in SPADs, and the major differences between the two. It also discusses applications that this technology could disrupt or enable, including smartphones, where CIS-QIS technology could be employed in just a few years.
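To make "computational imaging used to create grayscale images" concrete, here is a minimal sketch of image formation from single-bit jots. It assumes an idealized jot with no read noise or dark current, Poisson photoelectron arrivals with mean exposure H per jot per frame (so a jot reads 1 with probability 1 - e^(-H)), and simple block pooling rather than any particular reconstruction algorithm. The function names and parameters are hypothetical.

```python
import numpy as np

def simulate_jots(exposure, shape, rng):
    """Idealized single-bit jots: each jot reads 1 if it collected >= 1 photoelectron.
    Photoelectron counts per jot per frame are Poisson with mean `exposure`."""
    counts = rng.poisson(exposure, size=shape)
    return (counts >= 1).astype(np.uint8)

def jots_to_image(jots, k):
    """Form a grayscale image from a (frames, rows, cols) jot stack by pooling each
    frames x k x k cube and inverting P(1) = 1 - exp(-H) to estimate exposure H."""
    t, rows, cols = jots.shape
    cubes = jots[:, : rows - rows % k, : cols - cols % k]
    cubes = cubes.reshape(t, rows // k, k, cols // k, k)
    p = cubes.mean(axis=(0, 2, 4))                 # fraction of jots that fired
    p = np.clip(p, 0.0, 1.0 - 1e-9)                # avoid log(0) in saturated cubes
    return -np.log1p(-p)                           # H = -ln(1 - p)

rng = np.random.default_rng(0)
jots = simulate_jots(0.5, (256, 64, 64), rng)      # 256 binary frames of 64x64 jots
image = jots_to_image(jots, 16)                    # 4x4 estimate of exposure
```

Pooling over time and space trades resolution for lower noise; larger cubes give a smoother estimate of H at the cost of fewer output pixels.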
Eric R. Fossum, Dartmouth College (United States)
Eric R. Fossum is best known for inventing the CMOS image sensor “camera-on-a-chip” used in billions of cameras. He is a solid-state image sensor device physicist and engineer whose career has included academic and government research as well as entrepreneurial leadership. At Dartmouth he is a professor of engineering and vice provost for entrepreneurship and technology transfer. Along with three others, Fossum received the 2017 Queen Elizabeth Prize, considered by many to be the Nobel Prize of Engineering, “for the creation of digital imaging sensors”; the prize was presented by HRH Prince Charles. He has been inducted into the National Inventors Hall of Fame and elected to the National Academy of Engineering, and his other honors include a recent Emmy Award. He has published more than 300 technical papers and holds more than 175 US patents. He co-founded several startups as well as the International Image Sensor Society (IISS), serving as its first president. He is a Fellow of IEEE and OSA.
08:10 – 08:40 EI 2022 Welcome Reception
Wednesday 19 January 2022
IS&T Awards & PLENARY: In situ Mobility for Planetary Exploration: Progress and Challenges
07:00 – 08:15
This year saw exciting milestones in planetary exploration: the successful landing of the Perseverance Mars rover, followed by its operation and the successful technology demonstration of the Ingenuity helicopter, the first heavier-than-air aircraft ever to fly on another planetary body. This plenary highlights new technologies used in this mission, including precision landing for Perseverance, a vision coprocessor, new algorithms for faster rover traverse, and the key technologies behind the helicopter. It concludes with a survey of challenges for future planetary mobility systems, particularly for Mars, Earth’s moon, and Saturn’s moon Titan.
Larry Matthies, Jet Propulsion Laboratory (United States)
Larry Matthies received his PhD in computer science from Carnegie Mellon University (1989) before joining JPL, where he has supervised the Computer Vision Group for 21 years and has spent the past two years coordinating internal technology investments in the Mars office. His research interests include 3-D perception, state estimation, terrain classification, and dynamic scene analysis for autonomous navigation of unmanned vehicles on Earth and in space. He has been a principal investigator in many programs involving robot vision and has initiated new technology developments that have impacted every US Mars surface mission since 1997, including visual navigation algorithms for rovers, map matching algorithms for precision landers, and autonomous navigation hardware and software architectures for rotorcraft. He is a Fellow of the IEEE and a co-recipient of the 2008 IEEE Robotics and Automation Award for his contributions to robotic space exploration.
Media Watermarking, Security, and Forensics 2022 Posters
08:20 – 09:20
Interactive poster session for all conference authors and attendees.
P-23: Robust face recognition: How much face is needed?, Niklas Bunzel, Fraunhofer Institute for Secure Information Technology (Germany)
Face recognition systems are used in high-security applications for identification, authentication, and authorization. They therefore need to be robust not only to people wearing face accessories and masks, as during the COVID-19 pandemic, but also to adversarial attacks. We have identified three inconspicuous facial areas where an attacker could wear adversarial examples to fool face recognition: the mouth-nose region, the forehead, and the eye area. In this paper, we address how much of a face needs to be present for successful identification and whether removing these critical regions is a viable countermeasure against adversarial examples.
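The countermeasure of removing critical regions can be sketched as blanking parts of an aligned face crop before it reaches the recognizer. This is a minimal sketch under stated assumptions: the 112x112 crop size, the region names, and the rectangle coordinates in `REGIONS` are hypothetical and would in practice depend on the face alignment the recognizer uses.

```python
import numpy as np

# Hypothetical regions for a 112x112 aligned face crop, as (row0, row1, col0, col1).
# Real coordinates would depend on the alignment used by the recognizer.
REGIONS = {
    "forehead": (0, 30, 15, 97),
    "eyes": (30, 55, 10, 102),
    "mouth_nose": (55, 105, 25, 87),
}

def remove_regions(face, names, fill=0):
    """Return a copy of an aligned face with the named regions blanked out."""
    out = face.copy()
    for name in names:
        r0, r1, c0, c1 = REGIONS[name]
        out[r0:r1, c0:c1] = fill  # broadcasts over the channel axis if present
    return out

face = np.ones((112, 112, 3), dtype=np.uint8)
probe = remove_regions(face, ["eyes"])
```

Identification accuracy could then be compared between masked and unmasked probes for each combination of removed regions.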
P-24: Using a GAN to generate adversarial examples to facial image recognition, Andrew Merrigan and Alan Smeaton, Dublin City University (Ireland)
Images posted online raise a privacy concern because they may be used as reference examples for facial recognition systems, most of which are now based on deep neural networks. Such use violates people’s privacy rights, yet it is very difficult to counter. It is well established that adversarial example images can be crafted against recognition systems based on deep neural networks, and such examples can disrupt the usefulness of an individual’s images as reference examples or training data. In this work, we use a Generative Adversarial Network (GAN) to create adversarial examples that deceive facial recognition, and we achieve an acceptable success rate in fooling the face recognition system. We reduce the training time of the GAN by removing the discriminator component. Furthermore, our results show that knowledge distillation can drastically reduce the size of the resulting model without impacting performance.
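Without a discriminator, the generator must be trained on an objective built only from the face recognizer's response and a constraint on perturbation size. The sketch below shows one assumed form of such a loss, not the authors' formulation: it rewards low cosine similarity between the embeddings of the clean and perturbed images and adds a hinge penalty when the perturbation's RMS exceeds a hypothetical `bound`. The weight `lam` is also illustrative.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def generator_loss(emb_clean, emb_adv, perturbation, bound=0.03, lam=10.0):
    """Discriminator-free generator objective (assumed form).
    emb_clean / emb_adv: recognizer embeddings of x and x + G(x).
    perturbation: the generator output G(x)."""
    adv_term = cosine(emb_clean, emb_adv)          # lower = identity less recognizable
    rms = float(np.sqrt(np.mean(perturbation ** 2)))
    pert_term = max(0.0, rms - bound)              # hinge: no penalty below the bound
    return adv_term + lam * pert_term
```

Minimizing this loss over the generator's parameters pushes perturbed faces away from their true identity while keeping the change small; in a real setup the gradients would come from an autodiff framework.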
Monday 24 January 2022
Video and Image Authentication
Adnan Alattar, Digimarc Corporation (United States) and Nasir Memon, New York University (United States)
08:30 – 09:35
A video auditing system for display-based voting machines, Scott A. Craver and Gurinder Bal, Binghamton University (United States)
The use of general-purpose computers as touch-screen voting machines has created several difficult auditing problems. If voting machines are compromised by malware, they can adapt their behavior to evade testing and auditing, and paper trails are produced by printing devices under the untrusted machine’s control. In this paper, we outline and demonstrate a prototype device that audits a voting machine through screen capture, sampling the HDMI signal passed from the computer to the display. This relies on a standard requiring a compliant voting machine to display signal markers on the summary pages before a vote is cast; compliance is enforced by alerting the voter with a visual and audible signal while the screen is captured and archived. This direct feedback to the voter prevents a compromised machine from silently failing to invoke the device. We discuss the design and prototype of this system and possible avenues of attack against it.
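The marker-triggered capture can be sketched as a check on each sampled frame. In this minimal sketch, the marker is assumed to be an 8x8 black-and-white checkerboard in the top-left corner of a grayscale frame; the actual marker format is defined by the standard the paper proposes and is not reproduced here. `MARKER`, `has_marker`, and `audit` are hypothetical names.

```python
import numpy as np

# Hypothetical marker: an 8x8 checkerboard (0/255) in the top-left corner
# of the summary page. The paper's real marker format is not specified here.
MARKER = (np.indices((8, 8)).sum(axis=0) % 2 * 255).astype(np.uint8)

def has_marker(frame, tol=16):
    """True if the top-left 8x8 block of a grayscale frame matches MARKER
    within `tol` gray levels per pixel (to absorb minor signal variation)."""
    block = frame[:8, :8].astype(int)
    return bool(np.all(np.abs(block - MARKER.astype(int)) <= tol))

def audit(frames):
    """Return indices of frames to archive, i.e. those showing a marked summary page.
    In the described device, a detection would also trigger the voter alert."""
    return [i for i, f in enumerate(frames) if has_marker(f)]

blank = np.zeros((1080, 1920), dtype=np.uint8)
marked = blank.copy()
marked[:8, :8] = MARKER
```

Because the alert is visible to the voter, a summary page shown without the marker would be noticeable, which is what lets the device enforce compliance.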
Forensic data model for artificial intelligence based media forensics - Illustrated on the example of DeepFake detection, Dennis Siegel, Christian Krätzer, Stefan Kiltz, and Jana Dittmann, Otto-von-Guericke University Magdeburg (Germany)
The recent development of AI systems and their frequent use for classification problems pose a challenge from a forensic perspective. Especially in DeepFake detection, black-box approaches such as neural networks are commonly used, so the underlying classification models lack explainability and interpretability. In addition, the use of AI systems is subject to a variety of requirements from different perspectives. To increase traceability while taking these requirements into account, this work presents an extension of the classical signal processing pipeline for video-based authentication. Here, AI is assumed to serve as decision support, which requires some form of visualization. The designed model focuses on documenting each individual processing step, taking into account the data types that occur at each step.
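A data model that documents each processing step together with its data types might look roughly like the following. This is an illustrative sketch only: the class names, fields, and the rule that each step must consume the previous step's output type are assumptions, not the paper's model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProcessingStep:
    """One documented step in the forensic pipeline (fields are illustrative)."""
    name: str            # e.g. "frame extraction", "face detection"
    input_type: str      # data type consumed by the step
    output_type: str     # data type produced by the step
    parameters: dict = field(default_factory=dict)

@dataclass
class ForensicRecord:
    """Ordered, type-checked log of the steps applied to one piece of evidence."""
    case_id: str
    steps: List[ProcessingStep] = field(default_factory=list)

    def add(self, step: ProcessingStep) -> None:
        # Keep the chain traceable: each step must consume what the previous produced.
        if self.steps and self.steps[-1].output_type != step.input_type:
            raise ValueError(f"type mismatch before step {step.name!r}")
        self.steps.append(step)

rec = ForensicRecord("case-1")
rec.add(ProcessingStep("frame extraction", "video", "frames"))
rec.add(ProcessingStep("face detection", "frames", "face crops"))
```

Recording parameters alongside types makes it possible to reconstruct later how an AI decision-support output was produced.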
Smartphone-supported integrity verification of printed documents, Waldemar Berchtold, Dani El-Soufi, and Martin Steinebach, Fraunhofer Institute for Secure Information Technology (Germany)
This work discusses document security, the use of OCR, and integrity verification for printed documents. Because the documents in question usually contain sensitive personal data, the most suitable solution is one that does not require storing all of the data in a database. To allow anyone to perform verification, all the data required for it must be contained on the document itself. The approach must also cope with different layouts, so that the layout does not have to be adapted for each document. We present a concept and its implementation that allow any smartphone user to verify the authenticity and integrity of a document.
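Keeping all verification data on the document itself can be sketched as printing a digest of the document's normalized fields (for example, inside a 2D barcode) and recomputing it from OCR output. This is a minimal sketch under stated assumptions: the field names, the whitespace normalization, and the use of a bare SHA-256 digest are illustrative. A plain digest only detects changes by someone who cannot also reprint the code; a deployment that lets anyone verify authenticity would presumably sign the digest with a private key and verify it with a public key.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Collapse runs of whitespace so OCR spacing differences do not change the digest."""
    return re.sub(r"\s+", " ", text).strip()

def make_payload(fields: dict) -> str:
    """Digest to be printed on the document, e.g. inside a 2D barcode (assumed)."""
    canonical = "|".join(f"{k}={normalize(v)}" for k, v in sorted(fields.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify(ocr_fields: dict, payload: str) -> bool:
    """Recompute the digest from OCR-extracted fields and compare with the printed one."""
    return make_payload(ocr_fields) == payload

payload = make_payload({"name": "Jane  Doe", "id": "A123"})
```

Sorting the fields makes the digest independent of where each field appears on the page, which is one way to tolerate different layouts.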
Jennifer Newman, Iowa State University (United States)
10:00 – 11:00
Enhancing PRNU-based image forensics with a non-parametric correlation predictor based on locally weighted regression, Sujoy Chakraborty, Stockton University (United States)
For PRNU-based image manipulation localization, the correlation predictor plays a crucial role in considerably reducing false positives and increasing localization accuracy. In this paper, we propose a novel correlation predictor based on a non-parametric learning algorithm, locally weighted regression. Instead of fitting a global set of model parameters, a non-parametric learning algorithm fits a model dynamically by sampling the training set based on the query-image pixel at which the correlation is to be predicted. Our experimental results suggest that building a model dynamically, based on the distance of training examples from the query pixel in feature space, helps predict the correlation more accurately. Experimental results on benchmark datasets indicate that integrating the new predictor significantly improves the accuracy of the predicted correlation, as well as the image manipulation localization performance of PRNU-based forensic detectors.
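Locally weighted regression can be sketched in a few lines: for each query, solve a weighted least-squares problem in which training examples closer to the query in feature space get larger weights. The Gaussian kernel and bandwidth `tau` below are common choices assumed for illustration; the features describing each pixel (and the paper's exact weighting) are not specified here.

```python
import numpy as np

def lwr_predict(X_train, y_train, x_query, tau=1.0):
    """Locally weighted linear regression prediction at x_query.
    X_train: (n, d) features, y_train: (n,) targets (e.g. observed correlations)."""
    y_train = np.asarray(y_train, dtype=float)
    X_train = np.asarray(X_train, dtype=float).reshape(len(y_train), -1)  # (n, d)
    x_query = np.asarray(x_query, dtype=float).reshape(-1)                 # (d,)
    d2 = np.sum((X_train - x_query) ** 2, axis=1)      # squared feature-space distances
    w = np.exp(-d2 / (2 * tau ** 2))                   # Gaussian weights
    A = np.column_stack([np.ones(len(X_train)), X_train])  # intercept column
    Aw = A * w[:, None]
    # Weighted normal equations: (A^T W A) theta = A^T W y, fitted anew for this query
    theta = np.linalg.lstsq(Aw.T @ A, Aw.T @ y_train, rcond=None)[0]
    return float(np.concatenate([[1.0], x_query]) @ theta)

X = np.linspace(0, 10, 50)
y = 2 * X + 1
```

Unlike a global regressor, the parameters `theta` are recomputed for every query pixel, which is what makes the method non-parametric and also more expensive at prediction time.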
Comparative study of DL-based methods performance for camera model identification with multiple databases, Alexandre Berthet and Jean-Luc Dugelay, EURECOM (France)
Camera identification is an important topic in digital image forensics. There are three levels of classification: brand, model, and device. Studies in the literature focus mainly on camera model identification and are increasingly based on deep learning (DL). DL-based methods follow three approaches: basic (model classification only), triple (brand, model, and device identification), and open-set (known and unknown cameras). Unlike other areas of image processing such as face recognition, most of these methods are evaluated on only a single database (Dresden). However, each available database has its own diversity of camera content and distribution, which makes evaluation on a single database inadequate. We therefore use several publicly available databases (Dresden, SOCRatES, and Forchheim) that together contain enough elements for a viable comparison of DL-based methods for camera model identification. In addition, to overcome the disparity of approaches among the methods in the literature, we adopt the basic approach for camera model identification. We also use transfer learning (specifically, fine-tuning) to perform our comparative study across databases.
NoiseSeg: An image splicing localization fusion CNN with noise extraction and error level analysis branches, Karol Gotkowski, Huajian Liu, and Martin Steinebach, Fraunhofer Institute for Secure Information Technology (Germany)
Detecting and localizing image splicing is a very practical but challenging task in image forensics. In this paper, we propose a novel CNN-based image splicing detector called NoiseSeg, designed to detect and localize spliced image regions reliably and efficiently. In NoiseSeg, the detection and localization of spliced image regions are treated as a segmentation problem. Statistical and CNN-based forensic features are combined and fed into a feature pyramid network, which is trained to pick up noise artifacts, patterns, and statistical abnormalities in order to identify spliced image regions. Experimental results on several popular image splicing datasets show that NoiseSeg significantly outperforms most other state-of-the-art statistical and CNN-based methods.
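A noise-extraction branch typically starts from a high-pass residual that suppresses image content and keeps noise-like traces, where spliced regions may differ statistically from the host image. The sketch below uses a 3x3 high-pass kernel of the kind used in rich-model steganalysis as an illustrative stand-in; NoiseSeg's actual noise-extraction filters are not specified in this abstract.

```python
import numpy as np

# A 3x3 high-pass kernel (coefficients sum to zero). Illustrative stand-in only,
# not necessarily the filter used in NoiseSeg.
KERNEL = np.array([[-1, 2, -1],
                   [ 2, -4, 2],
                   [-1, 2, -1]], dtype=float) / 4.0

def noise_residual(gray):
    """High-pass residual of a 2D grayscale image (same shape, edge-padded borders)."""
    arr = np.asarray(gray, dtype=float)
    h, w = arr.shape
    img = np.pad(arr, 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += KERNEL[dy, dx] * img[dy:dy + h, dx:dx + w]
    return out

flat = np.full((16, 16), 128.0)
spiked = flat.copy()
spiked[8, 8] += 10
```

Because the kernel sums to zero, smooth content is suppressed and the residual is dominated by local deviations, which a downstream network can then learn to classify per pixel.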
Watermarking and Steganography
Nasir Memon, New York University (United States)
15:00 – 16:00
Image montage detection based on image segmentation and robust hashing techniques, Martin Steinebach, Tiberius Berwanger, and Huajian Liu, Fraunhofer Institute for Secure Information Technology (Germany)
We present a montage recognition method that can re-identify both background images and inserted objects. It can also serve as a highly robust method for image re-identification beyond montages. To achieve this, we combine three mechanisms: segmentation, orientation detection, and robust hashing. We show that this approach achieves detection rates similar to those of more complex algorithms based on feature matching, while being significantly more efficient in storage requirements and computational time. We provide test results for various attacks such as rotation, scaling, cropping, addition of noise, and brightness changes.
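The robust hashing step can be illustrated with a simple block-mean hash compared by Hamming distance. This is an assumed stand-in, not the paper's hash function, and it omits the segmentation and orientation-detection stages; `grid` is a hypothetical parameter.

```python
import numpy as np

def block_mean_hash(gray, grid=8):
    """Block-mean robust hash (illustrative): split the image into grid x grid
    blocks and set each bit to whether the block mean exceeds the median block mean."""
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    g = g[: h - h % grid, : w - w % grid]        # crop to a multiple of the grid
    blocks = g.reshape(grid, g.shape[0] // grid, grid, g.shape[1] // grid).mean(axis=(1, 3))
    return (blocks > np.median(blocks)).flatten()

def hamming(a, b):
    """Number of differing bits; small distances indicate the same image content."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(1)
img = rng.uniform(0, 200, (64, 64))
```

Comparing each block against the median makes the bits invariant to uniform brightness shifts, and a 64-bit hash per segment is far cheaper to store and compare than a set of local feature descriptors.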
Image data hiding with multi-scale autoencoder network, Chen-hsiu Huang, National Taiwan University (Taiwan)
Image steganography or watermarking is the process of hiding secrets inside a cover image for communication or proof of ownership. Zhu et al. proposed a deep-learning-based image data hiding technique, the HiDDeN model, which trains encoder and decoder networks jointly with adversarial examples. Their results show visually indistinguishable encoded images from which the decoder can recover the original message, with robustness to noise and image distortions. We improve the HiDDeN model with a multi-scale autoencoder network so that the network learns to embed message bits in a higher-level feature space. Compared with the HiDDeN model, learning to hide secrets in both low-level and high-level image features significantly reduces the bit error rate and improves learning efficiency during training. The robustness of our watermarking scheme against the introduced noise layer is also superior to that of the Conv-BN-ReLU blocks used in the original HiDDeN model. Meanwhile, the downsampling convolution layers of the autoencoder dramatically reduce the number of network parameters required for training, making it possible to train on larger images without dividing them into smaller blocks.
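Two pieces of this setup can be sketched without a deep learning framework: feeding the message to the encoder and measuring the bit error rate. To our understanding, HiDDeN replicates the L-bit message spatially and concatenates it with the encoder's feature map along the channel axis; the sketch below assumes that approach and a 0.5 decision threshold on decoder outputs in [0, 1]. A multi-scale variant would presumably inject the message at several resolutions; that part is not shown. All names and shapes are hypothetical.

```python
import numpy as np

def replicate_message(bits, height, width):
    """Tile an L-bit message into an (L, H, W) tensor so it can be concatenated
    with an encoder feature map of spatial size H x W along the channel axis."""
    m = np.asarray(bits, dtype=np.float32).reshape(-1, 1, 1)
    return np.broadcast_to(m, (m.shape[0], height, width)).copy()

def bit_error_rate(sent, decoded):
    """Fraction of bits recovered incorrectly after thresholding decoder outputs at 0.5."""
    hard = (np.asarray(decoded) > 0.5).astype(int)
    return float(np.mean(hard != np.asarray(sent)))

features = np.zeros((64, 16, 16), dtype=np.float32)   # stand-in encoder features
msg = np.array([1, 0, 1, 1, 0, 0, 1, 0])
x = np.concatenate([features, replicate_message(msg, 16, 16)], axis=0)
```

Here `x` would be the input to the next encoder layer, and `bit_error_rate` is the metric the abstract reports as reduced by the multi-scale design.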