AI and Visual Content: Advancing Computer Vision and Combating Deepfakes

The way we use visual content is evolving. In part, this is due to significant recent advances in computer vision, a major branch of artificial intelligence (AI). Let’s consider how it works and look at related fields such as biometrics and AI security.

Inspiration and method behind computer vision

Recognition and synthesis of objects enable AI to manipulate them, creating images and videos that humans cannot distinguish from reality.

Object recognition is preceded by teaching AI to “see” objects: processing and segmenting captured images, detecting objects (including moving ones), and understanding full scenes. For humans, identifying the objects in a scene and understanding the subtle relationships between them takes mere seconds, but AI has to break a 50-second video into 3,000 frames to see and understand it. This is done with the help of computer vision (CV).
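The 3,000-frame figure above is consistent with a frame rate of 60 frames per second. The frame rate is not stated in the text, so treat it as an assumption; the sketch below only shows the arithmetic.

```python
def frame_count(duration_seconds: float, frames_per_second: float) -> int:
    """Number of still frames in a clip of the given length."""
    return round(duration_seconds * frames_per_second)

# Assumption: 60 fps is not given in the article; it is the rate at which
# a 50-second clip yields the quoted 3,000 frames.
print(frame_count(50, 60))  # 3000
```

At a more common 30 frames per second, the same clip would contain half as many frames.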

Computer vision relies on convolutional neural networks (CNNs), which were inspired by the way human vision works. As Dr. Kai-Fu Lee and Chen Qiufan explain in their book “AI 2041: Ten Visions for Our Future,” our visual cortex uses neurons corresponding to multiple receptive fields, which identify basic features and report them to the neocortex, where information is stored hierarchically and processed. CNNs have multiple filters acting as receptive fields, and deep learning decides through optimization what each filter learns. The higher layers of the networks are organized like the neocortex, detecting parts of an object in order to see it and distinguish it from other objects. CNNs were first discussed four decades ago, but in the 1980s there was not enough data or computational power to demonstrate their potential. Today, they are a leading approach to computer vision.
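To make the idea of a filter acting as a receptive field concrete, here is a minimal sketch in plain Python. The `convolve2d` helper, the tiny image, and the `vertical_edge` filter are all illustrative assumptions; in a trained CNN the filter weights are learned through optimization rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# 4x4 grayscale image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]: the filter fires only at the edge
```

The filter only ever looks at a 2x2 patch, yet its responses across the image locate the edge. Stacking many such filters in layers is what lets higher layers respond to parts of objects.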

Computer vision is already in everyday use: facial recognition, autonomous navigation of drones and cars, and in-car driver assistants are some examples. It can also be applied to existing images and videos. Photoshop uses it for many of its functions; other examples include medical image analysis and content moderation. Deepfake-making tools recognize faces in videos and replace them with other faces.

The making of deepfakes

The warning about deepfakes reached large audiences in 2018, when Jordan Peele and BuzzFeed created a deepfake video featuring “President Obama.” The AI technology used Peele’s recorded speech and real videos of Obama to morph the former’s voice into the latter’s and modify Obama’s face. More examples followed in 2019 and 2021: a Chinese app that turned its users into the main characters of famous films with just a selfie, and the Avatarify app, which made people in photos sing or laugh.


Dr. Lee and Mr. Chen use a fictional story, “Gods Behind the Masks,” to illustrate how deepfakes are created. In it, the deepfake video created by the main character escapes exposure by ordinary deepfake-detection software. To get a high-quality deepfake, the character adds lip-synced speech to a preexisting, real video, matching even the speaker’s pulse and breathing patterns.

Deepfakes are generated with generative adversarial networks (GANs). In this mechanism, two deep learning neural networks are locked in a constant battle: the first, called the forger network, tries to generate something that looks real, while the second, the detective network, tries to detect the forgery.

Based on the feedback from the detective network, the forger network retrains itself to minimize the “loss function” – the difference between real images and those it generates. The detective network also retrains, but with the opposite goal – to maximize the “loss function.” Most modern deepfakes are detectable by algorithms, as their quality remains rather low due to the limits of current computational power. The problem is that a GAN has a built-in mechanism for upgrading the forger network, which means it can be retrained to fool both existing and new detection algorithms. More powerful computers can train more complex GANs, which makes the matter one of computational power and turns the process into an arms race.
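The opposing goals described above can be illustrated with the value function commonly used to describe GAN training: the average of log D(x) over real images plus the average of log(1 - D(G(z))) over generated ones, where D is the detective's estimated probability that an image is real. This is a minimal sketch with made-up scores, not a training loop, and the function and variable names are assumptions chosen for illustration.

```python
import math

def gan_value(d_real, d_fake):
    """Mean log D(x) on real images plus mean log(1 - D(G(z))) on forgeries.

    The detective network tries to push this value up;
    the forger network tries to push it down.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical detective scores (probability that an image is real).
# Sharp detective: real images score near 1, forgeries near 0.
sharp = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])

# Fooled detective: the forgeries are also scored as probably real.
fooled = gan_value(d_real=[0.9, 0.95], d_fake=[0.8, 0.9])

print(f"sharp detective:  {sharp:.3f}")
print(f"fooled detective: {fooled:.3f}")
```

The value is lower when the detective is fooled, which is why a forger that minimizes it and a detective that maximizes it keep pushing each other to improve.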

Another approach, 3D, can build a model of a person purely computationally. 3D is a product of computer graphics, which models everything mathematically. Here, breathing patterns and other features such as light and shadows, hands, and the face all need realistic mathematical models. Computational requirements are much higher for this method, and for now, films made this way cannot fool even the human eye.

The danger of deepfakes lies in their potential use with malicious intent, for example for blackmail or election manipulation. A possible solution could be authentication of all photos and videos via blockchain technology, which can guarantee that the original material has not been altered. Another element of successfully combating deepfakes could be legislation that outlaws the making of malicious deepfakes, such as the 2019 California laws against using deepfakes for pornography and against manipulating videos of political candidates near an election.
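The authentication idea rests on cryptographic fingerprints: any change to a file changes its hash. The sketch below uses SHA-256 from Python's standard library, and a plain dictionary stands in for the blockchain. The `ledger`, `register`, and `is_unaltered` names are hypothetical; a real system would record the fingerprint in an append-only distributed ledger at capture time.

```python
import hashlib

def fingerprint(media_bytes: bytes) -> str:
    """SHA-256 digest of the raw file contents, as a hex string."""
    return hashlib.sha256(media_bytes).hexdigest()

# Hypothetical stand-in for an append-only ledger: media id -> fingerprint.
ledger = {}

def register(media_id: str, media_bytes: bytes) -> None:
    """Record the fingerprint of the original material."""
    ledger[media_id] = fingerprint(media_bytes)

def is_unaltered(media_id: str, media_bytes: bytes) -> bool:
    """True only if the bytes match the fingerprint recorded for this id."""
    return ledger.get(media_id) == fingerprint(media_bytes)

original = b"raw bytes of the original video"
register("clip-001", original)

tampered = original + b" with one edited frame"
print(is_unaltered("clip-001", original))  # True
print(is_unaltered("clip-001", tampered))  # False
```

Note that this only proves a file matches what was registered; it says nothing about whether the registered material was genuine in the first place.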

Verifying and protecting our identities

Biometrics is the use of a person’s physical characteristics, such as fingerprints and irises, to verify their identity. This field has made a leap forward thanks to recent advances in deep learning and GANs. AI now outperforms humans in recognizing and verifying a person’s identity, and its accuracy is essentially perfect in situations where many features can be gathered.

As AI goes mainstream, it is bound to suffer from exploitation of its vulnerabilities. Deepfakes are one of them, but there is also the issue of AI’s decision boundaries. AI can be tricked into making mistakes if an attacker manages to camouflage the input data and, for example, trick it into “recognizing” a tank as an ambulance.
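A toy example shows how a small, targeted change can push an input across a decision boundary. The linear classifier, its weights, and the feature values below are invented for illustration; the perturbation moves each feature slightly against the sign of its weight, in the spirit of gradient-sign attacks on real networks.

```python
def classify(weights, bias, features):
    """Linear classifier: a positive score means 'tank', otherwise 'ambulance'."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "tank" if score > 0 else "ambulance"

def perturb(weights, features, epsilon):
    """Shift every feature by at most `epsilon` in the direction that lowers the score."""
    def sign(w):
        return 1 if w > 0 else -1 if w < 0 else 0
    return [x - epsilon * sign(w) for w, x in zip(weights, features)]

# Hypothetical model parameters and input features.
weights = [0.8, -0.5, 0.3]
bias = -0.1
tank = [0.6, 0.4, 0.5]

disguised = perturb(weights, tank, epsilon=0.25)

print(classify(weights, bias, tank))       # tank
print(classify(weights, bias, disguised))  # ambulance
```

No single feature changes by more than 0.25, yet the label flips, because every change is aligned with the direction the classifier is most sensitive to.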

Another possible form of exploitation is “poisoning”: corrupting AI’s learning process by contaminating the training data, models, or process, which leads to systematic failures or enables the perpetrator to take control of the networks.
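Training-data poisoning can be sketched with a deliberately simple detector. The nearest-centroid model, the one-dimensional “artifact score” feature, and all sample values below are assumptions made for illustration; real attacks target far larger models, but the mechanism of mislabeled samples shifting what the model learns is the same.

```python
def train_centroids(samples):
    """Return the mean feature value per label (a nearest-centroid model)."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Assign the label whose centroid is closest to `value`."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical training set: (artifact score, label).
clean = [(1.0, "real"), (1.2, "real"), (0.8, "real"),
         (5.0, "fake"), (5.2, "fake"), (4.8, "fake")]

# The attacker slips in high-artifact samples mislabeled as "real".
poison = [(4.0, "real"), (4.5, "real"), (5.0, "real")]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

suspect = 3.5  # artifact score of a forged video
print(predict(clean_model, suspect))     # fake
print(predict(poisoned_model, suspect))  # real
```

Three contaminated samples are enough to drag the “real” centroid toward the forgeries, so the same video that was flagged before now passes.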

Technological innovation has historically been the answer to the security and reliability questions raised by every new technology. The chapter concludes that, with time and with steps taken to mitigate the threats described above, AI security will be largely achieved.

Sean Promsopeak Nuon
Lead engineer
Sean is technology-driven and passionate about working with technology that helps people. Now he finds himself as an executive member of Slash, executing the technology operation side from an entrepreneurship point of view. He has over 9 years of working experience dealing with technical problems, project management and team mindset building. He splits time between Solution Architect & Lead developer for enterprise clients and as part of the management team, he helps build future-proof architecture, define quality standards, team culture, and hiring & training practices.