How programmers are using AI to make deepfakes — and even detect them
Autoencoders
Deep learning algorithms come in different flavors. Many people think deepfakes are created with generative adversarial networks (GANs), a type of deep learning algorithm that learns to generate realistic images from noise. And it's true that some GAN variants can create deepfakes.
But the main type of neural network used in deepfakes is the “autoencoder.” An autoencoder is a special type of deep learning algorithm that performs two tasks. First, it encodes an input image into a small set of numerical values. (In reality, it could be any other type of data, but since we’re talking about deepfakes, we’ll stick to images.) The encoding is done through a series of layers that start with many variables and gradually become smaller until they reach a “bottleneck” layer. The bottleneck layer contains the target number of variables.
Next, the neural network decodes the data in the bottleneck layer and recreates the original image.
During the training, the autoencoder is provided with a series of images. The goal of the training is to find a way to tune the parameters in the encoder and decoder layers so that the output image is as similar to the input image as possible.
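The encode-bottleneck-decode loop described above can be sketched in a few lines. This is a deliberately tiny stand-in, not a real deepfake model: it uses one linear layer each for the encoder and decoder, random toy "images" instead of face crops, and plain gradient descent on the reconstruction error. Real deepfake autoencoders stack many convolutional layers, but the training goal is the same: make the output match the input as closely as possible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for face images: 200 "images" of 16 pixels, all lying in
# a 2-D subspace, so a 2-value bottleneck can describe them fully.
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 16)) * 0.25
X = latent @ basis

# One linear layer each for encoder and decoder. Real deepfake
# autoencoders use many (convolutional) layers, but the principle of
# squeezing through a bottleneck and expanding back is the same.
W_enc = rng.normal(scale=0.1, size=(16, 2))  # pixels -> bottleneck
W_dec = rng.normal(scale=0.1, size=(2, 16))  # bottleneck -> pixels

mse_before = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

lr = 0.1
for _ in range(3000):
    code = X @ W_enc        # encode into the bottleneck
    err = code @ W_dec - X  # per-pixel reconstruction error
    # Gradient descent on mean squared reconstruction error
    grad_dec = code.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse_after = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(mse_before, mse_after)  # the error shrinks dramatically
```

Because the toy data really does live in two dimensions, the 2-value bottleneck loses almost nothing; a face dataset behaves similarly once the network learns which facial features matter.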
The narrower the problem domain, the more accurate the results of the autoencoder become. For instance, if you train an autoencoder only on images of your own face, the neural network will eventually find a way to encode the features of your face (mouth, eyes, nose, etc.) in a small set of numerical values and use them to recreate your image with high accuracy.
You can think of an autoencoder as a super-smart compression-decompression algorithm. For instance, you can run an image through the encoding half of the neural network and use the bottleneck representation for compact storage or fast transfer over a network. When you want to view the image, you only need to run the encoded values through the decoding half to restore it to its original state.
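To make the compression framing concrete, here is a back-of-the-envelope calculation. The numbers (a 64x64 RGB crop and a 512-value bottleneck) are illustrative choices, not figures from any particular deepfake tool:

```python
# A 64x64 RGB face crop holds 64 * 64 * 3 = 12,288 values.
pixels = 64 * 64 * 3

# Suppose the bottleneck layer holds 512 values (an illustrative size).
bottleneck = 512

# The encoded representation is then 24x smaller than the raw pixels.
ratio = pixels / bottleneck
print(ratio)
```

Unlike a generic codec, this "compression" only works for the kind of data the network was trained on, which is exactly why it captures face-specific features so well.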
But there are other things that the autoencoder can do. For instance, you can use it for noise reduction or generating new images.
Deepfake autoencoders
Deepfake applications use a special configuration of autoencoders. In fact, a deepfake generator uses two autoencoders, one trained on the face of the actor and another trained on the target.
After the autoencoders are trained, you switch their outputs, and something interesting happens. The autoencoder of the target takes video frames of the target, and encodes the facial features into numerical values at the bottleneck layer. Then, those values are fed to the decoder layers of the actor autoencoder. What comes out is the face of the actor with the facial expression of the target.
In a nutshell, the autoencoder grabs the facial expression of one person and maps it onto the face of another person.
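The swap itself is mechanically simple once both networks are trained. The sketch below uses the same toy linear-layer stand-ins as before (random weight matrices in place of trained deep networks) purely to show the wiring: the target's encoder feeds the actor's decoder.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 8  # flattened "face" size; real frames are vastly larger

# Stand-ins for two trained autoencoders, each an (encoder, decoder)
# pair. In a real deepfake these are deep networks trained separately
# on face crops of each person; random matrices here just show shapes.
enc_target, dec_target = rng.normal(size=(D, 3)), rng.normal(size=(3, D))
enc_actor, dec_actor = rng.normal(size=(D, 3)), rng.normal(size=(3, D))

frame_of_target = rng.normal(size=(D,))

# Normal reconstruction: the target's own encoder and decoder.
recon = frame_of_target @ enc_target @ dec_target

# The deepfake trick: encode the target's frame, but decode with the
# ACTOR's decoder. Expression from the target, face from the actor.
swapped = frame_of_target @ enc_target @ dec_actor

print(recon.shape, swapped.shape)  # both come out as full frames
```

A detail worth knowing: many deepfake tools actually share one encoder between the two faces and train only separate decoders, which helps the bottleneck values mean the same thing (pose, expression) for both people.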
Training the deepfake autoencoder
The concept behind deepfakes is simple. But training them requires considerable effort. Say you want to create a deepfake version of Forrest Gump that stars John Travolta instead of Tom Hanks.
First, you need to assemble the training datasets for the actor (John Travolta) and the target (Tom Hanks) autoencoders. This means gathering thousands of video frames of each person and cropping them to show only the face. Ideally, you'll include images from different angles and lighting conditions so your neural networks can learn to encode and transfer different nuances of the faces and the environments. So you can't just take one video of each person and crop its frames. You'll have to use multiple videos. There are tools that automate the cropping process, but they're not perfect and still require manual effort.
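One small piece of that pipeline can be sketched without any video libraries: deciding which frames to pull from each clip. The helper below is hypothetical (not from any deepfake tool) and simply samples every Nth frame across multiple videos so the dataset spans whole clips rather than a single moment; the face detection and cropping would be a separate step with a face-detection library.

```python
# Hypothetical helper: given each source video's frame count, pick
# every Nth frame so the crops cover the full range of poses and
# lighting in every clip.
def frames_to_sample(frame_counts, every_n=10):
    samples = []
    for video_idx, n_frames in enumerate(frame_counts):
        for frame_idx in range(0, n_frames, every_n):
            samples.append((video_idx, frame_idx))
    return samples

# Three clips of 100, 50, and 30 frames -> 10 + 5 + 3 sampled frames
picks = frames_to_sample([100, 50, 30], every_n=10)
print(len(picks))  # 18
```

Sampling at intervals, instead of taking consecutive frames, also avoids flooding the dataset with near-duplicate images, which would waste training time without teaching the network anything new.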
The need for large datasets is why most deepfake videos you see target celebrities. You can’t create a deepfake of your neighbor unless you have hours of videos of them in different settings.
After gathering the datasets, you'll have to train the neural networks. If you know how to code machine learning algorithms, you can create your own autoencoders. Alternatively, you can use a deepfake application such as Faceswap, which provides an intuitive user interface and shows the progress of the AI model as the training of the neural networks proceeds.
Depending on the type of hardware you use, the deepfake training and generation can take from several hours to several days. Once the process is over, you’ll have your deepfake video. Sometimes the result will not be optimal and even extending the training process won’t improve the quality. This can be due to bad training data or choosing the wrong configuration of your deep learning models. In this case, you’ll need to readjust the settings and restart the training from scratch.
In other cases, there are minor glitches and artifacts that can be smoothed out with some VFX work in Adobe After Effects.
In any case, at their current stage, deepfakes are not a one-click process. They've become a lot better, but they still require a good deal of manual effort.
Detecting deepfakes
Manipulated videos are nothing new. Movie studios have been producing them for decades. But previously, they required tremendous effort from experts and access to expensive studio gear. While still not trivial, deepfakes put video manipulation at the disposal of everyone. Basically, anyone who has a few hundred dollars to spare and the nerves to go through the process can create a deepfake from their own basement.
Naturally, deepfakes have become a source of worry and are perceived as a threat to public trust. Government agencies, academic research labs, and social media companies are all engaged in efforts to build tools that can detect AI-doctored videos.
Facebook is looking into deepfake detection to prevent the spread of fake news on its social network. The Defense Advanced Research Projects Agency (DARPA), the research arm of the U.S. Department of Defense, has also launched an initiative to stop deepfakes and other automated disinformation tools. And Microsoft has recently launched a deepfake detection tool ahead of the U.S. presidential elections.
AI researchers have already developed various tools to detect deepfakes. For instance, earlier deepfakes contained visual artifacts such as unblinking eyes and unnatural skin color variations. One tool flagged videos in which people didn't blink or blinked at abnormal intervals.
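The blink-interval idea boils down to a simple rule over timestamps. The sketch below is an illustration of that logic, not the published tool: the thresholds are invented for the example, and in a real detector the blink timestamps themselves would come from a computer vision model watching the eyes.

```python
# Sketch of an interval check a blink-based detector might run.
# Timestamps are in seconds; humans typically blink every few seconds,
# so the min_gap/max_gap thresholds here are illustrative guesses.
def suspicious_blinks(blink_times, min_gap=0.5, max_gap=15.0):
    if len(blink_times) < 2:
        return True  # barely blinking at all is itself a red flag
    gaps = [b - a for a, b in zip(blink_times, blink_times[1:])]
    return any(g < min_gap or g > max_gap for g in gaps)

print(suspicious_blinks([3.1, 7.4, 12.0, 18.5]))  # False: normal rhythm
print(suspicious_blinks([2.0, 40.0]))             # True: 38 s gap
```

Heuristics like this are brittle by design, which is part of why they stopped working once deepfake generators learned to render blinking.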
Another, more recent method uses deep learning algorithms to detect signs of manipulation at the edges of objects in images. A different approach is to use blockchain to establish a database of signatures of confirmed videos and apply deep learning to compare new videos against the ground truth.
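The signature-database approach can be illustrated in miniature. This sketch takes liberties for clarity: a plain SHA-256 hash stands in for whatever learned fingerprint such a system would use, and an ordinary Python set stands in for the blockchain-backed ledger of confirmed videos.

```python
import hashlib

# A ledger of fingerprints of confirmed, authentic videos. In the
# scheme described above this would live on a blockchain; a set is
# a stand-in for illustration.
confirmed = {hashlib.sha256(b"original broadcast footage").hexdigest()}

def matches_ground_truth(video_bytes: bytes) -> bool:
    """Check a video's fingerprint against the ledger of confirmed ones."""
    return hashlib.sha256(video_bytes).hexdigest() in confirmed

print(matches_ground_truth(b"original broadcast footage"))  # True
print(matches_ground_truth(b"doctored footage"))            # False
```

An exact hash breaks the moment a video is re-encoded, which is why the real proposal pairs the ledger with deep learning: a learned comparison can match a video to its original even after compression or cropping.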
But the fight against deepfakes has effectively turned into a cat-and-mouse chase. As deepfakes constantly get better, many of these tools lose their effectiveness. As one computer vision professor told me last year: "I think deepfakes are almost like an arms race. Because people are producing increasingly convincing deepfakes, and someday it might become impossible to detect them."
This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech and what we need to look out for. You can read the original article here.
Story by Ben Dickson
Ben Dickson is the founder of TechTalks. He writes regularly about business, technology and politics. Follow him on Twitter and Facebook.