Facebook contest shows just how hard it is to detect deepfakes

June 12, 2020, 3:00 PM UTC

Facebook has revealed the winners of a competition for software that can detect deepfakes, doctored videos created using artificial intelligence. And the results show that researchers are still mostly groping in the dark when it comes to figuring out how to automatically identify them before they influence the outcome of an election or spark ethnic violence.

The best algorithm in Facebook’s contest could accurately determine if a video was real or a deepfake just 65% of the time.

Facebook chief technology officer Mike Schroepfer said even an imperfect system could help Facebook by deterring less technically skilled people from trying to post malicious deepfakes on the social network and by easing the burden on the company’s human content reviewers.

The winning algorithm was created by Selim Seferbekov, a machine learning engineer at a company called Mapbox, in Belarus, according to his LinkedIn profile. He beat out more than 2,100 other contestants, who submitted more than 3,500 detection systems, to win the $500,000 top prize. As part of the competition, Seferbekov must make his algorithm publicly available.

Schroepfer said that Facebook’s own engineers would likely borrow ideas and techniques from Seferbekov’s detection algorithm but would not put the same software into production, partly to prevent people from being able to figure out how to trick the system.

The second-place team, which called itself WM, consisted of three graduate machine learning researchers affiliated with the University of Science and Technology of China, in Hefei. Third place went to Azat Davletshin, a senior deep learning engineer at the Russian company NtechLab, which is known for its work on facial recognition.

Deepfakes first came to public attention in 2017 when Reddit users, including one who went by the handle “deepfake,” began using cutting-edge A.I. techniques to put the heads of various celebrities on the bodies of actresses in pornographic films. Since then, they have mostly been used to create short, humorous videos, public service ad campaigns, and, more darkly, revenge porn.

But many people fear it is only a matter of time before the manipulated videos, which can be created using simple and cheap off-the-shelf software that requires no technical skills to use, are deployed as part of political disinformation campaigns.

That’s why Schroepfer said he wanted to encourage the development of tools for automatically detecting the fake videos. “It is not currently a big issue,” he said of the use of deepfakes on Facebook’s social media platforms. “But the lesson I learned the hard way over the past few years is that I want to be prepared in advance and not get caught flat-footed.”

He also said he was frustrated that A.I. researchers were spending a lot of time and energy developing techniques for creating ever-more realistic fake videos while spending almost no time at all on systems to detect them.

In September 2019, Facebook teamed with Microsoft, Amazon Web Services, an industry ethics body, and several universities to announce the competition, backed by $1 million in total prize money.

It hired more than 3,000 actors to create a data set of 115,000 10-second videos, and applied a variety of deepfake techniques to about 85% of them. It then released this data for competitors to train and test their algorithms.

The algorithms were evaluated on two separate test sets. The first, a public test set, consisted of 4,000 video clips that Facebook produced, half of which were deepfakes; none of the clips were part of the training data set. The winning algorithm, however, was ultimately chosen by how well it performed on a second, private data set that contestants could not access and that included 10,000 videos, half of which were deepfakes. In this case, half of the videos (both deepfakes and benign ones) were culled from the Internet rather than created specifically for the competition.

While several algorithms achieved scores above 82% accuracy on the public test set, their accuracy dropped significantly on the private test set, an indication that there were probably subtle differences between the videos Facebook created for the competition and genuine deepfakes that the algorithms couldn’t handle.
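To make the accuracy figures concrete, here is a minimal Python sketch of how such a score, and the gap between two test sets, might be computed. It is an illustration only and not the contest's scoring code: the `accuracy` function, the variable names, and the ten-clip label lists are all hypothetical toy values, chosen so that half of each set is deepfakes.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of videos classified correctly (1 = deepfake, 0 = real)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("cannot score an empty test set")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data (not contest results): 10 clips per set, half deepfakes.
public_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
public_preds = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]   # 8 of 10 correct

private_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
private_preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]  # 6 of 10 correct

public_acc = accuracy(public_labels, public_preds)
private_acc = accuracy(private_labels, private_preds)

# The difference is one simple way to express how much worse a detector
# does on clips that differ from the ones it was tuned against.
gap = public_acc - private_acc

print(f"public: {public_acc:.0%}, private: {private_acc:.0%}, gap: {gap:.0%}")
```

Because both toy sets are evenly split between real and fake clips, a detector that guessed at random would land near 50%, which is a useful baseline for reading the scores reported here.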

But the algorithms that scored well on the public test set also tended to score best on the private test set, which Facebook’s researchers said was an indication that the same detection techniques would work in the real world, including on deepfakes created using new methods that the detection systems had not seen examples of during training.

As for the less-than-mind-blowing performance of the A.I. detection systems at the moment, “that just reinforces that building systems that generalize to unseen deepfake generation techniques is still a hard and open research problem,” the company said.

Facebook said that most of the top algorithms worked by analyzing the content of the images, trying to detect subtle aberrations in the faces or parts of the faces in the videos.

The company also said in a blog post about the competition that it was notable that none of the winners attempted to use what are called “digital forensic” techniques, such as looking for the so-called “digital fingerprints” that cameras leave behind on the images they capture and that are absent from images created entirely by software. These telltale signs can include elements in the metadata of a file or subtle markers in the pixels themselves. Human experts who analyze videos for manipulation frequently use these markers.

The fact that the top entrants didn’t use these methods, Facebook said, was either an indication that such methods are not useful for finding deepfakes or that top machine learning experts are not familiar enough with these digital forensic signatures to use them in their detection systems.
