New YouTube Metric Tracks 'Quality Watch Time'

YouTube is changing the way it measures success on the world’s biggest video site following a series of scandals. There’s just one problem: The company is still deciding how this new approach works.

The Google division introduced two new internal metrics in the past two years for gauging how well videos are performing, according to people familiar with the company’s plans. One tracks the total time people spend on YouTube, including the time they spend posting and reading comments (not just the clips they watch). The other is a measurement called “quality watch time,” a squishier statistic with a noble goal: To spot content that achieves something more constructive than just keeping users glued to their phones.

The changes are supposed to reward videos that are more palatable to advertisers and the broader public, and help YouTube ward off criticism that its service is addictive and socially corrosive. Creating the right metric for success could help marginalize videos that are inappropriate, or popular among small but active communities with extreme views. It could also help YouTube make up for previous failures in curbing the spread of toxic content.

YouTube, like other parts of Alphabet’s Google, uses these corporate metrics as goalposts for most business and technical decisions, including how it pays staff and creates critical software like its recommendation system. But the company has yet to settle on how the “quality watch time” metric works, or communicate how the new measure will affect the millions of “creators” who upload videos to the site.

Starting in 2012, YouTube rebuilt its service and business model around “watch time,” a measure of how much time users spent viewing footage. A spokeswoman said the change was made to reduce deceptive “clickbait” clips. Critics inside and outside the company said the focus on “watch time” rewarded outlandish and offensive videos.

YouTube declined to comment on the new metrics, but a spokeswoman said that “there are many metrics that we use to measure success.” The company also did not share whether it has abandoned “watch time.” But its leaders have said repeatedly that they are addressing its content problem. They have stressed that they want to do more than punish people who upload or spread nasty videos. Executives there recently began talking about rewarding content based on a rubric for responsibility. The company “saw how the bad actions of a few individuals can negatively impact the entire creator ecosystem, and that’s why we put even more focus on responsible growth,” Susan Wojcicki, YouTube’s chief executive officer, wrote in a February blog post.

To date, most of the efforts YouTube has cited publicly are about its recommendation engine, which promotes videos based on their content and viewer behavior. A spokeswoman said YouTube changed the algorithm for that system in 2016 to “focus on satisfaction.” The company gauges this with “several factors,” including viewer surveys, how often people share clips and the “like” and “dislike” buttons on videos.

The two new metrics—tracking total time on site and “quality watch time”—influence a lot more than just YouTube recommendations, according to the people familiar with the plans, who asked not to be identified because the matter was private. The measurements also help dictate how YouTube surfaces videos in search results, runs ads, and pays the creators who make videos.

Deciding what is a “quality” video, or which clips are “responsible,” is difficult even for humans. YouTube is trying to pull off this feat with a combination of software and employees, making the task even harder. It’s a risky change. The video service generates most of its revenue through advertising and the business works best when as many people as possible are spending as much time as possible on YouTube. Hence executives’ obsession with engagement stats. Adding murkier metrics to the mix could crimp ad revenue growth.

Crowd-sourcing the identification of “responsible” videos is particularly tricky. Some popular YouTube stars, such as Logan Paul, upload clips that advertisers and critics see as troubling. But Paul’s fans spend millions of hours watching them. Loyal viewers typically click “like” and give glowing survey responses, particularly for stars. “At a certain point, people are preaching to the choir,” said Becca Lewis, a researcher at Stanford University who studies YouTube.

The YouTube channel Red Ice TV broadcasts political videos with a “pro-European perspective,” which critics label as promoting white supremacy. According to Lewis, its five most recent videos have more than 30 times as many “likes” as “dislikes.” “If the ‘responsibility’ score is based purely on viewer feedback metrics, it is making an assumption that extremist content will be received negatively by its audience, which is very far from the reality,” she said.

Sometimes the opposite happens: Viewers pile on “dislikes” and submit negative survey responses in coordinated sabotage efforts. YouTube felt this pain acutely after the release of its own year-in-review video from December. Legions of viewers hit dislike on YouTube’s originally produced clip to register frustration with the company’s policies for creators.

YouTube declined to share details on how it uses metrics to rank and recommend videos. In a January blog post, the company said it was hiring human reviewers who would train its software based on guidelines that Google’s search business has used for years.

Changes to YouTube’s internal metrics also have long-lasting effects on creators. Some channels lost millions of dollars in ad sales after YouTube stripped ads from videos it deemed questionable, a response to advertiser boycotts that began in March 2017. The producers of these channels complained that YouTube was too opaque about the changes, and that it punished videos that had no problematic content.

And the software YouTube uses to analyze these new metrics may not be good enough—despite Google’s prowess in artificial intelligence techniques such as computer vision and natural language processing. AI systems have not progressed enough to identify the intent of a video based on footage alone, said Reza Zadeh of Matroid, a company that sells software for analyzing video.

Current video analysis software can find every video that shows or discusses the moon landing. But it struggles to immediately decipher whether a video is questioning the landing or spouting other untruths, according to Zadeh, who worked on Google’s automated translation service as a research intern more than a decade ago.

“In general, we’re very good at detecting nouns using computer vision,” said Zadeh. “But we suck at verbs and adjectives.”

A YouTube spokeswoman said the company now relies on people to review these nuances, but declined to say how many workers are dedicated to the task.