
The quest to measure A.I.

August 3, 2021, 8:06 PM UTC

Artificial intelligence is so new that researchers are just now figuring out how companies can best evaluate it and the technologies like computer chips that A.I. relies on.

Semiconductor vendors, for instance, may claim that their computer chips are better than others for powering data training, the important process that “teaches” machine-learning systems to recognize objects in photos. But it’s difficult for companies to assess whether that’s true without independent auditors.

Several efforts are underway, however, to help companies and researchers evaluate A.I.’s performance across different tasks, like data training. One such endeavor is known as MLPerf, a set of software tools and computing methods that help monitor A.I.’s progress using benchmark tests. 

The most recent MLPerf benchmarking test analyzed how different A.I. chips from companies like Nvidia and Google performed at tasks including training a machine-learning model to recognize images in photos. The results are highly technical, but they should help companies decide which A.I. chips are best suited for data training. 
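At its core, benchmarking like this means running a standardized workload and timing it carefully. As a rough illustration only, and not MLPerf's actual methodology, here is a minimal Python sketch that measures training throughput in samples per second, discarding warm-up runs so one-time setup costs don't skew the result; the `benchmark_training_step` helper and the toy workload are hypothetical:

```python
import time

def benchmark_training_step(step_fn, batches, warmup=2):
    """Time step_fn over several batches and report samples per second.

    step_fn: callable that processes one batch (a hypothetical stand-in
    for a real training step); batches: list of (inputs, size) pairs.
    """
    # Warm-up runs: excluded from timing so setup cost doesn't skew results.
    for inputs, _ in batches[:warmup]:
        step_fn(inputs)

    start = time.perf_counter()
    samples = 0
    for inputs, size in batches[warmup:]:
        step_fn(inputs)
        samples += size
    elapsed = time.perf_counter() - start
    return samples / elapsed  # throughput in samples per second

# Toy "training step": summing a list stands in for real model work.
fake_batches = [(list(range(1000)), 1000) for _ in range(10)]
throughput = benchmark_training_step(sum, fake_batches)
```

Real benchmarks like MLPerf add much more rigor, such as fixed model architectures, target accuracy thresholds, and audited result submissions, but the timed-workload core is the same idea.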

Providing more transparency into how different A.I. chips perform at specific tasks is part of the overall goal of the non-profit consortium MLCommons, which oversees MLPerf. The organization was founded in 2018 by corporate and academic players including Google, Intel, AMD, and Harvard University.

David Kanter, executive director at MLCommons, told Fortune that “benchmarks and metrics are about really defining what ‘better’ means.” Because A.I. is so new, there’s no agreed-upon standard for measuring it, as there is for more conventional technologies. Kanter hopes that MLCommons can act as a sort of Switzerland for the A.I. industry.

Because A.I. depends on data to function correctly, MLCommons is also expanding its mission to create datasets for testing A.I. software. One project the group is working on involves curating over 87,000 hours of transcribed speech across several languages, which Kanter hopes will help researchers create more advanced systems that understand languages beyond English, the dominant language in the A.I. realm.

“Let’s take a pretty popular language like Portuguese,” Kanter said. “There’s like 300 million people who speak that language, and there’s not much out there.”

Kanter hopes that MLCommons’s upcoming language dataset, which is to be publicly released later this year, becomes as popular as the ImageNet dataset, which contains 14 million photos that humans annotated with descriptions. The ImageNet dataset, overseen by A.I. luminaries like Stanford University’s Fei-Fei Li, helped spur the modern-day deep-learning renaissance, in which researchers were able to create A.I. systems that can spot dogs in photos, among other tasks. 

If others use the MLCommons-curated language dataset to power their own A.I. tech, MLCommons will be better able to evaluate how those A.I. systems perform, explained MLCommons president Peter Mattson. One problem in evaluating current A.I. systems trained to recognize language is that it’s unclear what data those A.I. systems were trained with. 

It’s important that the “data you use to build the system, and the data used to evaluate the system are drawn from the same source,” Mattson said. 
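Mattson's point about drawing training and evaluation data from the same source is commonly handled by shuffling a single dataset and holding out an evaluation slice. A minimal sketch in plain Python, with hypothetical file names standing in for speech clips:

```python
import random

def split_dataset(samples, eval_fraction=0.2, seed=42):
    """Shuffle one dataset and carve off a held-out evaluation slice,
    so training and evaluation data come from the same source."""
    shuffled = samples[:]              # copy to avoid mutating the input
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - eval_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical speech clips; a fixed seed makes the split reproducible.
utterances = [f"clip_{i}.wav" for i in range(100)]
train, evaluation = split_dataset(utterances)
```

When the training set's origin is unknown, as with many published models, there is no way to guarantee the evaluation data matches it, which is the gap Mattson describes.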

Jonathan Vanian 


Google’s new Pixel 6 smartphones come loaded with A.I. Google previewed its upcoming Pixel 6 smartphones, which the company plans to release in the fall. The new smartphones contain a custom computer circuit called a “system on a chip” that’s tailored for machine-learning tasks like translating languages more quickly and improving the quality of photos taken at night. Google calls the circuit the Tensor SoC, a reference to the Tensor Processing Unit A.I. chips the company uses in its data centers for A.I. training.

An A.I. twist to bug bounty programs. Twitter is inviting people to spot problems in the algorithm it uses to automatically crop images in tweets as part of an “algorithmic bias bounty competition.” The goal is for outside researchers to discover bias problems in Twitter’s tools that the company may overlook. Recently, for instance, Twitter users complained that the company’s image-cropping tool chose to highlight white people in photos more often than people of color. Twitter said it would pay people to discover such bias problems, similar to how companies pay altruistic hackers to spot security flaws in their software.

Consolidation hits A.I. startups. DataRobot, a startup that specializes in tools that help companies build machine-learning software, has raised $300 million and has acquired the A.I. software tool startup Algorithmia, reported tech news website GeekWire. Both DataRobot and Algorithmia focus on selling similar A.I. tools to companies, the report noted. DataRobot has been gobbling up other like-minded startups, such as Nexosis and Nutonian, to help it compete against larger tech vendors like Amazon, Microsoft, and Google that sell machine-learning tools.

Teaching teachers how to use A.I. Several companies are increasingly pitching A.I. tools to teachers to help educators create more personalized learning plans, USA Today reported. One such tool, Thinkster Math, is an online tutoring service that presumably uses machine learning to analyze a student's coursework and determine which areas the student needs to focus on. From the article: As students complete assignments inside the system, AI automatically recognizes knowledge gaps and retrieves content to address them. Students who understand the material can breeze through and move on, while those who don’t will receive extra, targeted instruction. All the while, the system feeds data to instructors to inform subsequent 


Okta hired Sagnik Nandy as the business software company’s president of technology and chief technology officer. Nandy was previously a vice president of engineering at Google, overseeing several technology initiatives related to the company’s core online ad business.

The Patrick J. McGovern Foundation picked Rebecca Distler as its strategist for A.I., data, and digital health. Distler was previously the director of global health initiatives for the startup Element.

JFrog named Sagi Dudai to be the software developer tool company’s executive vice president of product and engineering. Dudai was previously the CTO and general manager of the communications technology firm Vonage.


A.I. takes games to the next level. DeepMind, the A.I. subsidiary of Google parent Alphabet, published a non-peer-reviewed paper that has captured the attention of A.I. researchers interested in creating A.I. systems more capable than current technologies. The researchers described how they used reinforcement learning to train so-called software agents that can learn multiple tasks in a simulated 3-D world, as opposed to a single specific task they were trained to excel at. It’s a big deal because it shows that A.I. systems are capable of generalizing, taking information they learned from one task to inform their decisions on other tasks, without humans specifically programming them to do so.

From the paper: In this work, we introduced an open-ended 3D simulated environment space for training and evaluating artificial agents. We showed that this environment space, XLand, spans a vast, diverse, and smooth task space, being composed of procedurally generated worlds and multiplayer games. We looked to create agents that are generally capable in this environment space – agents which do not catastrophically fail, are competent on many tasks, and exhibit broad ability rather than narrow expertise.


Uber CEO Dara Khosrowshahi on why the ride-hailing giant is betting on flying taxis—By Jeremy Kahn and Katherine Dunn

Don’t buy the ‘big data’ hype, says cofounder of Google Brain—By Nicholas Gordon

How CEOs are tackling the problem of data overload—By Jessica Mathews

How Toyota kept making cars when the chips were down—By Eamon Barrett


A.I. to the skies. After a year of testing A.I. software to help determine the best flight routes for its airplanes, Alaska Airlines is now using machine learning more broadly to help dispatch all of its flights in the continental U.S., reported Fortune’s Jeremy Kahn. The airline giant is using the Flyways-branded software sold by the startup Airspace Intelligence, which incorporates A.I. techniques like reinforcement learning—in which computers learn through trial and error—to discover the most fuel-efficient routes that lead to “lower carbon dioxide emissions” and other benefits, the article said.
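The trial-and-error loop at the heart of reinforcement learning can be illustrated with a toy example: an epsilon-greedy strategy choosing among a few candidate routes whose payoffs are unknown up front. This is a hypothetical sketch of the general technique, not Airspace Intelligence's actual system; the route payoffs and all function names are invented:

```python
import random

def pick_route(avg_savings, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: usually exploit the best-known route,
    but occasionally explore another one at random."""
    if rng.random() < epsilon:
        return rng.randrange(len(avg_savings))
    return max(range(len(avg_savings)), key=lambda r: avg_savings[r])

def simulate(true_savings, trials=5000, seed=0):
    """Learn each route's average fuel saving purely by trying routes
    and observing noisy rewards (trial and error)."""
    rng = random.Random(seed)
    n = len(true_savings)
    avg, counts = [0.0] * n, [0] * n
    for _ in range(trials):
        r = pick_route(avg, rng=rng)
        reward = true_savings[r] + rng.gauss(0, 0.5)  # noisy observation
        counts[r] += 1
        avg[r] += (reward - avg[r]) / counts[r]       # running mean update
    return avg, counts

# Three hypothetical routes with different true fuel savings.
avg, counts = simulate([1.0, 2.0, 3.0])
best = max(range(3), key=lambda r: avg[r])
```

Over many trials the agent concentrates its choices on the route with the highest observed saving, without ever being told the payoffs in advance; production systems layer far more sophisticated models on top of this basic explore-and-exploit idea.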

Pasha Saleh, Alaska’s director of flight operations, told Fortune that the move is “as game changing for aviation as Google Maps and Waze has been for driving.”

From the article: Unlike a human, the A.I. system—actually a collection of different machine learning modules linked together—can calculate the probable position of every other aircraft in the sky that day and how it will impact congestion along the route. It can better predict how weather systems will form or dissipate, opening new routing possibilities. It can plan a brand new, custom route between waypoints to take advantage of these factors. And it can do all of this in mere seconds. When it sees a custom route that will save fuel compared to the canned route, it will suggest this alternative to the dispatcher.

Although Alaska Airlines seems pleased with using A.I., it’s not replacing human flight dispatchers, who have “a legal duty to make the final call on the flight plan.”
