Lessons from A.I.'s rare coronavirus pandemic success

To those who believe A.I. is among the most transformative technologies of the last century, millennium, or even epoch, its role in tackling the pandemic has been a disappointment. Where it matters most—developing diagnostics, vaccines or treatments, or even managing vaccine distribution—A.I.’s impact has been marginal.

To A.I. skeptics, this is further evidence that the technology is overhyped: A.I. hasn’t made a big difference because it simply isn’t as transformative as boosters claim. The techno-optimists, meanwhile, tend to say A.I. is simply too immature to have made a big difference. A.I. will be the hero of the next pandemic, these A.I.-optimists say.

I don’t know which camp is correct. But it’s worth highlighting where A.I. has played a key role, especially when those cases have broader lessons for business. Last week, scientific journal Nature examined one such case: a system Greece used to decide which arriving travelers to test for the coronavirus.

This was particularly important because the government had only a limited testing capacity and needed to use it efficiently. Eric Topol, the cardiologist who is both a big believer in the positive impact A.I. could have in medicine and a leading critic of much of today’s existing medical A.I., said on Twitter that the Greek system “may represent the most important successful application of AI in the pandemic (only a few are on the list).”

Called Eva, the system was deployed at 40 airports, ports and land border crossings from August to November 2020. Arriving passengers were grouped into categories according to the country they were arriving from, the region of that country where they had been, their sex and their age. Then, based on the previous prevalence of positive coronavirus tests for that category of passenger, the system decided whether a test should be administered, seeking to achieve two objectives:

•Maximize the number of infected asymptomatic travelers identified.
•Allocate enough tests to traveler categories for which the system lacked a high confidence in its COVID-prevalence data in order to hone those estimates.

Reinforcement learning, where the software trains from its own experience, was used to improve Eva’s performance over time.
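The two objectives above pull in different directions: test the categories that look riskiest, but also keep sampling categories whose estimates are still shaky. The sketch below illustrates that trade-off with Thompson sampling over a simple Beta posterior. It is not Eva's actual algorithm, which the article does not detail; the function names, the category labels, and the Beta(1, 1) prior are all assumptions made for illustration.

```python
import random

def allocate_tests(stats, arrivals, budget, rng):
    """Choose which arriving travelers to test (illustrative only).

    stats:    dict mapping category -> (positives, negatives) seen so far
    arrivals: list of category labels, one per arriving traveler
    budget:   number of tests available
    Returns the sorted indices of the travelers selected for testing.
    """
    scored = []
    for idx, category in enumerate(arrivals):
        pos, neg = stats.get(category, (0, 0))
        # Draw a plausible prevalence from the Beta(1 + pos, 1 + neg) posterior.
        # Categories with little data produce widely varying draws, so they
        # are sometimes tested even if their average estimate is moderate.
        draw = rng.betavariate(1 + pos, 1 + neg)
        scored.append((draw, idx))
    scored.sort(reverse=True)
    return sorted(idx for _, idx in scored[:budget])

def update(stats, category, positive):
    """Record one test result so later allocations use the new evidence."""
    pos, neg = stats.get(category, (0, 0))
    stats[category] = (pos + 1, neg) if positive else (pos, neg + 1)

# Hypothetical categories: "A" has tested positive often, "B" almost never.
rng = random.Random(0)
stats = {"A": (500, 0), "B": (0, 500)}
arrivals = ["A"] * 5 + ["B"] * 5
chosen = allocate_tests(stats, arrivals, budget=5, rng=rng)
```

With this much evidence the draws barely vary, so the five category "A" travelers are chosen. With only a handful of past results per category, the selection would shift from run to run, which is the exploration the second objective calls for.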

The researchers compared Eva’s results to two counterfactual scenarios: one in which passengers were simply tested at random and one in which testing was based on country-level epidemiological metrics (such as a country’s case rate, death rate, or test positivity rate). They found that during peak tourist season, random testing would have caught only about 54% of the cases Eva managed to find. To equal Eva’s performance with random sampling, Greece would have needed to increase its testing capacity by 85% at each border crossing. Compared to epidemiological metrics, the researchers found Eva identified 25% to 45% more infections during peak tourist season, while using equivalent data and financial resources.
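The 54% and 85% figures are two views of the same gap. A quick check, assuming (this is not stated in the article) that the number of cases random testing finds scales in proportion to the number of tests given:

```python
# Share of Eva's detected cases that random testing would have caught (from the article).
random_share = 0.54

# Assumption: random testing finds cases in proportion to tests administered,
# so matching Eva requires 1 / 0.54 times as many tests.
capacity_multiplier = 1 / random_share
extra_capacity_pct = (capacity_multiplier - 1) * 100

print(f"{extra_capacity_pct:.0f}% more tests")  # prints "85% more tests"
```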

Greece’s experience with Eva holds obvious lessons for other countries trying to implement a testing regime. But the same methodology could also be used for other risk-based assessments where there is considerable doubt about how well the current, crude risk-modeling criteria work (think about checking passengers’ luggage for customs violations, for instance, or doing quality-control screening on products coming from different suppliers).

The Eva developers also wrote in Nature that they believed there were lessons from their work that could apply to anyone trying to implement an A.I.-based system:

• Data minimization. Eva’s builders met with lawyers, epidemiologists, and policymakers before designing the algorithm to determine the kinds of data they could legally and ethically collect. They tried to design the algorithm using only information they believed would be predictive based on available research at the time (country of origin, age, and gender) while omitting what may have been informative but which they thought would be too invasive (such as occupation).

• Prioritize interpretability. Eva developers note that creating a system where the rationale for decisions is easy for users to understand is essential for building trust. In the case of Eva, the designers wanted government officials running the testing program to understand why tests had to be given to categories of people with only moderate, but highly uncertain, prevalence estimates. To do so, they used an algorithm that delivered confidence ranges for its prevalence estimates. For example, policymakers could see on a dashboard how the bands narrowed as additional tests were administered, providing a relatively intuitive way for non-statisticians to grasp the information.

• Design for flexibility. Eva was designed in a modular way, with different components for categorizing passengers, estimating the prevalence for each passenger type, and the test allocation decision. This let a single module be updated without changing the rest of the system and allowed for easy tweaks to the algorithm.
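The narrowing confidence bands mentioned under interpretability can be illustrated with a few lines of code. The sketch below uses a Wilson score interval, a standard choice for proportions; the article does not say which method Eva used, and the 2% positivity rate and sample sizes are made-up numbers.

```python
import math

def wilson_interval(positives, tests, z=1.96):
    """Approximate 95% Wilson score interval for a positivity rate."""
    if tests == 0:
        # No data yet: the prevalence could be anything.
        return (0.0, 1.0)
    p = positives / tests
    denom = 1 + z**2 / tests
    center = (p + z**2 / (2 * tests)) / denom
    half = z * math.sqrt(p * (1 - p) / tests + z**2 / (4 * tests**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# Same hypothetical 2% positivity rate, observed over more and more tests.
for tests in (50, 500, 5000):
    positives = round(0.02 * tests)
    low, high = wilson_interval(positives, tests)
    print(f"{tests:>5} tests: {low:.3f} to {high:.3f}")
```

Each tenfold increase in tests shrinks the band noticeably, which is the pattern a dashboard could show to explain why moderate-but-uncertain categories still deserve tests.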

Let’s hope we don’t have to wait until the next pandemic for business to start taking these points to heart.

****
BRAINSTORM A.I. 2021: For more thoughtful discussion about A.I.’s massive impact on business, make sure you attend Fortune’s upcoming Brainstorm A.I. conference, the definitive gathering for all things artificial intelligence. The conference will be in Boston on Nov. 8 and 9, with a slate of speakers that will include Siemens USA CEO Barbara Humpton, PepsiCo chief strategy and transformation officer Athina Kanioura, and Amazon’s head scientist for Alexa AI, Rohit Prasad. Apply to attend here.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

A.I. IN THE NEWS

U.K. unveils national A.I. strategy. The British government announced a 10-year plan that it said could position the country to be "a global A.I. superpower." But, as TechCrunch notes, there's no new money to back up that ambitious goal and instead there's a lot of vague language about "investing in the long-term needs of the ecosystem." It includes a new A.I. program aimed at increasing coordination among the country's machine-learning researchers and boosting A.I. adoption, and another aimed at helping spread A.I. capabilities beyond just London and South East England.

British court reiterates that A.I. cannot hold a patent. Attempts by a global team of A.I. and legal experts to get countries to let A.I. systems be named as inventors on patents were dealt another blow when a British court upheld a decision by the U.K. Intellectual Property Office to deny a patent to a piece of A.I. software called "DABUS." The court ruled that only "natural persons" could legally qualify as inventors under British patent law, The Financial Times reported. The legal project involving DABUS has so far succeeded in Australia, where a court bucked the trend and ruled that an A.I. can be granted a patent, and in South Africa.

Amazon drivers allege faulty A.I.-enabled safety cameras are docking their pay. Vice quoted several Amazon drivers who blamed A.I.-equipped cameras, produced by the startup Netradyne and installed in Amazon delivery vans earlier this year, for erroneously faulting them for driving dangerously even when their actions were safe. The erroneous faults lowered a weekly driver safety score that Amazon uses to determine whether the drivers receive bonus pay. Many drivers told Vice they suspected that the erroneous faults registered by the camera system were not a software bug but intentional, a ploy by Amazon to cut costs. An Amazon spokesman told Vice this wasn't the case and cited figures showing that the Netradyne cameras had significantly improved safety.

FedEx begins self-driving truck trial. The delivery company is working with self-driving software startup Aurora and heavy truck maker PACCAR to test an autonomous Class 8 truck capable of hauling up to 33,000 pounds, according to an Aurora blog post. The trucks, which carry a safety driver, will navigate a 500-mile round-trip route along Interstate 45 between Dallas and Houston. Aurora, which has partnerships with Uber, Toyota, and truckmaker Volvo AB, in addition to PACCAR, says it will have fleets of fully autonomous trucks operating on highways without safety drivers by 2023.

EYE ON A.I. TALENT

Facebook selected Andrew "Boz" Bosworth to be its new chief technology officer, taking over from outgoing CTO Mike Schroepfer in 2022. Bosworth, a 15-year veteran of the company, has most recently led Facebook's virtual reality and augmented reality efforts.

Fashion brand Burberry has appointed CP Duggal to be its chief digital and analytics officer, fashion industry publication Fashion United reported. He had been executive vice president, enterprise digital and analytics at American Express.

Hong Kong-based biotech startup Insilico Medicine, which uses A.I. for drug discovery, has hired Nirav Jhaveri as its chief financial officer, the company announced. Jhaveri had been CFO at Journey Medical, a biopharmaceutical company headquartered in Scottsdale, Ariz.

BrainboxAI, a Montreal-based company that uses predictive analytics and A.I. to help control the heating and air conditioning systems in commercial buildings, has named Francis Trudeau its chief financial officer, the company said. He had most recently been CFO at Logibec, a Montreal IT services company focused on the healthcare sector.

EYE ON A.I. RESEARCH

Beware: Deepfake voices can fool voice-recognition software. That's what University of Chicago researchers have found after testing several voice-synthesis A.I. systems against both open-source and commercially available voice-recognition A.I., including those available as part of Microsoft's Azure cloud offerings, Amazon Alexa, and WeChat's voice-based login system. The best system, which involved several different components and started out with text of what the artificial voice would say, could fool an open-source voice-recognition algorithm called Resemblyzer 100% of the time when it was trained on a particular dataset. It could fool both WeChat's voice-recognition system and Alexa more than 60% of the time. It performed less well on Azure, tricking it about 29% of the time. Human evaluators were able to determine the voice was fake just about half the time. The potential here for fraudsters and cybercriminals is scary. You can read more in this paper on the non-peer-reviewed research repository arxiv.org here.

FORTUNE ON A.I.

UiPath CEO: Automation will not replace knowledge workers—by Fortune Editors

Air taxis are coming sooner than you think, aerospace giant Airbus says—by Christiaan Hetzner

Meet Facebook’s new tech chief Andrew ‘Boz’ Bosworth—by Jonathan Vanian

Two new quantum computing breakthroughs reveal the technology’s commercial potential—by Jeremy Kahn

BRAIN FOOD

Can book summaries solve "the alignment problem"? "The alignment problem" is what A.I. researchers call the challenge of teaching A.I. systems to understand and conform to human values. Last week, researchers at OpenAI, the San Francisco A.I. research company that is itself closely aligned with Microsoft, announced the results of a project that came at the alignment problem from an unusual direction: book summaries.

OpenAI has been trying to teach GPT-3, its language modeling algorithm, to summarize books. But there are two problems, according to OpenAI researcher Jan Leike. First, asking humans to evaluate book summaries is not "scalable" because it takes them too long to read a book and there are a lot of books out there. Second, people are unlikely to agree on what constitutes a good book summary. In Leike's view, these difficulties are precisely what make book summarization a good "test bed" for possible solutions to the alignment problem.

Leike and his team came up with a way to at least partially address these issues: first, they "decompose" each book into smaller pieces—short segments of text—each of which is farmed out to a different human evaluator. To help learn what humans think is a good summary, Leike and his team used a combination of supervised learning, where a human writes a summary of the small chunk of text for GPT-3 to use as a model, and reinforcement learning, where GPT-3 develops two possible summaries and the evaluator is asked to choose which is better. This process is repeated to further hone the summary. Finally, a book-level summary is developed by asking GPT-3 to summarize the section summaries.
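The decompose-then-combine step can be sketched in code. This is only an illustration of the recursion described above, not OpenAI's implementation: it leaves out the human feedback and training entirely, and `first_words` is a hypothetical stand-in for a call to a trained model.

```python
def split_into_chunks(text, max_words):
    """Break text into consecutive pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_book(text, summarize, max_words=200):
    """Summarize each chunk, then summarize the joined summaries.

    Repeats until the text fits in a single chunk. This only terminates if
    `summarize` returns something shorter than the chunk it was given.
    """
    chunks = split_into_chunks(text, max_words)
    if len(chunks) <= 1:
        return summarize(text)
    combined = " ".join(summarize(chunk) for chunk in chunks)
    return summarize_book(combined, summarize, max_words)

def first_words(text, n=20):
    # Hypothetical placeholder "model": keeps the first n words of its input.
    return " ".join(text.split()[:n])

# A fake 1,000-word "book" made of numbered tokens.
book = " ".join(f"w{i}" for i in range(1000))
summary = summarize_book(book, first_words)
```

Because each level works only from the summaries beneath it, a mistake made on one chunk is carried upward, which is one way the compounding errors discussed in the research can arise.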

While this approach yielded better results than previous methods, it is still imperfect. In fact, only 5% of the time did it produce book-level summaries that human evaluators judged to be as good as those written by people.

The research was perhaps most instructive in spotlighting potential challenges to methods Leike proposed. For instance, if the chunks of text are not chosen carefully, it can be difficult for the human evaluators to assess the computer-generated summary. What's more, the process of building up a book-level summary from summaries of other summaries can lead to a weird compounding of errors, such as key plot points that are completely mangled.
