How A.I. may make sense of 50,000 coronavirus research papers

April 28, 2020, 3:41 PM UTC

This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your inbox, sign up here.

A little over a month ago, the White House Office of Science and Technology Policy enlisted several big research groups and companies to collaborate on a major A.I. project intended to help experts unravel the mysteries of COVID-19.

The project’s goal was to place thousands of medical papers related to the coronavirus into a single dataset, dubbed CORD-19. By making the collection of scholarly articles available for free, the organizers hope A.I. researchers will be able to develop natural language processing techniques that can rapidly scan studies and retrieve valuable information that might otherwise stay buried.

As of last week, the CORD-19 dataset had ballooned to over 50,000 medical papers and had been downloaded over 75,000 times, the Allen Institute for AI (AI2) said in an updated paper. That A.I. research group, founded by the late Microsoft co-founder Paul Allen, is among the organizations working on the project.

Kyle Lo, an AI2 applied research scientist, told Fortune that one of the challenges was to consolidate tens of thousands of academic papers into a readable format that neural networks (software used for deep learning) can understand. Each part of a document, from the chart captions to the annotations, must be preserved for the A.I. technologies reviewing it to work well. While this may seem trivial, anyone who has ever tried copying and pasting text from a PDF file into another document can likely tell you how that process introduces errors.

Rens van de Schoot, a professor specializing in statistics at Utrecht University in the Netherlands, explained that his research team is developing machine learning-powered search tools to help scientists retrieve information finely tuned to their specific inquiries. For instance, doctors who are intubating coronavirus patients (i.e., sticking breathing tubes into their throats) may have questions about whether the patients should lie on their backs or stomachs, which is known as prone positioning. Presumably, the search technology would retrieve the most relevant coronavirus-related research papers detailing prone positioning for doctors. 

Van de Schoot said that some European healthcare groups are currently testing his research team’s coronavirus search tools alongside their more old-school manual searching techniques. These healthcare professionals are still wary about having an A.I. system sift studies for them, partly due to the difficulty researchers have in explaining how exactly their A.I. technologies work.

Although it’s unclear whether the CORD-19 A.I. project will result in any immediate coronavirus breakthroughs, Lo said he hopes that at minimum, the project will lead to more A.I. researchers developing machine-learning tools for rapidly scanning medical literature. He also hopes that the CORD-19 project leads to more medical papers being released for free, an idea referred to as “open science.”

And if the project doesn’t lead to dramatic results for the current coronavirus pandemic, at least “we have the infrastructure for the next big event,” van de Schoot said. The same technology used to analyze the coronavirus papers could perhaps be applied to economic papers probing financial catastrophes. 

“It’s not only the coronavirus crisis,” van de Schoot said. “The next crisis will be the recession.”


If you have a few minutes to spare, we’d really appreciate it if you answered a couple of questions for us related to our newsletter. I promise you, it won’t take long!


And to be clear: Any data we collect will be used for research purposes only. We will not be training any A.I. systems with your information!

Jonathan Vanian 


Spies on A.I. To deal with potential A.I.-powered cyberattacks from adversaries, government spies and intelligence officials in the United Kingdom will need to harness their own A.I. technologies, the BBC reported, citing research from the Royal United Services Institute think tank. The report cited computer-generated deepfakes as a potential way U.K. adversaries could “manipulate public opinion and elections.” “It might also be used to mutate malware for cyber-attacks, making it harder for normal systems to detect - or even to repurpose and control drones to carry out attacks,” the report said.

Deepfake Anchorman. A State Farm commercial featured a computer-generated version of ESPN SportsCenter anchor Kenny Mayne, The New York Times reported in a story about the advertisement industry’s interest in deepfake technologies amid the coronavirus pandemic. From the story: “We’re so restricted in how we can generate content,” said Kerry Hill, the production director for the ad agency FCB in North America. “Anything that can be computer generated is something we’re going to explore.”

Come get some patents. IBM said it would let coronavirus researchers access thousands of patents and patent applications for free with the promise that it would "not assert IBM patents against entities using them in the fight against coronaviruses." The move is part of the company's participation in the Open COVID Pledge, in which firms make their intellectual property free so that people can use the technologies to aid coronavirus research and treatments. "IBM’s pledge will last for the life of our more than 80,000 patents and patent applications, and any new patent applications filed through the end of 2023 will likewise be covered by this commitment," the company said.

Cutting some A.I. ties. Massachusetts Institute of Technology has cut ties with Chinese A.I. startup iFlytek, which, as Wired reported, comes amid allegations that the startup is “supplying technology for surveilling Muslims in the northwestern province of Xinjiang.” MIT didn’t say why it is no longer collaborating with iFlytek, while an executive from the startup said the decision was disappointing, the report said.  

Robots to the potential rescue. The Wall Street Journal examined how startups developing delivery robots or drones are faring amid the coronavirus pandemic. The report explains that some businesses and governments have fast-tracked their initial beta testing of delivery robots. But it’s not a financial victory for these startups yet. From the article: Yet what should be a windfall for startups may have arrived too early—before they are able to ramp up manufacturing of their delivery robots, and ahead of approvals by national and regional governments that determine where and how robots can be deployed.


Netflix hired Elizabeth Stone to be the video-streaming service’s vice president of data science and engineering. Stone was previously the vice president of science at Lyft.

Apple’s A.I. research chief Ruslan “Russ” Salakhutdinov left the iPhone-maker to rejoin academia full-time, tech news publication The Information reported. Salakhutdinov is a computer science professor at Carnegie Mellon University. He became Apple’s director of A.I. research in 2016.

OpenAI hired Kenneth Stanley to lead the research firm’s new group focusing on “achieving open-endedness in machine learning,” Stanley said via Twitter. Stanley is a computer science professor at the University of Central Florida and was a senior research manager at Uber.

Primer, a startup specializing in machine learning, has picked Sue Gordon to be a strategic advisor to its board. Gordon spent nearly 40 years at the Central Intelligence Agency and was the fifth principal deputy director of national intelligence. Brett McGurk, who served in senior positions in the George W. Bush and Barack Obama presidential administrations, has also become an independent board director for the startup.

Pactera EDGE hired Rajeev Sharma to be the enterprise software company’s global head of A.I. solutioning and delivery. Sharma was formerly the CEO of the software firm nova IQ and the senior vice president and chief solutions officer of Ness Digital Engineering.


About all that coronavirus misinformation floating around on Twitter. Researchers at the University of Southern California published a preprint about their work using machine learning to track bogus coronavirus-related information on Twitter. The researchers were able to discover how “a false claim circulated about Nevada Governor’s Chief Medical Officer banning the use of Hydroxychloroquine treatments” originated in the U.S., but spread “to other countries within minutes.”

From the paper:

Misinformation during pandemics can impact public health, intervention policies and future elections. Therefore, we maintain separate categories of unreliable, conspiracy, clickbait, political/biased, and provide the list of source tweets identified under each category on the dashboard. In the future, we will update our analysis to include detection based on source and social context information extracted from the network.

The coronavirus’s impact on mental health. Researchers from Yale University and the Barcelona Supercomputing Center in Spain published a non-peer-reviewed paper about their work classifying coronavirus-related tweets “into the following emotions: anger, anticipation, disgust, fear, joy, sadness, surprise and trust.” The researchers wrote that tracking “the emotion trend” via Twitter hashtags and topics is important because “it potentially may show the public attitude change within a period of time.”

From the paper:

In the 1 million data, most tweets are classified into negative classes like fear, anger and sadness. But when people are talking about masks, more tweets are classified into anticipation and trust, which is sometimes more neutral and positive.


A.I. will be crucial to companies outside of Silicon Valley—and they need a new playbook for it—By Andrew Ng

There may be 22 hidden coronavirus hotspots in states poised to lift restrictions—By Jeremy Kahn

Is A.I. better at diagnosing illnesses than doctors? Don’t believe all the hype—By Jonathan Vanian

Digital art banks on breakout moment with blockchain’s help—By Jeff John Roberts


So you want to buy some “A.I.”? Dr Hugh Harvey, the managing director at Hardian Health in the U.K., has posted a fascinating article detailing the kinds of questions healthcare organizations should ask when dealing with A.I. vendors. As Harvey notes, some companies that claim to specialize in A.I. embellish the capabilities of their technologies, which makes it difficult for healthcare companies to understand what exactly they are buying. He believes healthcare firms need to ask vendors how they intend for companies to use their A.I. software, what evidence is available that supports their claims, and what the companies’ “post-market plans” are.

As Harvey explains, “As a buyer of clinical AI software, you aren’t just buying a software product, installing it, and letting it run. You are entering into a partnership with the vendor, one that covers the lifetime of the software.” The advice applies well beyond healthcare.

More from Harvey’s post:

The most important thing to remember is that adding AI into the mix does not necessarily make anything easier in terms of procurement, and that deep due diligence into the value proposition and robust quality assurance should remain fundamental, just as they would for any purchase decision for any other medical product.


Note: Last week's newsletter was updated to correct the relationship between Pactera EDGE and Pactera Technology International.
