A golden age of natural language processing (NLP) may be dawning

This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business.

Back in January, I wrote a big story for Fortune about the ongoing revolution in natural language processing, the branch of A.I. concerned with systems that can manipulate and, to some degree, “understand” language.

Language processing is now entering a kind of golden age, in which once impossible tasks are increasingly within reach. These new systems are already starting to transform how businesses operate—and they stand poised to do so in a much bigger way in the coming years.

This summer has seen some startling examples of what these methods can accomplish. The most discussed breakthrough has been OpenAI’s GPT-3, which can generate long passages of coherent prose from a human-written prompt of just a line or two. In many cases, what the system generates is indistinguishable from human-written text.

GPT-3 is, for the moment, still something of a party trick—it is difficult to control, for instance, whether what the system generates is factually accurate, or to filter out racist or misogynistic ideas that it might have picked up from its large training set (which included not only the complete works of Shakespeare, but such repositories of human virtue as Reddit). But some companies are starting to build real products around it: One is creating a system that will generate complete emails from just a few bullet points. And a legal technology firm is experimenting with GPT-3 to see if it can aid in litigation discovery and compliance.

Another San Francisco A.I. company, Primer, creates software that helps analyze documents. It counts a number of U.S. intelligence agencies among its customers. Today it unveils a website, Primer Labs, that showcases three NLP systems it built in the past year and allows anyone to upload any text to play around with the tech.

I had interviewed John Bohannon, Primer’s Director of Science, back in December for that feature about NLP. Last week, I caught up with him again by Zoom. Bohannon told me things have only accelerated since we first talked.

He describes what is happening in NLP as “an industrial revolution,” where it is now becoming possible to string together multiple NLP tools—much the same way a mechanical engineer might combine boilers, flywheels, conveyor belts and presses—to create systems that can do real work in real businesses. And building these systems is getting easier and easier. “What used to take months,” he says, “now takes a week.”

Bohannon gave me early access to Primer Labs to let me experiment on texts of my own choosing.

The first tool: question-answering.

Upload any document and you can then ask questions in natural language to prompt the system to find an answer in the text. The system also suggests questions that you might want to ask.

The software was fantastic at answering a series of questions about a simple news story on Joe Biden’s selection of Kamala Harris as his veep pick.

However, when I uploaded a 2012 Securities and Exchange Commission filing from the pharmaceutical giant Merck that runs to 159 pages and about 100,000 words, its performance was hit-and-miss. When I asked it what Merck’s sales were in 2011, it returned the correct answer: $48 billion. But when I asked it what the company’s operating profit was, I received a message that the software “was having trouble answering that particular question.” And when I asked it what the company’s revenue recognition policies were, I received the inaccurate but hilarious reply that “non-GAAP EPS is the company’s revenue recognition policies.”

The next Primer tool: “named entity recognition.”

Strictly speaking, this bundles two jobs: identifying all the proper names in a document, and figuring out which pronouns in the text refer to which people or organizations (a task computer scientists call coreference resolution). The work is relatively easy, if time-consuming, for humans, but it has historically stumped computers. It is a good example of a skill that is now within software’s grasp thanks to the NLP revolution. In benchmark tests Primer has published, its system has outperformed similar software created by Google and Facebook.

I tried to stump Primer’s software by giving it a passage about the 19th-century French authors George Sand and Victor Hugo. I was hoping that the fact that Sand is the male nom de plume of a female writer (her real name was Amantine Lucile Aurore Dupin) would confuse the system when it had to decide whether the pronoun “he” belonged to Sand or Hugo. But, to my surprise, the system performed flawlessly, understanding that every “he” in the passage referred to Hugo while every “she” referred to Sand.
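To see why a pen name could trip up software, consider a deliberately naive pronoun resolver. This is a hypothetical baseline for illustration only, not how Primer’s system works: it links each pronoun to the most recently mentioned person whose gender matches. The genders are supplied by hand, which is an assumption, and the sample sentence is invented.

```python
import re
from typing import Dict, List, Tuple

# Pronoun -> grammatical gender ("m" or "f").
PRONOUN_GENDER = {"he": "m", "him": "m", "his": "m",
                  "she": "f", "her": "f", "hers": "f"}


def resolve_pronouns(text: str, entity_gender: Dict[str, str]) -> List[Tuple[str, str]]:
    """Link each pronoun to the most recently mentioned entity of matching gender.

    Returns (pronoun, entity) pairs; "?" means no matching entity was seen yet.
    """
    last_seen: Dict[str, str] = {}  # gender -> most recent entity of that gender
    links: List[Tuple[str, str]] = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token in entity_gender:
            last_seen[entity_gender[token]] = token
        elif token.lower() in PRONOUN_GENDER:
            links.append((token, last_seen.get(PRONOUN_GENDER[token.lower()], "?")))
    return links


text = "Hugo wrote to Sand. He praised her novel, and she thanked him."
print(resolve_pronouns(text, {"Sand": "f", "Hugo": "m"}))
print(resolve_pronouns(text, {"Sand": "m", "Hugo": "m"}))  # misled by the pen name
```

With the correct genders, every pronoun lands on the right writer. If Sand is tagged male because of the first name George, “He” is wrongly linked to Sand and “her” and “she” are left unresolved, which is exactly the kind of mistake I was hoping to provoke.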

The final and perhaps most difficult task Primer Labs’ tools perform: summarization.

Accurately summarizing long documents is difficult for humans too. And gauging how useful a summary is can be highly subjective. But Primer came up with a clever way to automatically judge summary quality based on BERT, a very large language model that Google created and has made freely available. BERT is what is known as a “masked language model,” because its training consists of learning to guess a word that has been hidden in a text. Primer’s metric, called BLANC, judges a summary by measuring how much better BERT performs at this fill-in-the-blank game when it has access to the summary. The better BERT does, the better the summary. Thanks to BLANC, Primer was able to train a summarization tool that can generate pretty fluent summaries.
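The scoring idea behind BLANC can be sketched in a few lines. This is a toy illustration under stated assumptions, not Primer’s code: the real metric uses BERT as the guesser, while the hypothetical `toy_guesser` here simply picks the most frequent longer word in the context that isn’t already visible in the sentence. The mask spacing (`gap`) and the filler of periods are also assumptions, loosely modeled on a BLANC variant that compares guesses made with the summary against guesses made with a neutral filler of the same length.

```python
from collections import Counter
from typing import Callable, List

MASK = "[MASK]"

# Hypothetical stand-in for BERT: (context tokens, masked sentence, masked index) -> guess.
Guesser = Callable[[List[str], List[str], int], str]


def toy_guesser(context: List[str], sentence: List[str], idx: int) -> str:
    """Guess the most frequent context word (4+ letters) not already visible in the sentence.

    idx is unused here; a real masked language model would use the position.
    """
    visible = set(sentence)
    counts = Counter(w for w in context if len(w) >= 4 and w not in visible)
    return counts.most_common(1)[0][0] if counts else "[UNK]"


def blanc_help_style_score(summary: List[str], document: List[List[str]],
                           guesser: Guesser, gap: int = 3) -> float:
    """Guessing accuracy with the summary as context, minus accuracy with a same-length filler."""
    filler = ["."] * len(summary)  # neutral context carrying no information
    helped = unhelped = total = 0
    for sentence in document:
        for idx in range(0, len(sentence), gap):  # mask every `gap`-th word
            target = sentence[idx]
            masked = sentence[:idx] + [MASK] + sentence[idx + 1:]
            helped += int(guesser(summary, masked, idx) == target)
            unhelped += int(guesser(filler, masked, idx) == target)
            total += 1
    return (helped - unhelped) / total if total else 0.0


doc = [["astrazeneca", "vaccine", "trial", "oxford"]]
print(blanc_help_style_score(["astrazeneca", "astrazeneca", "vaccine"], doc, toy_guesser))  # 0.5
```

A summary that mentions the document’s key terms lets the guesser recover one of the two hidden words, so it scores 0.5; an irrelevant summary helps no more than the filler and scores 0. No human-written reference summary is needed, which is what makes the approach usable as an automatic training signal.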

I fed Primer’s summarization tool a feature story I wrote for Fortune’s August/September double issue about how AstraZeneca has managed to leap ahead of its Big Pharma rivals in the quest for a COVID-19 vaccine. I was impressed by how well the software condensed the lengthy article. It captured key points about AstraZeneca’s corporate turnaround as well as the importance of a COVID-19 vaccine.

But the system is still far from perfect. Another part of the tool tries to reduce the text to just a handful of key bullet points instead of whole paragraphs. Here the results were strangely off-base: The software fixated on nonessential details from an anecdote at the beginning of the article, yet missed crucial points further down in the body of the piece.

For a laugh, I fed the system T.S. Eliot’s “The Love Song of J. Alfred Prufrock.” Bohannon had warned me that the software would struggle to summarize more creative writing, particularly poetry, and the results were not pretty. Other than the fact that “the women come and go, speaking of Michelangelo,” the system wasn’t really sure what was happening. A lot of high school students could probably sympathize. But no English teacher would give Primer’s results high marks. (Interestingly, GPT-3 isn’t half bad at writing poetry. But that doesn’t mean it has any real understanding of what it’s writing.)

Then again, poetry is probably not the most pressing business case for Primer’s products. Summarization is a huge potential market. In 1995, the average daily reading requirement of a U.S. intelligence analyst assigned to follow the events in one country was just 20,000 words (or about the equivalent of two New Yorker longreads). By 2016, the same analyst’s daily reading load was estimated at 200,000 words—more than the most capable speed reader could possibly skim in 24 hours. This phenomenon is affecting analysts in finance and law too, and is a huge issue for people in the sciences trying to keep up with the explosion in published research. (In fact, to help out during the pandemic, Primer has created a site that summarizes each day’s new research papers on COVID-19.)

So the NLP revolution has arrived not a moment too soon. Automated tools that help condense and summarize and extract information from written text are becoming more and more essential. Today’s NLP isn’t perfect—but it is getting good enough to make a difference.

And with that, here’s the rest of this week’s A.I. news.

Jeremy Kahn
@jeremyakahn
jeremy.kahn@fortune.com

This story has been updated to correct the year in which U.S. intelligence analysts’ average daily reading load was 20,000 words. It was 1995, not 1956.

A.I. IN THE NEWS

Police use of facial recognition is illegal, British court says. An appellate court in the U.K. has ruled that police use of facial recognition technology violated privacy, data protection and equality laws, a stunning blow to further deployment of the technology in the country. British police have been ramping up their use of facial recognition systems, despite civil rights complaints. But in a landmark case brought by civil liberties campaigner Ed Bridges, who said Welsh police captured and analyzed his image without permission, the court ruled that the way police were using the technology violated multiple laws. The Welsh police said they would not appeal the ruling. My Fortune colleague David Meyer has more on the case and its implications.

White House unveils investment plan for A.I. and quantum tech. The Trump Administration announced its budget proposal for non-defense artificial intelligence and quantum information technology investment. The administration proposes upping the total spend by about 30% to $2.2 billion in fiscal year 2021, including $1.5 billion for A.I. and $699 million for quantum information science, according to a story in The Wall Street Journal. The funding includes money to set up new A.I. research centers under the auspices of the National Science Foundation, Department of Agriculture and other agencies. Although that is a hefty increase, the amounts pale beside the multibillion-dollar sums the Chinese government is pouring into A.I. each year. The proposed spending, announced Friday, must now be passed by Congress, which can modify the amounts.

"A Level" fiasco in the U.K. is another example of biased algorithms. For the past week, the U.K. has been consumed with news about the British government's mishandling of this year's award of student grades. This year, due to COVID-19, the country's A Level exams were cancelled. Instead, the government used a complex algorithm to assign A Level grades to students, based in part on grades given by their teachers, but also, critically, taking into account the past A Level performance of other students from that same school. Britain has a highly unequal education system—any algorithm that took into account historical school performance was likely to lower the results for students from disadvantaged backgrounds while boosting the grades of students from elite private schools. Which is exactly what happened. After a week of public outcry, the British government was forced to reverse course on Monday and restore all the grades that the algorithm had downgraded. You can read more about the fracas in this New York Times story. And, as ever, David Meyer has a good take on what the story tells us about our algorithmically mediated future in Fortune's CEO Daily newsletter.
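Why historical school performance skews results can be shown with a toy calculation. This is a hypothetical blend invented for illustration, not the formula actually used: each student’s teacher-assessed grade is averaged with the school’s historical average grade, using grade points from U = 0 to A* = 6 and an assumed 50/50 weighting.

```python
# Grade points: index in this list (U = 0 ... A* = 6).
GRADES = ["U", "E", "D", "C", "B", "A", "A*"]


def moderated_grade(teacher_grade: str, school_history_avg: float,
                    weight: float = 0.5) -> str:
    """Hypothetical moderation: blend a teacher's grade with the school's past average.

    school_history_avg is in grade points (0-6); weight is the share given to the teacher.
    """
    points = GRADES.index(teacher_grade)
    blended = weight * points + (1 - weight) * school_history_avg
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return GRADES[round(blended)]


print(moderated_grade("A", school_history_avg=2.4))  # weaker track record -> "B"
print(moderated_grade("B", school_history_avg=5.6))  # stronger track record -> "A"
```

The same teacher-assessed A becomes a B at a school with a weak track record, while a B at a school with a strong record is lifted to an A. The individual student’s work never enters the second half of the formula, which is the heart of the complaint.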

Clearview plans to defend face-scraping as free speech. Controversial facial recognition startup Clearview has hired one of America's most prominent First Amendment lawyers, Floyd Abrams, to represent it against a bevy of lawsuits alleging it violated people's privacy rights or various state laws governing the harvesting and processing of biometric data, The New York Times reported. The 84-year-old Abrams has argued 14 cases before the U.S. Supreme Court, including successfully representing The New York Times in the 1971 Pentagon Papers case, which established a newspaper's right to publish classified information. In reference to Clearview, Abrams told The Times that while privacy rights were important, "where there is a direct clash between privacy claims and well-established First Amendment norms, what would otherwise be appropriate manners of protecting privacy have to give way before the constitutional limitations imposed by the First Amendment."

British government plans automated drones to surveil its coasts. The British government has awarded a $1.3 million contract to the U.K. subsidiary of Israel's Elbit Systems to develop drones that can keep watch on the nation's coastlines, according to Wired. The new push comes as a growing number of illegal migrants and asylum seekers have sought to enter the U.K. by crossing the English Channel on small boats from France, prompting a major government effort to curtail the traffic. The drones would also help in search and rescue and other surveillance operations. A number of drones Elbit makes can operate fully autonomously, and the company also sells computer vision software that can automatically detect small vessels and other objects in the water from the drones' video feeds.

EYE ON A.I. TALENT

London-based global law firm Linklaters has appointed Shilpa Bhandarkar as chief executive officer of its "Nakhoda" legal artificial intelligence platform. Bhandarkar had previously helped develop the technology as Linklaters' Global Head of Innovation. Greg Jackson, former head of strategy for the firm, has been named to a new role as Global Head of Strategy, Innovation and Efficiency.

San Francisco-based robotics and A.I. company Kindred has named Marin Tchakarov its chief executive officer and president, according to a company press release. Tchakarov had previously served as the company's chief operating officer and chief financial officer.

Quantiful.ai, a company based in Auckland, New Zealand, that uses A.I. to help companies forecast demand, has appointed Rebecca Kemp as chief product officer for its prediction platform, QU, according to a story in NBR. Kemp was previously a senior product manager at road tolling and services firm ERoad.

EYE ON A.I. RESEARCH

Finding the tax dodgers. Researchers from the University of Mannheim in Germany and the Center for European Economic Research have proposed a way to automatically find companies that may be using aggressive tax avoidance strategies.

In a paper published on the research repository arxiv.org, they propose creating a "knowledge graph" that represents all of a company's known corporate relationships, plus information such as its headquarters address and the names of its directors, as a web of linked facts that can be analyzed statistically. Building such a graph for about 1.5 million companies, the researchers show how they can search it for patterns that closely match those of firms known to have used aggressive tax strategies, such as the so-called "Dutch sandwich" and "Double Irish" that Google used for years to reduce its international tax bill.

The academics note that in the future, it might be possible to train a machine learning system to automatically detect companies that are likely engaged in aggressive tax strategies, as well as find outliers—unusual patterns in the knowledge graph—that could warrant further investigation by tax justice researchers.
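The basic idea of searching a knowledge graph for a known structure can be sketched as triple matching. Everything here is hypothetical: the company names, the relation labels and the pattern itself, which is a simplified signature loosely inspired by the "Double Irish" (a parent incorporated in one country but tax-resident in another, owning a subsidiary incorporated in the first country). The researchers' approach is more sophisticated; this sketch shows only exact matching of a hand-written pattern.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def match(pattern: List[Triple], graph: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding under which all pattern triples occur in the graph.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for fact in graph:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(first, fact):
            if term.startswith("?"):
                # Bind a new variable, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, graph, candidate)


# Hypothetical toy graph: two corporate groups.
graph: List[Triple] = [
    ("HoldCo", "incorporated_in", "Ireland"),
    ("HoldCo", "tax_resident_in", "Bermuda"),
    ("HoldCo", "owns", "OpCo"),
    ("OpCo", "incorporated_in", "Ireland"),
    ("PlainCo", "incorporated_in", "Germany"),
    ("PlainCo", "tax_resident_in", "Germany"),
    ("PlainCo", "owns", "SubCo"),
    ("SubCo", "incorporated_in", "Germany"),
]

# Hypothetical signature: parent incorporated in ?country but tax-resident elsewhere,
# owning a subsidiary incorporated in the same ?country.
pattern: List[Triple] = [
    ("?parent", "incorporated_in", "?country"),
    ("?parent", "tax_resident_in", "?haven"),
    ("?parent", "owns", "?child"),
    ("?child", "incorporated_in", "?country"),
]

flagged = [b for b in match(pattern, graph) if b["?haven"] != b["?country"]]
print(flagged)
```

Only HoldCo and its subsidiary OpCo are flagged; PlainCo's group has the same ownership shape but is taxed where it is incorporated, so the final filter drops it. At the scale of 1.5 million companies, exact matching like this would be a starting point rather than a detector, which is where the machine learning the researchers envision would come in.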

FORTUNE ON A.I.

Semiconductors are a weapon in the U.S.-China trade war. Can this chipmaker serve both sides?—by Eamon Barrett

Facebook and NYU researchers discover a way to speed up MRI scans—by Jeremy Kahn

Hateful posts on Facebook and Instagram soar—by Danielle Abril

Why this military museum is using a chatbot during COVID-19—by Jonathan Vanian

U.K. facial-recognition pilots broke privacy, data protection, and equality laws, court rules—by David Meyer

Johnson & Johnson just invested in a telemedicine startup. Here’s why—by Jonathan Vanian

Kamala Harris could be the best thing that ever happened to Big Tech—by Jeff John Roberts

BRAIN FOOD

We've sometimes written here about deploying A.I. in creative areas, such as composing music or, like GPT-3, writing original prose or poetry (of a kind). And researchers have talked about A.I. having "imagination" in the case of deep learning systems that have to project possible actions into the future and choose among the scenarios they deem most likely to maximize a certain reward.

In a paper published this week, Philippe Esling, a professor of mathematics and computer science at the Sorbonne, in Paris, and Ninon Davis, a graduate student, delve into the evolving definitions of creativity and the history of using software towards creative ends. They argue that using A.I. systems to create novel content on their own is problematic: The systems are always limited by the data they've been trained on and whatever optimization function guides that training. They lack any sense of artistic intent and so can't really evaluate the quality of anything they generate. What's worse, many approaches toward A.I. creativity will actually lead, over time, to the production of increasingly conformist works—the exact opposite of what creative processes should be trying to do.

While that assessment is depressing, Esling and Davis say A.I. still has great potential as a tool for enhancing human creativity, noting recent advances in using music-generating A.I. systems to co-create compositions alongside human musicians.
