What an A.I. ‘Miracle’ says about the future of business technology

April 19, 2022, 8:37 PM UTC

AI21 Labs is a bit like an Israeli rejoinder to U.S.-based OpenAI. It is both a research lab, doing cutting-edge work on natural language processing (NLP), and a commercial business, hoping to quickly push those state-of-the-art developments into products that real businesses can use—and pay for.

AI21 Labs was founded by Yoav Shoham, an emeritus professor of artificial intelligence at Stanford University; Amnon Shashua, a founder of autonomous driving software company Mobileye, which was acquired by Intel; and Ori Goshen, a founder of crowdfunding platform CrowdX. The company’s lofty goal is “reimagining the way people read and write, for the better.”

The lab has built a new system that it somewhat cheekily calls “Miracle,” a friendlier version of MRKL, an acronym for Modular Reasoning, Knowledge and Language system. MRKL is important because of what it says about four key trends in how businesses will use A.I. going forward.

First, MRKL is designed to handle all kinds of natural language tasks, not just one specific job, as most such systems did until recently. Until now, for instance, the A.I. behind a customer service chatbot could not also analyze the sentiment of CEO earnings calls; a single NLP engine can now handle both. It is another example of the genuine revolution in NLP and the impact it is starting to have on business.

The second, and closely related, trend to note is that these general-purpose NLP systems will increasingly be built upon “ultra-large language models,” single algorithms that learn billions of statistical relationships between words. They are trained on vast amounts of text scraped from the internet, including books written in English and other languages, as well as public sources like Wikipedia and Reddit threads. Most of these systems are trained either to predict a missing word in a sentence or the next word in a sentence. But it turns out, when you build an A.I. system that big and train it to do one thing, it’s also able to do a lot of other things with little to no additional training: translation, answering questions, and writing original passages of text.
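The "predict the next word" objective these models are trained on can be illustrated with a toy sketch. This bigram counter is purely hypothetical and bears no resemblance to the neural networks behind GPT-3 or Jurassic-1, which learn billions of parameters; it only shows the shape of the task.

```python
# Toy illustration of next-word prediction: count which word most often
# follows each word in a corpus, then predict the most frequent follower.
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count word-pair frequencies across a list of sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed next word."""
    return counts[word].most_common(1)[0][0]

counts = train_bigrams(["the cat sat", "the cat ran", "the dog sat"])
print(predict_next(counts, "the"))  # "cat" follows "the" most often
```

A real language model replaces the raw counts with a learned probability distribution over its entire vocabulary, conditioned on all the preceding words rather than just one.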

What’s more, with just a little more training on a relatively small number of examples, these large language models can often outperform smaller A.I. systems that were trained on big data sets—often curated at great expense—to accomplish just one narrow task. It is this ability to perform with “little data” that makes ultra-large language models so potentially attractive to business because using them could be faster and cheaper.

Perhaps the best-known example of an ultra-large language model available for commercial use is OpenAI’s GPT-3. OpenAI has a close relationship with Microsoft, which invested more than $1 billion in the company, and, unsurprisingly, Microsoft has incorporated GPT-3 into a product that automatically writes computer code. It also makes the technology available to its Azure cloud customers.

AI21 Labs has its own ultra-large language model called Jurassic-1 that it released commercially last year and that it claims is superior to GPT-3, partly because it has a larger “token vocabulary.” That refers to the number of words and parts of words it knows. Jurassic has a token vocabulary of more than 250,000, five times GPT-3’s.

There are some well-documented problems with these ultra-large language models, including that they can be prompted to spit out toxic language. But another giant flaw is that they have a tendency to produce inaccurate information in response to factual questions.

For instance, ask GPT-3 to add two plus two, and it will confidently tell you four, but ask it to add several four- and five-digit numbers, and chances are that it will just as confidently spit out the wrong answer. Ask it what the weather is like in New York currently, and it will answer, but likely with the temperature in New York from whenever data from AccuWeather was scraped into its training set, not today's weather. The same problem applies to questions about current events or even science. And because these large language models are so big, they are extremely expensive to train—in the millions of dollars—so it is not practical to constantly retrain them to ensure their data is up-to-the-minute.

This is the problem AI21 Labs set out to solve with MRKL (I wrote about one of the lab’s previous innovations here). Which brings us to the third big trend that MRKL represents: MRKL is a hybrid system. It doesn’t only use deep learning, the A.I. method that is responsible for most of the big leaps forward in the technology over the past decade. Instead it combines different modules, some of which use deep learning, and some of which use an older form of A.I., symbolic reasoning, to provide accurate, up-to-date responses to factual questions.

The clever thing about MRKL is a module called a router that takes a question from a user and figures out what kind of information the user is seeking. If the question involves mathematics, it sends that question to a plain, old-fashioned scientific calculator. If it involves exchange rates, it routes it to a currency converter. If it is about weather, it sends it to a forecasting website. MRKL currently supports 55 of these task-specific modules, according to Shoham. If the router is unsure which module is best, it calls on Jurassic-1. Jurassic also helps compose the contextual language around MRKL's response.
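The routing idea can be sketched in a few lines. This is not AI21 Labs' implementation—the module names, matching rules, and stub responses below are all hypothetical—but it shows how a query can be dispatched to an exact tool (a calculator) when possible and handed to the language model only as a fallback.

```python
# Hypothetical sketch of a MRKL-style router: match a query against
# task-specific modules; fall back to the language model if none fits.
import re

def calculator(query):
    # Strip everything except digits and arithmetic symbols, then evaluate
    # exactly. (eval on filtered text is fine for a sketch, not production.)
    expr = re.sub(r"[^0-9+\-*/(). ]", "", query)
    return str(eval(expr))

def weather(query):
    return "Sunny, 18°C"  # stand-in for a call to a live forecast service

def fallback_llm(query):
    return "<answer composed by the language model>"  # stand-in for Jurassic-1

ROUTES = [
    (re.compile(r"\d+\s*[+\-*/]\s*\d+"), calculator),  # arithmetic
    (re.compile(r"\bweather\b", re.I), weather),       # forecasts
]

def route(query):
    for pattern, module in ROUTES:
        if pattern.search(query):
            return module(query)
    return fallback_llm(query)  # router unsure: defer to the model

print(route("What is 4821 + 9375?"))  # the calculator answers exactly: 14196
```

The point of the design is that arithmetic and weather answers come from tools that are always correct and always current, sidestepping the stale-training-data problem entirely.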

Another clever innovation here is how AI21 Labs is able to elicit the right kind of response from Jurassic. It does this with a method called “prompt tuning,” in which the way an initial question or fragment of text is fed to the ultra-large language model helps determine the nature of the output. It’s one way to adjust the A.I. for a particular kind of task without having to fine-tune it with additional training data. The problem with additional training is that as the system gets better at one narrow task, it actually gets worse at others. Researchers call this problem “catastrophic forgetting.”

Some A.I. researchers overcome catastrophic forgetting by training the model for a variety of disparate tasks at the same time, but that takes a lot of computer power, time, and money. Prompt tuning avoids this. AI21 Labs’ innovation with MRKL is to create small deep learning modules that can automatically prompt tune Jurassic on the fly, taking a user’s query and composing the best set of prompts to nudge Jurassic into coughing up answers in the correct style and format.
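The prompt-composition step can be illustrated with a hand-written stand-in. MRKL's actual prompt-tuning modules are small learned neural networks, and the task names and examples below are invented for illustration; the sketch only shows the core idea—prepending task-specific examples to a query so a frozen model answers in the right style, with no retraining and hence no catastrophic forgetting.

```python
# Hypothetical sketch of query-time prompt composition for a frozen
# language model. The examples steer the model's output format and style.
FEW_SHOT_EXAMPLES = {
    "sentiment": [
        ("The earnings call was a disaster.", "negative"),
        ("Revenue beat expectations again.", "positive"),
    ],
    "translation": [
        ("Bonjour", "Hello"),
        ("Merci", "Thank you"),
    ],
}

def compose_prompt(task, query):
    """Prepend task-specific examples, then leave the output slot open
    for the frozen model to complete."""
    lines = [f"Input: {q}\nOutput: {a}" for q, a in FEW_SHOT_EXAMPLES[task]]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

print(compose_prompt("sentiment", "Margins are shrinking again."))
```

Because the model's weights never change, the same frozen model can serve every task; only the composed prompt differs from one query to the next.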

And with that, here's the rest of this week's news in A.I.

Jeremy Kahn


An algorithmic disaster. An algorithm used by Dutch tax authorities to create risk profiles intended to identify people who engage in childcare benefits fraud had disastrous results, leading to the country’s privacy regulator levying a multimillion-dollar fine on the agency, Politico reported. Because the tax authorities penalized families that the algorithm suspected were committing fraud, “tens of thousands of families—often with lower incomes or belonging to ethnic minorities—were pushed into poverty because of exorbitant debts to the tax agency,” the report said.

An A.I. battle at sea. The U.S. Navy is attempting to build a fleet of autonomous drone ships that would not require human sailors, the Washington Post reported. The Navy plans to develop 21 A.I.-powered ships over the next couple of years, partly as a response to countries like China developing advanced missile technology that could more easily strike ships near their shores. But as the Post noted, critics are worried that the project could spur other nations to build their own fleet of supercharged military technology.

From the article: “It raises the stakes a lot,” said Peter Asaro, an artificial intelligence expert at the New School in New York. “Small countries like North Korea or the Philippines could just crank out a bunch of little [naval] robots and suddenly have a very strong defense mechanism against a big military or Navy like the U.S.’s.”

We got to do something about the chips. Congress is working on legislation intended to boost the production of and research into computer chips in the U.S. in order to address the worldwide semiconductor shortage, the Associated Press reported. From the article: The two bills also establish regional technology hubs—with the Senate dedicating $10 billion to the program and the House dedicating $7 billion. The Senate bill calls for 20 such hubs, while the House bill authorizes at least 10.

Surveillance comes to South Africa. Various cities in South Africa are becoming “modernized” with security cameras and A.I. technologies like license plate scanners that have alarmed privacy experts, according to a report by the MIT Technology Review. From the article: Here in South Africa, where colonial legacies abound, the unfettered deployment of AI surveillance offers just one case study in how a technology that promised to bring societies into the future is threatening to send them back to the past.


OneDigital picked Marcia Calleja-Matsko to be the financial services and HR consulting firm’s chief information officer. Calleja-Matsko was previously the CIO and vice president at Avanos Medical.

Benefits Data Trust hired Stephen Rockwell to be the nonprofit’s chief digital officer. Rockwell was previously the chief product and technology officer at Charity Navigator.

Providence has promoted Sara Vaezy to be the health care system's chief digital officer. She replaces Aaron Martin, who rejoined Amazon.


A.I. as tongue cancer detector. Researchers from Korean institutions like the Ajou University School of Medicine, Seoul National University Hospital, and the Republic of Korea’s National Cancer Center published a paper in Nature’s Scientific Reports detailing how deep learning can be used as a tool to detect tongue cancer. The researchers trained their A.I. system on a data set of 12,400 endoscopic images from five South Korean university hospitals.

From the paper: In conclusion, we have constructed a quality-validated dataset using oral endoscopy images from several medical institutions. A deep learning model based on the dataset showed acceptable performance for application in tongue cancer diagnosis. Compared with human readers, it showed lower diagnostic performance than oncology specialists and higher diagnostic performance than general physicians. Therefore, the developed algorithm could be used as an assistant tool for general physicians to increase the diagnosis and screening of cancer in clinical settings.


The future of video games depends on A.I. —By Bruno Silva

Why open sourcing social media algorithms is a useless sound bite —By Jacob Carpenter

Ford is ‘betting the company’ on a Tesla-style EV truck that could make or break its future —By Marco Quiroz-Gutierrez

Love in the metaverse: Everything you need to know about dating, sex, and marriage in a virtual world —By Mahnoor Khan


Elon Musk wants to open source what now? If Elon Musk is successful in his $43 billion bid to take over Twitter, the Tesla chief said he plans to open source the social messaging system's algorithm in order to provide more transparency to users about how their tweets are managed, among other things. But Musk's idea appears to be a major oversimplification of how machine learning systems like Twitter's function.

As a Twitter technologist told Fortune in a feature about how Twitter employees are faring, there is no single algorithm that powers the social messaging service. Indeed, Twitter is "a web of interconnected systems that work together to show tweets to each user."

“It’s a slippery slope to open sourcing Twitter’s entire data pipeline, which isn’t worth the effort unless you can fully understand the scale of potential upsides and downsides,” the technologist, to whom Fortune granted anonymity, said.

Fortune’s Jacob Carpenter summed it up nicely in a recent essay:

But open sourcing the company’s algorithms reeks of empty verbiage, a sentiment summarized well by Twitter vice president of product Steve Teixeira.

“The ‘open source the algorithm’ thing drops more of a potential tactic than a strategy itself,” Teixeira tweeted early Monday morning. “I suspect it would be a pretty inefficient tactic as well, given the scope of moving parts and pace of change.”


This edition of Eye on A.I. was curated by Jonathan Vanian

