OpenAI says it’s making progress on “The Alignment Problem”

January 27, 2022, 5:59 PM UTC
Updated March 21, 2023, 5:58 PM UTC

Hello and welcome to a new, special monthly edition of Fortune’s “Eye on A.I.” newsletter. Today, OpenAI, the San Francisco A.I. research company, announced that it had made significant progress on something called “The Alignment Problem.”

The term refers to the difficulty of making sure that an A.I. system does what humans want it to do. In traditional software, alignment wasn’t much of an issue, because humans both chose the goal they wanted the software to accomplish and wrote a very specific instruction set, or code, detailing every step the computer should take to achieve it. If the program did something wrong along the way, it was because the instructions were faulty.

With A.I., alignment is harder. While humans might specify the goal, the software itself now learns how best to achieve it. Often, the logic behind the software’s decision in any particular case is opaque, even to the person who created the software. And this problem becomes more challenging the more capable an A.I. system becomes.

OpenAI is interested in alignment because its founding mission is the creation of artificial general intelligence (AGI). That’s the kind of super-intelligent software that, for now, remains the stuff of science fiction—a single system that can perform most cognitive tasks as well or better than a human.

“Alignment is critical to the mission of OpenAI,” Ilya Sutskever, the legendary machine learning researcher who is OpenAI’s co-founder and chief scientist, tells me. “We want to build general purpose A.I. to benefit humanity, so it must not just be smart, but safe, and does the complicated tasks that we want it to do safely.”

OpenAI has not managed to create AGI. But it already has an alignment problem on its hands with its sole commercial product. That product, which it simply calls The API, is an application programming interface that lets paying customers access the company’s algorithm GPT. The best known version of that algorithm is GPT-3, a massive natural language processing system that can compose long blocks of text that are often indistinguishable from human writing. GPT-3 can also perform a lot of other language tasks, including translation, summarization, and answering questions. OpenAI’s API is available to customers of Microsoft’s Azure cloud computing platform as well as to OpenAI’s own customers.

The problem is that it can be very difficult to get GPT-3 to compose text the way a user might want. Prompt the software to “Please explain the moon landing to a six-year-old,” and the system might well begin writing similar phrases, such as, “Please explain climate change to a six-year-old,” and “Please explain the big bang to a six-year-old,” rather than actually summarizing the story of Apollo 11 using age-appropriate language, says Jan Leike, an OpenAI researcher who focuses on The Alignment Problem.

Another issue is that, having been trained on a vast amount of written material scraped from the Internet and previously published books, the text GPT-3 generates can be sexist, racist, and Islamophobic. It has a tendency to veer into descriptions of violence. It is also difficult to get GPT-3 to answer questions factually, as opposed to just making stuff up.

OpenAI now says that it has made progress towards solving these alignment problems by creating a new version of GPT, which it calls InstructGPT. InstructGPT starts out a bit like GPT-3 in basic design and training. It too initially learns about language by ingesting a giant amount of text scraped from the Internet and books. But InstructGPT is a much smaller piece of software, only handling some 1.5 billion different variables at a time, rather than the 175 billion that GPT-3 uses. That is important because it makes InstructGPT easier and less expensive to train.

After its initial training, InstructGPT is fine-tuned in two additional steps. First, it is supplied with what Leike says were “a few tens of thousands of examples” of text that humans wrote in response to the same sorts of prompts OpenAI’s customers use to try to get GPT-3 to do something. The system has to learn to imitate these human-written responses. Next, the system is honed further by asking it to generate two different responses to a prompt and having human reviewers pick the one they think is better. This preference data is then used to train an internal reward mechanism: InstructGPT learns to predict which of its generated responses a human would most likely prefer, and that response becomes its output.
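The second of those steps can be sketched in miniature. The code below is a toy illustration, not OpenAI’s implementation: it trains a linear stand-in for a “reward model” on pairwise human preferences using a logistic (Bradley-Terry-style) loss, so that the response a human preferred ends up scoring higher. The feature vectors and preference pairs are invented stand-ins for real model outputs.

```python
import math

def reward(w, x):
    # Linear stand-in for the reward model's scalar score.
    # A real system would use a neural network over text, not a dot product.
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    # pairs: list of (preferred_features, rejected_features) from human reviewers.
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            # Logistic preference loss: p(good beats bad) = sigmoid(r(good) - r(bad)).
            margin = reward(w, good) - reward(w, bad)
            p = 1.0 / (1.0 + math.exp(-margin))
            grad_scale = 1.0 - p  # gradient of -log(p) with respect to the margin
            for i in range(dim):
                w[i] += lr * grad_scale * (good[i] - bad[i])
    return w

# Hypothetical features: index 0 = "follows the instruction", index 1 = "toxicity".
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),  # reviewers preferred the helpful response
    ([0.9, 0.1], [0.2, 0.8]),
]
w = train_reward_model(pairs, dim=2)

helpful, toxic = [1.0, 0.0], [0.0, 1.0]
assert reward(w, helpful) > reward(w, toxic)
```

At generation time, such a reward score can be used to pick among candidate responses, which is the role the article describes for InstructGPT’s internal reward mechanism.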

Leike tells me that InstructGPT has not completely cracked The Alignment Problem. “It will still sometimes ignore an instruction or say something toxic,” he says. It can also sometimes still generate violent prose and false information. He also says that InstructGPT is so good at following human instructions that there is potential for abuse—someone could very easily teach the system to be more racist or sexist, for example. But OpenAI found that the new InstructGPT is so much less likely to run off the rails than the original GPT-3 that it has decided to make InstructGPT the default algorithm for all of its customers. People can still opt to use the larger GPT-3 if they wish, but Leike says that so far the human reviewers and beta customers OpenAI has used to test the system much prefer InstructGPT’s responses, even though InstructGPT doesn’t perform quite as well as GPT-3 on some academic natural language processing benchmarks.

That’s not surprising. The Instruct version of GPT is safer and more trustworthy. And to most businesses, as long as a certain performance bar is cleared, that’s what matters. It also shows that academic benchmarks may be a poor proxy for the things businesses actually want natural language processing software to do.

It’s not clear we’re very close to achieving AGI. But it’s good to know that companies like OpenAI are at least thinking hard about The Alignment Problem—and making some progress towards solving it.

Thanks for reading this special edition. Here are some tidbits of A.I. news that have occurred since the last regular edition of the newsletter earlier this week.

Jeremy Kahn


Tesla says its "Tesla Bot" is on track to be "the most powerful A.I. development platform." That's according to Andrej Karpathy, Tesla's director of A.I., in a recent LinkedIn post touting job openings for A.I. and robotics researchers at the company. Meanwhile, on an earnings call, Tesla founder and CEO Elon Musk said the humanoid robot, which the company calls "Optimus" internally, could be more important than its lineup of electric vehicles, Bloomberg News reported.

Speaking of Elon Musk, his brain-computer interface company Neuralink is getting closer to putting a chip in a person's brain, but former company insiders paint a picture of impossible deadlines, dysfunctional management, and an absent CEO. That's what my Fortune colleague and "Eye on A.I." co-writer Jonathan Vanian and I discovered after spending two months digging into the company. We also found that Neuralink has made some genuine advances in brain-computer interface hardware and helped spawn a whole industry of similar startups that are attracting real money from venture capital firms. But that doesn't mean Neuralink will be able to live up to Musk's radical vision for what the technology will do. You can check out our feature story in the current issue of Fortune magazine and on the web here.

Donald Trump's new social network plans to use A.I. to moderate content. Trump's new Truth Social network, which will launch on President's Day in February, said it will use technology from San Francisco A.I. company Hive to keep sexually explicit content, and posts that include violence, bullying, hate speech, and spam, off the site. That's according to a story from Fox Business. "This is not political," Hive CEO and co-founder Kevin Guo told the network. "These are not things that are left or right or have any political baggage."

Will driverless cars really be more than a gimmick? That's the provocative question Financial Times tech reporter Patrick McGee asks in a reported essay in the paper's magazine. McGee argues that the problem with driverless cars is not so much getting the tech to work, but the economics. The costs associated with owning and running a fleet of robotaxis in sufficient numbers so they are always available on demand are much, much worse than for standard ride-sharing "marketplaces" such as Uber and Lyft that are based on a gig economy model. As a result, McGee says, companies like Waymo and Cruise could have real trouble making their business models successful.
