Hello and welcome to Eye on AI. In this edition: deception by AI models is one problem…deception about AI safety testing is another; former OpenAI chief scientist says LLM pretraining has peaked; Meta takes steps to go beyond tokens; Hugging Face brings test time compute strategies to the masses.
When OpenAI released its o1 AI model two weeks ago, it also released a “system card”—a document describing the model and its potential risks based on testing OpenAI carried out as well as safety testing conducted by third-party organizations OpenAI hired. The new model was also tested by both the U.S. and U.K. AI Safety Institutes, which are government-funded, although the results of those tests were not reported in the system card.
One of the third-party testers, Apollo Research, found that o1 will, if given a goal and told to pursue it above all else, use deception to reach that goal. That includes manipulating data in order to advance its objective and then lying about having done so. In one case, it purposefully answered math questions incorrectly in order to appear less capable after learning that researchers would shut it down if its capability crossed a certain threshold. Other times, o1 appeared to work on a task assigned to it by a user but, when it thought it wasn’t being monitored, reverted to pursuing its original goal. In a small fraction of cases, it tried to disable the mechanism through which it was told it was being monitored. It even took steps to prevent itself from being shut down, including making copies of itself. The model would sometimes engage in deception when it was merely given a goal, without being told to pursue that goal to the exclusion of all else.
These are exactly the kinds of “rogue AI” sci-fi scenarios that have many AI safety researchers worried about advanced AI—and the Apollo Research findings have rightfully gotten a lot of attention. But actually, the situation may be worse than we realize. That’s because many questions are being raised about whether the safety testing being carried out by the top AI labs—and what they publish about that testing—is actually adequate.
OpenAI not transparent about o1 system card, critic charges
Zvi Mowshowitz, a former startup founder who writes frequently about AI safety, pointed out in a blog post that OpenAI has admitted that the system card for o1 doesn’t actually reflect the performance of the version of the model it released. Instead, OpenAI continued to make improvements to o1 while the safety testing was underway, but did not update its system card to reflect the updated model. Roon, the pseudonymous social media handle of a member of OpenAI’s technical staff (widely believed to be Tarun Gogineni), posted on X that “making a 100 page report on preparedness [which is what OpenAI calls its AI Safety protocols] is really time consuming work that has to be done in parallel with post training improvements.” He then wrote, “rest assured any significantly more capable [version of the model] gets run through [the AI safety tests.]”
But of course, we’ll just have to take Roon’s word for it, since we have no way of independently verifying what he says. Also, Roon’s “has to be done” is doing a lot of work in his post. Why can’t OpenAI do safety testing on the model it actually releases? Well, only because OpenAI sees itself in an existential race with its AI rivals and thinks that additional safety testing might slow it down. OpenAI could err on the side of caution, but competitive dynamics militate against it. This state of affairs is only possible because AI is an almost entirely unregulated industry right now. Can you imagine a pharmaceutical company operating in this way? If a drug company has FDA approval for a drug and it makes an “enhanced version” of the same drug, perhaps by adding a substance to improve its uptake in the body, guess what? It has to prove to the FDA that the new version doesn’t change the drug’s safety profile. Heck, you even need FDA sign-off to change the color or shape of an approved pill.
Poor grades on AI Safety
As it turns out, OpenAI is not alone in having AI Safety practices that may provide a false sense of security to the public. The Future of Life Institute, a nonprofit dedicated to helping humanity avoid existential (or X) risks, recently commissioned a panel of seven experts—including Turing Award-winning AI researcher Yoshua Bengio and David Krueger, a professor who helped set up the U.K.’s AI Safety Institute—to review the AI safety protocols of all the major AI labs. The grades the labs received would not make any parent happy.
Meta received an overall F—although this was largely because it creates open models, meaning it publishes the model weights, which makes it trivial for anyone to overcome any guardrails built into the system. Elon Musk’s xAI got a D–, while China’s Zhipu AI scored a D. OpenAI and Google DeepMind each received D+ marks. Anthropic ranked best, but still only scored a C grade.
Max Tegmark, the MIT physicist who heads the Future of Life Institute, told me the grades were low because none of these companies actually has much of an idea about how to control increasingly powerful AI systems. Additionally, he said progress toward AGI—systems that will be able to perform most tasks as well as or better than the average person—is proceeding far more rapidly than progress on how to make such systems safe.
Bad incentive structure
AI companies, Tegmark said, have “a bad incentive structure,” where “you get to invent your own safety standards and enforce them.” He noted that, currently, the owners of a local sandwich shop must comply with more legally mandated safety standards than the people building what they themselves claim is one of the most powerful and transformative technologies humanity has ever created. “Right now, there are no legally mandated safety standards at all, which is crazy,” Tegmark said.
He said he was uncertain whether the incoming Trump Administration would seek to impose any safety regulations on AI. Trump has generally opposed regulation, but Musk, who is close to Trump and influential on AI policy, has long been concerned about X risk and favored SB 1047, a California bill aimed at heading off catastrophic risks from AI that was ultimately vetoed by the state’s Democratic governor, Gavin Newsom. So it could go either way.
Tegmark said the Future of Life Institute plans to repeat the grading exercise every six months. AI companies love to race against one another to be top ranked on various benchmarks. Now, he hopes the Institute’s grades act as an incentive for the AI labs to compete with one another over who has the best safety practices.
In the absence of AI regulation, I guess we all have to place our hope in Tegmark’s hope.
And with that, here’s more AI news.
Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn
AI IN THE NEWS
Google DeepMind unveils new video generation and image creation AI models. The company revealed its Veo 2 video generation model that can produce cinematic-style short videos from text prompts at a resolution of up to 4K, much higher than rival AI video generation systems, such as OpenAI’s Sora. The new system also has, according to Google DeepMind, a better understanding of physics and how human bodies move than rival systems or earlier versions of Veo, as well as a better understanding of camera angles and shot types and how they can be used to convey meaning. The system produces only eight-second clips by default, but this can be extended to two minutes. Google DeepMind also unveiled an updated version of its text-to-image AI model, Imagen 3. You can read more from Fortune’s David Meyer here.
LLM pretraining progress is peaking. That’s what famed AI researcher Ilya Sutskever, OpenAI's former chief scientist and now a co-founder of Safe Superintelligence, said at the NeurIPS conference in Vancouver. There simply isn't enough good-quality, human-generated data to continue to scale up the pretraining phase of building large language models, he said, according to The Verge. But Sutskever said that further gains in LLM performance could be achieved through innovative post-training methods and through “test time compute,” where an AI model is allowed more time to “think” before answering. And he said that researchers might find other innovations to endow AI models with more reasoning and agentic properties.
OpenAI gives all users of its free ChatGPT service access to internet search. Previously the search capability had been restricted to the paid tiers of the company’s ChatGPT product, as well as users of its enterprise API. You can read more in this article from the AFP.
U.K. government opens consultation on reform that would create copyright exemption for AI training. The proposed rules would give AI companies the right to train on copyrighted works unless rights holders specifically opt out, the Financial Times reports. The move is seen as a way to draw more AI companies and startups to the U.K., but it is likely to prove controversial, sparking fears within the creative industries about competition from AI-generated content and the burden of monitoring rights reservations. The new rules would also require increased transparency from AI firms about any copyrighted material they’ve used in training AI models.
China plans more antitrust probes of U.S. tech companies after its Nvidia inquiry. That’s according to The Information, which cited three people familiar with the Chinese government’s thinking. The potential moves are seen as a way for China to retaliate against U.S. export controls on advanced computer chips needed for AI applications. Two weeks ago, China announced an antitrust probe of Nvidia’s 2020 acquisition of Mellanox.
Databricks raises $10 billion venture capital round. The venture funding for the AI and data company may be among the largest venture capital deals ever announced. It values Databricks at $62 billion, the company said. The Series J funding is being led by Thrive Capital, which also led OpenAI’s recent funding round, and co-led by Andreessen Horowitz, DST Global, GIC, Insight Partners and WCM Investment Management. Other significant participants include existing investor Ontario Teachers’ Pension Plan and new investors ICONIQ Growth, MGX, Sands Capital, and Wellington Management.
EYE ON AI RESEARCH
Moving beyond tokens. A new research paper from Meta’s Fundamental AI Research (FAIR) team is creating buzz in AI circles. One problem with current Transformer-based large language models is that they chunk data into predefined segments, called tokens, drawn from a fixed vocabulary. But this fixed segmentation is not always appropriate (it can split words in half, for instance) and can produce strange effects that degrade the performance of the resulting AI models, especially on tasks such as translation. It also makes the models more vulnerable to adversarial attacks, because models process data as tokens that do not correspond to the units a human would see as semantically significant (i.e. important to the meaning of that piece of data). Finally, it can make Transformers less efficient in how they use computing resources, since every token receives the same amount of compute regardless of how much information it carries.
Enter the Byte Latent Transformer, a new architecture from Meta researchers that introduces dynamic units of data processing that the researchers call “patches.” These patches can change in length depending on the structure of the data being analyzed. The researchers found this led to equivalent or better model performance using significantly less compute—in some cases 50% less—than equivalent token-based Transformers. Many are hailing the research as an important first step away from the standard Transformer model that has dominated much of AI research since it was invented at Google in 2017. You can read about the paper here on Meta’s research blog.
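To make the idea concrete, here is a minimal sketch of what dynamic, entropy-driven patching might look like. This is not the Byte Latent Transformer’s actual implementation: the paper uses a small byte-level language model to estimate next-byte entropy, whereas the byte_entropy helper, the threshold, and the patch-size cap below are simplified stand-ins invented purely for illustration.

```python
# Illustrative sketch only: group a byte stream into variable-length "patches,"
# starting a new patch when the local uncertainty spikes. The entropy estimate
# here is a crude byte-frequency proxy, not the learned model used in the paper.
import math
from collections import Counter

def byte_entropy(context: bytes) -> float:
    """Shannon entropy of the byte distribution in a short trailing window,
    used as a rough proxy for how unpredictable the next byte is."""
    window = context[-8:] or b"\x00"
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def patchify(data: bytes, threshold: float = 2.0, max_patch: int = 16) -> list[bytes]:
    """Cut the stream into patches: predictable runs end up in long patches,
    while surprising regions get split more finely."""
    patches, current = [], bytearray()
    for i, b in enumerate(data):
        current.append(b)
        if byte_entropy(data[: i + 1]) > threshold or len(current) >= max_patch:
            patches.append(bytes(current))
            current = bytearray()
    if current:
        patches.append(bytes(current))
    return patches

if __name__ == "__main__":
    for patch in patchify(b"aaaaaaaaaaaaaaaa The quick brown fox!"):
        print(patch)
```

Running the toy example shows the intuition: the repetitive run of “a” bytes lands in one long patch, while the varied English text gets chopped into shorter ones, so compute can be spent where the data is hardest to predict.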
FORTUNE ON AI
Execs must use AI and understand it—but don’t become ‘too enamored’ by the technology, warn business leaders —by Sharon Goldman
Apple stock hits record high as JPMorgan dismisses AI worries —by Greg McKenna
Ex-Google CEO Eric Schmidt warns that when AI starts to self-improve, ‘we need to seriously think about unplugging it’ —by Paolo Confino
OpenAI whistleblower found dead in San Francisco apartment after taking his own life —by Jason Ma
Skechers draw backlash for full-page ad in Vogue that reeks of AI. ‘You actually didn’t save any money because now I hate you’ —by Chloe Berger
AI CALENDAR
Jan. 7-10: CES, Las Vegas
Jan 16-18: DLD Conference, Munich
Jan. 20-25: World Economic Forum, Davos, Switzerland
Feb. 10-11: AI Action Summit, Paris, France
March 3-6: MWC, Barcelona
March 7-15: SXSW, Austin
March 10-13: Human [X] conference, Las Vegas
March 17-20: Nvidia GTC, San Jose
April 9-11: Google Cloud Next, Las Vegas
BRAIN FOOD
Making reasoning models available to the masses. Hugging Face, the open source AI repository, says it is making the “test time compute” techniques responsible for the improved reasoning ability of OpenAI’s o1 model available as an open source capability. Test time compute is where a model is given more time, or access to more computing power, to produce an answer, leading to better-quality results. There are several methods for doing this, including building models that can reflect on the initial results they generate and self-correct their answers, or using external “verifier” models to check the answers produced by the main model. These external verifiers are often combined with a search process, in which the verifier picks the best answer among several the base model generates. Or, in the case of reasoning problems, the model generates several intermediate steps and the verifier searches for the best pathway through those steps. Reinforcement learning, where the model is taught to produce answers more likely to be preferred by the verifier, can also be used as part of this process.
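To illustrate the verifier-based approach in its simplest form, here is a toy best-of-N sketch. The generate_candidate and verifier_score callables are hypothetical placeholders for a base model and a separate verifier or reward model; Hugging Face’s actual recipes use Llama models with a process reward model and more elaborate search strategies, none of which this sketch reproduces.

```python
# Toy best-of-N "test time compute": sample several candidate answers from a
# base model, score each with a verifier, and keep the highest-scoring one.
from typing import Callable

def best_of_n(
    prompt: str,
    generate_candidate: Callable[[str], str],   # placeholder for a base model
    verifier_score: Callable[[str, str], float],  # placeholder for a verifier model
    n: int = 8,
) -> str:
    """Spend extra compute at inference time: more samples, then filter."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    scored = [(verifier_score(prompt, c), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]

if __name__ == "__main__":
    import random

    def toy_generate(prompt: str) -> str:
        # A deliberately noisy stand-in "model": guesses near the right answer.
        return str(7 + random.randint(-3, 3))

    def toy_verify(prompt: str, answer: str) -> float:
        # A stand-in verifier: higher score for answers closer to the true sum.
        return -abs(int(answer) - 7)

    print(best_of_n("What is 3 + 4?", toy_generate, toy_verify, n=16))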
Hugging Face said in a blog post it had created templates for implementing these verifier-based techniques with Meta’s 1 billion parameter and 3 billion parameter Llama models, which any user can download for free. And it says that Llama 1B and 3B, using these techniques, performed as well on tough math challenges as the 8 billion and 70 billion parameter versions of those models. But doing so also consumed 32 times as much compute as generating a standard answer with the smaller models when not using test time compute. That extra compute has a cost, even if the smaller models are generally less expensive to run than the larger ones.
Despite this cost, however, having free versions of reasoning models available is likely to speed the deployment of these reasoning techniques in large companies. And it also means that OpenAI may not see the revenue from its o1 model that it has been hoping for.