Hello and welcome to Eye on AI.
People are buzzing about today’s release of Meta’s new Llama 3.1 model. What’s notable is that this is Meta’s largest Llama model to date, with 405 billion parameters. (Parameters are the adjustable variables in a neural network, and give a rough sense of how large an AI model is.) And according to benchmark performance figures that conveniently leaked onto Reddit the day ahead of the official release, Llama 3.1 exceeds the capabilities of OpenAI’s latest and greatest model, GPT-4o, by a few percentage points across a number of measures, including some benchmarks designed to test reasoning.
Not only that, but Llama 3.1 is, like the other Llama models Meta has released, an “open model,” meaning anyone can potentially build their own applications on top of it without paying, and even modify the model in any way they desire. But the models Meta has released before have been smaller and less capable than the leading proprietary models, such as OpenAI’s GPT-4, Anthropic’s Claude 3 Opus, or Google’s Gemini Ultra and Gemini 1.5 Pro. The fact that Meta’s new Llama 3.1 may have now closed the gap with GPT-4o has a lot of people excited that Llama 3.1 405B will be the model that finally enables many businesses to really unlock a return on investment from generative AI.
Anton McGonnell, head of software products at SambaNova Systems, which builds AI hardware and software for big companies, said in a statement that Llama 3.1 405B might be a game changer because it enables two things. The first is that companies can use the 405-billion-parameter model to create synthetic datasets, which can then be used to train or fine-tune smaller open models for specific applications. This “distillation” process has been possible before, but there were often ethical concerns about how the training data had been sourced, with data scraped from the web without consent or derived from the work of poorly paid human contractors.
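To make that workflow concrete, here is a minimal sketch of the synthetic-data distillation loop, assuming a hosted Llama 3.1 405B endpoint that speaks the OpenAI-compatible chat API. The endpoint URL, model name, topics, and prompt are illustrative placeholders, not details from Meta or SambaNova.

```python
# Minimal sketch of "distillation via synthetic data": a large teacher model
# (assumed here to be a hosted Llama 3.1 405B endpoint with an OpenAI-compatible
# chat API) generates question/answer pairs, which are written to a JSONL file
# for fine-tuning a smaller student model. All names below are illustrative.

import json
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # assumed hosting provider
    api_key="YOUR_KEY",
)

TOPICS = ["refund policy questions", "invoice disputes", "shipping delays"]

with open("synthetic_train.jsonl", "w") as f:
    for topic in TOPICS:
        resp = client.chat.completions.create(
            model="llama-3.1-405b-instruct",  # provider-specific model name
            messages=[{
                "role": "user",
                "content": f"Write one realistic customer question about {topic} "
                           "and a concise, correct answer. Return JSON with the "
                           "keys 'question' and 'answer'.",
            }],
        )
        pair = json.loads(resp.choices[0].message.content)
        # Each line becomes one training example for the smaller student model.
        f.write(json.dumps({"prompt": pair["question"],
                            "completion": pair["answer"]}) + "\n")
```

In practice you would generate thousands of such pairs, filter them for quality, and then fine-tune a small open model on the result.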
McGonnell also applauded Meta’s decision to release Llama 3.1 405B as part of a family of Llama models of different sizes (there are also upgraded 70 billion- and 8 billion-parameter models) and to release a “Llama stack,” a set of related software built on top of and around the AI models themselves. Meta’s stack includes guardrails software, to prevent the models from generating harmful or dangerous content, and security software to help defend against prompt injection attacks on the Llama models. The second opportunity, McGonnell said, is that the family of models and the stack make it possible to chain open models together in an especially cost-effective way: parts of a user’s query or an application are handled by small, fine-tuned models, and only the more difficult aspects those models can’t handle are handed off to the full-scale 405-billion-parameter model.
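Here is a minimal sketch of that cascade, under the same assumptions as the snippet above (an OpenAI-compatible endpoint and illustrative model names), with a deliberately crude escalation heuristic in which the small model flags queries it can’t handle.

```python
# Toy model cascade: a small, cheap model answers first; only queries it flags
# as beyond its ability are escalated to the expensive 405B model. The model
# names, endpoint, and escalation heuristic are illustrative assumptions.

from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_KEY")

SMALL_MODEL = "llama-3.1-8b-instruct"    # fine-tuned first responder
LARGE_MODEL = "llama-3.1-405b-instruct"  # full-scale fallback

def ask(model: str, question: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Answer the question. If you are not confident in your "
                        "answer, reply with exactly the word ESCALATE."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()

def answer(question: str) -> str:
    draft = ask(SMALL_MODEL, question)
    if draft == "ESCALATE":  # crude self-reported confidence check
        return ask(LARGE_MODEL, question)  # hand off only the hard cases
    return draft
```

Real routers tend to use trained classifiers or confidence thresholds rather than self-reporting, but the cost logic is the same: most traffic never touches the big model.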
But McGonnell’s enthusiasm aside, there’s a catch, or rather a bunch of them. The model is so big that it can’t easily be hosted on a single GPU, or even a dozen of them. (Meta’s 70-billion-parameter version of Llama 3 can potentially be run on two high-end Nvidia GPUs.) That means companies might have to pay for a lot of very expensive GPUs in the cloud to run the model, and they will need rare technical expertise in how to split an AI workload across those GPUs and then bring the results back together to produce an output. To overcome those two issues, Meta is partnering with a number of companies, including the AI services and data analytics company Databricks and the cloud service providers AWS, Microsoft Azure, Google Cloud, Nvidia Foundry, and others, to host the model and offer tools and services around it. It has also partnered with Groq, a hardware company that builds an alternative to Nvidia’s GPUs designed specifically for running inference on trained models, to help lower the cost of running such a large model and speed up the time it takes the model to generate an output.
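For a sense of what splitting a model across GPUs looks like in code, here is an illustrative sketch using Hugging Face Transformers’ automatic device mapping. The repository ID is an assumption, the weights are gated, and the 405B model needs hundreds of gigabytes of GPU memory even at half precision, so treat this as the shape of the workflow rather than something that runs on a laptop.

```python
# Illustrative multi-GPU inference sketch. device_map="auto" (backed by the
# accelerate library) shards the model's layers across every visible GPU; the
# repo id below is an assumption and access to the weights is gated by Meta.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-405B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision still means ~800 GB of weights
    device_map="auto",           # split layers across all available GPUs
)

inputs = tokenizer("Summarize our Q2 sales call:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```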
Such an arrangement starts to make access to Llama 3.1 405B look a lot more like accessing a proprietary model through an application programming interface (API), which is what OpenAI, Anthropic, and Google Gemini offer (Google also offers some open models, called Gemma). It’s not clear yet how the costs of hosting and accessing your own Llama 3.1 model through one of Meta’s partners will compare to simply building on top of OpenAI’s GPT-4o or Anthropic’s Claude 3 Opus. Some developers have reportedly complained that hosting their own version of Llama 3’s 70-billion-parameter model was sometimes more expensive than simply paying OpenAI on a per-token basis to access the more capable GPT-4 model.
It also isn’t clear yet how much developers will be able to tinker with the parameters of the Llama 3.1 model they are running on the servers of one of Meta’s partners, which may well be using the same model to run inference for several customers in order to maximize the return on the expensive hardware needed to host such a big model. If those partners limit how much developers can adjust the model’s weights, that may negate some of the advantages of using an open model. It also isn’t clear yet exactly what commercial licensing restrictions Meta has placed on the use of Llama 3.1 405B.
In the past, the restrictions Meta has placed around the licensing of its Llama models have led open-source software purists to complain that Meta has twisted the meaning of open-source beyond recognition and that these models should not be called “open-source software” at all. Hence the growing use of the term “open model” as opposed to “open-source model.”
As with all open models, there are also some real concerns about AI safety here. Meta has not revealed the results of any red-teaming or safety testing it has done on the model. More capable models are generally more dangerous: a bad actor could more easily use them to suggest recipes for bioweapons or chemical weapons, to develop malicious software code, or to run highly automated disinformation campaigns, phishing schemes, or frauds. And as with all open models, it is easy for a sophisticated AI developer to remove any guardrails Meta has engineered into the baseline model.
Finally, as capable as Llama 3.1 405B may be, it will likely be superseded soon by even more capable proprietary models. Google is working on Project Astra, an AI model that will be more “agentic”—able to take actions, not just generate text or images. At Fortune’s Brainstorm Tech conference last week, Google’s chief research scientist Jeff Dean told me that Google will likely begin rolling this model out to some test users as soon as the fall. OpenAI is known to be training GPT-5, which will certainly be more capable than GPT-4o and may also have agentic properties. Anthropic is no doubt training a model that goes beyond Claude 3 Opus, its most powerful model, and also working on an AI agent.
All of this just underscores how competitive the market for AI “foundation models” (models on which many different kinds of AI applications can be built) has become, and how difficult it will be for any AI startups working on such models to survive as independent entities. That may not bode well for investors in hot French AI startups Mistral and H, or other independent foundation model companies like Cohere, or even somewhat more specialized AI model companies such as Character AI and Essential AI. It may be that only the biggest tech players, or those closely associated with them, will be able to keep pushing the boundaries of what these models can do.
The good news for the rest of us is that, despite the caveats I’ve listed above, this foundation model race is actually driving down the cost of implementing AI models. While overall AI spending is continuing to climb as companies begin to deploy AI models more widely across their organizations, on a per-output basis, “the cost of intelligence” is falling dramatically. This should mean more companies will begin to see a return on investment from generative AI, accelerating the dawn of this new AI era.
With that, here’s more AI news.
Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn
Before we get to the news… If you want to learn more about AI and its likely impacts on our companies, our jobs, our society, and even our own personal lives, please consider picking up a copy of my new book, Mastering AI: A Survival Guide to Our Superpowered Future. It’s out now in the U.S. from Simon & Schuster and you can order a copy today here. If you live in the U.K., the book will be published by Bedford Square Publishers next week and you can preorder a copy today here.
AI IN THE NEWS
Senate Democrats demand AI safety information from OpenAI. Senate Democrats have written OpenAI to demand data on its AI safety efforts following employee warnings about rushed safety testing, according to a story in the Washington Post. Led by Sen. Brian Schatz of Hawaii, the lawmakers asked CEO Sam Altman to outline plans to prevent AI misuse, such as creating bioweapons or aiding cyberattacks, and to disclose information on employee agreements that could stifle whistleblowing. OpenAI has said it has removed non-disparagement terms from staff agreements that might make it difficult for employees to become whistleblowers. The Senate's letter also requests that OpenAI allow independent experts to assess its safety systems and provide AI models to the government for pre-deployment testing. The senators have asked OpenAI to respond to their letter by Aug. 13.
Google appears to limit AI-generated overviews of search results. That’s according to The Verge, which cited data collected by SEO company BrightEdge. The prevalence of Google's AI-generated search results dropped from 11% of queries on June 1 to 7% by June 30, BrightEdge found. This reduction follows adjustments made by Google to address bizarre results, such as suggesting users put glue on pizza or eat rocks. Google disputes the study's methodology, noting it only tracks users who opted into the experimental AI features. Google has said it remains committed to refining AI Overviews to enhance their usefulness and maintain user trust.
Condé Nast asks Perplexity to stop scraping its content. According to a report in The Information, lawyers for the magazine publishing house have sent a cease and desist letter to the buzzy AI generative search company Perplexity asking it to stop scraping data from its magazines' web pages. Previously, Wired, a Condé Nast publication, had reported that Perplexity was continuing to cull data from web pages that had asked web crawling bots not to scrape their data using a protocol called “robots.txt.” Perplexity had publicly stated it would abide by the robots.txt protocol and not scrape data from such pages, but in experiments, the magazine had caught a web crawler scraping newly set up pages immediately after reporters for the publication sent queries to Perplexity’s search engine that included exact passages of text found on those new web pages. Perplexity has previously landed in hot water with Forbes for using information from its web pages without what Forbes considered adequate attribution.
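For readers unfamiliar with the protocol at issue, here is a small sketch of how robots.txt works: a site publishes per-crawler rules, and well-behaved bots check them before fetching pages. The rules and URLs below are illustrative, not Condé Nast’s actual file.

```python
# Sketch of the robots.txt protocol using Python's standard library parser.
# The rules and URLs are illustrative; compliance is voluntary, which is why
# the dispute described above is possible at all.

import urllib.robotparser

rules = """
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("PerplexityBot", "https://example.com/article"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))   # True
```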
Cohere valued at $5.5 billion in new funding round. The Canadian foundation model company, which is targeting business customers, has been valued at $5.5 billion following a new $500 million investment round led by pension fund PSP Investments. You can read more in this Bloomberg story.
Legal AI company Harvey valued at $1.5 billion in latest funding round. The startup published a blog post announcing a new $100 million Series C funding round led by GV (formerly Google Ventures), with participation from OpenAI, Kleiner Perkins, Sequoia, and other notable Silicon Valley investors. The company said the round valued it at $1.5 billion and that it would use the money to continue to scale. Harvey is an OpenAI partner and has built its legal copilot on top of OpenAI’s GPT models.
EYE ON AI RESEARCH
A new way to solve the reliability problems of today’s LLMs? One of the biggest problems with today’s LLM-based AI models is that they can be maddeningly unreliable. One minute they generate a wonderful and accurate answer to a complex physics problem. The next they can’t answer a much simpler high school mathematics problem or even win a game of tic-tac-toe. They hallucinate, making up information. And it can be very difficult, if not impossible, to figure out exactly why the models have gone wrong when they do go wrong.
Some AI experts, perhaps most notably Gary Marcus (who has long been a critic of pure deep learning approaches to AI), have suggested for a while now that hybrid systems, which combine symbolic, rule-based reasoning with deep learning systems such as LLMs, could be the key to overcoming the big drawbacks of today’s frontier AI systems.
Now a group of researchers from KU Leuven in Belgium, the University of Cambridge in England, and a number of Italian universities has proposed such a hybrid, which they say achieves exactly this. They have developed an AI model they call a Concept-based Memory Reasoner (CMR). It works by using a neural network to break a task down into small conceptual chunks and storing those chunks in memory. When confronted with a new task, the model must select from these chunks and combine them using symbolic rules to produce an output. This allows human experts to inspect and verify the logic behind each decision, ensuring the AI is both accurate and transparent. In essence, the approach constrains what the neural network can do: it has to select from the set of concepts it has seen in prior tasks and combine them according to a clear set of rules, which makes the output more reliable and easier for humans to interpret.
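To make the idea more tangible, here is a toy sketch of a concept-bottleneck-plus-rules setup in the spirit of the paper. It is not the authors’ code: the concepts, rules, and the stand-in “neural” scorer are invented for illustration, but it shows why the decision path stays inspectable.

```python
# Toy sketch of the CMR idea: a (stand-in) neural component maps raw input onto
# a fixed memory of human-readable concepts, and a symbolic rule layer combines
# those concepts into the final decision. Concepts and rules are invented here.

from dataclasses import dataclass

@dataclass
class Example:
    text: str

# The concept memory: the only ingredients the model is allowed to reason with.
CONCEPTS = ["mentions_refund", "mentions_invoice", "angry_tone"]

# Symbolic rules that combine concepts into an inspectable decision.
RULES = {
    "escalate": lambda c: c["angry_tone"] and c["mentions_refund"],
    "billing":  lambda c: c["mentions_invoice"],
}

def neural_concept_scorer(example: Example) -> dict:
    """Stand-in for the neural part: scores each stored concept for the input.
    Faked with keyword checks so the sketch runs end to end."""
    t = example.text.lower()
    checks = {
        "mentions_refund": "refund" in t,
        "mentions_invoice": "invoice" in t,
        "angry_tone": "!" in t,
    }
    return {c: checks[c] for c in CONCEPTS}

def reason(example: Example) -> str:
    concepts = neural_concept_scorer(example)  # neural part
    for label, rule in RULES.items():          # symbolic part
        if rule(concepts):
            # A human reviewer can see exactly which concepts fired and which
            # rule produced the label.
            return label
    return "default"

print(reason(Example("I want my refund now!")))  # -> escalate
```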
The drawback may be that the CMR can’t handle all the new situations we would want a model to deal with. But it is an interesting result and may point to an approach with which businesses will want to experiment. You can read the paper here on the non-peer-reviewed research repository arxiv.org.
FORTUNE ON AI
The rise of the AI gadget could free us from our smartphones. We just need to find the right device —by David Meyer
The U.S. reigns supreme in AI startups while China ensures chatbots have ‘core socialist values’ —by Jason Ma
Industry leaders say companies are adopting AI, but cost and reliability remain key challenges —by Sharon Goldman
AI CALENDAR
July 21-27: International Conference on Machine Learning (ICML), Vienna, Austria
July 30-31: Fortune Brainstorm AI Singapore (register here)
Aug. 12-14: Ai4 2024 in Las Vegas
Dec. 8-12: Neural Information Processing Systems (NeurIPS) 2024 in Vancouver, British Columbia
Dec. 9-10: Fortune Brainstorm AI San Francisco (register here)
BRAIN FOOD
The results are in from Sam Altman's big study of universal cash transfers. One of the really big questions among those who think we may be approaching artificial general intelligence (AGI) is what would happen to everyone who might be put out of work. Many, including OpenAI’s Sam Altman, have postulated that some form of universal basic income (UBI) will be necessary, and that you could fund it by taxing the profits of AI companies or the businesses that see their productivity soar and costs decrease because of AI. Altman and his OpenResearch foundation funded the largest randomized study to date of universal cash transfers (or UCT, not UBI exactly, because the amounts weren’t large enough to qualify as a full income). It involved some 3,000 people in Illinois and Texas, more than any previous study, and much larger cash transfers than prior studies: $1,000 per month for those getting the transfers, and $50 a month for those in the control group. That study just reported its results yesterday.
The findings: The cash transfers did have an impact. People spent more on necessities such as food, rent, and transportation. They were more likely to lend money to friends and relatives. Those getting the larger cash disbursement reported feeling more financially secure, and their spending became less volatile month to month, which might indicate better financial health. There was a small but significant increase in people’s willingness to dream of becoming entrepreneurs (a 5% increase compared to the control group over the three years of the study), but no impact on people actually following through on those dreams and starting companies. You can read more about the research in this Bloomberg story.
The researchers admitted in their findings that the amounts of money being doled out were not enough to really make a dent in problems such as chronic health conditions, lack of childcare, or lack of access to affordable housing. And I think the study also pointed to the real flaws in the idea of UBI. The amount of money you’d have to dole out to really give most Americans a basic income is so high that there’s no way you could ever raise it without the debt soaring or the tax burden on companies becoming suffocatingly high. I asked Altman about this directly last year when I interviewed him for my book, Mastering AI. Could UBI ever be affordable? “Today, no. But if global wealth 10xed, then sure,” he said. I asked if AI could do that. “AI and energy together,” he replied. (He has also invested in fusion power to drive down the cost of energy.) Well, I guess we’re still waiting for AGI and fusion then.