There’s a lot of AI news this week from Google, Microsoft, OpenAI, and Anthropic, which we’ll cover in the news section below. Most of the product innovations these companies are rolling out are built on top of a few key “foundation” models. These are large AI models that, once trained, can perform all kinds of different tasks. Today’s large language models are trained simply to predict the next word in a sentence, but after that training they can perform many language tasks, from translation to answering questions like a virtual encyclopedia to summarization.
But there are still advantages to training more narrowly tailored foundation models for specific areas. For instance, Google DeepMind’s AlphaFold 3 is a foundation model for biology. It can’t write poetry. But it can predict the structure of proteins and the interactions between two proteins or between any protein and any small molecule. That makes it super useful for tasks like drug design. Wayve, a U.K. self-driving car startup, has built foundation models that can handle many different aspects of driving: identifying objects, deciding the best way to steer the car, and working the accelerator and brake, for instance. Robotics company Physical Intelligence has built foundation models for robotics that can help any kind of robot perform all kinds of different tasks without additional training.
For businesses, it is often a lot easier to see a path to ROI from these narrower foundation models than from completely generalist LLMs. A Swiss Army knife is great. But you probably wouldn’t want to use it to perform surgery. In today’s Eye on AI, I want to introduce you to Kumo, a Silicon Valley company that has built a foundation model designed to make it easy to do something that sits at the heart of business decisions: making accurate predictions.
Saving time, data—and money
Normally, making predictions from data requires painstaking work by data scientists over days, weeks, or even months. Machine learning and deep learning—the sub-branch of machine learning most closely associated with today’s AI—have been applied to predictive analytics for years. But these models were usually tailored to make just one particular kind of prediction in one specific context, and they had to be trained on a large dataset specific to that use case before they could render accurate predictions. Big technology companies and major retailers often have the reams of data needed to train these kinds of predictive AI models. But a lot of smaller enterprises do not.
Kumo’s new RFM model, which the company is announcing and making available to customers today, can by contrast handle all kinds of different predictions, from customer churn to credit default risk to the chances that a patient discharged from the hospital will need to be readmitted within 24 hours. KumoRFM can make all of these predictions almost instantly, without any additional training. “With the foundation model, you point it to your data, you define what you mean by churn, and a second later, you get the prediction,” Jure Leskovec, the Stanford University computer scientist who cofounded Kumo three years ago and serves as its chief scientist, told me, using the example of creating a customer churn model. He said a customer could further fine-tune the model on its own data and get about a 10% improvement in the accuracy of its predictions.
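To make Leskovec’s example concrete, here is a minimal Python sketch of the one step a user still performs: defining what “churn” means from raw relational tables. This is purely illustrative, using pandas rather than Kumo’s actual API, and the table names, column names, and dates are all invented.

```python
import pandas as pd

# Toy relational tables of the kind a business already has.
# All names and values are hypothetical, invented for illustration.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2025-03-01", "2025-05-02", "2025-02-10"]),
})

# "Define what you mean by churn": here, no purchase in the 90 days
# after a chosen cutoff date. A foundation model like KumoRFM would take
# a declarative definition like this and predict the label for future
# customers directly, instead of requiring a hand-built, task-specific model.
cutoff = pd.Timestamp("2025-03-15")
window = orders[
    (orders.order_date > cutoff)
    & (orders.order_date <= cutoff + pd.Timedelta(days=90))
]
labels = customers.assign(
    churned=~customers.customer_id.isin(window.customer_id)
)
print(labels)  # customer 1 bought again in the window; 2 and 3 churned
```

The point of the sketch is that the business logic lives in a short, declarative definition over raw tables, not in weeks of feature engineering.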
Using graph neural networks to discover key correlations
Kumo’s model is based on Leskovec’s research into graph neural networks, which can encode the relationships between things in the structure of the network itself, and on applying this method to data spread across different tables, taking into account how the data in those tables changes over time. (RFM, the name of Kumo’s model, stands for Relational Foundation Model.) The model couples this graph structure with the same kind of Transformer architecture that underpins LLMs. Transformers are particularly good at figuring out which data to pay attention to in order to make an accurate prediction, even if the crucial, predictive data occurs far back in a sequence. The foundation model has been trained on publicly available data as well as what Leskovec said is a large amount of synthetic data.
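For readers who want a feel for the underlying idea, here is a minimal sketch using the open-source PyTorch Geometric library for graph neural networks. It illustrates the general technique of turning relational tables into a graph, not Kumo’s proprietary model: tables become node types, foreign-key links become typed edges, and timestamps ride along so a model can respect the order of events. All entities, features, and numbers are invented.

```python
import torch
from torch_geometric.data import HeteroData

# Two toy "tables" encoded as node types in one heterogeneous graph.
# Feature values and dimensions are hypothetical, for illustration only.
data = HeteroData()
data["customer"].x = torch.randn(3, 8)        # 3 customers, 8 features each
data["order"].x = torch.randn(4, 8)           # 4 orders, 8 features each
data["order"].time = torch.tensor([0, 5, 9, 12])  # event timestamps

# Foreign-key links (the customer_id on each order) become typed edges:
# orders 0 and 1 belong to customer 0; order 2 to customer 1; order 3 to customer 2.
data["customer", "places", "order"].edge_index = torch.tensor(
    [[0, 0, 1, 2],    # source: customer index
     [0, 1, 2, 3]]    # target: order index
)

# A graph neural network (optionally with Transformer-style attention
# over neighbors) can now pass messages along these edges, so a
# prediction about a customer can draw on that customer's order history.
print(data)
```

The design choice this illustrates is the one Leskovec describes: rather than flattening relational data into one big feature table by hand, the relationships between rows are kept explicit in the graph, and the network learns which linked records matter.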
As long as the time stamps across a user’s tables are correct, Kumo’s model is able to make highly accurate predictions, he said. On benchmark tests that Kumo conducted, RFM without any fine-tuning performed better than some traditional machine learning methods, on par with or better than a human data scientist who had hand-crafted a model, and only a bit worse than a graph neural network specifically trained for the task. With additional fine-tuning, RFM performed on par with or, for some tasks, significantly better than a graph neural network trained in the traditional way for a single task. And, critically, RFM performed significantly better than Meta’s Llama 3.2 large language model when that LLM was prompted to make predictions from the same data. (Kumo’s benchmark results have not been independently replicated and verified.)
KumoRFM’s results can also be more interpretable than those of many of the hand-engineered models that data analysts construct. That’s because human data analysts sometimes develop signals that they think are predictive—for instance, positing that a customer might be more likely to purchase a particular product if they see an advertisement for it after 10 p.m.—but that turn out to be spurious. “Today’s models can only explain through the signals you generated. But in our case, we can go all the way down to the raw data and say, because of these events, because of this information, we made this decision,” Leskovec said.
Kumo has received $37 million in venture capital funding to date from investors including Sequoia Capital, and currently employs a team of around 50 people split between Silicon Valley and Europe. Its models have been used so far by companies including food delivery app DoorDash, Reddit, and U.K. grocery chain Sainsbury’s, among others.
For enterprises struggling to extract value from their data and frustrated by the lengthy process of building predictive models, Kumo’s approach could represent a significant efficiency breakthrough. (Amazon Web Services offers a foundation model called Chronos for time-series forecasting, but it still requires fine-tuning to achieve accurate results. The data monitoring software company Datadog also offers a similar foundation model called Toto.) It also suggests that while much attention focuses on general-purpose AI, there remains enormous potential in more specialized foundation models that solve specific, high-value business problems.
With that, here’s more AI news.
Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn
Before we get to the news, the latest Fortune Most Powerful Women list is out today, and it includes a number of figures important to the AI industry, including AMD CEO Lisa Su, Huawei deputy chairwoman Meng Wanzhou, Anthropic president Daniela Amodei, and Thinking Machines Lab founder and CEO Mira Murati. You can check out the list here. There’s also a great interview with New York Times CEO Meredith Kopit Levien by Fortune’s Ruth Umoh that touches on how the publisher sees AI as both an opportunity and a threat, and why it’s suing OpenAI. You can check that out here.
AI IN THE NEWS
Microsoft announces a clutch of new AI agent tools. At its annual Build developer conference, Microsoft made a dozen AI-related product announcements. Several involved products and features to make it easier for developers to create AI agents to perform business tasks. These include a new AI Foundry Agent Service; new agent-creation tools in Copilot Studio that will support multi-agent workflows and make it easier to train agents on an enterprise’s own data and workflows; and a new Microsoft Entra Agent ID that will make it easier to securely control what data agents can access and to track what they are doing across a network. The company said it would support Model Context Protocol (MCP), making it easier for third-party agents and models to access Microsoft tools. The company also unveiled a new “coding agent” mode for GitHub Copilot that will perform tasks for a developer in the background, including building entire applications, running tests, and debugging. And it announced Microsoft Discovery, which it says is an entire platform that will make it easier for scientists to use AI to make progress in areas like materials science and drug development. You can read about many of the announcements in The Verge’s coverage here.
Google also announces a slew of AI models and features. At its I/O developer conference, Google unveiled a large number of new AI models and AI-powered features. The company released a new video generation model, Veo 3, and a new image generation model, Imagen 4, as well as a product called Flow that makes it easier for creators and would-be filmmakers to produce and edit videos using Google’s AI models. It announced an expansion of its AI Overviews in Search as well as a new AI Mode that offers a more “AI native” search experience powered by its most advanced Gemini models. It is introducing real-time translation, with AI-generated voices designed to capture the tone of the speaker, into Google Meet. And it is gradually rolling out to Search and other applications agentic capabilities that can perform shopping and booking tasks for users, capabilities Google is developing under what it calls Project Mariner. You can read more about Google’s announcements from Fortune’s Sharon Goldman here.
xAI announces Microsoft Azure availability and plans to go after enterprise customers. At its Build developer conference, Microsoft announced that it was making AI models from Elon Musk’s xAI available on Microsoft’s Azure cloud computing service. Developers will be able to access xAI’s Grok 3 and Grok 3 mini models through Azure’s AI Foundry service. During his Build keynote, Microsoft CEO Satya Nadella interviewed Musk, who appeared via video link; Musk said that his companies Tesla and SpaceX had found Grok 3 useful for business tasks and that he wants to sell the models to more companies going forward. Although the move was presented by Microsoft execs, including CTO Kevin Scott, as a wise technical and business decision by Microsoft, which offers AI models from many different third-party developers on Azure, it was hard to escape the political context of the move, given Musk’s influence with the Trump administration. You can read more from Bloomberg here.
OpenAI debuts Codex coding agent. The new coding agent is designed to autonomously handle complex programming tasks, rather than simply helping a developer complete lines of code. Users can even assign Codex tasks through platforms like Asana or Slack and receive completed solutions without directly interacting with the code. Exactly how well the tool performs in the real world is unclear. Some developers complained on social media that the use cases OpenAI highlighted in its unveiling demo didn’t seem that interesting or realistic, while others reported difficulty setting up and using the new tool. Here’s TechCrunch’s story on Codex.
Meta delays release of its “Behemoth” AI model. That’s according to an exclusive story in the Wall Street Journal that cites unnamed “people familiar” with the development of Behemoth, which is supposed to be Meta’s largest and most powerful AI model to date. The newspaper reported that Meta’s engineers are struggling to significantly improve Behemoth’s capabilities compared to the latest version of Meta’s Llama AI models, with the launch of Behemoth pushed back from June 2025 to the autumn or even later. A Meta spokesperson declined to comment. A number of leading AI labs have found that simply making AI models larger and feeding them more data is no longer sufficient to deliver big leaps in performance.
Trump signs law aimed at combating non-consensual deepfake porn. President Donald Trump signed the Take It Down Act, which criminalizes the distribution of non-consensual intimate imagery, including AI-generated deepfakes, and requires websites to remove such content within 48 hours of a victim’s request. The legislation was co-sponsored by Republican Sen. Ted Cruz of Texas and Democratic Sen. Amy Klobuchar of Minnesota and championed by First Lady Melania Trump. The law passed Congress with near-unanimous support and backing from over 100 organizations, including major tech companies, but digital rights groups have expressed concerns that the bill’s broad language could lead to censorship of legitimate content. You can read more from the Associated Press here.
EYE ON AI RESEARCH
How good are AI agents at using a computer? That’s an important question, since one of the easiest ways to get AI agents to perform tasks for us across the internet is simply to have them use a computer the way a person would. But it turns out AI agents are not so good at that, which is one of the reasons that Model Context Protocol (MCP), which Sharon Goldman covered in this newsletter last week, is gaining in popularity. But MCP is a nascent standard, so it would still be good to get a sense of how well AI agents do interacting with the graphical user interfaces (GUIs) that dominate computing: performing tasks such as reading information on a screen, using a mouse to complete a task, filling in data fields, and so on.
Enter a new benchmark called OSUniverse that is designed to test agents’ GUI skills. The benchmark was developed by Kentauros AI, a startup working on a platform for AI agents. (You can read more about OSUniverse in this research paper on arxiv.org.) It groups tasks into several levels of difficulty. Humans can generally perform 100% of these tasks without a problem. But even the best AI agents struggle. In tests Kentauros conducted, OpenAI’s Computer Use agent did the best, and it could only perform 47.8% of the tasks correctly. Anthropic’s Claude 3.5 Sonnet got only 28.36% right. So, unless MCP takes off much faster, it’s going to be a while before we can count on AI agents to book those flights for us!
FORTUNE ON AI
Why AI isn’t fully replacing jobs—but is still reshaping the workforce —by Allie Garfinkle
Exclusive: Circle cofounder raises $18 million to build ‘AI-native bank’ —by Ben Weiss
Startup working on ‘reversible computing’ chip for AI says initial tests show a 50% energy savings —by Jeremy Kahn
Commentary: The EU should cut actual red tape, not AI safeguards —by Risto Uuk and Sten Tamkivi
AI CALENDAR
May 19-22: Microsoft Build, Seattle
May 20-21: Google I/O, Mountain View, Calif.
May 20-23: Computex, Taipei
June 9-13: WWDC, Cupertino, Calif.
July 13-19: International Conference on Machine Learning (ICML), Vancouver
July 22-23: Fortune Brainstorm AI Singapore. Apply to attend here.
Sept. 8-10: Fortune Brainstorm Tech, Park City, Utah. Apply to attend here.
BRAIN FOOD
Can chatbots become a way to preserve and share collective memories? That’s the idea behind an intriguing project called The Big Pool Story that is trying to drum up support to preserve a public pool in Oak Ridge, Tenn. The Oak Ridge Outdoor Municipal Community Pool, affectionately known to area residents as “The Big Pool,” is one of the largest spring-fed swimming pools in the U.S. and was built during World War II to serve employees of the Manhattan Project and their families. Over the ensuing decades, it was a touchstone for many growing up in the area. But the pool has fallen into disrepair and was slated for possible closure. A group seeking to preserve the pool has decided to gather memories of what the Big Pool meant to people who swam there over the years. It has created an LLM-powered chatbot that will answer questions about the pool and its history, but through which people can also submit stories about their memories of the pool and what it meant to them. The chatbot will then incorporate these memories into the responses it gives others who ask it questions. It’s an intriguing example of how an LLM can be used to increase community engagement and also possibly preserve memories, relaying them across generations. You can read more about the project here.