OpenAI and Google lean in to AI personal assistants. Is this AI’s killer app?

By Jeremy Kahn, Editor, AI

Jeremy Kahn is the AI editor at Fortune, spearheading the publication's coverage of artificial intelligence. He also co-authors Eye on AI, Fortune’s flagship AI newsletter.

OpenAI CTO Mira Murati presented the company's newest AI model, GPT-4o, yesterday.
Patrick T. Fallon—AFP/Getty Images

Hello and welcome to Eye on AI.

The big news in AI this week is the dueling product announcements from OpenAI and Google.

OpenAI has consistently tried to steal the news cycle from rivals by jumping out in front of their big product reveals with its own product releases, and this week was no different. The AI startup had built expectations around yesterday’s announcement so high—with rampant speculation that OpenAI would debut GPT-5 or a generative AI search engine—that CEO Sam Altman took to social media platform X on Friday to disabuse people of those ideas, while still trying to build excitement for Monday’s product reveal. 

What the company did announce was a souped-up version of GPT-4 called GPT-4o—the “o” stands for omni—that is designed to act as a personal assistant on a phone or tablet, with improved voice interaction, the ability to interpret and reason about pictures from a device’s camera, more capable language translation, and much faster response times. The assistant, demoed with a default female voice, appears to be modeled explicitly on the digital assistant in the 2013 Spike Jonze movie Her.

OpenAI may have misplayed the expectations game a bit: compared to the hype it had drummed up, many viewers of its livestream event seemed underwhelmed by the announcement. (To combat this, Altman and OpenAI published blog posts and short videos showcasing a variety of use cases for the new model.)

The technological innovations behind GPT-4o are impressive. The model is natively multimodal—trained to take in voice and then produce voice, for example—as opposed to taking in the user’s voice, turning it into text that is fed to GPT-4 to create a prompt, and then feeding the resulting output to a text-to-speech model to produce a voice response. This speeds up the entire cycle. OpenAI has also impressively shrunk the number of tokens—segments of data that the model processes (in the case of English text, a token usually works out to about three-quarters of a word)—the model requires to perform a task. This also makes the model considerably faster and cheaper to run than GPT-4 Turbo, OpenAI’s previous best model. This, in turn, has enabled OpenAI to make GPT-4o available for free to all ChatGPT users, as well as to offer enterprise customers and developers use of the model through OpenAI’s API for half the cost of GPT-4 Turbo.
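To make the architectural difference concrete, here is a minimal sketch of the older, chained approach described above, using OpenAI's public Python SDK. The model names and file paths here are illustrative assumptions rather than details OpenAI has disclosed about its own internals; the point is simply that three separate model calls (speech-to-text, text generation, text-to-speech) add latency that a single natively multimodal model avoids.

```python
# Illustrative sketch of the older, chained voice pipeline (not OpenAI's actual internals).
# Requires the OpenAI Python SDK (pip install openai) and an API key in OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

# Step 1: speech-to-text (e.g., Whisper) turns the user's audio into a text prompt.
with open("user_question.wav", "rb") as audio_file:  # hypothetical input file
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# Step 2: a text-only chat model generates the reply.
reply = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)
answer_text = reply.choices[0].message.content

# Step 3: a separate text-to-speech model reads the reply aloud.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer_text)
speech.stream_to_file("assistant_reply.mp3")

# GPT-4o is trained to go from audio in to audio out in a single model pass,
# collapsing these three round-trips and cutting response latency.
```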

Then today, at Google’s I/O developer conference, the search giant announced a raft of new AI features and upcoming product releases, from the integration of generative AI capsule answers into its main search engine to a way to query the photos saved in Google Photos and improvements to its Gemini chatbot. As my colleague Sharon Goldman, who is at I/O, relays, Google’s version of the AI personal assistant is being developed through what it’s calling “Project Astra,” with capabilities the company said will come to Google products, like the Gemini app, later this year. Demo videos, which the company emphasized were recorded live in a single take, showed someone using a smartphone camera to show the AI their surroundings. While OpenAI’s GPT-4o can currently only process still images, Astra can handle video. Google also unveiled improvements to its already very capable Gemini 1.5 Pro model, giving it more natural-sounding, longer dialogues, better understanding of audio and images, stronger logical reasoning and planning, and better computer code generation.

This is the sort of AI software that Google teased in December with a canned demonstration that was panned by reporters for being misleading about the Gemini model’s video-processing capabilities. Well, now Google is saying it has these capabilities for real. The company has also announced a doubling of the context window—how much data its models can process—for Gemini 1.5 Pro to 2 million tokens. That means the model can take in many books’ worth of text or the video equivalent of a feature film. Larger context windows don’t just allow the models to process more information; they also tend to reduce a model’s tendency to hallucinate (i.e., provide plausible but inaccurate outputs). Google also teased a future AI “agent” model that will be able to perform actions for users—such as booking movie tickets and flights—not simply generate text.
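For a rough sense of what 2 million tokens means, here is a back-of-the-envelope conversion. The figures for words per token and words per novel are common rules of thumb, not numbers from Google:

```python
# Back-of-the-envelope sizing of a 2-million-token context window.
# Both constants are rough rules of thumb, not figures from Google.
CONTEXT_TOKENS = 2_000_000
WORDS_PER_TOKEN = 0.75      # typical for English text
WORDS_PER_NOVEL = 90_000    # a standard full-length novel

total_words = CONTEXT_TOKENS * WORDS_PER_TOKEN
novels = total_words / WORDS_PER_NOVEL
print(f"~{total_words:,.0f} words, or roughly {novels:.0f} novels")
# -> ~1,500,000 words, or roughly 17 novels
```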

There are a few things to say about these announcements from OpenAI and Google. One is that they clearly put Apple and Amazon on the back foot. They need to upgrade Siri and Alexa to match these new rival capabilities or those products will be in trouble. We know both companies are working on it, and Amazon has Anthropic’s powerful Claude AI models to use. Apple is by all accounts much further behind on its generative AI efforts—which is why there are reports it was negotiating with OpenAI to license its technology in the near term. My colleague David Meyer has more on this in today’s Data Sheet newsletter.

More broadly, are these new personal assistants AI’s killer app? I think the verdict is very much still out—and depends entirely on what comes next. Most of the use cases OpenAI has showcased so far, such as tutoring kids or telling bedtime stories, seem fun and somewhat helpful, especially to parents. But it’s unclear whether they are the sort of thing that will make such assistants ubiquitous, must-have products. The one exception might be translation—the ability to have a universal interpreter in your pocket wherever in the world you go could be transformative. But almost none of the use cases OpenAI or Google highlighted for the new assistants were about helping people in their jobs. That may change when these assistants have more “agentic” properties—and also when they can actually learn more about our personal preferences—and then complete tasks to our liking. We could all use a personal assistant that can actually do things for us in our daily lives—do our online grocery shopping, fill out insurance forms, book our vacations, etc. That really is likely to be a killer app.

How quickly those agents are coming is unclear. Google says it’s working on them but has not put a timeline on a product release. On Monday, OpenAI continued to tease exciting future announcements “coming soon”—possibly next week when its partner Microsoft holds its Build developer conference—but what they are is still a secret.

In the meantime, the question is, as with so much of the generative AI revolution, whether the benefits are worth the costs—to the companies, to consumers, and to society. While OpenAI has clearly made some technological breakthroughs that have reduced the costs of GPT-4o enough that it can make the model available at no charge, running it is definitely still costing the company something. Altman recently said he wasn’t worried about OpenAI’s burn rate—“$500 million a year or $5 billion or $50 billion a year, I don’t care,” he said—but at some point his investors will care. And his business customers probably care too. (The pricing of GPT-4o to enterprise developers through OpenAI’s API is half what GPT-4 Turbo goes for, which may indicate the startup’s own costs are similarly about half. Still, the model isn’t cheap. So it’s unclear whether the use cases that businesses will be able to address with the new model will justify the price tag.)

While OpenAI is offering GPT-4o to consumers for free, users are essentially paying with their personal data, including their voice, and depending on how they use the model, images of their face or their family and friends, too. So there are definitely data privacy implications.

There may also be big societal costs that we aren’t aware of or anticipating. For instance, because OpenAI has said very little about how big a model GPT-4o is and how it was trained, we have little idea about what its lifetime carbon footprint and water usage will likely be. The electricity and water consumption of running AI models in the cloud is becoming an increasing concern as the adoption of the technology takes off. Will our glorious AI future be worth the damage to the planet? We don’t really know because the benefits are still uncertain, and tech companies are being less than transparent about the environmental bill.

We also don’t know how these AI personal assistants might subtly influence our thoughts and behaviors. People tend to be more influenced by voice-based interactions than they are when reading text. Can we trust that tech companies making these personal assistants will show us information that is in our best interest? Or will what they tell us be influenced by commercial partnerships the tech companies have struck? Last week, AdWeek reported on an OpenAI pitch deck it had obtained that revealed details of partnership agreements the company was offering media companies. It included priority placement and “better brand expression” in chatbot conversations. (OpenAI told AdWeek the documents were outdated.) While the publishers OpenAI has been talking to so far all have reputations for high journalistic standards and quality content, the idea of allowing partners and advertisers to pay to be featured more prominently in chatbot responses raises the specter of personal assistants that will subtly steer us to buy certain products, or even to hold certain political views, because that is what the tech companies are being paid to do. (Or, in some countries, it is easy to imagine that governments will mandate that personal assistants only express certain “politically correct” views.)

In the movie Her, Theodore (played by Joaquin Phoenix) falls madly in love with his AI assistant Samantha (voiced by Scarlett Johansson), and his obsession with the chatbot leads him to neglect real human relationships. When the chatbot is temporarily unavailable due to a systems upgrade, he is distraught. Versions of this have already happened in real life for some people, who have formed romantic bonds with chatbots from Replika and character.ai. And we don’t have good research yet on whether AI chatbots are a cure for loneliness—as some tech companies claim—or a crutch that substitutes for and ultimately impedes real human connection. My guess, judging from our experience with social media, is the latter.

Either way, I guess we are about to find out. With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

AI IN THE NEWS

U.S. and Chinese officials meet to discuss AI regulation. American national security officials and diplomats met their Chinese counterparts in Geneva today to discuss ways to reduce AI’s risks. The U.S. wants China to commit, as it has, to keeping AI out of the command-and-control systems for nuclear weapons. An unnamed U.S. official told Reuters that the talks were also an opportunity for U.S. officials to express their concern about China deploying AI in ways that undermined the security of the U.S. and its allies. The official also emphasized that the talks were not aimed at technical collaboration or joint research on highly capable AI models. “Our technology protection policies are not up for negotiation,” the unnamed official said.

French AI darling Mistral looks for $6 billion valuation in new fundraising round. The Paris-based company, founded by former Google DeepMind and Meta researchers and which has released a number of impressive open-source LLMs, is looking to raise $600 million from venture capital investors including General Catalyst and Lightspeed, in a deal that would value the startup at $6 billion, the Wall Street Journal reported. To date, Mistral has raised about $500 million in two funding rounds, the last of which was only completed in December and valued it at $2 billion. The new funding round indicates both how high the burn rate of AI startups working on foundation models may be—paying for all that GPU time is extremely expensive—and how much hype there continues to be around the sector.

Artists’ lawsuit against Midjourney, Stability AI to go ahead. A federal judge ruled that there is sufficient evidence for a copyright infringement case brought by artists against Stability AI and Midjourney to proceed, Reuters reported. Both companies produce popular text-to-image generation software. The case is being closely watched for the precedent it may set about whether there should be a “fair use” exception for training AI models or if the creation of digital copies of copyrighted works during the training process constitutes infringement.

Meta considers AI “camera earphones.” That’s according to a story in The Information that cited three unnamed current Meta employees. The idea is that the earphones, which would also be equipped with cameras, could allow users to translate foreign languages and identify objects. The devices would be somewhat similar in capabilities to what Meta is looking to build into its Ray-Ban smart glasses and what other companies are rolling out in glasses and other wearables. But, according to the article, Meta engineers have highlighted several potential problems with the earphones, including that users’ long hair might obscure the cameras.

AI marketing software firm Typeface acquires chatbot maker Cypher. That’s according to The Information. Typeface is a unicorn startup whose generative AI software is designed to help companies with marketing efforts by creating customer-facing chatbots able to answer questions about a company’s products and align with its brand and corporate voice. Cypher is a small startup that allows users to create chatbot versions of themselves.

EYE ON AI RESEARCH

Using LLMs to simulate real-world interactions can help train better LLMs. That's the surprising conclusion reached by scientists at China’s Tsinghua University, who created a simulated hospital in software, called Agent Hospital, in which some LLMs play the role of patients and others the roles of nurses and doctors. The doctor agent learns through reinforcement learning in the simulation based on its interactions with the simulated patients. After this training, it turns out that the doctor agent (which the researchers call “MedAgent-Zero”) performs considerably better on tests of medical knowledge than before training. After 10,000 simulated patient interactions, the doctor agent topped 93% on a commonly used medical knowledge assessment benchmark, better than any previous LLM. This same method might be applicable in many areas outside of medicine, using role-play and reinforcement learning in simulations to train LLMs to be better copilots and decision-support systems for professional workers. It also shows how simulations can be used to generate synthetic data that can overcome shortages of real-world data. You can read the scientists’ paper on the Agent Hospital experiment on the non-peer-reviewed research repository arxiv.org here.
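For readers who want a feel for the general idea, below is a highly simplified, illustrative loop in which an agent accumulates experience from simulated cases and consults that experience on later ones. This is not the paper's code or method in detail: the condition names, the numbers, and the simple lookup "memory" are toy stand-ins (the real Agent Hospital work uses LLMs for every role); the sketch only conveys the shape of learning from simulated interactions.

```python
# Toy illustration of learning from simulated interactions. Every piece here is a
# stand-in for an LLM agent in the actual Agent Hospital setup, not the paper's code.
import random

CONDITIONS = ["flu", "migraine", "appendicitis"]

def simulate_patient() -> dict:
    """Stand-in for an LLM generating a synthetic patient case with a known diagnosis."""
    condition = random.choice(CONDITIONS)
    return {"symptoms": f"symptoms consistent with {condition}", "ground_truth": condition}

def doctor_agent(case: dict, experience: dict) -> str:
    """Stand-in for the doctor LLM: consult accumulated case notes before guessing."""
    # Crude "retrieval": if these symptoms were seen before, reuse the recorded diagnosis.
    return experience.get(case["symptoms"], random.choice(CONDITIONS))

experience: dict = {}   # the agent's growing memory of resolved cases
correct = 0
N = 10_000
for _ in range(N):
    case = simulate_patient()
    answer = doctor_agent(case, experience)
    correct += answer == case["ground_truth"]
    # Feedback from the simulation: record the true diagnosis for these symptoms.
    experience[case["symptoms"]] = case["ground_truth"]

print(f"accuracy over {N:,} simulated patients: {correct / N:.1%}")
```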

FORTUNE ON AI

The AI data center revolution is happening right in your backyard —by Dylan Sloan

Language learning app Duolingo is taking on music and math lessons —by Emma Burleigh

Apple says privacy is a ‘core value.’ Tim Cook shouldn’t compromise it to bridge the gap on AI —by Katie Paul

AI will hit the labor market like a ‘tsunami,’ IMF chief warns. ‘We have very little time to get people ready for it’ —by Paolo Confino

AI CALENDAR

May 21-22: AI Seoul Summit on AI Safety in Seoul, South Korea

May 21-23: Microsoft Build in Seattle

June 5: FedScoop’s FedTalks 2024 in Washington, D.C.

June 25-27: 2024 IEEE Conference on Artificial Intelligence in Singapore

July 15-17: Fortune Brainstorm Tech in Park City, Utah (register here)

July 30-31: Fortune Brainstorm AI Singapore (register here)

Aug. 12-14: Ai4 2024 in Las Vegas

BRAIN FOOD

The danger of cracking down on deepfakes and disinformation. In this year of global elections, everyone wants to do something about the problem of online disinformation and election interference and how AI is likely to supercharge those phenomena. But simply passing laws cracking down on the distribution of disinformation may not be the best way to go about it. The recent election in Senegal provides a telling example, as Bloomberg reports. There, the incumbent president used an old, vaguely worded law against spreading fake news to silence and jail an opposition candidate for almost a year. The candidate, Bassirou Diomaye Faye, was released just days ahead of the presidential vote, yet improbably managed to win anyway. Despite that outcome, human rights groups warn that disinformation laws, unless coupled with free speech protections and civic institutions capable of high-quality, professional fact-checking, are often used by governments to silence legitimate dissent.

This is the online version of Eye on AI, Fortune's weekly newsletter on how AI is shaping the future of business. Sign up for free.