
Multimodal AI puts on quite a show, but it’s still in its infancy

By Sage Lazzaro, Contributing writer
December 12, 2023, 6:45 PM ET
Google Assistant and Bard general manager Sissie Hsiao. (Duy Ho for Fortune)

Hello and welcome to Eye on AI.

Google this past week made clear it’s not going to let 2023 end without marking its own leap in AI. The tech giant, which has fallen behind OpenAI despite making the crucial research breakthrough that made ChatGPT possible in the first place, finally unveiled Gemini, its long-rumored “largest and most capable” AI model yet. 

The announcement offers a lot to unpack. Gemini, which comes in three increasingly powerful tiers (Nano, Pro, and Ultra), is already powering Bard and a few features on Pixel 8 Pro smartphones. Tomorrow, Gemini will be made available to Google Cloud customers via its Vertex AI platform, and Google also plans to integrate Gemini into other products and services such as Search, Chrome, and Ads. Google touted numerous benchmark wins against OpenAI, but because Gemini Ultra, the most powerful tier and the one positioned to compete with GPT-4, won’t actually be available until next year, it’s too early to draw any firm conclusions.

One thing that’s clear no matter how Gemini stacks up to OpenAI’s models, however, is that it’s providing a window into the next era of LLMs, in which multimodality will be the norm. Google created Gemini to be multimodal from launch, meaning it was trained on and can handle combinations of text, image, video, and code prompts, opening up tons of new use cases and user experiences. Google VP Sissie Hsiao called the multimodal capabilities of Gemini the “most visually stunning” of the model’s advancements while onstage at Fortune’s Brainstorm AI event yesterday (more on that later), and leaders across the industry are pointing to multimodal as the obvious next step in the technology.

“I’m not sure people realize how much multimodal AI will become the default, even for regular chatbot applications,” Robert Nishihara, CEO of Anyscale, the company behind the Ray developer framework that’s powered much of the GenAI boom, told Eye on AI. He added that multimodality is going to become “fundamental to the way we interface with these models.” 

If you’re chatting with your insurance company via an AI chatbot, for example, multimodality would make it possible to incorporate photos and videos of the damage into the conversation. It could also help developers by enabling coding co-pilots to preemptively spot issues in code as they write it. During her interview, Hsiao gave the example of how she recently input photos of a restaurant menu and wine menu into Bard and asked it for help creating the ideal pairing. 

While some multimodal models already exist, these capabilities have typically been stitched together on top of text-based LLMs. Language models only became viable in the last year or so, and multimodal models are even harder from a technical perspective. Combining all these different modalities into a single model from the get-go is far simpler than piecing them together, Nishihara said, but it has required a fundamental shift at the architecture level. Whereas convolutional neural networks have long been used to process image and video data, Nishihara credits the shift to using transformers for this data as well with kicking off the recent progress in multimodal AI.
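For readers curious what "using transformers to process image data" can look like in practice, here is a minimal sketch of one common approach: cutting an image into fixed-size patches and flattening each patch into a "token," much as text is split into tokens. This is an illustration only. The function name `image_to_patch_tokens`, the toy image, and the patch size are all hypothetical, and nothing here is confirmed to be how Gemini or any model mentioned in this newsletter works.

```python
def image_to_patch_tokens(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch "tokens".

    Each token is one patch x patch square of pixels, flattened row by row.
    A real model would then project each token into an embedding vector;
    that step is omitted here.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")

    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                value
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for value in pixel
            ]
            tokens.append(token)
    return tokens


# Hypothetical toy input: a 4x4 image with 3 channels per pixel.
image = [[[r, c, 0] for c in range(4)] for r in range(4)]
tokens = image_to_patch_tokens(image, patch=2)
print(len(tokens), "tokens of length", len(tokens[0]))
```

Once an image is a sequence of tokens, it can in principle be fed through the same kind of architecture that handles text, which is one reason a single model can span several modalities.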

Still, multimodal AI has several limitations and challenges. One of the most significant is the size of multimodal data, such as photo and video, which is orders of magnitude larger than text data. This makes building applications more data-intensive and introduces new infrastructure challenges. It also has massive impacts on cost, as running data-intensive workloads on GPUs can be extremely expensive. 
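To put "orders of magnitude" in rough numbers, the back-of-the-envelope sketch below compares raw, uncompressed sizes. Every figure in it is an assumption chosen for illustration (about 6 bytes per English word, a 4000x3000 photo, a 10-second 1080p clip at 30 frames per second), not data from Nishihara or Google, and real pipelines usually compress or downsample media before a model sees it.

```python
import math

# All constants below are illustrative assumptions, not reported figures.
BYTES_PER_WORD = 6  # rough average for English text, including the space


def text_bytes(words):
    """Approximate size of plain text with the given word count."""
    return words * BYTES_PER_WORD


def image_bytes(width, height, channels=3):
    """Raw size of an uncompressed image at one byte per channel."""
    return width * height * channels


def video_bytes(width, height, fps, seconds, channels=3):
    """Raw size of uncompressed video: one full image per frame."""
    return image_bytes(width, height, channels) * fps * seconds


def orders_of_magnitude(larger, smaller):
    """How many powers of ten separate the two sizes."""
    return math.log10(larger / smaller)


message = text_bytes(1000)               # a 1,000-word message
photo = image_bytes(4000, 3000)          # one 12-megapixel photo
clip = video_bytes(1920, 1080, 30, 10)   # 10 seconds of 1080p video

print(f"text:  {message:,} bytes")
print(f"photo: {photo:,} bytes ({orders_of_magnitude(photo, message):.1f} orders larger)")
print(f"clip:  {clip:,} bytes ({orders_of_magnitude(clip, message):.1f} orders larger)")
```

Under these assumptions, a single photo is nearly four orders of magnitude larger than a long text message, and a short video clip more than five.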

Solutions to these issues will come from the hardware space, according to Nishihara. Pointing to how Cloud Tensor Processing Units (TPUs) perform quite well at processing image data, he said we’re going to start to see more interest in a variety of hardware accelerators.

“As we work and experiment with more modalities of data, we’re going to see the hardware ecosystem flourish and alleviate some of the resource challenges the industry is experiencing right now,” he said. “That said, we’re still in the early phases and going through growing pains, so I wouldn’t expect that to be visible in the next six months.”

And with that, here’s the rest of this week’s AI news.

Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com

AI IN THE NEWS

EU lawmakers reach a deal on the EU AI Act. Over the weekend, European Union lawmakers finally agreed on terms for the EU AI Act, the world’s first piece of comprehensive AI regulation. The act lays out guardrails and stringent transparency requirements for general-purpose AI (GPAI) systems like ChatGPT, particularly for applications it deems high risk. It also bans several applications including untargeted scraping of facial images, emotion recognition in the workplace and educational institutions, biometric categorization systems that use sensitive characteristics, and other AI systems that could be used to manipulate people or exploit their vulnerabilities. Additionally, the act imposes limitations on, but doesn’t ban, the use of biometric identification systems in law enforcement.

Scale AI releases a foundation model for the autonomous vehicle industry. Based on transformer modules, the model, called AFM-1, is the first generally available zero-shot model built specifically for the autonomous vehicle research community, the company says. “Zero-shot” refers to the ability of a machine learning system to complete tasks for which it didn’t receive any training examples, which has proven to be a critical challenge for autonomous vehicles.

Meta launches Purple Llama initiative to release tools for safety testing AI models. Named “purple” for the fact that the project will combine the responsibilities of both offensive (red team) and defensive (blue team) evaluation, Meta is positioning the initiative as a two-pronged approach looking at both the inputs and outputs of LLMs. Its first release is Llama Guard, an openly available foundational model to help developers avoid generating potentially risky outputs. All of the tools will be open source, and the project is seemingly tied to the recently announced AI Alliance launched by Meta and IBM.

The U.K. is considering an antitrust investigation into Microsoft and OpenAI's partnership, and the FTC is keeping a close watch. The U.K. Competition and Markets Authority (CMA) said it’s currently gathering information to determine if the collaboration between the two firms threatens competition in the country and is taking public comments before reaching a decision on Jan. 3. On a similar note, the U.S. Federal Trade Commission (FTC) is also examining the nature of the companies’ partnership, according to Bloomberg, though its inquiry is preliminary and not a formal investigation. 

Sam Altman is named Time’s CEO of the year, and ChatGPT tops Wikipedia’s list of the most-read articles in 2023. “Altman emerged as one of the most powerful and venerated executives in the world, the public face and leading prophet of a technological revolution,” says Time. It’s not every day a CEO who was just ousted from (and then reinstated at) his own company earns such high praise, but OpenAI’s ChatGPT and GPT-4 model were undeniably transformative, no matter how much board drama closes out the year. So it’s no surprise the Wikipedia article for ChatGPT was visited more than any other page on the English version of the site in 2023 with a total of 49,490,406 page views, according to the Wikimedia Foundation.

EYE ON AI RESEARCH

The GenAI rankings. As one of the world’s largest networks propping up much of the global internet, Cloudflare has a unique lens into what goes on online. Today, the company released its 2023 Year in Review, complete with a ranking of the top generative AI services. 

Per Cloudflare’s network data, OpenAI maintained the top spot throughout the entire year, followed by Character AI, Quillbot, and Hugging Face. Google’s Bard settled in at No. 8 overall, but peaked at No. 5 in November after its broader release in Europe and Brazil. Midjourney started off strong at No. 3 in March before declining to No. 10 in September.

Cloudflare also notes that OpenAI climbed significantly in its general top Internet Services list, peaking at No. 104 in November after its developer conference. The GenAI highlights and the full 2023 report are both published by Cloudflare.

FORTUNE ON AI

Sam Altman explains how being fired as OpenAI CEO was a ‘blessing in disguise’ - Steve Mollman

One of the two female OpenAI board members replaced after the Sam Altman incident says a company lawyer tried to pressure her with an ‘intimidation’ tactic - Kylie Robison

‘We cannot work with both sides’: A major Emirati AI company has picked a side in the U.S.-China tech war - Paolo Confino

Just 10% of organizations launched generative AI solutions in 2023, according to an Intel company - Sheryl Estrada

AI can now turn a rough sketch of a skyscraper into a detailed rendering in a matter of minutes. A leading architect demonstrates how - Fortune Editors

BRAINFOOD

Brainstorm AI. Today’s newsletter is coming to you live from Fortune’s Brainstorm AI conference in San Francisco, where we’ve gathered with leading academics, prominent policymakers, and C-suite executives to assess the industry, its current challenges, and new business use cases for AI. 

Fortune’s Jeremy Kahn kicked things off with a discussion with Sissie Hsiao, Google’s VP and General Manager of Google Assistant and Bard. There were also panels about AI in retail, healthcare, entertainment, fintech, and education with executives from Walmart, Pfizer, Adobe, Wells Fargo, Khan Academy, and more. Other discussions focused on themes like the impacts of AI on the workforce, AI infrastructure, misuse and misinformation, and the responsible development of AI—and that’s still just a snapshot of all the interviews and conversations. 

For a full rundown, be sure to check your email on Friday for a special edition of Eye on AI recapping Fortune Brainstorm AI 2023.

This is the online version of Eye on AI, Fortune's weekly newsletter on how AI is shaping the future of business. Sign up for free.

About the Author

Sage Lazzaro is a technology writer and editor focused on artificial intelligence, data, cloud, digital culture, and technology’s impact on our society and culture.

© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.