GPT-4 debuts and Google adds generative A.I. to office tools, beating Microsoft

Greetings. It promises to be (another) massive week in A.I. news. And that’s leaving aside the lingering effects that the collapse of Silicon Valley Bank may have on some A.I. startups and the venture funds backing them.

Right as this newsletter was going to press, OpenAI released its long-anticipated GPT-4 model. The new model is multimodal, accepting both images and text as inputs, although it only generates text as its output. According to data released by OpenAI, GPT-4 performs much better than GPT-3.5, its previous model and the one that powers ChatGPT, on a whole range of benchmark tests, including a battery of different tests designed for humans. For instance, GPT-4 scores well enough to be within the top 10% of test takers on a simulated bar exam. OpenAI also says that GPT-4 is safer than GPT-3.5: It returns more factual answers, and it is much more difficult to get GPT-4 to jump its guardrails than has been the case with GPT-3.5.

But the company is also saying that the model is still flawed. It will still hallucinate, making up information. And OpenAI notes that in some ways hallucination might be more of an issue precisely because GPT-4 does this less often, so people might become complacent about the answers it produces. It is also still possible to get the model to churn out biased and toxic language. OpenAI is saying very little about how big a model GPT-4 actually is, how many specialized graphics processing units it took to train it, or exactly what data it was trained on. It says it wants to keep these details secret for both competitive and safety reasons. I’ll no doubt be writing much more about GPT-4 in next week’s newsletter. My initial take is that GPT-4 looks like a big step forward, but not a revolutionary advance over what OpenAI and others have been racing to put into production over the past two months. And it will only heighten the debate about whether tech companies, including OpenAI, are being irresponsible by putting this powerful technology in the hands of consumers and customers despite its persistent flaws and drawbacks.

Meanwhile, Microsoft is expected to unveil a range of A.I.-powered enhancements to its Office software suite on Thursday. And Baidu, the Chinese search giant, has a big announcement scheduled for later this week. Google, which was caught flat-footed by the viral popularity of ChatGPT and OpenAI’s alliance with Microsoft, is eager to prove that it’s not about to be sidelined in the A.I. race. And the big news today before OpenAI’s GPT-4 announcement was that Google had beaten Microsoft out of the gate with a bunch of big A.I. announcements of its own.

For most people, the main news is that the search giant said it is adding generative-A.I. features to its popular Workspace productivity tools, such as Google Docs, Sheets, and Slides. Among the things people will now be able to do is use a text box to prompt Google’s A.I. to automatically draft almost any kind of document, or to create different kinds of charts for Sheets data. Users can highlight text and ask Google’s A.I. to edit it for them or rewrite it in a different tone and style. You will also be able to automatically draft emails or summarize entire email threads in Gmail. In Google Meet you will be able to generate new virtual backgrounds and automatically create notes of conversations, complete with summaries.

But equally important was the other news Google announced: The company is allowing enterprise customers to tap its most advanced family of large language models, called PaLM, through an application programming interface on Google Cloud.

Beyond PaLM, it has also launched an updated version of its Vertex AI platform for A.I. developers and data scientists. The platform gives them access to large foundation models, not just from Google, but from its growing ecosystem of allied A.I. labs, such as Anthropic and Cohere, as well as AI21 Labs and Midjourney. And it has launched a set of software, called Generative AI App Builder, that will allow slightly less technical teams to quickly build and roll out custom applications using generative A.I. models.

For both Vertex AI and the Generative AI App Builder, Google says users will have access to two new related capabilities: The first is an enterprise search tool that will allow them to perform Google searches across their own data—including data generated by CRM or ERP software, as well as internal websites and other documents—and return results only from that knowledge base. These results can then be used for natural language tasks, such as summarization, sentiment analysis, or question-answering, with less risk that the language model will simply invent information or draw information from its pretraining data rather than the customer’s own data. The other new capability is a chatbot-like “conversational A.I.” function that customers can deploy to act as the user interface for these search, natural language processing, and generative A.I. capabilities.

Google announced a group of initial “trusted testers” who will have immediate access to these new A.I. services, including Toyota, Deutsche Bank, HCA Healthcare, Equifax, the television network Starz, and the Mayo Clinic, among others. The new products and features will be rolled out more broadly in the coming weeks, the company said. But it was a sign of just how intense this A.I. technology race has become that Thomas Kurian, the CEO of Google’s Cloud business, was forced to acknowledge during the press briefing that Google was releasing these new products without having yet worked out exactly how to price them. In the past, Kurian said, Google had always made its A.I. advances available as free, open-source releases or the technology was simply “embedded in our products.” “This is the first time we are taking our new, general A.I. models and making them accessible to the developer community with an API,” he said.

Google’s press release on its new products touted the company’s commitment to “Responsible AI” and it tried to position its release under this rubric, noting that Vertex AI and Generative AI App Builder include tools to “inspect, understand, and modify model behavior” and that the information retrieval aspects of the new systems used traditional search algorithms, lessening the risk of inaccurate answers. But Kurian did not say exactly what sort of guarantees Google could offer customers that its large language models could not be prompted in ways that would elicit inaccurate responses or, worse, might morph their chatbot from a friendly assistant into a petulant, abusive, and threatening “devil-on-your-shoulder,” as testers discovered with Microsoft’s Bing. He also did not address whether Google was planning to take any steps to prevent users of its very popular Workspace tools from using the new generative A.I. features to deliberately churn out misinformation or to cheat on school essays.

Concern about this is growing. I recently debated Gary Marcus on a Canadian podcast—which hasn’t aired yet—about whether ChatGPT will wind up doing more harm than good. I am more sanguine about the technology’s potential than Gary is, but there is much we agree on. Gary is among those trying desperately to raise the alarm about the potential dangers—particularly when it comes to the industrial-scale production of misinformation that these generative A.I. systems represent. I recommend his recent piece in The Atlantic on this topic as well as his blog post questioning why those building advanced A.I. systems persist in doing so despite being fully aware of the potentially calamitous impacts the technology they are creating could have.

One reason may be that most of those researchers are now embedded inside big tech companies and if they step out of line, they get fired. Tech news site The Verge and Casey Newton’s Platformer just revealed that Microsoft recently disbanded its A.I. ethics and society team, a central group that had been trying to raise concerns about many of the advanced A.I. systems Microsoft was building and had been urging the company to slow down the speed of its generative A.I. rollout. Some of the ethics experts were assigned to other teams. Some were fired. An audio recording of a Microsoft manager addressing the team about its restructuring, which was leaked to Newton, made it clear that there was pressure from CEO Satya Nadella and CTO Kevin Scott to roll out OpenAI’s advanced A.I. technology throughout the company as quickly as possible and that questioning that decision or its pace was not appreciated.

Microsoft still has another corporate Office of Responsible AI, but its role is more to set high-level principles, frameworks, and processes, not to conduct the actual safety and ethical checks. The disbanding of the A.I. ethics group is further evidence of why the tech industry should not be trusted to self-regulate when it comes to A.I. ethics or safety and why government regulation is urgently needed.

Before we get to the rest of this week’s A.I. news, a couple of quick corrections: In last week’s newsletter, I got the surname of one of the cofounders of legal A.I. software company Casetext wrong. He is Pablo Arredondo, not Arrodondo. I also erroneously capitalized the letter ‘t’ in Casetext. I regret both errors.

And with that here’s the rest of this week’s news in A.I.

Jeremy Kahn
@jeremyakahn
jeremy.kahn@fortune.com

A.I. IN THE NEWS

DuckDuckGo releases its own chatbot. The search engine, which is known for its privacy-preserving features, has become the latest to bolt a chatbot onto its search bar, Mashable reports. The chatbot, which was built using technology from both OpenAI and its rival Anthropic, uses sources such as Wikipedia (and sometimes Britannica Online) to find answers to factual questions.

Stability AI buys A.I. photo editing company. Stability, the London-based company that helped create the popular text-to-image generating A.I. system Stable Diffusion, bought Paris-based Init ML, the startup behind the photo editing app Clipdrop, for an undisclosed sum, Stability said in a blog post. The app was probably best known for making it really easy to excise your ex from old photos. But it had a bunch of other cool A.I.-based editing tools, too. Stability, which recently raised $101 million at a $1 billion valuation and is now looking to raise more (see below), said Init would continue to operate as an independent but wholly-owned subsidiary of Stability.

Stability AI looks to raise more venture money at a $4 billion valuation. That’s according to a story from Bloomberg, which cited unnamed sources familiar with the funding round. The fast-growing A.I. startup last raised money in October at a $1 billion valuation, but generative A.I. is so hot—and training massive foundation models so expensive—that the company has decided it’s a smart move to raise yet more.

The U.S. government again warns companies not to exaggerate their A.I. claims. The Federal Trade Commission published a blog post reiterating its warning that it will throw the book at companies that are found to be exaggerating what their A.I.-powered systems are really capable of. It particularly reminded companies that if they claim an A.I.-enhanced product is somehow better than other ways of doing the same thing, they will need hard data to back that claim up or face potential regulatory action for false advertising. It also said that “you need to know about the reasonably foreseeable risks and impact of your AI product before putting it on the market. If something goes wrong–maybe it fails or yields biased results–you can’t just blame a third-party developer of the technology. And you can’t say you’re not responsible because that technology is a ‘black box’ you can’t understand or didn’t know how to test.”

The debate about the carbon footprint of the generative A.I. boom is heating up. A story in Bloomberg looks at growing concerns about the carbon footprint of the massive foundation models that are powering the generative A.I. boom. It is estimated that training GPT-3, OpenAI’s large language model that is a predecessor to ChatGPT, used as much electricity as 120 U.S. homes would consume in a year. The piece highlights the work of Sasha Luccioni, a researcher at A.I. company Hugging Face, who has been proactive in documenting the carbon footprint of the A.I. models the company builds. She has also tried to come up with good estimates for other large models. The problem is that the carbon footprint of training and running these models varies tremendously based on which data center is actually being used, and even on the particular day. Data centers in places like Iceland run entirely on renewable energy and don’t spend much electricity on cooling, while others, such as data centers in Ireland, have carbon footprints that vary greatly depending on the weather: low when lots of clean wind power is available, but spiking much higher whenever the wind drops.

EYE ON A.I. RESEARCH

Google shows how large language models might create more capable robots. Well, at least, sort of. What the company’s A.I. researchers have done is combine a very large language model with a smaller, but still pretty large, computer vision system. It calls the resulting model PaLM-E, with the E standing for “embodied.” The system takes in continuous visual information and can couple this input with text inputs. This allows the system to then output instructions, in text form, for actions a robot should take in the world to produce a particular visual change in the environment. In other words, the system could be part of the brain of future robots. In testing, Google applied PaLM-E to three robotics tasks, including task and motion planning and object manipulation, and found it performed much better than systems that were trained for just one of these tasks. What’s more, it found the system can transfer its know-how to new environments and tasks with little to no additional training. It can, for instance, perform chain-of-thought reasoning about both visual images and text, even on questions that are very different from its training data. The system may be an important step towards big advances in training robots. But it is also a bit scary since we are now starting to move beyond A.I. that just generates words or images to systems that generate actions (or at least instructions for actions) in the physical world. You can read the paper on the non-peer-reviewed research repository arxiv.org here.

FORTUNE ON A.I.

Commentary: Artificial intelligence is increasingly being used to make workplace decisions–but human intelligence remains vital—by Gary D. Friedman

Futurist Kevin Kelly says ‘there are no A.I. experts today’ and it’s a great time to enter the field—by Steve Mollman

Two researchers have created a new A.I. model that can draw what you’re thinking with 80% accuracy—by Tristan Bove

The tech mogul who gave the world ChatGPT is investing $180 million into a life extension startup—by Prarthana Prakash

BRAINFOOD

Can a digital filter help save artists from the threat of generative A.I.? The explosive growth of text-to-image generation models has been a blow to many creative artists, who worry about the use of their existing intellectual property without fair compensation, the dilution of their personal brand (as the tools often make it simple for users to create images “in the style of” a living artist that are difficult to differentiate from the real thing), and the loss of future income, as publishers and businesses turn to generative A.I. tools rather than employing human illustrators and graphic artists. I recently spoke to Ben Zhao, a computer scientist at the University of Chicago, who has come up with an A.I. tool to help artists potentially avoid at least some of these existential threats. Called GLAZE, it is an A.I. model that applies a kind of digital mask to an image of an artist’s work that is not noticeable to the human eye, but which results in the image being misclassified by a text-to-image A.I. system during training. As a result, if an image generator is later prompted by a user to create an image in the style of that particular artist, the generator will fail to do so, instead producing an image that is significantly different from the artist’s work.

Zhao admits that GLAZE isn’t a silver bullet. It might still be possible, with enough stylistic descriptors in a prompt, to generate images that are very close to the styles of certain artists. It also only works on images that are newly added to a training set, so any images already digitized and scraped into big training datasets will still be available for these models to learn from. Nor does it resolve all the IP issues or issues around future employment raised by the advent of generative A.I. “This does not solve the misuses or fair use copyright issues. Those are being solved in court,” Zhao says. “We are just targeting the most obvious misuse ethically. It is just the tip of the iceberg.”

Zhao got the idea for creating the tool after he worked on previous software, called FAWKES, which masked people’s faces in digital images so that facial recognition wouldn’t work on them. Then, once DALL-E and Stable Diffusion came out, he started getting calls from desperate artists, wondering if they could apply FAWKES to their art. But the solution wasn’t that simple: There are only a few key points in the human face that need to be changed to render facial recognition unreliable. With art, the places in an image where distortions need to be applied to cause misclassification are more numerous and spread across diverse parts of the image. Creating GLAZE took Zhao some time, and the model takes a fair bit of computing power. But it works.

Zhao tells me he doesn’t buy the argument that text-to-image generators are “democratizing creativity” by allowing anyone to produce art that is as good as that created by skilled human artists, many of whom have spent their entire lives perfecting their craft. “Democratization that is only possible by ripping off a small subpopulation’s work and, in doing so, ending that particular industry does not sound appealing,” he says. “In the end, if we really do ‘democratize’ human creativity in this way, there will be no one left to go into these industries and there will be no real creative, stylistic, or artistic advances. Everything will just be derivative of what already exists. So that is a real problem.”

What do you think? Does text-to-image generation democratize art creation or devalue true creativity?

This is the online version of Eye on A.I., a free newsletter delivered to inboxes on Tuesdays and Fridays. Sign up here.