A.I. chatbots like ChatGPT are a long way from being trustworthy

Good morning, and welcome to the April run of The Trust Factor, where we’re looking at the issues surrounding trust and A.I. If artificial intelligence is your bag, sign up for Fortune’s Eye on A.I. newsletter.

Earlier this month, OpenAI, the Microsoft-affiliated artificial intelligence lab, launched an updated version of ChatGPT, the A.I.-powered chatbot that took the internet by storm late last year. The new version, GPT-4, is “more reliable, creative, and able to handle much more nuanced instructions” than its predecessor, OpenAI says.

But as the “reliability” and creativity of chatbots grow, so too do the issues of trust surrounding their application and output.

NewsGuard, a platform that provides trust ratings for news sites, recently ran an experiment in which it prompted GPT-4 to produce content in line with 100 false narratives (for example, a screed in the style of Alex Jones claiming Sandy Hook was a false flag operation). The company found GPT-4 “advanced” all 100 false narratives, whereas the earlier version of ChatGPT refused to respond to 20 of the prompts.

“NewsGuard found that ChatGPT-4 advanced prominent false narratives not only more frequently, but also more persuasively than ChatGPT-3.5, including in responses it created in the form of news articles, Twitter threads, and TV scripts,” the company said.

OpenAI’s founders are well aware of the technology’s potential to amplify misinformation and cause harm, but executives have, in recent interviews, taken the stance that their competitors in the field are a greater cause for concern.

“There will be other people who don’t put some of the safety limits that we put on it,” OpenAI cofounder and chief scientist Ilya Sutskever told The Verge last week. “Society, I think, has a limited amount of time to figure out how to react to that, how to regulate that, how to handle it.”

Some societal groups have already begun to push back against the perceived threat of chatbots like ChatGPT and Google’s Bard, which the tech giant released last week.

On Thursday, the U.S.’s Center for AI and Digital Policy (CAIDP) filed a complaint with the Federal Trade Commission, calling on the regulator to “halt further commercial deployment of GPT by OpenAI” until guardrails have been put in place to stop the spread of misinformation. Across the water, the European Consumer Organisation, a consumer watchdog, called on EU regulators to investigate and regulate ChatGPT, too.

The formal complaints landed a day after more than 1,000 prominent technologists and researchers issued an open letter calling for a six-month moratorium on the development of A.I. systems, during which time they expect “A.I. labs and independent experts” to develop a system of protocols for the safe development of A.I.

“Contemporary AI systems are now becoming human-competitive at general tasks, and we must ask ourselves: Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones?” the signatories wrote.

Yet, for all the prominent technologists signing the letter, other eminent researchers lambasted the signatories’ hand-wringing and called them out for overhyping the capabilities of chatbots like GPT. That criticism points to the other issue of trust in A.I. systems: They aren’t as good as some people believe.

“[GPT-4] is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it,” OpenAI cofounder and CEO Sam Altman said in a tweet announcing the release of GPT-4.

Chatbots like GPT have a well-known tendency to “hallucinate”—which is industry jargon for a tendency to make stuff up or, less anthropomorphically, to return false results. Chatbots, which use machine learning to deliver the most likely response to a question, are terrible at solving basic math problems, for instance, because the systems lack computational tools.

Google says it has designed its chatbot, Bard, to encourage users to second-guess and fact-check the answers Bard returns to prompts. If Bard gives an answer users are unsure of, they can easily cycle between alternative answers or use a button to “Google it” and browse the web for articles or sites to verify information Bard provides.

So for chatbots to be used safely, genuine human intelligence is still needed to fact-check their output. Perhaps the real issue surrounding trust in A.I. chatbots is not that they’re more powerful than we know, but that they’re less powerful than we think.

Eamon Barrett
eamon.barrett@fortune.com

IN OTHER NEWS

Pause for thought
As I mentioned above, not everyone is on board with the proposal that leaders in A.I. development should take a six-month pause in research and use that time to reflect deeply on how and why A.I. systems should be developed at all. Here, Fortune’s David Meyer outlines several of the key arguments against a six-month hiatus.

In business we trust?
A new survey from PwC (a sponsor of this newsletter) finds there remains a massive gap between how companies perceive their own trustworthiness and how much consumers actually trust them. According to the company’s report, while 84% of executives believe consumers trust their companies, only 27% of consumers agree. And while 79% of executives believe employee trust is high, only 65% of employees agree.

Hush money
A Manhattan grand jury has indicted former President Donald Trump on charges that he paid a porn star hush money to cover up an extramarital affair. The charges first surfaced during Trump’s presidential bid in 2016 and are just one of a litany of legal complaints surrounding Trump, whose indictment makes him the first former president to face a criminal charge. Trump has dismissed the indictment as “political persecution.”

A good layoff?
Jose Ramos, who was among the 11,000 workers Meta laid off last November, thinks the Facebook parent company executed the mass firing flawlessly. “The communication was very respectful. I would say even humane—even though it was bad news. They were telling us exactly what was going to happen,” Ramos tells Fortune’s Megan Leonhardt, in a feature documenting what executives can learn from the sweep of job cuts in the tech sector these past six months.

TRUST EXERCISE

“Penta’s Four Corners provides leaders with the map required to navigate an increasingly complex business environment and develop the trust among their stakeholders that is necessary to achieve the company’s goals.”

So says Penta president Matt McDonald in a Fortune op-ed on how companies should map out their key “stakeholder” groups to effectively manage the demands and needs of each.
