Some small startups making headway on generative A.I.’s biggest challenges

Aligned AI's co-founders Stuart Armstrong and Rebecca Gorman. The small startup, based in Oxford, England, says it has developed a content filter for generative A.I. chatbots that is far more effective than a similar system developed by ChatGPT creator OpenAI.
Courtesy of Aligned AI

Many large companies are eager to reap the benefits of generative A.I. but are worried about both the risks—which are numerous—and the costs. In the past few weeks, I’ve had a number of conversations with startups trying to address both of these concerns.

Leif-Nissen Lundbaek is the founder and CEO of Xayn, a six-year-old A.I. company based in Berlin. It specializes in semantic search, the term that refers to techniques that allow people to use natural language to find information, and recommendation engines, which suggest content to customers. Lundbaek tells me that while most people have become fixated on the ultra-large language models, such as OpenAI’s GPT-4 and Google’s PaLM 2, they are often not the best tool for companies to use.

If all you want is to be able to find relevant information, a huge LLM isn’t the most efficient approach in terms of cost, energy efficiency, speed, or data privacy, Lundbaek tells me. Instead, Xayn has pioneered a suite of much smaller models that are better at learning from small amounts of data and surfacing results much faster than a very large language model would. Xayn’s models are small enough that they will run on a mobile phone, rather than requiring a connection to a model running in a data center. In a pilot project for German media company ZDF, Xayn’s recommendation software, which the company calls Xaynia, increased the volume of digital content users watched and the click-through rate compared to the media company’s previous recommendation model, while reducing energy consumption by 98%, Lundbaek says. He says that compared to OpenAI’s latest model for embedding information, which is called Ada 002, Xaynia offers 40 times better energy performance. It is also about 20 times more energy efficient than using Google’s BERT model.

In a demonstration, Lundbaek also showed me how the model tries to infer what content a user might like based solely on a single search or a single piece of content that a person engages with. In this case, a search for football surfaced recommendations about the soccer team FC Bayern Munich, as well as other sports. Many recommendation engines instead try to compare a user’s profile with those of similar users they have seen before. Xaynia’s model is based mostly on the content itself. This solves many of the data privacy concerns that companies, particularly in Europe, have about how to personalize content for users without having to store lots of sensitive data about them, he says. “It’s completely individualistic,” he says. “Even if this user looks similar to someone else.”
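
To make the contrast with profile-matching concrete, here is a toy sketch of content-based recommendation in Python. It is not Xayn's method, which the company has not detailed here. The catalog, the function names, and the use of simple word counts with cosine similarity are all assumptions made for illustration, and a production system would more likely compare learned embeddings. The point is only that a single query can be scored against the content itself, with no stored user profile.

```python
import math
from collections import Counter

# Hypothetical catalog: item IDs and descriptions are invented for this sketch.
CATALOG = {
    "bayern-match-report": "fc bayern munich win football match in bundesliga",
    "tennis-final": "tennis final goes to five sets in sports thriller",
    "election-debate": "party leaders clash in televised election debate",
    "football-transfer": "football transfer news as striker joins new club",
}


def vectorize(text):
    """Turn a piece of text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, catalog, top_k=2):
    """Rank catalog items by similarity to one query; no user profile is stored."""
    query_vec = vectorize(query)
    scored = [
        (cosine(query_vec, vectorize(text)), item_id)
        for item_id, text in catalog.items()
    ]
    # Highest score first; ties are broken alphabetically by item ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored if score > 0][:top_k]


if __name__ == "__main__":
    print(recommend("football", CATALOG))
```

In this toy version only items that share a word with the query score above zero, which is why an embedding model that captures meaning rather than exact words would be the natural next step.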

Another thorny problem for chatbots powered by large language models is their tendency to produce toxic or inappropriate content and to easily jump guardrails. Aligned AI, a tiny startup based in Oxford, England, has developed a technique for content moderation that it says significantly outperforms competing models created by OpenAI. On a content filtration challenge that Google’s Jigsaw division created, OpenAI’s GPT-powered content moderation was only able to accurately filter about 32% of the problematic chatbot responses, while Aligned AI’s scored 97%. On a separate evaluation dataset that OpenAI itself provided, OpenAI’s moderation system scored 79% compared to Aligned AI’s 93%.

Rebecca Gorman, Aligned AI’s cofounder and CEO, tells me that even those kinds of results may not be good enough for many enterprise use cases, where a chatbot might engage in tens of thousands, hundreds of thousands, or even more conversations. At such scale, missing 3% of toxic interactions would still lead to a lot of bad outcomes, she says. But Aligned AI has at least shown its methods are able to make progress on the problem.
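
The arithmetic behind that concern can be sketched in a few lines of Python. The filter rates of 97% and 32% are the figures reported above. The number of responses and the share that are problematic are hypothetical values chosen only to show the scale effect, and the function name is invented for this sketch.

```python
def expected_missed(total_responses, problematic_share, filter_rate):
    """Expected number of problematic responses that slip past a filter.

    total_responses:   how many chatbot responses are produced (assumed)
    problematic_share: fraction of responses that are problematic (assumed)
    filter_rate:       fraction of problematic responses the filter catches
    """
    problematic = total_responses * problematic_share
    return problematic * (1 - filter_rate)


if __name__ == "__main__":
    # Hypothetical workload: 100,000 responses, 1% of them problematic.
    responses = 100_000
    share = 0.01
    for label, rate in [("97% filter", 0.97), ("32% filter", 0.32)]:
        missed = expected_missed(responses, share, rate)
        print(f"{label}: about {round(missed)} problematic responses get through")
```

Under these assumed numbers, a filter that catches 97% still lets roughly 30 problematic responses through, and the count grows in direct proportion to the number of conversations.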

While much of what Aligned AI is doing is proprietary, Gorman says that at its core Aligned AI is working on how to give generative A.I. systems a much more robust understanding of concepts, an area where these systems continue to lag humans, often by a significant margin. “In some ways [large language models] do seem to have a lot of things that seem like human concepts, but they are also very fragile,” Gorman says. “So it’s very easy, whenever someone brings out a new chatbot, to trick it into doing things it’s not supposed to do.” Gorman says that Aligned AI’s intuition is that methods that make chatbots less likely to generate toxic content will also be helpful in making sure that future A.I. systems don’t harm people in other ways. The idea that work on “the alignment problem” (the question of how we align A.I. with human values so it doesn’t kill us all, and from which Aligned AI takes its name) could also help address dangers from A.I. that are here today, such as chatbots that produce toxic content, is controversial. Many A.I. ethicists see talk of “the alignment problem,” which is what people who say they work on “A.I. Safety” often say is their focus, as a distraction from the important work of addressing present dangers from A.I.

But Aligned AI’s work is a good demonstration of how the same research methods can help address both risks. Giving A.I. systems a more robust conceptual understanding is something we all should want. A system that understands the concept of racism or self-harm can be better trained not to generate toxic dialogue; a system that understands the concept of avoiding harm and the value of human life would hopefully be less likely to kill everyone on the planet.

Aligned AI and Xayn are also good examples of the many promising ideas coming from smaller companies in the A.I. ecosystem. OpenAI, Microsoft, and Google, while clearly the biggest players in the space, may not have the best technology for every use case.

With that, here’s the rest of this week’s A.I. news.

Jeremy Kahn


Pentagon attack deepfake shows the age of A.I.-driven misinformation is upon us. Fake images of smoke near the Pentagon, likely created with text-to-image generative A.I. software and posted from a blue-check Twitter account that seemed to be linked to Bloomberg News, went viral on Twitter, causing a brief selloff in the markets. Although the hoax was rapidly debunked, many analysts said the case showed both the danger of generative A.I. supercharging misinformation and the problems with Twitter allowing anyone to pay for a blue check. Jim Reid, Deutsche Bank's head of global economics, emphasized the dangers of A.I.-generated fake news affecting asset prices, as my Fortune colleague Christiaan Hetzner reported.

Anthropic raises another $450 million, valuing it at $4 billion. The San Francisco-based A.I. startup, which was formed by a group of researchers who broke away from OpenAI in 2021, raised $450 million in a Series C venture capital round, Axios reported. The funding round was led by Spark Capital with participation from Google, Salesforce Ventures, Sound Ventures, and Zoom Ventures. The new round comes hot on the heels of another $300 million venture round in March, and an additional $300 million investment from Google, which purchased a 10% stake in the startup, in February. The amounts indicate just how much money it takes to train ultra-large language models and hire top-tier A.I. talent. Anthropic has also had to make up for a $580 million funding hole: That’s the amount that disgraced crypto king Sam Bankman-Fried had previously pledged to the startup prior to the collapse of his FTX empire.

Samsung will not switch to Bing after all. That’s according to a story in the Wall Street Journal, which says that the electronics giant has suspended an internal review exploring the replacement of Google with Microsoft's Bing as the default search engine on its smartphones. In April, the New York Times reported that Samsung was considering the switch and that the prospect had caused panic within Google, which fears losing its market dominance in search due to perceptions it is moving too slowly to use generative A.I. to enhance its search offering. While Samsung has halted discussions for now, Bing remains a future option, the newspaper reported. Samsung, the world's largest smartphone maker, has long viewed its reliance on Google's software as a concern and has been seeking ways to diversify its smartphone software.

OpenAI debuts a ChatGPT iPhone app. OpenAI launched an iOS app for ChatGPT, allowing users to access the chatbot on their mobile phones, the company announced. The app is free to use and syncs user history across devices. It also includes the ability to take voice input. The rollout begins in the U.S. and will expand to more countries soon, with an Android version also in the works, the company said.

Debt collection agencies are turning to large language models. Vice News reports that debt collection agencies are looking to embrace generative A.I., including OpenAI’s GPT models, to craft debt collection letters and emails, as well as to produce the scripts for robocalling applications. Odette Williamson, a senior attorney at the National Consumer Law Center, is among those who expressed alarm at the development, saying that A.I. models could reinforce systemic biases in a long history of lending discrimination against low-income groups and people of color. The U.S. Consumer Financial Protection Bureau (CFPB) said in a statement that “Regardless of the type of tools used, the CFPB will expect debt collectors to comply with all Fair Debt Collection Practices Act requirements and the Consumer Financial Protection Act’s prohibitions against unfair, deceptive, and abusive practices.”


Another open-source ChatGPT competitor emerges. The open-source community has been adept at rapidly mimicking the capabilities of the proprietary ultra-large language models being built by companies like OpenAI, Microsoft, Google, Meta, Anthropic, Baidu, and Cohere. This week brings another example: SambaNova offers an A.I. development platform based on Hugging Face’s open-source LLM BLOOM and contributions from an open-source company called Together. The result is BLOOMChat, a fairly large open-source model with 176 billion parameters—about the same as OpenAI’s GPT-3 model (but probably a lot smaller than GPT-4, whose parameter count has not been revealed by OpenAI but is thought to be as much as 1 trillion parameters). BLOOMChat stacks up well against its bigger, more expensive competitors, while easily beating many other open-source efforts, according to SambaNova.

When pitted against open-source rivals, BLOOMChat's responses across six languages were preferred by human evaluators in 66% of cases. Against OpenAI’s GPT-4, BLOOMChat won 45% of the time, while GPT-4 was preferred 55% of the time. You can read more about BLOOMChat here.


Bill Gates says the winner of the A.I. race will be whoever creates a personal assistant—and it’ll spell the end for Amazon—by Eleanor Pringle

Ice Cube, musician who became famous rapping over samples, says A.I. is ‘demonic’ for doing a very similar thing—by Tristan Bove

Top tech analyst argues A.I. has spawned a ‘Game of Thrones’–style battle for what is a $800 billion opportunity over the next decade—by Will Daniel

Apple hasn’t gotten into the new tech gold rush—until now. Generative A.I. job posts are blanketing its careers page—by Chris Morris

Apple clamps down on employees using ChatGPT as more companies fear sensitive data sharing with A.I. models—by Nicholas Gordon


How should we avoid the potential dangers of artificial general intelligence? Artificial general intelligence (or AGI) is the kind of A.I. out of science fiction, the kind that is smarter than any human and can perform all of the cognitive tasks we can. It is also the stated goal of a number of A.I. research labs and companies, including OpenAI, Google DeepMind, and Anthropic. What do researchers at these labs think should be done to try to ensure the safe development of such powerful A.I.? Well, a group of researchers from the Centre for the Governance of AI, an Oxford-based think tank associated with the Effective Altruism movement, recently surveyed people working at these A.I. labs as well as in academia and civil society groups about what they thought should be done. It’s a small sample—just 51 participants completed the survey—but the results are interesting: Most of the participants were in favor of almost all 50 possible measures to mitigate the risk of developing dangerous AGI. More emphasis on pre-deployment risk assessments, dangerous capabilities evaluations, third-party model audits, safety restrictions on model usage, and "red teaming" attracted strong support from 98% of the respondents. The only measure that those surveyed seemed less in favor of was informing other research labs about progress toward AGI. You can read the write-up of the results here.

This is the online version of Eye on A.I., a free newsletter delivered to inboxes on Tuesdays. Sign up here.
