Exclusive: Corpora.ai’s AI ‘research engine’ creates 8-page reports citing hundreds of sources in seconds

By Sage Lazzaro, Contributing writer

    Sage Lazzaro is a technology writer and editor focused on artificial intelligence, data, cloud, digital culture, and technology’s impact on our society and culture.

    Corpora.ai is among a growing crop of AI tools aimed at researchers.
    Hector Roqueta Rivero/Getty Images

    Hello and welcome to Eye on AI. In today’s edition: an exclusive on a new AI research platform; OpenAI kicks off 12 days of launches and demos; Google DeepMind’s new weather model outperforms existing systems; Spotify taps NotebookLM; and medical note-taking company Abridge scores another big health care contract. 

    Those watching the swarm of generative AI models and product launches this week can add another to the list: Corpora.ai.

    The new platform, launching in a limited capacity today, is a “research engine” that scours academic papers, news articles, patents, and any other information freely available on the internet to create detailed research documents in response to user prompts in seconds. After a user inputs a topic, Corpora.ai creates an initial summary, which users can then request be expanded into a four- or eight-page report complete with citations—often, several hundred of them. Founder Mel Morris, an English entrepreneur and early backer of Candy Crush, says he’s not offering search like Google, or even newer AI-enabled search tools such as Perplexity, but instead aims to provide much more depth on a particular subject.

    “It’s not going to help you find the cheapest place to buy a TV. But it will help you understand a topic you know something about or nothing about,” Morris, who’s funded the company so far, told Eye on AI. 

    This makes Corpora.ai the latest in an emerging crop of AI tools aimed at research, including Elicit, Consensus, Scite, and ResearchRabbit. I got an early look at the new platform and found that while it offers something different than many of the popular generative AI tools people are currently using, it still faces many of the same challenges. 

    How it works

    The Corpora.ai model was built by the company from scratch and is yet another tool using retrieval-augmented generation (RAG), an approach that’s swept the AI industry. After a user submits a query—for example, “bird watching in the New York Hudson Valley,” which is one I tried—the platform first deconstructs the prompt to understand what the topic or question involves and then breaks it into parts, Morris said.

    Essentially, it creates an outline of what high-level information a research paper on bird watching in the Hudson Valley should cover, creates a bunch of related search terms, and starts making its way through Corpora.ai’s dataset to collect relevant information. Lastly, the platform uses a generative AI model to summarize the information collected, creating the text for the report. The four-page bird watching report I generated started with an introduction and had chapters on prime viewing locations, tips for identifying birds in the area, and conservation efforts. 
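    The pipeline described above—deconstruct the prompt, generate search terms, retrieve relevant sources, then summarize them into cited text—can be sketched in miniature. This is a purely hypothetical illustration of the general RAG pattern, not Corpora.ai's actual code; the corpus, function names, and scoring are all invented, and the final "summarize" step simply stitches snippets together where a real system would invoke a generative model.

```python
from collections import Counter

# Tiny in-memory stand-in for a document index (invented for illustration).
CORPUS = {
    "example.org/spots": "Prime viewing locations for bird watching in the "
                         "Hudson Valley include several riverside preserves.",
    "example.org/id-tips": "Tips for identifying birds: note size, song, and "
                           "habitat before consulting a field guide.",
    "example.org/tv-deals": "The cheapest place to buy a TV this week is a "
                            "big-box retailer.",
}

STOPWORDS = {"in", "the", "a", "of", "for", "and"}

def make_search_terms(topic: str) -> list[str]:
    """Deconstruct the prompt into lowercase, stopword-free search terms."""
    words = (w.lower().strip(".,") for w in topic.split())
    return [w for w in words if w not in STOPWORDS]

def retrieve(terms: list[str], corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Rank sources by naive term overlap and keep the top k non-zero hits."""
    scores = Counter()
    for url, text in corpus.items():
        words = set(text.lower().split())
        scores[url] = sum(1 for t in terms if t in words)
    return [(url, corpus[url]) for url, s in scores.most_common(k) if s > 0]

def build_report(topic: str) -> str:
    """Join retrieved snippets into cited report text.
    A real system would hand the snippets to a generative model here."""
    hits = retrieve(make_search_terms(topic), CORPUS)
    body = "\n".join(f"{text} [{url}]" for url, text in hits)
    return f"Report: {topic}\n{body}"

print(build_report("bird watching in the Hudson Valley"))
```

    Even this toy version shows why Morris's "cheapest TV" example falls outside the product's lane: a retrieval step keyed on topical term overlap surfaces depth on a subject, not transactional answers.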

    Is sourcing enough to build trust and ensure accuracy?

    The key to a product like this is the quality of the information and whether the report generated can be trusted. For this, Corpora.ai is relying on citations and an additional scoring system that indicates how much of the information was “extracted” directly from the source material as opposed to being written by the model. The sourcing is extensive—the bird watching report contained 31 sources, and an eight-page report I created about milestones in the history of AI development credited 323 sources. The sources are cited and linked throughout the text, as well as at the end of each chapter and at the end of the report.
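    Corpora.ai hasn't disclosed how its "extracted" score is computed. As a purely hypothetical illustration, one simple way to measure how much of a report's text comes verbatim from a source is n-gram overlap: count what fraction of the report's word triples appear word-for-word in the source. Everything below—the function, the example strings—is invented for the sketch.

```python
def extraction_score(report_text: str, source_text: str, n: int = 3) -> float:
    """Fraction of the report's word n-grams found verbatim in the source."""
    def ngrams(text: str) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    report_grams = ngrams(report_text)
    if not report_grams:
        return 0.0
    return len(report_grams & ngrams(source_text)) / len(report_grams)

source = "nearly 40% of jobs worldwide face being impacted by AI"
faithful = "nearly 40% of jobs worldwide face being impacted by AI"
paraphrase = "a significant percentage of jobs were affected by AI"

print(extraction_score(faithful, source))    # 1.0
print(extraction_score(paraphrase, source))  # 0.0
```

    Under a metric like this, a verbatim quote scores 100% while a paraphrase scores near zero—which is exactly the distinction a reader would want the score to capture.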

    The links direct you to the source material but not to the specific passages containing the relevant information, so you’re still on the hook for scouring the original source if you want to verify specific facts or figures. The source material I’ve seen so far mostly looks reputable, but it’s important to note opinion articles are included, so opinions may be presented as unbiased information. 

    There’s also the fact that at the final point in the process, a model is still generating the final text. It didn’t take me long to find differences between text in a Corpora.ai report and the text it cited. For example, my report about milestones in AI development at one point stated that “the integration of AI also prompted shifts in the employment landscape, with a significant percentage of jobs affected either through augmentation or replacement.” The source material actually states that nearly 40% of jobs worldwide face being impacted by AI—a projection of the future versus a statement that this impact has already occurred. Yet, the Corpora score for this chapter was 100%, indicating it was extracted directly rather than written by the model. While this kind of error has been observed in answers from other generative AI tools like Perplexity and Google’s AI summaries, it’s a fairly serious failing. And it’s not encouraging that it occurred on the very first fact I checked.

    Corpora.ai vs. ChatGPT vs. Wikipedia vs. humans

    The bird watching report Corpora.ai created was informative and decently pleasant to read. But I also wanted to see how the platform would fare on more intricate and specific research quests. 

    My request for a timeline of the major milestones in AI development served as the perfect topic for comparison, since it’s one I know well and thus would be able to easily evaluate for thoroughness and accuracy. Corpora.ai didn’t structure its report as a timeline as requested (OK, maybe timelines aren’t the platform’s strong suit). But it also offered a lot of superfluous information outside the scope of my request while omitting a significant number of the most pivotal events in AI development, such as AlphaGo’s victory over the human Go champion and the publication of the 2017 “Attention Is All You Need” paper. Responding to the same prompt, ChatGPT generated a timeline, but it was even less complete and lacked detail, even after I revised my prompt several times. I checked Wikipedia, which had a handy timeline, but it was far too granular and didn’t contextualize the information. Last but not least, I Googled “Timeline of AI development.” The first link was an article from TechTarget, written by a human. It hands-down fulfilled my request the best. (Maybe search isn’t dead after all.)

    Lastly, since Corpora.ai is focused on deep research, I wanted to give it a chance to shine in this department. So I prompted it to research a high-level idea related to how technology impacts society that I’ve been kicking around for a while, but finding difficult to research through traditional means. The platform seemed to understand my prompt, but the information delivered wasn’t any better (or worse) than the other methods I’ve used to research this topic. The report did, however, repeat itself often, and at times, felt like just a collection of random facts strung together. In those moments, the way the model works (breaking a topic into parts, extracting information from various sources, and then combining and summarizing it) was palpable. 

    New model, same problems 

    From my early look at Corpora.ai, I can say it definitely adds something new and interesting to the landscape of AI tools. At the same time, it faces—and poses—many of the same problems as other generative AI products. 

    The text produced still feels slightly disjointed and soulless. It can’t be entirely trusted. Morris says the information is “extracted” from the source materials, which could draw the same copyright concerns affecting other generative AI platforms (he says the company is interested in revenue-sharing deals down the line). Also like other generative AI platforms, Corpora.ai relies on the availability of high-quality information. If Corpora.ai and tools like it succeed, eliminating the need for users to ever actually go to news sites or directly interact with the sources providing information that feeds the tool, what will happen to the business models that currently sustain those sources? 

    And with that, here’s more AI news. 

    Sage Lazzaro
    sage.lazzaro@consultant.fortune.com
    sagelazzaro.com

    AI IN THE NEWS

    OpenAI promises daily launches and demos for 12 days of “shipmas,” which will reportedly include Sora and a new reasoning model. Citing sources familiar with OpenAI’s plans, The Verge reported the releases will include the company’s highly anticipated text-to-video model Sora, which was previewed in February and delayed throughout the year. Today on day one, OpenAI announced the rollout of the full version of its o1 model, which is better at reasoning than its GPT-4o model, particularly at tasks involving math, computer coding, and logical processes. The full version of o1 is also significantly more reliable than the o1-preview version that OpenAI debuted in September, the company said, and is multimodal—meaning it can reason about visual inputs and respond to voice commands—which the o1-preview version could not do. The company also unveiled a new pricing tier, called Pro Plan, that costs $200 per month but includes unlimited use of all its models, including a version of o1 that the company says offers even better performance than the version available to ChatGPT Plus users, who currently pay $20 per month. Google yesterday also launched its own AI video model, Veo, and Amazon Web Services (AWS) released a video model—among others—earlier this week at its annual re:Invent conference. AI video models are officially here. 

    Google DeepMind introduces an AI weather model that outperforms current systems. In a paper published yesterday in the journal Nature, researchers detailed how the model, called GenCast, outperforms traditional weather prediction systems and can make accurate forecasts further out: 15 days compared to about a week previously. It’s a leap above Google DeepMind’s previous weather system, which it unveiled late last year and which topped out at 10-day accuracy. In addition to daily forecasts, GenCast also works faster than the current top-of-the-line models and outperforms on predictions of deadly storms, including predicting the paths of hurricanes with great accuracy. You can read more in the New York Times.

    Spotify Wrapped adds personalized AI-created podcasts recapping users’ listening habits. Built using Google NotebookLM, the new feature adds audio commentary to the annual summary of personalized listening stats the company shares with users each year. Mine, for example, was three and a half minutes long, unveiled my top song with a drum roll, and discussed some of my Wrapped stats with additional context, like when songs were released and how my listening habits changed with the seasons. The AI “hosts” also chimed in with their own commentary, discussing the vibes certain phases of my listening give off, the diversity of my taste in music, and describing songs as “amazing,” “energetic,” and so on. 

    FORTUNE ON AI

    Exclusive: Mark Zuckerberg publicly praises Meta’s Llama AI, but also uses rival GPT-4 to improve an internal AI coding tool —by Kali Hays

    Exclusive: Reasoner, a startup from Crashlytics’ cofounder, claims a breakthrough in making AI reliable enough for the enterprise —by David Meyer

    Elon Musk’s budget-cutter-in-chief role for Trump is a ‘dangerous combination’ that risks creating conflicts of interest with his AI empire —by Sharon Goldman

    Amazon’s new Nova models are part of its master plan to shine bright in AI —by Sharon Goldman

    AI CALENDAR

    Dec. 2-6: AWS re:Invent, Las Vegas

    Dec. 9-10: Fortune Brainstorm AI, San Francisco (register here)

    Dec. 10-15: Neural Information Processing Systems (NeurIPS) 2024, Vancouver, British Columbia

    Jan. 7-10: CES, Las Vegas

    Jan. 20-25: World Economic Forum, Davos, Switzerland

    EYE ON AI NUMBERS

    4,000

    That’s how many physicians are set to begin using AI medical note-taking platform Abridge as part of a new deal with Corewell Health, one of the largest health care systems in Michigan. Announced today, it’s the latest health care provider to introduce the technology into hundreds of facilities as the industry jumps on generative AI with the aim of reducing clinicians’ documentation load and decreasing burnout.

    The partnership follows a 90-day pilot in which participating Corewell Health doctors reported spending an average of 48% less time on after-hours documentation each week, decreasing from 4.3 hours to 2.2 hours. According to Abridge, 90% of surveyed clinicians who used the technology reported an increase in the undivided attention they could give patients, 85% reported increased satisfaction at work, and more than half reported a decrease in burnout. 

    When asked if the documents were reviewed for accuracy, a representative for Abridge said Corewell Health has not shared specific data on this yet but that Abridge continuously collects data on accuracy at a system level. A white paper shared by Abridge describes the company’s processes for ongoing post-deployment monitoring, including capturing edits physicians make to the AI-generated notes and Abridge evaluators conducting reviews of the AI-generated notes and corresponding transcripts. This is of course sensitive health data, and on the privacy and security front, Abridge says it applies HIPAA security standards to all health data it collects, never sells “your identifiable data,” and performs its research only on de-identified health data or external datasets acquired with patient consent. (It’s important to note, however, that research has shown de-identified data can sometimes be re-identified.)

    This is the online version of Eye on AI, Fortune's weekly newsletter on how AI is shaping the future of business. Sign up for free.