How Databricks became an A.I. sensation

Ben Horowitz says it was one of the worst pitch decks he has ever seen.

It was 2013, and the Databricks team (a group of seven data scientists and professors) was working out of a room in an office building next to the University of California, Berkeley, campus when Horowitz, cofounder of Andreessen Horowitz, arrived for their meeting.

“The graphics were terrible,” Horowitz recalls of the deck they showed him. “The ideas were somewhere between patronizing and insane,” he says. “It was a very unprofessional pitch deck compared to what we were used to, for sure.”

Ali Ghodsi, a Databricks cofounder who became CEO in 2016, laughs at the memory. “I think our first pitch was really, really bad,” he says.

Fortunately for the team, it wouldn’t matter much. One of the team members, Berkeley professor Scott Shenker, was friends with Horowitz and had told him he thought cofounder Matei Zaharia was one of the “best distributed systems people out of academia in the last 10 years,” Horowitz recalls. Between that endorsement and the team’s open-source distributed computing project, Spark (now known as Apache Spark), Horowitz was already sold. He says he would have invested even if they had had no plan at all.

The Databricks team had wanted to start small, Ghodsi says, and was looking to raise a meager $200,000. Some of them were skeptical of Horowitz, particularly after he asked for exclusivity for the Series A round. “We were like—we don’t want that guy. He’s not a programmer. That’s how we felt,” Ghodsi says.

Then the offer came in: Horowitz said a16z would put down $14 million. Faced with a check that big, the Databricks team took it immediately.

Fortune published two of Databricks’ original pitch decks.

Ten years later, $14 million looks like peanuts. Databricks had raised a total of $3.5 billion from investors by its last funding round in August 2021. More than 9,000 companies, from Microsoft to Warner Bros., use its technology to store massive amounts of data in the cloud, generate analytics and insights, and power the development of machine learning tools. Databricks revealed that it crossed an annual run rate of $1 billion in the second quarter of 2022 and that its revenue was growing at more than 70% annually. (Ghodsi won’t provide more recent figures about the company’s run rate other than to say it is “definitely higher than back then, thank God.”)

As of 2021, Databricks was one of the 10 most valuable startups in the world, with a $38 billion valuation and a list of star-studded backers like Horowitz from a16z, Peter Sonsini from New Enterprise Associates, and firms like Alphabet’s independent growth stage fund CapitalG and asset manager T. Rowe Price. Cloud providers Amazon Web Services and Microsoft have invested through their corporate venture arms.

Many of Databricks’ decacorn peers are pulling back on spending as interest rates climb and the economy stumbles. Growth-stage fintech companies Stripe and Klarna have laid off workers, and peers such as grocery delivery company Instacart have slashed their valuations. Databricks, meanwhile, seems to be faring well amid the private market turbulence: It nearly doubled its headcount in 2022 and even made an acquisition.

Ghodsi has claimed for years that Databricks is ready for an IPO. To prepare, the company has brought on executives such as CFO Dave Conte, who took Splunk public, and Trâm Phi, formerly general counsel at DocuSign, as senior vice president. The only problem? The current IPO market no longer seems ready for Databricks.

The decade of the ‘data revolution’

Ghodsi may not have the background you’d initially expect for a CEO running a company with approximately 5,000 employees that has raised billions of dollars in funding.

Until he became CEO of Databricks in 2016, the Swedish-Iranian tech executive’s résumé had been heavy on academia (he has a Ph.D. in distributed computing and had served as an assistant professor in Sweden) and light on business experience. The same was true of all of Databricks’ cofounders. Before they began assembling their own for-profit company, they had spent about four years together in Berkeley’s research lab building Apache Spark, the open-source analytics engine for large-scale data processing that Zaharia had started.

“This founding team had positives and negatives,” Ghodsi tells Fortune. “The negative—let’s start with that—was that we didn’t know anything about business. The positive was that maybe we didn’t know anything about business.”

With their roots firmly in the academic world, the Databricks team understood how companies like Google, Facebook, and Twitter were leveraging data. They could also see that data practices at most other corporations left “a lot to be desired,” according to Ghodsi. Their vision was to build a platform that could bridge that gap, allowing companies to organize and harness their own data to generate insights or build machine learning tools. The problem was that it would take some coaxing, and time, for the world of Big Business to catch on.

“We were sort of like, ‘Oh, of course everybody wants to do A.I.’ It turned out that was a little bit too early, maybe. So it was gonna take another decade for that to become true,” Ghodsi says.

People frequently asserted that the Databricks team was making “rookie mistakes,” according to Ghodsi. That included its talk about machine learning: Detractors said no one but Google would use it.

Ten years later, Databricks’ cofounders, all of whom still work at the company in some capacity, find themselves in a world where, largely thanks to the release of OpenAI’s ChatGPT tool, everyone from food bloggers to screenwriters to dermatologists has something to say about A.I. That chatter is raising awareness of the technology’s potential and its many use cases.

“It’s one of these overnight sensations that was decades in the making,” says Wing Venture Capital founding partner Peter Wagner. He invested in the seed round of Databricks competitor Snowflake in 2014 and participated in every subsequent round. He describes the current moment as the “early innings of the data revolution,” in which companies are increasing their investments in data. They are also innovating around the kinds of A.I. applications their data science teams can build on top of their machine learning models.

This is the enterprise software market Databricks has capitalized on, and the one it expects to keep growing. Databricks essentially acts as the data infrastructure layer for corporations: Its cloud-based platform allows a company’s data teams to store and safeguard data, generate analytics and insights, and power the development of machine learning tools that can ultimately be used across the rest of the organization. Its “Lakehouse” architecture removes traditional data silos so teams can pull from a single data source.

For the uninitiated, the infrastructure layer, where Databricks, Snowflake, MongoDB, and Confluent have made their names, is extremely technical, and you would hardly consider it a “sexy” sector of the tech industry. Nonetheless, these technologies are the foundation needed to power machine learning models. They also support the more exciting, user-facing capabilities that are later built on top of those models.

“It’s a bit of a creative explosion for how we put these models to work,” says Wagner of the opportunity for the broader industry, now that the biggest data infrastructure players are accommodating various data types.

To date, more than 9,000 companies use Databricks’ platform to power everything from supply-chain management to recommendations on which HBO show your cousin should stream. Shell, for example, uses Databricks to run more than 10,000 inventory simulations across all its parts and facilities. These simulations help the oil company’s analysts determine the ideal number of spare parts to store in its warehouses, and Shell also runs predictions of when equipment might fail on Databricks’ platform. AT&T has used Databricks’ capabilities to train and deploy A.I. models that detect and stop fraudulent phone purchase attempts. CVS Health uses the Databricks platform to determine when to remind patients to fill or pick up medications and to identify potential side effects.


The current market environment features a severely depressed economy and pressured bottom lines. In that climate, what’s particularly appealing about machine learning applications is the opportunity for cost savings. Comcast says A.I. has helped it cut computation costs tenfold, and J.B. Hunt says it shaved $2.7 million off its IT infrastructure spending in 2022.

It’s part of the reason Microsoft was willing to pour billions of dollars into OpenAI at the beginning of this year, even as it laid off around 10,000 of its own employees.

“[A.I.], I think, can make people more effective at their jobs—and we see that today with programmers,” Microsoft executive Yusuf Mehdi told Fortune in Bellevue, Wash., during the release event for its GPT-powered Bing search engine. “Programmers are actually more productive with their jobs. It’s not like they do less—they are now able to do more, because they can maybe not have to do a bunch of the base-level work [that they] can have automated.”

All of this has translated directly into the revenue growth Databricks has enjoyed over a relatively short period. That growth appears to have made the enterprise software company something of an outlier in the current market, where many of the highest-valued decacorns are slashing their valuations or laying off employees.

Back in 2017, when Databricks was raising its Series D round from investors, the company had only about 500 customers, and it showed financial models detailing $21.9 million in annual recurring revenue (ARR). By 2021, revenue had grown to $600 million. Last year, Databricks adjusted its financial models to report run rate rather than ARR. In August, it said it had crossed the $1 billion annual run rate threshold in the second quarter of 2022 and that its revenue was growing at more than 70% annually.

Databricks said it had 5,000 customers in 2021; by February, it had more than 9,000. And last year, while decacorns like Klarna and Stripe were laying off workers, Databricks nearly doubled its headcount by hiring 2,400 new employees. It also acquired Datajoy, an analytics company that uses machine learning to correlate unstructured data from sources such as customer relationship management software, marketing automation tools, and spreadsheets.

‘Who knows what we would be worth’

Ever since interest rates began to inch up and valuations began to plummet, growth-stage private companies have put their plans to go public on hold, closing off one important way to raise additional capital for their operations.

Meanwhile, many of the world’s most valuable private companies aren’t worth what they used to be. Stripe, for example, is reportedly looking at a valuation of $55 billion to $60 billion for its new round of funding, down from roughly $95 billion two years ago. Buy now, pay later startup Klarna, once valued at $45.6 billion, recently raised new capital at a $6.7 billion post-money valuation, shortly after it laid off 10% of its workforce. Shein is reportedly in talks to raise new funding at a $64 billion valuation, a 36% cut from its valuation last April. And Instacart has cut its own valuation internally to about $10 billion, down from $39 billion in 2021.

Databricks isn’t immune. Ghodsi says Databricks has lowered its internal 409A valuation to approximately 10% below the valuation from its last formal fundraise in August 2021. That is a modest adjustment compared with the markdowns some of its highly valued peers have taken.

It’s hard to say how that reduced valuation would hold up in today’s market if Databricks needed to raise money again. Competitors Snowflake and MongoDB, both public data management and cloud storage companies, are trading more than 41% below their share prices in August 2021, when Databricks was last in the market raising funds. Meanwhile, asset managers like Fidelity, BlackRock, and T. Rowe Price hold both private and public equities in some of their mutual funds. They have marked down their Databricks shares to prices implying a valuation between $24 billion and $31 billion, and BNY Mellon prices Databricks as low as $15 billion. These figures come from SEC filings and data compiled by Caplight, a private market company data provider.

“Who knows what we would be worth if we were public,” Ghodsi says.

Horowitz, who has participated in every financing round since that first Series A investment, says Databricks has been growing so quickly that it doesn’t really matter that valuations are coming down, and estimates that the valuation figure probably hasn’t changed by much. “It’s right in that neighborhood,” Horowitz says, referring to Databricks’ last formal fundraising round. “Until you do a round, you’re not so sure, but it’s right there, yeah.”

But Databricks’ current valuation may not matter much, so long as the company has enough capital to avoid another fundraise until the IPO market recovers. Ghodsi says the company doesn’t need capital: Databricks is still sitting on $2 billion on its balance sheet, and its growth strategy is fully funded. However, the company won’t comment on whether it’s profitable. Public competitors, whose bottom lines face more stringent scrutiny, are reporting sizable quarterly losses. Snowflake, which posted more than $589 million in revenue during the three months ending Jan. 31, 2023, reported a net loss of $207.5 million during the same period. Database platform MongoDB, also public, reported $333.6 million in revenue and an $84.8 million loss for the three months ending Oct. 31, 2022.

‘Scratching the surface’

Horowitz was proof that Databricks didn’t need a well-put-together pitch deck to raise millions in capital. And by 2018, investors apparently didn’t need to look at one at all.

Ghodsi says that since Databricks started fundraising for its Series D round a little less than six years ago, he hasn’t put together a formal pitch deck for any of the four subsequent rounds. The company raised more than $3 billion in those rounds.

Ever since the Databricks team first launched its product, Ghodsi says, it has gotten pushback from investors. They questioned building the product on the cloud when companies were using cold storage, and they argued that the open-source project Spark wouldn’t make any money. Detractors also said no one but Google would use the machine learning the team kept talking about.

In hindsight, these are the very things that would ultimately drive enormous success for the data management company.

Ghodsi appears uninterested in when the company will go public, or is perhaps just bored by being asked about it. Maybe it is because of his tenure as a professor, or because of his years of technical work on Databricks’ product. Whatever the reason, he’s much more excited to talk about how the corporate world has truly latched on to what the cloud has made possible, more than a decade after the founding team launched Apache Spark.

“I think this is just the very beginning, and we are just scratching the surface on what A.I. and data analytics can do,” he says.
