Here’s Facebook’s Secret to Creating Software For Billions of Users
Facebook’s billions of users may not have noticed any changes. But over the past year, the social network has overhauled its underlying software so that it can debut new features more quickly and save its engineers time.
The fixes, both small and large, help keep the site operating smoothly, even when traffic spikes during major events like Hurricane Harvey. They also help the company add new features like video streaming and messaging without many technical hiccups.
To help with these projects, Facebook relied on software development techniques typically used by startups that are a fraction of its size, said Chuck Rossi, who oversees the company’s big software release projects. When he first joined Facebook nearly 10 years ago after stints at Google (GOOG), VMware (VMW), and IBM (IBM), he saw how the “crazy kids,” as he put it, built the site and its related software infrastructure unlike anything he’d seen before.
Instead of building software in long, drawn-out stages, as most big companies do, Facebook’s small staff rapidly wrote code in smaller chunks to accommodate the site’s growth. This agile development approach, as it’s known in the tech industry, can be more chaotic, but it lets companies debut features more quickly than they otherwise would.
“Do I step in and apply my 20 years of experience here and force them to go down a more known and industry standard, or do we go with what these guys set up?” said Rossi. “I chose the latter.”
Developers had access to the company’s entire source code and would “cherrypick” bits and pieces from it for their respective projects, he said. The changes they made to the software were implemented once daily.
But the more coders Facebook hired, the more frequently they wanted to modify the code, often from far-flung offices in Tel Aviv and Dublin. Coordinating the activity was difficult because of the global nature of their work.
Eventually, engineers ramped up to making nearly 1,000 changes to the code at three set times daily. Additionally, there would be a weekly mega update for changes that were supposed to happen earlier in the week, but for whatever reason, didn’t.
This process of releasing software at set times started to slow things down, which is not good for a service that keeps expanding. Eventually, Facebook’s coding started to resemble the development practices of older, larger companies, and not those of the hot startup Facebook once was.
Starting in April 2016, Facebook gradually started tweaking its software even more frequently, thus undermining any reason to have scheduled releases. Instead, Facebook developed a system it calls Gatekeeper that involves rolling out hundreds of changes every couple of hours.
Using customized tools, Facebook’s coders can automatically check their new software for bugs before implementing their changes. An automatic delay in pushing the changes to the entire service gives employees time to notice any hiccups, like a disappearing tab, so that they can hit an emergency kill-switch to stop the code from reaching users.
Once the code is ready, it goes out to only 2% of Facebook users. If nothing breaks, it then rolls out to everyone.
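For readers who want to picture the mechanics, here is a minimal sketch in Python of a staged rollout with a kill switch. It is not Facebook’s code. The class name, the method names, and the hashing scheme are all assumptions made for illustration, and only the 2% first stage and the emergency kill-switch come from Rossi’s description.

```python
import hashlib

CANARY_PERCENT = 2  # first-stage share of users, per the article


def bucket(user_id: str, change_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given change.

    Hashing is an assumed way to pick the same users on every request.
    """
    digest = hashlib.sha256(f"{change_id}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


class Rollout:
    """Hypothetical tracker for one change moving through the stages."""

    def __init__(self, change_id: str) -> None:
        self.change_id = change_id
        self.percent = 0      # share of users who currently get the change
        self.killed = False   # set once the emergency switch is hit

    def start_canary(self) -> None:
        """Expose the change to the small first group of users."""
        if not self.killed:
            self.percent = CANARY_PERCENT

    def promote(self) -> None:
        """Nothing broke, so roll the change out to everyone."""
        if not self.killed:
            self.percent = 100

    def kill(self) -> None:
        """Emergency kill switch: stop the change from reaching any user."""
        self.killed = True
        self.percent = 0

    def is_enabled(self, user_id: str) -> bool:
        """Decide whether this user should see the change right now."""
        if self.killed:
            return False
        return bucket(user_id, self.change_id) < self.percent


if __name__ == "__main__":
    rollout = Rollout("new-tab")
    rollout.start_canary()
    exposed = sum(rollout.is_enabled(f"user{i}") for i in range(10_000))
    print(f"canary stage reaches {exposed} of 10,000 sample users")
    rollout.promote()
    print("after promotion:", rollout.is_enabled("user1"))
```

In this sketch, hitting the kill switch is final for that change: a later call to `promote()` does nothing, so a flagged change cannot slip out to the full user base by accident.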
Rossi acknowledges that Facebook’s new check-and-balance system isn’t revolutionary considering most fast-rising startups have similar systems that ensure software is built fast without catastrophic bugs. Google and Amazon likely have similar systems.
Still, the fact that Facebook had to essentially overhaul how it builds software is noteworthy, and Rossi said it “was a little lonely and a little scary,” because there wasn’t much precedent for a company as big as Facebook to make such a big change in only a year. Over the course of three days in April, Facebook’s entire software process shifted to the new system, “and no one noticed,” he said.