FORTUNE — About 54 million people play Zynga (ZNGA) games like FarmVille, Hidden Chronicles and Words With Friends each day. That level of activity adds up to a whole lot of virtual cows (and other imaginary goods) being sold, but it also poses a huge challenge for the social gaming company’s back-end infrastructure: the hardware and software that power Zynga. The newly public company’s recently launched online platform, Zynga.com, is adding yet another layer of complexity, as it will be open to games from third-party developers. We caught up with Allan Leinwand, CTO of infrastructure at Zynga, to find out what went into developing the company’s “Z Cloud” and why he thinks the hybrid cloud is the answer to Zynga’s demanding (and often unpredictable) traffic load.
FORTUNE: In the past you’ve relied on Amazon Web Services. How has Zynga’s infrastructure strategy evolved?
Leinwand: When I first joined Zynga we were actually doing infrastructure in a traditional model—we had bought servers and network equipment and put them in cages at facilities and begun to scale that out. We had also begun to use the public cloud, Amazon (AMZN) Web Services, for games like FarmVille, which had scaled from zero to 25 million daily active users in six weeks. So we had this model where a certain number of our games were sitting inside our own private facility and other games that we were launching were in the public cloud. We really enjoyed using both but as we watched the growth we realized that we were going to consume a lot more of the public cloud going forward and we thought there was a better way to build infrastructure. So we started building out our own private cloud in the middle of 2010 and six months later we were running live on a game.
Then we turned it into a hybrid cloud. What that means is that we tightly coupled these [the private and public clouds]. We built connections between them and made sure that we could move workloads between them seamlessly. We made it so that when our game studios launch new code they don’t have to think about whether this is going into our private cloud or Amazon. If you’re trying to work on a new feature in FarmVille, you shouldn’t have to worry about where your servers are going to sit. The infrastructure can figure it out. The net result was that at the beginning of 2011 about 20% of our workload was sitting inside our private cloud in our private facilities and 80% in Amazon’s. By the end of last year we flipped that number around.
Why keep Amazon’s service at all now that you’ve built your own private cloud [Z Cloud]?
The advantage of the public cloud is that there are enough resources there to consume at a scale that’s bigger than ours. We want to have the best of both. I like to think of it as the shock absorbers. We can take the private cloud and build it and scale it as we need to, but in case we have those moments where large spikes occur, or we’re adding a feature that we didn’t know would be so viral, we have more capacity. We really think of Z Cloud as a hybrid cloud. We move the workloads around as we see fit.
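To make the "shock absorber" idea concrete, here is a minimal sketch of burst placement: fill the capacity you own first and send only the overflow to the public cloud. It is an illustration under stated assumptions, not Zynga's actual logic. The names `Placement` and `place_servers` and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    private: int  # servers placed in the privately owned cloud
    public: int   # servers that overflow to the public cloud


def place_servers(demand: int, private_capacity: int) -> Placement:
    """Fill private capacity first; send any excess to the public cloud."""
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and private_capacity must be non-negative")
    in_private = min(demand, private_capacity)
    return Placement(private=in_private, public=demand - in_private)


# Hypothetical numbers: a normal day fits in-house, a viral spike does not.
normal_day = place_servers(demand=800, private_capacity=1000)
viral_spike = place_servers(demand=1400, private_capacity=1000)
print(normal_day)
print(viral_spike)
```

In this sketch the public cloud is used only when demand exceeds what the private cloud can hold, which is one simple reading of the burst behavior described in the interview.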
Does the hybrid cloud model work particularly well for Zynga’s unique needs or is this something you think works well for other companies?
We need to build scale incredibly fast, but I think the hybrid cloud model applies to almost anybody. It means you’re going to have some piece of resource that you own internally and manage and monitor and operate. And you’re going to use the public cloud as a resource to handle bursts and spikes and workloads that might not be as core to your business. That might work for companies that have two servers, and for companies that have 100 servers. That model of own and optimize and utilize what you do best, and leverage a commodity in the public [cloud] makes a lot of sense. We think of the public cloud as a four-door sedan. It’s good for a lot of things, like hauling groceries and family trips. What we did on Z Cloud is we really tuned it for social games. It’s a race car for social games. And as we moved workload out of the public cloud to the Z Cloud we actually saw a 66% reduction in servers that we were employing for the same workload. So three servers in Amazon actually equaled one on Z Cloud. It wasn’t because our servers were more powerful, it was because social games work in a particular way.
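The two figures in that answer describe the same ratio: going from three servers to one removes two-thirds of the machines, which is roughly the 66% quoted. A short check of the arithmetic, where the helper name `server_reduction` is purely illustrative:

```python
def server_reduction(before: int, after: int) -> float:
    """Fraction of servers eliminated when `before` machines shrink to `after`."""
    if before <= 0 or not 0 <= after <= before:
        raise ValueError("expected 0 <= after <= before and before > 0")
    return 1 - after / before


# Three public-cloud servers replaced by one tuned private-cloud server.
reduction = server_reduction(before=3, after=1)
print(f"{reduction:.1%}")
```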
How have you gone about building your hybrid cloud? What have you developed in house and what have you turned to other vendors for?
We use a monitoring and management service by RightScale that allows us to monitor, manage and launch both what’s in Z Cloud and what’s in Amazon Web Services from a single console. There’s a lot of IP that we’ve built that figures out where the workload is best utilized. If two services ask for a large increase in compute power at the same time, something needs to understand which is more important than the other. So we’ve done a lot of automation. We can bring a thousand physical servers from being cold deadweight to running games in less than 24 hours. We have put processes and procedures in place to roll racks in and boot up machines and have them self-configure and automate themselves.
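The point about two services asking for capacity at once can be sketched as a simple priority-ordered allocator. This is only an assumed illustration of the idea. The interview does not describe how Zynga's system actually ranks requests, and the names `Request` and `allocate`, the priority convention, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    service: str
    servers: int
    priority: int  # higher means more important (assumed convention)


def allocate(requests: list[Request], free_servers: int) -> dict[str, int]:
    """Grant servers in descending priority order until the pool is empty."""
    grants: dict[str, int] = {}
    # sorted() is stable, so equal priorities keep their arrival order.
    for req in sorted(requests, key=lambda r: r.priority, reverse=True):
        granted = min(req.servers, free_servers)
        grants[req.service] = granted
        free_servers -= granted
    return grants


# Two services ask for a large increase at the same time; only 500 are free.
grants = allocate(
    [
        Request("analytics", servers=300, priority=1),
        Request("game_launch", servers=400, priority=5),
    ],
    free_servers=500,
)
print(grants)
```

Here the higher-priority request is served in full and the lower-priority one receives whatever is left. A real system would likely also weigh cost, location and the option of bursting to the public cloud.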
And now you’re about to share some of what you’ve done with others. How are you doing that?
We’ve built infrastructure that has this veneer of manageability and orchestration on top of it. On top of that there are also APIs and services that our game studios use, like feeds and publishing and analytics and stats. So essentially what we’ve been focused on doing is taking these APIs and transforming them into the Zynga platform. This is really just an iterative step that we’re taking. We have Z Cloud as infrastructure and we have games that consume that infrastructure. We have a series of APIs internally and now we’re taking them and exposing them publicly. We as infrastructure and services providers want to make the games successful. We want to make our studios focus on great games and make the infrastructure as seamless as possible. Now we’re opening up that API layer and beginning to make it seamless for other studios as well.