Here’s Why Google’s Cloud Went a Little Dark This Month

Google’s cloud took a hit earlier this month, but presumably all is well now.

The search giant said on Tuesday that problems with its Central U.S. data center region on August 11 caused errors and delays for some of its cloud-computing customers. The region is located in Council Bluffs, Iowa, where Google has spent roughly $2.5 billion building out its facilities.

The data center hiccup primarily impacted Google (GOOG) customers of its cloud-based app development tool Google App Engine. Google said 18% of customer apps hosted in its Central data center region saw error rates of 10% to 50% for roughly two hours. Another 14% of customer apps experienced error rates of 1% to 10%.

Additionally, Google noted 37% of these apps that suffered more errors than usual also witnessed a slight increase in latency, meaning that the apps were slower than they should be.

“We apologize for this incident,” Google said in an engineering blog post. “We know that you choose to run your applications on Google App Engine to obtain flexible, reliable, high-performance service, and in this incident we have not delivered the level of reliability for which we strive.”

Ultimately, it was a routine data center maintenance update gone awry that led to the cloud hiccup.

Google explained that it occasionally moves customers’ cloud-based apps between its data centers to distribute network traffic and computing resources more evenly. Before migrating the apps to their new home, Google preps the new data center with all the computing resources it needs. It then moves the apps over in chunks, which lets it “gracefully drain traffic” from the “downsized data center” and improve the data center’s efficiency.
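The chunked migration described above can be sketched in a few lines. This is a toy illustration only, not Google’s actual tooling; the function names, batch size, and app list are all invented for the example:

```python
import time

def drain_in_chunks(apps, migrate, chunk_size=10, pause_s=1.0):
    """Move apps to a new data center in small batches so traffic
    drains gradually from the old one instead of all at once."""
    for i in range(0, len(apps), chunk_size):
        batch = apps[i:i + chunk_size]
        for app in batch:
            migrate(app)      # reschedule this app in the new region
        time.sleep(pause_s)   # let routers converge before the next batch

# Toy usage: "migrating" is just appending to a list.
moved = []
drain_in_chunks([f"app-{n}" for n in range(25)], moved.append,
                chunk_size=10, pause_s=0.0)
print(len(moved))  # → 25
```

The pause between batches is the key idea: it gives the routing layer time to absorb each shift before the next one lands.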

During this routine app migration, however, a software update affecting Google’s traffic routers was rolled out at the same time. Google admitted the update caused its routers to restart, which “temporarily diminished the available router capacity.”

With the routers no longer able to distribute the required resources effectively, some customers’ applications ended up feeling the burden. Google continued that its data center engineers addressed the problem as it occurred, manually redirecting the network traffic “to other datacenters which further mitigated the issue.”

The engineers then discovered that a configuration error, or software bug, was responsible for the network traffic problem, and they eventually remedied the situation.

From the blog post:

In order to prevent a recurrence of this type of incident, we have added more traffic routing capacity in order to create more capacity buffer when draining servers in this region. We will also change how applications are rescheduled so that the traffic routers are not called and also modify the system’s retry behavior so that it cannot trigger this type of failure.
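The retry change Google mentions hints at a classic failure mode: when capacity dips, aggressive client retries can multiply the load and deepen the outage. A minimal sketch of the usual defense, capped retries with exponential backoff and jitter, follows; the names and parameters are illustrative and not Google’s code:

```python
import random
import time

def call_with_backoff(request, max_attempts=4, base_delay=0.05):
    """Retry a failing call a bounded number of times, backing off
    exponentially so a brief outage doesn't become a retry storm."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Exponential backoff with jitter spreads retries out in time.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)

# Toy usage: a call that fails twice, then succeeds on the third try.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("router restarting")
    return "ok"

print(call_with_backoff(flaky, base_delay=0.0))  # → ok
```

Without the attempt cap and backoff, every client hammering a restarting router at full speed is exactly the kind of behavior that “cannot trigger this type of failure” is meant to rule out.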

This isn’t the first time this year that Google’s cloud experienced a hiccup due to a software bug.

In April, Google engineers were performing regular networking maintenance on some of Google’s data centers when someone made an error that triggered a “previously-unseen software bug,” which spread the error to more computer systems. This chain of events led to the company’s Google Compute Engine and VPN service going offline for 18 minutes.

At the time, Google said it would offer refunds to some of its customers for the error. Google did not say on Tuesday if it would offer refunds due to the August hiccup.