Google Cloud services suffered a significant outage on the afternoon of June 2nd, which lasted for several hours and affected many of Google's own services as well as third-party services.
On June 3rd, the Mountain View, California-based company revealed more details about the outage and described how it plans to prevent similar incidents.
In a statement, Benjamin Treynor Sloss, vice-president of engineering at Google, apologized on the company's behalf for the outage, which caused “low performance and elevated error rates on several Google services.”
According to Google, the outage occurred when a “configuration change” intended for a “small number of servers in a single region” was mistakenly applied to a “larger number of servers across several neighbouring regions.”
This congested Google Cloud's network in the affected regions.
“The network became congested, and our networking systems correctly triaged the traffic overload and dropped larger, less latency-sensitive traffic in order to preserve smaller latency-sensitive traffic flows,” said Sloss.
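The triage Sloss describes can be illustrated with a minimal sketch. This is not Google's actual system; the `Flow` type, the `triage` function, the flow names, and the bandwidth figures are all hypothetical and chosen only to show the idea of preserving small, latency-sensitive traffic when capacity runs short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A hypothetical traffic flow; bandwidth is in arbitrary units."""
    name: str
    bandwidth: float
    latency_sensitive: bool


def triage(flows, capacity):
    """Admit flows until capacity runs out; return (admitted, dropped).

    Latency-sensitive flows are considered first, and within each group
    smaller flows are considered before larger ones.
    """
    ordered = sorted(flows, key=lambda f: (not f.latency_sensitive, f.bandwidth))
    admitted, dropped = [], []
    remaining = capacity
    for flow in ordered:
        if flow.bandwidth <= remaining:
            admitted.append(flow)
            remaining -= flow.bandwidth
        else:
            dropped.append(flow)
    return admitted, dropped


# Illustrative demand that exceeds the (assumed) remaining capacity of 8 units.
flows = [
    Flow("search-query", 1.0, True),
    Flow("video-stream", 6.0, False),
    Flow("storage-sync", 5.0, False),
    Flow("mail-fetch", 2.0, True),
]
admitted, dropped = triage(flows, capacity=8.0)
```

In this toy example the two small latency-sensitive flows are admitted first, and the largest bulk flow is the one that gets dropped. Real network equipment makes this decision per packet using traffic classes and queues rather than a sorted list.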
According to Google's figures, global YouTube views dropped by 10 percent during the outage, and Google Cloud Storage measured a 30 percent reduction in traffic. About one percent of active Gmail users, which still amounts to millions of users worldwide, had problems with their accounts. Google Search, on the other hand, recorded only a short-lived increase in latency.
Third-party services that use Google Cloud, such as iCloud and Snapchat, were also affected during the outage.
Sloss said that the issue was detected “within seconds” and that the same team of engineers whose change had congested the network was put in charge of restoring the correct configuration. However, the degraded services slowed their progress.
Google said that it is working to prevent similar events from happening again.