Google stated, “One of the datacenters that hosts zone europe-west2-a could not maintain a safe operating temperature due to a simultaneous failure of multiple, redundant cooling systems combined with the extraordinarily high outside temperatures.”
The failure of europe-west2-a affected Google services such as Persistent Disk, Google Compute Engine, and Google Cloud Storage, causing networking issues, VM terminations, and service degradation. The outage itself lasted 18 hours and 23 minutes, and the incident ran for a total of 35 hours and 15 minutes before everything returned to normal.
According to Google, the zone failed because the data center could not maintain a safe operating temperature after multiple cooling systems failed, and the high outside temperatures compounded the problem.
The news is unsettling, especially since Google says its regional services are designed to survive the failure of a single zone. Nevertheless, the incident left customers unable to access data services regionally, across zones.
On 19 July, Google detected an issue affecting two of the cooling systems in a data center hosting europe-west2-a and began investigating.
The report stated that “Google engineers are actively conducting a detailed analysis of the cooling system failure that triggered this incident.”
Google has pledged to keep this from happening again. It will investigate and develop methods to decrease the thermal load within a single data center, reducing the probability that a complete shutdown is required.
Google will also examine its procedures, tooling, and automated recovery systems for gaps, and it will audit cooling system equipment and standards across the data centers that host Google Cloud globally.