A lack of detail on the root cause of Friday’s Microsoft Office365 outage has even the strongest advocates of cloud computing concerned the vendor isn’t up to the task of securing online services.
The outage, which Microsoft claims affected customers for only around four hours, took out Office365, Hotmail and SkyDrive services globally.
Microsoft has had four days to provide a post-incident report, but has only provided the briefest of statements to explain what went wrong.
“On Thursday, September 8th at approximately 8 p.m. PDT, Microsoft became aware of a Domain Name Service (DNS) problem causing service degradation for multiple cloud-based services. A tool that helps balance network traffic was being updated, and for a currently unknown reason, the update did not work correctly. As a result, the configuration was corrupted, which caused service disruption. Service restoration began at approximately 10:30 p.m. PDT, with full service restoration completed at approximately 11:30 p.m. PDT. We are continuing to review the incident.”
Microsoft's statement is nowhere near as detailed as the post-incident report Amazon Web Services published after it suffered an outage in April.
Missing is information on why global services were affected – despite Microsoft’s promise of regional availability zones – and what steps it would take to ensure the incident is never repeated.
IT engineers discussing the outage with iTnews said it is perfectly feasible that Microsoft technicians did indeed break the load distribution system at a central location, from where the service is distributed globally.
But this doesn't explain why Microsoft's first response was to attribute the outage to a power failure in a post that was pulled within an hour.
In the vacuum of information around the outage, one hacking group has been in contact with SC Magazine Australia claiming responsibility for deleting Microsoft’s DNS records. The group is yet to provide the publication any concrete evidence (such as logs) of its involvement.
Microsoft MVP Wayne Small, owner of small business server resource SBSFAQ.com, said it was nonetheless of great concern that Microsoft’s own DNS (Domain Name System) records – an essential element of its online services – could have been corrupted or deleted.
“DNS is the root of the internet – we rely on it to resolve domain names to IP addresses," Small said. "It is an intrinsic part of the design of DNS that it should still work if a single server goes down.
“It could be that, as Microsoft says, an update corrupted these DNS records. But it could just as well be some mischievous attacker deleting them.
"If somebody out there is able to kill DNS records, we better watch out. I would prefer to think Microsoft screwed up when updating their tool.”
Justin Warren, managing director at PivotNine, said it was hard to read much into the outage without an intimate knowledge of Microsoft’s architecture.
“Perhaps Microsoft’s infrastructure is not as distributed as it should be,” he said.
But he questioned why a hacking group would attack DNS when a DDoS attack on the service itself would be so much easier and equally effective.
Either way, the speculation could be remedied with a more detailed post-incident report.
“Why hasn’t Microsoft come clean?” Small asked.
“Microsoft’s explanation is nowhere near as detailed as what Google provided [for an hour-long Google Docs outage last week]. I’m a little concerned about that.
"Microsoft hasn’t given customers a clear understanding of just what plans are in place to make sure this doesn’t happen again.”