Microsoft warns it lost some customers' security logs for a month
Microsoft is warning enterprise customers that, for almost a month, a bug caused critical logs to be partially lost, putting at risk companies that rely on this data to detect unauthorized activity.
Business Insider first reported the issue earlier this month, saying that Microsoft had begun notifying customers that their logging data had not been consistently collected between September 2nd and September 19th.
The lost logs include security data commonly used to monitor for suspicious traffic, behavior, and login attempts on a network, increasing the chances for attacks to go undetected.
A Preliminary Post Incident Review (PIR) sent to customers and shared by Microsoft MVP Joao Ferreira sheds further light on the issue, saying that logging issues were worse for some services, continuing until October 3rd.
Microsoft's review says that the following services were impacted, each with varying degrees of log disruption:
- Microsoft Entra: Potentially incomplete sign-in logs, and activity logs. Entra logs flowing via Azure Monitor into Microsoft Security products, including Microsoft Sentinel, Microsoft Purview, and Microsoft Defender for Cloud, were also impacted.
- Azure Logic Apps: Experienced intermittent gaps in telemetry data in Log Analytics, Resource Logs, and Diagnostic settings from Logic Apps.
- Azure Healthcare APIs: Partially incomplete diagnostic logs.
- Microsoft Sentinel: Potential gaps in security related logs or events, affecting customers' ability to analyze data, detect threats, or generate security alerts.
- Azure Monitor: Observed gaps or reduced results when running queries based on log data from impacted services. In scenarios where customers configured alerts based on this log data, alerting might have been impacted.
- Azure Trusted Signing: Experienced partially incomplete SignTransaction and SignHistory logs, leading to reduced signing log volume and under-billing.
- Azure Virtual Desktop: Partially incomplete logs in Application Insights. The main connectivity and functionality of AVD was unimpacted.
- Power Platform: Experienced minor discrepancies affecting data across various reports, including Analytics reports in the Admin and Maker portal, Licensing reports, Data Exports to Data Lake, Application Insights, and Activity Logging.
Microsoft says the logging failure was caused by a bug introduced when fixing a different issue in the company's log collection service.
"The initial change was to address a limit in the logging service, but when deployed, it inadvertently triggered a deadlock-condition when the agent was being directed to change the telemetry upload endpoint in a rapidly changing fashion while a dispatch was underway to the initial endpoint. This resulted in a gradual deadlock of threads in the dispatching component, preventing the agent from uploading telemetry. The deadlock impacted only the dispatching mechanism within the agent with other functionalities working normally, including collecting and committing data to the agent's local durable cache. A restart of the agent or the OS resolves the deadlock, and the agent uploads data it has within its local cache upon starting. There were situations where the amount of log data collected by the agent was larger than the local agent's cache limit before a restart occurred, and in these cases the agent overwrote the oldest data in the cache (circular buffer retaining the most recent data, up to the size limit). The log data beyond the cache size limit is not recoverable."
by Microsoft
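The cache behavior Microsoft describes, a circular buffer that keeps only the most recent data up to a size limit, can be illustrated with a minimal Python sketch. This is not Microsoft's agent code: the `AgentCache` class, its `limit` parameter, and the event names are hypothetical, and the size limit is counted in records rather than bytes for simplicity. It shows why logs collected while the dispatcher was deadlocked could not be recovered once the cache filled up.

```python
from collections import deque


class AgentCache:
    """Hypothetical bounded cache that keeps only the most recent records."""

    def __init__(self, limit):
        # A deque with maxlen silently discards the oldest item when full.
        self.buffer = deque(maxlen=limit)
        # Count of records overwritten before they could be uploaded.
        self.dropped = 0

    def collect(self, record):
        # Collection keeps working even while uploads are stalled.
        if len(self.buffer) == self.buffer.maxlen:
            self.dropped += 1
        self.buffer.append(record)

    def flush(self):
        """Simulate the upload after a restart: return and clear the cache."""
        uploaded = list(self.buffer)
        self.buffer.clear()
        return uploaded


cache = AgentCache(limit=5)

# The dispatcher is stuck, so eight events pile up in a five-slot cache.
for i in range(8):
    cache.collect(f"event-{i}")

# A restart clears the deadlock and uploads whatever is still cached.
uploaded = cache.flush()
print(uploaded)       # ['event-3', 'event-4', 'event-5', 'event-6', 'event-7']
print(cache.dropped)  # 3
```

In this sketch, the three oldest events are gone by the time the restart happens, which mirrors the review's statement that log data beyond the cache size limit is not recoverable.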
Microsoft says that even though it rolled out the fix following safe deployment practices, it failed to identify the new problem, which took a few days to detect.
In a statement to TechCrunch, Microsoft corporate vice president John Sheehan said that the bug has now been resolved and that all customers have been notified.
However, cybersecurity expert Kevin Beaumont says that he knows of at least two companies with missing log data who did not receive notifications.
This incident came a year after Microsoft faced criticism from CISA and lawmakers for not providing adequate log data to detect breaches for free, instead requiring customers to pay for it.
In July 2023, Chinese hackers stole a Microsoft signing key that allowed them to breach corporate and government Microsoft Exchange and Microsoft 365 accounts and steal email.
While Microsoft has still not determined how the key was stolen, the US government first detected the attacks by using Microsoft's advanced logging data.
However, these advanced logging capabilities were only available to Microsoft customers who paid for Microsoft's Purview Audit (Premium) logging feature.
Due to this, Microsoft was widely criticized for not providing this additional logging data for free so that organizations could quickly detect advanced attacks.
Working with CISA, the Office of Management and Budget (OMB), and the Office of the National Cyber Director (ONCD), Microsoft expanded its free logging capabilities for all Purview Audit standard customers in February 2024.