The conversation I keep having with security leaders in Azure-heavy organizations is not "we do not have enough telemetry." It is "we have too much telemetry and we cannot answer the questions we care about." Azure Monitor, Log Analytics, diagnostic settings, activity logs, Defender for Cloud — the data is there. What is missing is the set of Kusto queries that turn the data into supply chain answers, and an org-level policy for where the data lives so that the queries can actually run across subscriptions.
This post is an industry-oriented look at Azure Monitor as a supply chain observability platform: the data sources that matter, the workspace design that makes them queryable, and the specific questions they answer. I am writing from the perspective of someone who has built these dashboards for a half-dozen organizations over the last two years, including two that were doing Azure-scale incident response at the time.
The Shape of the Telemetry
Azure's native telemetry comes from four sources that all end up in Log Analytics workspaces:
- Activity Log: the subscription-level control plane audit. Every ARM operation — who created what, who changed what, who deleted what — lands here. Retained by default for 90 days in the platform; longer if routed to a workspace.
- Azure AD sign-in and audit logs: identity events, including managed identity sign-ins (since 2022), service principal sign-ins, and directory changes.
- Diagnostic settings: resource-level data plane events, emitted from the resource providers that support them. Key Vault secret reads, ACR pushes, storage blob accesses, Function invocations. Each resource type has a different set of categories.
- Microsoft Defender for Cloud: alerts, recommendations, and regulatory compliance state. Built on top of the above plus the Defender-specific detections.
For supply chain observability, the control plane (Activity Log) and the data plane (diagnostic settings) are the two that matter most. Azure AD logs are the identity-side story that connects them.
Workspace Design: The Decision That Scales or Does Not
The single biggest operational decision in Azure Monitor is "how many Log Analytics workspaces, and what data goes where." Get this wrong and every cross-resource query becomes a cross-workspace query, which works but is slower, capped in how many workspaces a single query can reference, and painful to maintain. Get it right and the query interface is a single workspace.
The pattern that works for organizations with more than a handful of subscriptions:
- One central workspace per environment (prod, non-prod), receiving Activity Log from every subscription in that environment and diagnostic settings from all resources.
- A separate workspace for security-sensitive data (Key Vault, Sentinel, identity) with tighter RBAC and longer retention.
- Resource-local workspaces only for very high-volume data (Application Insights for heavy-traffic services) with continuous export to the central workspace for cross-resource queries.
The reason for consolidation is that a supply chain incident response query is almost always cross-resource: "show me every resource modified by this service principal in the last 72 hours across every subscription." That query is one line in a consolidated workspace and ten lines of union across per-subscription workspaces.
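A sketch of what that looks like in practice — hypothetical workspace names and a placeholder service principal ID, the consolidated form first and then the hand-maintained union it replaces:

// Consolidated workspace: the whole estate is one table
AzureActivity
| where TimeGenerated > ago(72h)
| where Caller == "<service-principal-id>"
| where OperationNameValue endswith "/write"
| project TimeGenerated, SubscriptionId, OperationNameValue, _ResourceId

// Per-subscription workspaces: the same question via workspace() references,
// one entry per workspace, kept up to date by hand
union
workspace("law-sub-alpha").AzureActivity,
workspace("law-sub-beta").AzureActivity
| where TimeGenerated > ago(72h)
| where Caller == "<service-principal-id>"
| where OperationNameValue endswith "/write"
| project TimeGenerated, SubscriptionId, OperationNameValue, _ResourceId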
Cost management matters here. Log Analytics charges per GB ingested and per GB retained beyond 31 days. The economics favor basic logs (for high-volume data you query rarely) and archive logs (for data you keep for compliance but rarely query). Pick the tier deliberately; the default is analytics logs, which is the most expensive.
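Knowing where the gigabytes come from makes the tier decision concrete. A minimal sketch against the built-in Usage table, which records billable ingestion per destination table (Quantity is in MB):

// Billable ingestion by table over the last 30 days, in GB
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1024 by DataType
| order by IngestedGB desc

Tables at the top of that list that you rarely query interactively are the candidates for the basic tier.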
The Activity Log Queries That Matter
Activity Log in KQL lives in the AzureActivity table. The queries that I have built in some form in every observability project:
Who created or modified resources in the last 24 hours, grouped by identity?
AzureActivity
| where TimeGenerated > ago(24h)
| where OperationNameValue endswith "/write" or OperationNameValue endswith "/action"
| where ActivityStatusValue == "Success"
| summarize count() by Caller, ResourceProviderValue
| order by count_ desc
This is the "who is deploying" query. If a service principal you do not recognize shows up in the top of this list, that is the start of an investigation.
What role assignments were created in the last 7 days?
AzureActivity
| where TimeGenerated > ago(7d)
| where OperationNameValue =~ "Microsoft.Authorization/roleAssignments/write"
| project TimeGenerated, Caller, _ResourceId, Properties
Role assignment creation is the privilege escalation operation in Azure. Every creation should have a known reason. Uncorrelated creations are the signal.
What resources were created outside of the expected deployment identities?
let deployIdentities = dynamic(["<service-principal-id-1>", "<service-principal-id-2>"]);
AzureActivity
| where TimeGenerated > ago(7d)
| where OperationNameValue endswith "/write"
| where ActivityStatusValue == "Success"
| where CallerIpAddress != ""
| where not(Caller in (deployIdentities))
| summarize count() by Caller, ResourceProviderValue
This one requires you to know which identities should be deploying. The "should" list is usually short — pipeline service principals, maybe a handful of named human operators — and everything else is either a finding or a gap in your inventory.
The Diagnostic Settings That Matter Most
For supply chain specifically, four resource types need their diagnostic settings routed to the central workspace:
- Key Vault (categories: AuditEvent, AzurePolicyEvaluationDetails). Every secret read, every key operation, every policy evaluation.
- Azure Container Registry (categories: ContainerRegistryRepositoryEvents, ContainerRegistryLoginEvents). Every image push, pull, and authentication.
- Azure Kubernetes Service (categories: kube-audit, kube-audit-admin). Kubernetes API server audit events, including deployments and policy violations.
- Storage accounts with function packages (blob diagnostic settings). Access to function deployment packages.
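Once the Key Vault AuditEvent category is flowing, "who read which secrets" is a short query. A minimal sketch against the legacy AzureDiagnostics table (the identity claim columns there are inconsistent, so this leans on the caller IP; adapt the projection if your vaults log to resource-specific tables):

// Key Vault secret reads in the last 24 hours, by vault and calling IP
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where ResourceProvider == "MICROSOFT.KEYVAULT"
| where Category == "AuditEvent"
| where OperationName == "SecretGet"
| where ResultType == "Success"
| summarize reads = count() by Resource, CallerIPAddress
| order by reads desc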
For ACR specifically, the Kusto query that answers "which images have been pushed to production registries in the last 24 hours" lives in ContainerRegistryRepositoryEvents filtered to OperationName == "Push". Cross-reference that against the expected CI/CD identities, and any push outside the set is either a break-glass deploy or a problem.
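A minimal version of that query, with the expected CI/CD identity list as a placeholder:

// Image pushes in the last 24 hours from identities outside the expected CI/CD set
let cicdIdentities = dynamic(["<pipeline-sp-id-1>", "<pipeline-sp-id-2>"]);
ContainerRegistryRepositoryEvents
| where TimeGenerated > ago(24h)
| where OperationName == "Push"
| where not(Identity in (cicdIdentities))
| project TimeGenerated, Identity, Repository, Tag, Digest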
Enabling diagnostic settings across hundreds of resources is tedious by hand, which is why the Azure Policy DeployIfNotExists (DINE) policies for this exist. Deploy them at the management group level (covered in the Policy post) and let them ensure coverage.
Identity and the Sign-In Logs
Azure AD sign-in logs live in Log Analytics once you turn on the diagnostic setting on the Azure AD tenant — the SignInLogs, AuditLogs, ServicePrincipalSignInLogs, and ManagedIdentitySignInLogs categories, which land in the SigninLogs, AuditLogs, AADServicePrincipalSignInLogs, and AADManagedIdentitySignInLogs tables. The supply chain value is correlating a non-interactive sign-in by a service principal or managed identity with the ARM operations that followed.
Show me every service principal sign-in from an unusual IP in the last 24 hours:
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| join kind=leftouter (
AADServicePrincipalSignInLogs
| where TimeGenerated between (ago(30d) .. ago(24h))
| summarize knownIps = make_set(IPAddress) by ServicePrincipalId
) on ServicePrincipalId
| where not(set_has_element(knownIps, IPAddress))
This is a first-order anomaly. Service principals typically sign in from a small set of IPs (the CI runners, the orchestrator). A new IP is not necessarily bad, but it is worth reviewing, and the threshold is low enough that the noise is manageable.
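The managed identity side of the same question is simpler: which managed identities are actually active, and against which resources. A minimal sketch:

// Active managed identities and the resources they authenticated to, last 24 hours
AADManagedIdentitySignInLogs
| where TimeGenerated > ago(24h)
| summarize signIns = count() by ServicePrincipalName, ResourceDisplayName
| order by signIns desc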
Connecting the Graph
The highest-leverage observability work is joining these streams. A supply chain incident often looks like this: a service principal signs in from a new IP, then creates a role assignment granting itself access to a storage account, then reads data from the storage account, then exfiltrates. Each step on its own is plausibly benign. The sequence is the signal.
Building that sequence in Kusto is a union across the relevant tables, projected down to (time, actor, operation, resource), sorted by actor and time. The resulting timeline per actor is the investigative primitive that turns 15 disconnected alerts into one coherent incident narrative. This is the work that dashboards alone do not do; it is the work Sentinel analytics rules are meant to automate.
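A minimal sketch of that union, assuming both tables land in the central workspace. One caveat to verify in your own tenant: whether AzureActivity's Caller carries the service principal's object ID or its app ID depends on how the token was issued, so normalize the actor column before trusting the grouping.

// One timeline per actor: identity events plus control plane events,
// projected to a common (time, actor, operation, resource) shape
let lookback = 72h;
union
(AADServicePrincipalSignInLogs
| where TimeGenerated > ago(lookback)
| project TimeGenerated, Actor = ServicePrincipalId, Operation = "SignIn", Resource = ResourceDisplayName),
(AzureActivity
| where TimeGenerated > ago(lookback)
| project TimeGenerated, Actor = Caller, Operation = OperationNameValue, Resource = _ResourceId)
| sort by Actor asc, TimeGenerated asc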
Retention and Cost Realism
Long retention is expensive but necessary. Activity Log should be retained for at least a year; Key Vault and ACR diagnostic events should be retained for the same. Azure Monitor's archive tier (generally available since 2022) gets you cheaper retention with the tradeoff of slower retrieval — queries against archived data run as asynchronous search jobs and take minutes, not seconds. For compliance retention with occasional investigation access, archive is the right choice.
Budget the central workspace as a security-critical cost. I have seen organizations cut the workspace to save budget and then find themselves unable to investigate an incident six months later because the data was gone. The economics of "keep the data" are much better than the economics of "rebuild the investigation from partial records."
How Safeguard Helps
Safeguard reads from the Log Analytics workspaces an organization already runs and produces the supply chain views that are otherwise custom KQL — who is deploying what, where unexpected role assignments are happening, which images are entering production registries, and which identities are active on which resources. The queries I walked through in this post are built in, run on a schedule, and feed into the findings view that correlates across identity, control plane, and data plane. The result is that supply chain observability stops being a Kusto project and becomes a standing surface, which is what organizations actually need.