Architecture

Service Mesh Telemetry as a Dependency Source of Truth

Apr 7, 2026 • 11 min read • Buildpathio Team

Service mesh telemetry, specifically the per-request metrics and trace data generated by Istio or Linkerd sidecar proxies, is the highest-fidelity source of dependency truth available in a running microservice system. It reflects actual runtime call patterns, not what code imports declare or what architecture documents claim. Here is why it should feed your dependency graph, and how to make that connection.

Why documentation fails as a dependency source

Architecture diagrams and service catalog entries require manual updates. When an engineer adds a new HTTP call from service A to service B, they update the code. Updating the architecture diagram is optional and rarely enforced. Within months, documentation drifts from reality.

Static code analysis is more reliable than documentation — it is derived from the actual codebase. But it has two blind spots: services that communicate through shared databases or message brokers (Kafka, SQS, RabbitMQ), and services that resolve endpoints dynamically through service discovery or environment variables at runtime rather than hardcoded imports.

A dependency graph built from static analysis alone will miss both of these patterns. In architectures with significant event-driven communication, this can mean 30-40% of the actual dependency edges are invisible to the graph.
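To measure that gap in your own system, compare the static edge set against a reference edge set you trust more, such as telemetry plus broker-derived edges. The sketch below assumes both are available as sets of (dependent, dependency) pairs. The function name and service names are hypothetical.

```python
# Sketch: measure the share of known dependency edges that static analysis misses.
# Edges are (dependent, dependency) pairs, e.g. (caller, callee).
# All service names here are hypothetical.

Edge = tuple[str, str]


def static_coverage_gap(static_edges: set[Edge], reference_edges: set[Edge]) -> float:
    """Return the fraction of reference edges that do not appear in the static graph."""
    if not reference_edges:
        return 0.0
    missing = reference_edges - static_edges
    return len(missing) / len(reference_edges)


static = {("checkout", "payments"), ("checkout", "inventory")}
reference = {
    ("checkout", "payments"),
    ("checkout", "inventory"),
    ("checkout", "fraud-score"),   # endpoint resolved from an env var at runtime
    ("fulfillment", "orders"),     # coupled only through a Kafka topic
}
print(f"{static_coverage_gap(static, reference):.0%} of known edges are invisible to static analysis")
```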

What service mesh telemetry adds

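Istio and Linkerd both instrument every inter-service call at the sidecar proxy layer. For every request that crosses a service boundary, the proxy records the source service, destination service, response status, and latency, and exports them as request metrics. When tracing is enabled, it also emits trace spans, typically for a sampled subset of requests, that carry the same details plus attributes such as the HTTP method. This happens regardless of how the endpoint was discovered or how the call was constructed in application code.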

Aggregating this telemetry over a time window (say, 24 hours of production traffic) produces a weighted directed call graph. Every edge in this graph represents an actual observed call between two services, and edge weight reflects call frequency (calls/hour). For direct service-to-service calls, this is the real dependency graph.
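A minimal sketch of that aggregation, assuming each observed request arrives as a hypothetical (source, destination) record. If the records are sampled trace spans rather than complete request counts, the counts must be scaled back up by the sampling rate. The sample_rate parameter handles that scaling.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CallRecord(NamedTuple):
    """One observed request between two services (hypothetical record shape)."""
    source: str
    destination: str


def build_call_graph(
    records: Iterable[CallRecord],
    window_hours: float,
    sample_rate: float = 1.0,
) -> dict[tuple[str, str], float]:
    """Aggregate observed calls into a weighted directed edge map.

    Weights are estimated calls/hour. sample_rate is the fraction of requests
    that produced a record (1.0 for full metrics, e.g. 0.01 for 1% trace sampling).
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    counts = Counter((r.source, r.destination) for r in records)
    # Undo sampling first, then normalize the window total to a per-hour rate.
    return {edge: n / sample_rate / window_hours for edge, n in counts.items()}


records = (
    [CallRecord("checkout", "payments")] * 4800
    + [CallRecord("checkout", "inventory")] * 240
)
graph = build_call_graph(records, window_hours=24)
print(graph)  # {('checkout', 'payments'): 200.0, ('checkout', 'inventory'): 10.0}
```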

When Buildpathio ingests this data via graph.include_runtime_traces: true, it merges the runtime call graph with the static analysis graph. Edges that exist in both sources get higher confidence weights. Edges visible only in telemetry (dynamic calls, event-driven paths) are added with a runtime-trace annotation so reviewers know their source.
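The merge step can be sketched as a set union with per-edge source tracking. The confidence numbers below are placeholders for illustration, not Buildpathio's actual weights. The "static" and "runtime-trace" tags are assumed labels modeled on the annotation described above.

```python
from dataclasses import dataclass
from typing import Optional

Edge = tuple[str, str]  # (caller, callee)

# Placeholder confidence weights, for illustration only.
CONFIDENCE = {
    frozenset({"static", "runtime-trace"}): 0.95,  # confirmed by both sources
    frozenset({"runtime-trace"}): 0.80,            # observed, but not in code imports
    frozenset({"static"}): 0.60,                   # declared, but not seen in the window
}


@dataclass
class MergedEdge:
    sources: frozenset  # which inputs reported this edge
    confidence: float
    calls_per_hour: Optional[float] = None  # only known for observed edges


def merge_graphs(static_edges: set[Edge], runtime_graph: dict[Edge, float]) -> dict[Edge, MergedEdge]:
    """Union the static and runtime edge sets, annotating each edge with its sources."""
    merged: dict[Edge, MergedEdge] = {}
    for edge in static_edges | set(runtime_graph):
        tags = set()
        if edge in static_edges:
            tags.add("static")
        if edge in runtime_graph:
            tags.add("runtime-trace")
        key = frozenset(tags)
        merged[edge] = MergedEdge(key, CONFIDENCE[key], runtime_graph.get(edge))
    return merged


merged = merge_graphs(
    static_edges={("checkout", "payments"), ("checkout", "legacy-auth")},
    runtime_graph={("checkout", "payments"): 200.0, ("checkout", "fraud-score"): 35.0},
)
for edge, info in sorted(merged.items()):
    print(edge, sorted(info.sources), info.confidence, info.calls_per_hour)
```

Keeping static-only edges, rather than dropping them, means a dependency that simply saw no traffic during the lookback window is still reviewable, just at lower confidence.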

"The dependency graph is not what your architecture doc says it is. It is what your service mesh says it is."

Integrating with Istio

Buildpathio reads Istio telemetry from Prometheus, using the standard metrics that Istio's proxies export by default. No additional Istio configuration is required, as long as Prometheus is already scraping the mesh. In your buildpath.yaml:

graph:
  include_runtime_traces: true
  mesh_provider: istio
  prometheus_endpoint: "http://prometheus.monitoring.svc.cluster.local:9090"

Buildpathio queries Istio's istio_requests_total metric, grouped by source_app and destination_app labels, with a 24-hour lookback window. The result is a weighted edge list that feeds into the graph construction.
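This article doesn't show the exact PromQL that Buildpathio runs. The query below is an illustrative equivalent, not the product's actual query. It sums the increase of istio_requests_total over the lookback window by source_app and destination_app, then converts each result into a calls/hour edge weight. The reporter="source" filter is our assumption: Istio can report the same request from both the client-side and server-side proxies, so filtering on one reporter avoids counting calls twice. The response handling follows the Prometheus HTTP API's /api/v1/query instant-query format.

```python
import json
import urllib.request
from urllib.parse import urlencode

Edge = tuple[str, str]


def istio_edge_query(lookback_hours: int = 24) -> str:
    """PromQL: total requests per (source_app, destination_app) over the window."""
    return (
        "sum by (source_app, destination_app) ("
        f'increase(istio_requests_total{{reporter="source"}}[{lookback_hours}h]))'
    )


def fetch_instant_query(prometheus_endpoint: str, query: str) -> dict:
    """Run an instant query against the Prometheus HTTP API and return the JSON body."""
    url = f"{prometheus_endpoint}/api/v1/query?{urlencode({'query': query})}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def parse_edges(response: dict, lookback_hours: int = 24) -> dict[Edge, float]:
    """Convert a vector result into {(source_app, destination_app): calls/hour}."""
    if response.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {response.get('error')}")
    edges: dict[Edge, float] = {}
    for sample in response["data"]["result"]:
        src = sample["metric"].get("source_app", "unknown")
        dst = sample["metric"].get("destination_app", "unknown")
        if "unknown" in (src, dst):
            continue  # Istio labels workloads it cannot identify as "unknown"
        total = float(sample["value"][1])  # Prometheus encodes sample values as strings
        edges[(src, dst)] = total / lookback_hours
    return edges


example = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"source_app": "checkout", "destination_app": "payments"},
             "value": [1775520000, "4800"]},
            {"metric": {"source_app": "unknown", "destination_app": "payments"},
             "value": [1775520000, "12"]},
        ],
    },
}
print(istio_edge_query())
print(parse_edges(example))  # {('checkout', 'payments'): 200.0}
```

Against a live cluster, fetch_instant_query("http://prometheus.monitoring.svc.cluster.local:9090", istio_edge_query()) would supply the response. The sample above is hand-written, so the parser can be exercised offline.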

Integrating with Linkerd

Linkerd exports similar traffic metrics to its own Prometheus instance, which is part of the linkerd-viz extension. The metric names differ (response_total, labeled by deployment and dst_deployment, instead of Istio's istio_requests_total), but the integration pattern is the same:

graph:
  include_runtime_traces: true
  mesh_provider: linkerd
  prometheus_endpoint: "http://prometheus.linkerd-viz.svc.cluster.local:9090"

In both cases, the Prometheus endpoint must be reachable from the Buildpathio agent: the cloud agent on the Team tier, or the self-hosted agent on Enterprise. The graph update runs on the same schedule as the static scan, once per PR push. The telemetry lookback window is configurable via graph.trace_lookback_hours (default: 24).
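Here is one way the provider choice and graph.trace_lookback_hours could translate into a query. The config dict mirrors the graph: section of buildpath.yaml after YAML parsing. Several pieces are our assumptions rather than documented Buildpathio behavior: the MESH_SPECS table, the Linkerd query shape (outbound response_total grouped by deployment and dst_deployment), and the reporter/direction filters. The sketch uses only the standard library, so YAML loading is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshSpec:
    metric: str
    selector: str      # label filter that keeps one report per request
    source_label: str
    dest_label: str


# Assumed metric/label mapping per mesh provider.
MESH_SPECS = {
    "istio": MeshSpec("istio_requests_total", 'reporter="source"', "source_app", "destination_app"),
    "linkerd": MeshSpec("response_total", 'direction="outbound"', "deployment", "dst_deployment"),
}

DEFAULT_LOOKBACK_HOURS = 24


def edge_query(graph_config: dict) -> str:
    """Build the edge-weight PromQL for the configured mesh provider."""
    if not graph_config.get("include_runtime_traces", False):
        raise ValueError("runtime traces are disabled in this config")
    provider = graph_config.get("mesh_provider")
    if provider not in MESH_SPECS:
        raise ValueError(f"unsupported mesh_provider: {provider!r}")
    hours = int(graph_config.get("trace_lookback_hours", DEFAULT_LOOKBACK_HOURS))
    if hours <= 0:
        raise ValueError("trace_lookback_hours must be positive")
    spec = MESH_SPECS[provider]
    return (
        f"sum by ({spec.source_label}, {spec.dest_label}) "
        f"(increase({spec.metric}{{{spec.selector}}}[{hours}h]))"
    )


linkerd_config = {
    "include_runtime_traces": True,
    "mesh_provider": "linkerd",
    "prometheus_endpoint": "http://prometheus.linkerd-viz.svc.cluster.local:9090",
    "trace_lookback_hours": 48,
}
print(edge_query(linkerd_config))
```

Isolating the per-provider differences in one table keeps the rest of the pipeline mesh-agnostic: whichever mesh is configured, the result is the same (source, destination) to weight edge list.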

Message broker dependencies: the gap both mesh and static analysis miss

Service mesh telemetry captures direct HTTP/gRPC calls between services. It doesn't capture indirect dependencies mediated by message brokers. Services that produce to a Kafka topic and services that consume from it are coupled, but that coupling is invisible to both static analysis and sidecar proxy telemetry, because the two services never open a network connection to each other.

Broker-mediated dependencies require a third data source: direct inspection of topic consumer group configurations (for Kafka) or queue subscription configurations (for SQS, RabbitMQ). When Buildpathio's config includes a broker integration, it enriches the graph with producer-to-consumer edges derived from broker metadata. Any service that reads from order-events depends on every service that writes to order-events, and that relationship will appear in the blast radius calculation even though it never shows up in mesh telemetry.

graph:
  include_runtime_traces: true
  mesh_provider: istio
  brokers:
    - type: kafka
      bootstrap_servers: "kafka.platform.svc.cluster.local:9092"
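The producer-to-consumer enrichment can be sketched in a few lines. The sketch assumes you already have, for each topic, the set of producing services and the set of consuming services. For Kafka, consumers can be read from consumer group metadata. Producers are usually harder to attribute to a service from broker metadata alone, so the producer map here is assumed to come from elsewhere, such as ACLs or client configuration. The function names and topics are hypothetical.

```python
from collections import defaultdict, deque

BrokerEdge = tuple[str, str, str]  # (producer, consumer, topic)


def broker_edges(producers: dict[str, set[str]], consumers: dict[str, set[str]]) -> set[BrokerEdge]:
    """Derive producer-to-consumer edges from per-topic producer and consumer sets."""
    edges: set[BrokerEdge] = set()
    for topic, writers in producers.items():
        for writer in writers:
            for reader in consumers.get(topic, set()):
                if reader != writer:  # ignore a service consuming its own output
                    edges.add((writer, reader, topic))
    return edges


def downstream_of(service: str, edges: set[BrokerEdge]) -> set[str]:
    """Services potentially affected, directly or transitively, by a change to `service`'s messages."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for producer, consumer, _topic in edges:
        adjacency[producer].add(consumer)
    affected: set[str] = set()
    queue = deque([service])
    while queue:  # breadth-first walk along producer -> consumer edges
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt != service and nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected


producers = {"order-events": {"orders"}, "shipment-events": {"fulfillment"}}
consumers = {"order-events": {"fulfillment", "analytics"}, "shipment-events": {"notifications"}}
edges = broker_edges(producers, consumers)
print(sorted(edges))
print(sorted(downstream_of("orders", edges)))  # ['analytics', 'fulfillment', 'notifications']
```

The transitive walk matters when a consumer republishes: here fulfillment reads order-events and writes shipment-events, so a change to the orders service's message format can reach notifications two hops away.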

Validating graph accuracy against incident history

The confidence level of any dependency graph source should be validated against your actual incident history. When an incident occurs and the postmortem identifies the root cause as a change that affected a downstream service, check whether that dependency was present in your graph at the time of the merge. If it was visible, the blast radius calculation should have flagged the change. If it wasn't visible, that's a gap in your graph sources.

Teams that do this validation consistently find that the graph is most accurate for direct HTTP dependencies (high mesh telemetry coverage), less accurate for message broker dependencies (requires broker integration to be configured), and least accurate for database schema dependencies (no telemetry surface — these require schema migration analysis). Knowing which categories your graph covers well and which it misses tells you where to invest in additional analysis coverage.
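The validation described above can be sketched as a per-category coverage check. The sketch assumes two inputs. First, each postmortem has been reduced to a hypothetical record naming the changed service, the affected service, the dependency category, and the graph snapshot taken when the change merged. Second, each snapshot is stored as a set of (dependent, dependency) pairs. All names and data are illustrative.

```python
from collections import Counter
from dataclasses import dataclass

Edge = tuple[str, str]  # (dependent, dependency)


@dataclass(frozen=True)
class PostmortemDependency:
    """A dependency identified by a postmortem (hypothetical record shape)."""
    changed_service: str   # service whose change caused the incident
    affected_service: str  # downstream service that broke
    category: str          # e.g. "http", "broker", "db-schema"
    snapshot_id: str       # graph snapshot taken when the change merged


def coverage_by_category(
    findings: list[PostmortemDependency],
    snapshots: dict[str, set[Edge]],
) -> dict[str, float]:
    """Fraction of postmortem dependencies, per category, that the graph already contained."""
    total: Counter = Counter()
    visible: Counter = Counter()
    for f in findings:
        total[f.category] += 1
        edges = snapshots.get(f.snapshot_id, set())
        # The affected service depends on the changed one.
        if (f.affected_service, f.changed_service) in edges:
            visible[f.category] += 1
    return {cat: visible[cat] / total[cat] for cat in total}


snapshots = {
    "merge-101": {("checkout", "payments"), ("fulfillment", "orders")},
    "merge-102": {("checkout", "payments")},
}
findings = [
    PostmortemDependency("payments", "checkout", "http", "merge-101"),
    PostmortemDependency("payments", "checkout", "http", "merge-102"),
    PostmortemDependency("orders", "fulfillment", "broker", "merge-101"),
    PostmortemDependency("orders", "fulfillment", "broker", "merge-102"),
    PostmortemDependency("orders", "billing-report", "db-schema", "merge-102"),
]
print(coverage_by_category(findings, snapshots))
# {'http': 1.0, 'broker': 0.5, 'db-schema': 0.0}
```

A category whose coverage stays low across incidents shows where an additional graph source, such as broker integration or schema migration analysis, would pay off.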

We're not saying service mesh telemetry is the complete solution to dependency visibility. We're saying it's the highest-accuracy source for the largest category of microservice dependencies — direct service-to-service network calls — and it should be the primary source rather than documentation or static analysis alone for teams that have already deployed a service mesh.