
Distributed service ownership: when the catalog isn't enough

Carter Hughes · January 28, 2026 · 7 min read

[Figure: Service ownership matrix showing squad assignments and SLA tiers]

Backstage and similar service catalogs solve a real problem: at 40+ services across six teams, nobody can remember who owns what. But ownership data alone isn't what you need the night your payment pipeline starts failing. You need to know which service changed, which contract it broke, and whose code is now throwing 500s — and the catalog gives you only the last of those three.

What service catalogs are actually designed to answer

The questions a service catalog is built to answer:

Who owns service X?
What is the SLA for service X?
Where is the runbook for service X?
Who is on-call for service X this week?

These are legitimate, valuable operational questions. When a PagerDuty alert fires at 2am, the on-call engineer needs the runbook without digging through Slack history. Backstage is well-designed for this. It's a searchable, structured registry of service metadata, and teams that maintain it consistently find it genuinely reduces incident response time — particularly the time spent in the "who do I escalate to?" phase.

But catalogs are not designed to answer this question: "I'm about to rename the userId field in auth-service's token response — which other services will break, and do I need to coordinate with the teams that own them before I merge?" That question requires a different data structure entirely.

Ownership is a node attribute; dependency is a graph edge

This is the conceptual distinction that determines which tooling you need. Ownership is an attribute of a service node: team name, SLA tier, runbook URL, on-call schedule. Dependency is a directed edge between service nodes, typed by the contract relationship (REST API call, gRPC call, event subscription, SDK import). A catalog stores node attributes. A dependency graph stores edges.
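The node/edge split can be sketched as two small record types. This is an illustrative model only: the class names, the team names, and the runbook URLs are assumptions, not the schema of Backstage or of any particular dependency graph tool.

```python
from dataclasses import dataclass
from enum import Enum


class ContractType(Enum):
    REST = "rest"
    GRPC = "grpc"
    EVENT = "event"
    SDK = "sdk"


@dataclass(frozen=True)
class ServiceNode:
    """Catalog side: attributes attached to one service."""
    name: str
    team: str
    sla_tier: str
    runbook_url: str


@dataclass(frozen=True)
class DependencyEdge:
    """Graph side: a directed, typed relationship between two services."""
    consumer: str
    producer: str
    contract_type: ContractType


# Hypothetical data for illustration.
nodes = {
    "payment-api": ServiceNode(
        "payment-api", "payments", "tier-1", "https://runbooks.example/payment-api"
    ),
    "order-processor": ServiceNode(
        "order-processor", "orders", "tier-2", "https://runbooks.example/order-processor"
    ),
}
edges = [DependencyEdge("order-processor", "payment-api", ContractType.REST)]


def consumers_of(producer: str) -> list[str]:
    """An edge query: which services depend on this producer?"""
    return [e.consumer for e in edges if e.producer == producer]
```

Looking up `nodes["payment-api"].team` is a catalog question; calling `consumers_of("payment-api")` is a graph question. Neither structure can answer the other's query on its own.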

Both are necessary. An edge without node attributes is technically complete but operationally useless — knowing that order-processor consumes payment-api's checkout endpoint tells you nothing about who to call at 2am. A catalog without edges tells you who owns what but not what will cascade when a team ships a breaking change to a shared contract.

Teams that have only the catalog discover the gap through incidents. A payments engineer renames transactionId to txnId in their checkout response body. The catalog told them who to contact for their own service. It didn't tell them that fraud-detection, owned by a squad they've never interacted with, has been deserializing that field by name for 14 months. The rename ships. The fraud-detection service starts logging deserialization errors that no one notices. Two days later, a fraud decision threshold misfires and triggers a manual review queue backup. The postmortem lists root cause: "unknown downstream consumer."

The catalog couldn't have prevented this. A dependency graph with field-level resolution would have surfaced fraud-detection's declared dependency on transactionId at PR time.
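A minimal sketch of what field-level resolution could look like, assuming each consumer declares the individual response fields it reads. The `FieldDependency` record, the `payment-api` producer name, and the declared data are hypothetical; the article does not specify a declaration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDependency:
    """One consumer's declared use of one field in a producer's response."""
    consumer: str
    producer: str
    endpoint: str
    field_name: str


# Hypothetical declarations, as they might be collected from consumer repos.
DECLARED = [
    FieldDependency("fraud-detection", "payment-api", "checkout", "transactionId"),
    FieldDependency("order-processor", "payment-api", "checkout", "amount"),
]


def consumers_broken_by_rename(producer, endpoint, old_field, declared=DECLARED):
    """Return consumers that declared a dependency on the field being renamed."""
    return sorted(
        {
            d.consumer
            for d in declared
            if d.producer == producer
            and d.endpoint == endpoint
            and d.field_name == old_field
        }
    )
```

Run against the rename in the incident above, `consumers_broken_by_rename("payment-api", "checkout", "transactionId")` returns only `fraud-detection`. A service-level edge would have flagged `order-processor` as well, even though it never reads the renamed field.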

Where CODEOWNERS breaks down past 30 services

Many platform teams start with CODEOWNERS as their ownership model. At ten services, it's adequate — a file that declares which team owns which path, enforced at PR review time. The problem at scale isn't that CODEOWNERS becomes wrong; it's that it operates at the wrong level of abstraction.

CODEOWNERS captures file-level ownership within a repository. It has no concept of cross-repo contract dependency. A consumer service that imports a producer service's generated client SDK and calls a specific endpoint doesn't share any files with that producer — they're in separate repos, separate CODEOWNERS, separate review groups. When the producer changes that endpoint, CODEOWNERS has no mechanism to require consumer team review. The consumer's engineers find out when their integration tests fail, if they have integration tests, or when their service starts 500-ing in production if they don't.

The ownership model that works at scale is contract ownership, not file ownership. The producer team owns the API contract surface: backward compatibility guarantees, versioning policy, deprecation notice period. The consumer teams declare their dependency on the contract in schema manifest files, and they're responsible for testing against contract changes and acknowledging breaking changes in the producer's CI before merge. This model requires that dependencies be declared and that CI validates those declarations, but it integrates naturally with catalog data. When the dependency graph surfaces a consumer impact, it resolves the owning team from the catalog and routes the notification automatically.
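The acknowledgement step could be enforced by a CI gate along these lines. This is a sketch under assumptions: the `BreakingChange` record and the way acknowledgements are collected are hypothetical, and a real pipeline would populate them from its own manifest and review tooling.

```python
from dataclasses import dataclass, field


@dataclass
class BreakingChange:
    """A proposed producer change plus the consumers it affects."""
    producer: str
    description: str
    impacted_consumers: set[str]
    acknowledged_by: set[str] = field(default_factory=set)


def ci_gate(change: BreakingChange) -> tuple[bool, str]:
    """Pass only when every impacted consumer has acknowledged the change."""
    missing = sorted(change.impacted_consumers - change.acknowledged_by)
    if missing:
        return False, "blocked: awaiting acknowledgement from " + ", ".join(missing)
    return True, "ok: all impacted consumers have acknowledged"


# Hypothetical usage: one of two impacted consumers has acknowledged so far.
change = BreakingChange(
    producer="payment-api",
    description="rename transactionId to txnId in checkout response",
    impacted_consumers={"fraud-detection", "order-processor"},
    acknowledged_by={"order-processor"},
)
passed, message = ci_gate(change)
```

The gate uses a set difference so that the producer's merge is blocked by exactly the consumers that have not yet responded, rather than by a blanket approval requirement.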

The integration architecture that works at scale

The combination that works is not "replace Backstage with a dependency graph tool" — it's "use Backstage as the ownership source of truth, and feed that data into the dependency graph as node attributes." The two tools are complementary at the data layer:

Backstage stores: service name → team owner, SLA tier, runbook URL, on-call schedule, Slack channel. This is slow-moving metadata: team ownership changes infrequently, SLA tiers are set by architecture decision, runbooks are updated after incidents. A catalog is the right data store for this: it's human-curated, structured, and well-suited to search and browse.

The dependency graph stores: service → service directed edges based on schema declarations, updated on every PR that touches a schema file. This is fast-moving operational data: consumer relationships change when teams add features, deprecate endpoints, or restructure their event topics. A static catalog page is not the right data store for this — it would need a human update on every schema change to stay current.
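One way to keep edges current without human updates is to derive them from the declared manifests on every PR and diff the result. The sketch below assumes a simplified manifest shape (consumer name mapped to the producers it declares); that shape is a stand-in chosen for illustration, not a format the article defines.

```python
def edges_from_manifests(manifests: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Derive (consumer, producer) edges from each consumer's declared producers."""
    return {
        (consumer, producer)
        for consumer, producers in manifests.items()
        for producer in producers
    }


def diff_edges(before: set[tuple[str, str]], after: set[tuple[str, str]]) -> dict:
    """Summarize which edges a PR adds or removes."""
    return {"added": sorted(after - before), "removed": sorted(before - after)}


# Hypothetical manifests on the main branch and on a PR branch.
main_manifests = {"order-processor": ["payment-api"]}
pr_manifests = {
    "order-processor": ["payment-api"],
    "fraud-detection": ["payment-api"],
}

change = diff_edges(
    edges_from_manifests(main_manifests),
    edges_from_manifests(pr_manifests),
)
```

Because the edge set is recomputed from source rather than edited by hand, it cannot drift from the manifests the way a manually maintained catalog page would.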

The integration: When the dependency graph runs a pre-merge impact analysis and identifies that fraud-detection will break if the proposed transactionId rename merges, it resolves the owning team from the catalog and surfaces the Slack channel and on-call contact directly in the CI check output. The engineer shipping the change doesn't need to open Backstage, search for the service, navigate to the team page, and look up the contact — the impact report brings that data into the PR context where the engineer is already making the decision.
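The enrichment step might look like the following. The `catalog` dictionary is a stand-in for whatever lookup the catalog exposes; this sketch does not call Backstage's actual API, and the team, channel, and on-call values are invented for illustration.

```python
def impact_report(impacted: set[str], catalog: dict[str, dict[str, str]]) -> str:
    """Format one line per impacted service, joined with catalog contact data."""
    lines = []
    for service in sorted(impacted):
        owner = catalog.get(service)
        if owner is None:
            # Surface the gap instead of silently dropping the service.
            lines.append(f"{service}: owner unknown (not in catalog)")
        else:
            lines.append(
                f"{service}: team={owner['team']} "
                f"slack={owner['slack']} on-call={owner['on_call']}"
            )
    return "\n".join(lines)


# Hypothetical catalog entries keyed by service name.
catalog = {
    "fraud-detection": {
        "team": "risk-squad",
        "slack": "#risk-squad",
        "on_call": "risk-oncall",
    },
}

report = impact_report({"fraud-detection", "legacy-billing"}, catalog)
```

Services missing from the catalog are reported explicitly, since an impacted consumer with no known owner is itself a finding worth showing in the CI check output.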

The result: both tools become more valuable in combination. The catalog's ownership data becomes actionable at deploy time, not just at incident time. The dependency graph's impact reports are navigable — named owners and contact paths — not just a list of service identifiers.

The scale threshold where both become necessary

There's a specific organizational threshold where the combination of catalog and dependency graph shifts from "nice to have" to operationally required: the point at which no single engineer can hold the complete dependency topology in working memory.

At ten services, an experienced principal engineer probably knows every dependency relationship from recall. They'll catch the dangerous cross-service change in code review. At 40 services, that mental model is under pressure: the principal who used to hold it all is also a bottleneck, and they're going to be absent during the incident that proves it. At 80 services, no individual carries the complete graph. The question is not whether to externalize the mental model; it's how quickly you do it before an incident forces the conversation.

The catalog and dependency graph together make the mental model queryable by any engineer, at any point in the development workflow — on a Tuesday afternoon before opening a PR, not at 3am during an incident when the person who knows is unavailable. If your team is running 40+ services across more than three squads, you likely have a catalog already. The dependency graph is the second half of the operational picture, and its absence is the gap that shows up in your postmortem action items as "improve cross-team communication."

Buildpathio integrates with Backstage to pull team ownership metadata directly into dependency impact reports — so the right squad is always identified before a breaking change ships.
