The Polyrepo Graph
Why One Graph
A per-repo index can tell you what exists inside one codebase. It answers:
- what functions are defined in this file
- what symbols does this module export
- where is this class used within this repo
It is weaker at answering the questions that matter most in a multi-repo platform:
- which frontend caller reaches this backend route
- which repos depend on the same shared contract version
- which event is produced in one service and consumed in two others
- which downstream services are likely affected by a change to this type
The cross-repo blast radius problem is not a search problem. It is a graph traversal problem. The answer requires not just finding the relevant code in each repo, but knowing how those pieces connect to each other across repo boundaries.
The polyrepo code graph exists to make those cross-repo relationships first-class, queryable edges rather than implicit knowledge held in senior engineers’ heads.
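To make the traversal framing concrete, here is a minimal TypeScript sketch of a bounded blast-radius walk. It assumes a hypothetical in-memory edge list in which an edge `from → to` means "`from` depends on `to`". The nodes affected by a change are therefore found by walking edges backwards, breadth-first, up to a hop limit. `DepEdge`, `blastRadius`, and the node IDs are illustrative, not the tool's API.

```typescript
// Hypothetical edge shape for this sketch: `from` depends on `to`
// (calls it, imports it, consumes its payload, and so on).
interface DepEdge {
  from: string;
  to: string;
}

// Breadth-first walk over reversed dependency edges, at most `maxDepth` hops
// out from `changed`. Returns every node that (transitively) depends on it,
// in the order it was first reached.
function blastRadius(edges: DepEdge[], changed: string, maxDepth: number): string[] {
  // Reverse adjacency list: node -> nodes that depend on it.
  const dependents = new Map<string, string[]>();
  for (const e of edges) {
    const list = dependents.get(e.to) ?? [];
    list.push(e.from);
    dependents.set(e.to, list);
  }

  const seen = new Set<string>([changed]);
  const affected: string[] = [];
  let frontier: string[] = [changed];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const dep of dependents.get(node) ?? []) {
        if (seen.has(dep)) continue; // already reached earlier in the walk
        seen.add(dep);
        affected.push(dep);
        next.push(dep);
      }
    }
    frontier = next;
  }
  return affected;
}

// Node IDs carry a made-up repo prefix to show the walk crossing repos.
const edges: DepEdge[] = [
  { from: "api:createOrder", to: "contracts:OrderDto" },
  { from: "web:submitOrder", to: "contracts:OrderDto" },
  { from: "web:OrderForm", to: "web:submitOrder" },
  { from: "api:healthRoute", to: "api:healthCheck" },
];

console.log(blastRadius(edges, "contracts:OrderDto", 2));
```

With `maxDepth` set to 2, the first hop reaches `api:createOrder` and `web:submitOrder`, the second reaches `web:OrderForm`, and `api:healthRoute` is never visited. A per-repo index cannot run this walk, because the first hop already leaves the `contracts` repo.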
Node Kinds
The graph uses a typed node model. Every node has a kind, a deterministic ID, a repo attribution (or a marker indicating it is workspace-level), and source span information where applicable.
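The node model can be sketched as a TypeScript type. This is a minimal sketch, assuming an in-memory representation. The `Owner` shape, the `SourceSpan` fields, the `describeLocation` helper, and the choice to model a route as workspace-level with no span are illustrative assumptions, not the tool's actual schema.

```typescript
// Node kinds, taken from the tables in this section.
type NodeKind =
  | "File" | "Function" | "Class" | "Type" | "Module" | "Import" | "Decorator"
  | "Entity" | "Route" | "Topic" | "Queue" | "Subject" | "Stream" | "Event" | "Service"
  | "SharedSymbol" | "PayloadContract"
  | "Repo" | "Convention" | "Commit" | "PR" | "Review" | "Comment" | "Author" | "Ticket";

// Either a single owning repo or a workspace-level marker (hypothetical shape).
type Owner = { repo: string } | { workspace: true };

interface SourceSpan {
  file: string; // path relative to the repo root
  startLine: number;
  endLine: number;
}

interface GraphNode {
  kind: NodeKind;
  id: string; // deterministic, derived from the node's semantic identity
  owner: Owner;
  name: string;
  span?: SourceSpan; // only where applicable; omitted here for virtual nodes
}

// Render a node's location, e.g. for repo-attributed search results.
function describeLocation(node: GraphNode): string {
  if ("repo" in node.owner) {
    return node.span
      ? `${node.owner.repo}:${node.span.file}:${node.span.startLine}`
      : node.owner.repo;
  }
  return "<workspace>";
}

const handler: GraphNode = {
  kind: "Function",
  id: "fn-handleOrderCreated", // placeholder; real IDs are computed, not hand-written
  owner: { repo: "repo_beta" },
  name: "handleOrderCreated",
  span: { file: "src/handlers/order.handler.ts", startLine: 12, endLine: 40 },
};

const route: GraphNode = {
  kind: "Route",
  id: "route-post-orders", // placeholder
  owner: { workspace: true },
  name: "POST /orders",
};

console.log(describeLocation(handler)); // repo_beta:src/handlers/order.handler.ts:12
console.log(describeLocation(route)); // <workspace>
```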
Structural Nodes
These represent static code structure as extracted from source files:
| Kind | What it represents |
|---|---|
| File | A source file in a configured repo |
| Function | A function or method definition |
| Class | A class declaration |
| Type | A named type alias or interface |
| Module | A module-level namespace or barrel |
| Import | An import declaration |
| Decorator | A decorator application on a class, method, or parameter |
Domain and Runtime Nodes
These represent framework-level and runtime concepts extracted by framework-aware rules:
| Kind | What it represents |
|---|---|
| Entity | A schema or model construct (e.g., a database entity or Mongoose document) |
| Route | An HTTP route, modeled as a virtual node shared between handler and callers |
| Topic | A message topic (e.g., Kafka topic), shared across producers and consumers |
| Queue | A message queue |
| Subject | A pub-sub subject (e.g., NATS subject) |
| Stream | A message stream |
| Event | A named domain event |
| Service | A service-level node derived from framework registration |
Cross-Repo Abstraction Nodes
These are the bridge points that connect separate repositories in the graph:
| Kind | What it represents |
|---|---|
| SharedSymbol | A symbol exported by a shared package, keyed by package name, version, and symbol name |
| PayloadContract | An inferred payload shape for a topic or queue, per side (producer or consumer) |
Repository and Workflow Nodes
These support git-aware intelligence and ownership signals:
| Kind | What it represents |
|---|---|
| Repo | A configured repository in the workspace |
| Convention | A detected coding or architectural convention |
| Commit | A git commit |
| PR | A pull request |
| Review | A review on a PR |
| Comment | A comment on a PR or review |
| Author | A code author derived from git history |
| Ticket | A linked issue or ticket reference |
Edge Kinds
Edges are also typed. The graph stores edge kind, source node, target node, resolver name, and confidence on every edge.
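The stored edge fields map onto a small record type. This sketch assumes confidence is a number between 0 and 1 and that resolver names are free-form strings. The resolver names in the sample and the `edgesAbove` helper are hypothetical. Filtering on confidence shows why the field is worth storing: a consumer can keep exact resolutions and drop weaker heuristic matches.

```typescript
// Edge kinds, taken from the tables in this section.
type EdgeKind =
  | "Defines" | "Calls" | "Imports" | "Exports" | "Extends" | "Implements" | "References"
  | "DependsOn" | "UsesDecorator"
  | "Publishes" | "Consumes" | "Triggers" | "Serves" | "PersistsTo" | "UsesShared"
  | "BreaksIfChanged" | "CoChangesWith" | "OwnedBy" | "CrossRepoDepends"
  | "PropagatesEvent" | "DriftsFrom" | "ContractOn";

interface GraphEdge {
  kind: EdgeKind;
  source: string; // source node ID
  target: string; // target node ID
  resolver: string; // which resolver produced the edge (sample names are hypothetical)
  confidence: number; // assumed here to be a score in [0, 1]
}

// Keep only edges of one kind whose confidence meets a threshold.
function edgesAbove(edges: GraphEdge[], kind: EdgeKind, minConfidence: number): GraphEdge[] {
  return edges.filter((e) => e.kind === kind && e.confidence >= minConfidence);
}

const sample: GraphEdge[] = [
  { kind: "Calls", source: "fn:a", target: "fn:b", resolver: "ts-call-resolver", confidence: 1.0 },
  { kind: "Calls", source: "fn:a", target: "fn:c", resolver: "name-match-heuristic", confidence: 0.4 },
  { kind: "Imports", source: "mod:x", target: "mod:y", resolver: "ts-import-resolver", confidence: 1.0 },
];

// Only the exactly resolved call survives a 0.5 threshold.
console.log(edgesAbove(sample, "Calls", 0.5).map((e) => e.target));
```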
Static Code Relationships
| Kind | What it connects |
|---|---|
| Defines | File or class defines a function, type, or member |
| Calls | Function calls another function |
| Imports | Module imports another module or symbol |
| Exports | Module exports a symbol |
| Extends | Class extends a base class |
| Implements | Class implements an interface |
| References | Node references another node without a direct call or import |
| DependsOn | Manifest-level package dependency |
| UsesDecorator | A class or method applies a decorator |
Runtime and System Relationships
| Kind | What it connects |
|---|---|
| Publishes | A function or class publishes to a topic, queue, or event |
| Consumes | A function or class consumes from a topic, queue, or event |
| Triggers | An event triggers a handler |
| Serves | A handler serves a route |
| PersistsTo | A handler or service persists to a storage entity |
| UsesShared | A node uses a shared symbol from a cross-repo package |
Analysis and History Relationships
| Kind | What it connects |
|---|---|
| BreaksIfChanged | A change to this node would likely break the target |
| CoChangesWith | Two nodes are frequently modified together (git-derived) |
| OwnedBy | A file or node is attributed to an author |
| CrossRepoDepends | A cross-repo dependency edge beyond the shared-symbol mechanism |
| PropagatesEvent | An event propagates from one node to a downstream consumer |
| DriftsFrom | A payload contract field drifts from its counterpart on the other side |
| ContractOn | A payload contract is attached to a specific topic or queue |
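One plausible reading of `DriftsFrom` is a field-by-field comparison of the producer-side and consumer-side `PayloadContract` for the same topic. The sketch below assumes each inferred payload is flattened to a map from field name to a type name. It flags only fields the consumer reads that the producer either omits or types differently, and treats extra producer fields as harmless. `PayloadShape`, `Drift`, and `detectDrift` are hypothetical names, not the tool's inference logic.

```typescript
// Hypothetical flattened payload shape: field name -> type name.
type PayloadShape = Record<string, string>;

interface Drift {
  field: string;
  producer: string | undefined; // undefined: the producer never sets this field
  consumer: string;
}

// Compare the consumer's expectations against what the producer emits.
function detectDrift(producer: PayloadShape, consumer: PayloadShape): Drift[] {
  const drifts: Drift[] = [];
  for (const field of Object.keys(consumer)) {
    const produced = Object.prototype.hasOwnProperty.call(producer, field)
      ? producer[field]
      : undefined;
    if (produced !== consumer[field]) {
      drifts.push({ field, producer: produced, consumer: consumer[field] });
    }
  }
  return drifts;
}

const producerSide: PayloadShape = { orderId: "string", total: "number" };
const consumerSide: PayloadShape = { orderId: "string", total: "string", currency: "string" };

// Flags `total` (type mismatch) and `currency` (never produced).
console.log(detectDrift(producerSide, consumerSide));
```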
Deterministic IDs
Node IDs are not random UUIDs. They are computed deterministically from the node’s semantic identity: its repo, file path, kind, and canonical name.
This matters for three reasons:
- Idempotent re-indexing. Re-running `gather-step index` on an unchanged workspace produces an identical graph. Each node lands under the same ID, and therefore the same storage slot, on every run, so no orphan nodes accumulate from repeated runs.
- Stable cross-repo attachment. A virtual node for a route or topic has the same ID every time it is computed, regardless of which repo triggers its creation. When repo A defines a handler and repo B defines a caller, both attach to the same route node because both compute the same stitch key.
- Compact downstream analysis. Analysis functions can cache results keyed by node ID without worrying that a re-index has changed which entity a given ID refers to.
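A deterministic ID can be derived by hashing the identity fields in a fixed order. This sketch assumes a 32-bit FNV-1a hash applied to UTF-16 code units, chosen only because it is short and dependency-free. The document does not say which function the tool uses, and a real implementation would likely prefer a wider hash to keep collisions negligible. `NodeIdentity`, `fnv1a32`, and `nodeId` are hypothetical names.

```typescript
// The fields that define a node's semantic identity.
interface NodeIdentity {
  repo: string;
  filePath: string;
  kind: string;
  canonicalName: string;
}

// 32-bit FNV-1a over UTF-16 code units (a simplification of byte-wise FNV-1a).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied mod 2^32
  }
  return hash >>> 0; // reinterpret as unsigned
}

function nodeId(identity: NodeIdentity): string {
  // A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing the same input,
  // assuming no field contains NUL itself.
  const canonical = [identity.repo, identity.filePath, identity.kind, identity.canonicalName].join("\u0000");
  return `${identity.kind}:${fnv1a32(canonical).toString(16).padStart(8, "0")}`;
}

const createOrder: NodeIdentity = {
  repo: "repo_api",
  filePath: "src/orders.ts",
  kind: "Function",
  canonicalName: "createOrder",
};

// Same identity in, same ID out, on every run.
console.log(nodeId(createOrder) === nodeId({ ...createOrder })); // true
```

Because the ID depends only on identity fields and never on run order, timestamps, or randomness, re-indexing reproduces it, which is the property the three points above rely on.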
Virtual Nodes and Stitch Keys
Cross-repo relationships are normalized through virtual nodes. A virtual node is a graph node whose identity is derived from a canonical external name (a route, topic, queue, or shared symbol) rather than from a physical file location.
The stitch key is the canonical qualified name used to compute the virtual node’s ID. Different repos producing or consuming the same external surface compute the same stitch key, find the same virtual node (creating it if it does not yet exist), and attach their local nodes to it.
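The find-or-create step can be sketched with a map keyed by stitch key. Assumptions: the registry is in memory, the virtual node's ID is the stitch key with a `virtual:` prefix, and `VirtualNodeRegistry`, `attach`, and the `Attachment` record are hypothetical names. In the real graph the attachments would presumably be typed edges rather than a list stored on the node.

```typescript
// A repo-local node attached to a shared virtual node.
interface Attachment {
  repo: string;
  localNode: string;
}

interface VirtualNode {
  id: string;
  stitchKey: string;
  attached: Attachment[];
}

class VirtualNodeRegistry {
  private readonly nodes = new Map<string, VirtualNode>();

  // Return the virtual node for this stitch key, creating it the first time.
  getOrCreate(stitchKey: string): VirtualNode {
    let node = this.nodes.get(stitchKey);
    if (node === undefined) {
      // Assumption: the ID is the stitch key with a "virtual:" prefix; any
      // deterministic function of the key converges the same way.
      node = { id: `virtual:${stitchKey}`, stitchKey, attached: [] };
      this.nodes.set(stitchKey, node);
    }
    return node;
  }

  // Attach a repo-local node to the virtual node named by `stitchKey`.
  attach(stitchKey: string, repo: string, localNode: string): VirtualNode {
    const node = this.getOrCreate(stitchKey);
    node.attached.push({ repo, localNode });
    return node;
  }

  get size(): number {
    return this.nodes.size;
  }
}

const registry = new VirtualNodeRegistry();
// The backend repo indexes its handler; the frontend repo indexes its caller.
const fromApi = registry.attach("__route__POST__/orders", "repo_api", "createOrder");
const fromWeb = registry.attach("__route__POST__/orders", "repo_web", "submitOrder");
console.log(fromApi === fromWeb, registry.size); // true 1
```

Whichever repo is indexed first creates the node; the second one finds it. Neither side needs to know the other exists.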
Implemented stitch key formats:
| Surface | Stitch key format | Example |
|---|---|---|
| HTTP route | __route__METHOD__/path | __route__POST__/orders |
| Kafka / message topic | __topic__protocol__name | __topic__kafka__order.created |
| Message queue | __queue__protocol__name | __queue__rabbitmq__invoicing |
| Shared symbol | __shared__package@version__symbol | __shared__<package>@<version>__OrderDto |
Route matching also normalizes HTTP method aliases (for example, FETCH is normalized to GET) and strips trailing slashes before computing the stitch key, so equivalent route definitions from different codebases converge to the same node.
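The key formats and route normalization can be sketched as a few string builders. FETCH → GET is the alias given above. Upper-casing the method first and keeping `/` for a root path that would otherwise become empty are assumptions of this sketch. `normalizeMethod`, `normalizePath`, `routeKey`, `topicKey`, `queueKey`, and `sharedKey` are hypothetical names, and the package name in the example is made up.

```typescript
// Alias table: FETCH -> GET is the documented example; other entries would be assumptions.
const METHOD_ALIASES: Record<string, string> = { FETCH: "GET" };

// Upper-case the method (assumed canonical casing), then apply aliases.
function normalizeMethod(method: string): string {
  const upper = method.toUpperCase();
  return METHOD_ALIASES[upper] ?? upper;
}

// Strip trailing slashes; keeping "/" for the root path is an assumption.
function normalizePath(path: string): string {
  const trimmed = path.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

function routeKey(method: string, path: string): string {
  return `__route__${normalizeMethod(method)}__${normalizePath(path)}`;
}

function topicKey(protocol: string, name: string): string {
  return `__topic__${protocol}__${name}`;
}

function queueKey(protocol: string, name: string): string {
  return `__queue__${protocol}__${name}`;
}

function sharedKey(pkg: string, version: string, symbol: string): string {
  return `__shared__${pkg}@${version}__${symbol}`;
}

// A backend "POST /orders/" and a frontend "post /orders" converge on one key.
console.log(routeKey("POST", "/orders/") === routeKey("post", "/orders")); // true
console.log(sharedKey("contracts", "1.0.0", "OrderDto")); // __shared__contracts@1.0.0__OrderDto
```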
What This Enables
Once virtual nodes are in place, cross-repo graph questions become single traversals:
- “Which frontends call `POST /orders`?” Find the `Route` virtual node for `__route__POST__/orders`, walk incoming `Consumes` edges, and collect the `Function` nodes on the caller side.
- “Which services consume `order.created`?” Find the `Topic` virtual node for `__topic__kafka__order.created`, walk incoming `Consumes` edges, and collect the handler functions and their owning repos.
- “What is the blast radius of changing `OrderDto` in the contracts package?” Find the `SharedSymbol` virtual node, walk incoming `UsesShared` edges, collect all `File` and `Function` nodes that depend on it, and group them by repo.
None of those queries require knowledge of which repo each side lives in. The virtual node handles the stitching.
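The second query can be sketched as one pass over incoming edges. This is a minimal sketch, assuming flat in-memory node and edge arrays and, for simplicity, using the stitch key directly as the virtual node's ID. `incomingByRepo` and the repo and function names are hypothetical.

```typescript
// Hypothetical minimal shapes for this sketch.
interface GraphNode {
  id: string;
  kind: string;
  repo: string | null; // null for workspace-level virtual nodes
  name: string;
}

interface GraphEdge {
  kind: string;
  source: string; // node ID
  target: string; // node ID
}

// Collect the sources of incoming `edgeKind` edges into `virtualId`,
// grouped by the repo that owns each source node.
function incomingByRepo(
  nodes: GraphNode[],
  edges: GraphEdge[],
  virtualId: string,
  edgeKind: string,
): Record<string, string[]> {
  const byId = new Map<string, GraphNode>();
  for (const n of nodes) byId.set(n.id, n);

  const grouped: Record<string, string[]> = {};
  for (const e of edges) {
    if (e.target !== virtualId || e.kind !== edgeKind) continue;
    const src = byId.get(e.source);
    if (!src || src.repo === null) continue;
    if (!grouped[src.repo]) grouped[src.repo] = [];
    grouped[src.repo].push(src.name);
  }
  return grouped;
}

const topic = "__topic__kafka__order.created";
const nodes: GraphNode[] = [
  { id: topic, kind: "Topic", repo: null, name: "order.created" },
  { id: "a1", kind: "Function", repo: "repo_alpha", name: "publishOrderCreated" },
  { id: "b1", kind: "Function", repo: "repo_beta", name: "handleOrderCreated" },
  { id: "c1", kind: "Function", repo: "repo_gamma", name: "projectOrderTotals" },
];
const edges: GraphEdge[] = [
  { kind: "Publishes", source: "a1", target: topic },
  { kind: "Consumes", source: "b1", target: topic },
  { kind: "Consumes", source: "c1", target: topic },
];

// Consumers in repo_beta and repo_gamma; the producer in repo_alpha is excluded.
console.log(incomingByRepo(nodes, edges, topic, "Consumes"));
```

Swapping the edge kind to `Publishes` answers the producer-side question with the same function, which is the point of routing both sides through one virtual node.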
Why This Wins for AI Retrieval
When an AI assistant asks “what would break if I change this event’s payload?”, it needs a precomputed graph answer, not another scan of raw source files.
The polyrepo code graph provides:
- Stable IDs. An assistant can reference a node ID across sessions without it changing on re-index.
- Repo-aware search. Results are attributed to the repo they came from, so the assistant can tell the engineer “this consumer is in `repo_beta`, file `src/handlers/order.handler.ts`.”
- Route and event topology. The graph already knows which functions produce or consume which topics, so the assistant does not have to infer that topology from raw text.
- Bounded task packs. Instead of returning a raw multi-hop neighborhood, the context pack system slices the graph into a byte-budgeted, mode-specific bundle shaped for the task at hand.
The retrieval step has already happened. The assistant can focus on synthesis.
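A byte-budgeted pack can be approximated with a greedy selection. This sketch assumes graph items have already been scored for the task; it does not model the mode-specific part. It measures size as UTF-8 bytes. `Candidate`, `utf8Length`, and `buildPack` are hypothetical names, not the context pack system's API. Greedy-by-score is a simple heuristic, not an optimal knapsack solution.

```typescript
// A pre-scored slice of graph context (hypothetical shape).
interface Candidate {
  id: string;
  score: number;
  text: string;
}

// UTF-8 byte length without relying on TextEncoder or Buffer.
// Assumes well-formed surrogate pairs.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff) {
      bytes += 4; // high surrogate: the pair encodes one 4-byte code point
      i++; // skip the low surrogate
    } else bytes += 3;
  }
  return bytes;
}

// Take the highest-scoring candidates first, skipping any that would overflow the budget.
function buildPack(candidates: Candidate[], budgetBytes: number): Candidate[] {
  const ranked = [...candidates].sort((a, b) => b.score - a.score);
  const pack: Candidate[] = [];
  let used = 0;
  for (const c of ranked) {
    const size = utf8Length(c.text);
    if (used + size > budgetBytes) continue;
    pack.push(c);
    used += size;
  }
  return pack;
}

const candidates: Candidate[] = [
  { id: "a", score: 0.9, text: "x".repeat(60) },
  { id: "b", score: 0.8, text: "y".repeat(50) },
  { id: "c", score: 0.5, text: "z".repeat(30) },
];

// "b" would push the pack to 110 bytes, so it is skipped and "c" fits instead.
console.log(buildPack(candidates, 100).map((c) => c.id)); // a, c
```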