Building AI plugins for Backstage: a four-part roadmap
Where the work fits — the RFCs, the BEP, the plugins shipped, and the ones still ahead.
Tags: backstage, ai, mcp, roadmap, rfc, bep
The previous three posts in this series (1, 2, 3) each shipped one artifact: a pair of upstream fixes, then a scaffolder MCP client, then a catalog Q&A backend. The fourth artifact — an AI template-authoring backend — also shipped while this post was being drafted; post 5 covers it. This post zooms out.
There’s a coherent shape to “AI inside Backstage” that didn’t exist a year ago and is now coming into focus across multiple RFCs, a BEP, and a handful of new plugins. This post is a map: what’s been formally proposed, what’s been shipped, what I’m building, and where the gaps still are.
The Backstage AI surface, as of mid-2026
Four things are happening at once upstream:
- RFC #32062: Modeling of MCP Servers in the Catalog. Closed. Resolution: extend the existing `API` kind with a discriminated union on `spec.type`, so MCP servers become first-class catalog entities. An `McpServerDiscoveryProcessor` is in flight to auto-register them.
- RFC #33575: Introduce an `AIContext`/`AIResource` kind. Open and active. The kind shipped as `AIResource` in Backstage 1.51. Catalogs skills and rules for AI coding agents (Claude Code, Copilot, Cursor). An automatic `SKILL.md` provider is planned for an upcoming release.
- RFC #33865 → BEP-0015: AI Model Provider Service. Open, BEP under review. A core abstraction so plugins can call `generateText` / `streamText` / `embed` etc. against any provider (OpenAI, Anthropic, Bedrock, Google, Ollama, …) through one interface, with provider implementations published as backend modules. Vercel AI SDK is the protocol contract.
- `mcp-actions-backend`: already shipped. Exposes Backstage actions as MCP tools so external agents can act on the catalog.
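The "discriminated union on `spec.type`" resolution from RFC #32062 can be pictured with a small TypeScript sketch. Only the `API` kind and the `spec.type` discriminator come from the RFC summary above; the `'mcp-server'` type value, the `transport` and `target` fields, and the `describeApi` helper are assumptions made up for illustration, not the schema the RFC settled on.

```typescript
// Hypothetical shapes: every field other than kind and spec.type is assumed.
type OpenApiSpec = { type: 'openapi'; definition: string };
type McpServerSpec = {
  type: 'mcp-server';
  transport: 'stdio' | 'http';
  target: string; // command line or URL, depending on transport
};

type ApiEntity = {
  kind: 'API';
  metadata: { name: string };
  spec: OpenApiSpec | McpServerSpec;
};

// Switching on spec.type narrows the union, so MCP-only fields are
// reachable only in the branch where they are guaranteed to exist.
function describeApi(entity: ApiEntity): string {
  const spec = entity.spec;
  switch (spec.type) {
    case 'mcp-server':
      return `${entity.metadata.name}: MCP server over ${spec.transport}`;
    case 'openapi':
      return `${entity.metadata.name}: OpenAPI definition`;
  }
}
```

The appeal of this shape is that existing `API` consumers keep working, while MCP-aware code gets compile-time access to the extra fields.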
Together these form the production side of AI in Backstage: how AI resources are modeled, how external agents talk to Backstage, how Backstage talks to providers. That’s three of the four corners.
What’s missing is the consumption side: plugins that take advantage of all this infrastructure to deliver actual AI-powered features to humans inside Backstage.
That’s the gap this four-plugin plan aims to fill.
The four-plugin plan
The dotted arrows are the reuse story: every later phase composes the infrastructure built by earlier ones, which is why Phase 4 (incident co-pilot) is last — it depends on all three others as ingredients.
Each plugin sits at a deliberate point on a 2x2 of who is doing the AI and what they’re doing with the catalog:
| | Read the catalog | Change the world |
|---|---|---|
| Human asks a question | Phase 2: catalog Q&A | Phase 3: AI template authoring |
| AI does the action | (mcp-actions-backend, upstream) | Phase 1: MCP scaffolder client |
The diagonal is suggestive: Phase 4 (incident investigation) goes in the “AI changes the world based on what it read about the world” corner, which is why it’s the most ambitious and lives in a separate enterprise repo.
Phase 1: scaffolder-backend-module-mcp (shipped)
Backstage scaffolder gains action: mcp:call — invoke any tool on any MCP
server from a template step. App-config declares servers, the registry
lazily spawns them and reuses the connection.
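A minimal sketch of the "lazily spawn, then reuse" behaviour described above, assuming a registry keyed by the server name from app-config. The names `LazyServerRegistry`, `ServerConfig`, `Connection`, and `Connect` are hypothetical and do not mirror the module's actual code or the MCP SDK's client API; Post 2 has the real design.

```typescript
// Hypothetical types: the real config shape and client API may differ.
type ServerConfig = { command: string; args?: string[] };
type Connection = { callTool(tool: string, input: unknown): Promise<unknown> };
type Connect = (config: ServerConfig) => Promise<Connection>;

class LazyServerRegistry {
  // Cache the promise rather than the resolved connection, so concurrent
  // template steps share one spawn instead of racing to start two.
  private readonly connections = new Map<string, Promise<Connection>>();
  private readonly servers: ReadonlyMap<string, ServerConfig>;
  private readonly connect: Connect;

  constructor(servers: ReadonlyMap<string, ServerConfig>, connect: Connect) {
    this.servers = servers;
    this.connect = connect;
  }

  get(name: string): Promise<Connection> {
    const existing = this.connections.get(name);
    if (existing) return existing;

    const config = this.servers.get(name);
    if (!config) {
      return Promise.reject(new Error(`Unknown MCP server: ${name}`));
    }
    // Nothing is spawned until the first template step asks for the server.
    const pending = this.connect(config);
    // Drop failed attempts so a later step can retry the spawn.
    pending.catch(() => {
      this.connections.delete(name);
    });
    this.connections.set(name, pending);
    return pending;
  }
}
```

An `mcp:call` step handler would then do something like `(await registry.get(serverName)).callTool(tool, input)`, paying the spawn cost only on first use.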
Why this comes first: it has the widest applicability with the smallest surface area. Any MCP server (and there are dozens already) becomes available to templates immediately, without writing a new Backstage-specific action per integration. The blast radius is tiny — one new action — and the upside compounds with every MCP server that ships in the broader ecosystem.
Post 2 walks through the design and code.
Phase 2: catalog-assistant-backend (shipped)
Natural-language Q&A over the catalog with grounded answers and entity-ref citations. Keyword retrieval today, embedding-based retrieval later, tool-use (multi-step graph traversal) once BEP-0015 exposes it.
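As a rough illustration of keyword retrieval with entity-ref citations, the sketch below scores each entity by how many distinct query terms appear in its text and returns the refs of the best matches. The `CatalogDoc` shape, the tokenizer, and the scoring rule are assumptions chosen for brevity; the shipped plugin's retrieval may work differently (Post 3 has the details).

```typescript
// Hypothetical, trimmed-down view of a catalog entity.
type CatalogDoc = { ref: string; text: string };
type Hit = { ref: string; score: number };

// Lowercase, split on non-alphanumerics, drop one-character tokens.
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 1);

// Score = number of distinct query terms found in the entity text.
function retrieve(query: string, docs: CatalogDoc[], limit = 3): Hit[] {
  const terms = new Set(tokenize(query));
  return docs
    .map(doc => {
      const words = new Set(tokenize(doc.text));
      let score = 0;
      terms.forEach(term => {
        if (words.has(term)) score++;
      });
      return { ref: doc.ref, score };
    })
    .filter(hit => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

The refs of the retrieved entities are what the answer can cite, which keeps every claim traceable to something the model was actually shown.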
Why this is Phase 2 and not Phase 1: it depends on a real LLM call out
of the gate, which means provider config, API keys, and a dependency on
either @ai-sdk/anthropic or (eventually) the AI Provider Service. Phase 1
needed none of that — just an MCP transport and stdio.
Post 3 walks through the design and code.
Phase 3: AI-assisted template authoring (shipped)
“Generate a Backstage scaffolder template for spinning up a new Node.js microservice with our standard logging, tracing, and CI.”
Shipped as @backstage/plugin-template-authoring-backend. The endpoint
takes a free-text description plus optional reference template refs and
returns a runnable v1beta3 Template entity YAML with citations.
The big design decision was generateObject over generateText: the
LLM’s output is constrained at the model level by a zod schema. The
schema enforces:
- `apiVersion === 'scaffolder.backstage.io/v1beta3'`, `kind === 'Template'`
- `metadata.name` matches Backstage’s kebab-case rule
- `spec.steps[].action` is one of a curated whitelist of action ids, so the model literally cannot invent action names
- `spec.steps` is non-empty
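To make these four constraints concrete without pulling in zod, here is a dependency-free sketch that checks the same rules by hand. It is not the plugin's schema: the three allow-list entries, the kebab-case regex, and the names `ALLOWED_ACTIONS` and `schemaViolations` are assumptions for illustration. With `generateObject`, the equivalent zod schema constrains the model's output instead of being checked after the fact.

```typescript
// Assumed allow-list; the plugin's curated catalog is larger.
const ALLOWED_ACTIONS = new Set([
  'fetch:template',
  'publish:github',
  'catalog:register',
]);
// Assumed approximation of the kebab-case name rule.
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

type CandidateStep = { id?: string; action: string };
type TemplateCandidate = {
  apiVersion: string;
  kind: string;
  metadata: { name: string };
  spec: { steps: CandidateStep[] };
};

// Returns one message per violated constraint; an empty array means valid.
function schemaViolations(t: TemplateCandidate): string[] {
  const errors: string[] = [];
  if (t.apiVersion !== 'scaffolder.backstage.io/v1beta3') {
    errors.push('apiVersion must be scaffolder.backstage.io/v1beta3');
  }
  if (t.kind !== 'Template') errors.push('kind must be Template');
  if (!KEBAB_CASE.test(t.metadata.name)) {
    errors.push(`metadata.name is not kebab-case: ${t.metadata.name}`);
  }
  if (t.spec.steps.length === 0) errors.push('spec.steps must be non-empty');
  for (const step of t.spec.steps) {
    if (!ALLOWED_ACTIONS.has(step.action)) {
      errors.push(`action is not on the allow-list: ${step.action}`);
    }
  }
  return errors;
}
```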
A TemplateValidator runs semantic checks beyond the schema: step-ref
resolution against declared step ids, action whitelist double-check, and
ordering hints (templates should typically fetch:* first;
catalog:register should come after publish:*). Failures return as
warnings, not exceptions.
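A minimal sketch of what such semantic checks might look like, assuming step references appear in inputs in dot notation (`steps.<id>.`). The function name `semanticWarnings`, the regex, and the warning texts are hypothetical and are not taken from the plugin's `TemplateValidator`.

```typescript
type TemplateStep = {
  id: string;
  action: string;
  input?: Record<string, unknown>;
};

// Returns advisory warnings; it never throws on a questionable template.
function semanticWarnings(steps: TemplateStep[]): string[] {
  const warnings: string[] = [];
  const declared = new Set(steps.map(s => s.id));

  // Step-ref resolution: every steps.<id>. reference must name a declared step.
  // Only dot notation is recognised in this sketch.
  const stepRef = /steps\.([A-Za-z0-9_-]+)\./g;
  for (const step of steps) {
    const text = JSON.stringify(step.input ?? {});
    let match: RegExpExecArray | null;
    while ((match = stepRef.exec(text)) !== null) {
      const id = match[1] ?? '';
      if (!declared.has(id)) {
        warnings.push(`step "${step.id}" references unknown step "${id}"`);
      }
    }
  }

  // Ordering hints: fetch:* first, catalog:register after publish:*.
  const first = steps[0];
  if (first && !first.action.startsWith('fetch:')) {
    warnings.push('the first step is usually a fetch:* action');
  }
  const registerAt = steps.findIndex(s => s.action === 'catalog:register');
  const publishAt = steps.findIndex(s => s.action.startsWith('publish:'));
  if (registerAt !== -1 && publishAt !== -1 && registerAt < publishAt) {
    warnings.push('catalog:register should come after publish:*');
  }
  return warnings;
}
```

Returning strings instead of throwing lets the endpoint hand back the generated YAML together with its warnings, leaving the decision to the human reviewing it.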
Open question that the v1 punted on: dynamic vs. static action
enumeration. The current build uses a static curated catalog of well-known
action ids. The right answer is to source the catalog from the runtime
ActionsRegistry so the model only ever sees actions the host backend
has actually loaded — including third-party actions like Phase 1’s
mcp:call. That’s the natural next iteration.
Post 5 is the full walkthrough.
Phase 4: AI incident investigation co-pilot (in design, separate repo)
This one is bigger than the other three combined and lives in a separate
repo (backstage-corp — enterprise track). Brief sketch:
- An incident comes in. The co-pilot has read access to:
- The catalog (services, owners, dependencies)
- Logs and traces (via integrations to whatever the org uses)
- Recent deploys (via the existing scaffolder task history + CI hooks)
- It produces:
- A short summary: what’s affected, who owns it, what changed recently.
- A ranked list of candidate causes with evidence links.
- A suggested action plan (rollback / hotfix / page-on-call).
- Two operating modes:
- Co-pilot: runs alongside an on-call engineer in the UI, suggests, never acts without confirmation.
- Agent: scoped autonomy for a small set of pre-approved actions (kick off a rollback scaffolder template, post to the incident channel, page a secondary on-call).
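The difference between the two operating modes comes down to a small policy gate, sketched here under assumptions: the action ids in `PRE_APPROVED` are placeholders for the pre-approved set, and `decide` is a hypothetical name. Phase 4 is still in design, so none of this reflects existing code.

```typescript
type Mode = 'co-pilot' | 'agent';
type ProposedAction = { id: string; description: string };
type Decision = 'execute' | 'ask-human';

// Placeholder ids standing in for the small pre-approved set.
const PRE_APPROVED = new Set([
  'run-rollback-template',
  'post-to-incident-channel',
  'page-secondary-on-call',
]);

function decide(
  mode: Mode,
  action: ProposedAction,
  humanConfirmed: boolean,
): Decision {
  // An explicit confirmation unlocks the action in either mode.
  if (humanConfirmed) return 'execute';
  // Without confirmation, only agent mode may act, and only on the allow-list.
  if (mode === 'agent' && PRE_APPROVED.has(action.id)) return 'execute';
  // Co-pilot mode, or anything outside the allow-list, waits for a human.
  return 'ask-human';
}
```

Keeping the gate this small makes the autonomy boundary easy to audit: widening it means adding an id to one set.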
Why this is Phase 4 and not Phase 2: it needs every other phase as infrastructure. The catalog assistant is the read path. The MCP scaffolder is the write path. Template authoring (Phase 3) is the generator for the rollback templates the agent would invoke. Without those three, you end up writing yet another bespoke AI integration for one use case. With them, the co-pilot is the composition.
Where the formal RFCs intersect
| Phase | Depends on / interacts with |
|---|---|
| 1 (MCP scaffolder client) | RFC #32062 (catalog model) — eventually resolve server input as a catalog ref instead of an app-config key |
| 2 (catalog Q&A) | BEP-0015 (AI Provider Service) — refactor target |
| 3 (template authoring) | BEP-0015 (structured output) + #32062 (templates may reference MCP entities) |
| 4 (incident co-pilot) | All of the above + #33575 (skills / rules for the agent’s behavior) |
Nothing here is gated on the upstream work — each plugin ships against
direct dependencies (raw MCP SDK, @ai-sdk/anthropic) today and refactors
behind the upstream abstractions as they land. That’s the price of moving
before the BEP merges. Worth it.
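One way to keep that later refactor cheap is to have plugin code depend on a narrow interface of its own, and wrap the direct dependency behind it. The sketch below is an assumption about how such a seam could look: `TextGenerator`, `answerFromCatalog`, and `stubProvider` are hypothetical names, and the stub stands in for whichever provider call is used today.

```typescript
// Hypothetical seam: plugin code depends only on this interface.
interface TextGenerator {
  generate(prompt: string): Promise<string>;
}

// Plugin logic builds the prompt and delegates; it never imports a provider.
function answerFromCatalog(
  llm: TextGenerator,
  question: string,
  entityRefs: string[],
): Promise<string> {
  const prompt = [
    'Answer using only the entities listed below and cite their refs.',
    `Entities: ${entityRefs.join(', ')}`,
    `Question: ${question}`,
  ].join('\n');
  return llm.generate(prompt);
}

// Stand-in provider for local runs and tests. A real adapter would wrap the
// direct dependency today and the upstream service once it lands.
const stubProvider: TextGenerator = {
  generate: prompt => Promise.resolve(`stub answer (${prompt.length} chars)`),
};
```

With this shape, swapping the direct dependency for the upstream abstraction means writing one new adapter rather than touching the plugin logic.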
What I’d ask of the Backstage AI community
Three open questions where outside input would shape the plugins above:
- Where does retrieval over the catalog belong? The catalog Q&A plugin reinvents retrieval today. If the catalog itself shipped a “search-with-scoring” API (or, eventually, “embeddings-aware search”), half of the `catalog-assistant-backend` collapses to a thin orchestrator. Is there appetite for that?
- Should `mcp:call`-style actions move into core? Once `mcp-actions-backend` and `scaffolder-backend-module-mcp` both exist, the scaffolder has both ends of the MCP protocol natively. Worth formalising as part of a future Backstage release?
- AI Provider Service tool-use story. BEP-0015 covers text/structured/embed but doesn’t yet pin down how `generateText({ tools })` works. Without it, every plugin that needs tool use will bake its own glue. Worth front-loading the design.
I’ve drafted comments on the relevant RFCs that I’ll post separately once the work above has shipped a bit more — easier to push back with concrete implementation evidence than with hypotheticals.
Subscribe / follow along
- The fork: github.com/Naga15/backstage
- The shipped branches:
Phase 4 (incident investigation co-pilot) is what’s left, and it lives in a separate repo because it composes everything else as infrastructure. Plan stays the same: build-first, blog the journey.