Building AI plugins for Backstage: a four-part roadmap

Where the work fits — the RFCs, the BEP, the plugins shipped, and the ones still ahead.

Jun 3, 2026 · Backstage AI plugins, part 4

Tags: backstage, ai, mcp, roadmap, rfc, bep

The previous three posts in this series (1, 2, 3) each shipped one artifact: a pair of upstream fixes, then a scaffolder MCP client, then a catalog Q&A backend. The fourth artifact — an AI template-authoring backend — also shipped while this post was being drafted; post 5 covers it. This post zooms out.

There’s a coherent shape to “AI inside Backstage” that didn’t exist a year ago and is now coming into focus across multiple RFCs, a BEP, and a handful of new plugins. This post is a map: what’s been formally proposed, what’s been shipped, what I’m building, and where the gaps still are.

The Backstage AI surface, as of mid-2026

Four things are happening at once upstream:

RFC #32062 — Modeling of MCP Servers in the Catalog. Closed. Resolution: extend the existing API kind with a discriminated union on spec.type, so MCP servers become first-class catalog entities. An McpServerDiscoveryProcessor is in flight to auto-register them.
RFC #33575 — Introduce an AIContext / AIResource kind. Open and active. The kind shipped as AIResource in Backstage 1.51. Catalogs skills and rules for AI coding agents (Claude Code, Copilot, Cursor). An automatic SKILL.md provider is planned for an upcoming release.
RFC #33865 → BEP-0015 — AI Model Provider Service. Open, BEP under review. A core abstraction so plugins can call generateText / streamText / embed etc. against any provider (OpenAI, Anthropic, Bedrock, Google, Ollama, …) through one interface, with provider implementations published as backend modules. Vercel AI SDK is the protocol contract.
mcp-actions-backend — already shipped. Exposes Backstage actions as MCP tools so external agents can act on the catalog.
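To make the first item concrete, here is a minimal sketch of what a discriminated union on spec.type could look like in TypeScript. Every field name below (the 'mcp-server' type value, transport, endpoint) is my own assumption for illustration, not the schema the RFC settled on.

```typescript
// Hypothetical shapes: the RFC's final field names may differ.
interface OpenApiSpec {
  type: 'openapi';
  definition: string;
}

interface McpServerSpec {
  type: 'mcp-server'; // assumed discriminant value
  transport: 'stdio' | 'http';
  endpoint: string;
}

// spec.type is the discriminant: one API kind, several spec shapes.
type ApiSpec = OpenApiSpec | McpServerSpec;

interface ApiEntity {
  kind: 'API';
  metadata: { name: string };
  spec: ApiSpec;
}

// Type guard a discovery processor or UI card could use to pick out MCP servers.
function isMcpServer(
  entity: ApiEntity,
): entity is ApiEntity & { spec: McpServerSpec } {
  return entity.spec.type === 'mcp-server';
}

function summarize(entity: ApiEntity): string {
  const spec = entity.spec;
  if (spec.type === 'mcp-server') {
    // Narrowed to McpServerSpec here, so transport and endpoint are typed.
    return `${entity.metadata.name}: MCP server over ${spec.transport} at ${spec.endpoint}`;
  }
  return `${entity.metadata.name}: OpenAPI definition`;
}
```

The appeal of this modeling is that existing API-kind tooling keeps working, while MCP-aware code narrows on one field instead of learning a new kind.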

Together these form the production side of AI in Backstage: how AI resources are modeled, how external agents talk to Backstage, how Backstage talks to providers. That’s three of the four corners.

What’s missing is the consumption side: plugins that take advantage of all this infrastructure to deliver actual AI-powered features to humans inside Backstage.

That’s the gap this four-plugin plan aims to fill.

The four-plugin plan

[Figure: architecture diagram]

The dotted arrows are the reuse story: every later phase composes the infrastructure built by earlier ones, which is why Phase 4 (incident co-pilot) is last — it depends on all three others as ingredients.

Each plugin sits at a deliberate point on a 2x2 of who is doing the AI and what they’re doing with the catalog:

	Read the catalog	Change the world
Human asks a question	Phase 2: catalog Q&A	Phase 3: AI template authoring
AI does the action	(mcp-actions-backend, upstream)	Phase 1: MCP scaffolder client

The diagonal is suggestive: Phase 4 (incident investigation) goes in the “AI changes the world based on what it read about the world” corner, which is why it’s the most ambitious and lives in a separate enterprise repo.

Phase 1: `scaffolder-backend-module-mcp` (shipped)

The Backstage scaffolder gains a new action, mcp:call, which invokes any tool on any MCP server from a template step. App-config declares the servers; the registry spawns each one lazily and reuses the connection.
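The lazy-spawn-and-reuse behaviour can be sketched roughly as follows. This is not the module's actual code: the class name, the config shape, and the fakeConnect stand-in are all hypothetical, and a real build would plug in an MCP SDK stdio transport where the fake sits.

```typescript
// Hypothetical shapes: none of these names come from the plugin itself.
interface McpServerConfig {
  command: string;
  args: string[];
}

interface McpConnection {
  callTool(tool: string, input: Record<string, unknown>): Promise<unknown>;
}

type Connect = (config: McpServerConfig) => Promise<McpConnection>;

class LazyMcpRegistry {
  private readonly connections = new Map<string, Promise<McpConnection>>();

  constructor(
    private readonly servers: Record<string, McpServerConfig>,
    private readonly connect: Connect,
  ) {}

  // Spawns the server on first use. Later calls share the cached promise,
  // so concurrent template steps never start a second process.
  get(serverName: string): Promise<McpConnection> {
    const existing = this.connections.get(serverName);
    if (existing) return existing;

    const config = this.servers[serverName];
    if (!config) {
      return Promise.reject(new Error(`Unknown MCP server: ${serverName}`));
    }

    const pending = this.connect(config).catch(err => {
      // Forget failed attempts so a later step can retry.
      this.connections.delete(serverName);
      throw err;
    });
    this.connections.set(serverName, pending);
    return pending;
  }
}

// Stand-in for a real stdio transport so the sketch runs on its own.
let spawnCount = 0;
const fakeConnect: Connect = async config => {
  spawnCount += 1;
  return {
    callTool: async (tool, input) => ({ server: config.command, tool, input }),
  };
};

const registry = new LazyMcpRegistry(
  { github: { command: 'github-mcp', args: ['--stdio'] } },
  fakeConnect,
);
```

Caching the promise rather than the resolved connection is the design choice worth noting: two steps that ask for the same server at the same moment still share a single spawn.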

Why this comes first: it has the widest applicability with the smallest surface area. Any MCP server (and there are dozens already) becomes available to templates immediately, without writing a new Backstage-specific action per integration. The blast radius is tiny — one new action — and the upside compounds with every MCP server that ships in the broader ecosystem.

Post 2 walks through the design and code.

Phase 2: `catalog-assistant-backend` (shipped)

Natural-language Q&A over the catalog with grounded answers and entity-ref citations. Keyword retrieval today, embedding-based retrieval later, tool-use (multi-step graph traversal) once BEP-0015 exposes it.
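As a rough illustration of the "keyword retrieval today" stage, the sketch below scores entities by distinct query-term overlap and keeps the entity refs attached so they can be cited. The CatalogDoc shape and function names are assumptions of mine, and the scoring is deliberately naive; the real plugin's ranking may differ.

```typescript
// Hypothetical, trimmed-down view of a catalog entity.
interface CatalogDoc {
  entityRef: string; // e.g. "component:default/payments-api"
  text: string; // name, description, tags and owner flattened into one string
}

interface Hit {
  doc: CatalogDoc;
  score: number;
}

// Lowercases, splits on non-alphanumerics, drops one-character tokens.
const tokenize = (s: string): string[] =>
  s
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(token => token.length > 1);

// Scores each entity by how many distinct query terms appear in its text
// and returns the best `limit` matches.
function retrieve(query: string, docs: CatalogDoc[], limit = 3): Hit[] {
  const terms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);
  return docs
    .map(doc => {
      const words = tokenize(doc.text);
      const score = terms.filter(t => words.indexOf(t) !== -1).length;
      return { doc, score };
    })
    .filter(hit => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Builds the grounded context handed to the model. Each line leads with the
// entity ref, which is what the answer is asked to cite.
function buildContext(hits: Hit[]): string {
  return hits.map(hit => `[${hit.doc.entityRef}] ${hit.doc.text}`).join('\n');
}
```

Swapping in embeddings later would replace only the scoring inside retrieve; the citation plumbing in buildContext stays the same.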

Why this is Phase 2 and not Phase 1: it depends on a real LLM call out of the gate, which means provider config, API keys, and a dependency on either @ai-sdk/anthropic or (eventually) the AI Provider Service. Phase 1 needed none of that — just an MCP transport and stdio.

Post 3 walks through the design and code.

Phase 3: AI-assisted template authoring (shipped)

“Generate a Backstage scaffolder template for spinning up a new Node.js microservice with our standard logging, tracing, and CI.”

Shipped as @backstage/plugin-template-authoring-backend. The endpoint takes a free-text description plus optional reference template refs and returns a runnable v1beta3 Template entity YAML with citations.

The big design decision was generateObject over generateText: the LLM’s output is constrained at the model level by a zod schema. The schema enforces:

apiVersion === 'scaffolder.backstage.io/v1beta3', kind === 'Template'
metadata.name matches Backstage’s kebab-case rule
spec.steps[].action is one of a curated whitelist of action ids — the model literally cannot invent action names.
spec.steps is non-empty

A TemplateValidator runs semantic checks beyond the schema: step-ref resolution against declared step ids, action whitelist double-check, and ordering hints (templates should typically fetch:* first; catalog:register should come after publish:*). Failures return as warnings, not exceptions.
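A simplified sketch of those semantic checks, returning warnings rather than throwing, might look like this. The Step shape, the three-entry allowlist, and the function name are assumptions for illustration, and the step-ref scan only handles the dotted steps.<id> form.

```typescript
// Hypothetical, minimal view of a generated template's steps.
interface Step {
  id: string;
  action: string;
  input?: Record<string, unknown>;
}

// Assumed allowlist for illustration; the real curated list is longer.
const ALLOWED_ACTIONS = ['fetch:template', 'publish:github', 'catalog:register'];

function validateSteps(steps: Step[]): string[] {
  const warnings: string[] = [];
  const ids = steps.map(step => step.id);

  steps.forEach(step => {
    // Allowlist double-check, in case schema and validator drift apart.
    if (ALLOWED_ACTIONS.indexOf(step.action) === -1) {
      warnings.push(`step "${step.id}" uses unknown action "${step.action}"`);
    }

    // Step-ref resolution: every steps.<id> mentioned in the input
    // must name a declared step.
    const refPattern = /steps\.([A-Za-z0-9_]+)/g;
    const inputText = JSON.stringify(step.input ?? {});
    let match: RegExpExecArray | null;
    while ((match = refPattern.exec(inputText)) !== null) {
      const ref = match[1] ?? '';
      if (ids.indexOf(ref) === -1) {
        warnings.push(`step "${step.id}" references undeclared step "${ref}"`);
      }
    }
  });

  // Ordering hints: fetch:* first, catalog:register after publish:*.
  const firstStep = steps[0];
  if (firstStep && firstStep.action.indexOf('fetch:') !== 0) {
    warnings.push('first step is not a fetch:* action');
  }
  const registerAt = steps.findIndex(s => s.action === 'catalog:register');
  const publishAt = steps.findIndex(s => s.action.indexOf('publish:') === 0);
  if (registerAt !== -1 && publishAt !== -1 && registerAt < publishAt) {
    warnings.push('catalog:register runs before publish:*');
  }

  return warnings;
}
```

Returning a list of strings keeps the endpoint's contract simple: the caller always gets the YAML back, plus whatever the validator wants a human to double-check.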

An open question that v1 punted on: dynamic vs. static action enumeration. The current build uses a static, curated catalog of well-known action ids. The better answer is to source that catalog from the runtime ActionsRegistry so the model only ever sees actions the host backend has actually loaded, including third-party actions like Phase 1’s mcp:call. That’s the natural next iteration.

Post 5 is the full walkthrough.

Phase 4: AI incident investigation co-pilot (in design, separate repo)

This one is bigger than the other three combined and lives in a separate repo (backstage-corp — enterprise track). Brief sketch:

An incident comes in. The co-pilot has read access to:
- The catalog (services, owners, dependencies)
- Logs and traces (via integrations to whatever the org uses)
- Recent deploys (via the existing scaffolder task history + CI hooks)

It produces:
- A short summary: what’s affected, who owns it, what changed recently.
- A ranked list of candidate causes with evidence links.
- A suggested action plan (rollback / hotfix / page-on-call).

Two operating modes:
- Co-pilot: runs alongside an on-call engineer in the UI, suggests, never acts without confirmation.
- Agent: scoped autonomy for a small set of pre-approved actions (kick off a rollback scaffolder template, post to the incident channel, page a secondary on-call).
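Since Phase 4 is still in design, the following is only a sketch of how the two modes could gate an action. The action ids and the rule that explicit human confirmation is sufficient in either mode are my assumptions, not a settled design.

```typescript
type Mode = 'co-pilot' | 'agent';
type Decision = 'execute' | 'ask-human';

interface ProposedAction {
  id: string; // hypothetical ids, see PRE_APPROVED below
  description: string;
}

// Assumed allowlist mirroring the three examples in the text.
const PRE_APPROVED = [
  'rollback-template',
  'post-to-incident-channel',
  'page-secondary-on-call',
];

function decide(
  mode: Mode,
  action: ProposedAction,
  humanConfirmed: boolean,
): Decision {
  // Assumption: an explicit confirmation is enough in either mode.
  if (humanConfirmed) return 'execute';
  // Agent mode acts alone only inside the pre-approved set.
  if (mode === 'agent' && PRE_APPROVED.indexOf(action.id) !== -1) {
    return 'execute';
  }
  // Co-pilot mode, and anything outside the allowlist, goes back to a human.
  return 'ask-human';
}

const rollback: ProposedAction = {
  id: 'rollback-template',
  description: 'Kick off the rollback scaffolder template',
};
const agentDecision = decide('agent', rollback, false);
const copilotDecision = decide('co-pilot', rollback, false);
```

Keeping the gate as one pure function means the same check can sit in front of every write path, whichever plugin ends up executing the action.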

Why this is Phase 4 and not Phase 2: it needs every other phase as infrastructure. The catalog assistant is the read path. The MCP scaffolder is the write path. Template authoring (Phase 3) is the generator for the rollback templates the agent would invoke. Without those three, you end up writing yet another bespoke AI integration for one use case. With them, the co-pilot is the composition.

Where the formal RFCs intersect

Phase	Depends on / interacts with
1 (MCP scaffolder client)	RFC #32062 (catalog model) — eventually resolve `server` input as a catalog ref instead of an app-config key
2 (catalog Q&A)	BEP-0015 (AI Provider Service) — refactor target
3 (template authoring)	BEP-0015 (structured output) + #32062 (templates may reference MCP entities)
4 (incident co-pilot)	All of the above + #33575 (skills / rules for the agent’s behavior)

Nothing here is gated on the upstream work — each plugin ships against direct dependencies (raw MCP SDK, @ai-sdk/anthropic) today and refactors behind the upstream abstractions as they land. That’s the price of moving before the BEP merges. Worth it.

What I’d ask of the Backstage AI community

Three open questions where outside input would shape the plugins above:

Where does retrieval over the catalog belong? The catalog Q&A plugin reinvents retrieval today. If the catalog itself shipped a “search-with-scoring” API (or, eventually, “embeddings-aware search”), half of the catalog-assistant-backend collapses to a thin orchestrator. Is there appetite for that?
Should mcp:call-style actions move into core? Once mcp-actions-backend and scaffolder-backend-module-mcp both exist, Backstage has both ends of MCP natively: one exposes Backstage actions as MCP tools, the other calls MCP tools from template steps. Worth formalising as part of a future Backstage release?
AI Provider Service tool-use story. BEP-0015 covers text/structured/embed but doesn’t yet pin down how generateText({ tools }) works. Without it, every plugin that needs tool use will bake its own glue. Worth front-loading the design.

I’ve drafted comments on the relevant RFCs that I’ll post separately once the work above has shipped a bit more — easier to push back with concrete implementation evidence than with hypotheticals.

The fork: github.com/Naga15/backstage
The shipped branches:
- fix/gitlab-repo-push-empty-actions → upstream #34480
- fix/scaffolder-dryrun-task-secrets → upstream #34481
- feat/scaffolder-backend-module-mcp
- feat/catalog-assistant-backend
- feat/template-authoring-backend

Phase 4 (incident investigation co-pilot) is what’s left, and it lives in a separate repo because it composes everything else as infrastructure. Plan stays the same: build-first, blog the journey.