theplatformlog

Generating Backstage scaffolder templates from one sentence

Building template-authoring-backend — zod-constrained LLM output, semantic post-validation, and the future ActionsRegistry hook.

· Backstage AI plugins, part 5

Tags: backstage · ai · llm · scaffolder · claude · typescript · structured-output

Earlier posts in this series shipped two AI plugins for Backstage: an MCP scaffolder client (post 2) and a catalog-aware LLM Q&A backend (post 3). This post covers the third plugin: a template-authoring backend.

The pitch: you describe what you want (“a Node.js microservice with Express, structured logging, and GitHub Actions CI”) and the plugin returns a runnable Backstage scaffolder Template entity as YAML, with citations. The output is schema-constrained at the model level, so a structurally invalid Template never makes it back to the caller.

What the endpoint does

POST /api/template-authoring/v1/generate takes:

{
  "description": "A Node.js microservice with Express, OTel tracing, and GitHub Actions CI",
  "referenceTemplates": ["template:default/nodejs-base"]
}

And returns:

{
  "yaml": "apiVersion: scaffolder.backstage.io/v1beta3\nkind: Template\n...",
  "template": { "apiVersion": "...", "kind": "Template", ... },
  "citations": {
    "referenceTemplates": ["template:default/nodejs-base"],
    "actionsUsed": ["fetch:template", "publish:github", "catalog:register"]
  },
  "warnings": []
}

referenceTemplates is optional. When supplied, each entity ref is fetched from the catalog and embedded in the user prompt as an example of the step layout — but the system prompt tells the model not to return them verbatim.
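From the caller's side, the request and response map onto a pair of TypeScript types. A minimal sketch, assuming the JSON shapes above are the whole contract; the type names, the `buildGenerateRequest` helper, and the bearer-token auth in `generateTemplate` are illustrative assumptions, not the plugin's exported API.

```typescript
// Hypothetical client-side types mirroring the request/response JSON above.
interface GenerateRequest {
  description: string;
  referenceTemplates?: string[]; // entity refs, e.g. "template:default/nodejs-base"
}

interface GenerateResponse {
  yaml: string;
  template: Record<string, unknown>;
  citations: { referenceTemplates: string[]; actionsUsed: string[] };
  warnings: string[];
}

// Builds the POST body. referenceTemplates is optional, so it is omitted when empty.
function buildGenerateRequest(description: string, refs: string[] = []): GenerateRequest {
  const trimmed = description.trim();
  if (trimmed.length === 0) {
    throw new Error('description must not be empty');
  }
  return refs.length > 0
    ? { description: trimmed, referenceTemplates: refs }
    : { description: trimmed };
}

// Sketch of the call itself. `baseUrl` (the Backstage backend origin) and the
// bearer token are assumptions about how the caller reaches and authenticates to the backend.
async function generateTemplate(
  baseUrl: string,
  token: string,
  body: GenerateRequest,
): Promise<GenerateResponse> {
  const res = await fetch(`${baseUrl}/api/template-authoring/v1/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`generate failed: ${res.status} ${res.statusText}`);
  }
  return (await res.json()) as GenerateResponse;
}
```

Leaving `referenceTemplates` out entirely, rather than sending an empty array, keeps the "no examples" case identical to the request shown above without the field.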

Architecture

Four moving parts, plus the router:

[Figure: architecture diagram]

Why generateObject and not generateText

If you ask an LLM for “YAML,” you sometimes get it back wrapped in markdown code fences. Sometimes you get commentary around the YAML. Sometimes you get YAML that doesn't parse at all.

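generateObject (from the Vercel AI SDK) asks the model for JSON that conforms to a zod schema, using the provider's structured-output or tool-calling mode where one exists. Where the provider supports constrained decoding, sampling itself is steered toward schema-valid output; either way, the SDK validates the returned object against the schema and throws instead of returning something that doesn't conform. Backstage Template entities are plain data, so JSON maps one-to-one onto YAML: we get JSON from the model and emit YAML to the caller via yaml.stringify.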
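The JSON-in, YAML-out flow is small enough to sketch. This assumes the plugin wires `generate` to the AI SDK's `generateObject({ model, schema: TemplateSchema, system, prompt })` and `toYaml` to `stringify` from the `yaml` package; both are injected here so the shape is visible without a model or a network call. `SYSTEM_PROMPT` and `generateTemplateYaml` are hypothetical names.

```typescript
// The slice of generateObject's contract this sketch relies on:
// prompts in, an object out (schema-checked when this is the real generateObject).
type GenerateObjectFn<T> = (opts: { system: string; prompt: string }) => Promise<{ object: T }>;

// Hypothetical system prompt; a real one would also carry the action catalog and reference templates.
const SYSTEM_PROMPT =
  'You write Backstage scaffolder Template entities. Only use the actions allowed by the schema.';

async function generateTemplateYaml<T>(
  description: string,
  generate: GenerateObjectFn<T>,
  toYaml: (value: T) => string,
): Promise<{ yaml: string; template: T }> {
  // The model only ever produces a JSON object.
  const { object } = await generate({ system: SYSTEM_PROMPT, prompt: description });
  // YAML comes from a serializer, never from the model, so there are no fences or commentary to strip.
  return { yaml: toYaml(object), template: object };
}
```

In the plugin, the call would look roughly like `generateTemplateYaml(desc, opts => generateObject({ model, schema: TemplateSchema, ...opts }), stringify)`; keeping the model call behind a function type also makes the pipeline testable with a fake.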

The schema:

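import { z } from 'zod';

// Assumes STEP_ACTION_IDS (a non-empty tuple of allowed action ids) is in scope.
export const TemplateSchema = z.object({
  apiVersion: z.literal('scaffolder.backstage.io/v1beta3'),
  kind: z.literal('Template'),
  metadata: z.object({
    // Backstage caps entity names at 63 characters; the regex alone doesn't.
    name: z.string().max(63).regex(
      /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/,
      'metadata.name must be kebab-case (lowercase, hyphens)',
    ),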
    title: z.string().optional(),
    description: z.string().optional(),
    tags: z.array(z.string()).optional(),
  }),
  spec: z.object({
    owner: z.string(),
    type: z.string(),
    parameters: z.array(/* ... */).optional(),
    steps: z.array(z.object({
      id: z.string(),
      name: z.string().optional(),
      action: z.enum(STEP_ACTION_IDS),   // ← curated whitelist
      input: z.record(z.unknown()).optional(),
      if: z.string().optional(),
    })).min(1, 'spec.steps must contain at least one step'),
    output: z.record(z.unknown()).optional(),
  }),
});

The action: z.enum(STEP_ACTION_IDS) line is the load-bearing constraint. STEP_ACTION_IDS is built from the well-known-actions catalog, so an invented action name can never get through: if the model emits action: "wishful:thinking", the object fails schema validation and generateObject throws before the result ever reaches my code.
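z.enum needs a non-empty tuple of string literals, so the catalog has to be declared `as const`. A sketch of how STEP_ACTION_IDS could be derived, assuming a catalog of `{ id, description }` entries; the ids below are a subset of Backstage's built-in actions plus Phase 1's mcp:call, the descriptions are paraphrases, and `WELL_KNOWN_ACTIONS` and `isKnownAction` are hypothetical names rather than the plugin's code.

```typescript
// Hypothetical catalog entries; descriptions are meant for the prompt.
const WELL_KNOWN_ACTIONS = [
  { id: 'fetch:template', description: 'Fetch a skeleton and render it with templating' },
  { id: 'fetch:plain', description: 'Fetch files into the workspace without templating' },
  { id: 'publish:github', description: 'Create a GitHub repository from the workspace' },
  { id: 'publish:gitlab', description: 'Create a GitLab repository from the workspace' },
  { id: 'catalog:register', description: 'Register a catalog-info.yaml with the catalog' },
  { id: 'debug:log', description: 'Log a message or list workspace files' },
  { id: 'fs:delete', description: 'Delete files from the workspace' },
  { id: 'fs:rename', description: 'Rename files in the workspace' },
  { id: 'mcp:call', description: 'Call a tool on an MCP server (Phase 1 module)' },
] as const;

type ActionId = (typeof WELL_KNOWN_ACTIONS)[number]['id'];

// z.enum() wants [string, ...string[]]; the cast is safe because the literal above is non-empty.
export const STEP_ACTION_IDS = WELL_KNOWN_ACTIONS.map(a => a.id) as [ActionId, ...ActionId[]];

// Runtime check with the same accept/reject behavior as z.enum(STEP_ACTION_IDS).
export function isKnownAction(value: string): value is ActionId {
  return (STEP_ACTION_IDS as readonly string[]).includes(value);
}
```

Keeping ids and descriptions in one list means the schema whitelist and whatever the prompt tells the model about available actions can't drift apart.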

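The kebab-case regex on metadata.name is the second load-bearing constraint. Backstage validates metadata.name at catalog ingestion (1 to 63 characters; letters and digits, optionally separated by -, _ or .), and the kebab-case regex accepts only a subset of those names. As long as the name also stays within 63 characters, catching it at generation time means the name won't be the reason a generated Template fails to register.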
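A sketch of the generation-side name check, assuming the kebab-case regex from TemplateSchema plus Backstage's 63-character limit on metadata.name; `isGeneratedNameValid` is a hypothetical helper, not plugin code.

```typescript
// Same pattern as metadata.name in TemplateSchema: lowercase letters and digits,
// hyphen-separated, starting and ending with an alphanumeric.
const KEBAB_NAME = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

// Backstage caps entity names at 63 characters; the regex alone does not.
const MAX_ENTITY_NAME_LENGTH = 63;

function isGeneratedNameValid(name: string): boolean {
  return name.length <= MAX_ENTITY_NAME_LENGTH && KEBAB_NAME.test(name);
}
```

Names such as NodeJS_Service are legal in the catalog but rejected here. That's the intended direction: the generator should be stricter than the catalog, never looser.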

Why a curated action catalog (for now)

The cleanest version of this plugin would source the list of available actions from the runtime ActionsRegistry — so the model only ever sees actions the host backend has actually loaded, including third-party actions like Phase 1’s mcp:call. That’s the natural next iteration.

v1 uses a static curated catalog of 12 action ids hard-coded in the plugin. The upside: the plugin runs standalone, without wiring into another plugin's internals. The cost: the model is offered exactly that list, whether or not the host backend has those actions loaded. The catalog covers the common path: fetch:*, publish:github, publish:gitlab, catalog:register, debug:log, the filesystem actions (which Backstage registers as fs:delete and fs:rename), plus mcp:call as a nod to Phase 1.

The README spells this out as a known limitation. The next revision should source from the registry.
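A middle step between the static list and a fully registry-driven one is to intersect the two. A sketch under assumptions: `installedActionIds` stands in for whatever the host backend reports as loaded (how to read that from the ActionsRegistry is exactly the open question, so it is a plain parameter here), and `resolveAllowedActions` is a hypothetical helper.

```typescript
// Intersects the curated catalog with the actions the host backend actually has loaded.
// Curated ids that aren't installed are dropped and reported, so the prompt never advertises them.
function resolveAllowedActions(
  curated: readonly string[],
  installedActionIds: readonly string[],
): { allowed: string[]; missing: string[] } {
  const installed = new Set(installedActionIds);
  const allowed = curated.filter(id => installed.has(id));
  const missing = curated.filter(id => !installed.has(id));
  if (allowed.length === 0) {
    // z.enum() can't be built from an empty list, so fail loudly at startup instead.
    throw new Error('none of the curated scaffolder actions are installed');
  }
  return { allowed, missing };
}
```

This still never offers a third-party action that isn't on the curated list; sourcing the list itself from the registry is what closes that gap.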

Semantic validation beyond the schema

TemplateValidator runs three checks the zod schema can't express (the second is omitted from this excerpt):

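import { z } from 'zod';

// TemplateSchema is the zod schema shown above.
type Template = z.infer<typeof TemplateSchema>;

// Collects step ids referenced as ${{ steps.<id>. ... }} in any string nested inside a step's input.
function extractStepRefs(value: unknown, refRegex: RegExp): string[] {
  if (typeof value === 'string') return Array.from(value.matchAll(refRegex), m => m[1]);
  if (Array.isArray(value)) return value.flatMap(v => extractStepRefs(v, refRegex));
  if (value !== null && typeof value === 'object') {
    return Object.values(value).flatMap(v => extractStepRefs(v, refRegex));
  }
  return [];
}

// Sketch: the checks as a standalone function (the plugin runs them inside TemplateValidator).
export function validateTemplate(template: Template): string[] {
  const warnings: string[] = [];
  const stepIds = new Set(template.spec.steps.map(s => s.id));

  // 1. Step references inside ${{ steps.X.* }} must resolve to declared step ids.
  const refRegex = /\$\{\{\s*steps\.([a-zA-Z0-9_-]+)\./g; // g flag is required by matchAll
  for (const step of template.spec.steps) {
    for (const refId of extractStepRefs(step.input, refRegex)) {
      if (!stepIds.has(refId)) {
        warnings.push(`step '${step.id}' references unknown step '${refId}' in its input`);
      }
    }
  }

  // 2. (Second check not shown in this excerpt.)

  // 3. Ordering hints (advisory).
  const firstAction = template.spec.steps[0]?.action;
  if (firstAction && !firstAction.startsWith('fetch:')) {
    warnings.push(
      `first step uses '${firstAction}'; templates typically start with a fetch:* step to populate the workspace`,
    );
  }
  const publishIdx = template.spec.steps.findIndex(s => s.action.startsWith('publish:'));
  const registerIdx = template.spec.steps.findIndex(s => s.action === 'catalog:register');
  if (publishIdx >= 0 && registerIdx >= 0 && publishIdx > registerIdx) {
    warnings.push(
      `catalog:register appears before publish:*; the published repo URL is normally registered after publishing`,
    );
  }
  return warnings;
}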

Validator findings don't fail the request; they come back in the warnings[] field. That keeps the endpoint useful during early iteration, when the model occasionally produces something that is almost right.
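Response assembly is where validator output becomes data rather than an error. A sketch, assuming the response fields from the example at the top of the post; deriving citations.actionsUsed from the steps in first-use order is a guess at how that citation is computed, and `buildResponse` plus the `*Like` types are hypothetical.

```typescript
interface StepLike { id: string; action: string }
interface TemplateLike { spec: { steps: StepLike[] } }

interface GenerateResult<T> {
  yaml: string;
  template: T;
  citations: { referenceTemplates: string[]; actionsUsed: string[] };
  warnings: string[];
}

// Nothing here throws on warnings, so the handler can answer 200 and let the caller decide.
function buildResponse<T extends TemplateLike>(
  template: T,
  yaml: string,
  referenceTemplates: string[],
  warnings: string[],
): GenerateResult<T> {
  // Unique action ids, in order of first use across the steps.
  const actionsUsed = template.spec.steps
    .map(s => s.action)
    .filter((action, i, all) => all.indexOf(action) === i);
  return { yaml, template, citations: { referenceTemplates, actionsUsed }, warnings };
}
```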

The expected next iteration is a one-shot self-correction pass: when the validator returns warnings, feed them back to the LLM as a second prompt with “fix these specific issues, keep everything else.” Bounded to one extra call so cost doesn’t compound.
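The one-shot repair loop is easy to bound in code. A sketch under assumptions: `generate` calls the model (optionally with the previous attempt and its warnings as feedback) and `validate` returns the semantic warnings; both signatures and `generateWithOneRepair` are hypothetical, and keeping whichever attempt has fewer warnings is a design choice of this sketch, not something the plugin does today.

```typescript
// Hypothetical signatures for the model call and the semantic validator.
type Generate<T> = (description: string, feedback?: { previous: T; warnings: string[] }) => Promise<T>;
type Validate<T> = (template: T) => string[];

// At most one repair call, so cost is capped at two model calls per request.
async function generateWithOneRepair<T>(
  description: string,
  generate: Generate<T>,
  validate: Validate<T>,
): Promise<{ template: T; warnings: string[]; repaired: boolean }> {
  const first = await generate(description);
  const firstWarnings = validate(first);
  if (firstWarnings.length === 0) {
    return { template: first, warnings: [], repaired: false };
  }
  // Second prompt: "fix these specific issues, keep everything else."
  const second = await generate(description, { previous: first, warnings: firstWarnings });
  const secondWarnings = validate(second);
  // Keep the attempt with fewer warnings; any that remain still go back to the caller.
  return secondWarnings.length <= firstWarnings.length
    ? { template: second, warnings: secondWarnings, repaired: true }
    : { template: first, warnings: firstWarnings, repaired: false };
}
```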

What surprised me

Two things, in order of importance:

1. The no-cond-assign ESLint rule blocks the classic while ((m = regex.exec(s)) !== null) pattern. I'd written the step-reference extractor as a while loop that assigns the exec result inside the condition; it works fine, and ten thousand JavaScript files do it. Backstage's lint config rejects it. The fix is s.matchAll(regex), which returns an iterator of matches (the regex must carry the g flag, or matchAll throws a TypeError) and reads much cleaner anyway. Solid rule.

2. Defaulting spec.owner to a static fallback hides a real signal. The model frequently leaves spec.owner blank: it has no way to know which Backstage group should own the new template. I added a defaultOwner config option (group:default/unowned by default) so the response is at least schema-valid. But applying the default silently would make the missing-owner case invisible to the caller, so the warnings field surfaces it: "spec.owner was missing; defaulted to 'X'". The caller can then decide whether to prompt the user for a real owner before saving.
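Both fixes are small. A sketch of each, assuming the same step-reference pattern as the validator above and the defaultOwner option as described; `stepRefs` and `applyDefaultOwner` are hypothetical names, not the plugin's functions.

```typescript
// Item 1: collect ${{ steps.<id>. }} references with matchAll instead of a while/exec loop.
// matchAll throws a TypeError unless the regex has the g flag.
const STEP_REF = /\$\{\{\s*steps\.([a-zA-Z0-9_-]+)\./g;

function stepRefs(text: string): string[] {
  return Array.from(text.matchAll(STEP_REF), m => m[1]);
}

// Item 2: fill a blank spec.owner from config, and say so in the warnings.
function applyDefaultOwner<T extends { spec: { owner?: string } }>(
  template: T,
  defaultOwner = 'group:default/unowned',
): { template: T; warnings: string[] } {
  const owner = template.spec.owner?.trim();
  if (owner) {
    return { template, warnings: [] };
  }
  return {
    template: { ...template, spec: { ...template.spec, owner: defaultOwner } },
    warnings: [`spec.owner was missing; defaulted to '${defaultOwner}'`],
  };
}
```

matchAll works on an internal copy of the regex, so the shared global `STEP_REF` keeps no `lastIndex` state between calls, which is one more thing the exec loop got wrong easily.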

Where this fits in the series

Three of the four plugins in the AI-on-Backstage track are now live on the fork:

| Phase | Plugin | Status |
|---|---|---|
| 1 | scaffolder-backend-module-mcp | ✅ shipped |
| 2 | catalog-assistant-backend | ✅ shipped |
| 3 | template-authoring-backend | this post |
| 4 | Incident investigation co-pilot (backstage-corp) | 📐 in design |

All three are built on the same Vercel AI SDK calls, so when the BEP-0015 AI Model Provider Service lands, each should move behind it with a contained refactor.

Phase 4 is the diagonal of the 2x2 from post 4: it composes the other three as infrastructure. That’s what the next post in the series will cover.

Code

The branch is feat/template-authoring-backend on Naga15/backstage: 19 unit tests across 4 files, lint clean. It isn't opened as an upstream PR yet; net-new plugin contributions to Backstage want RFC signal first.

Next post: Phase 4. Reading the catalog, reading logs and traces, reading recent deploys — and acting on what’s read, within a pre-approved scope. Build-first, blog the journey.