Generating Backstage scaffolder templates from one sentence
Building template-authoring-backend — zod-constrained LLM output, semantic post-validation, and the future ActionsRegistry hook.
Tags: backstage, ai, llm, scaffolder, claude, typescript, structured-output
The previous two posts in this series shipped two AI plugins for Backstage: an MCP scaffolder client (post 2) and a catalog-aware LLM Q&A backend (post 3). This post covers the third plugin: a template-authoring backend.
The pitch: you describe what you want — “a Node.js microservice with Express, structured logging, and GitHub Actions CI” — and the plugin returns a runnable Backstage scaffolder Template entity YAML. With citations. Constrained at the model level so the LLM literally cannot emit an invalid Template structure.
What the endpoint does
POST /api/template-authoring/v1/generate takes:
{
  "description": "A Node.js microservice with Express, OTel tracing, and GitHub Actions CI",
  "referenceTemplates": ["template:default/nodejs-base"]
}
And returns:
{
  "yaml": "apiVersion: scaffolder.backstage.io/v1beta3\nkind: Template\n...",
  "template": { "apiVersion": "...", "kind": "Template", ... },
  "citations": {
    "referenceTemplates": ["template:default/nodejs-base"],
    "actionsUsed": ["fetch:template", "publish:github", "catalog:register"]
  },
  "warnings": []
}
referenceTemplates is optional. When supplied, each entity ref is
fetched from the catalog and embedded in the user prompt as an example
of the step layout — but the system prompt tells the model not to
return them verbatim.
Architecture
Four moving parts, plus the router:
- wellKnownActions.ts: a curated catalog of 12 scaffolder action ids (fetch:template, publish:github, catalog:register, …) with input shape sketches. Injected into the LLM system prompt.
- ReferenceTemplateLoader: fetches Template entities by ref; rejects refs whose kind isn’t Template.
- TemplateGenerationService: builds the prompt, calls generateObject with a zod schema constraining the output, defaults missing spec.owner, stringifies to YAML.
- TemplateValidator: runs semantic checks the zod schema can’t.
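The kind check in ReferenceTemplateLoader is easy to picture without the catalog client. What follows is a minimal sketch, not the plugin's code: `parseTemplateRef` and `ParsedRef` are hypothetical names, and it assumes callers always send the full `kind:namespace/name` form shown in the request example. A real implementation would more likely lean on `parseEntityRef` from `@backstage/catalog-model`.

```typescript
// Hypothetical helper, not the plugin's actual code: parse a full
// `kind:namespace/name` ref and accept it only when the kind is Template.
interface ParsedRef {
  kind: string;
  namespace: string;
  name: string;
}

function parseTemplateRef(ref: string): ParsedRef {
  const match = /^([^:/]+):([^:/]+)\/([^:/]+)$/.exec(ref.trim());
  if (!match) {
    throw new Error(`'${ref}' is not a full entity ref (kind:namespace/name)`);
  }
  const [, kind, namespace, name] = match;
  // Assumption: kinds are compared case-insensitively.
  if (kind.toLowerCase() !== 'template') {
    throw new Error(`'${ref}' has kind '${kind}', expected Template`);
  }
  return { kind: 'Template', namespace, name };
}
```

Rejecting the ref before any catalog call keeps a stray `component:` ref from ever being embedded in the prompt as if it were an example template.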
Why generateObject and not generateText
If you ask an LLM for “YAML,” you sometimes get back markdown code fences. Sometimes you get back commentary around the YAML. Sometimes you get back invalid YAML that needs re-parsing.
generateObject (from the Vercel AI SDK) asks the model for JSON that
conforms to a zod schema and validates the response against that schema
before returning it. How tightly the output is constrained during
generation depends on the provider's structured-output support. For
Backstage Templates, JSON and YAML are isomorphic, so we get JSON from
the model and emit YAML to the caller via yaml.stringify.
The schema:
export const TemplateSchema = z.object({
  apiVersion: z.literal('scaffolder.backstage.io/v1beta3'),
  kind: z.literal('Template'),
  metadata: z.object({
    name: z.string().regex(
      /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/,
      'metadata.name must be kebab-case (lowercase, hyphens)',
    ),
    title: z.string().optional(),
    description: z.string().optional(),
    tags: z.array(z.string()).optional(),
  }),
  spec: z.object({
    owner: z.string(),
    type: z.string(),
    parameters: z.array(/* ... */).optional(),
    steps: z.array(z.object({
      id: z.string(),
      name: z.string().optional(),
      action: z.enum(STEP_ACTION_IDS), // ← curated whitelist
      input: z.record(z.unknown()).optional(),
      if: z.string().optional(),
    })).min(1, 'spec.steps must contain at least one step'),
    output: z.record(z.unknown()).optional(),
  }),
});
The action: z.enum(STEP_ACTION_IDS) is the load-bearing constraint.
STEP_ACTION_IDS is built from the well-known-actions catalog —
which means the model literally cannot invent an action name. If it
tries to emit action: "wishful:thinking", the response is rejected
at the SDK level before it ever reaches my code.
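For a sense of how STEP_ACTION_IDS could be derived from the catalog, here is a sketch under assumptions: the `WellKnownAction` shape, the `summary` text, and the helper names are all hypothetical, and only three of the twelve ids are listed. The one hard requirement it reflects is that z.enum wants a non-empty tuple of strings.

```typescript
// Assumed catalog shape; the real wellKnownActions.ts may look different.
interface WellKnownAction {
  id: string;
  summary: string;
}

// Three of the ids named in this post; the real catalog has 12.
const WELL_KNOWN_ACTIONS: WellKnownAction[] = [
  { id: 'fetch:template', summary: 'Render a skeleton into the workspace' },
  { id: 'publish:github', summary: 'Create a GitHub repo and push the workspace' },
  { id: 'catalog:register', summary: 'Register the new entity in the catalog' },
];

// z.enum needs a non-empty tuple, so fail fast on an empty catalog.
function toActionIdTuple(actions: WellKnownAction[]): [string, ...string[]] {
  if (actions.length === 0) {
    throw new Error('well-known actions catalog must not be empty');
  }
  const [first, ...rest] = actions.map(a => a.id);
  return [first, ...rest];
}

const STEP_ACTION_IDS = toActionIdTuple(WELL_KNOWN_ACTIONS);

// The same membership test the enum performs, usable outside zod.
function isKnownAction(id: string): boolean {
  return STEP_ACTION_IDS.includes(id);
}
```

Keeping the ids and their descriptions in one array means the system prompt and the schema cannot drift apart: both are generated from the same source.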
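The kebab-case regex on metadata.name is the second load-bearing
constraint. Backstage validates entity names at catalog ingestion time,
and its default rules are somewhat looser than this regex (they also
allow uppercase letters, underscores, and dots, up to 63 characters).
Enforcing the stricter kebab-case form at generation time means a
generated name won't be rejected for its characters, though the regex
itself does not cap the length.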
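To make the regex concrete, this small sketch runs the pattern from the schema against a few names. The candidate names are made up for illustration.

```typescript
// The pattern used for metadata.name in TemplateSchema.
const NAME_PATTERN = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

// Illustrative candidates only; none of these come from a real catalog.
const candidates = [
  'node-express-service',
  'a',
  'Node-Service',
  '-leading',
  'trailing-',
  'snake_case',
];

for (const name of candidates) {
  const verdict = NAME_PATTERN.test(name) ? 'accepted' : 'rejected';
  console.log(`${name}: ${verdict}`);
}
```

Only the first two pass. Uppercase letters, underscores, and a leading or trailing hyphen are all rejected.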
Why a curated action catalog (for now)
The cleanest version of this plugin would source the list of available
actions from the runtime ActionsRegistry — so the model only ever
sees actions the host backend has actually loaded, including third-party
actions like Phase 1’s mcp:call. That’s the natural next iteration.
v1 uses a static curated catalog of 12 action ids hard-coded in the
plugin. The trade-off: the plugin runs standalone without having to wire
into another plugin’s internals. The catalog covers the common path
(fetch:*, publish:github, publish:gitlab, catalog:register,
debug:log, filesystem:*, plus mcp:call as a hand-wave at Phase 1).
The README spells this out as a known limitation. The next revision should source from the registry.
Semantic validation beyond the schema
TemplateValidator runs three checks zod can’t. The excerpt below shows the first and third:
// 1. Step references inside ${{ steps.X.* }} must resolve to declared step ids.
const refRegex = /\$\{\{\s*steps\.([a-zA-Z0-9_-]+)\./g;
for (const step of template.spec.steps) {
  for (const refId of extractStepRefs(step.input, refRegex)) {
    if (!stepIds.has(refId)) {
      warnings.push(`step '${step.id}' references unknown step '${refId}' in its input`);
    }
  }
}
// 3. Ordering hints (advisory).
const firstAction = template.spec.steps[0]?.action;
if (firstAction && !firstAction.startsWith('fetch:')) {
  warnings.push(
    `first step uses '${firstAction}'; templates typically start with a fetch:* step to populate the workspace`,
  );
}
const publishIdx = template.spec.steps.findIndex(s => s.action.startsWith('publish:'));
const registerIdx = template.spec.steps.findIndex(s => s.action === 'catalog:register');
if (publishIdx >= 0 && registerIdx >= 0 && publishIdx > registerIdx) {
  warnings.push(
    `catalog:register appears before publish:*; the published repo URL is normally registered after publishing`,
  );
}
A failed check doesn’t fail the request; it comes back as an entry in
the warnings[] field. That keeps the endpoint useful during early
iteration, when the model occasionally produces something that is
almost right.
The expected next iteration is a one-shot self-correction pass: when the validator returns warnings, feed them back to the LLM as a second prompt with “fix these specific issues, keep everything else.” Bounded to one extra call so cost doesn’t compound.
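A rough shape for that bounded retry, written against an injected generator so it doesn't depend on any SDK. Everything here is hypothetical: `GenerationResult`, `Generate`, and `generateWithOneCorrection` are names made up for the sketch, and the correction prompt wording is only a guess at what would work.

```typescript
// Hypothetical shapes; the plugin's real types are not shown in this post.
interface GenerationResult {
  yaml: string;
  warnings: string[];
}

type Generate = (prompt: string) => Promise<GenerationResult>;

// Runs at most two generations: the original, plus one correction pass
// if (and only if) the first attempt came back with warnings.
async function generateWithOneCorrection(
  generate: Generate,
  description: string,
): Promise<GenerationResult> {
  const first = await generate(description);
  if (first.warnings.length === 0) {
    return first;
  }
  const correctionPrompt = [
    description,
    '',
    'Your previous attempt was:',
    first.yaml,
    '',
    'Fix these specific issues, keep everything else:',
    ...first.warnings.map(w => `- ${w}`),
  ].join('\n');
  // No loop: the second result is returned even if it still has warnings.
  return generate(correctionPrompt);
}
```

Because there is no loop, the worst case is exactly two model calls per request, and any warnings left after the second pass still reach the caller.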
What surprised me
Two things, in order of importance:
1. The no-cond-assign ESLint rule blocks the classic while ((m = regex.exec(s)) !== null) pattern.
I’d written the step-reference extractor as a while loop that assigns
the exec result inside the condition. It works fine, and ten thousand
JavaScript files do it, but Backstage’s lint config rejects it. The fix
is s.matchAll(regex), which returns an iterator and reads more cleanly
anyway. Solid rule.
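Here is roughly what a matchAll-based extractor can look like. It is a sketch rather than the plugin's helper: the recursive walk over nested input values is my assumption about what extractStepRefs has to do, and `STEP_REF` is just a local name for the regex used by the validator.

```typescript
// Same pattern the validator uses for ${{ steps.<id>.* }} references.
const STEP_REF = /\$\{\{\s*steps\.([a-zA-Z0-9_-]+)\./g;

// Hypothetical extractor: walks any JSON-like value and collects the
// step ids referenced inside string leaves.
function extractStepRefs(value: unknown, regex: RegExp = STEP_REF): string[] {
  if (typeof value === 'string') {
    // matchAll requires the g flag and throws a TypeError without it.
    return Array.from(value.matchAll(regex), m => m[1]);
  }
  if (Array.isArray(value)) {
    return value.flatMap(item => extractStepRefs(item, regex));
  }
  if (value !== null && typeof value === 'object') {
    return Object.values(value).flatMap(item => extractStepRefs(item, regex));
  }
  // Numbers, booleans, null, and undefined cannot contain references.
  return [];
}
```

There is no assignment inside a loop condition, so the no-cond-assign rule has nothing to flag.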
2. Defaulting spec.owner to a static fallback hides a real signal.
The model frequently leaves spec.owner blank — there’s no way for it
to know which Backstage group should own the new template. I implemented
a defaultOwner config option (group:default/unowned by default) so
the response is at least schema-valid. But emitting it silently would
make the missing-owner case invisible to the caller. The warnings
field surfaces it: "spec.owner was missing; defaulted to 'X'" —
the caller can decide whether to prompt the user for a real owner
before saving.
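The defaulting step can be sketched as a pure function that returns the patched template together with its warning. The names `DraftTemplate` and `applyDefaultOwner` are hypothetical, and treating a whitespace-only owner as missing is my assumption, not something the plugin is known to do.

```typescript
// Hypothetical shape: just enough of the template to show the defaulting.
interface DraftTemplate {
  spec: { owner?: string; [key: string]: unknown };
}

function applyDefaultOwner(
  draft: DraftTemplate,
  defaultOwner = 'group:default/unowned',
): { template: DraftTemplate; warnings: string[] } {
  // Assumption: a blank or whitespace-only owner counts as missing.
  const owner = draft.spec.owner?.trim();
  if (owner) {
    return { template: draft, warnings: [] };
  }
  return {
    // Copy rather than mutate, so the model's raw output stays inspectable.
    template: { ...draft, spec: { ...draft.spec, owner: defaultOwner } },
    warnings: [`spec.owner was missing; defaulted to '${defaultOwner}'`],
  };
}
```

Returning the warning alongside the template makes it hard to apply the fallback without also reporting it.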
Where this fits in the series
Three of the four plugins in the AI-on-Backstage track are now live on the fork:
| Phase | Plugin | Status |
|---|---|---|
| 1 | scaffolder-backend-module-mcp | ✅ shipped |
| 2 | catalog-assistant-backend | ✅ shipped |
| 3 | template-authoring-backend | ✅ this post |
| 4 | Incident investigation co-pilot (backstage-corp) | 📐 in design |
Each is built on the same Vercel-AI-SDK shape, so once the BEP-0015 AI Model Provider Service lands, all three can move behind it with a contained refactor.
Phase 4 is the diagonal of the 2x2 from post 4: it composes the other three as infrastructure. That’s what the next post in the series will cover.
Code
The branch lives at Naga15/backstage, feat/template-authoring-backend.
19 unit tests across 4 files. Lint clean. Not upstream-PR’d yet —
net-new plugin contributions in Backstage want RFC signal first.
Next post: Phase 4. Reading the catalog, reading logs and traces, reading recent deploys — and acting on what’s read, within a pre-approved scope. Build-first, blog the journey.