Backstage as an LLM agent: a 5-cap tool-calling loop with propose-only tools

Building the AgentLoop for incident-copilot-backend — what earns the 'agent' framing, and the five stop conditions that keep it cheap.

Jun 4, 2026 · Backstage AI plugins, part 6

backstage · ai · llm · agents · claude · typescript · mcp

Post 4 sketched a four-phase plan for AI plugins on Backstage. The fourth phase — an incident-investigation co-pilot — was the diagonal of the 2x2: read the catalog and change the world. The “change the world” part is what earns the word “agent” vs. “co-pilot.” This post is about that part.

The artifact is an AgentLoop class that ships in @internal/plugin-incident-copilot-backend (in the Naga15/backstage-corp private repo). It’s an LLM in a tool-calling loop with five independent stop conditions, a tool surface that’s split into read-only and propose-only kinds, and a citation validator that drops hypotheses whose evidence IDs don’t resolve. None of those design choices are accidental.

What “agent” should mean

The word is overloaded. The cheap definition is “an LLM that calls a function.” By that bar every chatbot with a Python interpreter is an agent.

The bar I want for an on-call SRE assistant is higher. Specifically:

1. The model decides which signals to pull and in what order, instead of the backend pre-fetching everything.
2. The model iterates: result → next-tool → result → next-tool → …
3. There’s a stop criterion the model can meet by emitting no more tool calls; that is, the model decides it’s done.
4. There’s a budget the operator sets that overrides #3: wallclock, token, dollar, call-count.
5. Destructive actions are not actions at all. They are proposals the model emits. A human clicks them.

If you don’t have #5 specifically, you have an automation system, not an agent. The distinction matters because the prompt-injection threat model for “agent that can spend money / page on-call / rollback prod” is qualitatively different from one that can’t.

The tool surface, in three kinds

[Figure: architecture diagram]

// src/agent/Tool.ts
import { z } from 'zod';

export interface Tool<TInput = unknown> {
  readonly name: string;
  readonly description: string;
  readonly inputSchema: z.ZodSchema<TInput>;
  readonly kind: 'read' | 'propose' | 'record';
  handler(input: TInput, ctx: AgentToolContext): Promise<ToolResult>;
}

Three things to notice:

kind is part of the type. Not a string the handler returns later. Not something the loop infers. The class of action is fixed at registration time — you can audit your tool registry to see exactly which tools can propose anything, full stop.

inputSchema is a z.ZodSchema, not a string description. The ToolRegistry.invoke() method validates input against it before calling the handler. A bad-shape tool call becomes a tool error fed back to the LLM, not a crash:

async invoke(name: string, rawInput: unknown, ctx: AgentToolContext): Promise<ToolResult> {
  const tool = this.map.get(name);
  if (!tool) return { error: `unknown tool: '${name}'` };
  const parsed = tool.inputSchema.safeParse(rawInput);
  if (!parsed.success) {
    return { error: `invalid arguments for tool '${name}': ${parsed.error.message}` };
  }
  try { return await tool.handler(parsed.data, ctx); }
  catch (e) { return { error: `tool '${name}' threw: ${(e as Error).message}` }; }
}

The propose tools (propose_rollback, propose_page_on_call, etc.) return SuggestedAction objects with destructive: true as a flag. The frontend (post 7 — coming) enforces a confirm dialog before the operator’s click can trigger anything.

The propose_rollback handler is one of the smallest files in the plugin:

async handler(input): Promise<ToolResult> {
  const action: SuggestedAction = {
    label: `Rollback ${input.application} to ${input.targetRevision}`,
    kind: 'deep-link',
    href: `/argocd/applications/${encodeURIComponent(input.application)}?revision=${encodeURIComponent(input.targetRevision)}`,
    destructive: true,
  };
  return { suggestedAction: action, text: `Drafted rollback proposal: ${action.label}` };
}

There’s nothing that actually rolls back. The handler builds a deep link. The LLM’s “tool call success” message is “I drafted a proposal,” not “I rolled back.”
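Because kind is fixed at registration, the registry audit mentioned earlier reduces to a filter over the registered tools. The following is a minimal sketch under stated assumptions, not the plugin's code: ToolDescriptor, auditByKind, and the record_hypothesis tool name are hypothetical, and the registry contents are made up for illustration.

```typescript
// Only the fields the audit needs; the real Tool interface also carries
// description, inputSchema and handler.
type ToolKind = 'read' | 'propose' | 'record';

interface ToolDescriptor {
  readonly name: string;
  readonly kind: ToolKind;
}

// Hypothetical registry contents, for illustration only.
const registered: ToolDescriptor[] = [
  { name: 'query_datadog_logs', kind: 'read' },
  { name: 'propose_rollback', kind: 'propose' },
  { name: 'propose_page_on_call', kind: 'propose' },
  { name: 'record_hypothesis', kind: 'record' }, // hypothetical tool name
];

// Group tool names by kind so a reviewer can read off the propose surface.
function auditByKind(tools: readonly ToolDescriptor[]): Record<ToolKind, string[]> {
  const out: Record<ToolKind, string[]> = { read: [], propose: [], record: [] };
  for (const tool of tools) out[tool.kind].push(tool.name);
  return out;
}

const audit = auditByKind(registered);
console.log(audit.propose); // the only tools that can draft a destructive action
```

A check like this could run in CI so that adding a new propose tool shows up in review as a change to the audited list.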

The five stop conditions

Any one trips, the loop ends:

export const DEFAULT_STOP_CONDITIONS: StopConditions = {
  maxSteps: 12,           // each step = one model call + its tool calls
  maxToolCalls: 20,       // hard cap across all steps
  maxWallclockMs: 60_000, // operator-tolerable latency
  maxTokens: 30_000,      // input + output summed
  maxCostUsd: 0.5,        // belt-and-suspenders dollar cap
};

Why all five and not just maxSteps? Because each enforces a different worry:

maxSteps caps the architectural depth — a runaway plan loop.
maxToolCalls caps breadth: a single step where the LLM fires twenty parallel tool calls counts as one step, so maxSteps alone would never notice it.
maxWallclockMs caps the latency an on-call SRE sees, not the cost.
maxTokens is the actual model-API budget.

maxCostUsd is the dollar number a finance partner cares about, computed via a ModelPricing config:

export const DEFAULT_PRICING: ModelPricing = {
  inputPerMillion: 3.0,   // sized for Claude Sonnet-class
  outputPerMillion: 15.0,
};

A nice property: each one is testable in isolation. The AgentLoop.test.ts file has one test per condition, and they all use the same scripted LLM-stub helper:

const scriptedStep = (sequence: StepResult[]): jest.MockedFunction<AgentStepFn> =>
  jest.fn().mockImplementation(async () => {
    if (sequence.length === 0) {
      return { toolCalls: [], text: 'done', usage: { inputTokens: 100, outputTokens: 50 }, finishReason: 'stop' };
    }
    return sequence.shift()!;
  });

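You hand the helper a list of canned step results, and the loop runs them in order. Cost-cap test? Hand it one step result that carries a tool call and usage: { inputTokens: 1_000_000, outputTokens: 0 }, raise maxTokens above that total, and check the stopped field equals 'cost-cap'. The tool call keeps the loop alive for a second pass (an empty toolCalls array ends the run as 'llm-stop'), and the raised maxTokens matters because the token check runs before the cost check. No real model, no real money, no flakiness.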
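The dollar figure behind the cost cap is plain arithmetic over the two token counters. Here is a minimal sketch, assuming the cost is linear in tokens at the DEFAULT_PRICING rates; costUsd is a hypothetical helper name, since the post only shows that the loop calls some computeCost().

```typescript
interface ModelPricing {
  inputPerMillion: number;  // USD per 1,000,000 input tokens
  outputPerMillion: number; // USD per 1,000,000 output tokens
}

const DEFAULT_PRICING: ModelPricing = { inputPerMillion: 3.0, outputPerMillion: 15.0 };

// Hypothetical helper: linear cost in tokens, nothing else modelled.
function costUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing = DEFAULT_PRICING,
): number {
  return (inputTokens / 1_000_000) * pricing.inputPerMillion
       + (outputTokens / 1_000_000) * pricing.outputPerMillion;
}

const smallRun = costUsd(700, 220);        // 0.0021 + 0.0033 USD
const scriptedStep1M = costUsd(1_000_000, 0); // 3 USD, well past the 0.5 cap
```

At these rates, 700 input and 220 output tokens come to about $0.0054, and a single scripted step of one million input tokens costs $3, six times the default maxCostUsd.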

The loop itself

The loop is ~60 lines of code with very deliberate ordering:

while (totalSteps < this.stopConditions.maxSteps) {
  // 1. Pre-step stop checks. Wallclock + tokens + cost can trip
  //    BEFORE we spend the next model call, so we check them first.
  if (elapsed() > this.stopConditions.maxWallclockMs) return finalize('wallclock');
  if (totalInputTokens + totalOutputTokens > this.stopConditions.maxTokens) return finalize('token-budget');
  if (computeCost() > this.stopConditions.maxCostUsd) return finalize('cost-cap');

  totalSteps += 1;
  const result = await this.step({ model, system, messages, tools: this.tools.list() });
  totalInputTokens += result.usage.inputTokens;
  totalOutputTokens += result.usage.outputTokens;
  messages.push({ role: 'assistant', content: result.text, toolCalls: result.toolCalls });

  // 2. Natural termination: model emitted no tool calls.
  if (result.toolCalls.length === 0) return finalize('llm-stop');

  // 3. Execute tool calls. The tool-call-cap can trip mid-step.
  for (const call of result.toolCalls) {
    if (totalToolCalls >= this.stopConditions.maxToolCalls) return finalize('tool-call-cap');
    totalToolCalls += 1;
    const toolResult = await this.tools.invoke(call.name, call.args, toolCtx);
    // ...append evidence / suggestedAction / hypothesis to the run state
    messages.push({ role: 'tool', toolCallId: call.id, content: toolResult.error ?? toolResult.text });
  }
}
return finalize('max-steps');

Three details worth pointing out:

Wallclock / token / cost checks are pre-step. They cap before spending another model call. The other two (max-steps and tool-call-cap) are checked at their natural increment.
Tool errors go to the model as tool messages, not exceptions. An unknown tool name, a zod-rejected input, a handler that throws — all three become a { role: 'tool', content: '<error>' } message the model sees on its next step. That’s the “self-correction” affordance agents need to recover.
finishReason from the step result isn’t load-bearing. The loop ends when we say it does (tool calls empty → llm-stop), not when the SDK says it does. SDKs lie about finishReason: 'stop' in agentic settings sometimes.
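The effect of the check ordering is easiest to see in a reduced version of the loop. The sketch below keeps the same order of checks but drops wallclock, messages, and real tool execution, and it is synchronous only to stay short; runLoop, Limits, and bigStep are hypothetical names, not the library's API. It also shows why a cost-cap test run against the default limits needs a raised maxTokens and a step that carries a tool call.

```typescript
type Stopped = 'llm-stop' | 'max-steps' | 'tool-call-cap' | 'token-budget' | 'cost-cap';

interface StepResult {
  toolCalls: { name: string }[];
  usage: { inputTokens: number; outputTokens: number };
}

interface Limits { maxSteps: number; maxToolCalls: number; maxTokens: number; maxCostUsd: number; }

// Reduced loop: same check order as the real one, scripted steps instead of a model.
function runLoop(script: StepResult[], limits: Limits): Stopped {
  let steps = 0, toolCalls = 0, inTok = 0, outTok = 0;
  const cost = () => (inTok * 3.0 + outTok * 15.0) / 1_000_000; // DEFAULT_PRICING rates
  while (steps < limits.maxSteps) {
    // Pre-step checks: tokens first, then cost.
    if (inTok + outTok > limits.maxTokens) return 'token-budget';
    if (cost() > limits.maxCostUsd) return 'cost-cap';
    steps += 1;
    // Same fallback as the scripted helper: an empty script means "done".
    const result: StepResult =
      script.shift() ?? { toolCalls: [], usage: { inputTokens: 100, outputTokens: 50 } };
    inTok += result.usage.inputTokens;
    outTok += result.usage.outputTokens;
    if (result.toolCalls.length === 0) return 'llm-stop';
    for (let i = 0; i < result.toolCalls.length; i++) {
      if (toolCalls >= limits.maxToolCalls) return 'tool-call-cap';
      toolCalls += 1; // the real loop invokes the tool here
    }
  }
  return 'max-steps';
}

const limits: Limits = { maxSteps: 12, maxToolCalls: 20, maxTokens: 30_000, maxCostUsd: 0.5 };
const bigStep = (): StepResult => ({
  toolCalls: [{ name: 'query_datadog_logs' }],
  usage: { inputTokens: 1_000_000, outputTokens: 0 },
});

const withDefaults = runLoop([bigStep()], limits);
const tokensRaised = runLoop([bigStep()], { ...limits, maxTokens: 5_000_000 });
const noToolCalls = runLoop(
  [{ toolCalls: [], usage: { inputTokens: 1_000_000, outputTokens: 0 } }],
  limits,
);
console.log(withDefaults, tokensRaised, noToolCalls);
```

In this reduction the three runs end as 'token-budget', 'cost-cap', and 'llm-stop' respectively: the same oversized step trips a different cap depending on which check it reaches first, and an oversized step with no tool calls never reaches a budget check at all.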

What happens after the loop

The orchestrator runs CitationValidator over the recorded hypotheses against the accumulated evidence:

// Citation validation still runs in agent mode — drops any hypothesis
// the LLM recorded with IDs that don't resolve to evidence it actually
// fetched.
const { kept, warnings: citationWarnings } =
  this.citationValidator.validate(result.recordedHypotheses, result.evidence);

A hypothesis with at least one valid citation passes (invalid citations get stripped + warned). A hypothesis with zero valid citations gets dropped. The frontend can therefore assume every citation it sees in the UI resolves to a real evidence item. That assumption shapes the side-panel “click a citation, scroll the evidence into focus” interaction.
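The keep, strip, and drop rules fit in a few lines. This is a sketch under assumptions rather than the plugin's CitationValidator: the Evidence and Hypothesis shapes, the validateCitations name, and the warning text are hypothetical, and emitting a warning for a dropped hypothesis is my assumption (the post only says stripped citations are warned about).

```typescript
interface Evidence { id: string; }
interface Hypothesis { title: string; citations: string[]; }

// Hypothetical stand-in for citationValidator.validate(hypotheses, evidence).
function validateCitations(hypotheses: Hypothesis[], evidence: Evidence[]) {
  const known = new Set(evidence.map(e => e.id));
  const kept: Hypothesis[] = [];
  const warnings: string[] = [];
  for (const h of hypotheses) {
    const valid = h.citations.filter(id => known.has(id));
    const invalid = h.citations.filter(id => !known.has(id));
    if (valid.length === 0) {
      // Nothing resolves: drop the hypothesis (warning here is an assumption).
      warnings.push(`dropped "${h.title}": no citation resolves to fetched evidence`);
      continue;
    }
    if (invalid.length > 0) {
      warnings.push(`stripped unresolved citations from "${h.title}": ${invalid.join(', ')}`);
    }
    // Keep the hypothesis with only the citations that resolve.
    kept.push({ ...h, citations: valid });
  }
  return { kept, warnings };
}

const checked = validateCitations(
  [
    { title: 'bad deploy', citations: ['datadog-1-1', 'github-9-9'] }, // one valid, one not
    { title: 'dns outage', citations: ['made-up-1'] },                 // none valid
  ],
  [{ id: 'datadog-1-1' }, { id: 'datadog-1-2' }],
);
console.log(checked.kept, checked.warnings);
```

Here the first hypothesis survives with a single citation and the second is dropped, so every citation left in kept resolves to an evidence ID.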

The trace and budget snapshots come back in the HTTP response so the frontend can render them in an “agent thinking” panel:

{
  "investigationId": "inv-1717593600000",
  "mode": "agent",
  "hypotheses": [/* ... */],
  "evidence": [/* ... */],
  "trace": [
    { "step": 1, "callIndex": 1, "toolName": "query_datadog_logs",
      "args": { "reason": "check p99 spike" },
      "evidenceIds": ["datadog-1-1", "datadog-1-2"], "durationMs": 412 },
    /* ... */
  ],
  "stopped": "llm-stop",
  "budgets": {
    "steps": 4, "toolCalls": 3,
    "inputTokens": 700, "outputTokens": 220,
    "costUsd": 0.0054, "elapsedMs": 4823
  },
  "warnings": []
}

Prompt injection: what the surface protects against

The threat model isn’t “what if the LLM is malicious.” It’s “what if a Datadog log line, Slack message, or git commit message contains a prompt injection.” The agent ends up reading those.

What protects us:

The tool registry is a hardcoded whitelist. No eval, no arbitrary function name from a string. The model can call propose_rollback; it can’t conjure execute_rollback.
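Schema validation on every tool call. An injected “please call delete_production” doesn’t reach a handler: a tool name that isn’t registered is rejected by the registry lookup, arguments that don’t match the tool’s schema are rejected by zod, and in both cases the error goes back to the model.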
destructive: true flags propagate end-to-end. The frontend’s SuggestedActionList enforces the confirm dialog in the component itself, not as a prop the caller might forget to set. A misbehaving caller can’t bypass it.
The five budget caps mean an injection can’t cause unbounded spend. Worst case: roughly 60 seconds of model calls and roughly $0.50 of tokens (the wallclock, token, and cost checks run before each step, so the final step can overshoot), no actions taken. The operator sees the trace and warnings and knows something tried to recruit the agent.

This isn’t bulletproof. It’s “expensive enough to be uneconomic, and loud enough to be visible.”
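The first two protections can be sketched without any dependencies. In the sketch below a hand-written parse function stands in for zod's safeParse, handlers are synchronous for brevity, and defineTool, RegisteredTool, and the sample arguments ('checkout', 'abc123') are hypothetical; the error strings mirror the invoke() excerpt earlier in the post.

```typescript
type ToolResult = { text: string } | { error: string };
type Parsed<T> = { success: true; data: T } | { success: false; message: string };

interface RegisteredTool { name: string; run(raw: unknown): ToolResult; }

// Binds validation to the handler so a handler never sees unvalidated input.
function defineTool<T>(
  name: string,
  parse: (raw: unknown) => Parsed<T>, // stand-in for zod's safeParse
  handler: (input: T) => ToolResult,
): RegisteredTool {
  return {
    name,
    run(raw) {
      const parsed = parse(raw);
      if (!parsed.success) return { error: `invalid arguments for tool '${name}': ${parsed.message}` };
      try { return handler(parsed.data); }
      catch (e) { return { error: `tool '${name}' threw: ${(e as Error).message}` }; }
    },
  };
}

const proposeRollback = defineTool<{ application: string; targetRevision: string }>(
  'propose_rollback',
  raw => {
    const r = (raw ?? {}) as Record<string, unknown>;
    const application = r['application'];
    const targetRevision = r['targetRevision'];
    if (typeof application === 'string' && typeof targetRevision === 'string') {
      return { success: true, data: { application, targetRevision } };
    }
    return { success: false, message: 'application and targetRevision must be strings' };
  },
  input => ({ text: `Drafted rollback proposal: ${input.application} to ${input.targetRevision}` }),
);

// The allowlist: only names present in this map can ever run.
const registry = new Map<string, RegisteredTool>([[proposeRollback.name, proposeRollback]]);

function invoke(name: string, raw: unknown): ToolResult {
  const tool = registry.get(name);
  if (!tool) return { error: `unknown tool: '${name}'` };
  return tool.run(raw);
}

const conjured = invoke('execute_rollback', { application: 'checkout' });
const malformed = invoke('propose_rollback', { application: 'checkout' });
const drafted = invoke('propose_rollback', { application: 'checkout', targetRevision: 'abc123' });
console.log(conjured, malformed, drafted);
```

Both the conjured tool name and the malformed arguments come back as error results the loop can feed to the model, and only the well-formed propose call produces a drafted proposal.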

What I’d build next (Phase 2 / Phase 4 in backstage-corp)

Real connectors to back the read tools. Today the GitHub commits gatherer is real (Octokit-backed); Datadog, ArgoCD, Harness, PagerDuty still ship as staticGatherer stubs. Each is ~1 day of glue.
Multi-source past-incident lookup. PagerDuty resolutions on the same entity in the last 90 days are the highest-leverage signal a Backstage-integrated assistant could surface, because they capture the prior post-mortem’s RCA verbatim.
MCP action surface so the same orchestrator is invocable by Claude Code / Cursor / any MCP-aware agent — exactly the inverse of what scaffolder-backend-module-mcp does for scaffolder. Closes the loop.

Install

The loop primitives (Tool, ToolRegistry, AgentLoop, stop conditions, budget tracker) ship as a standalone, Backstage-agnostic library — no @backstage/* runtime dependency. Use it for any LLM-in-a-loop where you want propose-only tools and budget caps.

npm install @theplatformlog/llm-agent-loop zod

import {
  AgentLoop,
  ToolRegistry,
  DEFAULT_STOP_CONDITIONS,
  type AgentStepFn,
} from '@theplatformlog/llm-agent-loop';

AgentStepFn is the seam to your LLM SDK of choice — wrap Vercel AI SDK’s generateText({ tools }), OpenAI’s chat completions, Anthropic’s messages API, etc. Tests inject a scripted-step stub; no live API calls required.

Code

The incident-copilot composition (Backstage-specific) lives at Naga15/backstage-corp: 38 tests in incident-copilot-backend, six more in incident-copilot-backend-module-github (the real GitHub gatherer), fifteen more in incident-copilot (the frontend that consumes this backend). All hermetic — stubbed gatherers, scripted LLM, no live API calls.

The reusable agent loop itself is published from Naga15/platformlog-plugins.

Phase 2 / Phase 4 work continues; the next post in the series will be the frontend walkthrough.