theplatformlog

Backstage as an LLM agent: a 5-cap tool-calling loop with propose-only tools

Building the AgentLoop for incident-copilot-backend — what earns the 'agent' framing, and the five stop conditions that keep it cheap.

Backstage AI plugins, part 6

Tags: backstage, ai, llm, agents, claude, typescript, mcp

Post 4 sketched a four-phase plan for AI plugins on Backstage. The fourth phase — an incident-investigation co-pilot — was the diagonal of the 2x2: read the catalog and change the world. The “change the world” part is what earns the word “agent” vs. “co-pilot.” This post is about that part.

The artifact is an AgentLoop class that ships in @internal/plugin-incident-copilot-backend (in the Naga15/backstage-corp private repo). It’s an LLM in a tool-calling loop with five independent stop conditions, a tool surface that’s split into read-only and propose-only kinds, and a citation validator that drops hypotheses whose evidence IDs don’t resolve. None of those design choices are accidental.

What “agent” should mean

The word is overloaded. The cheap definition is “an LLM that calls a function.” By that bar every chatbot with a Python interpreter is an agent.

The bar I want for an on-call SRE assistant is higher. Specifically:

  1. The model decides which signals to pull and in what order — instead of the backend pre-fetching everything.
  2. The model iterates: result → next-tool → result → next-tool → …
  3. There’s a stop criterion the model can meet by emitting no more tool calls — i.e., the model decides it’s done.
  4. There’s an operator-set budget that overrides #3: wallclock time, tokens, dollars, call count.
  5. Destructive actions are not actions at all. They are proposals the model emits. A human clicks them.

If you don’t have #5 specifically, you have an automation system, not an agent. The distinction matters because the prompt-injection threat model for “agent that can spend money / page on-call / rollback prod” is qualitatively different from one that can’t.

The tool surface, in three kinds

[Figure: architecture diagram]

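// src/agent/Tool.ts
import { z } from 'zod';
// AgentToolContext and ToolResult are defined elsewhere in the plugin.

export interface Tool<TInput = unknown> {
  readonly name: string;
  readonly description: string;
  readonly inputSchema: z.ZodSchema<TInput>;           // validated before the handler runs
  readonly kind: 'read' | 'propose' | 'record';        // fixed at registration time
  handler(input: TInput, ctx: AgentToolContext): Promise<ToolResult>;
}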

Three things to notice:

  1. kind is part of the type. Not a string the handler returns later. Not something the loop infers. The class of action is fixed at registration time — you can audit your tool registry to see exactly which tools can propose anything, full stop.

  2. inputSchema is a z.ZodSchema, not a string description. The ToolRegistry.invoke() method validates input against it before calling the handler. A bad-shape tool call becomes a tool error fed back to the LLM, not a crash:

    async invoke(name: string, rawInput: unknown, ctx: AgentToolContext): Promise<ToolResult> {
      const tool = this.map.get(name);
      // An unknown tool name goes back to the model as an error, not an exception.
      if (!tool) return { error: `unknown tool: '${name}'` };
      const parsed = tool.inputSchema.safeParse(rawInput);
      if (!parsed.success) {
        return { error: `invalid arguments for tool '${name}': ${parsed.error.message}` };
      }
      // Handler exceptions are also converted into tool errors the model can react to.
      try { return await tool.handler(parsed.data, ctx); }
      catch (e) { return { error: `tool '${name}' threw: ${(e as Error).message}` }; }
    }

  3. The propose tools (propose_rollback, propose_page_on_call, etc.) return SuggestedAction objects flagged destructive: true. The frontend (post 7, coming soon) enforces a confirm dialog before the operator’s click can trigger anything.
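Items 1 and 2 can be exercised together in a self-contained sketch. To avoid depending on zod, it swaps in a hypothetical `Schema<T>` interface that mimics the shape of zod's `safeParse` result; `ToolResult`, `AgentToolContext`, and both example tools are simplified stand-ins, not the plugin's definitions.

```typescript
// Hypothetical stand-in for a zod schema; only the safeParse result shape matters here.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { message: string } };
interface Schema<T> {
  safeParse(raw: unknown): ParseResult<T>;
}

type ToolKind = 'read' | 'propose' | 'record';
interface ToolResult { text?: string; error?: string; }
interface AgentToolContext { investigationId: string; } // hypothetical field

interface Tool<TInput = unknown> {
  readonly name: string;
  readonly kind: ToolKind;
  readonly inputSchema: Schema<TInput>;
  handler(input: TInput, ctx: AgentToolContext): Promise<ToolResult>;
}

class ToolRegistry {
  private readonly map = new Map<string, Tool<any>>();

  register(tool: Tool<any>): void {
    this.map.set(tool.name, tool);
  }

  // Item 1: kind is declared data, so auditing needs no handler code at all.
  namesByKind(kind: ToolKind): string[] {
    return Array.from(this.map.values())
      .filter((t) => t.kind === kind)
      .map((t) => t.name)
      .sort();
  }

  // Item 2: every failure mode becomes a ToolResult error the model can read.
  async invoke(name: string, rawInput: unknown, ctx: AgentToolContext): Promise<ToolResult> {
    const tool = this.map.get(name);
    if (!tool) return { error: `unknown tool: '${name}'` };
    const parsed = tool.inputSchema.safeParse(rawInput);
    if (!parsed.success) {
      return { error: `invalid arguments for tool '${name}': ${parsed.error.message}` };
    }
    try {
      return await tool.handler(parsed.data, ctx);
    } catch (e) {
      return { error: `tool '${name}' threw: ${(e as Error).message}` };
    }
  }
}

// Hypothetical schema: accepts { application: string } and rejects anything else.
const appInput: Schema<{ application: string }> = {
  safeParse(raw: unknown): ParseResult<{ application: string }> {
    const app = (raw as { application?: unknown } | null)?.application;
    return typeof app === 'string'
      ? { success: true, data: { application: app } }
      : { success: false, error: { message: 'application must be a string' } };
  },
};

const registry = new ToolRegistry();
registry.register({ name: 'list_deploys', kind: 'read', inputSchema: appInput,
  handler: async (input) => ({ text: `deploys for ${input.application}: none` }) });
registry.register({ name: 'propose_rollback', kind: 'propose', inputSchema: appInput,
  handler: async (input) => ({ text: `Drafted rollback proposal for ${input.application}` }) });
```

Here `registry.namesByKind('propose')` returns `['propose_rollback']`, and a call to a tool that doesn't exist (say, a hypothetical `rollback_now`) comes back as `{ error: "unknown tool: 'rollback_now'" }` instead of throwing.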

The propose_rollback handler is one of the smallest files in the plugin:

async handler(input): Promise<ToolResult> {
  const action: SuggestedAction = {
    label: `Rollback ${input.application} to ${input.targetRevision}`,
    kind: 'deep-link',
    href: `/argocd/applications/${encodeURIComponent(input.application)}?revision=${encodeURIComponent(input.targetRevision)}`,
    destructive: true,
  };
  return { suggestedAction: action, text: `Drafted rollback proposal: ${action.label}` };
}

There’s nothing that actually rolls back. The handler builds a deep link. The LLM’s “tool call success” message is “I drafted a proposal,” not “I rolled back.”
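To make "the handler builds a deep link" concrete, here is the same handler body lifted into a standalone function. The `SuggestedAction` shape is reconstructed from the fields used above (narrowed to the one `kind` this handler uses); the name `draftRollback` and the `RollbackInput` type are illustrative assumptions.

```typescript
interface SuggestedAction {
  label: string;
  kind: 'deep-link';
  href: string;
  destructive: boolean;
}

interface RollbackInput { application: string; targetRevision: string; }

// Pure function: no network call and no ArgoCD client. It only formats a URL and a label.
function draftRollback(input: RollbackInput): { suggestedAction: SuggestedAction; text: string } {
  const action: SuggestedAction = {
    label: `Rollback ${input.application} to ${input.targetRevision}`,
    kind: 'deep-link',
    href: `/argocd/applications/${encodeURIComponent(input.application)}?revision=${encodeURIComponent(input.targetRevision)}`,
    destructive: true,
  };
  return { suggestedAction: action, text: `Drafted rollback proposal: ${action.label}` };
}

const out = draftRollback({ application: 'checkout api', targetRevision: 'v1.2/rc&1' });
// out.suggestedAction.href === '/argocd/applications/checkout%20api?revision=v1.2%2Frc%261'
```

Because both the path segment and the query value pass through `encodeURIComponent`, a revision string the model chose (or picked up from an injected log line) cannot add extra query parameters to the link; at worst it points the human at the wrong revision.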

The five stop conditions

If any one of them trips, the loop ends:

export const DEFAULT_STOP_CONDITIONS: StopConditions = {
  maxSteps: 12,           // each step = one model call + its tool calls
  maxToolCalls: 20,       // hard cap across all steps
  maxWallclockMs: 60_000, // operator-tolerable latency
  maxTokens: 30_000,      // input + output summed
  maxCostUsd: 0.5,        // belt-and-suspenders dollar cap
};
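The three budgets that can trip before a model call reduce to a pure function of the run's counters, which is one reason each condition is easy to test in isolation. This is a sketch only: `RunCounters` and `preStepStop` are hypothetical names, and the check order copies the loop shown later in this post.

```typescript
interface StopConditions {
  maxSteps: number;
  maxToolCalls: number;
  maxWallclockMs: number;
  maxTokens: number;
  maxCostUsd: number;
}

const DEFAULT_STOP_CONDITIONS: StopConditions = {
  maxSteps: 12,
  maxToolCalls: 20,
  maxWallclockMs: 60_000,
  maxTokens: 30_000,
  maxCostUsd: 0.5,
};

// Hypothetical snapshot of what a run has spent so far.
interface RunCounters {
  elapsedMs: number;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

type PreStepStop = 'wallclock' | 'token-budget' | 'cost-cap';

// Budgets checked before paying for the next model call; the first one that trips is reported.
// (maxSteps guards the loop itself, and maxToolCalls is checked per tool call.)
function preStepStop(c: RunCounters, s: StopConditions): PreStepStop | null {
  if (c.elapsedMs > s.maxWallclockMs) return 'wallclock';
  if (c.inputTokens + c.outputTokens > s.maxTokens) return 'token-budget';
  if (c.costUsd > s.maxCostUsd) return 'cost-cap';
  return null;
}
```

Each reason becomes a one-line assertion: a snapshot with 61,000 ms elapsed yields 'wallclock', while one with 1,000,000 input tokens yields 'token-budget' even when its cost is also over the cap, because the token check comes first.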

Why all five and not just maxSteps? Because each enforces a different worry:

A nice property: each one is testable in isolation. The AgentLoop.test.ts file has one test per condition, and they all use the same scripted LLM-stub helper:

const scriptedStep = (sequence: StepResult[]): jest.MockedFunction<AgentStepFn> =>
  jest.fn().mockImplementation(async () => {
    if (sequence.length === 0) {
      return { toolCalls: [], text: 'done', usage: { inputTokens: 100, outputTokens: 50 }, finishReason: 'stop' };
    }
    return sequence.shift()!;
  });

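You hand the helper a list of canned step results, and the loop runs them in order. Cost-cap test? Hand it one step result with usage: { inputTokens: 1_000_000, outputTokens: 0 } and check that the stopped field equals 'cost-cap'. Given the check order in the loop below, that step also has to include at least one tool call (otherwise the run ends with 'llm-stop'), and the test has to raise maxTokens (otherwise the token check, which runs before the cost check, reports 'token-budget' first). No real model, no real money, no flakiness.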
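The same helper works without Jest as a plain closure. A minimal sketch, assuming a `StepResult` shape reconstructed from the fields used in this post; the `'tool_use'` finish reason is an assumed value (the post only shows `'stop'`).

```typescript
interface ToolCall { id: string; name: string; args: unknown; }
interface StepResult {
  toolCalls: ToolCall[];
  text: string;
  usage: { inputTokens: number; outputTokens: number };
  finishReason: string; // 'stop' appears in the post; 'tool_use' below is an assumption
}
type AgentStepFn = () => Promise<StepResult>;

// Replays canned step results in order; once the script runs out, the "model" stops calling tools.
function scriptedStep(sequence: StepResult[]): AgentStepFn {
  const queue = [...sequence]; // copy so the caller's array is left untouched
  return async () =>
    queue.shift() ?? {
      toolCalls: [],
      text: 'done',
      usage: { inputTokens: 100, outputTokens: 50 },
      finishReason: 'stop',
    };
}

// The shape a cost-cap test would hand the loop: one tool-calling step with a huge input-token count.
const costCapScript = scriptedStep([
  {
    toolCalls: [{ id: 'call-1', name: 'query_datadog_logs', args: { reason: 'check p99 spike' } }],
    text: '',
    usage: { inputTokens: 1_000_000, outputTokens: 0 },
    finishReason: 'tool_use',
  },
]);
```

The first call to `costCapScript()` returns the scripted step; every later call returns the default `'done'` step with no tool calls.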

The loop itself

The loop is ~60 lines of code with very deliberate ordering:

while (totalSteps < this.stopConditions.maxSteps) {
  // 1. Pre-step stop checks. Wallclock + tokens + cost can trip
  //    BEFORE we spend the next model call, so we check them first.
  if (elapsed() > this.stopConditions.maxWallclockMs) return finalize('wallclock');
  if (totalInputTokens + totalOutputTokens > this.stopConditions.maxTokens) return finalize('token-budget');
  if (computeCost() > this.stopConditions.maxCostUsd) return finalize('cost-cap');

  totalSteps += 1;
  const result = await this.step({ model, system, messages, tools: this.tools.list() });
  totalInputTokens += result.usage.inputTokens;
  totalOutputTokens += result.usage.outputTokens;
  messages.push({ role: 'assistant', content: result.text, toolCalls: result.toolCalls });

  // 2. Natural termination: model emitted no tool calls.
  if (result.toolCalls.length === 0) return finalize('llm-stop');

  // 3. Execute tool calls. The tool-call-cap can trip mid-step.
  for (const call of result.toolCalls) {
    if (totalToolCalls >= this.stopConditions.maxToolCalls) return finalize('tool-call-cap');
    totalToolCalls += 1;
    const toolResult = await this.tools.invoke(call.name, call.args, toolCtx);
    // ...append evidence / suggestedAction / hypothesis to the run state
    messages.push({ role: 'tool', toolCallId: call.id, content: toolResult.error ?? toolResult.text });
  }
}
return finalize('max-steps');
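A condensed, runnable version of the loop above is handy for experimenting with the stop logic. Assumptions: message history and run-state bookkeeping are omitted, the step and tool functions are injected as plain callbacks, and the per-token prices are hypothetical placeholders (the post doesn't say how computeCost() prices tokens).

```typescript
type StopReason = 'llm-stop' | 'max-steps' | 'tool-call-cap' | 'wallclock' | 'token-budget' | 'cost-cap';

interface ToolCall { id: string; name: string; args: unknown; }
interface StepResult {
  toolCalls: ToolCall[];
  text: string;
  usage: { inputTokens: number; outputTokens: number };
}
interface StopConditions {
  maxSteps: number; maxToolCalls: number; maxWallclockMs: number; maxTokens: number; maxCostUsd: number;
}
interface RunSummary {
  stopped: StopReason; steps: number; toolCalls: number; inputTokens: number; outputTokens: number;
}

// Placeholder prices in USD per token (hypothetical, not taken from the plugin).
const USD_PER_INPUT_TOKEN = 3 / 1_000_000;
const USD_PER_OUTPUT_TOKEN = 15 / 1_000_000;

async function runAgentLoop(
  step: () => Promise<StepResult>,             // one model call
  invoke: (call: ToolCall) => Promise<string>, // one tool call
  limits: StopConditions,
): Promise<RunSummary> {
  const startedAt = Date.now();
  let steps = 0;
  let toolCalls = 0;
  let inputTokens = 0;
  let outputTokens = 0;
  const finalize = (stopped: StopReason): RunSummary => ({ stopped, steps, toolCalls, inputTokens, outputTokens });
  const costUsd = () => inputTokens * USD_PER_INPUT_TOKEN + outputTokens * USD_PER_OUTPUT_TOKEN;

  while (steps < limits.maxSteps) {
    // 1. Pre-step checks: stop before paying for another model call.
    if (Date.now() - startedAt > limits.maxWallclockMs) return finalize('wallclock');
    if (inputTokens + outputTokens > limits.maxTokens) return finalize('token-budget');
    if (costUsd() > limits.maxCostUsd) return finalize('cost-cap');

    steps += 1;
    const result = await step();
    inputTokens += result.usage.inputTokens;
    outputTokens += result.usage.outputTokens;

    // 2. Natural termination: the model requested no tools.
    if (result.toolCalls.length === 0) return finalize('llm-stop');

    // 3. The call cap is checked per call, so it can end a run mid-step.
    for (const call of result.toolCalls) {
      if (toolCalls >= limits.maxToolCalls) return finalize('tool-call-cap');
      toolCalls += 1;
      await invoke(call); // the real loop feeds this result back as a 'tool' message
    }
  }
  return finalize('max-steps');
}

// Usage: a scripted "model" that always asks for one tool call and reports 1,000,000 input tokens.
const greedyStep = async (): Promise<StepResult> => ({
  toolCalls: [{ id: 'c1', name: 'query_datadog_logs', args: {} }],
  text: '',
  usage: { inputTokens: 1_000_000, outputTokens: 0 },
});
```

With the post's DEFAULT_STOP_CONDITIONS, `runAgentLoop(greedyStep, async () => 'ok', DEFAULT_STOP_CONDITIONS)` stops with 'token-budget' after one step, because the token check runs before the cost check; raising maxTokens to 10,000,000 lets the same script stop with 'cost-cap' instead.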

Three details worth pointing out:

What happens after the loop

The orchestrator runs CitationValidator over the recorded hypotheses against the accumulated evidence:

// Citation validation still runs in agent mode — drops any hypothesis
// the LLM recorded with IDs that don't resolve to evidence it actually
// fetched.
const { kept, warnings: citationWarnings } =
  this.citationValidator.validate(result.recordedHypotheses, result.evidence);

A hypothesis with at least one valid citation passes, with its invalid citations stripped and a warning recorded. A hypothesis with zero valid citations is dropped. The frontend can therefore assume every citation it sees in the UI resolves to a real evidence item, and that assumption shapes the side-panel “click a citation, scroll the evidence into focus” interaction.
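The rules in the previous paragraph are small enough to sketch directly. This is not the plugin's CitationValidator: the `Hypothesis` and `Evidence` shapes (a summary plus a list of cited evidence IDs) are assumptions, and the warning wording is made up.

```typescript
interface Evidence { id: string; }
interface Hypothesis { summary: string; citations: string[]; }
interface ValidationResult { kept: Hypothesis[]; warnings: string[]; }

// Strip citation IDs that don't resolve (with a warning); drop a hypothesis if none resolve.
function validateCitations(hypotheses: Hypothesis[], evidence: Evidence[]): ValidationResult {
  const known = new Set(evidence.map((e) => e.id));
  const kept: Hypothesis[] = [];
  const warnings: string[] = [];
  for (const h of hypotheses) {
    const valid = h.citations.filter((id) => known.has(id));
    const invalid = h.citations.filter((id) => !known.has(id));
    if (valid.length === 0) {
      warnings.push(`dropped "${h.summary}": no citation resolves to fetched evidence`);
      continue;
    }
    if (invalid.length > 0) {
      warnings.push(`stripped unresolved citations from "${h.summary}": ${invalid.join(', ')}`);
    }
    kept.push({ ...h, citations: valid });
  }
  return { kept, warnings };
}

// Evidence IDs borrowed from the sample response below; the hypotheses are invented.
const { kept, warnings } = validateCitations(
  [
    { summary: 'p99 spike after deploy', citations: ['datadog-1-1', 'datadog-9-9'] },
    { summary: 'upstream outage', citations: ['github-2-1'] },
  ],
  [{ id: 'datadog-1-1' }, { id: 'datadog-1-2' }],
);
// kept: one hypothesis citing only 'datadog-1-1'; warnings: one strip, one drop
```

Because unresolvable IDs never survive this step, the UI never has to handle a citation that points at nothing.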

The trace and budget snapshots come back in the HTTP response so the frontend can render them in an “agent thinking” panel:

{
  "investigationId": "inv-1717593600000",
  "mode": "agent",
  "hypotheses": [/* ... */],
  "evidence": [/* ... */],
  "trace": [
    { "step": 1, "callIndex": 1, "toolName": "query_datadog_logs",
      "args": { "reason": "check p99 spike" },
      "evidenceIds": ["datadog-1-1", "datadog-1-2"], "durationMs": 412 },
    /* ... */
  ],
  "stopped": "llm-stop",
  "budgets": {
    "steps": 4, "toolCalls": 3,
    "inputTokens": 700, "outputTokens": 220,
    "costUsd": 0.0054, "elapsedMs": 4823
  },
  "warnings": []
}
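The `budgets` block can be sanity-checked by hand. The post doesn't state which model or price list the plugin uses; assuming rates of $3 per million input tokens and $15 per million output tokens (an assumption chosen for illustration), the token counts in the sample reproduce the reported costUsd:

```latex
\begin{aligned}
\text{costUsd} &= 700 \times \frac{\$3}{10^{6}} + 220 \times \frac{\$15}{10^{6}} \\
               &= \$0.0021 + \$0.0033 = \$0.0054
\end{aligned}
```

Under the same assumed rates, spending more than $0.50 requires more than 0.50 / (15 × 10⁻⁶) ≈ 33,333 tokens, and the token check runs before the cost check, so the 30,000-token budget would always trip first. The dollar cap only becomes the binding limit at higher per-token rates, which fits its belt-and-suspenders role.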

Prompt injection: what the surface protects against

The threat model isn’t “what if the LLM is malicious.” It’s “what if a Datadog log line, Slack message, or git commit message contains a prompt injection.” The agent ends up reading those.

What protects us:

This isn’t bulletproof. It’s “expensive enough to be uneconomic, and loud enough to be visible.”

What I’d build next (Phase 2 / Phase 4 in backstage-corp)

Code

Branch: Naga15/backstage-corp master. 38 tests in incident-copilot-backend, 6 more in incident-copilot-backend-module-github (the real GitHub gatherer), and 15 more in incident-copilot (the frontend that consumes this backend). All hermetic: stubbed gatherers, a scripted LLM, no live API calls.

Phase 2 / Phase 4 work continues; the next post in the series will be the frontend walkthrough.