Programmatic test harness

@abtree/testing is a small programmatic harness for driving an abtree execution end-to-end from a TypeScript file. It complements BDD test specs — instead of a YAML scenario that the agent reads and walks, you script the exchange step-by-step using a when().respond() DSL that mirrors the abtree CLI surface.

Reach for this when:

You're writing regression tests that must run identically across releases (CI pipelines, parity checks between transports).
You need precise assertions at every step — expected name, expected response type, exact $LOCAL value.
The tree under test is deterministic enough that an LLM in the loop would be overkill.

For BDD-style specs that an agent walks through using fixtures for external side effects, see Test a tree and the @abtree/test-tree runner.

Install

bun  add -d @abtree/testing
pnpm add -D @abtree/testing
npm  install -D @abtree/testing

At a glance

import {
  AgentHarness,
  CliTransport,
  setupTreePackageFixture,
  eval as evalAs,
  evaluate,
  instruct,
  localWrite,
  submit,
} from "@abtree/testing";

const fixture = setupTreePackageFixture({
  slug: "hello-world",
  treeDir: "/abs/path/to/trees/hello-world",
});

const agent = new AgentHarness(new CliTransport({ cwd: fixture.cwd }));

try {
  await agent.start("hello-world", "scenario");

  await agent.when(instruct("Acknowledge_Protocol"))
    .respond(submit("success"));

  await agent.when(instruct("Determine_Time"))
    .respond(localWrite("time_of_day", "morning"), submit("success"));

  await agent.when(evaluate("Morning_Greeting"))
    .respond(evalAs(true));

  await agent.when(instruct("Morning_Greeting"))
    .respond(localWrite("greeting", "Good morning!"), submit("success"));

  await agent.expectDone();
  await agent.expectLocal({ greeting: "Good morning!" });
} finally {
  await agent.close();
  fixture.cleanup();
}

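Each .when(...).respond(...) line reads as "when the runtime asks me to do X, the agent responds with Y." The terminal action in every chain is either submit(...) or eval(...), and that action is what advances the cursor. The example imports eval under the alias evalAs because eval cannot be used as a binding name in ES module (strict-mode) code.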

Vocabulary

The DSL mirrors the abtree CLI surface verb-for-verb.

Step matchers — what the runtime is asking

| Helper | Matches `next` response |
| --- | --- |
| `instruct(name)` | `{ type: "instruct", name, instruction }` |
| `evaluate(name)` | `{ type: "evaluate", name, expression }` |
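
To make the matching rule concrete, here is a minimal sketch of how a matcher could be compared against a `next` response. The `NextResponse` and `Matcher` types, the local `instruct`/`evaluate` stand-ins, and the `matches` function are all assumptions inferred from the table above; they are not the package's actual implementation, and the instruction text is a placeholder.

```typescript
// Assumed shapes, inferred from the table above (not the package's real types).
type NextResponse =
  | { type: "instruct"; name: string; instruction: string }
  | { type: "evaluate"; name: string; expression: string }
  | { status: "done" };

type Matcher = { type: "instruct" | "evaluate"; name: string };

// Hypothetical stand-ins for the exported helpers.
const instruct = (name: string): Matcher => ({ type: "instruct", name });
const evaluate = (name: string): Matcher => ({ type: "evaluate", name });

// A step matches when both the response type and the node name agree.
function matches(matcher: Matcher, response: NextResponse): boolean {
  if (!("type" in response)) return false; // a { status: "done" } response never matches
  return response.type === matcher.type && response.name === matcher.name;
}

// Placeholder response; the instruction text is invented for the example.
const step: NextResponse = {
  type: "instruct",
  name: "Determine_Time",
  instruction: "(placeholder instruction text)",
};

console.log(matches(instruct("Determine_Time"), step)); // true
console.log(matches(evaluate("Determine_Time"), step)); // false: wrong step type
```

Under this reading, a mismatch on either the step type or the name is what a failed .when(...) would report.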

Agent actions — what the agent calls back with

| Helper | CLI verb | Terminal? |
| --- | --- | --- |
| `submit(status)` | `abtree submit <id> <success\|failure\|running>` | yes |
| `eval(result)` | `abtree eval <id> <true\|false>` | yes |
| `localWrite(path, value)` | `abtree local write <id> <path> <val>` | no |

Every .respond(...) chain must end with exactly one terminal action. The harness throws if the terminal is missing or not last.
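
The rule can be sketched as a small validation function. This is an illustration only: the `Action` shape and `assertValidChain` are hypothetical names, and the real helpers may return different objects and raise different error messages.

```typescript
// Assumed action shapes; the real helpers may return different objects.
type Action =
  | { kind: "submit"; status: "success" | "failure" | "running" }
  | { kind: "eval"; result: boolean }
  | { kind: "localWrite"; path: string; value: unknown };

// submit and eval are the terminal actions; localWrite is not.
const isTerminal = (a: Action): boolean => a.kind === "submit" || a.kind === "eval";

// Throws unless the chain has exactly one terminal action, in last position.
function assertValidChain(actions: Action[]): void {
  const terminals = actions.filter(isTerminal);
  if (terminals.length !== 1) {
    throw new Error(`expected exactly one terminal action, got ${terminals.length}`);
  }
  const last = actions[actions.length - 1];
  if (last === undefined || !isTerminal(last)) {
    throw new Error("the terminal action must come last");
  }
}

// A write followed by a submit is a valid chain.
assertValidChain([
  { kind: "localWrite", path: "time_of_day", value: "morning" },
  { kind: "submit", status: "success" },
]);
console.log("chain accepted");
```

Checking the count before the position keeps the two failure modes (missing terminal, misplaced terminal) distinguishable in the error message.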

Harness verbs — the rest of the CLI

| Method | CLI verb or behaviour |
| --- | --- |
| `agent.start(tree, summary)` | `abtree execution create <tree> <summary>` |
| `agent.localRead([path])` | `abtree local read <id> [path]` |
| `agent.globalRead([path])` | `abtree global read <id> [path]` |
| `agent.expectDone()` | asserts that the following `next` call returns `{ status: "done" }` |
| `agent.expectLocal({ ... })` | reads `$LOCAL` and deep-equals each named slot |
| `agent.close()` | tears the transport down |
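
As an illustration of the "deep-equals each named slot" behaviour, the sketch below compares only the slots named in the expectation. It assumes that slots not named are ignored and that values are JSON-like; `deepEqual` and `mismatchedSlots` are hypothetical helpers, not exports of the package.

```typescript
// Structural equality for JSON-like values (primitives, arrays, plain objects).
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aObj = a as Record<string, unknown>;
  const bObj = b as Record<string, unknown>;
  const aKeys = Object.keys(aObj);
  if (aKeys.length !== Object.keys(bObj).length) return false;
  return aKeys.every((k) => k in bObj && deepEqual(aObj[k], bObj[k]));
}

// Returns the names of expected slots whose value differs.
// Assumption: slots that the expectation does not name are ignored.
function mismatchedSlots(
  local: Record<string, unknown>,
  expected: Record<string, unknown>,
): string[] {
  return Object.keys(expected).filter((slot) => !deepEqual(local[slot], expected[slot]));
}

const local = { time_of_day: "morning", greeting: "Good morning!" };
console.log(mismatchedSlots(local, { greeting: "Good morning!" })); // no mismatches
console.log(mismatchedSlots(local, { greeting: "Good evening!" })); // reports "greeting"
```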

Transports

Both transports take a cwd (where the abtree runtime resolves .abtree/ from) and optionally command + args for invoking the CLI.

`CliTransport`

Spawns the abtree CLI once per verb. Each call is a fresh subprocess.

new CliTransport({ cwd: fixture.cwd });   // assumes `abtree` on PATH

// or, for in-repo source:
new CliTransport({
  cwd: fixture.cwd,
  command: "bun",
  args: ["packages/cli/index.ts"],
});

Stateless — close() is a no-op.

`McpTransport`

Spawns the abtree mcp server once and drives every subsequent call as an MCP tool invocation over stdio.

new McpTransport({ cwd: fixture.cwd });

Roughly 8× faster wall-clock than CliTransport for end-to-end scenarios — the subprocess startup is paid once instead of per step.
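
The reasoning behind the speedup can be written as a simple cost model. The sketch below is illustrative arithmetic only: the millisecond figures are made-up inputs, not measurements of either transport, and the actual ratio depends on the scenario.

```typescript
// Illustrative cost model; the figures used below are invented inputs, not measurements.
interface Costs {
  startupMs: number; // cost of spawning a process
  perCallMs: number; // cost of the actual work for one call
}

// Spawn-per-call pattern: every call pays the startup cost again.
const spawnPerCallTotal = (calls: number, c: Costs): number =>
  calls * (c.startupMs + c.perCallMs);

// Long-lived server pattern: startup is paid once, then only per-call work.
const longLivedTotal = (calls: number, c: Costs): number =>
  c.startupMs + calls * c.perCallMs;

const costs: Costs = { startupMs: 100, perCallMs: 10 };
const calls = 20;

console.log(spawnPerCallTotal(calls, costs)); // 2200
console.log(longLivedTotal(calls, costs)); // 300
console.log((spawnPerCallTotal(calls, costs) / longLivedTotal(calls, costs)).toFixed(1)); // "7.3"
```

In this model the ratio grows with the number of calls and approaches (startupMs + perCallMs) / perCallMs, so longer scenarios benefit more from a long-lived server.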

Implementing a new transport

Implement the Transport interface (eight async methods, with next returning a NextResponse) and pass an instance to AgentHarness. Scenario code runs unchanged.

import { type Transport, type NextResponse, AgentHarness } from "@abtree/testing";

class HttpTransport implements Transport {
  constructor(private baseUrl: string) {}

  async createExecution(tree: string, summary: string) {
    const r = await fetch(`${this.baseUrl}/executions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // declare the JSON body
      body: JSON.stringify({ tree, summary }),
    });
    return r.json();
  }

  async next(id: string): Promise<NextResponse> { /* … */ }
  // submit, eval, localRead, localWrite, globalRead, close
}
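
For comparison with the HTTP outline above, here is a self-contained sketch of an in-memory transport that replays a fixed list of steps. The eight method names come from the text, but every signature in the local `Transport` interface is a guess, and `ScriptedTransport` is a hypothetical class; consult the package's TSDoc for the real interface before relying on any of this.

```typescript
// Assumed shapes: method names come from the text; all signatures are guesses.
type NextResponse =
  | { type: "instruct"; name: string; instruction: string }
  | { type: "evaluate"; name: string; expression: string }
  | { status: "done" };
type Status = "success" | "failure" | "running";

interface Transport {
  createExecution(tree: string, summary: string): Promise<{ id: string }>;
  next(id: string): Promise<NextResponse>;
  submit(id: string, status: Status): Promise<void>;
  eval(id: string, result: boolean): Promise<void>;
  localRead(id: string, path?: string): Promise<unknown>;
  localWrite(id: string, path: string, value: unknown): Promise<void>;
  globalRead(id: string, path?: string): Promise<unknown>;
  close(): Promise<void>;
}

// Hypothetical in-memory transport that replays a fixed list of steps.
class ScriptedTransport implements Transport {
  private cursor = 0;
  private local: Record<string, unknown> = {};
  private readonly steps: NextResponse[];

  constructor(steps: NextResponse[]) {
    this.steps = steps;
  }

  async createExecution(_tree: string, _summary: string) {
    return { id: "exec-1" };
  }
  async next(_id: string): Promise<NextResponse> {
    // Past the end of the script, report completion.
    return this.steps[this.cursor] ?? { status: "done" };
  }
  async submit(_id: string, status: Status) {
    // Assumption: "running" leaves the cursor where it is.
    if (status !== "running") this.cursor += 1;
  }
  async eval(_id: string, _result: boolean) {
    this.cursor += 1;
  }
  async localRead(_id: string, path?: string) {
    return path === undefined ? this.local : this.local[path];
  }
  async localWrite(_id: string, path: string, value: unknown) {
    this.local[path] = value;
  }
  async globalRead(_id: string, _path?: string) {
    return {};
  }
  async close() {}
}

// Drives one step by hand and records what the transport reported.
async function demo(): Promise<string[]> {
  const t = new ScriptedTransport([
    { type: "instruct", name: "Acknowledge_Protocol", instruction: "(placeholder)" },
  ]);
  const { id } = await t.createExecution("hello-world", "scenario");
  const seen: string[] = [];
  const first = await t.next(id);
  seen.push("type" in first ? first.name : "done");
  await t.submit(id, "success");
  const second = await t.next(id);
  seen.push("type" in second ? second.name : "done");
  await t.close();
  return seen;
}

demo().then((seen) => console.log(seen.join(" -> "))); // Acknowledge_Protocol -> done
```

A transport like this performs no I/O, which can make it handy for exercising scenario code itself, though it says nothing about how the real runtime behaves.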

Fixtures

setupTreePackageFixture(opts) creates an isolated temporary directory (via mkdtemp), copies the tree package's main.json into a <slug>/ subdirectory, and returns { cwd, treePath, cleanup }. Pass treePath to agent.start(...), and pair cleanup() with the harness's close() in a finally block:

const fixture = setupTreePackageFixture({
  slug: "hello-world",
  treeDir: "/abs/path/to/trees/hello-world",
});

const agent = new AgentHarness(new CliTransport({ cwd: fixture.cwd }));
try {
  await agent.start(fixture.treePath, "scenario");
  // …
} finally {
  await agent.close();
  fixture.cleanup();
}

Without isolation, executions and snapshots from the run would land in the caller's project tree.
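
To show what this kind of isolation involves, the sketch below reproduces the described steps with Node's standard library. `makeIsolatedFixture` is a hypothetical stand-in, not the real helper; in particular, it assumes treePath points at the copied main.json, which the text does not confirm, and the main.json written here is a placeholder rather than a real tree definition.

```typescript
import { copyFileSync, existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface Fixture {
  cwd: string;
  treePath: string;
  cleanup: () => void;
}

// Hypothetical stand-in for setupTreePackageFixture; the real helper may differ.
function makeIsolatedFixture(opts: { slug: string; treeDir: string }): Fixture {
  // Fresh, uniquely named directory under the OS temp dir.
  const cwd = mkdtempSync(join(tmpdir(), "abtree-fixture-"));
  const slugDir = join(cwd, opts.slug);
  mkdirSync(slugDir);
  // Assumption: treePath is the path of the copied main.json.
  const treePath = join(slugDir, "main.json");
  copyFileSync(join(opts.treeDir, "main.json"), treePath);
  return {
    cwd,
    treePath,
    cleanup: () => rmSync(cwd, { recursive: true, force: true }),
  };
}

// Usage: build a throwaway source directory so the sketch runs anywhere.
const sourceDir = mkdtempSync(join(tmpdir(), "abtree-source-"));
writeFileSync(join(sourceDir, "main.json"), JSON.stringify({ placeholder: true }));

const fixture = makeIsolatedFixture({ slug: "hello-world", treeDir: sourceDir });
console.log(existsSync(fixture.treePath)); // true
fixture.cleanup();
console.log(existsSync(fixture.cwd)); // false
rmSync(sourceDir, { recursive: true, force: true });
```

Because everything the run writes lands under the temporary cwd, a single recursive delete in cleanup is enough to leave the caller's project untouched.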

Sharing one scenario across transports

The scenario lives in one file; each runner threads its own transport through it. This is the "behavioural parity" pattern: both transports must produce identical observable behaviour against the same tree.

// scenario.ts — written once
import { type AgentHarness, instruct, submit } from "@abtree/testing";

export async function runScenario(agent: AgentHarness): Promise<void> {
  await agent.when(instruct("Acknowledge_Protocol"))
    .respond(submit("success"));
  // … rest of the scenario
  await agent.expectDone();
}

// run-cli.ts
import { AgentHarness, CliTransport, setupTreePackageFixture } from "@abtree/testing";
import { runScenario } from "./scenario.ts";

// TREE is a placeholder for the absolute path to the tree package directory.
const fixture = setupTreePackageFixture({ slug: "X", treeDir: TREE });
const agent = new AgentHarness(new CliTransport({ cwd: fixture.cwd }));
try {
  await agent.start("X", "cli");
  await runScenario(agent);
} finally {
  await agent.close();
  fixture.cleanup();
}

run-mcp.ts is identical except that it constructs McpTransport instead of CliTransport.

When to reach for the BDD runner instead

@abtree/test-tree (the BDD-style runner) is the better fit when:

The scenario reads naturally as given/when/then English.
External side effects need fixture-driven replay (mocked MR creates, git pushes, HTTP calls).
Tree authors should be able to add scenarios without writing TypeScript.
A markdown report (and the live SVG diagram next to the execution) is the deliverable.

The programmatic harness here is the better fit when the scenario is a precise sequence of runtime behaviours you want to assert against — typically regression suites built and maintained by the tree's author or by abtree itself.

Reference

Package: @abtree/testing — full API on every export via TSDoc.
Worked example: the bundled tests/ directory in the abtree repo runs the same scenario against both transports as a parity check.
