Scoping a Task So It Can't Wander

An autonomous agent fails in a specific way: it drifts. You ask for one thing, it "helpfully" refactors three others, touches files you never mentioned, and comes back with a diff you can't read. The fix isn't a smarter agent; it's a tighter brief.

A good autonomous task has four parts: a clear goal, explicit constraints, a concrete done-criterion, and guardrails the agent can run itself. The guardrails are the most important part, because they're the fence the agent can't climb over without you noticing.

Goal: Make the failing tests in tests/checkout/ pass.

Constraints:
- Only edit files under src/checkout/. Do not touch src/auth/,
  the database schema, or anything in config/.
- Do not add new dependencies.
- Do not change the tests themselves — the tests define correct behavior.

Done when:
- `npm test tests/checkout/` is fully green.
- `npm run typecheck` passes.
- `git diff --stat` shows changes only under src/checkout/.

If you get stuck after two real attempts, stop and report what you
tried and what's still failing. Don't paper over a failure to make
the suite go green.

The brief works because it boxes the agent in: the constraints wall off what it may touch, the done-criterion is a command it can run, and the verify loop allows two retries before the agent has to stop and report:

 ┌───────────────────────── SCOPE FENCE ─────────────────────────┐
 │  may edit:  src/checkout/                                      │
 │  ✗ off-limits: src/auth/ · db schema · config/ · the tests    │
 │                                                               │
 │     ┌─────────┐   edit    ┌──────────────┐                    │
 │     │  AGENT  │──────────▶│ src/checkout/ │                   │
 │     │         │◀──────────┤              │                    │
 │     └────┬────┘  run test └──────────────┘                    │
 │          │                                                    │
 │          ▼  npm test + typecheck                              │
 │     ┌─────────────┐   green ──▶ DONE                          │
 │     │ VERIFY LOOP │   red   ──▶ retry (max 2)                 │
 │     └─────────────┘   stuck ──▶ STOP & report ────────────────┼─▶ you
 └───────────────────────────────────────────────────────────────┘

Three things make this brief hard to wander out of. The constraints draw a wall around the files the agent may touch. The done-criterion is a command anyone can run, not a vibe. And the last paragraph gives the agent permission to stop and report instead of flailing, which prevents the slow-motion disaster where an agent keeps "fixing" things for twenty minutes and leaves the tree worse than it found it.
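The file wall can be checked mechanically instead of by eyeballing `git diff --stat`. What follows is a minimal sketch in Python, not part of the brief itself: the function names `out_of_scope` and `changed_files` are made up for illustration, and it assumes the script runs from the repository root.

```python
import subprocess
from pathlib import PurePosixPath


def out_of_scope(changed_paths, allowed_dirs):
    """Return the changed paths that sit outside every allowed directory.

    Paths are compared component by component, so "src/checkout_old/x.ts"
    does not count as being inside "src/checkout/".
    """
    allowed = [PurePosixPath(d).parts for d in allowed_dirs]
    offenders = []
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if not any(parts[:len(a)] == a for a in allowed):
            offenders.append(raw)
    return offenders


def changed_files():
    """List files with uncommitted changes, one path per entry.

    Hypothetical helper: relies on `git diff --name-only`, which prints
    repo-relative paths with forward slashes.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


# Example use inside a repository:
#   offenders = out_of_scope(changed_files(), ["src/checkout/"])
#   if offenders:
#       raise SystemExit("out-of-scope edits: " + ", ".join(offenders))
```

One caveat with this approach: `git diff` does not list untracked files, so a brand-new file outside src/checkout/ would slip past it. A stricter version might read `git status --porcelain` instead.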

The tests are the real fence. An agent that can run its own tests has a tight, honest feedback loop and mostly stays inside it; one with no way to check its work will confidently hand you something broken. If you take one habit from this chapter: give the agent a way to verify itself, and make passing it the definition of done.
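To make "passing is the definition of done" concrete, here is a rough sketch of the verify loop in Python. It is an illustration under assumptions, not how any particular agent works: `attempt_fix` is a hypothetical callback standing in for the agent's edit step, and the retry cap of two mirrors the "two real attempts" rule in the brief.

```python
import subprocess


def run_check(cmd):
    """Run one done-criterion command; it passes only on exit code 0."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0


def verify_loop(attempt_fix, checks, max_attempts=2, run=run_check):
    """Alternate between editing and verifying, with a hard retry cap.

    attempt_fix(attempt_number, failing_checks) is a hypothetical hook
    for the agent's edit step. Returns a small report dict: status is
    "done" when every check passes, "stuck" when the cap is reached.
    """
    failing = []
    for attempt in range(1, max_attempts + 1):
        attempt_fix(attempt, failing)
        failing = [c for c in checks if not run(c)]
        if not failing:
            return {"status": "done", "attempts": attempt, "failing": []}
    # Cap reached: stop and report instead of continuing to "fix" things.
    return {"status": "stuck", "attempts": max_attempts, "failing": failing}


# Example use with the two exit-code checks from the brief:
#   checks = [["npm", "test", "tests/checkout/"], ["npm", "run", "typecheck"]]
#   report = verify_loop(my_edit_step, checks)
```

The `git diff --stat` criterion is left out of `checks` because it reports through its output rather than its exit code; it needs a separate path check. The `run` parameter exists only so the loop can be exercised with a fake checker, without invoking npm.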
