An autonomous product team, built from agents

It finds its own work, ranks it, and never stops shipping.

Point it at a feature. A team of agents discovers what to build, throws out the slop, scores what survives, and ships sprints of fixes and enhancements. When the backlog runs low it discovers more, and keeps going. A human orchestrates and never writes a line of code.

Continuous discovery, not a one-shot idea list · Evidence-anchored intake filter · Fixes + enhancements every sprint · Loops until the work is done

What makes it different

One loop that never stops

Most agent tools generate a list of ideas and stop. This runs the whole team on a cycle: discover, prioritize, ship a sprint of fixes and enhancements, then refill the backlog and go again.

Loop diagram · Backlog: drains, refills, repeats

Legend: stage hand-off · Replenish (closes the loop) · work in flight

Watch one full turn of the loop

See the team take "Run the product team on Altro's rankings screen" from discovery and the slop filter through a shipped sprint, then loop back for more.

Why the output isn't slop

Two jobs, kept strictly apart

The default failure of agentic discovery is confident, well-formatted nonsense. The fix is to make discovery a strict intake gate and the build a self-checking pipeline. Every move is either a human orchestrating or an agent doing, and the two never blur.

The orchestrator

Routes, never writes code

A human scrum-master slices work, writes tight briefs, scores, accepts, ships, and keeps the backlog. Not one line of feature code. It stays lean by passing pointers like file paths and symbol names, never pasting files into its own head.

The agents

Do the work, return receipts

Lens agents find work, the prioritizer scores it, the product agent writes stories, and builders ship them. Each returns a terse manifest, never a code dump, so the orchestrator stays sharp across dozens of hand-offs.

Evidence or it's dropped

Every candidate must cite a file:line, a reproducible flow, or a spec it violates. No anchor, no intake. "This could be nicer" never makes it in.
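As a minimal sketch of that rule, the check below keeps a candidate only if it carries at least one of the three anchors. The `Candidate` shape, its field names, and the `file:line` pattern are assumptions made for illustration; the source does not describe how the filter is implemented.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape for a discovery candidate; field names are assumptions.
@dataclass(frozen=True)
class Candidate:
    title: str
    file_line: Optional[str] = None   # e.g. "useRankings.ts:88"
    repro_flow: Optional[str] = None  # steps that reproduce the problem
    spec_ref: Optional[str] = None    # e.g. "design/altro.md §4"

# Assumed anchor format: a path, a colon, and a positive line number.
FILE_LINE = re.compile(r"^\S+:[1-9]\d*$")

def has_anchor(c: Candidate) -> bool:
    """True if the candidate cites at least one of the three accepted anchors."""
    if c.file_line and FILE_LINE.match(c.file_line.strip()):
        return True
    if c.repro_flow and c.repro_flow.strip():
        return True
    if c.spec_ref and c.spec_ref.strip():
        return True
    return False

def intake(candidates):
    """Split candidates into (kept, dropped), preserving order in each group."""
    kept = [c for c in candidates if has_anchor(c)]
    dropped = [c for c in candidates if not has_anchor(c)]
    return kept, dropped
```

A candidate such as `Candidate("This could be nicer")` has no anchor and lands in `dropped`. One with `file_line="useRankings.ts:88"` is kept.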

Reviewer is not the Reviser

One agent audits a diff for runtime correctness; a different agent fixes what it flagged. A critic grading its own rework drifts into self-justification.

Agents never self-score

Discovery can't set its own impact or priority. One deterministic pass scores the whole filtered union against an anchored rubric, so two runs rank the same.

The roster

A full product team, dispatched as agents

Each role has a mode and a single job. Read-only roles always run in parallel; anything that writes the same file is sequenced, never raced.

Role	Mode	Job
Orchestrator	scrum-master	Lifecycle, briefs, scoring, accept, ship, bookkeeping. Writes no feature code.
Discovery × 5 lenses	read-only	Each owns one lens, sweeps every surface, surfaces candidates with evidence.
Prioritization	write	Dedupe, score, rank, assign stable ids, write the backlog.
Product	read-only	Turn top items into dev-ready stories that meet a Definition of Ready.
Builder	write	Implement one story over one disjoint file set.
Diff Reviewer	read-only	Audit the diff for runtime correctness, rank findings P1 / P2 / P3.
Reviser	write	Fix P1 / P2, usually the original Builder resumed. Max two rounds.
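The hand-off between the last three roles can be sketched as a bounded loop. This is an illustration only: `review` and `revise` are hypothetical callables standing in for the Diff Reviewer and Reviser agents, and treating P3 findings as non-blocking is an assumption drawn from the Reviser's "Fix P1 / P2" job.

```python
from typing import Callable, List, Tuple

Finding = Tuple[str, str]  # (severity, note); severity is "P1", "P2", or "P3"
MAX_ROUNDS = 2             # cap on Reviser rounds, taken from the roster

def review_loop(
    diff: str,
    review: Callable[[str], List[Finding]],       # hypothetical Diff Reviewer
    revise: Callable[[str, List[Finding]], str],  # hypothetical Reviser
) -> Tuple[str, bool, int]:
    """Return (final diff, passed, reviser rounds used)."""
    rounds = 0
    while True:
        # Assumption: only P1 and P2 findings block a story.
        blocking = [f for f in review(diff) if f[0] in ("P1", "P2")]
        if not blocking:
            return diff, True, rounds
        if rounds == MAX_ROUNDS:
            return diff, False, rounds
        # The Reviser sees only what was flagged, never its own score.
        diff = revise(diff, blocking)
        rounds += 1
```

Keeping `review` and `revise` as separate parameters mirrors the rule that the Reviewer is not the Reviser: the function that grades never edits, and the one that edits never grades.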

Inside discovery

Five lenses, each hunting one kind of problem

The lenses run in parallel. Each one sweeps the whole feature, but only through its own question. Convergence across lenses on one gap is the highest-confidence signal there is.

Flow

Dead ends, missing CTAs, multi-tap core jobs, states with no way out.

Runtime

Stale state and closures, wrong-state conditionals, broken CRUD wiring, off-by-one, null/NaN, empty/loading/error paths.

Coverage

Specced-or-listed but not actually built or usable. Real capability gaps against the design.

Consistency

House-style and pattern drift across surfaces, plus cross-document conflicts.

Gates

Brand and token violations, accessibility (labels and hit targets), AI-tells, and other hard-gate failures.

Signal vs. noise

Highest signal: human device reports and runtime-correctness audits. Lowest: generic best-practice suggestions, which the filter rejects.

Phase 2 · Prioritize

One number, computed the same way every time

An anchored rubric turns each item into impact, fit, and effort points, then a single deterministic pass ranks the whole union. Fit is persisted, ids are never reused, and ties break on impact, then effort, then id. Every input ends as scored, merged, rejected, or deferred, with no silent drops.

priority score = (impact × fit) / effort
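The formula and tie-break order can be sketched as a single sort. The point values below (H = 3, M = 2, L = 1) are assumed, since the source names the levels but not the rubric's numbers. The tie-break direction (higher impact first, lower effort first, then id ascending) is also an assumption.

```python
from dataclasses import dataclass
from fractions import Fraction

# Assumed rubric anchors; the real point values are not given in the source.
POINTS = {"H": 3, "M": 2, "L": 1}

@dataclass(frozen=True)
class Item:
    id: str
    impact: str  # "H", "M", or "L"
    fit: str
    effort: str

def score(item: Item) -> Fraction:
    """(impact × fit) / effort as an exact fraction, so ties are detected exactly."""
    return Fraction(POINTS[item.impact] * POINTS[item.fit], POINTS[item.effort])

def rank(items):
    """Sort by score (high first), then impact (high first), effort (low first), id."""
    return sorted(
        items,
        key=lambda it: (-score(it), -POINTS[it.impact], POINTS[it.effort], it.id),
    )

# Rows from the worked example, deliberately shuffled.
backlog = [
    Item("ENH-033", "H", "M", "H"),
    Item("ENH-034", "L", "L", "L"),
    Item("BUG-014", "H", "H", "L"),
    Item("ENH-032", "M", "M", "L"),
    Item("ENH-031", "H", "H", "M"),
]
print([it.id for it in rank(backlog)])
```

Because the key is a pure function of the stored points and the id, the same input in any order produces the same ranking. Under these assumed point values, the five rows sort into the order shown in the worked example's backlog.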

Phase 4 · Sprint

Fixes and enhancements, shipped together

Each sprint pulls 3 to 5 disjoint-file stories at an 85% enhancement / 15% bug mix. Builder builds, a fresh Reviewer audits runtime correctness, a Reviser clears P1/P2 in at most two rounds. Any P1, regression, or device-reported bug is drain-first. Partial sprint? Ship only the files that passed.

85 / 15 mix · disjoint files only · max 2 reviser rounds
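One way to picture the disjoint-file rule is a greedy pass over the ranked backlog. This is a sketch under stated assumptions: the `Story` shape and the `pick_sprint` name are hypothetical, and the 85 / 15 mix and the three-story minimum are left out to keep the example short.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Story:
    id: str
    files: frozenset           # the files this story owns
    drain_first: bool = False  # any P1, regression, or device-reported bug

def pick_sprint(ranked, max_stories=5):
    """Greedy pick from an already-ranked list, skipping stories that share a file."""
    # Stable sort: drain-first stories move to the front, rank order is otherwise kept.
    queue = sorted(ranked, key=lambda s: not s.drain_first)
    claimed = set()
    sprint = []
    for story in queue:
        if len(sprint) == max_stories:
            break
        if claimed.isdisjoint(story.files):
            sprint.append(story)
            claimed |= story.files
    return sprint
```

A story that overlaps an already-claimed file is not dropped. It stays in the backlog and becomes eligible again in a later sprint.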

Safety rails

It runs on its own, but never off a cliff

Humans keep the keys

Native builds	routed to a human gate, never auto-shipped
SQL & migrations	blocked, queued for human review
Can't-see-in-source	renders & runtime go to a device report
Each ship	standing authorization, or pause for approval, your call

When something breaks

Regression	a shipped item that breaks reopens, exempt from do-not-propose
Bad ship	revert the deploy first, then fix forward
P1 crash	preempts the running sprint as a solo hotfix, then resumes
Empty sweep	discovery returns nothing valid? pause and notify, never spin

Live demo · worked example

One full turn of the loop

Press play and watch the team take a single instruction through discovery, the slop filter, scoring, stories, a build-review-revise sprint, the ship gate, and the loop back for more. Step through it at your own pace.

Step 1 of 8

It starts with one plain instruction.

You give it one feature and a goal. The team handles discovery, prioritization, stories, and the build, sprint after sprint, reporting at each ship instead of asking what to do next.

Surface: Altro rankings screen · Mix: 85% enhancement / 15% bug · Human input: one instruction

5 lenses sweep in parallel

Lens	Looks for	Found
Flow	dead ends, missing CTAs	2
Runtime	stale state, off-by-one	2
Coverage	specced, not built	1
Consistency	pattern drift	1
Gates	a11y, brand, AI-tells	1

Candidates, each with an anchor

Type	Candidate	Anchor	Lens · evidence
bug	Drag-reorder writes a stale rank index	useRankings.ts:88	runtime · reproducible on fast moves
enh	Ranked list has no empty state or CTA	RankingsScreen.tsx:142	flow · new users see a blank screen
enh	Reorder handles lack a11y labels, hit target < 44px	RankRow.tsx:31	gates · fails the a11y hard gate
enh	Compare-trips view specced but never wired	design/altro.md §4	coverage · capability gap vs design

Read-only agents, in parallel. Each finding names a defect or a missing capability and points at exactly where it lives.

The intake filter rules on every candidate

Type	Candidate	Anchor or note	Ruling
bug	Drag-reorder writes a stale rank index	useRankings.ts:88	kept
enh	Ranked list has no empty state or CTA	RankingsScreen.tsx:142	kept
enh	Reorder handles lack a11y labels	RankRow.tsx:31	kept
enh	Make the rankings screen pop more	no anchor	no evidence
enh	Add a gradient header banner	cosmetic	fails brand gate
bug	Reindex gap on city delete	already ENH-009	duplicate

Three rejected: no evidence anchor, cosmetic-only, and a duplicate of an existing backlog row. This gate is the whole difference between a product team and an idea generator.

BACKLOG.md · scored & ranked · (impact × fit) / effort

id	title	impact	fit	effort
BUG-014	Stale rank index (P1)	H	High	L
ENH-031	Empty state + CTA	H	High	M
ENH-032	a11y labels + hit target	M	Med	L
ENH-033	Wire compare-trips	H	Med	H
ENH-034	Rank chips to brand tokens	L	Low	L

One deterministic pass. The P1 bug drains first, overriding the ratio; the rest sort by score. Two runs of this rubric produce the same ranking.

ENH-031 → Definition of Ready

✓ User value: New users land on a screen that tells them what to do
✓ Acceptance: Empty list shows headline, blurb, and an "Add a city" CTA
✓ Owns files: RankingsEmpty.tsx (new), RankingsScreen.tsx
✓ Constraints: brand tokens, a11y labels, no AI-tells

Contract symbols verified to exist

$ rg -n "useRankings" RankingsScreen.tsx

✓ 142: const { items } = useRankings()

$ rg -n "AddCityButton" components/

✓ Cta.tsx:8: export function AddCityButton

A story is ready only when its files are disjoint and every symbol it leans on is confirmed real. Missing symbol → the story is blocked, not built.
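The symbol check above can be sketched without shelling out. This is a rough stand-in for the `rg` calls, not a description of how the team runs them: the function names and the `contract` mapping are hypothetical. Unlike a plain `rg` search, which matches substrings, this version matches whole words only.

```python
import re
from pathlib import Path

def symbol_exists(symbol: str, root: Path) -> bool:
    """True if `symbol` appears as a whole word in `root` (a file) or any file under it."""
    if not root.exists():
        return False
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    paths = [root] if root.is_file() else sorted(p for p in root.rglob("*") if p.is_file())
    for path in paths:
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        if pattern.search(text):
            return True
    return False

def story_status(contract: dict) -> str:
    """`contract` maps each symbol to the file or directory it should be found in."""
    missing = [s for s, root in contract.items() if not symbol_exists(s, Path(root))]
    return "blocked" if missing else "ready"
```

For the story above, a contract would map `useRankings` to `RankingsScreen.tsx` and `AddCityButton` to `components/`. A single missing symbol is enough to return `"blocked"`.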

✎ Builder: implements each story over its own files
◎ Diff Reviewer: audits runtime correctness, flags P1/P2
✏ Reviser: fixes only what was flagged

Story	Title	File	Flagged	Result
BUG-014	Stale rank index	useRankings.ts	P1	passed
ENH-031	Empty state + CTA	RankingsEmpty.tsx	P2	passed
ENH-032	a11y labels + hit target	RankRow.tsx	none	passed

Three stories, three disjoint file sets, so they build in parallel. The Reviewer flags a P1 and a P2; the Reviser clears both. Disjoint files are why they could run at once instead of in a line.

✓ Shipped

Definition of Done met · ship gate

Ship the sprint

✓ Passed: hard gates clean, typecheck green, P1/P2 cleared
✓ Commit: only the 3 story file sets, no lockfiles or migrations
✓ Held back: a SQL migration stays blocked for your review

Standing authorization ships it, or pause for approval at every sprint. Your call.
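The commit rule at the ship gate can be sketched as a three-way split of the changed files. The lockfile names and the `.sql` and `migrations` patterns are assumptions; the source names the categories but not how they are detected.

```python
from pathlib import PurePosixPath

# Assumed lockfile names; the source only says "no lockfiles".
LOCKFILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}

def plan_commit(changed, story_files):
    """Split changed paths into (commit, held, skipped).

    commit:  files owned by a passing story
    held:    SQL and migration files, queued for human review
    skipped: lockfiles and anything no story owns
    """
    commit, held, skipped = [], [], []
    for f in sorted(changed):
        p = PurePosixPath(f)
        if p.suffix == ".sql" or "migrations" in p.parts:
            held.append(f)
        elif f in story_files and p.name not in LOCKFILES:
            commit.append(f)
        else:
            skipped.append(f)
    return commit, held, skipped
```

Checking the held-back patterns before ownership means a migration stays blocked even if a story claims it. That matches the rail that SQL and migrations always wait for a human.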

Backlog drains as the sprint ships

Ready → Low-water

shipped 3 · ready 2 · below low-water

Discover → Prioritize → Stories → Sprint ↺

Ready drops below the low-water mark, so the loop re-runs discovery to refill it, rotating one lens to control cost, and keeps shipping. It stops only when the completion condition is met or the human says so.
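The loop's decision at the end of each sprint can be sketched as a small pure function. The function names and the low-water value used in the example are assumptions. `lens_for_refill` shows one possible reading of "rotating one lens", in which each refill runs a single lens and cycles through the five in turn.

```python
LENSES = ["Flow", "Runtime", "Coverage", "Consistency", "Gates"]

def next_step(ready, low_water, done=False, stop_requested=False, last_sweep_empty=False):
    """Decide what the loop does after a sprint ships."""
    if done or stop_requested:
        return "stop"
    if ready < low_water:
        # An empty sweep pauses and notifies instead of re-running discovery forever.
        return "pause_and_notify" if last_sweep_empty else "discover"
    return "sprint"

def lens_for_refill(refill_index):
    """Assumed rotation: one lens per refill, cycling in roster order."""
    return LENSES[refill_index % len(LENSES)]

# With 2 stories ready and a hypothetical low-water mark of 3, discovery re-runs.
print(next_step(ready=2, low_water=3))
```

Keeping the decision free of side effects makes the stopping rules easy to check: the loop stops only on completion or a human request, and pauses rather than spins when discovery comes back empty.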
