A tracker ticket an AI coding agent can actually ship is one specific change, with a clear acceptance test, on a named set of files, framed so the agent knows when it is done. Most tickets are not written that way. They are written for a teammate who will fill in the blanks, not for a tool that takes the words at face value. The result is a familiar one: the agent picks the wrong file, edits too much, breaks an unrelated test, and opens a pull request nobody wants to read.
We have run thousands of issues through Codowave's loop. The single biggest predictor of a merged PR is not the model. It is the issue. A clear issue with a vague title still beats a vague issue with a clever title. The patterns below are the ones that move agent merge rates from coin-flip to boring.
Why most AI-agent issues fail
The usual issue template was designed for humans who already know the codebase. It has fields for description, steps to reproduce, and expected behavior. It does not have fields for scope, file hints, or done conditions, because a teammate would infer those from a Slack thread or a hallway chat.
An AI agent has no Slack thread. It has the title, the body, the labels, and whatever the repo index can tell it. When the issue is missing scope, the agent picks scope from the wording, and the wording is usually broader than the author meant. "Refactor the auth module" is the canonical bad title. There is no version of that issue an agent can finish, because the issue does not say what "refactor" means or when the work stops.
The most common failure modes we see, in order:
- Verb soup. Titles like "Improve onboarding", "Clean up tests", "Fix the dashboard". Each one hides three or four real tasks.
- No acceptance test. The issue describes the symptom but never says how the author will know the symptom is gone. The agent makes one up, and it usually is not the one the reviewer would pick.
- Silent context. "Use the same pattern as UserService." Which pattern? The constructor injection one or the static helper one? Both exist. The agent guesses.
- Multiple changes in one ticket. "Add Stripe webhook, log it, and surface it on the billing page." That is three PRs. The agent will write one and the diff will be too large to merge.
- Stale assumptions. The issue references a file that was renamed last quarter, or a config key that was removed. The agent edits ghosts.
None of these are model problems. A bigger model writes a more confident wrong answer.
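The first and fourth failure modes can be caught before an agent ever sees the ticket. The sketch below is a rough heuristic, not a Codowave feature: the function name `titleWarnings`, the verb list, and the four-word cutoff are all assumptions chosen to match the example titles above.

```typescript
// Hypothetical heuristic for spotting topic-style or bundled titles.
// The verb list and word cutoff are illustrative assumptions; tune them per repo.
const VAGUE_VERBS = ["improve", "clean", "fix", "refactor"];
const MAX_TOPIC_WORDS = 4;

function titleWarnings(title: string): string[] {
  const words = title
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0);
  const warnings: string[] = [];

  // Verb soup: a vague opening verb on a very short title usually names a topic.
  if (
    words.length > 0 &&
    words.length <= MAX_TOPIC_WORDS &&
    VAGUE_VERBS.indexOf(words[0]) !== -1
  ) {
    warnings.push("reads like a topic, not a single change");
  }

  // Multiple changes: the word "and" suggests more than one PR.
  if (words.some((word) => word.replace(/[^a-z]/g, "") === "and")) {
    warnings.push('contains "and", so it may be more than one PR');
  }

  return warnings;
}
```

A short title that starts with "fix" can still be a perfectly good ticket, so a check like this is better used to prompt a second look than to block an issue.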
The shape of an issue an agent can ship
A shippable ticket has five parts. None of them need a custom template. They fit in a normal tracker body with four short headings.
1. A title that names one change. The title is a contract. It should be a single change you could write in one PR description without using the word "and". "Add Stripe customer ID to the user create webhook" is a title. "Improve billing reliability" is a topic.
2. The change in two sentences. Say what should be different after the PR merges. Not the motivation. Not the history. Just the new state. "After this change, the users.create webhook handler stores stripeCustomerId on the user row." If you cannot say it in two sentences, it is more than one issue. We split issues that fail this test using auto-decomposition before any code is written.
3. An acceptance test. One or two checks that prove the change works. The agent uses these to know when it is done. They also become the review checklist. "A new unit test confirms the customer ID is written to the row. The webhook integration test still passes." That is an acceptance test. "It works" is not.
4. File hints. Two to four paths the agent should read first. You are not telling the agent every file to touch. You are telling it where to start. For example: apps/api/src/modules/users/users.service.ts and apps/api/src/modules/billing/billing.controller.ts. This single line saves more agent time than any prompt engineering.
5. What stays out of scope. A short list of things the agent should not touch. "Do not change the user create flow on the dashboard. Do not add new env variables." Out-of-scope notes prevent drift, the most common reason agent PRs get rejected. We dig into the drift problem in the loop write-up.
That is the whole template. Here is what it looks like in practice:
Title: Persist Stripe customer ID on user create webhook
What should change
After this PR, the users.create webhook handler stores
stripeCustomerId on the user row when Better Auth passes one in metadata.
Acceptance
- A unit test asserts stripeCustomerId is written when present.
- Existing webhook integration tests still pass.
Start here
- apps/api/src/modules/users/users.service.ts
- apps/api/src/modules/billing/billing.controller.ts
Out of scope
- The dashboard create flow.
- New env variables.
That issue has 8 lines of body. An agent can read it, plan it, edit two files, write one test, and open a PR a reviewer can merge in five minutes. That is the whole game.
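Whether a ticket has that shape is easy to check mechanically. The following is a minimal sketch, assuming the section headings are exactly the four used in the example issue (optionally prefixed with Markdown `#` marks). The names `parseSections` and `missingSections` are made up for illustration.

```typescript
// Heading names are taken from the example issue above. Matching them exactly
// (case-insensitive) is an assumption of this sketch.
const REQUIRED_SECTIONS = ["what should change", "acceptance", "start here", "out of scope"];

// Groups non-empty body lines under the most recent recognised heading.
function parseSections(body: string): { [heading: string]: string[] } {
  const sections: { [heading: string]: string[] } = {};
  let current: string | null = null;

  for (const rawLine of body.split("\n")) {
    const line = rawLine.trim();
    if (line.length === 0) continue;

    // Accept both bare headings and Markdown headings such as "## Acceptance".
    const key = line.replace(/^#+\s*/, "").toLowerCase();
    if (REQUIRED_SECTIONS.indexOf(key) !== -1) {
      current = key;
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }
  return sections;
}

// Returns the required headings that are absent or have no content under them.
function missingSections(body: string): string[] {
  const sections = parseSections(body);
  return REQUIRED_SECTIONS.filter(
    (name) => !sections[name] || sections[name].length === 0
  );
}
```

Run against the example issue, `missingSections` returns an empty list. A ticket that has only a description comes back with the three headings it still needs.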
What an agent can't read between the lines
This is the part most teams miss the first week. An agent does not pick up on the things you forgot to write down. Three categories of silent context bite hardest.
Repo conventions. Every codebase has rules that are not in the linter. "We use the singular form for table names." "We do not throw raw Error, we throw HttpException." "All new modules export from a barrel." These rules live in a senior engineer's head. If they are not in AGENTS.md, CLAUDE.md, or a CONTRIBUTING guide, the agent will violate them and the reviewer will spend the review explaining them. Write them down once, and the agent stops making the same mistake on issue four.
Tests the agent did not write. Flaky tests, integration tests that need a database, tests that depend on a secret env var. If a test fails on main for reasons unrelated to the change, the agent will spend retries trying to fix it, then give up and either skip the test or open a PR that admits it could not run the suite. Mark known-flaky tests in CI. Skip them, quarantine them, or fix them. Do not leave them as background noise the agent has to decode.
Secret env vars and external services. An agent that cannot use your test Stripe key cannot finish a webhook ticket. Either provide a mocked path the agent can run locally, or label the issue so the agent knows to deliver a PR with a passing unit test and a noted manual integration check. Hard-capped plans like the ones on our pricing page include a fixed worker count, so retry budgets matter. An agent burning retries on an unreachable service is paying for nothing.
None of this is exotic. It is the same context a new hire needs in week one. The difference is that the agent works through it on every issue, not once.
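For the tests-on-main problem, one way to cut the noise is to compare failures on the branch against failures already present on main. This is a sketch under the assumption that your CI can export failing test names for both runs. The function name `splitFailures` and the test names in the usage note are hypothetical.

```typescript
interface FailureSplit {
  preExisting: string[]; // already failing on main, unrelated to the change
  introduced: string[];  // failing only on the branch, so likely caused by it
}

// Separates branch failures into known noise and genuinely new failures.
// Inputs are plain lists of test names. How they are collected is left to CI.
function splitFailures(failingOnMain: string[], failingOnBranch: string[]): FailureSplit {
  const preExisting = failingOnBranch.filter(
    (name) => failingOnMain.indexOf(name) !== -1
  );
  const introduced = failingOnBranch.filter(
    (name) => failingOnMain.indexOf(name) === -1
  );
  return { preExisting, introduced };
}
```

Given main failing on `billing.e2e` and a branch failing on `billing.e2e` and `users.service.spec`, only `users.service.spec` lands in `introduced`, which is the one failure worth a retry.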
The 10-minute weekly issue triage
The useful habit is small. Once a week, spend ten minutes on the backlog and do four things in order.
First, kill the topics. Any title that names a topic instead of a change either gets split into specific issues or moved to a roadmap doc. Topics are not issues, and they should not have agent labels.
Second, add file hints to anything that looks ready. You do not need to be exhaustive. Two paths is plenty. The agent reads them first, then expands from there.
Third, add acceptance tests. The fastest way to write one is to imagine the PR is open and ask what one check would tell you it works. That is the acceptance test. Write it as a sentence, not a runnable test.
Fourth, add out-of-scope lines on anything that touches more than one module. This is the line that saves the most review time, because reviewers can stop arguing with the diff about what was supposed to be in it.
That is the loop. A team running this for two weeks usually sees the agent's first-PR merge rate climb from somewhere around half to somewhere around eight in ten. We saw that pattern when we ran 10 issues over a single weekend: seven merged on first review, and the two that needed changes were the two issues with the weakest acceptance criteria.
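The file-hint step of the triage is also the place to catch the stale-assumption failure described earlier. A minimal sketch, assuming you already have the repo's tracked paths as a list (for instance from `git ls-files`) and the hint lines from the issue body. The function name `staleHints` is made up for illustration.

```typescript
// Returns hinted paths that no longer exist in the repo's tracked files.
// `trackedFiles` is assumed to be gathered elsewhere, e.g. from `git ls-files`.
// This function only compares strings and never touches the filesystem.
function staleHints(hintLines: string[], trackedFiles: string[]): string[] {
  const hinted = hintLines
    .map((line) => line.replace(/^[-*]\s*/, "").trim()) // drop list bullets
    .filter((path) => path.length > 0);

  return hinted.filter((path) => trackedFiles.indexOf(path) === -1);
}
```

Any path it returns points at a file that was renamed or removed, so the hint should be fixed before the issue gets an agent label.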
The takeaway
The agent is not the bottleneck. The issue is. Three short sections, a couple of file paths, and one out-of-scope line are enough to turn a vague backlog into work the agent can finish, and a reviewer can merge without writing a second pull request to clean up the first.