Post
  1. Portrait of Onur Solmaz

    Implementation plans keep agents on track

    @onusoz · /2026/03/20 · View on
    My agentic workflow these days: I start all major features with an implementation plan. This is a high-level markdown doc containing enough details so that agent will not stray off the path Real example: https://t.co/vU9SnVYHfY This is the most critical part, you need to make sure the plan is not underspecified. Then I just give the following prompt: --- 1. Implement the given plan end-to-end. If context compaction happens, make sure to re-read the plan to stay on track. Finish to completion. If there is a PR open for the implementation plan, do it in the same PR. If there is no PR already, open PR. 2. Once you finish implementing, make sure to test it. This will depend on the nature of the problem. If needed, run local smoke tests, spin up dev servers, make requests and such. Try to test as much as possible, without merging. State explicitly what could not be tested locally and what still needs staging or production verification. 3. Push your latest commits before running review so the review is always against the current PR head. Run codex review against the base branch: `codex review --base <branch_name>`. Use a 30 minute timeout on the tool call available to the model, not the shell `timeout` program. Do this in a loop and address any P0 or P1 issues that come up until there are none left. Ignore issues related to supporting legacy/cutover, unless the plan says so. We do cutover most of the time. 4. Check both inline review comments and PR issue comments dropped by Codex on the PR, and address them if they are valid. Ignore them if irrelevant. Ignore stale comments from before the latest commit unless they still apply. Either case, make sure that the comments are replied to and resolved. Make sure to wait 5 minutes if your last commit was recent, because it takes some time for review comment to come. 5. In the final step, make sure that CI/CD is green. Ignore the fails unrelated to your changes, others break stuff sometimes and don't fix it. Make sure whatever changes you did don't break anything. If CI/CD is not fully green, state explicitly which failures are unrelated and why. 6. Once CI/CD is green and you think that the PR is ready to merge, finish and give a summary with the PR link. Include the exact validation commands you ran and their outcomes. Also comment a final report on the PR. 7. Do not merge automatically unless the user explicitly asks. --- Once it finishes, I skim the code for code smell. If nothing seems out of the ordinary, I tell the agent to merge it and monitor deployment Then I keep testing and finding issues on staging, and repeat all this for each new found issue or new feature...
    @onusoz·
    pro-tip on how to keep your agent on track and make sure it follows PLANS even after multiple compactions. I don't know if this is common knowledge if the thing you are trying to make it do will take more than 1-2 steps, always make it create a plan. an implementation plan, refactor plan, bugfix plan, debugging plan, etc. have a conversation with the agent. crystallize the issue or feature. talk to it until there are no question marks left in your head then make it save it somewhere. "now create an implementation plan for that in docs". it can be /tmp or docs/ in the repo. I personally use YYYY-MM-DD-x-plan .md naming. IMO all plans should be kept in the repo then here is the critical part: you need to prompt it "now implement the plan in <filename>. if context compacts, make sure to re-read the plan and assess the current state, before continuing. finish it to completion" -> something along those lines why? because of COMPACTION. compaction means previous context will get lossily compressed and crucial info will most likely get lost. that is why you need to pin things down before you let your agent loose on the task compaction means, the agent plays the telephone game with itself every few minutes, and most likely forgets the previous conversation except for the VERY LAST USER MESSAGE that you have given it now, every harness might have a different approach to implementing this. but there is one thing that you can always assume to be correct, given that its developers have common sense. that is, harnesses NEVER discard the last user message (i.e. your final prompt) and make sure it is kept verbatim programmatically even after the context compacts since the last user message is the only piece of text that is guaranteed to survive compaction, you then need to include a breadcrumb to your original plan, the md file. and you need to make it aware that it might diverge if it does not read the plan there is good rationale for "breaking the 4th wall" for the model and making it aware of its own context compaction. IMO models should be made aware of the limitations of their context and harnesses. they should also be given tools to access and re-read pre-compaction user messages, if necessary the important thing is to develop mechanical sympathy for these things, harness and model combined. an engineer does not have the luxury to say "oh this thing doesn't work", and instead should ask "why can't I get it to work?" let me know if you have better workflows or tips for this. I know this can be made easier with slash commands in pi, for example, but I haven't had the chance to do that for myself yet