Entries for December 2025
@onusoz · 2025-12-31
Anyone created an agent skill for splitting PRs for good review culture?
GPT-5.2 xhigh feels like a careful systems debugger
GPT 5.2 xhigh feels like a much more careful architect and debugger when it comes to complex systems
But most people here think Opus 4.5 is the best model in that category
There are 2 reasons AFAIS:
- xhigh reasoning consumes significantly more tokens. You need to pay for ChatGPT Pro (200 usd) to be able to use it as a daily driver
- It takes like 5x longer to finish a task, and most people lack the patience ...
@onusoz · 2025-12-31
Just 5 months ago, I was swearing at Claude 4 Sonnet like a Balkan uncle
Models one-shotted the right thing only 20-30% of the time but did really stupid things the rest, and had to be handheld tightly
Today they are much, much better. My psychology is a lot more at ease, and instead of swearing, I want to kiss them on the forehead most of the time
Now I trust agents so much that I queue up 5-10 tasks before go...
@onusoz · 2025-12-31
Codex does not have support for subagents. I tried to use Claude Code to launch 8 Codex instances in parallel on separate tasks, but Opus 4.5 had difficulty following instructions
So I created a CLI tool that scans pending TODOs from a markdown file and lets me launch as many harnesses as I want (osolmaz/spawn on GitHub)
I currently use this for relatively read-only tasks like planning and finding root causes of bugs...
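A minimal sketch of the TODO-scanning step, assuming pending items are written as unchecked GitHub-style checkboxes (`- [ ] task`). This is not the actual osolmaz/spawn implementation; the function name `pending_todos` and the checkbox convention are assumptions made for illustration.

```python
import re

# Assumption: a pending task is an unchecked checkbox item such as "- [ ] task".
PENDING = re.compile(r"^\s*[-*]\s+\[ \]\s+(.+?)\s*$")


def pending_todos(markdown: str) -> list[str]:
    """Return the text of every unchecked checkbox item, in file order."""
    return [
        match.group(1)
        for line in markdown.splitlines()
        if (match := PENDING.match(line))
    ]
```

Each returned string could then be handed to a separate harness process as its task prompt.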
@onusoz · 2025-12-29
cc @behackl, forgot to mention you in the original post
@onusoz · 2025-12-29
Friends of open source, we need your help!
A lot of Manim Community accounts got compromised and deleted during Christmas
Manim Community is a popular fork of @3blue1brown's original math animation engine Manim, and its accounts have over 5 YEARS of contributions, knowledge and following
Apparently GitHub support has already seen the request and is in the process of restoring the GitHub org. But if anyone knows how to spe...
@onusoz · 2025-12-28
While a great feature, I never needed such a thing in Codex after GPT 5.2. It just one shots tasks without stopping
So we have proof by existence that this problem can be solved without any such mechanism. Wish to see the same relentlessness in Anthropic models
@onusoz · 2025-12-27
2025 was the year of ̶a̶g̶e̶n̶t̶s̶ bugs
Software felt much buggier compared to before, even from companies like Apple. Presumably because everyone started generating more code with AI
Models are improving, so hopefully 2026 will be the opposite. Even fewer bugs than in the pre-AI era
Agent progress is compounding faster than teams realize
Have a long flight, so will think about this
I have an internal 2023 TextCortex doc which models chatbots as state machines with internal and external states with immutability constraints on the external state (what is already sent to the user shall not be changed)
Motivation was that a chatbot provider will always have state that they will want to keep hidden
This was way before Responses and now deprecated As...
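A minimal sketch of the state-machine idea, assuming the external state is simply the sequence of messages already sent. It is not the TextCortex design; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChatState:
    # External state: messages already sent to the user. Stored as a tuple,
    # so existing entries cannot be modified in place.
    _external: tuple[str, ...] = ()
    # Internal state: provider-side data the user never sees; freely mutable.
    internal: dict = field(default_factory=dict)

    @property
    def external(self) -> tuple[str, ...]:
        """Read-only view of what the user has already seen."""
        return self._external

    def send(self, message: str) -> None:
        """Transition: append to the external state without rewriting it."""
        self._external = self._external + (message,)
```

The only transition that touches the external state is `send`, which appends; everything the provider wants to keep hidden lives in `internal`.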
@onusoz · 2025-12-27
This was simply because the webapp fails to create a post and does so silently. The UX is still not good on this app. Make sure to write your posts somewhere else so you do not lose them
@onusoz · 2025-12-26
I gave Codex a task of porting an OpenCV tracking algorithm (CSRT) from C++ to Rust, so that I can directly use it in my project without having to cross-compile
It one-shot the task perfectly in 1hr, and even developed a GUI on top of it. All I did was to provide the original source and algo paper
I've spent years getting specialized in writing numerical code (computational mechanics, fem), and now AI can automa...
Depth on Demand
I gave Codex a task of porting an OpenCV tracking algorithm (CSRT) from C++ to Rust, so that I can directly use it in my project without having to cross-compile
@onusoz · 2025-12-25
See the repo for the latest changes: https://t.co/YDevGg2rhz
@onusoz · 2025-12-25
If you have a bunch of docs in your repo, give it a try. It will use the timestamps of the commit that created the files while renaming. You can also run with --dry-run to see changes without applying them
@onusoz · 2025-12-25
Now you can migrate your repo to SimpleDoc with a single command:
npx -y @simpledoc/simpledoc migrate
A step-by-step wizard will add timestamps to your files based on your git history, add missing YAML frontmatter, and update your AGENTS.md file
https://t.co/yrciS8KtEw
@onusoz · 2025-12-25
@bcherny Would be great if I could queue messages like in Codex
https://t.co/mC25gNKWo3
@onusoz · 2025-12-24
It seems it's impossible to post something on Reddit these days, even when it is a pure text post without links in the body
@onusoz · 2025-12-23
Curious to hear what other hardcore agent users @simonw @mitsuhiko @steipete @badlogicgames think. I can't be the only one who does this.
I feel like everybody ended up with the same workflow independent of each other, but somehow did not write about it (or I missed it)
@onusoz · 2025-12-23
How to stop AI agents from littering your codebase with Markdown files?
I wrote a new post on how to create documentation with AI agents without having them add Markdown files to your repo root, while keeping the files they create in chronological order
How to stop AI agents from littering your codebase with Markdown files
A simple documentation workflow for AI agents.
For setup instructions, skip to the How to setup SimpleDoc in your repo section.
@onusoz · 2025-12-21
OpenAI won’t be able to monopolize this, the same reason Microsoft couldn’t monopolize the internet. The internet (of agents) is bigger than any one company
@onusoz · 2025-12-21
One tap @Revolut bank account at Berlin airport. Literally.
Dispenses a free card with instructions to log in. One of the most insane onboarding experiences I have ever seen
@onusoz · 2025-12-20
Slop bombing
@onusoz · 2025-12-18
Codex feature request: Let me queue up /model changes
Currently, if I try to run /model while the model is responding, it tells me that I can't do that
But I often want to gauge thinking budget in advance, like run a straightforward task with low reasoning and then start another one with high reasoning
cc @thsottiaux
@onusoz · 2025-12-18
Literally the exact same thing happened to me back in 2018. Everybody learns not to use password auth with SSH the hard way
https://t.co/NPqrXwqUUy
@onusoz · 2025-12-17
AI agents make any transductional task (like translation from language A to language B) trivial, especially when you can verify the output with compilers and tests
The bottleneck is now curating the tests
@onusoz · 2025-12-17
I think X removed one of my posts yesterday about the new encrypted "Chat" rolling out to all users, and how you might lose all your past messages if you forget your passcode and do not have the app installed
I can swear I clicked Post. Do they classify posts based on their topic and delete the ones they don't like?
Anyway, we shall see, I am taking a screenshot and saving the URL.
@onusoz · 2025-12-14
very optimistic!
@onusoz · 2025-12-14
Crazy that @cursor_ai disabled Gemini 3 Pro on my installation, toggled it right back on. I wonder why, too many complaints maybe? That it’s hard to control?
On another note, disabling models without notification is dishonest product behavior. I would at least appreciate getting a notification, even when it might be against a company’s interests @sualehasif996
Language-agnostic interoperability layer for LLM APIs
So is somebody already building “LLVM but for LLM APIs” in stealth or not?
We have numerous libraries @langchain, Vercel AI SDK, LiteLLM, OpenRouter, the one we have built at @TextCortex, etc.
But to my knowledge, none of these try to build a language agnostic IR for interoperability between providers (or at least market themselves as such)
Like some standard and set of tools that will not lock you in langchai...
@onusoz · 2025-12-13
For those wondering what project this is: https://t.co/AzNS631PIC
@onusoz · 2025-12-13
This is what an agentic monorepo looks like. What was once a hurdle is now a child's toy
This side project started as a Python project earlier in 2025
Then I added an iOS app on top of it
I rewrote the most important algorithms in Rust
I rewrote the entire backend in Go and retired Python to be used purely for prototypes
I wrote a webapp with Next.js
With unit and integration tests for each component
Lately ...
@onusoz · 2025-12-13
This is huge. Natively supported stacked PRs on GitHub would make life much easier, especially with human AND AI reviews
AI reviews with Codex/Claude/Gemini/Cursor Bugbot integrations are becoming especially important in small teams who are generating huge amounts of code
AI reviews don't work well if you don't split your work into diffs smaller than a few hundred lines of code, so stacked PRs are already an integ...
@onusoz · 2025-12-12
Read more on my blog post https://t.co/uzCcOXuadB
@onusoz · 2025-12-12
CLI coding tools should give more control over message queueing
Codex waits until end of turn to handle user message, Claude Code injects as soon as possible after tool response/assistant reply
There is no reason why we cannot have both!
New post (link below):
@onusoz · 2025-12-12
Codex v0.71 finally implements a more detailed way of storing permissions
But they are still at user home folder level. Saving rules in a repo still seems TBD
"execpolicy commands are still in preview. The API may have breaking changes in the future."
@onusoz · 2025-12-11
this is outrageous
Agentic coding tools should give more control over message queueing
Below: Why agentic coding tools like Cursor, Claude Code, OpenAI Codex, etc. should implement more ways of letting users queue messages.
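A minimal sketch of a queue that supports both behaviors mentioned in the posts above (hold until end of turn, or inject after the next tool response or assistant reply). It does not reflect how Codex or Claude Code are implemented; `Delivery`, `MessageQueue`, and `drain` are hypothetical names.

```python
from collections import deque
from enum import Enum


class Delivery(Enum):
    END_OF_TURN = "end_of_turn"  # held until the agent finishes its turn
    NEXT_STEP = "next_step"      # injected after the next tool response or reply


class MessageQueue:
    def __init__(self) -> None:
        self._pending = deque()  # (Delivery, text) pairs in arrival order

    def push(self, text: str, mode: Delivery) -> None:
        self._pending.append((mode, text))

    def drain(self, turn_finished: bool) -> list[str]:
        """Return the messages due now and keep the rest queued in order."""
        due, kept = [], deque()
        for mode, text in self._pending:
            if mode is Delivery.NEXT_STEP or turn_finished:
                due.append(text)
            else:
                kept.append((mode, text))
        self._pending = kept
        return due
```

A harness would call `drain(turn_finished=False)` after each tool response and `drain(turn_finished=True)` once the turn ends, so the user picks the timing per message.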
@onusoz · 2025-12-03
At least some people at OpenAI must be thinking about buying @astral_sh