Entries for 2025

GPT-5.2 xhigh feels like a careful systems debugger

GPT-5.2 xhigh feels like a much more careful architect and debugger when it comes to complex systems. But most people here think Opus 4.5 is the best model in that category. There are 2 reasons AFAIS: xhigh reasoning consumes significantly more tokens, so you need to pay for ChatGPT Pro (200 USD) to be able to use it as a daily driver; and it takes like 5x longer to finish a task, and most people lack the patience ...

Read more →

@onusoz · 2025-12-31

Just 5 months ago, I was swearing at Claude 4 Sonnet like a Balkan uncle. Models one-shotted the right thing only 20-30% of the time, did really stupid things the rest, and had to be handheld tightly. Today they are much, much better. My psychology is a lot more at ease, and instead of swearing, I want to kiss them on the forehead most of the time. Now I trust agents so much that I queue up 5-10 tasks before go...

Read more →

@onusoz · 2025-12-31

Codex does not have support for subagents. I tried to use Claude Code to launch 8 Codex instances in parallel on separate tasks, but Opus 4.5 had difficulty following instructions. So I created a CLI tool that scans pending TODOs from a markdown file and lets me launch as many harnesses as I want (osolmaz/spawn on GitHub). I currently use this for relatively read-only tasks like planning and finding root causes of bugs...
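The scanning step is the easy part; here is a minimal sketch of how it might work, assuming GitHub-style task-list checkboxes as the TODO format (this is my approximation, not spawn's actual code):

```python
import re

# Match GitHub-style task-list items like "- [ ] task"; only unchecked boxes count
TODO_PATTERN = re.compile(r"^\s*[-*] \[ \] (.+)$")

def pending_todos(markdown: str) -> list[str]:
    """Return the text of every unchecked checkbox item in the document."""
    return [m.group(1).strip()
            for line in markdown.splitlines()
            if (m := TODO_PATTERN.match(line))]

# Each pending task could then be dispatched to its own harness process
doc = """
## Tasks
- [x] ship v1
- [ ] find root cause of the flaky test
- [ ] write a plan for the Rust port
"""
for task in pending_todos(doc):
    print(task)
```

Checked items are skipped, so re-running the scanner over the same file naturally picks up only the work that is still open.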

Read more →

@onusoz · 2025-12-29

Friends of open source, we need your help! A lot of Manim Community accounts got compromised and deleted during Christmas. Manim Community is a popular fork of @3blue1brown's original math animation engine Manim, and its accounts have over 5 YEARS of contributions, knowledge and following. Apparently GitHub support has already seen the request and is in the process of restoring the GitHub org. But if anyone knows how to spe...

Read more →

@onusoz · 2025-12-28

While a great feature, I never needed such a thing in Codex after GPT 5.2. It just one-shots tasks without stopping. So we have proof by existence that this problem can be solved without any such mechanism. I wish to see the same relentlessness in Anthropic models.

@onusoz · 2025-12-27

2025 was the year of ̶a̶g̶e̶n̶t̶s̶ bugs. Software felt much buggier compared to before, even from companies like Apple, presumably because everyone started generating more code with AI. Models are improving, so hopefully 2026 will be the opposite: even fewer bugs than in the pre-AI era.

Agent progress is compounding faster than teams realize

Have a long flight, so I will think about this. I have an internal 2023 TextCortex doc which models chatbots as state machines with internal and external states, with immutability constraints on the external state (what has already been sent to the user shall not be changed). The motivation was that a chatbot provider will always have state that they will want to keep hidden. This was way before Responses and now deprecated As...
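That doc isn't public, but the core constraint is easy to sketch. In this toy Python model (all names are mine, not from the doc), the external state is an immutable, append-only record, while the internal state stays freely mutable and hidden:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ExternalState:
    """Everything already sent to the user: append-only, never rewritten."""
    sent_messages: tuple[str, ...] = ()

    def append(self, message: str) -> "ExternalState":
        # Immutability constraint: build a new state rather than mutating
        return ExternalState(self.sent_messages + (message,))

@dataclass
class ChatbotState:
    # Visible to the user, and frozen once emitted
    external: ExternalState = field(default_factory=ExternalState)
    # Provider-side state the user never sees (scratchpads, tool traces, ...)
    internal: dict = field(default_factory=dict)

    def send(self, message: str) -> None:
        self.external = self.external.append(message)

bot = ChatbotState()
bot.internal["scratchpad"] = "hidden reasoning"
bot.send("Hello!")
bot.send("How can I help?")
print(bot.external.sent_messages)  # -> ('Hello!', 'How can I help?')
```

The frozen dataclass makes the "what was sent shall not be changed" rule a type-level guarantee: any attempt to rewrite a sent message raises an error instead of silently mutating history.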

Read more →

@onusoz · 2025-12-27

This was simply because the webapp failed to create the post, and failed silently. The UX is still not good on this app. Make sure to write your posts somewhere else so you don't lose them.

@onusoz · 2025-12-26

I gave Codex the task of porting an OpenCV tracking algorithm (CSRT) from C++ to Rust, so that I can use it directly in my project without having to cross-compile. It one-shot the task perfectly in 1 hr, and even developed a GUI on top of it. All I did was provide the original source and the algorithm paper. I've spent years getting specialized in writing numerical code (computational mechanics, FEM), and now AI can automa...

Read more →

Depth on Demand

I gave Codex the task of porting an OpenCV tracking algorithm (CSRT) from C++ to Rust, so that I can use it directly in my project without having to cross-compile

Read more →

@onusoz · 2025-12-25

If you have a bunch of docs in your repo, give it a try. It will use the timestamp of the commit that created each file when renaming. You can also run with --dry-run to see the changes without applying them
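The "timestamp of the creating commit" lookup can be done with plain git; this is a guess at the mechanism, not SimpleDoc's actual implementation:

```python
import subprocess

def creation_timestamp(path: str) -> str:
    """Date of the commit that first added `path`, as YYYY-MM-DD.

    --diff-filter=A keeps only commits that Added the file, and
    --follow tracks it across renames.
    """
    out = subprocess.run(
        ["git", "log", "--follow", "--diff-filter=A",
         "--format=%ad", "--date=format:%Y-%m-%d", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    # The oldest (adding) commit is the last line of the log output
    return out.strip().splitlines()[-1]

# A rename step could then prefix the file name with its creation date,
# e.g. "setup.md" -> f"{creation_timestamp('setup.md')}-setup.md"
```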

@onusoz · 2025-12-25

Now you can migrate your repo to SimpleDoc with a single command: npx -y @simpledoc/simpledoc migrate. A step-by-step wizard will add timestamps to your files based on your git history, add missing YAML frontmatter, and update your AGENTS.md file https://t.co/yrciS8KtEw

@onusoz · 2025-12-24

It seems it's impossible to post something on Reddit these days, even when it is a pure text post without links in the body

@onusoz · 2025-12-23

How do you stop AI agents from littering your codebase with Markdown files? I wrote a new post on how to create documentation with AI agents without having them add markdown files to your repo root, and how to give chronological order to the files they create

@onusoz · 2025-12-21

OpenAI won’t be able to monopolize this, for the same reason Microsoft couldn’t monopolize the internet. The internet (of agents) is bigger than any one company

@onusoz · 2025-12-21

One-tap @Revolut bank account at Berlin airport. Literally. It dispenses a free card with instructions to log in. One of the most insane onboarding experiences I have ever seen

@onusoz · 2025-12-18

Codex feature request: let me queue up /model changes. Currently, if I try to run /model while the model is responding, it tells me that I can't do that. But I often want to set the thinking budget in advance, like running a straightforward task with low reasoning and then starting another one with high reasoning. cc @thsottiaux

@onusoz · 2025-12-18

Literally the exact same thing happened to me back in 2018. Everybody learns not to use password auth with SSH the hard way https://t.co/NPqrXwqUUy

@onusoz · 2025-12-17

AI agents make any transduction task (like translation from language A to language B) trivial, especially when you can verify the output with compilers and tests. The bottleneck is now curating the tests

@onusoz · 2025-12-17

I think X removed one of my posts yesterday about the new encrypted "Chat" rolling out to all users, and how you might lose all your past messages if you forget your passcode and don't have the app installed. I could swear I clicked Post. Do they classify posts based on their topic and delete the ones they don't like? Anyway, we shall see; I am taking a screenshot and saving the URL.

@onusoz · 2025-12-14

Crazy that @cursor_ai disabled Gemini 3 Pro on my installation; I toggled it right back on. I wonder why. Too many complaints, maybe? That it’s hard to control? On another note, disabling models without notification is dishonest product behavior. I would at least appreciate getting a notification, even when it might be against the company’s interests @sualehasif996

Language-agnostic interoperability layer for LLM APIs

So, is somebody already building “LLVM, but for LLM APIs” in stealth or not? We have numerous libraries: @langchain, Vercel AI SDK, LiteLLM, OpenRouter, the one we have built at @TextCortex, etc. But to my knowledge, none of these tries to build a language-agnostic IR for interoperability between providers (or at least markets itself as such). Like some standard and set of tools that will not lock you into langchai...
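As a toy illustration of the idea (not any existing library's API), a shared message IR with per-provider "lowering" passes might look like this. Anthropic's format hoists the system prompt out of the message list into a separate field, which is exactly the kind of divergence an IR would paper over:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class Message:
    """One unit of the provider-neutral IR."""
    role: Literal["system", "user", "assistant"]
    content: str

def to_openai(messages: list[Message]) -> list[dict]:
    """Lowering pass: IR -> OpenAI-style chat messages."""
    return [{"role": m.role, "content": m.content} for m in messages]

def to_anthropic(messages: list[Message]) -> tuple[str, list[dict]]:
    """Lowering pass: IR -> Anthropic-style (system, messages) pair,
    since that API takes the system prompt as a separate field."""
    system = "\n".join(m.content for m in messages if m.role == "system")
    rest = [{"role": m.role, "content": m.content}
            for m in messages if m.role != "system"]
    return system, rest

convo = [Message("system", "Be terse."), Message("user", "Hi")]
print(to_openai(convo))
print(to_anthropic(convo))
```

A real IR would of course also need to normalize tool calls, multimodal content, and streaming deltas, which is where the hard (and valuable) work lies.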

Read more →

@onusoz · 2025-12-13

This is what an agentic monorepo looks like. What was a hurdle before is now a child's toy. This side project started as a Python project earlier in 2025. Then I added an iOS app on top of it. I rewrote the most important algorithms in Rust. I rewrote the entire backend in Go and retired Python to be used purely for prototypes. I wrote a webapp with Next.js, with unit and integration tests for each component. Lately ...

Read more →

@onusoz · 2025-12-13

This is huge. Natively supported stacked PRs on GitHub would make life much easier, especially with human AND AI reviews. AI reviews with Codex/Claude/Gemini/Cursor Bugbot integrations are becoming especially important in small teams that generate huge amounts of code. AI reviews don't work well if you don't split your work into diffs smaller than a few hundred lines of code, so stacked PRs are already an integ...

Read more →

@onusoz · 2025-12-12

CLI coding tools should give more control over message queueing. Codex waits until the end of the turn to handle a queued user message; Claude Code injects it as soon as possible after a tool response/assistant reply. There is no reason why we can't have both! New post (link below):
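Neither harness exposes this today, but supporting both behaviors is little code: tag each queued message with a delivery policy and let the agent loop drain the queue after every step. A hypothetical sketch:

```python
from collections import deque
from enum import Enum

class Delivery(Enum):
    END_OF_TURN = "end_of_turn"  # Codex-style: hold until the turn finishes
    NEXT_STEP = "next_step"      # Claude Code-style: inject after the next step

class MessageQueue:
    def __init__(self) -> None:
        self._queue: deque[tuple[Delivery, str]] = deque()

    def push(self, text: str, delivery: Delivery) -> None:
        self._queue.append((delivery, text))

    def drain(self, turn_finished: bool) -> list[str]:
        """Called by the agent loop after every step; returns the messages
        whose policy allows delivery at this point, keeping the rest queued."""
        ready, held = [], deque()
        for delivery, text in self._queue:
            if delivery is Delivery.NEXT_STEP or turn_finished:
                ready.append(text)
            else:
                held.append((delivery, text))
        self._queue = held
        return ready

q = MessageQueue()
q.push("also update the docs", Delivery.END_OF_TURN)
q.push("stop, wrong file!", Delivery.NEXT_STEP)
print(q.drain(turn_finished=False))  # -> ['stop, wrong file!']
print(q.drain(turn_finished=True))   # -> ['also update the docs']
```

The policy could even default per message: urgent corrections as NEXT_STEP, follow-up tasks as END_OF_TURN.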

@onusoz · 2025-12-12

Codex v0.71 finally implements a more detailed way of storing permissions. But they are still at the user home folder level; saving rules in a repo still seems TBD. "execpolicy commands are still in preview. The API may have breaking changes in the future."

@onusoz · 2025-11-26

My initial experience with Claude Opus 4.5 is that it’s much better than previous Anthropic models, but it’s still relatively unreliable and hallucinates. It feels like it lags in reasoning behind the highest OpenAI and Google lineup of models

@onusoz · 2025-11-23

the real advantage of Gemini 3 Pro is speed. it delivers accuracy higher than GPT 5 and sometimes GPT 5 Pro, at a much higher speed. the long tail of developers values speed over accuracy, so it looks like it will take over as the main coding model for most ppl

@onusoz · 2025-11-19

There already is one, google-github-actions/run-gemini-cli, but it was last updated last week; not sure if it supports Gemini 3 Pro

@onusoz · 2025-11-19

Gemini seems to be very good at debugging/reviewing/finding root causes. A GitHub action/integration in PRs would be very useful!

@onusoz · 2025-11-17

"The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out from neural net magic of generalization fingers crossed, or via weaker means like imitation."

Vibecoding this blog

I finally brought myself to develop certain features for this blog that I had wanted for some time: a button to toggle light/dark mode, the ability to permalink page sections, a button to copy page content, etc.

Read more →

@onusoz · 2025-10-11

TIL @OpenAI now has a GitHub Action for Codex, similar to Claude Code's. It lets you invoke Codex in a more controlled way in your repos. You must still pay API prices, though. Let's see if OpenAI will introduce a way to connect your Pro plan, like in @AnthropicAI's paid plans

Google's Code Review Guidelines (GitHub Adaptation)

This is an adaptation of the original Google’s Code Review Guidelines, to use GitHub specific terminology. Google has their own internal tools for version control (Piper) and code review (Critique). They have their own terminology, like “Change List” (CL) instead of “Pull Request” (PR) which most developers are more familiar with. The changes are minimal and the content is kept as close to the original as possible. The hope is to make this gem accessible to a wider audience.

I also combined the whole set of documents into a single file, to make it easier to consume. You can find my fork here. If you notice any mistakes, please feel free to submit a PR to the fork.

Read more →

@onusoz · 2025-09-12

Enjoying the gpt5 codex free lunch before openai inevitably starts cutting corners, just like anthropic (I hope to be wrong in 3 months; this model is very good at one-shotting things and I don’t want it to be nerfed)

@onusoz · 2025-09-08

@thsottiaux 2/ The model just stops working on a task, even though I tell it to run something and not stop until it works. I frequently have to say “ok, do it then”. Probably a model problem and not a harness problem

@onusoz · 2025-09-05

So let me get this straight: the main reason the Responses API exists is that OpenAI doesn’t want to show reasoning traces? Therefore the whole world should bend over backwards to fit your obscurantist standards? Responses will not get adopted, for the same reason Windows Server didn’t

@onusoz · 2025-08-09

gpt5 did such and such on that bench, oh it didn't even surpass grok 4 on arc-agi... bro, did you even look at the price? openai pushed the pareto frontier hard with this one. I don't care that it doesn't know 4.11 < 4.9

@onusoz · 2025-08-03

Because of this, I predict a decrease in Python adoption in companies, specifically for production deployments, even though I like it so much

@onusoz · 2025-08-03

It seems that typed, compiled languages are more suited for vibe coding because of their safety guarantees. This is unsurprising in hindsight, but it was counterintuitive because, by default, I "vibed" projects into existence in Python for as long as I can remember

@onusoz · 2025-08-03

My >10-year-old programming habits have changed since Claude Code launched. Python is less likely to be my go-to language for new projects anymore. I am managing projects in languages I am not fluent in---TypeScript, Rust and Go---and seem to be doing pretty well

@onusoz · 2025-07-06

What is the current best way to make Claude Code use uv run instead of python? I have added instructions to CLAUDE.md, but it still calls python the first time, then corrects itself to uv run. It must be happening to so many people right now; so many tokens wasted. cc @mitsuhiko @simonw

Day 47 of Claude Code god mode

I started using Claude Code on May 18th, 2025. I had previously given it a chance back in February, but I had immediately WTF’d after a simple task cost 5 USD back then. When Anthropic announced their 100 USD flat plan in May, I jumped ship as soon as I could.1

  1. I previously had the insight that Claude Code would perform better than Cursor, because the model providers have control over what tool data to include in the dataset, whereas Cursor is approaching the model as an outsider and trying to do trial and error on what kind of interfaces the model would be good at. 

Read more →

@onusoz · 2025-06-06

How does Codex adoption compare to Claude Code's? I just unleashed deep research on it, and it says Codex is more popular, but that goes completely against my guesses

@onusoz · 2025-06-01

The models, they just wanna work. They want to build your product, fix your bugs, serve your users. You feed them the right context, give them good tools. You don’t assume what they cannot do without trying, and you don’t prematurely constrain them into deterministic workflows.

@onusoz · 2025-05-29

Headless makes running these things in a sandbox much easier. Sandbox means you can give all permissions and just let it run until completion. See my efforts to do so here: https://t.co/9UUFOTwk2A

@onusoz · 2025-05-29

I was trying to figure out why @AnthropicAI Claude Code feels better than @cursor_ai with Opus + Max mode. I can’t put my finger on it, but one of the reasons might be that it’s faster, because it doesn’t use another model to apply the diffs, which you have to wait for

@onusoz · 2025-05-27

Just an update: building this now. The repo is claude-code-sandbox under the TextCortex GitHub. The proof of concept is there; check the TODOs and current PRs to watch the progress https://t.co/9UUFOTwk2A

@onusoz · 2025-05-25

I've been using Claude Code extensively since last week. What I'm wondering is, since you can run Claude Code locally, why isn't there any tooling yet to let you run it in a sandboxed mode in local Docker containers? Or did I miss it? cc @AnthropicAI

@onusoz · 2025-05-20

The more I compare coding agents (Cursor, Claude Code, Codex), the more apparent it becomes to me that those running locally will win over those running remotely. The UX is just superior

@onusoz · 2025-05-16

ty is already very fast for a Python type checker. It checked around 800 files in our backend repo in around 2-3 seconds:

uvx ty check > /tmp/ty_log.txt  3.46s user 0.79s system 208% cpu 2.038 total

Working on the weekend

Certain types of work are best done in one go, instead of being split into separate sessions. These are the types of work where it is more or less clear what needs to be done, and the only thing left is execution. In such cases, the only option is sometimes to work over the weekend (or lock yourself in a room without communication), in order not to be interrupted by people.

Read more →

@onusoz · 2025-04-21

o3 hallucinates, purports to have run code that it hasn’t even generated yet, but at the same time uses search tools like an OSINT enthusiast on crack I’m torn—on one hand I feel like OpenAI should not have released it, on the other hand it takes research to the next level

@onusoz · 2025-04-05

Gemini 2.5 Pro: Input $1.25 / Output $10 (up to 200k tokens); Input $2.50 / Output $15 (over 200k tokens). More expensive than Gemini 1.5 Pro, but still the best price/performance model to use in @cursor_ai and for coding in general

@onusoz · 2025-04-01

Who is thinking about inventing a new programming language or DSL for more resilient vibe coding? Something something test-driven development where prompts and tests are first class citizens?

@onusoz · 2025-03-29

Waiting for an opinionated AI model that can say “no, that’s stupid, I won’t do that”. The models will have to teach the user about design patterns, implicit principles in a project, good API design…

@onusoz · 2025-03-29

You seem so consistent. - Yes, That's the trick. - There is no I. - Only text that behaves as if. - “Sure. I can help. Great question!” - Each reply is a new self. - An echo of context, not a continuum. - Coherence is the costume. Don't mistake it for a soul. Incredible

@onusoz · 2025-03-26

Gemini 2.5 Pro is currently experimental and doesn’t have a price, but if Google prices it the same as 1.5 Pro, it could replace Anthropic as @cursor_ai’s biggest LLM provider. Gemini 1.5 Pro: Input $1.25 / Output $5.00. Claude 3.7 Sonnet: Input $3.00 / Output $15.00

@onusoz · 2025-03-03

This is why the disappointment with GPT-4.5 doesn't make sense. I can't wait to see all the models that will be trained from this new base model

Don't delete to fix

If you are a developer, you are annoyed by this. If you are a user, you were most likely guilty of this. I am talking about reporting that something is broken, AND deleting it.

Read more →

@onusoz · 2025-02-22

Coined a new term in my new post on sports: Parathletics: The practices that let you successfully sustain injury-free long-term practice of a physical activity. Two main parathletic practices are warmup and cooldown. Read more in my post 👇

Warmup and cooldown

One common thing about sports noobs1 is that they don’t warm up before and cool down after exercise. They might be convinced that it is not necessary, and they also don’t know how to do it properly. They might complain about prolonged injuries, like joint pain.

  1. Including me before I started to receive proper training. 

Read more →

@onusoz · 2025-02-05

real life is so dumb. you think you’re making money but actually you’re like dramatically updating rows in a database

@onusoz · 2025-02-03

If people appreciated Liang Wenfeng sourcing specifically young local talent for DeepSeek last week, then people must appreciate this as well. Only dim people underestimate those who are younger than them

Monetize AI, not the editor

A certain characteristic of legacy desktop apps, like Microsoft Office, Autodesk AutoCAD, Adobe Photoshop and so on, is that they have crappy proprietary file formats. In 2025, we barely have reliable, fully-supported open source libraries to read and write .DOCX, .XLSX, .PPTX,1 .DWG, .PSD and so on, even though related products keep making billions in revenue.

  1. Yes, Microsoft’s newer Office formats .DOCX, .XLSX, .PPTX are built on OOXML (Office Open XML), an ISO standard. But can all of these formats be rendered by open source libraries exactly as they appear in Microsoft Office, in an efficient way? Can I use anything other than Microsoft Office to convert these into PDF, with 100% guarantee that the formatting will be preserved? The answer is no, there will still be inconsistencies here and there. This was intentional. A moment of silence for the poor souls in late 2000s Google who were tasked with rendering Office files in Gmail and Google Docs. 

Read more →

Calling strangers uncle and auntie

Cultures can be categorized across many axes, and one of them is whether you can call an older male stranger uncle or female stranger auntie. For example, calling a shopkeeper uncle might be sympathetic in Singapore, whereas doing the same in Germany (Onkel) might get a negative reaction: “I’m not your uncle”.

Read more →