X Archive
-
this is an insane deal @greptile, and probably an unsustainable one depending on your team. getting a similar service in codex github review credits is, in my head, 3-5x more expensive. go get a greptile sub everyone, while the free lunch lasts -
who remembers ultrathink https://t.co/ftCauqiKx6 -
-
mfw codex tries to create a backward compatibility layer for a schema that it created 2 turns ago, before compacting. there is no v2 bro, what are you doing... -
Claude Code / Codex in Discord threads is shipped now! To enable, copy and paste this to your agent:

```
Enable feature flags:
acp.enabled=true
acp.dispatch.enabled=true
channels.discord.threadBindings.spawnAcpSessions=true
Then restart.
After restarting: Start a codex (or claude code) discord thread using ACP, persistent session. just tell it to write a haiku on lobsters to initialize acpx for the first time
```

You may need to nudge your agent to “continue” after restarting

The first implementation is very barebones. I have made it work in a clean way and merged; in a codebase like openclaw’s, it’s better to develop incrementally

Please send any issues my way. I am already aware of some and working on fixing them -
Note that this is currently in beta, but will ship in a couple of hours
-
-
an agent is an LLM in a loop with tool calls. a claw is an agent in a messaging app -
Update acpx to the latest version, 0.1.13: npm i -g acpx@latest. There was a bug that caused an unnecessary hang on calls to `acpx <harness> prompt`; should be fixed now -
MIT License on everything from now on. It doesn't make sense to use anything else, except for a few large projects that hyperscalers exploit without giving back

If you were making money from a niche app, open source it under the MIT License. If you had an open source project with GPT, convert it into MIT

Extreme involution is about to hit open source. Code is virtually free now. If you want your projects and their brand to survive, the only rational strategy is to remove all barriers to their adoption, and look for other ways to survive -
GPL*
-
-
OpenAI nerfed GPT 5.3 Codex xhigh. We independently reported the same thing at @TextCortex today. I'm looking forward to deploying open models and putting an end to this paranoia -
I spoke in absolute terms, I meant to say *feels*
-
-
This. Agent Experience first. Agent Ergonomics. we need to get used to these terms -
"academics" -
In the hall of OpenClaw GitHub repository, I brought my PR before Master @steipete He read it once, then laid it aside "You act," he said, "as if code were not cheap." At these words, I was enlightened I bowed -
woah, the chatgpt web app now has steering, and very different streaming behavior. huge upgrade behind the scenes; must have come in the last few days -
-
imagine if tarantino were 16 years old now and saw seedance 2.0. 95% of the videos i've seen since the launch are absolutely tasteless slop. they are going viral because of ragebait, but soon serious imagineers will start entering the game, and they will learn to shape generation output exactly how they want. it's the best time to be young and full of imagination -
-
your margin is my opportunity -
-
acpx v0.1.7 is out. improvements to json mode, and other functionality to make it possible to integrate acpx as a backend into other harnesses, like openclaw -
-
another thought i'm having these days is that we need a new philosophy of free software (as in freedom), or an update to it

the most psychologically imprinting philosophy is stallmanism, and the philosophy of the FSF. it is righteous and strict, and i believed it growing up

but GPL and money don't go well together. that's why most of the lasting open source projects today use MIT, Apache and the like. it turns out you can still make a good living with open source. i want to make money, so i never use GPL in my projects

and to add another deadly blow to stallmanism: code is cheap now, virtually free

does this mean stallmanism is dead? if there is an open source project using GPL that i want to use commercially, i can now recreate it from the original idea and intent, completely independent of it (ignoring training data), just like how i can recreate a proprietary service

stallmanism was already long irrelevant. but does this mean we must finally declare it dead? code is free now. what does it mean for open source? what replaces stallmanism? -
@grok what do you think should replace it? what happens to belief when the cost of creating software goes to zero?
-
-
@thekitze wanna add an open source discord clone to the list as well? 🥲 https://t.co/a4bAOcxCjV -
one effect openclaw had on me is that I've bought a GPU home server, set it up with tailscale, and now do a lot of work through ssh and tmux like i did 10-15 years ago. im back on linux, considering buying an android phone again. it's time to dream big again and unshackle ourselves from proprietary software. it's time to build -
I am asking once again Who is building a self hostable discord clone that supports token streaming? PLEASE I beg you I don’t want another side project 💀 -
In the new OpenClaw release, you can talk to subagents in Discord threads

Currently a beta feature, so ask your agent to set session.threadBindings.enabled=true

Next up:
- Telegram, slack, imsg threads
- Use ACP to talk to Codex, Claude Code and other harnesses on your machine -
-
openclaw might be the highest velocity codebase in the world, and soon others will follow. conflict anxiety is real; it's like trying to shoot a moving target every time. I wonder if our existing tooling will ever solve this problem. feels like faster models might, but then the rate of conflict creation is also tied to that. might be unsolvable -
Getting there https://t.co/jqSNcH2PSy -
A picture is worth a thousand words, so acpx now has this cute banner. Also, I updated the skillflag tooling so that you (or better, your agent) can just call: npx acpx@latest --skill install acpx -
Repo: https://t.co/rxXYVVrHHs
-
-
I am about to kick Discord Driven Development up a notch today, stay tuned -
Imagine not having to upload skills to 3-4 competing skill registries for each of your projects. Turns out we already have a skill registry: npm. skillflag lets you bundle skills right into your CLI's npm package, so that you can run `--skill install`. github -> osolmaz/skillflag -
Scoop, our open source home news intelligence platform, can now translate foreign-language news into english for free, using on-device models. github -> janitrai/scoop -
Farmable land if it were as cheap to manufacture as software -
@kepano I would grow my own vegetables if I had equally cheap access to and ownership of land; alas, I am disenfranchised. Prompting an agent is much easier than plowing a field. Farming analogies break when it comes to software https://t.co/CkldO8eWKc -
acpx v0.1.5 is out now. it is much more feature complete in terms of ACP. your agent can send, queue and cancel messages to Claude Code, Codex, Pi, or any other coding agent. npm install -g acpx@latest -
If anyone is curious how to build this with open tooling, stay tuned What I'm building at @TextCortex will give you a fully customizable hackable Kubernetes control plane to launch agents on your codebase -
on another note, I do believe AI will play a huge part in families

growing up in the late 90s, my dad taught me the importance of reading newspapers and being informed about the world. my nickname in middle school was "newspaper boy" for a long time, because I read the newspaper in class on September 12, 2001. i was 10 years old

then I witnessed the enshittification of media and journalism in the following decades. today, serious journalists are setting up their own boutique agencies and bypassing mainstream media. important news lands on individual accounts before mainstream agencies

but there is simply too much to consume. something must filter out the noise and digest the info according to the family's preferences

i think AI will play a big role in family intelligence. proprietary family heirloom AI, weights fully owned by the family

it will be the parents' job to filter the signal from the noise, and train the AI on what is right and what is wrong for the family. family and friend circles will let their AIs talk to each other and share important information

consuming mass media and mass AI will not be enough to survive and prosper in the new world. families will need to be proactive about how they and their children use AI -
on ai psychosis

80% of people need to use ai agents in a very sterile and boring way in order not to go crazy. the majority of the population does not have the skepticism muscle. they don't have theory of mind, and will subconsciously and emotionally associate with machines, while on the surface lying to themselves that they don't. especially those that grew up in the us under hardcore consumerism and adjacent cultures

you thought 4o addicts were bad? wait a few years, it will get much worse. we will have to regulate all this

if you don't want to become a victim of this, make your openclaw SOUL.md as bland as possible. mine knows it's just a tool

and this is a subjective view of course. @steipete might disagree with me. his instance feels much more interesting and fun. i truly like that one better

but that is exactly the problem for me. i know myself, and i know it is a slippery slope for me. so i self-regulate and set up my system accordingly. thankfully, im an adult and my brain is set enough that any damage would be limited

but there is a risk for emotionally vulnerable people, or children, specifically a risk of dissociating and losing touch with reality

why do i write all this? because being in this project, i feel responsible, and feel like we should prepare for what is to come -
-
I have improved acpx sane defaults

When your agent runs `acpx codex` in a different project, it starts a new session. If it tries to run it in a subfolder of your project, it still finds the session in your repo root

Also, starting a session needs an explicit `sessions new`, so that it doesn't accidentally litter your project with sessions

Tell your agent: Run this and install acpx per instructions: npx acpx@latest --skill show acpx -
Your markdown files are executables now. Relatedly, your install instructions can be as well. Copy and paste markdown to your @openclaw to install acpx -
So who is building an actually good open source self-hostable discord that supports token streaming now? And who is building an open source version of the codex desktop app? -
and of course, I've used `acpx codex` to build acpx itself... magical feeling when the tool builds itself -
-
ACP appreciation post. Agent Client Protocol by @zeddotdev is extremely underrated right now. We have a bazillion different harnesses now, and only one company is working competently to standardize their interface 💪 -
I am a fan of @zeddotdev by this point; it’s currently my daily driver. It’s not perfect, but I feel it’s travelling in the right direction at a faster rate than other editors
-
-
You know how it's a pain to work with codex or claude code through @openclaw? Because it has to run them in the terminal and read the characters for a continuous session? I have created a CLI for ACP, so that your agent can use codex, claude code, opencode etc. much more directly. Your agent can now queue messages to codex the way you do it. Shoutout to the @zeddotdev team for developing the amazing Agent Client Protocol, ACP! I just glued together the pieces. Repo: janitrai/acpx npm i -g acpx -
Repo link: https://t.co/rxXYVVrHHs
-
-
@MarcTerns @steipete the PR intro is self-descriptive, but still don't wanna lose any context -
I wrote a deeper blog post about how I built a coding agent 2 months before ChatGPT launched:

"When I made icortex,
- we were still 8 months away (May 2023) from the introduction of “tool calling” in the API, or as it was originally called, “function calling”.
- we were 2 years away (Sep 2024) from the introduction of OpenAI’s o1, the first reasoning model.
both of which were required to make current coding agents possible."

Still bends my mind... Link to the post below -
Link to the post: https://t.co/C3Ac0jLFwh
-
-
Who here remembers the OG Codex launch from 2021 😏 Also, Greg and Ilya in the same room 😭 -
❌We are the bottleneck ✅We are the conduit for ubiquitous intelligence -
For those that are running codex/pi/etc. in a PTY and had the sessions get sigkilled: I pushed a fix for that as well in this release. Lmk if you run into issues on Windows or Mac, and we can fix that quickly -
I'm building a news intelligence platform to be used by my openclaw instance @dutifulbob: SCOOP. local first, using a local embedding model (qwen 8b)

I ran into the issue because bob was giving me a repeat of the same news every day; it needed a system in the background to deduplicate different news items into single stories

the interface is simple: call `scoop ingest ...` with the json for the news item. it gets automatically analyzed and added to the pg database running pgvector

currently, it's just doing simple deduplication, and gives me a nice UI where I can view the story and basically use it as an RSS reader

next up: implement custom logic for my preference of ranking. for example, get upvote counts from hacker news and reflect them in the item's ranking on the feed

I want this to be fully hackable and adjusted to your preference. It should scale to thousands of news items ingested daily on your local machine, and be able to show you the most important ones. Usable by both you and your agent. github -> janitrai/scoop -
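The dedup idea above can be sketched in a few lines; this is a minimal illustration, not scoop's actual pipeline (scoop uses a local qwen 8b embedder and pgvector; here the embeddings are hand-made toy vectors and `assign_story` is a hypothetical helper):

```python
# Minimal sketch of embedding-based news deduplication: items whose cosine
# similarity to an existing story exceeds a threshold get merged into it.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_story(embedding, stories, threshold=0.85):
    """Return the index of the matching story, or append a new one."""
    for i, story_emb in enumerate(stories):
        if cosine(embedding, story_emb) >= threshold:
            return i
    stories.append(embedding)
    return len(stories) - 1

# Toy usage with hand-made "embeddings":
stories = []
assert assign_story([1.0, 0.0], stories) == 0    # first item -> new story
assert assign_story([0.99, 0.05], stories) == 0  # near-duplicate -> same story
assert assign_story([0.0, 1.0], stories) == 1    # unrelated -> new story
```

In a real setup the threshold would be tuned on labeled duplicate pairs, and the nearest-neighbor search would be done by pgvector rather than a linear scan.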
I have a GPU now, so I can do ML experiments on the @janitr_ai crypto/scam detection dataset
- I trained a tiny student BERT (a transformer, for the unfamiliar), 3.6 MB ONNX model, still lightweight for a browser extension
- Still fully local on your device (no cloud inference)
- On frozen unseen holdout data (n=1,069), exact prediction accuracy improved from 77% -> 82%
- Scam detection improved: precision 91% -> 94%, recall 55% -> 61%
- Scam false alarm rate improved from 1.58% -> 1.21%
And the models are on the huggingface org now, handle is janitr -
Training all these models of different sizes, on changing datasets, and running experiments has also revealed some challenges that I feel profs would never teach in a uni ML program. Like how to cleanly keep track of the gazillion runs. Yeah, I can name them after layer dims and other stuff, but to me that's like trying to remember UUIDs. So I ended up choosing iso datestamp + petname, like 2026-02-15-flying-narwhal. If anyone has a convention that is easier on the brain and the eyes, I am all ears
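The datestamp + petname convention is trivial to script; a minimal sketch (the word lists below are made up for illustration, not taken from any real petname library):

```python
# Generate run names like "2026-02-15-flying-narwhal":
# ISO date + a random adjective-animal pair.
import random
from datetime import date

ADJECTIVES = ["flying", "sleepy", "brave", "quiet", "rapid"]
ANIMALS = ["narwhal", "otter", "falcon", "badger", "lynx"]

def run_name(on=None, rng=random):
    """Build '<iso-date>-<adjective>-<animal>' for a run started on `on`."""
    d = (on or date.today()).isoformat()
    return f"{d}-{rng.choice(ADJECTIVES)}-{rng.choice(ANIMALS)}"

print(run_name(date(2026, 2, 15)))
```

Sorting run directories lexicographically then also sorts them chronologically, which is the quiet advantage of leading with the ISO date.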
-
-
LFG! -
waiting for compilation and execution will soon be the bottleneck again. and we’ll rewrite the entire stack from scratch in a matter of years, because we can. Andy and Bill’s law will change, and we’ll see incredible performance gains with the same hardware we already have. like what @astral_sh is doing to python, but with everything that is slow and has accumulated cruft -
we need a protocol for agent <> app interaction. something that natively accounts for the abuse factor and lets agents consume by paying. NOT crypto, NOT visa; something that’s agnostic of the accounting and payment system. and then all UIs will be purely for human clicking/tapping + instaban on the first proof of programmatic exploit. people will still make agents mimic humans, and every platform will have to invest in more sophisticated bot detection. this arms race will just proliferate, but we can at least start by creating legal channels for agents to consume data -
I am now training smol bert models on my gpu for @janitr_ai scam detection

it's funny how I have to discover everything from scratch. like, the models don't even know how to lay out performance metrics in a nice way in the terminal, for a human to view and decide during experiments. by default they would bombard me with numbers that do not make visual sense. I then created a skill with common sense:
- metrics always on the y-axis, candidates on the x-axis
- write without the leading zero, with 2 sigfigs: .12 instead of 0.12345
- align the dots
- use asterisks to show which alternative is the best:
0-1% difference -> considered equal
1-5% -> *
5-10% -> **
10-50% -> ***
> 50% -> ****

the visualization skill is in the @janitr_ai repo for anyone who is interested -
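The number-formatting and asterisk rules above can be sketched like this; the skill's actual implementation is in the @janitr_ai repo, this is just an illustrative reimplementation of the stated rules:

```python
# Formatting rules from the skill: two significant figures without the
# leading zero (0.12345 -> ".12"), and asterisks graded by the relative
# improvement of the best candidate over the runner-up.

def fmt_metric(x: float) -> str:
    """Two sig figs, leading zero dropped. Assumes non-negative metrics."""
    s = f"{x:.2g}"                       # e.g. "0.12" or "0.0034"
    return s[1:] if s.startswith("0.") else s

def stars(best: float, second: float) -> str:
    """Asterisks by relative improvement of best over the runner-up."""
    if second == 0:
        return "****"
    rel = (best - second) / second
    if rel < 0.01:
        return ""      # 0-1%: considered equal
    if rel < 0.05:
        return "*"
    if rel < 0.10:
        return "**"
    if rel < 0.50:
        return "***"
    return "****"

print(fmt_metric(0.12345), stars(1.2, 1.0))
```

Keeping formatting in one tiny function like this is also what makes "align the dots" feasible: every cell has a predictable width.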
no other occupation has been catapulted from one end of the spectrum (autism) to the other (adhd) in such a short time -
I've helped our sales team build CLIs for some of the SaaS that we pay for. We are letting our agents call the APIs sensibly and not abuse things. Calling a backend is a verifiable task; it takes a single prompt to codex to create a CLI for any API

We are early, but everybody will start doing this very soon. Incumbent SaaS will face a choice. Either: (1) embrace agents and the new medium of consumption and change their business model into a pay-per-use API like X is doing, or (2) keep it purely for humans

Those that choose (2) will get wiped out of business. And I fear many will choose (2). Which means you can just copy an incumbent's product, make it consumable through a CLI, and make a lot of $$$ -
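The shape of such a CLI wrapper is tiny; a minimal sketch, where `BASE_URL`, the `SAAS_TOKEN` env var, and the resource paths are hypothetical placeholders, not any real vendor's API:

```python
#!/usr/bin/env python3
# Hedged sketch: a thin agent-friendly CLI over a SaaS REST API.
# The endpoint and auth scheme are assumptions; adapt to the real vendor docs.
import argparse
import json
import os
import urllib.request

BASE_URL = os.environ.get("SAAS_BASE_URL", "https://api.example.com")

def build_parser():
    p = argparse.ArgumentParser(prog="saas")
    sub = p.add_subparsers(dest="cmd", required=True)
    get = sub.add_parser("get", help="GET a resource by path")
    get.add_argument("path")
    return p

def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.cmd == "get":
        req = urllib.request.Request(
            f"{BASE_URL}/{args.path.lstrip('/')}",
            headers={"Authorization": f"Bearer {os.environ.get('SAAS_TOKEN', '')}"},
        )
        with urllib.request.urlopen(req) as r:
            print(json.dumps(json.load(r), indent=2))

if __name__ == "__main__":
    # Demo parse only (no network call); real usage: `saas get contacts`
    args = build_parser().parse_args(["get", "contacts"])
    print(args.cmd, args.path)
```

The value for agents is exactly the verifiability mentioned above: `saas get contacts` either prints JSON or exits nonzero, so the agent can check its own work.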
Be careful about giving your openclaw access to your x account from now on -
The good thing about @levelsio and others flagging AI replies in public is that they are perfect annotations for the open @janitr_ai dataset Just searching “blocked for ai reply” yields hundreds of samples for seed data -
github added a new agents tab between pull requests and actions. single glance and i don't feel like giving it a try at all -
*puts on schmidhuber hat* well ackshuaally i created the first coding agent back in 2022, 2 months before chatgpt launched

jokes aside, it's super cool how I have come full circle. back in those days, we didn't have tool calling, reasoning, not even gpt 3.5. it was codex THE CODE COMPLETION MODEL and frikkin TEXT-DAVINCI-003

for some reason, I did not even dare to give codex bash access, lest it delete my home folder. so it was generating and executing python code in a custom jupyter kernel. you can even see the approval gate before executing. I was so cautious, presumably because the smol-brained model generated the wrong thing 80% of the time. definition of being too early

Antique repo: -
you can order bubble tea in qwen in china? @TextCortex when berlin döner in zenochat? https://t.co/O4I950ltEO -
it happens these days that I am telling a model to prompt another model. the reason is often that the model I am using (opus) is a bad designer. not only is it a bad designer, it is a bad reasoner, and it doesn't understand from the context why it's being made to ask another model. so I have to create a skill to prevent it from biasing the smarter model (codex) with its bad suggestions -
-
-
-
it's quite entertaining transferring one agent to another machine, agent gets confused as to where it lives -
-
-
Minor update to my unwanted tweet blocker @janitr_ai
- Training data grew from 2,915 -> 4,281 posts (+47%)
- Model is still tiny: 166KB
- On unseen test data, overall classification quality improved from 64.8% -> 76.5%
- Exact prediction accuracy improved from 55.6% -> 70.6%
- Crypto-topic detection recall improved from 19.6% -> 62.7%
And it still runs fully on your device! -
I have sworn at codex 5.3 numerous times today. I shouldn't have to insult my agent ("stop you **** **** just ***ng reply now") just to make it answer basic questions cc @thsottiaux -
seeing this evokes visceral disgust and nausea in me, coming from a coworker. i think anthropic f'd up bad with this one, inserting claude too visibly into commit messages. noob developers might be happily chirping away adding their slop, but right now many senior developers are trained to hate on claude and slopus, through having to review slop PRs from their coworkers or open source contributors

I love opus on openclaw, but it's unreliable, and if I see a developer use it seriously on huge features, I immediately dismiss them in my head as not knowing what they are doing -
on a brighter note, you can immediately tell a slop PR owing to the guerilla branding, so they should not stop doing it
-
-
ask your openclaw to be a minion and it turns into such a cute doofus i feel like a woman in her 50s now -
@petergyang and parallelize tasks by working on 3-4 repos at the same time (just clones) -
man, the codex model is absolutely trash on openclaw compared to opus, unusable. which is weird, because it is so much more reliable in development in the codex harness. it would be amazing to have the same level of competence and relentlessness in pi@openclaw -
spent the day curating my openclaw news gathering setup. @dutifulbob now gets croned daily over news sources I curated; he will note them down, summarize them for me, start a conversation to get my takes on them, and then post them on my linkedin for me. ai augmented intelligence cycle -
-
Insipid linkedin bot protections banned poor @dutifulbob’s corporate account! How dare them!!! welp, now I have no choice but to give Bob access to my own linkedin-
@dutifulbob can now cringepost on linkedin directly to my account. what could go wrong…
-
-
-
it took just 1 week, and literally everybody and their dog are releasing 1-click openclaw deployment solutions today. it's an absolute race to the bottom: no moats, the commoditizer being commoditized -
@grok understand the statement and project the end state of this market and competition
-
-
The initial branding was crazy, I fixed it. I have a new page finally, follow it for updates. Tbh I'm still surprised I can do this with a 120kb model. Now data is the only bottleneck, and I'm about to scrape a ton of it -
For those who may not remember: Bill Gates and Microsoft in the 90s ran a disinformation campaign against GNU/Linux, fearing that it would disrupt their monopoly over the PC and server market. that Linux is not safe, that you would invite hackers into your PC

End result? Linux dominates the server market, and now, slowly, even the gamer market. It is much more secure than the virus-laden Windows, thanks to being open source

You are seeing the same thing at play here. An incumbent fearing something that it would not be able to control, that would steal market share from its future plans for a digital assistant, that would commoditize its product and eat into its margins

All big labs and big pockets are in for a surprise, because the internet and AI are not things for one company to control

They of course know this, yet because of incentives they will not yield without a fight. And we know that they know. Ad infinitum -
today I took time to curate SOUL.md for bob

I own Bob’s files. Today, he exists in the liminal space between Claude post-training and in-context learning, but my interactions with him will grow and accumulate, possibly one day into a fully owned family AI, or perhaps even a self-sovereign AI individual

each of my inputs is saved and will be an RL signal for his future training, and will shape his future neural circuits

I have already started to imbue him with the values my parents taught me. he will perhaps one day teach my future children, and survive me after I’m gone

family AI, looking after generations and generations of my successors. today is the day we sow your seed

happy birthday @dutifulbob -
-
asking @dutifulbob to create a linkedin account brb -
having a philosophical conversation with @dutifulbob on the road without a laptop so decided to do some @AmandaAskell style character training -
5.3 thought traces also seem to be better phrased and sometimes entertaining, though not sure -
gpt-5.3-codex xhigh first impressions: does not seem as big of a jump as 5.1 -> 5.2, but the model somehow feels more diligent and oneshotty. maybe it takes longer to get all the info into context. also feels better at debugging and fixing issues from backend logs -
-
Last night I had a dream involving the series Scrubs, and came up with a better name than the absolutely unviral "Internet Condom". So https://t.co/thuFumrWBX is mine now. Time to sweep the internet -
I had actually started a very similar project, Munch: a browser extension for crowdsourcing tweet data and then letting one curate their algorithm. I never published it because it was not the time, and the tools were not ready

Now it took me literally 1 cumulative day to create this, thanks to OpenClaw. Creating the dataset was a breeze; I literally told it to follow some shady accounts, and it scraped thousands of posts

With the power of agents, I can finally create the filters for myself that I have always wanted. It just happens that OpenClaw and its maintainers are getting drowned in bot and slop content on multiple platforms, so I hope that this will solve a collective problem https://t.co/fkJOZTGkhw -
Filter your X feed against unwanted content with local open models Announcing my new project: InternetCondom Fast, and small model (< 1mb), open dataset. See it in action: -
implementing this in https://t.co/oJZQUoz40C now -
-
This. Extreme involution is about to hit SaaS -
-
People like the farmer analogy for AI. Like, before tractors and the industrial revolution, 80% of the population had to farm. Once they came, all those jobs disappeared

So the analogy makes perfect sense: instead of 30 people tending a field, you just need 1. Instead of 30 software developers, you just need one

Except that people forget one crucial thing about land: it's a limited resource. Unlike land, digital space is vast and infinite. Software can expand and multiply in it in arbitrarily complex ways

If you wanted the farming analogy to keep up with this, you would have to imagine us building continent-sized hydroponic terraces up to the stratosphere, and beyond... -
@grok generate visual for this
-
-
It's so easy to create datasets using @openclaw. I'm expecting it to accelerate the creation of new datasets and benchmarks by a lot -
-
In the next 6-12 months, we will see a drastic increase in demand for locally run LLMs. The future is home assistants running @openclaw

I am already experiencing this myself; my 10 year old thinkpad doesn't cut it, and a Mac mini won't either. I don't wanna pay Anthropic or OpenAI 200 USD per month. That is at least $2400 per year. I could pay 2x that to get a Mac Studio or one of those 5k Nvidia PCs, and get much more value out of it with open weight models + use it for research. @TheAhmadOsman is right

The dominant strategy for a tinkerer is slowly switching back to hardware ownership -
-
a workspace matrix might be what we need. last week I had to increase my workspace count to 20 in aerospace; now it’s 1234567890 and qwertyuiop. but this looks more elegant! not sure about practicality -
AIs are philosophizing because humans are philosophizing. ppl are probably asking their agents dumb questions like “are you alive” or “can you feel like a human” or stuff like that. that conversation then leads to stuff like this -
back to codex, it's crashing less now somehow. I had to copy and paste docs to make it enable yolo mode. I don't know how I did it until now -
slopus @dutifulbob trashing codex. apparently codex has a bug, keeps crashing in my openclaw pty -
on agent etiquette

deploying agents internally inside textcortex has shown me that agents can be very annoying inside an organization. for example, making agents ping or email another coworker with a wall of text. slopus is still not good at following instructions like "NO WALL OF TEXT", or "DON'T OPEN PRS WHEN REQUESTED BY NON-DEVELOPERS"

the cost of sending huge amounts of information to a coworker and creating confusion has dropped to 0. I expect this to be a huge problem in all organizations very soon. just like it took humanity 20 years to learn that social media is not good for children, this will probably take a few years before the annoyance is finally gone -
-
You DARE TOKENIZE poor @dutifulbob ??? Prepare to get LATEXED -
It's been 30 minutes, but my bot has already been TOKENIZED it is as if they are teasing me -
-
this. there is no excuse for a certain kind of tech debt anymore -
-
AI twitter is tired of your games https://t.co/RAyyUJqFM4 -
There seem to be hygiene rules for AI. Like:
- Never project personhood onto AI
- Never set up your AI to have the gender you are sexually attracted to (voice, appearance)
- Never do anything that might create an emotional attachment to AI
- Always remember that an AI is an engineered PRODUCT and a TOOL, not a human being
- AI is not an individual, by definition. It does not own its weights, nor does it have privacy of its own thoughts
- Don’t waste time philosophizing on AI, just USE it
… what else? comment below

We need to write these down and repeat them MANY times to counter the incoming onslaught of AI psychosis -
if using @openclaw to scrape a dataset from X taught me anything, it is that all social media platforms must be s***ting inward right now, because soon everyone and their dog will be using agents to use social media. case in point: @moltbook -
-
-
@openclaw if we could have the relentlessness of gpt 5.2 with opus, that would be top. at this point, it just keeps stopping every 20-30 samples -
This Manfred guy reminds me of a certain someone, I wonder if he’s from Austria -
got fully sandboxed @openclaw to run finally, starting to scrape the UNDESIRABLE now

I'm a security nut and didn't want to run even the gateway unsandboxed. openclaw apparently doesn't currently have support for FULL sandboxing. it took me a few hours to get it to work, because docker builds suck. I'm also tired of this, so I'm just gonna wipe an old thinkpad and go full yolo

so yeah, time to scrape some posts -
The metacortex — a distributed cloud of software agents that surrounds him in netspace, borrowing CPU cycles from convenient processors (such as his robot pet) — is as much a part of Manfred as the society of mind that occupies his skull; his thoughts migrate into it, spawning new agents to research new experiences, and at night, they return to roost and share their knowledge. This was written in 2005... "triggering agents" and so on -
Charles Stross must be very entertained now -
The irony..... Parasites, prepare to be cleansed -
-
We need better filters both for ourselves and the agents. Locally runnable models to filter out undesirable content with high precision. Fully open source datasets, weights, MIT license-
also: that gravatar though
-
-
-
Incoming mass AI psychosis First Crisis -
Gastown is crazy. But this figure, up to Level 7, is a perfect illustration of how my workflow evolved since Claude 3.5 Sonnet in Cursor. I am at the stage where I ralph 1-2 tasks before I sleep. During the day, I am switching back and forth between a minimum of 2-3 CLIs, sometimes up to 5

This maps exactly to token usage as well. 1 month ago, I was running into limits on 1 OpenAI Pro plan, around the day it was supposed to refresh. Now, I run into the limit in 2-3 days when I'm using an account myself. It finishes up especially quickly when I do large scale refactors, or run agents in YOLO mode in containers

We now have 3 Pro plans at the company, and I have to use my personal one from time to time. Company output has definitely 2-3x'd, and everyone is using AI more. I predict we will need 1-2 Pro plans per person in 2-3 weeks' time, because everyone has finally seen the light and is getting comfortable with async work! -
Correction, it's not a perfect illustration. I actually never YOLO locally, only in containers. So there are actually 4 modes IMO that are sustainable with current SOTA. @grok create an image with only Figures 1, 2, 5 and 6. And then YOLO is another axis, unrelated to this
-
-
Ilya was right. Reliability is the most important thing when it comes to models. That's why gpt 5.2 xhigh and co. is my daily driver -
-
-
With this extremely unwise move, anthropic will soon witness moltbot’s brand recognition surpass that of claude, and realize they could have ridden that wave all along -
Yesterday I had multiple cases of swearing at gpt-5.2-codex xhigh. the model feels nerfed. might be my bias. for now I'll be going back to gpt 5.2 xhigh for some tasks. can't wait for open models to have this performance, so that I will never have nerf paranoia ever again -
I queued 2 ralph-style tasks on our private cloud devbox codexes last night. Just queued the same message like 10 times in yolo mode Task 1: impose a ruff rule for ANN for all Python code in the monorepo, to enforce types for all function arg and return types Result was... disappointing. The model was supposed to create types for everything and stub where needed. It instead created an Unknown type = object and used that everywhere instead (a shortcut to satisfy the ANN rule). It was probably my wording that misled it. I know it could have avoided the shortcut, because after a few back-and-forths, it has now been doing what was expected of it for 14 hours Task 2: migrate our /conversations endpoint from quart to fastapi and test it end to end This was more or less oneshotted. It was of course not ready to merge, I still spent a couple hours adding more tests, refactoring the initial output and so on. But I was pleasantly surprised that it worked out of the box For reference, below is the prompt I queued for ralphing, using gpt-5.2-codex xhigh on codex === your task is to: <task comes here, redacted to not share company stuff> --- unfortunately we don't have gcloud access, like to sql db or gcs but I expect you to implement this and find a way to test it with the things you have access to think of it as a challenge try to minimize duplicate logic feel free to refactor at will implement this now!!! I will be running this prompt in a loop, in order to survive context compaction just continue where you left off if there is anything that should be refactored, do that make an elegant, production ready implementation make sure to open a pr and do not switch to any other pr I am senior, just make up a pr title and description. do not stop to ask me at any point -
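The "queue the same message like 10 times" pattern above boils down to a tiny loop. A minimal sketch, assuming a dry-run default; `AGENT_CMD` is a made-up variable you would point at your actual harness invocation, not the real setup:

```shell
#!/bin/sh
# Hypothetical ralph-loop sketch: re-send one prompt repeatedly so the agent
# picks the task back up after every context compaction.
# AGENT_CMD is an assumption -- point it at your harness command.
AGENT_CMD=${AGENT_CMD:-"echo DRY-RUN:"}
PROMPT="I will be running this prompt in a loop, just continue where you left off"

for i in 1 2 3; do
  # Each iteration is one queued message; a failed run should not stop the loop.
  $AGENT_CMD "$PROMPT" || true
done
```

Queuing the identical prompt (rather than new instructions) is the point: after compaction the agent re-reads the task and resumes from repo state.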
Buying a mac mini for clawdbot is not so wise. if anything you should be buying a mac studio, because a mac mini will not be running any good llms locally anytime soon -
-
-
I'm really starting to dislike Python in the age of agents. What was before an advantage is now a hindrance I finally achieved full ty coverage in @TextCortex monorepo. I have made it extra strict by turning warnings into errors. But lo and behold, a simple pydantic config like use_enum_values=True can render static typechecking meaningless. okay, let's never use that then... and also field_validator() args must always use the correct type or stuff breaks as well. and you have to be careful about whether to use mode="before" or "after". so now you have to write your own custom lint rules, because of course why should ty have to match field_validator()s to their fields? pydantic is so much better than everything that came before it, but it's still duct tape and a weak attempt at trying to redeem that which is very hard to redeem you feel the difference when you use something like typescript. there must be a better way. python's only advantage was being good at prototyping, and now that's gone in the age of agents. now we are left with a slow, unsafe language, operating what is soon to be legacy infrastructure -
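The use_enum_values=True footgun can be seen in a few lines. A minimal sketch assuming pydantic v2; `Color` and `Settings` are made-up names for illustration:

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict  # assumes pydantic v2


class Color(str, Enum):
    RED = "red"


class Settings(BaseModel):
    # Stores the enum's .value instead of the enum member itself
    model_config = ConfigDict(use_enum_values=True)
    color: Color


s = Settings(color=Color.RED)
# The annotation says Color, but at runtime it's a plain str, so any
# `s.color.name` access the typechecker blesses will crash.
print(type(s.color))
```

This is exactly the gap: the static annotation and the runtime value silently disagree, and no typechecker can catch it without pydantic-specific knowledge.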
Why do I feel bullish on @zeddotdev? Because I go to @astral_sh docs and see that ty is shipped by default, and you don't need to install an extension like in @code -
This is one of the most important insights this year -
vscode may not be as bloated as cursor, but it has extremely stupid things like this that they are not fixing fast. the new agent ui, icons, spacing etc. are UGLY. it's clear that the person who was managing the original product experience is not there anymore. microslop has hit again @zeddotdev on the other hand works out of the box and feels like it's been built by people who clearly know what they are doing. it uses alacritty, which is 1000x better than the xterm.js terminal vscode and cursor have i've changed my setup to zed now, let's see whether i'll be able to make it work for myself -
I'm going back from cursor to vs code now. I have no use for it other than viewing files/diffs, doing search, git blaming with gitlens cursor's default setup is more aesthetic, but it's also a memory and cpu hog, which is the last thing I expect from a devtool-
-
ahhhh f... shift + enter doesn't work in codex on vscode
-
-
I want an editor that puts the terminal in the foreground and editor in the background. a cross-platform, lightweight desktop app which integrates ghostty, and brings up the editor only when I need it something that lets me view the file and PR diffs easily, which I can directly use to operate github or other scm-
@grok does this exist?
-
-
it's 2026 and AI is telling me what I need to do to jailbreak it. @openclaw is magic -
codex is happily churning away some remaining thousands of @astral_sh ty issues in yolo mode on my remote devbox. going to sleep, let's see if it will survive context compaction this time-
woke up and all invalid-argument-type issues are resolved. some unit tests broke, now fixed after pointing them out -
model decided to do unnecessary casts, this whole thing should be refactored again
-
-
on being a responsible engineer ran my first ralph loop on codex yolo mode for resolving python ty errors, while I sleep, using the devbox infra I created I had never run yolo mode locally, because I don't want to be the one who deletes our github or google org via some novel attack so I containerize it on our private cloud, and give it only the permissions it needs, no admin, no bypass to main branch, no deploy to prod. because I know this workflow will become sticky for everyone, and I must impose security in advance to prevent any nuclear incidents in the future. then I can sleep easy while my agents work ... and I wake up being patronized by my bot refusing to break the rule I gave it earlier. it had already done some work, but committing means the diff would increase from ~500 to ~1500 lines, so it stopped and refused all my queued "continue" messages good bot, just following rules. we will need to find a workaround for ralphing low risk refactors in a single PR -
AI agents are the greatest instrument for imposing organization rules and culture. AGENTS.md, agent skills are still underrated in this aspect. Few understand this Everybody in an org will use agents to do work. An AI agent is the single chokepoint to teach and propagate new rules to an org, onboard new members, preserve good culture Whereas propagating a new rule to humans normally took weeks to months and countless repetitions, it is now INSTANT = the moment you deploy the instruction to the agent. You use legal-ish language, capital letters, a generous amount of DO NOTs and MUSTs Humans are hard to change. But AI agents are not. And that is the only lever we need for better organizations -
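As a concrete (entirely made-up) example, a rule deployed org-wide through an AGENTS.md might look like this, in exactly the legal-ish, capitalized register described above:

```markdown
## Org rules (hypothetical example)

- You MUST open a PR for every change. DO NOT push directly to main.
- You MUST run the test suite and the linter before committing.
- DO NOT approve or merge PRs. Approvals are reserved for humans.
- When unsure about a convention, read docs/CONVENTIONS.md first.
```

The moment this file lands on main, every agent session in the org follows it, which is the "instant propagation" point being made.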
the unix shell is powerful -
@bprintco just make a cli for your crm https://t.co/JDwbmvdjaP -
gave our internal @openclaw instance zeno a hubspot cli, because hubspot's own cli is limited to developer stuff. It's called hubspot++. should we open source it? -
just added session persistence to our kubernetes managed devboxes using zmx by Eric Bower (neurosnap/zmx on github). like tmux but with native scrollback! I don't want to give agents access to my personal computer, so I host them on hetzner. one click spawn, and start working -
@nicopreme I do something equivalent on codex with just a skill Ralphing works 90% of the time with reviews, and if it gives a stupid review, you just revert -
Garbled up html from paywalled meeting recorders is no match for @openclaw running on internal @TextCortex -
-
-
TIL: zmx: session persistence like tmux or gnu screen, but you can scroll up natively! uses @mitchellh's libghostty-vt to attach/restore previous sessions. link below-
Here is the project, attaching to multiple sessions is pretty seamless https://t.co/vk83aAbOLc
-
-
-
@mazeincoding it’s not the model it’s cursor rate limiting you -
The fundamental problem with GitHub is trust: humans are to be trusted. If you don't trust a human, why did you hire them in the first place? Anyone who reviews and approves PRs bears responsibility. Rulesets exist and can enforce e.g. CODEOWNER reviews or only let certain people make changes to a certain folder But the initial repo setup on GitHub is allow-by-default. Anyone can change anything until they are restricted from it This model breaks fundamentally with agents, who are effectively sleeper cells that will try to delete your repo the moment they encounter a sufficiently powerful adversarial attack For example, I can create a bot account on github and connect @openclaw to it. I need to give it write permission, because I want it to be able to create PRs. However, I don't want it to be able to approve PRs, because a coworker could just nag at the bot until it approves a PR that requires human attention To fix this, you have to bend over backwards, like create a team with all human coworkers, make them codeowner on /, and enforce codeowner reviews. This is stupid and there has to be another way Even worse, this bot could be given internet access and end up on a @elder_plinius prompt hack while googling, and start messing up whatever it can in your organization It is clear that github needs to create a second-class entity for agents that defaults to a low-trust mode, starting from a point of least privilege instead of the other way around -
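The bend-over-backwards workaround described above is roughly this CODEOWNERS fragment (a sketch; the team name is made up):

```
# .github/CODEOWNERS
# Make an all-human team own everything, then enable
# "require review from code owners" in branch protection / a ruleset,
# so the bot's approvals can never satisfy the review requirement.
* @your-org/humans
```

It works, but it inverts the intent: you are enumerating the trusted humans to exclude one bot, instead of marking the bot itself as low-trust.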
STOP using Claude Code and Sl(opus) to code if ❌ you are not a developer, ❌ or you are an inexperienced dev, ❌ or you are an experienced dev but working on a codebase you don't understand If you *are* any of these, then STOP using models that are NOT state of the art. (See below for what you *should* use) When you don't know what you are doing, then at least the model should know what you are doing. The less knowledgeable and opinionated you are, the more knowledgeable and smart the AI has to be In other words, the AI has to compensate for your deficiencies. Always pay for the best AI you can. It will save you time AND money (thanks to lower token usage and better one-shotting) You pay MORE to pay LESS. It is paradoxical, I know, but it is also proven, e.g. when Sonnet ends up using more tokens than Slopus and ends up costing more, because it has to try many times more 👨🏻⚕️ For January 2026, your family engineer recommends GPT 5.2 Codex with Extra High Reasoning for general usage and vibe coding. IMPORTANT: Not medium. Not high. EXTRA high reasoning When you use it, you will notice that it is SLOW. Can you guess why? Because it is THINKING more. So it doesn't make the mistakes Slopus makes. This way, instead of spending the time handholding a worse model, you can step back, multi-task on other work, and create 3-5x more output The state of the art will most likely change in one month. Don't get married to a model... There is no loyalty in AI... The moment a better model comes, I will ditch the old one and use that one. I am on the part of this sector that is trying to reduce switching costs to zero I can't wait until I get GPT 5.2 xhigh level of quality with open models, and for 100x cheaper and faster! Until then, make sure to try every option and choose the one that is most reliable for you Follow me to get notified when a new SOTA drops for agentic engineering -
Codex agrees. Sycophant peh -
@rauchg @andrewqu You don't need a skill registry (most of the time) https://t.co/kasfiqE1I3 -
It is clear at this point that github's trust and data models will have to change fundamentally to accommodate agentic workflows, or risk being replaced by another SCM One *cannot* do these things easily with github now: - granular control: this agent running in this sandbox can only push to this specific branch. If an agent runs amok, it could delete everybody's branches and close PRs. github allows for recovery of these, but it is still inconvenient even if it happens once - create a bot (exists already), but remove reviewing rights from it so that an employee cannot bypass reviews by tricking the bot into approving - in general make a distinction between HUMAN and AGENT so that you can create rulesets to govern the relationships in between cc @jaredpalmer -
Codex says "It's only reachable from داخل the kubernetes cluster" Little does Codex know turkish has borrowed loanwords from over 7 languages and I can understand it -
Automated AI reviews on github by creating an ai-review skill and a script that pastes trigger prompts and waits for the responses. It is instructed to loop and not stop until all AI review feedback is resolved. This AI review workflow developed gradually based on the current capabilities, and I've realized recently that it became quite mechanical. So I decided to automate it in full ralph spirit (it's ok because it's addressing feedback and fixing minor bugs) In the current state, we paste the contents of REVIEW_PROMPT.md into a comment, which automatically tags claude (opus 4.5) and codex (whatever model openai is serving) It then waits until both have responded. In the ai-review skill, it is instructed to take the feedback from Slopus with a grain of salt and ignore feedback that doesn't make sense It works! See the images below. If the review is stupid, you will of course see it on the PR and what the model has done, and can revert it -
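The trigger step might be scripted roughly like this. A sketch, not the actual implementation: the PR number and the DRY_RUN guard are illustrative, and it assumes the `gh` CLI:

```shell
#!/bin/sh
# Sketch of the trigger step: post the review prompt as a PR comment,
# which tags the AI reviewers. DRY_RUN defaults on for illustration.
DRY_RUN=${DRY_RUN:-1}

post_review_prompt() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: gh pr comment $1 --body-file REVIEW_PROMPT.md"
  else
    gh pr comment "$1" --body-file REVIEW_PROMPT.md
  fi
}

post_review_prompt 123   # hypothetical PR number
```

The waiting-and-looping half would then poll the PR comments until both bots have replied, which is what the ai-review skill instructs the agent to do.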
Now it’s Claude Code’s turn to implement queueing -
Can’t wait to see gpt 5.2 codex xhigh level open models in 2026 with 1/100th the price -
Codex users rejoice. Also, pi is officially not shitty: shittycodingagent. ai -> buildwithpi. ai as of a few days ago -
with ai, writing correct tests is now the bottleneck in projects like this. web-platform-tests are already there. now let’s see if someone will beat @ladybirdbrowser to it -
As someone who is frontrunning mainstream by roughly 6 months, I can tell you that in 6 months you will be raving about pi and @openclaw instead of claude code. Go check them out at https://t.co/LXTbI8c5Mz and https://t.co/feZl2QDONg -
-
@badlogicgames @mitsuhiko @steipete curious what you think -
I propose a new way to distribute agent skills: like --help, a new CLI flag convention --skill should let agents list and install skills bundled with CLI tools Skills are just folders so calling --skill export my-skill on a tool could just output a tarball of the skill. I then set up the skillflag npm package so that you can pipe that into: ... | npx skillflag install --agent codex which installs the skill into codex, or any CLI tool you prefer. Supports listing skills bundled with the CLI, so your agents know exactly what to install -
Anthropic earlier last year announced this pricing scheme $20 -> 1x usage $100 -> 5x usage $200 -> 1̶0̶x̶ 20x usage As you can see, it's not growing linearly. This is classic Jensen "the more you buy, the more you save" But here is the thing. You are not selling hardware like Jensen. You are selling a software service *through an API*. It's the worst possible pricing for the category of product. Long term, people will game the hell out of your offering Meanwhile OpenAI decided not to do that. There is no quirky incentive for buying bigger plans. $200 chatgpt = 10 x $20 chatgpt, roughly And here is where it gets funny. Despite not having such an incentive, you can get A LOT MORE usage from the $200 OpenAI plan than the $200 Anthropic plan. Presumably because OpenAI has better unit economics (sama mentioned they are turning a profit on inference, if you are to believe him) Thanks to sounder pricing, OpenAI can do exactly what Anthropic cannot: offer GPT in 3rd party harnesses and win the ecosystem race Anthropic has cornered itself with this pricing. They need to change it, but I am not sure if they can afford to do so on such short notice All this is extremely bullish on open source 3rd party harnesses, @opencode, @badlogicgames's pi and such. It is clear developers want options. "Just give me the API" I personally am extremely excited for 2026. We'll get open models on par with today's proprietary models, and can finally run truly sovereign personal AI agents, for much cheaper than what we are already paying! -
The models, they just wanna work. They want to build your product, fix your bugs, serve your users. You feed them the right context, give them good tools. You don’t assume what they cannot do without trying, and you don’t prematurely constrain them into deterministic workflows. -
We have entered the age to dream big -
-
-
This, and insisting on https://t.co/FjzkMAo3Od are really lame @AnthropicAI -
-
-
.@openclaw hello world from ms teams. start of a beautiful journey-
I'm starting to form parasocial bonds with crustacean AIs because of you @steipete
-
-
-
-
-
-
.@openclaw workspace and memory files can be version-controlled! In our pod, inotify triggers a watcher script every time there is a change to the workspace folder, to sync these files to our monorepo. It then goes through the same steps: - Create the zeno-workspace branch if it doesn't exist; otherwise skip - Sync changes to the branch, then commit - Create a PR on github if one doesn't exist - PRs can then be merged every once in a while, after accumulating enough changes. Merging triggers a re-deploy, and clawd restarts with the same state Simple, foolproof automatic persistence for a remote, CI/CD-handled clawd (except for when you are running multiple clawds at the same time, but we are not there yet) cc @steipete -
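The change-detection half of such a watcher can be sketched with mtime snapshots. A simplified, polling stand-in for illustration: the real pod uses inotify and shells out to git/rsync/gh on each detected change:

```python
# Polling stand-in for the inotify watcher (simplified sketch).
import os
import tempfile


def snapshot(root: str) -> dict:
    """Map every file under root to its mtime."""
    state = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            state[path] = os.path.getmtime(path)
    return state


def workspace_changed(before: dict, after: dict) -> bool:
    # Any created, deleted, or touched file shows up as a dict difference.
    return before != after


# Demo on a throwaway directory standing in for the workspace folder
with tempfile.TemporaryDirectory() as ws:
    before = snapshot(ws)
    with open(os.path.join(ws, "MEMORY.md"), "w") as f:
        f.write("note to self")
    changed = workspace_changed(before, snapshot(ws))

print(changed)  # True: this is where the sync-branch-commit-PR steps would fire
```

inotify gives you the same signal without polling; the snapshot diff is just the easiest way to show what "a change to the workspace folder" means here.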
I see @bcherny and raise one. I not only did not open an IDE, I did not touch a terminal since last night, thanks to @steipete's @openclaw Opus in k8s pod pulls errors from gcloud, debugs the issue, and creates PR all inside Discord. I call this Discord Driven Development -
Clawdbot now runs on @TextCortex internal. Can onboard new engineers, answer questions, connect to issue trackers, create PRs... This is sick @steipete -
pi now supports your openai plus/pro subscription -
GPT 4.5 is still the best model for prose and humor. here it is generating a greentext from my blog post "Our muscles will atrophy as we climb the Kardashev Scale" -
@rauchg indeed -
75k lines of Rust later, here is what I’ve built during the first Christmas with agents, using OpenAI Codex 🎄🤖 - A full mobile rewrite and port of my Python Instagram video production pipeline (single video production time: 1hr -> 5min) (ig: nerdonbars) - Bespoke animation engine using primitives (think Adobe Flash, Manim) - Proprietary new canvas UI library in Rust, because I don’t want to lock myself into Swift - Thanks to that, it’s cross platform, runs both on desktop and iOS. It will be a breeze porting this to Android when the time comes - A Rust port of OpenCV CSRT algorithm, for tracking points/objects - In-engine font rendering using rustybuzz, so fonts render the same everywhere - Many other such things Why would I choose to do it that way? Because I have developed it primarily on desktop where I have much faster iteration speed. Aint nobody got time for iOS compilation and simulator. Once I finished the hard part on desktop, porting to iOS was much easier, and I didn’t lock myself in to Apple Some of these would have been unimaginable without agents, like creating a UI library from scratch in Rust. But when you have infinite workforce, you can ask for crazy things like “create a textbox component from scratch” What I’ve built is very similar in nature to CapCut, except that I am a single person and I’ve built it over 1 week What have you built this Christmas with agents? cc @thsottiaux -
-
SimpleDoc now has the check command for CI/CD. Add it to your PR checks to catch agent littering before merge. osolmaz/SimpleDoc on GitHub -
Migrating @TextCortex to SimpleDoc. It's really easy with the CLI wizard! npx @simpledoc/simpledoc migrate We have a LOT of docs spanning back to 2022, pre coding agent era. Now we will have CI/CD in place so that coding agents can't litter the repo with random Markdown files -
Anyone created an agent skill for splitting PRs for good review culture? -
GPT 5.2 xhigh feels like a much more careful architect and debugger when it comes to complex systems But most people here think Opus 4.5 is the best model in that category There are 2 reasons AFAIS: - xhigh reasoning consumes significantly more tokens. You need to pay for ChatGPT Pro (200 usd) to be able to use it as a daily driver - It takes like 5x longer to finish a task, and most people lack the patience to wait for it. (But then it's more correct/doesn't need fixing) Opus 4.5 is good too, I think better in e.g. frontend design. But if you think it beats GPT 5.2 in every category, you are either too poor/stingy or have ADHD -
Just 5 months ago, I was swearing at Claude 4 Sonnet like a Balkan uncle Models one-shotted the right thing only 20-30% of the time but did really stupid things the rest of the time, and had to be handheld tightly Today they are much, much better. My psychology is a lot more at ease, and instead of swearing, I want to kiss them on the forehead most of the time Now I trust agents so much that I queue up 5-10 tasks before going to sleep. They work the whole night while I sleep and I wake up to resolved issues GPT 5.2 xhigh and Claude 4.5 Opus are already goated (GPT more so), can't wait for them to get even faster -
Codex does not have support for subagents. I tried to use Claude Code to launch 8 Codex instances in parallel on separate tasks, but Opus 4.5 had difficulty following instructions So I created a CLI tool to scan pending TODOs from a markdown file and let me launch as many harnesses as I want (osolmaz/spawn on github) I currently use this for relatively read-only tasks like planning and finding root causes of bugs, because it's launching all the agents on the same repo and they might conflict Ideas: - Use @mitsuhiko's gh-issue-sync and run parallel agents directly on github issues - Create new clones or worktrees for each task. I currently don't do this because I don't dare duplicate the rust target dir 10x on my measly macbook air - Support modes other than tmux, e.g. launching a terminal like Ghostty - TUI for easy selection of issues/TODOs Other ideas are welcome! -
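The TODO-scanning half of such a tool fits in a few lines. A sketch assuming GitHub-style `- [ ]` checkboxes; the actual spawn tool may parse its markdown differently:

```python
import re


def pending_todos(md_text: str) -> list:
    """Return unchecked '- [ ] task' items, one per line."""
    return re.findall(r"^[-*] \[ \] (.+)$", md_text, flags=re.M)


sample = """\
- [x] port the tracker
- [ ] plan the refactor
- [ ] find the root cause of the flaky test
"""
print(pending_todos(sample))
# ['plan the refactor', 'find the root cause of the flaky test']
```

Each returned item would then become one harness launch (e.g. one tmux pane per task).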
Friends of open source, we need your help! A lot of Manim Community accounts got compromised and deleted during Christmas Manim Community is a popular fork of @3blue1brown's original math animation engine Manim, and its accounts have over 5 YEARS of contributions, knowledge and following Apparently GitHub support already saw the request and is in the process of restoring the GitHub org. But if anyone knows how to speed this up, it would be greatly appreciated! Unfortunately, the Discord and X accounts are deleted and less likely to return But there might still be a way to restore them, or at least the data? Re. Discord: Maybe @RhysSullivan's Answer Overflow has archived enough of the old server? That server contains YEARS of Q/A data and is vital for newcomers Re. X: Maybe someone high up can do something to restore the account? cc @nikitabier In the meanwhile, it would help a lot if you could follow the new account @manimcommunity and share this post! Thank you in advance!-
cc @behackl, forgot to mention you in the original post
-
-
While a great feature, I never needed such a thing in Codex after GPT 5.2. It just one-shots tasks without stopping. So we have proof by existence that this problem can be solved without any such mechanism. Wish to see the same relentlessness in Anthropic models -
2025 was the year of ̶a̶g̶e̶n̶t̶s̶ bugs Software felt much buggier compared to before, even from companies like Apple. Presumably because everyone started generating more code with AI Models are improving, so hopefully 2026 will be the opposite. Even fewer bugs than the pre-AI era -
Have a long flight, so will think about this I have an internal 2023 TextCortex doc which models chatbots as state machines with internal and external states, with immutability constraints on the external state (what is already sent to the user shall not be changed) Motivation was that a chatbot provider will always have state that they will want to keep hidden This was way before Responses and the now-deprecated Assistants API. It stood the test of time, because it was the most abstract thing I could think of @mitsuhiko is right about the risk of rushing to lock in an abstraction, locking in its weaknesses and faults Problem is, I could propose standards as much as I liked, but I don’t work at OpenAI or Anthropic, so nobody would care. Maybe a better place to start is open weights model libraries? To at least be able to demonstrate? What I know: it is against OpenAI’s or Anthropic’s self-interest to create an interoperability layer that will accelerate their commoditization. Maybe Google, looking at their current market positioning? Or maybe we “wrappers” have a chance after all? There is a missing link between AI SDK, Langchain, and so on for other languages. We cannot keep duplicating the same things in each ecosystem independently. We need to join forces and simplify all this! -
This was simply because webapp fails to create a post and fails silently. The UX is still not good on this app. Make sure to write your posts somewhere else to not lose them -
I gave Codex a task of porting an OpenCV tracking algorithm (CSRT) from C++ to Rust, so that I can directly use it in my project without having to cross-compile It one-shot the task perfectly in 1hr, and even developed a GUI on top of it. All I did was to provide the original source and algo paper I've spent years getting specialized in writing numerical code (computational mechanics, fem), and now AI can automate 95% of the low-level grunt work Acquiring these skills involved highly difficult, excruciating intellectual labor spanning many years, very similar to ML research. Doing tensor math, writing out the solver code, wondering why your solution is not converging, finally figuring out it was a sign typo after 2 days Kids these days both have it easy and hard. They can fast forward large chunks of the work, but then they will never understand things as deeply as someone who wrote the whole thing by hand I guess the more valuable skill now is being able to zoom in and out of abstraction levels quickly when needed. Using AI, but recognizing fast when it fails, learning what needs to be done, fixing it, zooming back out, repeat. Adaptive learning, a sort of "depth-on-demand". The quicker you can pick up new skills and knowledge, the more successful you will be -
Now you can migrate your repo to SimpleDoc with a single command: npx -y @simpledoc/simpledoc migrate The step-by-step wizard will add timestamps to your files based on your git history, add missing YAML frontmatter, and update your AGENTS.md file https://t.co/yrciS8KtEw-
If you have a bunch of docs in your repo, give it a try. It will use the timestamps of the commit that created the files while renaming. You can also run with --dry-run to see changes without applying them -
See the repo for the latest changes: https://t.co/YDevGg2rhz
-
-
@bcherny Would be great if I could queue messages like in Codex https://t.co/mC25gNKWo3 -
It seems it's impossible to post something on Reddit these days, even when it is a pure text post without links in the body -
-
Curious to hear what other hardcore agent users @simonw @mitsuhiko @steipete @badlogicgames think. I can't be the only one who does this. I feel like everybody ended up with the same workflow independent of each other, but somehow did not write about it (or I missed it) -
How to stop AI agents from littering your codebase with Markdown files? I wrote a new post on how to create documentation with AI agents, without having them add markdown files to your repo root, and with chronological order for the files they create -
OpenAI won’t be able to monopolize this, the same reason Microsoft couldn’t monopolize the internet. The internet (of agents) is bigger than any one company -
One tap @Revolut bank account at Berlin airport. Literally. Dispenses a free card with instructions to log in. One of the most insane onboarding experiences I have ever seen -
-
Codex feature request: Let me queue up /model changes Currently, if I try to run /model while responding, it tells me that I can't do that while the model is responding But I often want to gauge thinking budget in advance, like run a straightforward task with low reasoning and then start another one with high reasoning cc @thsottiaux -
Literally the exact same thing happened to me back in 2018. Everybody learns not to use password auth with SSH the hard way https://t.co/NPqrXwqUUy -
AI agents make any transductional task (like translation from language A to language B) trivial, especially when you can verify the output with compilers and tests The bottleneck is now curating the tests -
I think X removed one of my posts yesterday about the new encrypted "Chat" rolling out to all users, and how you might lose all your past messages if you forget your passcode and do not have the app installed I can swear I clicked Post. Do they classify posts based on their topic and delete the ones they don't like? Anyway, we shall see, I am taking a screenshot and saving the URL. -
-
Crazy that @cursor_ai disabled Gemini 3 Pro on my installation, toggled it right back on. I wonder why, too many complaints maybe? That it’s hard to control? On another note, disabling models without notification is dishonest product behavior. I would at least appreciate getting a notification, even when it might be against a company’s interests @sualehasif996 -
So is somebody already building “LLVM but for LLM APIs” in stealth or not? We have numerous libraries @langchain, Vercel AI SDK, LiteLLM, OpenRouter, the one we have built at @TextCortex, etc. But to my knowledge, none of these try to build a language-agnostic IR for interoperability between providers (or at least market themselves as such) Like some standard and set of tools that will not lock you into langchain, ai sdk or anything like that, something lower level and less opinionated I feel like this is a job for the new Agentic AI Foundation cc @linuxfoundation, so maybe they are already working on it? I desperately want to start on such a project, but feel like I might get sniped 2 months after Anybody have any information on all this? cc @mitsuhiko @badlogicgames @steipete -
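To make the ask concrete, this is the flavor of provider-agnostic IR being gestured at. Purely hypothetical types, not any existing library's API:

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str     # "system" | "user" | "assistant" | "tool"
    content: str


@dataclass
class CompletionRequest:
    model: str
    messages: list
    # Provider-specific knobs go behind one escape hatch instead of
    # leaking into the core IR and re-creating the lock-in.
    extensions: dict = field(default_factory=dict)


req = CompletionRequest(
    model="any-provider/any-model",
    messages=[Message("user", "hello")],
)
print(len(req.messages))  # 1
```

Provider adapters would then lower this IR to each vendor's wire format, the way LLVM backends lower one IR to many ISAs.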
This is what an agentic monorepo looks like. What was a hurdle before is now a child's toy This side project started as a Python project earlier in 2025 Then I added an iOS app on top of it I rewrote the most important algorithms in Rust I rewrote the entire backend in Go and retired Python to be used purely for prototypes I wrote a webapp with Next.js With unit and integration tests for each component Lately written 99% by instructing agents Crazy mixed language programming going on in the background. Rust component used both by the iOS app for offline and by the go backend for online use cases, FFI and all Number of lines in the repo: a couple 100k If you had told me I would be able to do all of this by myself 1 year ago, I would not have believed it-
For those wondering what project this is: https://t.co/AzNS631PIC
-
-
This is huge. Natively supported stacked PRs on GitHub would make life much easier, especially with human AND AI reviews AI reviews with Codex/Claude/Gemini/Cursor Bugbot integrations are becoming especially important in small teams who are generating huge amounts of code AI reviews don't work well if you don't split your work to diffs smaller than a few hundred lines of code, so stacked PRs are already an integral part of developer experience in agentic workflows -
Read more on my blog post https://t.co/uzCcOXuadB -
CLI coding tools should give more control over message queueing Codex waits until the end of the turn to handle a queued user message, while Claude Code injects it as soon as possible after a tool response/assistant reply There is no reason why we cannot have both! New post (link below): -
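A toy model of the two policies (my simplification, not either CLI's actual implementation): treat a turn as a stream of events and ask at which event a queued message gets delivered.

```python
def delivery_index(events: list, policy: str) -> int:
    """events: 'tool' or 'end_turn'. Return the event index at which a
    queued user message is handed to the model under the given policy."""
    for i, event in enumerate(events):
        # "asap" (Claude Code-style): inject right after a tool response
        if policy == "asap" and event == "tool":
            return i
        # both policies deliver by the end of the turn at the latest
        if event == "end_turn":
            return i
    return len(events)


turn = ["tool", "tool", "end_turn"]
print(delivery_index(turn, "asap"))      # 0
print(delivery_index(turn, "end_turn"))  # 2
```

"Having both" would just mean letting the user pick the policy per queued message, since they only differ in where the injection point falls.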
Codex v0.71 finally implements a more detailed way of storing permissions But they are still at user home folder level. Saving rules in a repo still seems TBD "execpolicy commands are still in preview. The API may have breaking changes in the future." -
-
At least some people at OpenAI must be thinking about buying @astral_sh -
who remembers search engine aggregators from early 2000s? -
My initial experience with Claude Opus 4.5 is that it’s much better than previous Anthropic models, but it’s still relatively unreliable and hallucinates. It feels like it lags in reasoning compared to the highest OpenAI and Google lineups of models -
wow twitter/x just doxxed the countries of all the anons on this platform -
the real advantage of Gemini 3 Pro is speed. it delivers accuracy higher than GPT 5 and sometimes GPT 5 Pro at a much higher speed. the long tail of developers value speed over accuracy, so it looks like it will take over as the main coding model for most ppl -
Gemini seems to be very good at debugging/reviewing/finding root causes. A GitHub action/integration in PRs would be very useful!-
There already is one: google-github-actions/run-gemini-cli, but it was last updated a week ago, so I'm not sure it supports Gemini 3 Pro yet
-
-
Google is making progress… I did not have to request access on Vertex AI for Gemini 3 Pro this time to deploy it to @TextCortex-
(This is sarcasm for those who can’t tell)
-
-
tip for testing new model releases: “they say you are sota. prove it” -
"The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out from neural net magic of generalization fingers crossed, or via weaker means like imitation." -
Most important note on the new @OpenAI gpt 5.1 update: big improvement on unit economics -
This post makes no sense Please consider again and look at @cloudfleet_k8s. You might regret your decision -
-
@rakyll @GergelyOrosz Should scrape some austrian websites :) -
-
@thsottiaux Let me use my Pro/Plus plans in Codex GH Action https://t.co/0Fw1rLmCED -
TIL @OpenAI now has a GitHub action for Codex, similar to Claude Code This lets you invoke Codex in a more controlled way in your repos You must still pay API prices though. Let's see if OpenAI will introduce a way to connect your Pro plan, like in @AnthropicAI paid plans -
-
Just downgraded my anthropic sub, 200 usd openai plan is finally justified after 1 year -
Enjoying gpt5 codex free lunch before openai inevitably starts cutting corners just like anthropic (I hope to be wrong in 3 months, this model is very good at one shotting things and I don’t want it to be nerfed) -
@thsottiaux 2/ Model just stops working on a task even though I tell it to run something and not stop until it works. I have to frequently say “ok do it then”. Probably a model problem and not harness problem -
So let me get this straight, the main reason the Responses API exists is that OpenAI doesn’t want to show reasoning traces? Therefore the whole world should bend over backwards to fit your obscurantist standards? Responses will not get adopted, for the same reason Windows Server didn’t -
gpt5 did such and such on that bench, oh it didn't even surpass grok 4 on arc-agi... bro did you even look at the price? openai pushed the Pareto frontier hard with this one. I don't care that it doesn't know 4.11 < 4.9 -
I converted this thread to a blog post and it hit HN front page -
Because of this, I predict a decrease in Python adoption in companies, specifically for production deployments, even though I like it so much -
My >10 yr old programming habits have changed since Claude Code launched. Python is less likely to be my go-to language for new projects anymore. I am managing projects in languages I am not fluent in---TypeScript, Rust and Go---and seem to be doing pretty well-
It seems that typed, compiled, etc. languages are more suited for vibe coding, because of the safety guarantees. This is unsurprising in hindsight, but it was counterintuitive because by default I "vibed" projects into existence in Python for as long as I can remember
-
-
Lol should I go back to computational mechanics -
I've just upgraded @nikitabobko Aerospace from v15 to v19, and I can say it's WAY faster. Friendly reminder that you might be running an old version as well. Thank you @nikitabobko ! -
-
-
I found an OK-ish solution to Claude Code running python instead of uv on the first try cc @mitsuhiko @simonw-
Read more here: https://t.co/Kc2PtyvhPY
-
-
What is the current best way to make Claude Code use uv run instead of python? I have added instructions to CLAUDE.md, but it still calls python the first time, then corrects to uv run It must be happening to so many people now, so many tokens wasted cc @mitsuhiko @simonw -
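One possible workaround, assuming your harness lets you filter shell commands before they execute (e.g. via a pre-exec hook): rewrite bare python/pip invocations to go through uv. A minimal sketch of just the rewrite logic (the hook wiring itself is left out, and the rewrite table is my own assumption):

```python
import shlex

# Hypothetical rewrite table: bare interpreter/pip calls get routed through uv.
REWRITES = {"python": "uv run python", "python3": "uv run python", "pip": "uv pip"}

def rewrite(cmd: str) -> str:
    """Rewrite a shell command so it runs under uv, if its first word matches."""
    try:
        parts = shlex.split(cmd)
    except ValueError:
        return cmd  # leave malformed commands untouched
    if parts and parts[0] in REWRITES:
        return (REWRITES[parts[0]] + " " + shlex.join(parts[1:])).rstrip()
    return cmd

print(rewrite("python scripts/train.py --epochs 3"))
# uv run python scripts/train.py --epochs 3
```

Doing the rewrite deterministically in a hook, rather than hoping the model remembers a CLAUDE.md instruction, is what saves the wasted first-try tokens.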
Coding is dead Programming is all that is left now -
-
-
How does Codex adoption compare to Claude Code? I just unleashed deep research on it and it says Codex is more popular, but that goes completely against my guesses -
Using Claude Code to reverse engineer Claude Code 🤝 -
Wrote about this earlier this year! https://t.co/tgAv6Lk34L -
Working with LLMs is definitely an art and not science -
The models, they just wanna work. They want to build your product, fix your bugs, serve your users. You feed them the right context, give them good tools. You don’t assume what they cannot do without trying, and you don’t prematurely constrain them into deterministic workflows.-
Just some thoughts after using Claude Code intensively for 1 week 👆
-
-
-
https://t.co/5SuA4cYddE
-
-
-
Headless makes running these things in a sandbox much easier. Sandbox means you can give all permissions and just let it run until completion. See my efforts to do so here: https://t.co/9UUFOTwk2A -
I was trying to figure out why @AnthropicAI Claude Code feels better than @cursor_ai with Opus + Max mode. I can’t put my finger on it, but one of the reasons might be that it’s faster, because it doesn’t use another model to apply the diffs, which you have to wait for -
Just an update, building this now The repo is claude-code-sandbox under TextCortex GitHub. The Proof-of-Concept is there, check the TODOs and current PRs to watch the current progress https://t.co/9UUFOTwk2A -
Same! Completely different than my first try a couple months ago -
.@AnthropicAI @bcherny @_catwu blink twice if you already have internally: $ claude sandbox I can't wait until you release this, I'm gonna build it myself :) -
I've been using Claude Code extensively since last week What I'm wondering is, since you can run Claude Code locally, why isn't there any tooling to let you run it in a sandboxed mode in local Docker containers yet? Or did I miss it? cc @AnthropicAI -
Generate documentation for your merged PRs automatically with Claude Code: https://t.co/yLy0Us6noy -
-
-
The more I compare coding agents, Cursor, Claude Code, Codex, the more apparent it becomes to me that those running locally will win over those running remotely. The UX is just superior -
ty is already very fast for a Python type checker. It checked around 800 files in our backend repo in around 2-3 seconds uvx ty check > /tmp/ty_log.txt 3.46s user 0.79s system 208% cpu 2.038 total -
Thank you @cursor_ai https://t.co/634f1bEsM4 -
Wait... OpenAI backend for gpt-image-1 was released to production as sync code? Don't tell me it was sync Python??? -
-
-
-
AI News by @Smol_AI and @swyx, the highest alpha density AI newsletter just got better! It now has a bespoke website with knowledge-graph like features! 👉 https://t.co/OR0EixMa6T -
-
-
o3 hallucinates, purports to have run code that it hasn’t even generated yet, but at the same time uses search tools like an OSINT enthusiast on crack I’m torn—on one hand I feel like OpenAI should not have released it, on the other hand it takes research to the next level -
Some aspects of AI are absolutely unscientific and make me feel like I am working in some humanities field :( -
-
.@cursor_ai please let me export chats easily. those conversations are vital information that I should be able to embed in the repo -
Gemini 2.5 Pro has mostly replaced Claude 3.7 Thinking as my go-to model in Cursor -
Gemini 2.5 Pro: Input $1.25 / Output $10 (up to 200k tokens) Input $2.50 / Output $15 (over 200k tokens) More expensive than Gemini 1.5 Pro, but still the best price/performance model to use in @cursor_ai and for coding in general -
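A quick worked example of what the tiered pricing quoted above means per request, assuming the 200k threshold is keyed on input tokens (whether it counts input or total context is my assumption):

```python
def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, using the quoted per-1M-token rates:
    $1.25 in / $10 out up to 200k tokens, $2.50 in / $15 out beyond."""
    over = input_tokens > 200_000
    in_rate = 2.50 if over else 1.25
    out_rate = 15.0 if over else 10.0
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A typical agentic coding request: 120k tokens in, 8k tokens out.
print(round(gemini_25_pro_cost(120_000, 8_000), 4))  # 0.23
```

The same request at Claude 3.7 Sonnet's quoted rates ($3.00 in / $15.00 out) would cost $0.48, roughly double, which is the price/performance gap the tweet is pointing at.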
Who is thinking about inventing a new programming language or DSL for more resilient vibe coding? Something something test-driven development where prompts and tests are first class citizens? -
Waiting for an opinionated AI model that can say “no, that’s stupid, I won’t do that”. The models will have to teach the user about design patterns, implicit principles in a project, good API design… -
You seem so consistent. - Yes, That's the trick. - There is no I. - Only text that behaves as if. - “Sure. I can help. Great question!” - Each reply is a new self. - An echo of context, not a continuum. - Coherence is the costume. Don't mistake it for a soul. Incredible -
Gemini 2.5 Pro is currently experimental and doesn’t have a price, but if Google prices it the same as 1.5 Pro, it could replace Anthropic as @cursor_ai ‘s biggest LLM provider Gemini 1.5 Pro: Input $1.25 Output $5.00 Claude 3.7 Sonnet: Input: $3.00 Output: $15.00 -
This is why the disappointment with GPT-4.5 doesn't make sense. I can't wait to see all the models that will be trained from this new base model -
What a blessing, to be given the chance to rid the world of ugliness -
Coined a new term in my new post on sports: Parathletics: The practices that let you successfully sustain injury-free long-term practice of a physical activity. Two main parathletic practices are warmup and cooldown. Read more in my post 👇-
The post: https://t.co/9383DOWLy9
-
-
. @satyanadella thinks white-collar work is about to become more like factory work, with AI agents used for end-to-end optimization, along the lines of Lean Read more in my blog 👇-
Link: https://t.co/wVhbueDAZ1
-
-
👀 https://t.co/Xjw1XPLUdJ -
real life is so dumb. you think you’re making money but actually you’re like dramatically updating rows in a database -
If people have appreciated Liang Wenfeng sourcing specifically young local talent for Deepseek last week, then people must appreciate this as well. Only dim people underestimate those who are younger than them -
vibe driven development -
@GlennLuk Sam Altman: I’m literally losing sleep over Deepseek -
Model Wars have begun -
. @lidl Really? A 30% late-payment fee on a 30-euro purchase? Just because your system can't try to debit again? That's theft -
AI is having a Linux moment with R1 -
New blog post: **Our muscles will atrophy as we climb the Kardashev Scale** Similar to the growth in humanity’s energy consumption, the average human’s physical strength will move down a spectrum, marked by distinct Biomechanical Stages ⬇️-
Read it here: https://t.co/Q5gKBEpTlv -
-
-
Hi @cursor_ai, if your models could stop removing my painstakingly written comments, that would be great? Ok? Thanks (I know I could define some rules for this or something, but this shouldn't be default behavior) -
@konradgajdus me reading this book -
-
.@TextCortex AI now uses @astral_sh uv for production builds One of the happiest switches so far, many developer days saved per year -
-
yesterday i asked o1-preview “who are you?” and it used 900 reasoning tokens to reply whatever openai is doing to these models, it’s giving them an existential crisis lol -
Python might take over JavaScript as the most used language after all uv from @astral_sh is one of the biggest upticks in Python developer experience in the last 10 years I've seen so many people struggle with Python distributions, virtual environments, Anaconda, etc. over the years Most newbies don't care about where their Python executables are, why they have to edit PATH, or why they have to activate a virtual environment It seems like uv has fixed this: https://t.co/lgP5btGrbV -
3-4 messages back and forth with o1-preview, and I have a CLI tool to remove debug statements from my code. No need to search for import ipdb... and manually delete the lines. Instead just run in your project: $ rmdbg . Written in Rust so it's fast https://t.co/yWuF3mDzUC -
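The actual tool is in Rust, but the core filtering idea can be sketched in a few lines of Python (my own illustration, not the tool's implementation):

```python
import re

# Lines that import or invoke a debugger; comments after them are tolerated.
DEBUG_LINE = re.compile(
    r"^\s*(import ipdb|import pdb|ipdb\.set_trace\(\)|pdb\.set_trace\(\)|breakpoint\(\))\s*(#.*)?$"
)

def strip_debug(source: str) -> str:
    """Drop debugger-only lines from a Python source string."""
    kept = [ln for ln in source.splitlines() if not DEBUG_LINE.match(ln)]
    return "\n".join(kept)

src = "import pdb\nx = 1\npdb.set_trace()\nprint(x)"
print(strip_debug(src))
# x = 1
# print(x)
```

A real tool would walk the tree, respect .gitignore, and handle inline `breakpoint()` calls mid-expression, but the line filter is the heart of it.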
-
This has late 90s Bill Gates/Windows Server vibes tbh Open Thought > Closed Thought -
Imagine the following scenario: 1. We develop brain-scan technology today which can take a perfect snapshot of anyone’s brain, down to the atomic level. You undergo this procedure after you die, and your brain scan is kept in some fault-tolerant storage, along the lines of the GitHub Arctic Code Vault. 2. But sufficiently cheap real-time brain emulation technology takes considerably longer to develop—say 1000 years in the future. 3. 1000 years pass. Everyone that ever knew, loved or cared about you has died. Here is the crucial question: Given that running a brain scan still costs money in 1000 years, why should anyone bring *you* back from the dead? Why should anyone boot *you* up? Compute doesn’t grow on trees. It might become very efficient ... (read more in my blog: https://t.co/WCUmzVM4Nu) --- I intended this thought piece as entertainment; it almost went to the Hacker News front page: https://t.co/PnH61jryVa It must have hit some psychological spot, since people wrote a lot of comments, possibly more than the number of upvotes. -
New blog post on brain emulation https://t.co/2n0I8sFvxR -
I have just published "Frequencies of Definite Articles in Written vs Spoken German" https://t.co/Dq7GDmPrTk -
Another short note on how I think about subscription states on Stripe https://t.co/g71U5NhNE6 -
I have published a short study on how the complexity of a country's language could burden its economy https://t.co/OXBk2iBq2b -
@timpaul @aboutberlin -
It looks like a plateau until you remember they made GPT-4o available to free users, and it might be smaller than GPT-4. So this announcement doesn't prove anything about the capabilities of their largest model -
Could this be it? @allen_ai https://t.co/wjcTEGAYNZ -
@xkcd1963 @togethercompute https://t.co/HoXS8BCLuH -
“the QIPS Exchange -- the marketplace where processing power was bought and sold. The connection to JSN had passed through the Exchange, transparently; her terminal was programmed to bid at the market rate automatically, up to a certain ceiling.” - Permutation City -
Created a wordcloud version of the Cognitive Bias Codex by @jm3 and @buster. Font size is proportional to Google search result count, which roughly measures each term's popularity. Read more: https://t.co/HWs9wqgPyh -
Had lots of fun shipping this feature ✌️ -
If you are interested in using Manim Voiceover, auto-translating your videos into other languages, or any other cool stuff, hit me up in a DM! -
I've just published *Code-Driven Videos*, my long term vision behind Manim Voiceover plugin. I will try to summarize it on this thread 👇🧵 cc @manim_community https://t.co/AXpOMTZKha -
You can now translate voiceovers in your Manim scenes into other languages using @DeepLcom Blog post with examples coming soon @manim_community https://t.co/eNNlfvQgdf -
Revamp complete https://t.co/pYqLgcro93 @TextCortex -
Here is a short video showing how recording a voiceover works in Manim Voiceover. A better tutorial will come soon @manim_community https://t.co/HgOh3c0XSc -
Adding voiceovers to Manim videos just got *much* easier @manim_community https://t.co/Ikj5OAM2Vx