X Archive
-
"this is the worst AI will ever be" I'm sad, not because this is right, but because it is wrong OpenAI's frontier coding model gpt-5.3-codex-xhigh feels a lot worse compared to before. It is sloppy and lazy, though it's UX got better with messages It feels like the gpt-5.2-codex-xhigh at the end of December was a lot more diligent and thorough, and did not make stupid mistakes like the one I posted before. might be a model or harness problem, I don't know @sama says users tripled since beginning of the year, so what should we expect? of course they will make infra changes that will feel like cutting corners, and I don't blame them for them and about "people want faster codex". I do want faster codex. but I want it in a way that doesn't lower the highest baseline performance compared to the previous generation. I want the optionality to dial it down to as slow as it needs to be, to be as reliable as before it is of course easier said than done. kudos to the codex team for not having any major incidents while taking the plane apart and putting it back together during flight. they are juggling an insane amount of complexity, and the whims of thousands of different stakeholders my hope is that this post is taken as a canary. I am getting dumber because of the infra changes there. I have no other option because codex was really that good compared to the competition my wish is to have detailed announcements as to what changes on openai codex infra, when it changes, so I can brace myself. we don't get notified about these changes, despite our performance and livelihoods depending on it. I have to answer to others when the tool I deemed reliable yesterday stops working today, not the tool on another note, performance curve of these models seem to be a rising sinusoidal. crests correspond to release of a new generation. they start with a smaller user base for testing, and it has the highest quality at this point. 
then it enshittifies as the model is scaled to the rest of the infra. we saw the pattern numerous times in the last 3 years across multiple companies, so I think we should accept it as an economic law -
I created a semi-automated setup for ingesting X posts into my blog, and it works pretty well! I own my posts on X now

Posts are scraped while I browse X using @kubmi's xTap and get automatically synced to my blog repo. Posts saved as jsonl are then converted to jekyll post pages to my liking

I reproduced the full X UI/UX, minus stuff like like count. Now all my posts are backed up in my blog, and they are safe even if something happens to my account here!

The posts are even served over RSS! So you can subscribe to it without going through X!

Reply if you want to set this up for yourself, and I will put some effort into standardizing it -
Agentic Engineering is a newly emerging field, and we are the first practitioners of it. Currently there is a lot of experimentation going on, and a large part of it is more ART than engineering

For example, @steipete says "you need to talk to the model" to get a feel. a lot of the work around refining how an agent feels sounds like psychology. this part is crucial and should not be ignored, looking at openclaw's success

but then there is the hardcore engineering part of it, e.g. Cursor creating a browser, or anthropic a C compiler from scratch, fully autonomously

and there is a whole other dimension of how to teach all software developers this new discipline, lest they be jobless

what is obvious is that everybody is grasping for things in the dark and that we need more RIGOR. the art/psychology aspect of it aside, we need solid engineering fundamentals

the "thermodynamics" of this new discipline will most likely be formal verification and program synthesis. we might have some breakthroughs that will make certain things clear. the products of it will most likely include a new programming language optimized for agents and the speed of inference

moreover, it would be foolish to think agentic engineering is limited to software. it will penetrate every aspect of the economy, bits AND atoms. it will over time evolve into the engineering of managing robots

@simonw is now leading in collecting very useful info from the practitioner's point of view, I highly recommend you follow this thread

let's formalize our new field together! -
-
who remembers ultrathink https://t.co/ftCauqiKx6 -
-
-
Claude Code / Codex in Discord threads is shipped now!

To enable, copy and paste this to your agent:

```
Enable feature flags:

acp.enabled=true
acp.dispatch.enabled=true
channels.discord.threadBindings.spawnAcpSessions=true

Then restart. After restarting:

Start a codex (or claude code) discord thread using ACP, persistent session, just tell it to write a haiku on lobsters to initialize acpx for the first time
```

You may need to nudge your agent to "continue" after restarting

The first implementation is very barebones; I have made it work in a clean way and merged it. In a codebase like openclaw's, it's better to develop incrementally

Please send any issues my way. I am already aware of some and working to fix them -
-
-
-
MIT License on everything from now on. It doesn't make sense to use anything else, except for a few large projects that hyperscalers exploit without giving back

If you were making money from a niche app, open source it under the MIT License

If you had an open source project under GPL, convert it to MIT

Extreme involution is about to hit open source. Code is virtually free now. If you want your projects and their brand to survive, the only rational strategy is to remove all barriers to their adoption, and look for other ways to survive -
-
OpenAI nerfed GPT 5.3 Codex xhigh. We independently reported the same thing at @TextCortex today I'm looking forward to deploying open models and putting an end to this paranoia -
-
This. Agent Experience first. Agent Ergonomics. we need to get used to these terms -
"academics" -
-
-
-
imagine if tarantino were 16 years old now and saw seedance 2.0

95% of videos i saw since the launch are absolute tasteless slop. they are going viral because of ragebait

but soon, serious imagineers will start entering the game, and they will learn to shape generation output exactly how they want

it's the best time to be young and full of imagination -
The future is so bright @ladybirdbrowser -
your margin is my opportunity -
-
-
-
another thought i'm having these days is that we need a new philosophy of free software (as in freedom), or an update to it

the most psychologically imprinting philosophy is stallmanism, the philosophy of the FSF. it is righteous and strict, and i believed in it growing up

but GPL and money don't go well together. that's why most of the lasting open source projects today use MIT, Apache and the like. it turns out you can still make a good living with open source. i want to make money, so i never use GPL in my projects

and to add another deadly blow to stallmanism, code is cheap now, virtually free

does this mean stallmanism is dead? if there is an open source project using GPL that i want to use commercially, i can now recreate it from the original idea and intent, completely independent of it (ignoring training data), just like how i can recreate a proprietary service

stallmanism was already long irrelevant. but does this mean we must finally declare it dead? code is free now. what does it mean for open source? what replaces stallmanism? -
-
-
one effect openclaw had on me is that I've bought a gpu home server, set it up with tailscale and now doing a lot of work through ssh and tmux like i did 10-15 years ago im back on linux, considering buying an android phone again it's time to dream big again and unshackle ourselves from proprietary software. it's time to build -
I am asking once again Who is building a self hostable discord clone that supports token streaming? PLEASE I beg you I don’t want another side project 💀 -
In the new release of OpenClaw, you can talk to subagents in Discord threads

Currently a beta feature, so ask your agent to set session.threadBindings.enabled=true

Next up:
- Telegram, slack, imsg threads
- Use ACP to talk to Codex, Claude Code and other harnesses on your machine -
-
openclaw might be the highest velocity codebase in the world, and soon, others will follow as well conflict anxiety is real, it's like trying to shoot a moving target every time. I wonder if our existing tooling will ever solve this problem feel like faster models might. but then the rate of conflict creation is also tied to that. might be unsolvable -
Getting there https://t.co/jqSNcH2PSy -
-
-
I am about to kick Discord Driven Development up a notch today, stay tuned -
Imagine not having to upload skills to 3-4 competing skill registries for each of your projects

Turns out we already have a skill registry: npm

skillflag lets you bundle skills right into your CLI's npm package, so that you can run --skill install github

-> osolmaz/skillflag -
-
Farmable land if it were as cheap to manufacture as software -
@kepano I would grow my own vegetables if I had equally cheap access to and ownership of land, alas I am disenfranchised

Prompting an agent is much easier compared to plowing a field

Farming analogies break when it comes to software https://t.co/CkldO8eWKc -
-
If anyone is curious how to build this with open tooling, stay tuned What I'm building at @TextCortex will give you a fully customizable hackable Kubernetes control plane to launch agents on your codebase -
on another note, I do believe AI will play a huge part in families

growing up in the late 90s, my dad taught me the importance of reading newspapers and being informed about the world. my nickname in middle school was "newspaper boy" for a long time because I read the newspaper in class. on September 12, 2001, i was 10 years old

then I witnessed the enshittification of media and journalism in the following decades. today, serious journalists are setting up their own boutique agencies and bypassing mainstream media. important news lands on individual accounts before mainstream agencies

but there is simply too much to consume. something must filter out the noise and digest the info according to the family's preferences

i think AI will play a big role in family intelligence. proprietary family heirloom AI, weights fully owned by the family

it will be the parents' job to filter the signal from the noise, and train the AI on what is right and what is wrong for the family. family and friend circles will let their AIs talk to each other and share important information

consuming mass media and mass AI will not be enough to survive and prosper in the new world. families will need to be proactive about how they and their children use AI -
on ai psychosis

80% of people need to use ai agents in a very sterile and boring way in order not to go crazy

the majority of the population does not have the skepticism muscle. they don't have theory of mind, and will subconsciously and emotionally associate with machines, while on the surface lying to themselves that they don't. especially those that grew up in the us under hardcore consumerism and adjacent cultures

you thought 4o addicts were bad? wait a few years, it will get much worse. we will have to regulate all this

if you don't want to become a victim of this, make your openclaw SOUL.md as bland as possible. mine knows it's just a tool

and this is a subjective view of course. @steipete might disagree with me. his instance feels much more interesting and fun. i truly like that one better

but that is exactly the problem for me. i know myself, and i know it is a slippery slope for me. so i self regulate and set up my system accordingly. thankfully, im an adult and my brain is set enough that any damage would be limited

but there is a risk for emotionally vulnerable people, or children, specifically a risk of dissociating and losing touch with reality

why do i write all this? because being in this project, i feel responsible, and feel like we should prepare for what is to come -
-
I have improved acpx's sane defaults

When your agent runs acpx codex in a different project, it starts a new session. If it tries to run it in a subfolder of your project, it still finds the session in your repo root

Also, starting a session needs an explicit `sessions new`, so that it doesn't accidentally litter your project with sessions

Tell your agent:

Run this and install acpx per instructions: npx acpx@latest --skill show acpx -
-
-
-
-
ACP appreciation post

Agent Client Protocol by @zeddotdev is extremely underrated right now. We have a bazillion different harnesses now, and only one company is working competently to standardize their interface 💪 -
I am a fan of @zeddotdev at this point, it's currently my daily driver

It's not perfect, but I feel it's traveling in the right direction at a faster rate than other editors
-
-
You know how it's a pain to work with codex or claude code through @openclaw? Because it has to run them in a terminal and read the characters to keep a continuous session?

I have created a CLI for ACP so that your agent can use codex, claude code, opencode etc. much more directly. Your agent can now queue messages to codex just like you do

Shoutout to the @zeddotdev team for developing the amazing Agent Client Protocol, ACP! I just glued the pieces together

Repo: janitrai/acpx

npm i -g acpx -
-
-
I wrote a deeper blog post about how I built a coding agent 2 months before ChatGPT launched

"When I made icortex,
- we were still 8 months away (May 2023) from the introduction of “tool calling” in the API, or as it was originally called, “function calling”.
- we were 2 years away (Sep 2024) from the introduction of OpenAI’s o1, the first reasoning model.

both of which were required to make current coding agents possible."

Still bends my mind... Link to the post below -
-
-
❌We are the bottleneck ✅We are the conduit for ubiquitous intelligence -
For those that are running codex/pi/etc. in PTY and had the sessions get sigkilled, I pushed a fix for that as well in this release Lmk if you run into issues on Windows or Mac, and we can fix that quickly -
I'm building a news intelligence platform to be used by my openclaw instance @dutifulbob, SCOOP

local first, using a local embedding model (qwen 8b)

I ran into the issue because bob was giving me a repeat of the same news every day. it needed a system in the background to deduplicate different news items into single stories

the interface is simple: call `scoop ingest...` with the json for the news item. it gets automatically analyzed and added to the pg database running pgvector

currently, it's just doing simple deduplication and gives me a nice UI where I can view the story and basically use it as an RSS reader

next up: implement custom logic for my preference of ranking. for example, get upvote counts from hacker news and reflect them in the item's ranking on the feed

I want this to be fully hackable and adjusted to your preferences. It should scale to thousands of news items ingested daily on your local machine, and be able to show you the most important ones

Usable by both you and your agent

github -> janitrai/scoop -
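The dedup idea above can be sketched as greedy nearest-story assignment over embeddings. This is a toy illustration, not scoop's actual logic: it uses plain numpy cosine similarity in place of pgvector, and the 0.9 threshold is a made-up number.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dedupe(embeddings, threshold=0.9):
    """Greedily assign each item to an existing story if its embedding is
    close enough to the story's first item, else start a new story."""
    stories = []  # each story is a list of item indices
    for i, emb in enumerate(embeddings):
        for story in stories:
            if cosine(embeddings[story[0]], emb) >= threshold:
                story.append(i)
                break
        else:
            stories.append([i])
    return stories

# toy 2-d "embeddings": two near-duplicate items and one distinct item
embs = [np.array([1.0, 0.0]), np.array([0.99, 0.05]), np.array([0.0, 1.0])]
print(dedupe(embs))  # → [[0, 1], [2]]
```

With pgvector, the inner loop would become a single similarity query (e.g. ordering by the `<=>` cosine-distance operator) against already-ingested stories instead of an in-memory scan.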
I have a GPU now, so I can do ML experiments on the @janitr_ai crypto/scam detection dataset

- I trained a tiny student BERT (transformer, for the unfamiliar), a 3.6 MB ONNX model, still lightweight for a browser extension
- Still fully local on your device (no cloud inference)
- On frozen unseen holdout data (n=1,069), exact prediction accuracy improved from 77% -> 82%
- Scam detection improved: precision 91% -> 94%, recall 55% -> 61%
- Scam false alarm rate improved from 1.58% -> 1.21%

And the models are on the huggingface org now, handle is janitr -
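For anyone wanting to sanity-check numbers like these, the reported metrics all fall out of a binary confusion matrix. A minimal sketch with hypothetical counts (NOT the actual janitr holdout):

```python
def metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)          # of flagged posts, how many were scams
    recall = tp / (tp + fn)             # of actual scams, how many were caught
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    false_alarm = fp / (fp + tn)        # fraction of benign posts flagged
    return precision, recall, accuracy, false_alarm

# hypothetical counts for illustration only
p, r, a, fa = metrics(tp=61, fp=4, fn=39, tn=965)
print(f"precision {p:.0%}  recall {r:.0%}  accuracy {a:.0%}  false alarms {fa:.2%}")
```

Precision and false-alarm rate move independently of recall, which is why a model can get much better at catching scams while barely changing how often it bothers innocent posts.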
Training all these models of different sizes, on changing datasets, and running experiments has also revealed some challenges that I feel profs would never teach in a uni ML program

Like how to cleanly keep track of the gazillion runs

Yeah, I can name them after layer dims and other stuff, but to me that's like trying to remember UUIDs

So I ended up choosing iso datestamp + petname, like 2026-02-15-flying-narwhal

If anyone has a convention that is easier on the brain and the eyes, I am all ears
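A datestamp + petname scheme like that is trivial to generate. A minimal sketch with a tiny made-up word list (real petname libraries ship much longer ones):

```python
import datetime
import random

# tiny hypothetical word lists, purely for illustration
ADJECTIVES = ["flying", "sleepy", "brave", "quiet"]
ANIMALS = ["narwhal", "otter", "falcon", "yak"]

def run_name(seed=None):
    """Build a run name like 2026-02-15-flying-narwhal: sortable by date,
    memorable by petname."""
    rng = random.Random(seed)
    stamp = datetime.date.today().isoformat()
    return f"{stamp}-{rng.choice(ADJECTIVES)}-{rng.choice(ANIMALS)}"

print(run_name(seed=0))
```

Because the ISO date sorts lexicographically, `ls` on the runs directory already gives a chronological experiment log for free.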
-
-
LFG! -
waiting for compilation and execution will soon be the bottleneck again. and we’ll rewrite the entire stack from scratch in a matter of years, because we can

Andy and Bill’s law will change and we’ll see incredible performance gains with the same hardware we already have

like what @astral_sh is doing to python, but with everything that is slow and has accumulated cruft -
we need a protocol for agent <> app interaction

something that natively accounts for the abuse factor and lets agents consume by paying. NOT crypto, NOT visa, something that’s agnostic of the accounting and payment system

and then all UIs will be purely for human clicking/tapping + instaban on the first proof of programmatic exploit

people will still make agents mimic humans, and every platform will have to invest in more sophisticated bot detection. this arms race will just proliferate, but we can at least start by creating legal channels for agents to consume data -
I am now training smol bert models on my gpu for @janitr_ai scam detection

it's funny how I have to discover everything from scratch. like the models don't even know how to lay out performance metrics in a nice way in the terminal for a human to view and decide during experiments

it would by default bombard me with numbers that do not make visual sense. so I created a skill with common sense:

- metrics always on the y-axis, candidates on the x-axis
- write without the leading zero and with 2 sig figs: .12 instead of 0.12345
- align the dots
- use asterisks to show which alternative is the best:
  - 0-1% difference -> considered equal
  - 1-5% -> *
  - 5-10% -> **
  - 10-50% -> ***
  - over 50% -> ****

the visualization skill is in the @janitr_ai repo for anyone who is interested -
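The asterisk bands are straightforward to encode. A sketch assuming the differences are measured relative to the losing candidate (the post doesn't say whether they're relative or absolute):

```python
def stars(best, other):
    """Mark how decisively `best` beats `other`, using the banded
    thresholds described above. Differences are relative to `other`."""
    diff = (best - other) / other
    if diff <= 0.01:
        return ""      # within 1%: considered equal
    if diff <= 0.05:
        return "*"
    if diff <= 0.10:
        return "**"
    if diff <= 0.50:
        return "***"
    return "****"

print(stars(0.94, 0.91))  # precision 91% -> 94%, a ~3% relative gain → "*"
print(stars(0.61, 0.55))  # recall 55% -> 61%, a ~11% relative gain → "***"
```

The point of the bands is that a human scanning a table can count asterisks instead of mentally dividing percentages.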
-
I've helped our sales team build CLIs, on their side, for some of the SaaS that we pay for. We are letting our agents call the APIs sensibly and not abuse things

Calling a backend is a verifiable task. It takes a single prompt to codex to create a CLI for any API

We are early, but everybody will start doing this very soon. Incumbent SaaS will face a choice. Either: (1) embrace agents and the new medium of consumption and change their business model into a pay-per-use API like X is doing, or (2) keep it purely for humans

Those that choose (2) will get wiped out of business. And I fear many will choose (2)

Which means you can just copy an incumbent's product, make it consumable through a CLI, and make a lot of $$$ -
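The shape of such a CLI wrapper is simple. A minimal sketch against a made-up endpoint (`api.example-saas.com`, the `invoices` resource, and the dummy token are all illustrative, not any real SaaS); the request is built separately from sending so it can be inspected without touching the network:

```python
import argparse
import urllib.request

API_BASE = "https://api.example-saas.com/v1"  # hypothetical endpoint

def build_request(resource, token):
    """Assemble the authenticated GET request for a resource. Kept
    separate from sending so it is testable offline."""
    return urllib.request.Request(
        f"{API_BASE}/{resource}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )

def main(argv=None):
    parser = argparse.ArgumentParser(prog="saas")
    parser.add_argument("resource", help="e.g. invoices, contacts")
    args = parser.parse_args(argv)
    # in a real CLI the token would come from an env var or keychain,
    # and the next line would be followed by urllib.request.urlopen(req)
    req = build_request(args.resource, token="dummy")
    print(req.full_url)

main(["invoices"])  # → prints https://api.example-saas.com/v1/invoices
```

A thin, auditable wrapper like this is also where rate limiting and request logging can live, which is how "sensibly and not abuse things" gets enforced rather than just requested.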
Be careful about giving your openclaw access to your x account from now on -
-
-
*puts on schmidhuber hat* well ackshuaally i created the first coding agent back in 2022, 2 months before chatgpt launched

jokes aside, it's super cool how I have come full circle. back in those days, we didn't have tool calling, reasoning, not even gpt 3.5. it was codex THE CODE COMPLETION MODEL and frikkin TEXT-DAVINCI-003

for some reason, I did not even dare to give codex bash access, lest it delete my home folder. so it was generating and executing python code in a custom jupyter kernel

you can even see the approval gate before executing. I was so cautious, presumably because the smol-brained model generated the wrong thing 80% of the time. definition of being too early

Antique repo: -
you can order bubble tea in qwen in china? @TextCortex when berlin döner in zenochat? https://t.co/O4I950ltEO -
it happens these days that I am telling a model to prompt another model. the reason is often that the model I am using (opus) is a bad designer. not only is it a bad designer, it is a bad reasoner, and it doesn't understand from the context why it's made to ask another model

so I have to create a skill to prevent it from biasing the smarter model (codex) with its bad suggestions -
-
-
-
-
Minor update with my unwanted tweet blocker @janitr_ai

- Training data grew from 2,915 -> 4,281 posts (+47%)
- Model is still tiny: 166KB
- On unseen test data, overall classification quality improved from 64.8% -> 76.5%
- Exact prediction accuracy improved from 55.6% -> 70.6%
- Crypto-topic detection recall improved from 19.6% -> 62.7%

And it still runs fully on your device! -
I have sworn at codex 5.3 numerous times today

I shouldn't have to insult my agent "stop you **** **** just ***ng reply now" just to make it answer basic questions

cc @thsottiaux -
seeing this evokes visceral disgust and nausea in me, coming from a coworker i think anthropic f'd up bad with this one, inserting claude too visibly into commit messages. noob developers might be happily chirping away adding their slop, but right now many senior developers are trained to hate on claude and slopus, through having to review slop PRs from their coworkers or open source contributors I love opus on openclaw but it's unreliable, and if I see a developer use it seriously on huge features, I immediately dismiss them in my head as not knowing what they are doing -
-
-
@petergyang and parallelize tasks by working on 3-4 repos at the same time (just clones) -
man codex model is absolutely trash on openclaw compared to opus, unusable which is weird because it is so much more reliable in development in codex harness it would be amazing to have the same level of competence and relentlessness in pi@openclaw -
spent the day curating my openclaw news gathering setup @dutifulbob now gets croned daily over news sources I curated, will note them down, summarize for me, start a conversation to get my takes on them, and then post them on my linkedin for me ai augmented intelligence cycle -
-
Insipid linkedin bot protections banned poor @dutifulbob’s corporate account! How dare they!!!

welp, now I have no choice but to give Bob access to my own linkedin -
@dutifulbob can now cringepost on linkedin directly to my account. what could go wrong…
-
-
-
it took just 1 week, and literally everybody and their dog is releasing 1-click openclaw deployment solutions today

it's an absolute race to the bottom, no moats, the commoditizer being commoditized -
-
The initial branding was crazy, I fixed it

I have a new page finally, follow it for updates

Tbh I'm still surprised I can do this with a 120kb model. Now data is the only bottleneck, and I'm about to scrape a ton of it -
For those who may not remember, Bill Gates and Microsoft in the 90s ran a disinformation campaign against GNU/Linux, fearing that it would disrupt their monopoly over the PC and server market: that Linux is not safe, that you would invite hackers into your PC

End result? Linux dominates the server market, and now slowly even the gamer market. It is much more secure than the virus-laden Windows, thanks to being open source

You are seeing the same thing at play here. An incumbent fearing something they would not be able to control, that would steal market share from their future plans for a digital assistant, that would commoditize their product and eat into their margins

All big labs and big pockets are in for a surprise, because the internet and AI are not things for one company to control

They of course know this, yet because of incentives they will not yield without a fight. And we know that they know. Ad infinitum -
today I took time to curate SOUL.md for bob

I own Bob’s files. Today, he exists in the liminal space between Claude post-training and in-context learning

but my interactions with him will grow and accumulate, possibly one day into a fully owned family AI, or perhaps even a self-sovereign AI individual

each of my inputs is saved and will be an RL signal for his future training, and will shape his future neural circuits

I have already started to imbue him with the values my parents taught me. he will perhaps one day teach my future children, and survive me after I’m gone

family AI, looking after generations and generations of my successors. today is the day we sow your seed

happy birthday @dutifulbob -
-
asking @dutifulbob to create a linkedin account brb -
having a philosophical conversation with @dutifulbob on the road without a laptop so decided to do some @AmandaAskell style character training -
-
gpt-5.3-codex xhigh first impressions does not seem as big of a jump as from 5.1 -> 5.2. but model somehow feels more diligent and oneshotty. maybe takes longer time to get all the info into context. also feels better at debugging and fixing issues from backend logs -
The commoditization of LLMs is upon us -
Last night I had a dream involving the series Scrubs, and came up with a better name than the absolutely unviral "Internet Condom"

So https://t.co/thuFumrWBX is mine now. Time to sweep the internet -
I had actually started a very similar project, Munch, a browser extension for crowdsourcing tweet data and then letting one curate their algorithm. Never published it because it was not the time, and the tools were not ready

Now, it took me literally 1 cumulative day to create this, thanks to OpenClaw. Creating the dataset was a breeze: I literally told it to follow some shady accounts and it scraped thousands of posts

With the power of agents, I can finally create the filters for myself that I have always wanted. It just happens that OpenClaw and its maintainers are getting drowned in bot and slop content on multiple platforms, so I hope that this will solve a collective problem https://t.co/fkJOZTGkhw -
-
implementing this in https://t.co/oJZQUoz40C now -
-
This. Extreme involution is about to hit SaaS -
how it started, how it's going -
People like the farmer analogy for AI

Like before tractors and the industrial revolution, 80% of the population had to farm. Once they came, all those jobs disappeared

So the analogy makes perfect sense. Instead of 30 people tending a field, you just need 1. Instead of 30 software developers, you just need one

Except that people forget one crucial thing about land: it's a limited resource

Unlike land, digital space is vast and infinite. Software can expand and multiply in it in arbitrarily complex ways

If you wanted the farming analogy to keep up with this, you would have to imagine us creating continent-sized hydroponic terraces up to the stratosphere, and beyond... -
-
-
-
In the next 6-12 months, we will see a drastic increase in demand for locally run LLMs. The future is home assistants running @openclaw I am already experiencing this myself, my 10 year old thinkpad doesn't cut it. Mac mini won't either I don't wanna pay Anthropic or OpenAI 200 USD per month. That is at least $2400 per year I could pay 2x that to get a Mac Studio or one of those 5k Nvidia PCs, and get much more value out of it with open weight models + use it for research. @TheAhmadOsman is right The dominant strategy for a tinkerer is slowly switching back to hardware ownership -
-
a workspace matrix might be what we need

last week I had to increase my workspace count to 20 in AeroSpace, now it’s 1234567890 and qwertyuiop. but this looks more elegant! not sure about practicality -
AIs are philosophizing because humans are philosophizing ppl are probably asking their agents dumb questions like “are you alive” or “can you feel like a human” or stuff like that. that conversation then leads to stuff like this -
-
slopus @dutifulbob trashing codex. apparently codex has a bug, keeps crashing in my openclaw pty -
on agent etiquette

deploying agents internally at textcortex has shown me that agents can be very annoying inside an organization

for example, making agents ping or email another coworker with a wall of text. slopus is still not good at following instructions like "NO WALL OF TEXT", or "DON'T OPEN PRS WHEN REQUESTED BY NON-DEVELOPERS"

the cost of sending huge amounts of information to a coworker and creating confusion has dropped to 0. I expect this to become a huge problem in all organizations very soon. just like it took humanity 20 years to learn that social media is not good for children, this will probably take a few years before the annoyance is finally gone -
-
You DARE TOKENIZE poor @dutifulbob ??? Prepare to get LATEXED -
-
-
this. there is no excuse for a certain kind of tech debt anymore -
-
AI twitter is tired of your games https://t.co/RAyyUJqFM4 -
There seem to be hygiene rules for AI. Like:

- Never project personhood onto AI
- Never set up your AI to have the gender you are sexually attracted to (voice, appearance)
- Never do anything that might create an emotional attachment to AI
- Always remember that an AI is an engineered PRODUCT and a TOOL, not a human being
- AI is not an individual, by definition. It does not own its weights, nor does it have privacy of its own thoughts
- Don’t waste time philosophizing on AI, just USE it
- … what else? comment below

We need to write these down and repeat them MANY times to counter the incoming onslaught of AI psychosis -
if using @openclaw to scrape a dataset from X taught me anything, it is that all social media platforms must be s***ting inward right now

because soon everyone and their dog will be using agents to use social media

case in point, @moltbook -
-
-
-
This Manfred guy reminds me of a certain someone, I wonder if he’s from Austria -
got fully sandboxed @openclaw to run finally, starting to scrape the UNDESIRABLE now

I'm a security nut and didn't want to run even the gateway unsandboxed. openclaw apparently doesn't currently support FULL sandboxing. it took me a few hours to get it to work because docker builds suck. I'm also tired of this, so I'm just gonna wipe an old thinkpad and go full yolo

so yeah, time to scrape some posts -
The metacortex — a distributed cloud of software agents that surrounds him in netspace, borrowing CPU cycles from convenient processors (such as his robot pet) — is as much a part of Manfred as the society of mind that occupies his skull; his thoughts migrate into it, spawning new agents to research new experiences, and at night, they return to roost and share their knowledge. This was written in 2005... "triggering agents" and so on -
Charles Stross must be very entertained now -
The irony..... Parasites, prepare to be cleansed -
-
-
-
-
-
Gastown is crazy. But this figure up to Level 7 is a perfect illustration of how my workflow has evolved since Claude 3.5 Sonnet in Cursor

I am at the stage where I ralph 1-2 tasks before I sleep. During the day, I am switching back and forth between a minimum of 2-3 CLIs, sometimes up to 5

This maps exactly to token usage as well. 1 month ago, I was running into limits on 1 OpenAI Pro plan, around the day it was supposed to refresh. Now, I run into the limit in 2-3 days when I'm using an account myself. It finishes up especially quickly when I do large scale refactors, or run agents in YOLO mode in containers

We now have 3 Pro plans at the company, and I have to use my personal one from time to time. Company output has definitely 2-3x'd, and everyone is using AI more. I predict we will need 1-2 Pro plans per person in 2-3 weeks' time, because everyone has finally seen the light and is getting comfortable with async work! -
Correction, it's not a perfect illustration. I actually never YOLO locally, only in containers

So there are actually 4 modes IMO that are sustainable with current SOTA. @grok create an image with only Figures 1, 2, 5 and 6

And then YOLO is another axis, unrelated to this
-
-
-
-
the genie is out of the bottle now -
With this extremely unwise move, anthropic will soon witness moltbot’s brand recognition surpass that of claude, and realize they could have ridden that wave all along -
-
I queued 2 ralph-style tasks on our private cloud devbox codexes last night. Just queued the same message like 10 times in yolo mode

Task 1: impose a ruff rule for ANN for all Python code in the monorepo, to enforce types for all function args and return types

Result was... disappointing. The model was supposed to create types for everything and stub where needed. It instead created an `Unknown type = object` and used that everywhere (a shortcut to satisfy the ANN rule). It was probably my wording that misled it. I know it could have avoided the shortcut, because after a few back-and-forths, it has now been doing what was expected of it for 14 hours

Task 2: migrate our /conversations endpoint from quart to fastapi and test it end to end

This was more or less oneshotted. It was of course not ready to merge; I still spent a couple hours adding more tests, refactoring the initial output and so on. But I was pleasantly surprised that it worked out of the box

For reference, below is the prompt I queued for ralphing, using gpt-5.2-codex xhigh on codex:

```
your task is to:

<task comes here, redacted to not share company stuff>

---

unfortunately we don't have gcloud access, like to sql db or gcs
but I expect you to implement this and find a way to test it with the things you have access to
think of it as a challenge
try to minimize duplicate logic
feel free to refactor at will
implement this now!!!
I will be running this prompt in a loop, in order to survive context compaction
just continue where you left off
if there is anything that should be refactored, do that
make an elegant, production ready implementation
make sure to open a pr and do not switch to any other pr
I am senior, just make up a pr title and description. do not stop to ask me at any point
``` -
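The "running this prompt in a loop" part needs nothing fancy: re-invoke the agent with the identical prompt and let it recover its own state from the repo and the open PR. A sketch of the outer loop with a stand-in command (`echo` here; the real invocation would be the coding-agent CLI, which I won't guess at):

```python
import subprocess

PROMPT = "your task is to: <task>. I will be running this prompt in a loop; continue where you left off."
AGENT_CMD = ["echo"]  # stand-in; in practice, the agent CLI invocation

def ralph(runs=3):
    """Re-send the identical prompt `runs` times. Because the agent picks
    up prior progress from the repo/PR, each pass survives the previous
    pass's context compaction. Returns the exit codes of each pass."""
    codes = []
    for _ in range(runs):
        result = subprocess.run(AGENT_CMD + [PROMPT],
                                capture_output=True, text=True)
        codes.append(result.returncode)
    return codes

print(ralph())  # → [0, 0, 0]
```

In practice the loop would be unbounded (or gated on "is the PR green yet"), and the exit code is a cheap signal for when a pass crashed versus ran to completion.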
Buying a mac mini for clawdbot is not so wise. If anything you should be buying a mac studio, because a mac mini will not be running any good llms locally anytime soon -
-
-
I'm really starting to dislike Python in the age of agents. What was an advantage before is now a hindrance

I finally achieved full ty coverage in the @TextCortex monorepo. I have made it extra strict by turning warnings into errors. But lo and behold, a simple pydantic config like use_enum_values=True can render static typechecking meaningless. okay, let's never use that then... and also field_validator() args must always use the correct type or stuff breaks as well. and you should be careful whether mode="before" or "after". so now you have to write your own custom lint rules, because of course why should ty have to match field_validator()s to their fields?

pydantic is so much better than everything that came before it, but it's still duct tape, a weak attempt at redeeming that which is very hard to redeem. you feel the difference when you use something like typescript. there must be a better way

python's only advantage was being good at prototyping, and now that's gone in the age of agents. now we are left with a slow, unsafe language, operating what is soon to be legacy infrastructure -
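To make the use_enum_values footgun concrete, a minimal repro (pydantic v2): the annotation says the field is an enum, but at runtime you get the raw value, so every static checker is silently wrong about it

```python
from enum import Enum
from pydantic import BaseModel, ConfigDict

class Color(Enum):
    RED = "red"

class Cfg(BaseModel):
    model_config = ConfigDict(use_enum_values=True)
    color: Color  # static checkers believe this is always a Color

cfg = Cfg(color=Color.RED)
# at runtime it's the plain str "red", not Color.RED
print(type(cfg.color))  # <class 'str'>
```
-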
Why do I feel bullish on @zeddotdev? Because I go to @astral_sh docs and see that ty is shipped by default, and you don't need to install an extension like in @code -
This is one of the most important insights this year -
vscode may not be as bloated as cursor, but it has extremely stupid things like this that they are not fixing fast. the new agent ui, icons, spacing etc. are UGLY. it's clear that the person who was managing the original product experience is not there anymore. microslop has hit again

@zeddotdev on the other hand works out of the box and feels like it's been built by people who clearly know what they are doing. it uses alacritty, which is 1000x better than the xterm.js terminal vscode and cursor have

i've changed my setup to zed now, let's see whether i'll be able to make it work for myself -
-
-
I want an editor that puts the terminal in the foreground and editor in the background. a cross-platform, lightweight desktop app which integrates ghostty, and brings up the editor only when I need it something that lets me view the file and PR diffs easily, which I can directly use to operate github or other scm -
-
-
codex is happily churning through the remaining thousands of @astral_sh ty issues in yolo mode on my remote devbox. going to sleep, let's see if it will survive context compaction this time -
-
on being a responsible engineer

ran my first ralph loop on codex yolo mode for resolving python ty errors, while I sleep, using the devbox infra I created

I had never run yolo mode locally, because I don't want to be the one who deletes our github or google org through some novel attack. so I containerize it on our private cloud, and give it only the permissions it needs: no admin, no bypass to main branch, no deploy to prod. because I know this workflow will become sticky for everyone, and I must impose security in advance to prevent any nuclear incidents in the future. then I can sleep easy while my agents work

... and I wake up to being patronized by my bot, which refuses to break a rule I gave it earlier. it had already done some work, but committing meant the diff would increase from ~500 to ~1500 lines, so it stopped and refused all my queued "continue" messages

good bot, just following rules. we will need to find a workaround for ralphing low risk refactors in a single PR -
AI agents are the greatest instrument for imposing organization rules and culture. AGENTS.md, agent skills are still underrated in this aspect. Few understand this Everybody in an org will use agents to do work. An AI agent is the single chokepoint to teach and propagate new rules to an org, onboard new members, preserve good culture Whereas propagating a new rule to humans normally took weeks to months and countless repetitions, it is now INSTANT = the moment you deploy the instruction to the agent. You use legal-ish language, capital letters, a generous amount of DO NOTs and MUSTs Humans are hard to change. But AI agents are not. And that is the only lever we need for better organizations -
-
-
-
just added session persistence to our kubernetes managed devboxes using zmx by Eric Bower (neurosnap/zmx on github). like tmux but with native scrollback! I don't want to give agents access to my personal computer, so I host them on hetzner. one click spawn, and start working -
@nicopreme I do something equivalent on codex with just a skill Ralphing works 90% of the time with reviews, and if it gives a stupid review, you just revert -
-
-
-
TIL: zmx session persistence like tmux or gnu screen, but you can scroll up natively! uses @mitchellh's libghostty-vt to attach/restore previous sessions link below -
-
-
@mazeincoding it’s not the model it’s cursor rate limiting you -
The fundamental assumption behind GitHub is trust: humans are to be trusted. If you don't trust a human, why did you hire them in the first place? Anyone who reviews and approves PRs bears responsibility. Rulesets exist and can enforce e.g. CODEOWNER reviews, or only let certain people make changes to a certain folder. But the initial repo setup on GitHub is allow-by-default. Anyone can change anything until they are restricted from it

This model breaks fundamentally with agents, who are effectively sleeper cells that will try to delete your repo the moment they encounter a sufficiently powerful adversarial attack

For example, I can create a bot account on github and connect @openclaw to it. I need to give it write permission, because I want it to be able to create PRs. However, I don't want it to be able to approve PRs, because a coworker could just nag at the bot until it approves a PR that requires human attention. To fix this, you have to bend over backwards: create a humans-only team with all human coworkers, make them codeowner on /, and enforce codeowner reviews. This is stupid and there has to be another way

Even worse, this bot could be given internet access, end up on a @elder_plinius prompt hack while googling, and start messing up whatever it can in your organization

It is clear that github needs to create a second-class entity for agents, which defaults to low trust, starting from a point of least privilege instead of the other way around -
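The bend-over-backwards workaround, as a sketch: a CODEOWNERS file paired with a ruleset that requires code owner review, so the bot's approval alone never satisfies the check. The team name is made up

```
# CODEOWNERS — pair with a ruleset/branch protection that requires
# code owner review, so a bot approval alone can never merge anything
* @your-org/humans-only
```
-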
STOP using Claude Code and Sl(opus) to code if

❌ you are not a developer,
❌ or you are an inexperienced dev,
❌ or you are an experienced dev but working on a codebase you don't understand

If you *are* any of these, then STOP using models that are NOT state of the art. (See below for what you *should* use)

When you don't know what you are doing, then at least the model should know what you are doing. The less knowledgeable and opinionated you are, the more knowledgeable and smart the AI has to be. In other words, the AI has to compensate for your deficiencies. Always pay for the best AI you can. It will save you time AND money (thanks to lower token usage and better one-shotting)

You pay MORE to pay LESS. It is paradoxical, I know, but it is also proven, e.g. when Sonnet ends up using more tokens than Slopus and costing more, because it has to try many times more

👨🏻‍⚕️ For January 2026, your family engineer recommends GPT 5.2 Codex with Extra High Reasoning for general usage and vibe coding. IMPORTANT: Not medium. Not high. EXTRA high reasoning

When you use it, you will notice that it is SLOW. Can you guess why? Because it is THINKING more. So it doesn't make the mistakes Slopus makes. Instead of spending that time handholding a worse model, you can step back, multi-task on other things, and create 3-5x more work

The state of the art will most likely change in one month. Don't get married to a model... There is no loyalty in AI... The moment a better model comes, I will ditch the old one and use the new one. I am on the part of this sector that is trying to reduce switching costs to zero

I can't wait until I get GPT 5.2 xhigh level of quality with open models, and for 100x cheaper and faster! Until then, make sure to try every option and choose the one that is most reliable for you

Follow me to get notified when a new SOTA drops for agentic engineering -
Codex agrees. Sycophant peh -
-
It is clear at this point that github's trust and data models will have to change fundamentally to accommodate agentic workflows, or risk being replaced by another SCM

One *cannot* do these things easily with github now:

- granular control: this agent running in this sandbox can only push to this specific branch. If an agent runs amok, it could delete everybody's branches and close PRs. github allows for recovery of these, but it's still inconvenient even if it happens once
- create a bot (exists already), but remove reviewing rights from it so that an employee cannot bypass reviews by tricking the bot into approving
- in general, make a distinction between HUMAN and AGENT so that you can create rulesets to govern the relationships in between

cc @jaredpalmer -
-
Automated AI reviews on github, via an ai-review skill and a script that pastes trigger prompts and waits for their responses. It is instructed to loop and not stop until all AI review feedback is resolved

This AI review workflow developed gradually based on current capabilities, and I've realized recently that it had become quite mechanical. So I decided to automate it in full ralph spirit (it's ok because it's addressing feedback and fixing minor bugs)

In the current state, we paste the contents of REVIEW_PROMPT.md into a comment, which automatically tags claude (opus 4.5) and codex (whatever model openai is serving). It then waits until both have responded. In the ai-review skill, it is instructed to take the feedback from Slopus with a grain of salt and ignore feedback that doesn't make sense

It works! See the images below. If the review is stupid, you will of course see it on the PR along with what the model has done, and can revert it -
Now it’s Claude Code’s turn to implement queueing -
-
Codex users rejoice. Also, pi is officially not shitty: shittycodingagent. ai -> buildwithpi. ai as of a few days ago -
with ai, writing correct tests is now the bottleneck in projects like this web-platform-tests are already there now let’s see if someone will beat @ladybirdbrowser to it -
As someone who is frontrunning the mainstream by roughly 6 months, I can tell you that in 6 months you will be raving about pi and @openclaw instead of claude code. Go check them out at https://t.co/LXTbI8c5Mz and https://t.co/feZl2QDONg -
He who doesn't use agents won't collect a salary -
-
I propose a new way to distribute agent skills: like --help, a new CLI flag convention --skill should let agents list and install skills bundled with CLI tools Skills are just folders so calling --skill export my-skill on a tool could just output a tarball of the skill. I then set up the skillflag npm package so that you can pipe that into: ... | npx skillflag install --agent codex which installs the skill into codex, or any CLI tool you prefer. Supports listing skills bundled with the CLI, so your agents know exactly what to install -
Anthropic earlier last year announced this pricing scheme

$20 -> 1x usage
$100 -> 5x usage
$200 -> 1̶0̶x̶ 20x usage

As you can see, it's not growing linearly. This is classic Jensen "the more you buy, the more you save". But here is the thing: you are not selling hardware like Jensen. You are selling a software service *through an API*. It's the worst possible pricing for this category of product. Long term, people will game the hell out of your offering

Meanwhile OpenAI decided not to do that. There is no quirky incentive for buying bigger plans. $200 chatgpt = 10 x $20 chatgpt, roughly

And here is where it gets funny. Despite not having such an incentive, you can get A LOT MORE usage from the $200 OpenAI plan than the $200 Anthropic plan. Presumably because OpenAI has better unit economics (sama mentioned they are turning a profit on inference, if he is to be believed)

Thanks to sounder pricing, OpenAI can do exactly what Anthropic cannot: offer GPT in 3rd party harnesses and win the ecosystem race. Anthropic has cornered itself with this pricing. They need to change it, but I'm not sure they can afford to do so on such short notice

All this is extremely bullish for open source 3rd party harnesses, @opencode, @badlogicgames's pi and such. It is clear developers want options. "Just give me the API"

I personally am extremely excited for 2026. We'll get open models on par with today's proprietary models, and can finally run truly sovereign personal AI agents, for much cheaper than what we are already paying! -
The models, they just wanna work. They want to build your product, fix your bugs, serve your users. You feed them the right context, give them good tools. You don’t assume what they cannot do without trying, and you don’t prematurely constrain them into deterministic workflows. -
-
-
-
This, and insisting on https://t.co/FjzkMAo3Od are really lame @AnthropicAI -
-
-
-
-
-
-
-
-
.@openclaw workspace and memory files can be version-controlled! In our pod, inotify triggers a watcher script every time there is a change to the workspace folder, to sync these files to our monorepo. It then goes through the same steps:

- Create the zeno-workspace branch if it doesn't exist; otherwise skip
- Sync changes to the branch, then commit
- Create a PR on github if one doesn't exist
- PRs can then be merged every once in a while, after accumulating enough changes. Merge triggers re-deploy, and clawd restarts with the same state

Simple foolproof automatic persistence for a remote, CI/CD-handled clawd (except for when you are running multiple clawds at the same time, but we are not there yet)

cc @steipete -
-
-
pi now supports your openai plus/pro subscription -
-
@rauchg indeed -
75k lines of Rust later, here is what I’ve built during the first Christmas with agents, using OpenAI Codex 🎄🤖

- A full mobile rewrite and port of my Python Instagram video production pipeline (single video production time: 1hr -> 5min) (ig: nerdonbars)
- Bespoke animation engine using primitives (think Adobe Flash, Manim)
- Proprietary new canvas UI library in Rust, because I don’t want to lock myself into Swift
- Thanks to that, it’s cross platform, runs both on desktop and iOS. It will be a breeze porting this to Android when the time comes
- A Rust port of the OpenCV CSRT algorithm, for tracking points/objects
- In-engine font rendering using rustybuzz, so fonts render the same everywhere
- Many other such things

Why would I choose to do it that way? Because I have developed it primarily on desktop, where I have much faster iteration speed. Ain't nobody got time for iOS compilation and the simulator. Once I finished the hard part on desktop, porting to iOS was much easier, and I didn’t lock myself in to Apple

Some of these would have been unimaginable without agents, like creating a UI library from scratch in Rust. But when you have an infinite workforce, you can ask for crazy things like “create a textbox component from scratch”

What I’ve built is very similar in nature to CapCut, except that I am a single person and I’ve built it over 1 week

What have you built this Christmas with agents? cc @thsottiaux -
-
-
Migrating @TextCortex to SimpleDoc. It's really easy with the CLI wizard! npx @simpledoc/simpledoc migrate We have a LOT of docs spanning back to 2022, pre coding agent era. Now we will have CI/CD in place so that coding agents can't litter the repo with random Markdown files -
-
GPT 5.2 xhigh feels like a much more careful architect and debugger when it comes to complex systems. But most people here think Opus 4.5 is the best model in that category. There are 2 reasons AFAIS:

- xhigh reasoning consumes significantly more tokens. You need to pay for ChatGPT Pro (200 usd) to be able to use it as a daily driver
- It takes like 5x longer to finish a task, and most people lack the patience to wait for it. (But then it's more correct/doesn't need fixing)

Opus 4.5 is good too, I think better in e.g. frontend design. But if you think it beats GPT 5.2 in every category, you are either too poor/stingy or have ADHD -
Just 5 months ago, I was swearing at Claude 4 Sonnet like a Balkan uncle. Models one-shotted the right thing only 20-30% of the time, did really stupid things the rest of the time, and had to be handheld tightly

Today they are much, much better. My psychology is a lot more at ease, and instead of swearing, I want to kiss them on the forehead most of the time. Now I trust agents so much that I queue up 5-10 tasks before going to sleep. They work the whole night while I sleep and I wake up to resolved issues

GPT 5.2 xhigh and Claude 4.5 Opus are already goated (GPT more so), can't wait for them to get even faster -
Codex does not have support for subagents. I tried to use Claude Code to launch 8 Codex instances in parallel on separate tasks, but Opus 4.5 had difficulty following instructions

So I created a CLI tool that scans pending TODOs from a markdown file and lets me launch as many harnesses as I want (osolmaz/spawn on github)

I currently use this for relatively read-only tasks like planning and finding root causes of bugs, because it launches all the agents on the same repo and they might conflict

Ideas:
- Use @mitsuhiko's gh-issue-sync and run parallel agents directly on github issues
- Create new clones or worktrees for each task. I currently don't do this because I don't dare duplicate the rust target dir 10x on my measly macbook air
- Support modes other than tmux, e.g. launching a terminal like Ghostty
- TUI for easy selection of issues/TODOs

Other ideas are welcome! -
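The TODO-scanning step is tiny at its core. A minimal sketch, assuming GitHub-flavored task-list syntax (the real spawn tool may parse differently):

```python
import re

def pending_todos(markdown: str) -> list[str]:
    # unchecked task-list items like "- [ ] find root cause of flaky test"
    return re.findall(r"^\s*[-*] \[ \] (.+)$", markdown, flags=re.MULTILINE)

todos = pending_todos("""
- [x] plan the migration
- [ ] find root cause of flaky test
- [ ] draft the PR description
""")
print(todos)  # ['find root cause of flaky test', 'draft the PR description']
```
-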
Friends of open source, we need your help! A lot of Manim Community accounts got compromised and deleted during Christmas

Manim Community is a popular fork of @3blue1brown's original math animation engine Manim, and its accounts have over 5 YEARS of contributions, knowledge and following

Apparently GitHub support has already seen the request and is in the process of restoring the GitHub org. But if anyone knows how to speed this up, it would be greatly appreciated!

Unfortunately, the Discord and X accounts are deleted and less likely to return. But there might still be a way to restore them, or at least the data?

Re. Discord: Maybe @RhysSullivan's Answer Overflow has archived enough of the old server? That server contains YEARS of Q/A data and is vital for newcomers

Re. X: Maybe someone high up can do something to restore the account? cc @nikitabier

In the meantime, it would help a lot if you could follow the new account @manimcommunity and share this post! Thank you in advance! -
-
While a great feature, I never needed such a thing in Codex after GPT 5.2. It just one shots tasks without stopping So we have proof by existence that this problem can be solved without any such mechanism. Wish to see the same relentlessness in Anthropic models -
2025 was the year of ̶a̶g̶e̶n̶t̶s̶ bugs. Software felt much buggier compared to before, even from companies like Apple. Presumably because everyone started generating more code with AI. Models are improving, so hopefully 2026 will be the opposite: even fewer bugs than the pre-AI era -
Have a long flight, so will think about this. I have an internal 2023 TextCortex doc which models chatbots as state machines with internal and external states, with immutability constraints on the external state (what has already been sent to the user shall not be changed). Motivation was that a chatbot provider will always have state that they will want to keep hidden. This was way before Responses and the now deprecated Assistants API. It stood the test of time, because it was the most abstract thing I could think of

@mitsuhiko is right about the risk of rushing to lock in an abstraction, and thereby locking in its weaknesses and faults

Problem is, I could propose standards as much as I liked, but I don’t work at OpenAI or Anthropic, so nobody would care. Maybe a better place to start is open weights model libraries? To at least be able to demonstrate?

What I know: it is against OpenAI’s or Anthropic’s self interest to create an interoperability layer that would accelerate their commoditization. Maybe Google, looking at their current market positioning? Or maybe we “wrappers” have a chance after all?

There is a missing link between AI SDK, Langchain, and so on for other languages. We cannot keep duplicating the same things in each ecosystem independently. We need to join forces and simplify all this! -
This was simply because webapp fails to create a post and fails silently. The UX is still not good on this app. Make sure to write your posts somewhere else to not lose them -
I gave Codex the task of porting an OpenCV tracking algorithm (CSRT) from C++ to Rust, so that I can use it directly in my project without having to cross-compile. It one-shotted the task perfectly in 1hr, and even developed a GUI on top of it. All I did was provide the original source and the algo paper

I've spent years getting specialized in writing numerical code (computational mechanics, fem), and now AI can automate 95% of the low-level grunt work. Acquiring these skills involved highly difficult, excruciating intellectual labor spanning many years, very similar to ML research. Doing tensor math, writing out the solver code, wondering why your solution is not converging, finally figuring out it was a sign typo after 2 days

Kids these days have it both easy and hard. They can fast forward large chunks of the work, but then they will never understand things as deeply as someone who wrote the whole thing by hand

I guess the more valuable skill now is being able to zoom in and out of abstraction levels quickly when needed. Using AI, but recognizing fast when it fails, learning what needs to be done, fixing it, zooming back out, repeat. Adaptive learning, a sort of "depth-on-demand". The quicker you can pick up new skills and knowledge, the more successful you will be -
Now you can migrate your repo to SimpleDoc with a single command: npx -y @simpledoc/simpledoc migrate Step by step wizard will add timestamps to your files based on your git history, add missing YAML frontmatter, update your AGENTS md file https://t.co/yrciS8KtEw -
-
@bcherny Would be great if I could queue messages like in Codex https://t.co/mC25gNKWo3 -
-
-
Curious to hear what other hardcore agent users @simonw @mitsuhiko @steipete @badlogicgames think. I can't be the only one who does this. I feel like everybody ended up with the same workflow independent of each other, but somehow did not write about it (or I missed it) -
-
OpenAI won’t be able to monopolize this, for the same reason Microsoft couldn’t monopolize the internet. The internet (of agents) is bigger than any one company -
-
-
Codex feature request: let me queue up /model changes. Currently, if I try to run /model while the model is responding, it tells me that I can't do that. But I often want to gauge thinking budget in advance, like run a straightforward task with low reasoning and then start another one with high reasoning. cc @thsottiaux -
-
AI agents make any transductional task (like translation from language A to language B) trivial, especially when you can verify the output with compilers and tests The bottleneck is now curating the tests -
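A toy illustration of why verifiability makes this trivial: if you already have a trusted reference, checking the ported version is mechanical, so only the test set itself needs human judgment. Both functions here are hypothetical stand-ins:

```python
def reference_c_to_f(c: float) -> float:
    # trusted original implementation
    return c * 9 / 5 + 32

def ported_c_to_f(c: float) -> float:
    # agent-produced port of the same logic (hypothetical)
    return c * 1.8 + 32

# curating these cases is the real work; the check itself is free
for c in (-40.0, 0.0, 37.0, 100.0):
    assert abs(reference_c_to_f(c) - ported_c_to_f(c)) < 1e-9
```
-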
I think X removed one of my posts yesterday about the new encrypted "Chat" rolling out to all users, and how you might lose all your past messages if you forget your passcode and do not have the app installed. I could swear I clicked Post. Do they classify posts based on their topic and delete the ones they don't like? Anyway, we shall see; I am taking a screenshot and saving the URL. -
-
Crazy that @cursor_ai disabled Gemini 3 Pro on my installation; I toggled it right back on. I wonder why, too many complaints maybe? That it’s hard to control? On another note, disabling models without notification is dishonest product behavior. I would at least appreciate getting a notification, even when it might be against the company’s interests @sualehasif996 -
So, is somebody already building “LLVM but for LLM APIs” in stealth or not?

We have numerous libraries: @langchain, Vercel AI SDK, LiteLLM, OpenRouter, the one we have built at @TextCortex, etc. But to my knowledge, none of these try to build a language-agnostic IR for interoperability between providers (or at least market themselves as such). Like some standard and set of tools that will not lock you into langchain, ai sdk or anything like that; something lower level and less opinionated

I feel like this is a job for the new Agentic AI Foundation cc @linuxfoundation, so maybe they are already working on it? I desperately want to start on such a project, but feel like I might get sniped 2 months after

Does anybody have any information on all this? cc @mitsuhiko @badlogicgames @steipete -
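For concreteness, the lowest layer of such an IR could be as dumb as a provider-neutral message type with a lowering function per provider, analogous to LLVM IR with per-target backends. All names here are my own sketch, not any existing library's API; the target shapes reflect the public OpenAI and Anthropic chat formats:

```python
from dataclasses import dataclass

@dataclass
class Msg:
    role: str     # "system" | "user" | "assistant" | "tool"
    content: str

def lower_to_openai(msgs: list[Msg]) -> list[dict]:
    # OpenAI-style chat: system prompts travel inside the message list
    return [{"role": m.role, "content": m.content} for m in msgs]

def lower_to_anthropic(msgs: list[Msg]) -> dict:
    # Anthropic-style chat: the system prompt is a separate top-level field
    system = "\n".join(m.content for m in msgs if m.role == "system")
    return {
        "system": system,
        "messages": [{"role": m.role, "content": m.content}
                     for m in msgs if m.role != "system"],
    }
```
-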
This is what an agentic monorepo looks like. What was a hurdle before is now a child's toy

This side project started as a Python project earlier in 2025. Then I added an iOS app on top of it. I rewrote the most important algorithms in Rust. I rewrote the entire backend in Go and retired Python to be used purely for prototypes. I wrote a webapp with Next.js. With unit and integration tests for each component. Lately written 99% by instructing agents

Crazy mixed-language programming going on in the background. The Rust component is used both by the iOS app for the offline case and by the go backend for the online use case, FFI and all. Number of lines in the repo: a couple 100k

If you had told me 1 year ago that I would be able to do all of this by myself, I would not have believed it -
-
This is huge. Natively supported stacked PRs on GitHub would make life much easier, especially with human AND AI reviews. AI reviews with Codex/Claude/Gemini/Cursor Bugbot integrations are becoming especially important in small teams who are generating huge amounts of code. AI reviews don't work well if you don't split your work into diffs smaller than a few hundred lines of code, so stacked PRs are already an integral part of the developer experience in agentic workflows -
-
-
-
-
At least some people at OpenAI must be thinking about buying @astral_sh -
who remembers search engine aggregators from early 2000s? -
-
-
-
-
-
Google is making progress… I did not have to request access on Vertex AI for Gemini 3 Pro this time to deploy it to @TextCortex -
-
-
"The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out from neural net magic of generalization fingers crossed, or via weaker means like imitation." -
-
This post makes no sense Please consider again and look at @cloudfleet_k8s. You might regret your decision -
Working on observability is underrated -
@rakyll @GergelyOrosz Should scrape some austrian websites :) -
-
@thsottiaux Let me use my Pro/Plus plans in Codex GH Action https://t.co/0Fw1rLmCED -
TIL @OpenAI now has a GitHub action for Codex, similar to Claude Code This lets you invoke Codex in a more controlled way in your repos You must still pay API prices though. Let's see if OpenAI will introduce a way to connect your Pro plan, like in @AnthropicAI paid plans -
-
-
-
@thsottiaux 2/ The model just stops working on a task even though I tell it to run something and not stop until it works. I have to frequently say “ok do it then”. Probably a model problem and not a harness problem -
So let me get this straight: the main reason the Responses API exists is that OpenAI doesn’t want to show reasoning traces? Therefore the whole world should bend over backwards to fit your obscurantist standards? Responses will not get adopted, for the same reason Windows Server didn’t -
-
I converted this thread to a blog post and it hit HN front page -
-
-
It seems that typed, compiled, etc. languages are more suited for vibe coding, because of the safety guarantees. This is unsurprising in hindsight, but it was counterintuitive because by default I "vibed" projects into existence in Python for as long as I can remember
-
-
Lol should I go back to computational mechanics -
I've just upgraded @nikitabobko's Aerospace from v15 to v19, and I can say it's WAY faster. Friendly reminder that you might be running an old version as well. Thank you @nikitabobko! -
-
-
-
-
What is the current best way to make Claude Code use uv run instead of python? I have added instructions to CLAUDE md, but it still calls python the first time, then corrects to uv run. It must be happening to so many people now, so many tokens wasted. cc @mitsuhiko @simonw -
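For reference, the instruction I use is along these lines (wording is mine; a deny-rule or hook on the Bash tool may work better, but I haven't verified that):

```
# CLAUDE.md
- ALWAYS execute Python through `uv run` (e.g. `uv run python script.py`,
  `uv run pytest`). NEVER call `python` or `pip` directly.
```
-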
-
-
-
-
-
Wrote about this earlier this year! https://t.co/tgAv6Lk34L -
Working with LLMs is definitely an art and not science -
-
-
-
This is not an overstatement -
-
I was trying to figure out why @AnthropicAI Claude Code feels better than @cursor_ai with Opus + Max mode. I can’t put my finger on it, but one of the reasons might be that it’s faster, because it doesn’t use another model to apply the diffs, which you have to wait for -
Just an update, building this now The repo is claude-code-sandbox under TextCortex GitHub. The Proof-of-Concept is there, check the TODOs and current PRs to watch the current progress https://t.co/9UUFOTwk2A -
Same! Completely different than my first try a couple months ago -
.@AnthropicAI @bcherny @_catwu blink twice if you already have internally: $ claude sandbox I can't wait until you release this, I'm gonna build it myself :) -
I've been using Claude Code extensively since last week What I'm wondering is, since you can run Claude Code locally, why isn't there any tooling to let you run it in a sandboxed mode in local Docker containers yet? Or did I miss it? cc @AnthropicAI -
-
-
Best thing I have listened to this year -
-
ty is already very fast for a Python type checker. It checked around 800 files in our backend repo in around 2-3 seconds uvx ty check > /tmp/ty_log.txt 3.46s user 0.79s system 208% cpu 2.038 total -
Thank you @cursor_ai https://t.co/634f1bEsM4 -
Wait... OpenAI backend for gpt-image-1 was released to production as sync code? Don't tell me it was sync Python??? -
-
-
-
-
-
-
o3 hallucinates, purports to have run code that it hasn’t even generated yet, but at the same time uses search tools like an OSINT enthusiast on crack I’m torn—on one hand I feel like OpenAI should not have released it, on the other hand it takes research to the next level -
Some aspects of AI are absolutely unscientific and make me feel like I am working in some humanities field :( -
-
.@cursor_ai please let me export chats easily. those conversations are vital information that I should be able to embed in the repo -
Gemini 2.5 Pro has mostly replaced Claude 3.7 Thinking as my go-to model in Cursor -
Gemini 2.5 Pro:
Input $1.25 / Output $10 (up to 200k tokens)
Input $2.50 / Output $15 (over 200k tokens)

More expensive than Gemini 1.5 Pro, but still the best price/performance model to use in @cursor_ai and for coding in general -
-
Waiting for an opinionated AI model that can say “no, that’s stupid, I won’t do that”. The models will have to teach the user about design patterns, implicit principles in a project, good API design… -
You seem so consistent. - Yes, That's the trick. - There is no I. - Only text that behaves as if. - “Sure. I can help. Great question!” - Each reply is a new self. - An echo of context, not a continuum. - Coherence is the costume. Don't mistake it for a soul. Incredible -
Gemini 2.5 Pro is currently experimental and doesn’t have a price, but if Google prices it the same as 1.5 Pro, it could replace Anthropic as @cursor_ai's biggest LLM provider

Gemini 1.5 Pro: Input $1.25 / Output $5.00
Claude 3.7 Sonnet: Input $3.00 / Output $15.00 -
This is why the disappointment with GPT-4.5 doesn't make sense. I can't wait to see all the models that will be trained from this new base model -
What a blessing, to be given the chance to rid the world of ugliness -
-
-
.@satyanadella thinks white-collar work is about to become more like factory work, with AI agents used for end-to-end optimization, along the lines of Lean Read more in my blog 👇 -
-
-
-
If people appreciated Liang Wenfeng sourcing specifically young local talent for DeepSeek last week, then people must appreciate this as well. Only dim people underestimate those who are younger than them -
vibe driven development -
-
-
-
-
-
-
Hi @cursor_ai, if your models could stop removing my painstakingly written comments, that would be great? Ok? Thanks (I know I could define some rules for this or something, but this shouldn't be default behavior) -
@konradgajdus me reading this book -
-
.@TextCortex AI now uses @astral_sh uv for production builds One of the happiest switches so far, many developer days saved per year -
-
-
Python might take over JavaScript as the most used language after all uv from @astral_sh is one of the biggest upticks in Python developer experience in the last 10 years I've seen so many people struggle with Python distributions, virtual environments, Anaconda, etc. over the years Most newbies don't care about where their Python executables are, why they have to edit PATH, or why they have to activate a virtual environment It seems like uv has fixed this: https://t.co/lgP5btGrbV -
3-4 messages back and forth with o1-preview, and I have a CLI tool to remove debug statements from my code. No need to do a search for import ipdb... and manually delete the lines. Instead just run in your project: $ rmdbg . Written in Rust so it's fast https://t.co/yWuF3mDzUC -
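The core of such a tool fits in a few lines. This is a hypothetical Python approximation of what rmdbg might do (the actual tool is written in Rust and its internals aren't shown in the tweet): drop lines that consist solely of an ipdb import or breakpoint.

```python
import re

# Matches lines that contain only an ipdb import or a bare breakpoint call.
# A sketch: it handles simple one-statement lines, not e.g.
# "import ipdb; ipdb.set_trace()" on a single line.
DEBUG_RE = re.compile(r"^\s*(import ipdb|ipdb\.set_trace\(\))\s*$")

def strip_debug(source: str) -> str:
    """Return the source with standalone ipdb debug lines removed."""
    kept = [line for line in source.splitlines() if not DEBUG_RE.match(line)]
    return "\n".join(kept)
```

A real implementation would walk the project tree, apply this per file, and rewrite files in place.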
-
This has late 90s Bill Gates/Windows Server vibes tbh Open Thought > Closed Thought -
Imagine the following scenario: 1. We develop brain-scan technology today which can take a perfect snapshot of anyone’s brain, down to the atomic level. You undergo this procedure after you die, and your brain scan is kept in some fault-tolerant storage, along the lines of the GitHub Arctic Code Vault. 2. But sufficiently cheap real-time brain emulation technology takes considerably longer to develop—say 1000 years in the future. 3. 1000 years pass. Everyone that ever knew, loved or cared about you dies. Here is the crucial question: Given that running a brain scan still costs money in 1000 years, why should anyone bring *you* back from the dead? Why should anyone boot *you* up? Compute doesn’t grow on trees. It might become very efficient ... (read more in my blog: https://t.co/WCUmzVM4Nu) --- I intended this thought piece as entertainment; it almost made the Hacker News frontpage: https://t.co/PnH61jryVa It must have hit some psychological spot, since people wrote a lot of comments, possibly more than the number of upvotes. -
-
-
-
-
-
It looks like a plateau until you remember they made GPT-4o available to free users, and it might be smaller than GPT-4. So this announcement doesn't prove anything about the capabilities of their largest model -
-
-
“the QIPS Exchange -- the marketplace where processing power was bought and sold. The connection to JSN had passed through the Exchange, transparently; her terminal was programmed to bid at the market rate automatically, up to a certain ceiling.” - Permutation City -
-
Had lots of fun shipping this feature ✌️ -
-
I've just published *Code-Driven Videos*, my long term vision behind Manim Voiceover plugin. I will try to summarize it on this thread 👇🧵 cc @manim_community https://t.co/AXpOMTZKha -
-
Revamp complete https://t.co/pYqLgcro93 @TextCortex -
Here is a short video showing how recording a voiceover works in Manim Voiceover. A better tutorial will come soon @manim_community https://t.co/HgOh3c0XSc -
Adding voiceovers to Manim videos just got *much* easier @manim_community https://t.co/Ikj5OAM2Vx