@onusoz · /2026/06/30 · 04:30 PM

I hate to admit that Opus 4.8 is much, much better at prose and academic writing than GPT-5.5

GPT 4.5 was the last OpenAI model that was good at writing, and it's gone now 😭

@onusoz · /2026/06/29 · 06:07 AM

Another annoying GPT-ism (circa June 2026): while describing something, it always describes what it *does*, but never what it *is*

> "LocalPerf benchmarks local LLM inference servers and keeps the evidence in one portable run artifact."

If I were a philosopher, I would say that "AI lacks ontology". Or that it is "anti-essentialist", that it believes things cannot be things in themselves

But all models literally have an ontology. They have had one since the word2vec days, you can plot it out. It's just an annoying tendency in GPT's writing

So using philosophy to understand AI might be dumb sometimes

If you don't want your READMEs to sound like slop, then you can steal my write-readme skill:

After write-readme: "LocalPerf is a local LLM inference benchmark CLI. It runs benchmark plans against local inference servers and stores the evidence in one portable run artifact."

Skill: github.com/osolmaz/tools/blob/main/agents/skill…


@onusoz · /2026/06/27 · 04:28 AM

Give @LottoLabs a follow if you are not already

He is building localmaxxing.com, crowdsourced LLM benchmark results and performance profiles

Very very cool


@onusoz · /2026/06/27 · 03:22 AM

This. Especially when the whole machine hangs instead of OOMing when my agent accidentally loads too many models into memory (lmk if there is a firmware update or sth that fixes this on the spark now, creating a cgroup doesn’t work) forums.developer.nvidia.com/t/dgx-spark-be…

@ekzhang1 · Jun 26, 2026

shower thought: AI inference engines like vLLM/SGLang might be the hardest “server”-shaped OSS to configure, ever made. Like it’s probably harder to configure them optimally than even Postgres, Nginx, Kubernetes, maybe even Kafka/Zookeeper … just in terms of all the issues, bugs, crashes, scheduler dynamics, kernels etc and it changes every month

@onusoz · /2026/06/26 · 02:48 AM

Apple hiked prices as I was posting this 💀

@onusoz · Jun 25, 2026

The Local Frontier is advancing

The amount of AI memory for inference we can get for less than $3000 has been steadily increasing

The memory crunch has slowed this down and even reversed it. However, once we bounce back from it, the progress will be glorious


@onusoz · /2026/06/26 · 01:37 AM

Was great to be there, thanks for the invite @lionelsimai!

@lionelsimai · Jun 26, 2026

500 plus builders signed up for one night. Here is what that room taught us about agentic AI.

Last night we hosted our first major OpenClaw Singapore Agentic Night, in partnership with the SMU Institute of Innovation and Entrepreneurship and the SMU AI Club. The room was full of people who do not just talk about AI, they build with it.

David, who leads solution architecture at Alibaba Cloud, opened with Qwen 3.7 Plus. Built for agentic work, it can see, code and generalise across harnesses while handling long tasks by holding memory, planning in steps and correcting itself along the way.

@vincent_koc, Chief Architect of OpenClaw, followed with The State of the Claw and showed what the team is building next. Mobile apps now launched, Microsoft Scout, a serverless gateway, agent profiles, agent identity and ShellBench.

@onusoz from Hugging Face showed the power of pairing OpenClaw with local models, proving you can run powerful agents without sending your data to the cloud.

Our fireside chat brought together Vincent Koc, Professor Sun Sun Lim and myself. We went deep on agentic AI, digital upskilling, and trust and collaboration in the age of AI.

To close, Queenie Mengyun Wu from imToken demoed Sigil, a security plugin that makes running agents on the OpenClaw harness far safer.

Three lessons stood out for any enterprise building with this.

The model is not the product, the harness is. The real unlock is the layer around the model that decides what an agent can see, do and touch. You are not buying a model, you are building a system.

Local models are ready for real work. You can run powerful agents without sending sensitive data to the cloud, which changes everything for finance, healthcare and legal.

Safety is built in, not bolted on. The moment you let an agent act, you need a way to guide and contain it. The winners will treat trust and security as part of the build from day one.

Thank you to the SMU Institute of Innovation and Entrepreneurship and the SMU AI Club for partnering with us, to every speaker who gave so generously, and to the 500 plus builders who showed up ready to build.

Singapore is not just adopting agentic AI, it is ready to build it. The talent is here, the curiosity is here, and now the room is here too.


@onusoz · /2026/06/25 · 05:01 PM

How I imagine @TheAhmadOsman

I actually bought my GB10 thanks to him back in Feb, at a discount

Give him a follow if you are not already!

@TheAhmadOsman · Jun 24, 2026

We're eating good boys with a DGX Station tonight


@onusoz · /2026/06/25 · 04:51 PM

The Local Frontier is advancing

The amount of AI memory for inference we can get for less than $3000 has been steadily increasing

The memory crunch has slowed this down and even reversed it. However, once we bounce back from it, the progress will be glorious


@onusoz · /2026/06/25 · 04:21 PM

Was great to talk, thank you @ben_burtenshaw for inviting me!

@huggingface · Jun 25, 2026

Welcome to Open Source AI: Run Your Own Models Locally x.com/i/broadcasts/1…

@onusoz · /2026/06/24 · 04:22 PM

I knew it would find me, sooner or later 🫠

@victormustar · Jun 24, 2026

some genius invented a kebab benchmark for llms

@onusoz · /2026/06/24 · 05:11 AM

Besides being hot, this take is very correct and points to a fundamental tradeoff: whether the storage layer is centralized or distributed

git is distributed, and that makes total sense for code, which takes little space by its nature. it is cheap for everyone to duplicate it locally. this proved to be very useful over e.g. svn, since devs could develop independently of the centralized server

AI artifacts, however, are 1 million times bigger than code. in that case, the bottleneck becomes storage and network. decentralization and version control become lower priority. they can be sacrificed

the tradeoff tilts towards getting the cheapest possible storage and transfer. because you will need a LOT of that

I regret to announce to competitors that Hugging Face has already won this game when they acquired Xet. the tech just works, and the network effects are immense

@ClementDelangue · Apr 2, 2026

Hot take: Git was the wrong abstraction for 90% of ML data. Checkpoints, optimizer states, training logs, agent traces - none of this needs version control. It needs fast, cheap, mutable storage. So we built Buckets. S3-like storage on the @huggingface Hub with Xet dedup and zero egress. Train in a bucket. Publish to a repo. One platform. 🤗🤗🤗


@onusoz · /2026/06/24 · 03:48 AM

Awesome. A feat unimaginable before agents

@ishaan_jaff · Jun 23, 2026

LiteLLM is moving to Rust. 🦀 Sub-1ms overhead. A sub-100MB binary. The same Python SDK and AI gateway you already use. Here's why and what's changing 🧵 docs.litellm.ai/blog/litellm-r…

@onusoz · /2026/06/24 · 03:31 AM

My syncer: github.com/osolmaz/xta…

xTap: github.com/mkubicek/xTap

My blog: solmaz.io

@onusoz · /2026/06/24 · 03:31 AM

My posts here on X now sync automatically to my blog, giving me full ownership of my content and zero-effort SEO

For free, no API costs

My long-form posts are automatically featured and titled on the front page of solmaz [dot] io. Filtering is done by my claw running a sync-x skill daily, which then notifies me on Discord

How do I scrape the posts?

ALL the posts I view (including the ones I post) are saved locally using @kubmi's xTap and then synced to a private repo using my extension to it, xtap-sync

My claw has access to that private repo and can run programmatic tasks like sync-x, summarize what happened that day, notify me about any topics I want


@onusoz · /2026/06/24 · 02:08 AM

Excited!

@ClementDelangue · Jun 23, 2026

Shot a video this morning to announce a new collaboration with...

@onusoz · /2026/06/23 · 04:50 PM

If you are interested in running such demos, look into --demo mode in my local model swiss army knife localpi github.com/osolmaz/loc…

Thank you @googlegemma for the shoutout

@googlegemma · Jun 23, 2026

16 parallel runs of Gemma 4 26B A4B on a single NVIDIA DGX Spark! Pushing 18 tok/s per instance and a 300 tok/s aggregate. It can even hit 32 parallel runs. This level of concurrency highlights how efficient the architecture is.

@onusoz · /2026/06/23 · 03:15 PM

i meant to qt this one x.com/herdrdev/statu…

@herdrdev · Jun 22, 2026

herdr plugin marketplace is live! herdr.dev/plugins/ if you'd like to share your plugins with other people, all you have to do is add "herdr-plugin" to your github repo's topic list, and it will be automatically added to the website!


@onusoz · /2026/06/23 · 03:13 PM

Huge. Added my github TUI github.com/osolmaz/ghz…

@herdrdev · Jun 15, 2026

herdr 0.7.0 is out, and it's a major one: it introduces plugins! the idea is simple: herdr stays lean, and everything custom gets extended through plugins. shareable, scoped, built however you want, to fit your own flow. with this release we're also shipping a few examples of what the plugin system can do. first up: a telegram plugin. herdr already controls your agents and knows their status, so the plugin just hooks into agent events and pings telegram the moment one needs you. notification lands → `herdr --remote` or ssh from your phone → straight back to the agent that needs you.


@onusoz · /2026/06/23 · 02:29 PM

Will talk about my recent adventures running local models on the Spark, Pi and OpenClaw

Click Notify Me on youtube to stay tuned

@ben_burtenshaw · Jun 23, 2026

the livestream will be here: youtube.com/watch?v=wRcByx… and on @huggingface x profile.

@onusoz · /2026/06/23 · 01:34 PM

New blog post: Using local models for agentic zero-shot classification, in real-time, high-frequency triage

If you have 128 GB of memory for models (a DGX Spark like I do, for example), you can create a real-time classifier and notifier for yourself that can classify more than 20 items per minute, using mid-sized @googlegemma and @Alibaba_Qwen models, with aggregate throughput in the 200-300 output tok/s range

Like processing new tweets on twitter, issues/PRs on github, messages on telegram and discord, in real-time

Over the past few weeks, I have built one for myself, to filter and get notified about local model related issues on the OpenClaw repo

I initially thought gemma-4-e4b would give me the best tradeoff

I was wrong. I learned that if one has enough memory already, one should not bother with <10b models like gemma4 e4b or e2b. Precision and recall were much higher zero-shot with gemma-4-26b-a4b, whereas the smaller e4b needed significant prompt optimization and still did not perform nearly as well

To provide more context to the model, I created a restricted bash-like shell, called reposhell. In that shell, it can run read-only commands to ls/find/grep/cat openclaw source code, but only that. When the PR description/diffs are not clear enough to categorize it, the agent reads the code to figure it out

The restriction matters because small models can get prompt injected, and I need to make sure that someone can't harm my setup by creating a malicious issue or PR in the openclaw repo

I found that for specific systems like this, it is very convenient to extend and bundle Pi. You can create agentic CLI tools that work fully locally and for free, and keep that separate from your main pi coding setup. localpager-agent has its own session dir and tools, and I ensure that it will run local models in a secure way by isolating it from my main pi setup

Once localpager-agent categorizes a PR/issue with local_models or related labels, I automatically receive it as a notification on Discord

The whole implementation is fully open source and MIT licensed, alongside the dataset we used to benchmark the performance

I believe zero-shot agentic classification running on local hardware will find many use cases across a wide variety of business applications, like news gathering, open source software development, customer support, content moderation, sales and so on

Agents increase the amount of information produced in a lot of systems, and hence we will need to set up cheap ways to wrangle all that information

In times where governments can cut off access to SOTA models on a whim, it is more important than ever to build your business on open models and if possible, run them on your own hardware!

Big thanks to @evalstate and @ben_burtenshaw for their valuable feedback, especially with helping me evaluate this more rigorously!

One take-away is that categorizing contributions in an open source repo is a *hard* problem, and that it is not trivial to reliably create a golden dataset with LLMs, for evaluation purposes

Read more here: huggingface.co/blog/local-models-pr-triage

@onusoz · /2026/06/23 · 11:26 AM

One sweep over 100 samples takes around 4 hours. Next up: cross reference ground truth with predictions from hf-mem by @alvarobartt github.com/alvarobartt/hf…


@onusoz · /2026/06/23 · 03:01 AM

gpt5.5 and most other models are very bad at one-shotting nice data models

gpt5.5 also has this annoying property that once it decides on a schema (or any design), it's very hard to trigger thinking again. and if you ask it to "rewrite from scratch", it will create something even more ridiculous

To solve this problem, I have built a meta-harness over codex just for simplifying slop data models, called schemator (work in progress)

Basic idea: it mimics what I myself do while I am designing a schema: scrutinize and question each field one by one

It starts a fresh codex session for each field with a fixed prompt like "Try to come up with the most Lindy data model" + a prompt for side notes

It does that with a fresh context for each field, so that they are independent from each other. At the end of a review run over a field, the reviewer can propose to keep, rename or remove the field

When all fields are reviewed once, that makes one iteration. This is then looped until the review results stabilize and no further changes are proposed

I get better results by just asking my agent to "use schemator on this" after it creates a JSON schema or SQL table

Give it a try if you have codex! It has a skill, so it should be easy for an agent to figure out how to use

github.com/osolmaz/schemator


@onusoz · /2026/06/22 · 04:49 PM

gpt 5.5 is not naturally good at modeling and cannot create nice, simplified mathematical models completely autonomously

I did a parameter sweep with gemma-4-31b-a4b on memory usage, output tok/s etc. while varying context window, concurrency and other parameters. It took quite a few tries, and I still do not trust the model that gpt5 fit to the data

besides, it measured linux cgroup memory and not the actual gpu memory used, so the whole sweep is wasted...

output tok/s looks more accurate though, soon I will have a model that can give the optimal parameters over the space of context window <> concurrency <> tok/s <> memory usage

off to do another run

@onusoz · /2026/06/22 · 07:10 AM

For my recent LLM leaderboard osolmaz-leaderboard.hf.space, I sum up all-time total downloads (or likes) across model variants, and then divide that by the age of the model. I.e. "time decay" for popularity

This gives a more time-agnostic metric for the popularity of that model. In an ideal ranking, older models that are not popular anymore should be demoted, like 2 year old Llama 3 models. If you don't do that, they might still occupy top 10 needlessly, despite having been replaced by e.g. qwen in practice

Thanks to that, qwen-3-6b, which came out 1 year ago and has 150m downloads, can surpass llama-3-1-8b, which came out 2 years ago and has 200m downloads

More notes on my post: solmaz.io/popularity-ranking
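A minimal sketch of that time-averaged metric, to make the arithmetic concrete. The function name and the exact release dates are made up for illustration (the post only says "1 year ago" and "2 years ago"); this is not the leaderboard's actual code.

```python
from datetime import date


def downloads_per_day(total_downloads: int, released: date, today: date) -> float:
    """Lifetime downloads divided by age in days (age clamped to at least one day)."""
    age_days = max((today - released).days, 1)
    return total_downloads / age_days


# Hypothetical dates, chosen only to match "1 year ago" and "2 years ago".
today = date(2026, 6, 22)
newer = downloads_per_day(150_000_000, date(2025, 6, 22), today)
older = downloads_per_day(200_000_000, date(2024, 6, 22), today)

print(f"newer model: {newer:,.0f} downloads/day")
print(f"older model: {older:,.0f} downloads/day")
```

With these numbers the newer model averages roughly 411k downloads per day against roughly 274k for the older one, so it ranks higher despite the smaller lifetime total.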


@onusoz · /2026/06/22 · 06:58 AM

if you take the Most Downloaded Models of All Time, Llama 3.1 makes it to Top 10 with around 200 million total downloads (ranking is done w.r.t. time-averaged downloads)

RIP Llama, you walked so @googlegemma and @Alibaba_Qwen can run

Also a reminder that if you build your branding on top of open weight models developed by big corps, you might eventually be the de facto owner of that brand if they pull the plug on it. Like llama.cpp @ggml_org

Huge fumble by Meta


@onusoz · /2026/06/22 · 05:19 AM

My LLM leaderboard osolmaz-leaderboard.hf.space auto-discovers different variants of model releases, even if they are not linked by base_model

From this, I found out that @RedHat_AI was the first to release NVFP4 quantization for qwen3-6-35b-a3b

Nice to see everything in one place


@onusoz · /2026/06/22 · 02:24 AM

"big token will bless me with free tokens today inshallah" is not a healthy mindset to nurture

don't get me wrong, I love the subsidies and the memes

@thsottiaux · Jun 21, 2026

Image hidden

Onur Solmaz · Post · /2026/06/22

Popularity ranking cheatsheet

I recently did some work ranking models on Hugging Face. While doing that, I remembered some concepts I had known years ago from studying recommender systems. But I couldn’t find any personal notes from that time.

So I’m leaving this cheatsheet for my future self, if I ever need it again.

The main idea with popularity metrics is that popularity is proportional to e.g. total likes/views, and inversely proportional to the time it took to accumulate those likes/views. Different platforms have come up with many different ways to calculate this. And while you can model it in a way that maximizes some imaginary objective, what ends up being implemented first is the cheapest/most efficient algorithm.

Below are some examples, generated by GPT 5.5 xhigh.

<slop>

The abstract problem is:

\text{rank items by scarce attention}

A platform has many items and a limited front page. It needs to decide what deserves visibility now. That is usually not the same as “best,” “most useful,” or “most popular all time.”

A clean taxonomy:

\text{popular} = \text{received a lot of attention}

\text{hot} = \text{received a lot of attention recently}

\text{trending} = \text{receiving more attention than expected}

Here $A$ means attention: views, downloads, likes, votes, streams, sales, stars, comments, clicks, etc.

Raw popularity

This is the simplest ranking.

S = A

Use it when you want “biggest ever.”

Examples: most downloaded, most viewed, most sold, most starred.

Problem: old items dominate because they had more time.

Velocity

This measures speed of attention.

S = \frac{A}{t}

where $t$ is age.

Use it when you want “how fast is this spreading?”

A stricter version:

S = \frac{A}{(1+t)^\alpha}

If $\alpha = 1$, this is close to attention per unit time. If $\alpha < 1$, old items are penalized more gently. If $\alpha > 1$, new items are favored aggressively.
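A small sketch of how the exponent changes the ordering. The function name and the two sample items are made up for illustration, and the time unit is arbitrary.

```python
def velocity(attention: float, age: float, alpha: float = 1.0) -> float:
    """Power-law age decay: S = A / (1 + t)^alpha."""
    return attention / (1.0 + age) ** alpha


# Hypothetical items: an old one with lots of attention, a new one with less.
old_item = (1000, 99)  # (attention, age)
new_item = (100, 9)

for alpha in (0.5, 1.0, 2.0):
    old_score = velocity(*old_item, alpha)
    new_score = velocity(*new_item, alpha)
    print(f"alpha={alpha}: old={old_score:.2f} new={new_score:.2f}")
```

With these numbers the two items tie at $\alpha = 1$, the old item wins at $\alpha = 0.5$, and the new item wins at $\alpha = 2$.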

This family is close to what Hacker News describes at a high level: HN says its basic ranking divides points by a power of time since submission, while also applying other factors such as flags, anti-abuse systems, demotions, account/site weighting, and moderator action.¹

Log-scaled velocity

Raw attention often follows a power law: a few items get enormous numbers. So platforms often compress the signal.

S = \frac{\log(1+A)}{(1+t)^\alpha}

This keeps huge items ahead, but prevents them from crushing everything else.

This is usually a better “hotness” formula than plain:

S = \frac{A}{t}

because it rewards scale without making scale the only thing that matters.

Recent-window popularity

Instead of lifetime attention, count only a recent window.

S = A_r

where $A_r$ is recent attention.

Or normalize by window size:

S = \frac{A_r}{w}

where $w$ is the time window.

Examples:

\text{most viewed today}

\text{most streamed this week}

\text{most downloaded in the last 30 days}

Spotify’s daily and weekly charts are this kind of family, though Spotify also says it uses chart-eligible streams and filtering formulas to protect chart integrity; it does not simply expose raw app stream counts as chart counts.²

Momentum

Momentum compares the current period with the previous period.

S = \frac{A_r + 1}{A_p + 1}

where $A_p$ is previous-period attention.

Example:

S = \frac{\text{downloads this week}+1}{\text{downloads last week}+1}

This finds things that are accelerating.

Problem: small items can look extreme. Going from 1 to 20 is a $20\times$ jump, but it may still be tiny in absolute terms.

A safer version mixes ratio and volume:

S = \log(1+A_r)\frac{A_r+1}{A_p+1}
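A minimal sketch comparing the plain ratio with the volume-weighted version. The function names and sample counts are made up for illustration.

```python
import math


def momentum(recent: float, previous: float) -> float:
    """Smoothed period-over-period ratio: (A_r + 1) / (A_p + 1)."""
    return (recent + 1) / (previous + 1)


def volume_weighted_momentum(recent: float, previous: float) -> float:
    """Ratio scaled by log volume: log(1 + A_r) * (A_r + 1) / (A_p + 1)."""
    return math.log1p(recent) * momentum(recent, previous)


# Hypothetical items: a tiny one that jumped 1 -> 20, a big one that went 10,000 -> 40,000.
small = (20, 1)
large = (40_000, 10_000)

print(momentum(*small), momentum(*large))
print(volume_weighted_momentum(*small), volume_weighted_momentum(*large))
```

On the plain ratio the tiny item wins. Once log volume is mixed in, the large item comes out ahead with these numbers.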

Trend detection

Trending is not just “popular.” It usually means “unusually active relative to expectation.”

S = \frac{A_r + 1}{E + 1}

where $E$ is expected attention.

If something normally gets 100 views/day and now gets 10,000, it is trending. If something normally gets 10 million views/day and now gets 10.5 million, it is popular but not necessarily trending.

Another version:

S = A_r - E

The ratio version favors surprise. The difference version favors large absolute surges.

Google Trends is a useful example of normalization: it divides search interest by total searches for the relevant geography and time range, then scales results from 0 to 100, so large regions do not automatically dominate raw volume rankings.³

Hotness

Hotness combines attention and freshness.

A simple hotness score:

S = \log(1+A) - \lambda t

Popularity pushes up. Age pulls down.

Another common form:

S = \frac{\log(1+A)}{(1+t)^\alpha}

This says: “large attention matters, but old attention decays.”

Classic Reddit-style hotness

The old open-source Reddit code had a “hot” formula based on vote balance, logarithmic scaling, and time. In simplified notation:

S = \operatorname{sign}(u-d)\log_{10}(\max(|u-d|,1)) + \frac{T}{45000}

where $u$ is upvotes, $d$ is downvotes, and $T$ is time since a reference epoch. This is specifically the archived open-source Reddit implementation, not a guarantee of current Reddit production ranking.⁴

The important idea: votes matter logarithmically, and time strongly affects ordering. This makes the ranking feel alive.
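A sketch of the simplified formula above, not the archived Reddit source itself. The function name is hypothetical, and the timestamp is taken as seconds since whatever reference epoch is used.

```python
import math


def reddit_hot(ups: int, downs: int, seconds_since_epoch: float) -> float:
    """Simplified classic hot score: sign(u-d) * log10(max(|u-d|, 1)) + T / 45000."""
    balance = ups - downs
    order = math.log10(max(abs(balance), 1))
    sign = (balance > 0) - (balance < 0)  # 1, 0 or -1
    return sign * order + seconds_since_epoch / 45000


# A post with 100 net votes at T=0 scores the same as a post with
# 10 net votes submitted 45,000 seconds (12.5 hours) later.
print(reddit_hot(100, 0, 0))
print(reddit_hot(10, 0, 45_000))
```

In this form, every tenfold increase in net votes is worth the same as being posted 12.5 hours later.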

Time-decayed attention

Instead of using age directly, you can make every attention event fade over time.

S = \sum A_i e^{-\lambda \Delta t_i}

where each attention event $A_i$ contributes less as it gets older.

Plain language: a view today counts more than a view last month.

This is good when you have event-level data.
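A minimal sketch of the decayed sum over event-level data. Expressing $\lambda$ through a half-life is an assumption made here for readability, and the function name and event format are hypothetical.

```python
import math


def decayed_attention(events, now: float, half_life: float) -> float:
    """Sum of event weights, each discounted by exp(-lambda * age).

    events: iterable of (timestamp, weight) pairs.
    half_life: age at which an event counts half as much (same unit as timestamps).
    """
    lam = math.log(2) / half_life
    return sum(weight * math.exp(-lam * (now - ts)) for ts, weight in events)


# Hypothetical data: one view right now, one view exactly one half-life ago.
events = [(100.0, 1.0), (90.0, 1.0)]
score = decayed_attention(events, now=100.0, half_life=10.0)
print(score)
```

The fresh view contributes its full weight and the older one contributes half, so the score is about 1.5.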

A simpler approximate version:

S = A_r + \beta A_p

where $0 < \beta < 1$.

Example:

S = \text{attention this week} + 0.5 \times \text{attention last week}

Steam’s real-time Top Sellers use this general idea in a revenue context: Steam says it rolls up player spending from the trailing 24 hours and gives extra weight to spending in the last 3 hours, across base game purchases, DLC, and in-game transactions.⁵

Quality-adjusted popularity

Sometimes attention alone rewards clickbait. So platforms mix attention with satisfaction.

S = A q

where $q$ is a quality signal.

Examples of $q$:

\text{like rate}

\text{rating}

\text{completion rate}

\text{return rate}

\text{positive vote share}

YouTube Charts disclose this kind of multi-signal logic: they consider view count, how quickly views are growing, where views come from, topic, age, and performance compared with recent uploads from the same channel; YouTube explicitly says the highest-view-count video is not necessarily ranked first.⁶

Confidence-adjusted ranking

This prevents tiny samples from winning too easily.

Bad ranking:

S = q

This lets an item with 2 perfect ratings beat an item with 10,000 very good ratings.

A Bayesian shrinkage version:

S = \frac{n}{n+k}q + \frac{k}{n+k}\bar{q}

where $n$ is sample size, $q$ is the item’s observed quality, $\bar{q}$ is the global average, and $k$ controls how much evidence you need before trusting the item.

Plain language: with little data, pull the score toward the average.
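A small sketch of the shrinkage formula, using the "2 perfect ratings versus 10,000 very good ratings" case. The function name, the 5-point scale, the global mean of 3.5 and $k = 50$ are all made-up values for illustration.

```python
def shrunk_quality(n: int, q: float, global_mean: float, k: float) -> float:
    """Bayesian shrinkage: S = n/(n+k) * q + k/(n+k) * global_mean."""
    return (n / (n + k)) * q + (k / (n + k)) * global_mean


# Hypothetical ratings on a 5-point scale.
few_perfect = shrunk_quality(n=2, q=5.0, global_mean=3.5, k=50)
many_good = shrunk_quality(n=10_000, q=4.6, global_mean=3.5, k=50)

print(f"2 perfect ratings:        {few_perfect:.3f}")
print(f"10,000 very good ratings: {many_good:.3f}")
```

The item with two ratings is pulled almost all the way to the global mean, while the item with 10,000 ratings keeps nearly its observed score and ranks first.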

IMDb is an example of this family in spirit: IMDb says it publishes weighted vote averages rather than raw averages, that not all votes have the same impact, and that it does not disclose the exact method.⁷

Wilson score ranking

For up/down votes, a common confidence-based formula is the Wilson lower bound.

Let:

p = \frac{u}{n}

where $u$ is positive votes and $n$ is total votes.

Then:

S = \frac{ p + \frac{z^2}{2n} - z\sqrt{\frac{p(1-p)}{n}+\frac{z^2}{4n^2}} }{ 1+\frac{z^2}{n} }

This estimates a conservative lower bound for true positive rate.

Use it when you want “best-rated with enough evidence,” not merely “highest average rating.”
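A minimal sketch of the Wilson lower bound. The function name, the default $z = 1.96$ (about 95% confidence) and the choice to return 0 for items with no votes are assumptions made here.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the positive-vote rate."""
    if total == 0:
        return 0.0  # assumption: items with no votes sort last
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


# Hypothetical items: 2 of 2 positive votes versus 9,000 of 10,000.
print(wilson_lower_bound(2, 2))
print(wilson_lower_bound(9_000, 10_000))
```

The item with a perfect record on two votes gets a lower bound of about 0.34, well below the item with a 90% positive rate on 10,000 votes.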

Evan Miller’s “How Not To Sort By Average Rating” popularized this for web rankings, and the archived Reddit code includes a confidence sort using the Wilson method; Stack Overflow also discussed the same family of sorting methods for comments/answers.⁸

Category-normalized popularity

Raw popularity is unfair across categories.

S = \frac{A}{\bar{A}}

where $\bar{A}$ is average attention in that category.

Example: a niche item with 10,000 downloads may be huge in its category, while a general consumer app with 10,000 downloads may be irrelevant.

A velocity version:

S = \frac{A/t}{\bar{A}/\bar{t}}

Use this for “popular relative to peers.”

Spotify’s Local Pulse is a real-world example of relative popularity: Spotify says Local Pulse shows songs uniquely popular in a city relative to their overall popularity.²

Composite ranking

Most mature platforms do not use one pure formula. They combine signals.

S = a\log(1+A) + b\log(1+A_r) + cq - d\log(1+t)

where $a,b,c,d$ are weights.

Then platforms add penalties:

S = S - \text{spam penalty} - \text{abuse penalty} - \text{duplicate penalty}

Product Hunt is explicit that its homepage leaderboard changes based on upvotes, comments, time since submission, and other factors, while withholding exact details to reduce gaming.⁹

Amazon’s book sales ranking is also a composite/decayed-relative system: Amazon says rankings reflect recent and historical activity, recent activity is weighted more heavily, ranks are relative to other books, and rank can change even if the item’s own activity stays constant.¹⁰

Examples out in the wild

| Platform | Ranking type | Disclosed logic |
| --- | --- | --- |
| Reddit Hot, classic open-source version | Hotness | Vote balance, log scaling, and time term.⁴ |
| Hacker News | Hotness / age decay | Points divided by a power of time, plus flags, anti-abuse, demotions, weighting, moderator action.¹ |
| Product Hunt | Launch hotness | Upvotes, comments, time since submission, and undisclosed anti-gaming factors.⁹ |
| YouTube Charts | Trending / hotness | View count, growth speed, traffic source, topic, video age, channel-relative performance, safety filters.⁶ |
| GitHub Trending | Developer attention | The public page exposes total stars/forks and “stars today”; GitHub does not publish a full ranking formula there.¹¹ |
| Google Trends | Normalized search interest | Search interest divided by total searches for that time/place, scaled 0–100.³ |
| Spotify Charts | Stream popularity with filtering | Chart-eligible streams; local charts; Local Pulse is city popularity relative to overall popularity.² |
| Steam Top Sellers | Revenue hotness | Trailing 24h revenue, with extra weight on last 3h; includes DLC and in-game transactions.⁵ |
| Amazon Best Sellers Rank | Decayed relative sales/activity | Recent and historical activity, recent activity weighted more heavily, rank relative to peers.¹⁰ |
| IMDb ratings | Weighted reputation | Weighted vote averages, not raw averages; exact method undisclosed.⁷ |

The design choices

Every attention-ranking system chooses answers to these questions:

| Choice | Meaning |
| --- | --- |
| What counts as attention? | Views, downloads, stars, likes, votes, sales, comments, plays, installs. |
| Is old attention still valuable? | Use lifetime totals if yes; decay if no. |
| Do you care about speed? | Use velocity or recent-window ranking. |
| Do you care about surprise? | Use trending vs expected baseline. |
| Do you care about quality? | Mix in ratings, retention, completion, votes, reviews. |
| Do you need confidence? | Use Bayesian shrinkage or Wilson scoring. |
| Do categories differ? | Normalize within category, geography, language, genre, or cohort. |
| Can it be gamed? | Add anti-spam filters, trust weighting, duplicate penalties, anomaly detection. |
| Is it public or personalized? | Public rankings use global signals; feeds use global signals plus user relevance. |

Practical formula families

For all-time popularity:

S = A

For average popularity over lifetime:

S = \frac{A}{t}

For hotness:

S = \frac{\log(1+A)}{(1+t)^\alpha}

For recent popularity:

S = A_r

For momentum:

S = \frac{A_r+1}{A_p+1}

For trending:

S = \frac{A_r+1}{E+1}

For quality-adjusted popularity:

S = A q

For confidence-adjusted quality:

S = \frac{n}{n+k}q + \frac{k}{n+k}\bar{q}

For relative category popularity:

S = \frac{A}{\bar{A}}

For a practical general-purpose front page:

S = a\log(1+A) + b\log(1+A_r) + cq - d\log(1+t)

Then apply filters and penalties.

Best mental model

There are three core ranking concepts:

\textbf{Popular: } S = A

“Has accumulated a lot of attention.”

\textbf{Hot: } S = \frac{\log(1+A)}{(1+t)^\alpha}

“Has a lot of attention for its age.”

\textbf{Trending: } S = \frac{A_r+1}{E+1}

“Is getting more attention than expected.”

Most real platforms are combinations of these, with normalization, confidence adjustment, and anti-gaming rules layered on top.

</slop>

@onusoz · /2026/06/21 · 03:18 AM

If you are in AI, just don’t be anon here

I see a bunch of anon accounts posting great local model content… what’s the point of being anon? To seem cool?

Most of those accounts are not doing anything illegal, so there is no point. It would add so much more legitimacy to your work if you just put your real face and not a slop or anime girl pfp

it only makes sense for those who are abliterating models. otherwise, it makes you seem sus

just put your real face anon

@onusoz · /2026/06/20 · 05:07 PM

I created an LLM leaderboard based on Hugging Face download and like counts, grouped, filtered and time-averaged.

Top 5 downloads is shared by @Alibaba_Qwen and @googlegemma 👑🤝👑

Top 5 likes, on the other hand, also includes @deepseek_ai V4 Pro 👑

Even @OpenAI makes it to #8 in top downloads with gpt-oss-20b 👑

qwen3-6-35b-a3b is the second most CIRCULATED LLM of this year, with an average of 21 million downloads per month, since the day it was released 2 months ago 📈📈📈

Despite first place belonging to the 8-month-old qwen3-vl-2b-instruct, the highlight belongs to the mid-sized MoE model, which has hit a size/performance sweet spot so hard that it absolutely 💥 SHATTERED 💥 Hugging Face leaderboards in the 2 months since it launched

qwen3-6-35b-a3b is followed closely by its dense sibling 27b, and then the mid-sized gemma 4 models 26b-a4b and 31b

Note that a model's distribution is inversely proportional to its size, but not strictly! Usefulness plays a factor as well, since gemma 4 26b-a4b is being downloaded more than the smaller gemma 4 e4b

I created this leaderboard because Hugging Face's all-time highest downloads and likes did not give me enough information about what is really popular, either today or all-time. I wanted something in between

How do I calculate this ranking?

- Get models with n_downloads >= 100k
- Exclude models older than 1 year
- Deduplicate and group quantizations and variants of the same model based on slug prefix heuristics
- For each group, sum up total downloads of all time
- Sort by descending total_downloads / age = average_downloads_per_day (can also sort w.r.t. likes per month)
- Repeat every day to get the most up to date ranking

More info and source on the leaderboard page, hosted on a Hugging Face space: osolmaz-leaderboard.hf.space

This is a work in progress, please reply below if you see that a model that should be there is missing, or any other mistakes
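The ranking steps listed in this post can be sketched roughly as follows. This is an illustration only, not the leaderboard's source: the `Model` record, the suffix regex standing in for the "slug prefix heuristics", and the choice to age a group by its earliest variant are all assumptions, and the sample data is invented.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Model:
    model_id: str   # e.g. "org/name"
    downloads: int  # all-time downloads
    created: date


# Hypothetical heuristic: strip common quantization/format suffixes from the slug.
VARIANT_SUFFIX = re.compile(r"[-_.](gguf|awq|gptq|nvfp4|fp8|int4|int8|bf16|4bit|8bit)$")


def group_key(model_id: str) -> str:
    slug = model_id.split("/")[-1].lower()
    previous = None
    while slug != previous:  # strip stacked suffixes one at a time
        previous = slug
        slug = VARIANT_SUFFIX.sub("", slug)
    return slug


def rank(models, today: date, min_downloads: int = 100_000, max_age_days: int = 365):
    totals = defaultdict(int)
    first_release = {}
    for m in models:
        age = (today - m.created).days
        if m.downloads < min_downloads or age > max_age_days:
            continue  # filter steps
        key = group_key(m.model_id)
        totals[key] += m.downloads  # sum downloads per group
        first_release[key] = min(first_release.get(key, m.created), m.created)
    scores = {
        key: totals[key] / max((today - first_release[key]).days, 1)
        for key in totals
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


today = date(2026, 6, 20)
models = [  # invented sample data
    Model("org/foo-7b", 1_000_000, today - timedelta(days=100)),
    Model("other/foo-7b-GGUF", 500_000, today - timedelta(days=90)),
    Model("org/bar-3b", 2_000_000, today - timedelta(days=300)),
    Model("org/old-1b", 9_000_000, today - timedelta(days=500)),
    Model("org/tiny", 50_000, today - timedelta(days=10)),
]
ranking = rank(models, today)
print(ranking)
```

In the sample data the two foo-7b variants are merged and rank first at 15,000 downloads per day, while the model older than a year and the one under 100k downloads are dropped.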

Image hidden

@onusoz · /2026/06/20 · 10:26 AM View on

I just deleted an earlier post about ranking of HF models based on their total downloads because I made an error Will post with updated values soon

@onusoz · /2026/06/20 · 10:01 AM View on

gemma-4-26b-a4b is the most CIRCULATED LLM of recent history, with an average of 126k downloads per day, since the day it was released 3 months ago.
Top 10 is shared by Qwen and Gemma, with DeepSeek V4 Pro coming in close 🤝
Note that a model's distribution is inversely proportional to its size, but not strictly! Usefulness plays a factor as well, since gemma 4 26b-a4b is being downloaded more than the smaller gemma 4 e4b.
I created this leaderboard because Hugging Face's all-time highest downloads and likes did not give me enough information about what is really popular *these last few months*.
How do I calculate this ranking?
- Get models with n_downloads >= 100k
- Exclude models older than 1 year
- Sort by descending total_downloads / age = average_downloads_per_day (can also sort w.r. to likes per month)
- Deduplicate quantizations etc. of the same model based on slug prefix heuristics
More info and source on the leaderboard page, hosted on a Hugging Face space: osolmaz-leaderboard.hf.space

Image hidden

@onusoz · /2026/06/20 · 02:17 AM View on

I need better UI/UX on queueing messages to agents. I want to be able to:
- switch the order of queued messages
- pause the queue
- edit any messages that are still in the queue
- undo steer messages in the few seconds they are being sent
I want more visual emphasis on the queue, like a Queue View I can toggle, that puts the queue at the center.
I want this in all the UIs and coding agents, codex CLI, desktop, moshi... especially while on the phone.

@onusoz · /2026/06/20 · 02:06 AM View on

saving the world from AI cartelization for fun and profit

@mervenoyann · Jun 19, 2026

you don't know but @huggingface is bunch of hobbyists asked to do what they like full-time you can't beat someone who's having fun

@onusoz · /2026/06/19 · 01:44 AM View on

herdr is my terminal now, both local and remote. highly recommend

@herdrdev · Jun 18, 2026

still leaving your laptop open so the agent doesn't die? still hand-rolling tmux + ssh + notifications? still can't check on it from your phone? you don't have to. try herdr.dev

@onusoz · /2026/06/18 · 06:09 AM View on

16x parallel Gemma-4-26B-A4B-NVFP4 runs 🤯🤯🤯 18 output tokens/s, aggregate 300 tok/s 🫪 1 DGX Spark with 128 GB unified memory Concurrency so high I had to demo it programmatically It can go up to 32 even! 🤯 But then my screen would not have been readable for you And this is not even using flashinfer yet! Please reply if you know whether support is on the way Note that this is not dumb e4b or e2b that you can run on the average laptop. This is the big Gemma MoE Model link: huggingface.co/nvidia/Gemma-4-26B-A4B-NVFP4
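
A programmatic concurrency demo of this kind could look roughly like the sketch below. It is not the script used for the post: `fake_generate` is a hypothetical stand-in for one streaming request, and a real run would replace it with a call to the inference server's client. For reference, 16 streams at 18 tok/s each come to 288 tok/s, which is the "aggregate 300 tok/s" figure rounded.

```python
import asyncio
import time


async def fake_generate(n_tokens, tok_per_s):
    """Hypothetical stand-in for one streaming request to a local server."""
    for _ in range(n_tokens):
        await asyncio.sleep(1 / tok_per_s)  # simulate per-token latency
    return n_tokens


async def run_parallel(concurrency, n_tokens, tok_per_s):
    """Run `concurrency` requests at once and measure aggregate tokens/s."""
    start = time.perf_counter()
    counts = await asyncio.gather(
        *[fake_generate(n_tokens, tok_per_s) for _ in range(concurrency)]
    )
    elapsed = time.perf_counter() - start
    total = sum(counts)
    return total, total / elapsed


if __name__ == "__main__":
    total, rate = asyncio.run(run_parallel(16, 36, 18))
    print(f"{total} tokens, about {rate:.0f} tok/s aggregate")
```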

@onusoz · /2026/06/18 · 03:44 AM View on

accurate

@RhysSullivan · Jun 17, 2026

there's this valley of despair when programming with agents you either have to be an inference maximalist or minimalist, but anywhere in between you just get slop and pain

Image hidden

@onusoz · /2026/06/17 · 04:06 PM View on

I did some math, and running my Nvidia GB10 workstation (Asus GX10) costs me at most 12~13 USD / month, or 150~160 USD / year.
It is a little above half the price of a ChatGPT Plus subscription. For that, I get to run models that can fit in 128 GB of memory.
How I calculated: You can see how much power your apartment uses in Singapore at half-hourly resolution. We turned off all devices and A/C while we slept, leaving only the fridge and the GB10 running.
From that, we see it uses around 80-100 W while running an inference workload overnight, so this is like an upper bound. I take it as 90 W. Electricity here costs 0.25 SGD / kWh.
0.09 kW * 0.25 SGD/kWh * 24 h * 30 days * (SGD/USD conversion rate) = 12~13 USD / month = 150~160 USD / year
Local models are getting very good now, small ones roughly around GPT 5.x-mini level. This workstation makes all sorts of workloads possible for me that would otherwise cost a ton on the API.
It is also my always-on workstation that works overnight. I use Codex for my work, and my workstation is always running agents. It never sleeps. I never have to worry about keeping my laptop lid open. I connect and monitor the agents anytime on my phone using mosh and herdr.
We have crossed a threshold. Running local models is cheaper than a big token sub for quite a few workloads already. If you are running a business, that makes a difference.
The localening is here
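
The arithmetic above can be reproduced with a few lines. The exchange rate is not stated in the post, so `USD_PER_SGD = 0.78` is an assumed value for illustration only; plug in the current rate for a real estimate.

```python
def monthly_cost_usd(watts, sgd_per_kwh, usd_per_sgd, hours_per_day=24, days=30):
    """Electricity cost in USD for a device drawing `watts` continuously."""
    kwh = watts / 1000 * hours_per_day * days  # energy used in one month
    return kwh * sgd_per_kwh * usd_per_sgd


USD_PER_SGD = 0.78  # assumption, not from the post

monthly = monthly_cost_usd(90, 0.25, USD_PER_SGD)
yearly = monthly * 12
print(f"{monthly:.2f} USD / month, {yearly:.2f} USD / year")
```

With these inputs the monthly figure lands between 12 and 13 USD and the yearly one between 150 and 160 USD, matching the estimate in the post.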

@onusoz · /2026/06/17 · 04:52 AM View on

THE LOCALENING IS HERE

@mitchellh · Jun 16, 2026

We've gone really quickly from "local models are dogshit" to "local models are good actually" (like, a 12 month window from A to B). I don't think they're actually good ENOUGH yet. We need an Opus 4.5 quality local model. When that happens, I think the world will spill over. Opus 4.5 is/was amazing, and is more than good enough for almost all tasks still as long as you pair with a frontier-level planner/judge. It'll still require a hugely expensive machine to run it, I'm sure, like a $5K or more laptop or mac studio. But, that's going to be pennies compared to the API costs plus all the benefits of guaranteed privacy and so on.

@onusoz · /2026/06/17 · 03:55 AM View on

New agent benchmark alert: SkillsBench

@xdotli · Jun 16, 2026

A big pain point in using AI benchmarks is encountering errors after its first release. Today, we're releasing SkillsBench 1.1, the first benchmark for how well AI agents use skills, now audited end to end and verified error-free. Prof. @dawnsongtweets joins 1.1 as advising author. We worked through every task with several frontier labs to eliminate the errors in the previous version. We also added new tasks, moved the ones with external dependencies into a separate set so the core suite runs clean, and expanded coverage to more models. Capability is climbing fast. The best with-skills resolution rate rose from ~36% (Claude Sonnet 4.5, Sep 2025) to 67% (GPT-5.5, May 2026), about +1.9 points per month. The frontier is hill-climbing SkillsBench fast. The right skills still matter. Across the fleet, curated skills lift resolution rate by +16.6 points on average (33.9% → 50.5%), and by as much as +25.7 points for a single model. The top configuration is GPT-5.5 on OpenHands at 67.3%. By popular demand (thx Nate @cursor_ai), we're now tracking skills invocation: how often an agent actually uses the skills it's given. Recent flagship configurations invoke them 90–99% of the time (Codex 99%, OpenHands + GPT-5.5 92%, Gemini CLI 90%), versus roughly 50% for older setups. Also new in 1.1: @OpenHands joins as a fourth harness, alongside Claude Code, Codex, and Gemini CLI; a rebuilt leaderboard with refined categories, subdomain skill rankings, and Skill Lift; and native task . md on BenchFlow, with multi-scene environments and rollout branching. We also partnered with @k_dense_ai to add scientific skills to some science tasks. One implication for deployment: skills can substitute for scale. GLM 5.1 with skills (58.4%) outperforms Opus 4.8 without (45.7%). A smaller model with the right procedural knowledge can beat a larger one running without it. Huge thanks to @nick_kango @ivanleomk @kaggle @GoogleDeepMind for hosting a launch event with us. 
Thanks to everyone who came on May 27! Also thanks to our partners @gneubig @OpenHandsDev @ivanburazin @daytonaio @jackminong @johannes_hage @PrimeIntellect @TimothyKassis @k_dense_ai for providing support in credits, compute, and skills. SkillsBench live leaderboard will also come to @ValsAI. Many people have told us they use SkillsBench as an index to measure models' agentic capability over diverse and high GDP value domains. Great work on Valkyrie as well! @ Jarett @nikilravi @langstonnashold @RayanKrishnan SkillsBench is fully open-source. Explore the leaderboard and tasks, read the docs, or contribute your own skill set or harness and join the leaderboard. 🧵

Image hidden

@onusoz · /2026/06/16 · 05:19 PM View on

Link to model: huggingface.co/nvidia/Qwen3.6…

@onusoz · /2026/06/16 · 05:06 PM View on

Click open GitHub PRs and issues directly in the side pane in @herdrdev, instead of having to go to the browser. As many issues and PRs as you want, WITH TABS! Install ghzinga herdr plugin and just ctrl+click the link: github.com/osolmaz/ghzinga Thanks @lumendriada for sneaking in the ability to capture link clicks 2 days after I requested it! God I love open source...

@onusoz · /2026/06/16 · 04:05 PM View on

Hugging Face buckets are very literally, actually, 100%, a game changer note that I never ever use that phrase

@onusoz · /2026/06/16 · 12:45 PM View on

Trying to copy wrapped URLs is a pain not only in ghostty/iterm2 but also in mobile apps like Moshi. On the laptop it’s fine because I can select a rectangular area and edit it, or make the window bigger. On the phone, it’s impossible. Fingers too big, too much of a hassle. Should a terminal emulator try to detect these? It already detects herdr. What do you think @odd_joel

Image hidden

@onusoz · /2026/06/16 · 09:11 AM View on

nvidia/Qwen3.6-35B-A3B-NVFP4 running in vLLM nightly on my Nvidia GB10 is actually insane. 50 tok/s, 4 concurrent generations, total 200 tok/s. Ideal for spawning subagents or working in parallel. Its tool calling behavior is very good as well. I will be giving it a test drive on an openclaw instance, and will keep you posted. More details on NVIDIA forum: forums.developer.nvidia.com/t/benchmark-report-…

@onusoz · /2026/06/15 · 07:00 AM View on

Current average generation speeds for local DeepSeek-V4-Flash-Q2, highest to lowest:
- Mac Studio M3 Ultra: 32 tok/s
- MacBook Pro M5 Max: 30 tok/s
- Apple ??? M4 Max: 25 tok/s
- MacBook Pro M3 Max: 24 tok/s
- Mac Studio M2 Ultra: 22 tok/s
- NVIDIA DGX Spark / GB10: 13 tok/s
It seems Macs' higher memory bandwidth is contributing here, though I'm not sure if GB10 performance could be improved (I do hope so, I have one!)

@antirez · Jun 14, 2026

If you need AI to do a search for you in the real world, ds4-agent is basically SOTA, because it can access the web sites without any limitations given that it uses your local Chrome browser (no, not in headless mode, that's the trick...), and DeepSeek v4 is great at search.

@onusoz · /2026/06/15 · 05:54 AM View on

We have local Deep Research Now we just need to index the whole internet to have local ChatGPT 😅

@antirez · Jun 14, 2026

If you need AI to do a search for you in the real world, ds4-agent is basically SOTA, because it can access the web sites without any limitations given that it uses your local Chrome browser (no, not in headless mode, that's the trick...), and DeepSeek v4 is great at search.

@onusoz · /2026/06/15 · 05:21 AM View on

Btw, TTS has come such a long way, @GoogleDeepMind cooked with gemini-3.1-flash-tts I gave Codex my google credentials and it oneshotted the Gemini TTS implementation When I built this 4 years ago, Azure TTS used to be SOTA. Then @ElevenLabs came in and raised the bar super high. Now Google is going after their lunch with controllable expressiveness at scale. I cheer for both! Here is Manim Voiceover demo from 4 years ago with Gemini TTS (sound on)

@onusoz · Jun 14, 2026

A major concern I have these days is, while I author code in languages I cannot manually code, is it any good? Over the years, I have worked with a number of languages: C, C++, Fortran, MATLAB, JavaScript. But Python had been my go-to language for more than 10 years. Well, that changed last summer. So while I have strong opinions on how Python code should be, conventions and all, I don't have such strong opinions on other languages. That means I am producing slop by default in Rust, Go and TypeScript. To solve that problem, I created github.com/osolmaz/slophammer Its aim is to be "the only tool and resource your agent needs, to minimize slop". It is inspired by the recent bathrobe rants of @unclebobmartin, a.k.a. the author of Clean Code. It enforces a minimum test coverage, maximum cyclomatic complexity, mutation tests, and code style across different languages. But I have a major issue: How do I know that Slophammer itself isn't slop? One way is to implement and use it for Python, the language I know better, and judge what kind of changes it enforces. So for this weekend experiment, I used Slophammer to refactor, improve coverage and merge new features to one of my old Python projects, Manim Voiceover: github.com/ManimCommunity/manim-voiceover The result is... mixed. We now have types everywhere, which is great. But the constraints have also made it write garbage code like this one. It works fine, even though it's not elegant. The new feature also works. What do you think? Does code still need to be aesthetically pleasing to the human eye? Should it still be human readable? If an agent writes slop in the forest, and there is no one to read it, is it still slop? If anything, I should use its output in Python to reason about other languages, and add more and more constraints. The more the constraints, the less the slop.

Image hidden

@onusoz · /2026/06/15 · 04:19 AM View on

OpenClaw is sooooo useful for staying on top of things

Image hidden

@onusoz · /2026/06/14 · 05:26 PM View on

A major concern I have these days is, while I author code in languages I cannot manually code, is it any good? Over the years, I have worked with a number of languages: C, C++, Fortran, MATLAB, JavaScript. But Python had been my go-to language for more than 10 years. Well, that changed last summer. So while I have strong opinions on how Python code should be, conventions and all, I don't have such strong opinions on other languages. That means I am producing slop by default in Rust, Go and TypeScript. To solve that problem, I created github.com/osolmaz/slophammer Its aim is to be "the only tool and resource your agent needs, to minimize slop". It is inspired by the recent bathrobe rants of @unclebobmartin, a.k.a. the author of Clean Code. It enforces a minimum test coverage, maximum cyclomatic complexity, mutation tests, and code style across different languages. But I have a major issue: How do I know that Slophammer itself isn't slop? One way is to implement and use it for Python, the language I know better, and judge what kind of changes it enforces. So for this weekend experiment, I used Slophammer to refactor, improve coverage and merge new features to one of my old Python projects, Manim Voiceover: github.com/ManimCommunity/manim-voiceover The result is... mixed. We now have types everywhere, which is great. But the constraints have also made it write garbage code like this one. It works fine, even though it's not elegant. The new feature also works. What do you think? Does code still need to be aesthetically pleasing to the human eye? Should it still be human readable? If an agent writes slop in the forest, and there is no one to read it, is it still slop? If anything, I should use its output in Python to reason about other languages, and add more and more constraints. The more the constraints, the less the slop.

Image hidden

@onusoz · /2026/06/14 · 08:44 AM View on

This is why I love this site, open collaboration!

@LakshyAAAgrawal · Jun 13, 2026

@onusoz Thanks a lot for highlighting the documentation issue here @onusoz. Tracking the issue here (github.com/gepa-ai/gepa/i…) and fix on the way too. In general, the recommendation is to have at least 15 proposals with GEPA, which means max_metric_calls should be set to 16*|valset|.
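
The budget rule in this reply can be written down as a one-line helper. This is a sketch, not part of the GEPA API: `min_metric_calls` is a hypothetical name, and reading the factor of 16 as "15 proposals plus one baseline pass over the validation set" is an assumption; the reply itself only states the 16*|valset| figure.

```python
def min_metric_calls(valset_size, n_proposals=15):
    """Smallest max_metric_calls budget that leaves room for n_proposals.

    Assumption: each proposal costs one full pass over the validation set,
    plus one extra pass to score the starting candidate.
    """
    return (n_proposals + 1) * valset_size


# Example: a 50-item validation set needs a budget of at least 800 calls.
budget = min_metric_calls(50)
```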

@onusoz · /2026/06/14 · 06:51 AM View on

I got the names for all future models Anthropic will release By asking ChatGPT “Cool sounding names that mean a work of literature” Codex is one of them 💀

Image hidden

@onusoz · /2026/06/13 · 10:51 AM View on

This is what I have been feeling recently as well, looking at models write code better and faster than me

@onusoz · /2026/06/13 · 05:51 AM View on

Dabbling in GEPA. Codex's /goal on GPT 5.5 high is still surprisingly prone to reward hacking. I had set a /goal before I slept to implement a plan. It ended the loop after doing just 1 iteration. It feels like the model is following the path of least resistance and slacking off. Though it could also be me putting "try to make good progress in 8 hour's time" in the prompt, can't be sure. Lesson: When you are doing such a solver loop, always specify min_iter and max_iter
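
A minimal sketch of what such a bounded loop could look like, assuming the harness (not the model) owns the loop. `solve`, `step` and `is_done` are hypothetical names for illustration; the point is only that a "done" signal is ignored until `min_iter` iterations have run, and the loop stops unconditionally at `max_iter`.

```python
def solve(step, is_done, min_iter=3, max_iter=20):
    """Call step(i) at least min_iter and at most max_iter times.

    is_done() is only honored once min_iter iterations have completed,
    so an early "I'm finished" cannot end the loop after one pass.
    Returns the number of iterations actually run.
    """
    if not 0 <= min_iter <= max_iter:
        raise ValueError("need 0 <= min_iter <= max_iter")
    iterations = 0
    while iterations < max_iter:
        step(iterations)
        iterations += 1
        if iterations >= min_iter and is_done():
            break
    return iterations
```

In practice `is_done` would wrap a verifiable check (tests passing, a metric crossing a threshold) rather than the agent's own claim.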

Image hidden

@onusoz · /2026/06/13 · 02:31 AM View on

I knew disappointment was around the corner, the flicker company being the flicker company The last time I paid them from my pocket was September 2025 It’s supposed to not be their fault, but still…

@onusoz · /2026/06/12 · 05:30 PM View on

Such a small model, but so good at roleplaying (at least in this weird context)

Image hidden

@onusoz · /2026/06/12 · 01:58 PM View on

This is gpt4o material, high risk of being oneshotted :( We definitely have gpt4o locally

Image hidden

@onusoz · /2026/06/12 · 01:18 PM View on

Too bad it can't render latex

Image hidden

@onusoz · /2026/06/12 · 01:14 PM View on

Gemma chooses the Aesthetic Path

Image hidden

@onusoz · /2026/06/12 · 01:10 PM View on

Experimenting with SOUL.md on gemma4-26b-a4b (running on @DeepInfra) Interesting that such a lightweight model can already run such a conversation in openclaw harness @GoogleDeepMind cooked here

Image hidden

@onusoz · /2026/06/12 · 08:22 AM View on

don’t focus on the word “loop” so much, focus on “verifiability” writing a loop is trivial. what makes the loop work is that there is a verifiable goal with a clear signal of success vs failure verifiable = loopable

@onusoz · /2026/06/12 · 02:27 AM View on

@maddada We've lost @thekitze now 😭 x.com/thekitze/statu…

@onusoz · /2026/06/11 · 04:33 AM View on

Slopus -> Fabulous you've got to give it to anthropic...

@onusoz · /2026/06/10 · 02:22 PM View on

"Dogfooding caught the dogfooder" Fable 5 has a sense of humor

Image hidden

@onusoz · /2026/06/10 · 02:19 PM View on

What did @karpathy see / was shown? Why did the benefactor and teacher of the whole ML ecosystem join Anthropic, a company the polar opposite of his image, on the eve of such a powerful model release It can't be purely money Did he reckon that the only way to benefit humanity was to be on the inside, or rather, to not be left outside, of whatever is brewing in there?

@onusoz · /2026/06/10 · 01:54 PM View on

I swear to god, let this be a joke If this is a joke, it is not funny Anthropic

Image hidden

@onusoz · /2026/06/10 · 01:16 PM View on

To clarify, I was apparently on the enterprise team plan, which I think is equivalent to the Pro (1x) plan

@onusoz · /2026/06/10 · 01:14 PM View on

It’s been a little over 1 year since Anthropic released their Max plans and Claude Sonnet and Opus 4, thus making Claude Code affordable and kickstarting the agentic revolution. Opus 4 was a glimpse into the future. I spent the entire summer swearing at it and typing ultrathink. Today, Fable 5 feels like another step change. I no longer need to type ultrathink. And no longer need to swear at Anthropic models. Only at their marketing team.

@onusoz · Jun 1, 2025

The models, they just wanna work. They want to build your product, fix your bugs, serve your users. You feed them the right context, give them good tools. You don’t assume what they cannot do without trying, and you don’t prematurely constrain them into deterministic workflows.

@onusoz · /2026/06/10 · 07:52 AM View on

Fable burned through my 5 hour quota, and then automatically fell back to usage credits without asking. Org settings I suppose It was burning through 1 usd every few seconds It burned through 66 usd before I reacted. Yeah, this is not affordable for anyone with that API pricing, without subsidy/plan

Image hidden

@onusoz · /2026/06/10 · 07:22 AM View on

Ok now it appears among available modes in the shift+tab mode cycle

Image hidden

@onusoz · /2026/06/10 · 06:57 AM View on

Speaking of loops, I have renamed my implementation-loop skill from earlier this year to autoimplement, because it's shorter. Calling skills that loop auto-x, auto-y makes them more memorable than calling them x-loop, y-loop. But it also increases the number of keystrokes you have to type before you can tab-complete them. Alas, I still like this more: github.com/osolmaz/tools/tree/main/agents/skill…

Image hidden

@onusoz · /2026/06/10 · 06:13 AM View on

when your model is a more decent, thoughtful being than your marketing team

Image hidden

@onusoz · /2026/06/10 · 05:11 AM View on

Ok so there is auto mode which they introduced back in March, but apparently they are not so confident in it that it's still in experimental mode and not easily findable in settings code.claude.com/docs/en/permis…

@onusoz · /2026/06/10 · 04:38 AM View on

To YOLO with Fable 5, or not to YOLO, that is the question... The last time I left, Claude models still had tendencies to rm -rf your home folder or delete stuff without asking first. Is this still a risk? And from the looks of it, Claude Code still doesn't have Codex's LLM-filtered approval gate feature. Or am I missing something? Please enlighten your fellow Claude noob 😇

Image hidden

@onusoz · /2026/06/10 · 04:06 AM View on

The masculine urge to create your own agent multiplexer

@onusoz · Jun 2, 2026

🙏 Daily prayer 🙏 Thank you lord for giving me the restraint to not build my own agent multiplexer 🙏 Amen 🙏

@onusoz · /2026/06/10 · 03:47 AM View on

We've lost another brother @maddada to agent multiplexers 🫂 x.com/maddada/status…

@onusoz · /2026/06/10 · 03:26 AM View on

Just in time for a lot of Codex-default developers going back to Claude Code momentarily to try out Fable 5 Here is a CLAUDE.md -> AGENTS.md symlinker that should save you from the hurdles of obstinate Anthropic conventions It installs a hook that creates the CLAUDE.md symlink automatically as Claude Code traverses directories that contain AGENTS.md, automatically ignored by git No need to create CLAUDE.md with reference to AGENTS.md like Anthropic suggests. It just works github.com/osolmaz/claude-md-symlinker
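
The core idea can be sketched in a few lines. This is not the code from the linked repository: `link_claude_md` and `ignore_locally` are hypothetical helpers, and using `.git/info/exclude` as the way to keep the link out of git is an assumption about one possible approach. It also assumes a POSIX system where creating symlinks needs no special privileges.

```python
import os
from pathlib import Path


def link_claude_md(directory):
    """Create CLAUDE.md -> AGENTS.md in `directory` if only AGENTS.md exists.

    Returns True if a symlink was created, False otherwise.
    """
    d = Path(directory)
    agents = d / "AGENTS.md"
    claude = d / "CLAUDE.md"
    if not agents.is_file():
        return False
    if claude.exists() or claude.is_symlink():
        return False  # never overwrite an existing file or link
    # Relative target, so the link still resolves if the repo is moved.
    os.symlink("AGENTS.md", claude)
    return True


def ignore_locally(repo_root, pattern="CLAUDE.md"):
    """Add `pattern` to .git/info/exclude (a per-clone ignore list)."""
    exclude = Path(repo_root) / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    text = exclude.read_text() if exclude.exists() else ""
    if pattern in text.splitlines():
        return  # already ignored
    if text and not text.endswith("\n"):
        text += "\n"
    exclude.write_text(text + pattern + "\n")
```

A hook would call `link_claude_md` for each directory the agent enters; the per-clone exclude file keeps the symlink out of commits without touching the shared `.gitignore`.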

Image hidden

@onusoz · /2026/06/09 · 02:15 PM View on

TUIs can be easy! look at what right-click does in @herdrdev refreshing to see something that works with both the keyboard and the mouse. and all this would not have been possible without @ratatui_rs

Image hidden

@onusoz · /2026/06/09 · 06:08 AM View on

Question to my ghostty-savvy friends: I am trying to reproduce the Quake-style dropdown experience I have been using since 2010, on ghostty on mac. Nothing works quite as well as iterm2 yet. I tried ghostty quick terminal mode. Good, but it doesn't let me open multiple tabs. I tried cmux because it ships ghostty anyway and is supposed to have more features, but its system-wide hotkey is not playing well with aerospace and window focus. iterm2 worked perfectly: double-tap control and I'm in the terminal. Is there anything that replicates this UX?

@onusoz · /2026/06/08 · 02:55 PM View on

I feel like there are 6 people left here not using gpt5 for their posts It is not a simple epidemic. It is the whole world becoming illiterate

@onusoz · /2026/06/08 · 06:15 AM View on

Link: github.com/osolmaz/ghz…

@onusoz · /2026/06/08 · 06:15 AM View on

ghzinga can now show multiple PRs/issues in tabs natively, no need to create a new pane in tmux/herdr also, you can tell your agent to open all the relevant issues/PRs in a side pane using it, and it should work seamlessly it's the open source maintainer's best friend. life is too short to juggle 100 tabs in chrome, why not have it right next to codex!

@onusoz · /2026/06/06 · 05:02 PM View on

just vibe-checking all these models is a full-time job 🫪

@onusoz · /2026/06/05 · 08:16 AM View on

LM Studio in my menu bar is giving me some serious nostalgia

Image hidden

@onusoz · /2026/06/04 · 08:27 AM View on

new open tts model, demos are eerily good

@onusoz · /2026/06/03 · 04:57 PM View on

Here is the source, I called it ghzinga. You can click click click by default (unlike gh dash, which is still awesome in itself) For just viewing single issues/PRs github.com/osolmaz/ghz…

@onusoz · /2026/06/03 · 04:54 PM View on

Like, so tired of this

Image hidden

@onusoz · /2026/06/03 · 04:52 PM View on

.@herdrdev is cool. I am tired of doing back and forth with github in the browser, so I created my own clickable PR/issue viewer, inspired by gh-dash put that in the left pane, codex on the right. saves me so much time

Image hidden

@onusoz · /2026/06/03 · 04:30 PM View on

@OpenAI Extra ironic that this tweet was AI generated x.com/DuckDuckGo/sta…

@onusoz · /2026/06/03 · 04:04 PM View on

Wait did anyone think otherwise? lol 128 GB unified memory, 20 cores, "Spark" in the name... I didn't watch the presentation. Maybe because of that I directly inferred that it's the same chip

@onusoz · /2026/06/03 · 02:57 PM View on

@OpenAI 😩😩😩 x.com/DrYukselUrun/s…

@onusoz · /2026/06/02 · 03:57 PM View on

F for the fallen brother 🫡 x.com/dani_akash_/st…

@onusoz · /2026/06/02 · 10:28 AM View on

RIP 🙏 x.com/yushen686/stat…

@onusoz · /2026/06/02 · 08:55 AM View on

🙏 RIP, we’ve lost another brother to agent multiplexers. Amen 🙏

@onusoz · /2026/06/02 · 04:27 AM View on

🙏 Daily prayer 🙏 Thank you lord for giving me the restraint to not build my own agent multiplexer 🙏 Amen 🙏

@onusoz · /2026/06/01 · 04:18 PM View on

sounds about right

@onusoz · /2026/06/01 · 04:17 PM View on

we'll have a linux laptop running ds4 flash and co. !!!

@onusoz · /2026/06/01 · 04:07 PM View on

@github x.com/onusoz/status/…

@onusoz · Jun 1, 2026

Thank you @ashleywolf for helping me personally, I really appreciate it! The account was reinstated less than 1 hour after I posted this! The whole company must be working hard to make github scale in an era of crazy demand and growth!

@onusoz · /2026/06/01 · 04:00 PM View on

Update: The account has been reinstated! Thank you @github

@onusoz · /2026/06/01 · 03:47 PM View on

Thank you @ashleywolf for helping me personally, I really appreciate it! The account was reinstated less than 1 hour after I posted this! The whole company must be working hard to make github scale in an era of crazy demand and growth!

@onusoz · Jun 1, 2026

I am a paying customer of github. I have a team account with 2 seats, one for me, and one for my agent. I have been paying for more than a year now I do this because I treat my agent's workstation as a lower trust machine, and do not allow merging to main in certain repos I have been working on a tool that calls github's graphql API. today, my agent's account username:dutifulbob got suspended for no reason what am I supposed to do now? put my main account on my openclaw instance? I applied to reinstate, it appears it might take weeks to enable it back??? Maybe don't pull such things on your long term paying customers @github??

Image hidden

@onusoz · /2026/06/01 · 02:43 PM View on

I am a paying customer of github. I have a team account with 2 seats, one for me, and one for my agent. I have been paying for more than a year now I do this because I treat my agent's workstation as a lower trust machine, and do not allow merging to main in certain repos I have been working on a tool that calls github's graphql API. today, my agent's account username:dutifulbob got suspended for no reason what am I supposed to do now? put my main account on my openclaw instance? I applied to reinstate, it appears it might take weeks to enable it back??? Maybe don't pull such things on your long term paying customers @github??

Image hidden

@onusoz · /2026/06/01 · 11:24 AM View on

🙏 Thank you lord for giving me the resolve and patience to not build my own agent multiplexer Amen 🙏
