Entries for June 23, 2026

@onusoz · /2026/06/23 · 04:50 PM

If you are interested in running such demos, look into the --demo mode in localpi, my local-model Swiss Army knife: github.com/osolmaz/loc… Thank you @googlegemma for the shoutout!

@googlegemma · Jun 23, 2026

16 parallel runs of Gemma 4 26B A4B on a single NVIDIA DGX Spark! Pushing 18 tok/s per instance and a 300 tok/s aggregate. It can even hit 32 parallel runs. This level of concurrency highlights how efficient the architecture is.

@onusoz · /2026/06/23 · 03:15 PM

i meant to qt this one x.com/herdrdev/statu…

@herdrdev · Jun 22, 2026

herdr plugin marketplace is live! herdr.dev/plugins/ if you'd like to share your plugins with other people, all you have to do is add "herdr-plugin" to your github repo's topic list, and it will be automatically added to the website!

@onusoz · /2026/06/23 · 03:13 PM

Huge. Added my GitHub TUI: github.com/osolmaz/ghz…

@herdrdev · Jun 15, 2026

herdr 0.7.0 is out, and it's a major one: it introduces plugins! the idea is simple: herdr stays lean, and everything custom gets extended through plugins. shareable, scoped, built however you want, to fit your own flow. with this release we're also shipping a few examples of what the plugin system can do. first up: a telegram plugin. herdr already controls your agents and knows their status, so the plugin just hooks into agent events and pings telegram the moment one needs you. notification lands → `herdr --remote` or ssh from your phone → straight back to the agent that needs you.

@onusoz · /2026/06/23 · 02:29 PM

Will talk about my recent adventures running local models on the Spark, Pi and OpenClaw. Click "Notify Me" on YouTube to stay tuned.

@ben_burtenshaw · Jun 23, 2026

the livestream will be here: youtube.com/watch?v=wRcByx… and on @huggingface x profile.

@onusoz · /2026/06/23 · 01:34 PM

New blog post: Using local models for agentic zero-shot classification, in real-time, high-frequency triage

If you have 128 GB of memory for models (a DGX Spark, like I do, for example), you can build a real-time classifier and notifier for yourself that classifies more than 20 items per minute, using mid-sized @googlegemma and @Alibaba_Qwen models, with 200-300 output tok/s of aggregate throughput.

Think of processing new tweets on Twitter, issues/PRs on GitHub, or messages on Telegram and Discord, in real time.

Over the past few weeks, I have built one for myself, to filter and get notified about local-model-related issues on the OpenClaw repo.

I initially thought gemma-4-e4b would give me the best tradeoff. I was wrong. I learned that if you already have enough memory, you should not bother with <10B models like gemma-4-e4b or e2b. Precision and recall were much higher zero-shot with gemma-4-26b-a4b, whereas the smaller e4b needed significant prompt optimization and still did not perform nearly as well.

To provide more context to the model, I created a restricted bash-like shell called reposhell. In that shell, the agent can run read-only commands to ls/find/grep/cat the OpenClaw source code, but nothing else. When the PR description or diff is not clear enough to categorize it, the agent reads the code to figure it out.

The restriction matters because small models can get prompt injected, and I need to make sure that nobody can harm my setup by creating a malicious issue or PR in the OpenClaw repo.

I found that for specific systems like this, it is very convenient to extend and bundle Pi. You can create agentic CLI tools that work fully locally and for free, and keep them separate from your main pi coding setup. localpager-agent has its own session dir and tools, and I ensure that it runs local models securely by isolating it from my main pi setup.

Once localpager-agent categorizes a PR/issue as local_models and related labels, I automatically receive it as a notification on Discord.

The whole implementation is fully open source and MIT licensed, alongside the dataset we used to benchmark the performance.

I believe zero-shot agentic classification running on local hardware will find many use cases across a wide variety of business applications, like news gathering, open source software development, customer support, content moderation, sales and so on.

Agents increase the amount of information produced in a lot of systems, and hence we will need to set up cheap ways to wrangle all that information.

In times when governments can cut off access to SOTA models on a whim, it is more important than ever to build your business on open models and, if possible, run them on your own hardware!

Big thanks to @evalstate and @ben_burtenshaw for their valuable feedback, especially for helping me evaluate this more rigorously! One takeaway is that categorizing contributions in an open source repo is a *hard* problem, and that it is not trivial to reliably create a golden dataset with LLMs for evaluation purposes.

Read more here: huggingface.co/blog/local-models-pr-triage
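To illustrate the idea of a read-only, allowlisted shell, here is a minimal Python sketch. It is not reposhell's actual code: the function name `validate_command`, the set of blocked `find` actions, and the path rule are all assumptions made for illustration. Only the ls/find/grep/cat allowlist comes from the post.

```python
import shlex

# Allowlist taken from the post: only these read-only commands may run.
ALLOWED_COMMANDS = {"ls", "find", "grep", "cat"}

# Assumption: find actions that execute programs or write files must be blocked.
FORBIDDEN_FIND_ARGS = {
    "-exec", "-execdir", "-ok", "-okdir", "-delete",
    "-fprint", "-fprint0", "-fprintf", "-fls",
}

# Assumption: reject shell metacharacters anywhere in the line, even inside
# quotes. This is deliberately conservative (it also blocks "$" in a regex).
SHELL_METACHARS = set(";|&><`$(){}\n")


def validate_command(line: str) -> list[str]:
    """Return argv if the line is an allowed read-only command, else raise ValueError."""
    if any(ch in SHELL_METACHARS for ch in line):
        raise ValueError("shell metacharacters are not allowed")
    try:
        argv = shlex.split(line)
    except ValueError as exc:  # e.g. an unclosed quote
        raise ValueError(f"could not parse command: {exc}") from exc
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {argv[0]}")
    if argv[0] == "find" and FORBIDDEN_FIND_ARGS.intersection(argv[1:]):
        raise ValueError("find actions that execute or write are not allowed")
    # Assumption: arguments must not be absolute paths or climb out with "..".
    for arg in argv[1:]:
        if arg.startswith("/") or ".." in arg.split("/"):
            raise ValueError("paths must stay inside the repository")
    return argv
```

A caller would pass the returned argv to `subprocess.run(argv, cwd=repo_dir)` without `shell=True`, so the model's text never reaches a real shell. For example, `validate_command("grep -rn provider src")` returns the parsed argv, while `validate_command("ls; rm -rf .")` raises `ValueError`.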

@onusoz · /2026/06/23 · 11:26 AM

One sweep over 100 samples takes around 4 hours. Next up: cross reference ground truth with predictions from hf-mem by @alvarobartt github.com/alvarobartt/hf…

@onusoz · /2026/06/23 · 03:01 AM

gpt5.5 and most other models are very bad at one-shotting nice data models.

gpt5.5 also has this annoying property that once it decides on a schema (or any design), it's very hard to trigger thinking again. And if you ask it to "rewrite from scratch", it will create something even more ridiculous.

To solve this problem, I have built a meta-harness over codex just for simplifying slop data models, called schemator (work in progress).

Basic idea: it mimics what I myself do while designing a schema: scrutinize and question each field one by one.

It starts a fresh codex session for each field with a fixed prompt like "Try to come up with the most Lindy data model" + a prompt for side notes.

It does that with a fresh context for each field, so that the reviews are independent of each other. At the end of a review run over a field, the reviewer can propose to keep, rename or remove the field.

When all fields have been reviewed once, that makes one iteration. This is then looped until the review results stabilize and no further changes are proposed.

I get better results by just asking my agent to "use schemator on this" after it creates a JSON schema or SQL table.

Give it a try if you have codex! It has a skill, so it should be easy for an agent to figure out how to use: github.com/osolmaz/schemator
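The review loop described above can be sketched as a fixed-point iteration. This is only an illustration, not schemator's code: `refine_schema`, `review_field`, `toy_reviewer` and the `max_iterations` cap are hypothetical names, and a plain Python callable stands in for the fresh codex session that reviews each field.

```python
from typing import Callable, Optional

# A review is ("keep", None), ("rename", new_name) or ("remove", None).
Review = tuple[str, Optional[str]]


def refine_schema(
    fields: list[str],
    review_field: Callable[[str, list[str]], Review],
    max_iterations: int = 10,  # assumption: a safety cap on the loop
) -> list[str]:
    """Review every field independently, repeating until no change is proposed."""
    current = list(fields)
    for _ in range(max_iterations):
        updated = []
        for field in current:
            # Each call sees only the field and the schema, never the other
            # reviews, mirroring the "fresh context per field" idea.
            action, new_name = review_field(field, current)
            if action == "remove":
                continue
            if action == "rename" and new_name:
                updated.append(new_name)
            else:
                updated.append(field)
        if updated == current:  # reviews have stabilized
            return current
        current = updated
    return current


def toy_reviewer(field: str, schema: list[str]) -> Review:
    """Hypothetical stand-in for an LLM review of a single field."""
    renames = {"usr_nm": "username"}
    if field in renames:
        return ("rename", renames[field])
    if field.startswith("tmp_"):
        return ("remove", None)
    return ("keep", None)


result = refine_schema(["id", "usr_nm", "tmp_cache", "created_at"], toy_reviewer)
print(result)  # ['id', 'username', 'created_at']
```

With the toy reviewer, the first iteration renames one field and removes another, and the second iteration proposes no changes, so the loop stops.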
