Local models for real-time triage

@onusoz · 2026/06/23 · 01:34 PM

New blog post: using local models for agentic zero-shot classification in real-time, high-frequency triage.

If you have 128 GB of memory for models (a DGX Spark like mine, for example), you can build a real-time classifier and notifier for yourself that handles more than 20 items per minute, using mid-sized @googlegemma and @Alibaba_Qwen models, with an aggregate throughput of 200-300 output tok/s or more. For example, it can process new tweets on Twitter, issues and PRs on GitHub, and messages on Telegram and Discord, all in real time.

Over the past few weeks, I have built one for myself, to filter and get notified about local-model-related issues on the OpenClaw repo.

I initially thought gemma-4-e4b would give me the best tradeoff. I was wrong. I learned that if you already have enough memory, you should not bother with sub-10B models like gemma-4-e4b or e2b. Precision and recall were much higher zero-shot with gemma-4-26b-a4b, whereas the smaller e4b needed significant prompt optimization and still did not perform nearly as well.

To provide more context to the model, I created a restricted bash-like shell called reposhell. In that shell, the model can run read-only commands to ls/find/grep/cat the OpenClaw source code, but only that. When a PR's description or diff is not clear enough to categorize it, the agent reads the code to figure it out. The restriction matters because small models can get prompt injected, and I need to make sure that nobody can harm my setup by creating a malicious issue or PR in the OpenClaw repo.

I found that for specific systems like this, it is very convenient to extend and bundle Pi. You can create agentic CLI tools that work fully locally and for free, and keep them separate from your main pi coding setup.
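
To make the read-only shell idea concrete, here is a minimal sketch of how such a command gate could look. This is not the actual reposhell code: the allowlist, the `validate` and `run_readonly` names, the forbidden `find` flags, and the output cap are all assumptions for illustration. It also assumes a POSIX system with `ls`, `find`, `grep`, and `cat` on the PATH.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical allowlist; the real reposhell may allow a different set.
ALLOWED_COMMANDS = {"ls", "find", "grep", "cat"}
# find actions that can run programs or write/delete files.
FORBIDDEN_FIND_FLAGS = {"-exec", "-execdir", "-ok", "-okdir", "-delete",
                        "-fprint", "-fprint0", "-fprintf", "-fls"}


class CommandRejected(Exception):
    """Raised when a command falls outside the read-only policy."""


def validate(command: str, root: Path) -> list[str]:
    """Parse a command line and return its argv if it passes the policy."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:
        raise CommandRejected(f"unparseable command: {exc}") from exc
    if not argv:
        raise CommandRejected("empty command")
    if argv[0] not in ALLOWED_COMMANDS:
        raise CommandRejected(f"command not allowed: {argv[0]}")
    if argv[0] == "find" and FORBIDDEN_FIND_FLAGS & set(argv[1:]):
        raise CommandRejected("find action not allowed")
    root = root.resolve()
    for arg in argv[1:]:
        if arg.startswith("-"):
            # Conservative: refuse flags that embed a path (e.g. -f/etc/passwd).
            if "/" in arg:
                raise CommandRejected(f"flag with embedded path: {arg}")
            continue
        # Treat every other argument as a possible path and keep it under root.
        # This also rejects search patterns that look like escaping paths.
        if not (root / arg).resolve().is_relative_to(root):
            raise CommandRejected(f"path escapes repo root: {arg}")
    return argv


def run_readonly(command: str, root: Path, timeout: float = 5.0) -> str:
    """Run a validated command inside root, without a shell, and cap the output."""
    argv = validate(command, root)
    # Passing a list (no shell) means ';', '|' and '>' are never interpreted.
    result = subprocess.run(argv, cwd=root, capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout[:20_000]
```

A check like this is only one layer. It does not cover symlinks inside the repo that point elsewhere, so running the agent in an isolated environment is still worthwhile.
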
localpager-agent has its own session dir and tools, and I make sure it runs local models securely by isolating it from my main pi setup.

Once localpager-agent categorizes a PR or issue with local_models and related labels, I automatically receive it as a notification on Discord.

The whole implementation is fully open source and MIT licensed, alongside the dataset we used to benchmark its performance.

I believe zero-shot agentic classification running on local hardware will find many use cases across a wide variety of business applications, like news gathering, open source software development, customer support, content moderation, sales and so on. Agents increase the amount of information produced in a lot of systems, so we will need cheap ways to wrangle all that information.

At a time when governments can cut off access to SOTA models on a whim, it is more important than ever to build your business on open models and, if possible, run them on your own hardware!

Big thanks to @evalstate and @ben_burtenshaw for their valuable feedback, especially for helping me evaluate this more rigorously! One takeaway is that categorizing contributions in an open source repo is a *hard* problem, and that it is not trivial to reliably create a golden dataset with LLMs for evaluation purposes.

Read more here: huggingface.co/blog/local-models-pr-triage
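
If you want to check a classifier like this against your own labeled set, per-label precision and recall can be computed as in the minimal sketch below. It is not the benchmark code from the blog post. The `precision_recall` function, the dict-of-label-sets layout, and the item ids are assumptions, and the sample data is a toy example, not real results.

```python
def precision_recall(
    golden: dict[str, set[str]],
    predicted: dict[str, set[str]],
    label: str,
) -> tuple[float, float]:
    """Precision and recall for one label, scored over the items in golden."""
    tp = fp = fn = 0
    for item_id, gold_labels in golden.items():
        # An item the classifier never saw counts as having no predicted labels.
        pred_labels = predicted.get(item_id, set())
        in_gold = label in gold_labels
        in_pred = label in pred_labels
        if in_gold and in_pred:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data for illustration only; ids and labels are made up.
golden = {
    "pr-1": {"local_models"},
    "pr-2": {"local_models"},
    "issue-3": set(),
    "issue-4": set(),
}
predicted = {
    "pr-1": {"local_models"},   # true positive
    "pr-2": set(),              # false negative
    "issue-3": {"local_models"},  # false positive
    "issue-4": set(),           # true negative
}
print(precision_recall(golden, predicted, "local_models"))
```

Scores like these are only as trustworthy as the golden labels, which is why building that set carefully matters so much.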