If you are interested in running such demos, look into --demo mode in my local model swiss army... | Onur Solmaz blog

@onusoz · 2026/06/23 · 04:50 PM

If you are interested in running such demos, look into the --demo mode in localpi, my local model Swiss Army knife: github.com/osolmaz/loc… Thank you @googlegemma for the shoutout.

@googlegemma · Jun 23, 2026

16 parallel runs of Gemma 4 26B A4B on a single NVIDIA DGX Spark! Pushing 18 tok/s per instance and a 300 tok/s aggregate. It can even hit 32 parallel runs. This level of concurrency highlights how efficient the architecture is.
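
As a rough sanity check on the figures quoted above, the aggregate throughput should be about the per-instance rate times the number of parallel runs. A minimal sketch, assuming every instance runs at the same rate (the function name is illustrative and not part of localpi):

```python
def aggregate_tok_per_s(instances: int, per_instance_tok_per_s: float) -> float:
    """Estimate aggregate throughput, assuming all instances run at the same rate."""
    return instances * per_instance_tok_per_s


# Figures from the quoted post: 16 parallel runs at roughly 18 tok/s each.
estimate = aggregate_tok_per_s(16, 18)
print(estimate)  # 288
```

The estimate of 288 tok/s is in line with the quoted 300 tok/s aggregate; the small gap is plausibly due to rounding of the per-instance figure.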