16x parallel Gemma-4-26B-A4B-NVFP4 runs 🤯🤯🤯

@onusoz · 2026/06/18 · 06:09 AM

18 output tokens/s per stream, 300 tok/s aggregate 🫪
All on 1 DGX Spark with 128 GB unified memory

Concurrency is so high I had to demo it programmatically
It can even go up to 32! 🤯 But then my screen would not have been readable for you

And this is not even using flashinfer yet! Please reply if you know whether support is on the way

Note that this is not the dumb e4b or e2b that you can run on the average laptop. This is the big Gemma MoE

Model link: huggingface.co/nvidia/Gemma-4-26B-A4B-NVFP4
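For anyone wondering what "demo it programmatically" could look like, here is a minimal sketch, not the actual script behind the post. It fires 16 concurrent streams with `asyncio.gather` and reports per-stream and aggregate tokens/s. Everything named here is an assumption for illustration: `fake_generate` is a hypothetical stand-in for a real streaming completion call (it only sleeps, no server is contacted), and `run_parallel` and `aggregate_rate` are made-up helper names.

```python
import asyncio
import time


async def fake_generate(prompt: str, n_tokens: int = 20, delay: float = 0.001) -> int:
    """Hypothetical stand-in for a streaming completion call.

    Sleeps once per "token" instead of contacting a model server,
    then returns the number of tokens produced.
    """
    produced = 0
    for _ in range(n_tokens):
        await asyncio.sleep(delay)  # simulate waiting for the next token
        produced += 1
    return produced


async def run_parallel(generate, prompts):
    """Run one generation per prompt concurrently and measure throughput."""
    start = time.perf_counter()
    counts = await asyncio.gather(*(generate(p) for p in prompts))
    elapsed = time.perf_counter() - start
    total = sum(counts)
    return {
        "streams": len(counts),
        "total_tokens": total,
        "aggregate_tok_s": total / elapsed,
        "per_stream_tok_s": total / elapsed / len(counts),
    }


def aggregate_rate(streams: int, per_stream_tok_s: float) -> float:
    """Aggregate throughput if every stream runs at the same rate."""
    return streams * per_stream_tok_s


stats = asyncio.run(run_parallel(fake_generate, [f"prompt {i}" for i in range(16)]))
print(stats)

# Sanity check on the numbers in the post: 16 streams at 18 tok/s each
print(aggregate_rate(16, 18))  # 288, which the post rounds to 300
```

To point this at a real endpoint, you would swap `fake_generate` for a coroutine that streams from your server and counts tokens as they arrive. The measuring code stays the same.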