Is anyone able to run nvidia/Qwen3.6-35B-A3B-NVFP4 with the config suggested in the README?
It OOMs before it can start serving
huggingface.co/nvidia/Qwen3.6…
I can't believe I'm asking GPT to use Claude to review
It's almost as if there is a 9-month corner-cutting cycle, a two-body problem between OpenAI and Anthropic
I keep seeing insanely expensive builds giving insanely impressive results
These results don't matter
What matters is whether one can:
- run a "SOTA level" model, whatever that is
- with under 32 GB VRAM or unified memory
- in 5 parallel sessions
- at 50-100 tok/s each
- with enough memory left over for other applications
- in a system as cheap as $1000
That is our goalpost
That is the threshold at which open source AI wins
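A rough way to sanity-check the memory side of that goalpost: add up the weights, one KV cache per parallel session, and the headroom for other apps, then compare against 32 GB. This is only a back-of-the-envelope sketch. The layer count, KV head count, head size, context length, ~4.5 bits per weight, and the 4 GB headroom are all made-up illustrative values, not the real specs of any model mentioned above, and the helper names are hypothetical.

```python
# Back-of-the-envelope memory budget for the goalpost above.
# Every model number used in the example call is an illustrative assumption.

GB = 10**9  # decimal gigabytes, matching the "32 GB" figure


def weights_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the weights at an effective bit width."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache for ONE session: a K and a V tensor per layer.

    Assumes plain full attention in every layer; models with other
    attention layouts will need a different formula.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits(total_gb: float, weights: float, kv_per_session: float,
         sessions: int, headroom_gb: float) -> tuple[bool, float]:
    """Return (fits?, GB used by the model) for the given budget."""
    used = weights + sessions * kv_per_session
    return used + headroom_gb * GB <= total_gb * GB, used / GB


if __name__ == "__main__":
    w = weights_bytes(35e9, 4.5)                 # assumed: 35B params, ~4.5 bits each
    kv = kv_cache_bytes(40, 4, 128, 32_768)      # assumed architecture and context
    ok, used_gb = fits(32, w, kv, sessions=5, headroom_gb=4)
    print(f"weights: {w / GB:.1f} GB, KV per session: {kv / GB:.2f} GB")
    print(f"model total: {used_gb:.1f} GB, fits in 32 GB with headroom: {ok}")
```

With these made-up numbers the budget is already blown by the KV caches alone, which is the point: five parallel sessions cost far more than the weights suggest. This says nothing about the tok/s target, which depends on memory bandwidth rather than capacity.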