Current average generation speeds for local DeepSeek-V4-Flash-Q2, highest to lowest:
Mac Studio M3 Ultra: 32 tok/s
MacBook Pro M5 Max: 30 tok/s
Apple ??? M4 Max: 25 tok/s
MacBook Pro M3 Max: 24 tok/s
Mac Studio M2 Ultra: 22 tok/s
NVIDIA DGX Spark / GB10: 13 tok/s
It seems Macs' higher memory bandwidth is contributing here, though I'm not sure whether GB10 performance could be improved (I do hope so, I have one!)
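A rough way to sanity-check the bandwidth hunch is to divide each measured speed by the machine's peak memory bandwidth. This is only a sketch. The bandwidth figures below are vendor peak specs as I remember them (819 GB/s for M3 Ultra, 800 GB/s for M2 Ultra, 273 GB/s for GB10), not numbers from these benchmarks, and the helper names are made up for illustration. I left out the Max chips because their bandwidth depends on the configuration.

```python
# Measured generation speeds from the list above (tok/s).
measured_tok_s = {
    "Mac Studio M3 Ultra": 32,
    "Mac Studio M2 Ultra": 22,
    "NVIDIA DGX Spark / GB10": 13,
}

# ASSUMPTION: vendor peak memory bandwidth in GB/s, not part of the benchmark data.
assumed_bandwidth_gb_s = {
    "Mac Studio M3 Ultra": 819,
    "Mac Studio M2 Ultra": 800,
    "NVIDIA DGX Spark / GB10": 273,
}


def tok_per_gb(measured, bandwidth):
    """Tokens generated per GB of peak bandwidth (hypothetical helper)."""
    return {name: measured[name] / bandwidth[name] for name in measured}


def ratio_vs(baseline, values):
    """Each value divided by the baseline machine's value (hypothetical helper)."""
    return {name: values[name] / values[baseline] for name in values}


if __name__ == "__main__":
    base = "NVIDIA DGX Spark / GB10"
    efficiency = tok_per_gb(measured_tok_s, assumed_bandwidth_gb_s)
    speedup = ratio_vs(base, measured_tok_s)
    bw_ratio = ratio_vs(base, assumed_bandwidth_gb_s)
    for name in measured_tok_s:
        print(
            f"{name}: {efficiency[name]:.3f} tok per GB, "
            f"{speedup[name]:.2f}x speed vs GB10, "
            f"{bw_ratio[name]:.2f}x bandwidth vs GB10"
        )
```

If the assumed specs are right, the Ultras' speed advantage over GB10 is smaller than their bandwidth advantage, which fits the bandwidth explanation for the gap but says little about how much headroom is left on GB10.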
Btw, TTS has come such a long way. @GoogleDeepMind cooked with gemini-3.1-flash-tts
I gave Codex my Google credentials and it one-shotted the Gemini TTS implementation
When I built this 4 years ago, Azure TTS was SOTA. Then @ElevenLabs came in and raised the bar super high. Now Google is coming for their lunch with controllable expressiveness at scale. I cheer for both!
Here is the Manim Voiceover demo from 4 years ago, now with Gemini TTS (sound on)