@onusoz · 2026/06/22 · 04:49 PM

gpt 5.5 is not naturally good at modeling and cannot create nice, simplified mathematical models completely autonomously.

I did a parameter sweep with gemma-4-31b-a4b on memory usage, output tok/s etc. while varying context window, concurrency and other parameters. It took quite a few tries, and I still do not trust the model that gpt5 fit to the data.

Besides, it measured Linux cgroup memory and not the actual GPU memory used, so the whole sweep is wasted...

Output tok/s looks more accurate though. Soon I will have a model that can give the optimal parameters over the space of context window <> concurrency <> tok/s <> memory usage.

Off to do another run.
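For anyone wanting to avoid the same cgroup mistake, here is a rough sketch of what a sweep grid plus a GPU-side memory reading could look like. It is not the script used for the sweep above. The axis values and the helper names (`sweep_grid`, `parse_gpu_memory_mib`, `read_gpu_memory_mib`) are made up for illustration, and it assumes an NVIDIA GPU with `nvidia-smi` on the PATH reporting plain numeric values.

```python
import itertools
import subprocess

# Hypothetical sweep axes: placeholder values, not the ones used in the post.
CONTEXT_WINDOWS = [4096, 8192, 16384, 32768]
CONCURRENCIES = [1, 2, 4, 8]


def sweep_grid(context_windows, concurrencies):
    """Return every (context_window, concurrency) pair to benchmark."""
    return list(itertools.product(context_windows, concurrencies))


def parse_gpu_memory_mib(nvidia_smi_output):
    """Parse used memory in MiB from nvidia-smi CSV output, one line per GPU.

    Assumes every line is a plain integer (no "[N/A]" entries).
    """
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def read_gpu_memory_mib():
    """Ask the NVIDIA driver for used GPU memory instead of reading cgroup counters."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory_mib(result.stdout)
```

Each grid point would then be: start the server with that configuration, drive load at that concurrency, and call `read_gpu_memory_mib()` while it is under load. The parsing is split out so it can be checked without a GPU attached.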
@onusoz · 2026/06/23 · 11:26 AM

One sweep over 100 samples takes around 4 hours. Next up: cross-reference the ground truth with predictions from hf-mem by @alvarobartt: github.com/alvarobartt/hf…
