gpt 5.5 is not naturally good at modeling and cannot create nice, simplified mathematical models completely autonomously.

I did a parameter sweep with gemma-4-31b-a4b, measuring memory usage, output tok/s, etc. while varying context window, concurrency, and other parameters.

It took quite a few tries, and I still do not trust the model that gpt5 fit to the data. Besides, it measured Linux cgroup memory and not the actual GPU memory used, so the whole sweep is wasted...

Output tok/s looks more accurate, though. Soon I will have a model that can give the optimal parameters over the space of context window <> concurrency <> tok/s <> memory usage.

Off to do another run.
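For the next run, one way to record GPU memory instead of cgroup memory is to poll `nvidia-smi`. This is a rough sketch, not what the sweep actually used: it assumes an NVIDIA GPU with `nvidia-smi` on the PATH, and the function names are made up for illustration. Note that `memory.used` is the per-GPU total, so it includes anything else running on the card.

```python
import subprocess


def parse_memory_used_mib(csv_text: str) -> list[int]:
    """Parse nvidia-smi CSV output: one integer MiB value per GPU, one per line."""
    # Assumes every GPU reports a number; some devices may report "[N/A]" instead.
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def gpu_memory_used_mib() -> list[int]:
    """Return the memory currently in use on each visible GPU, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_used_mib(out)


# Hypothetical usage inside the sweep loop, once per sample:
#   peak = max(peak, sum(gpu_memory_used_mib()))
```

Sampling this a few times per configuration and keeping the peak is probably closer to what matters than a single reading.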
One sweep over 100 samples takes around 4 hours. Next up: cross reference ground truth with predictions from hf-mem by @alvarobartt https://t.co/45WN0jtEQj
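The cross-referencing step could be as simple as a per-configuration relative error. A minimal sketch, assuming both the measurements and the predictions have already been collected into dicts keyed by `(context_window, concurrency)`. It does not call hf-mem, and it makes no assumption about hf-mem's output format. The function name and the dict layout are hypothetical.

```python
def relative_errors(measured: dict, predicted: dict) -> dict:
    """Map each config present in both dicts to (predicted - measured) / measured."""
    return {
        cfg: (predicted[cfg] - measured[cfg]) / measured[cfg]
        for cfg in measured
        # Skip configs with no prediction, and zero measurements (division by zero).
        if cfg in predicted and measured[cfg] != 0
    }


def worst_config(errors: dict):
    """Return the config with the largest absolute relative error, or None if empty."""
    if not errors:
        return None
    return max(errors, key=lambda cfg: abs(errors[cfg]))
```

A positive error means the prediction overestimates memory, which is the safer direction when picking parameters close to the GPU limit.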