@onusoz · /2026/06/22 · 04:49 PM

gpt 5.5 is not naturally good at modeling and cannot create simple, clean mathematical models completely autonomously.

I did a parameter sweep with gemma-4-31b-a4b on memory usage, output tok/s etc. while varying context window, concurrency and other parameters. It took quite a few tries, and I still do not trust the model that gpt5 fit to the data.

Besides, it measured Linux cgroup memory and not the actual GPU memory used, so the whole sweep is wasted...

Output tok/s looks more accurate though. Soon I will have a model that can give the optimal parameters over the space of context window <> concurrency <> tok/s <> memory usage.

Off to do another run.

@onusoz · /2026/06/22 · 07:10 AM

For my recent LLM leaderboard osolmaz-leaderboard.hf.space, I sum up all-time total downloads (or likes) across model variants, and then divide by the age of that model, i.e. "time decay" for popularity.

This gives a more time-agnostic metric for the popularity of that model. In an ideal ranking, older models that are not popular anymore should be demoted, like the 2-year-old Llama 3 models. If you don't do that, they might still occupy the top 10 needlessly, despite having been replaced by e.g. qwen in practice.

Thanks to that, qwen-3-6b, which came out 1 year ago and has 150m downloads, can surpass llama-3-1-8b, which came out 2 years ago and has 200m downloads.

More notes in my post: solmaz.io/popularity-ranking

@onusoz · /2026/06/22 · 06:58 AM

If you take the Most Downloaded Models of All Time, Llama 3.1 makes it to the Top 10 with around 200 million total downloads (ranking is done w.r.t. time-averaged downloads).

RIP Llama, you walked so @googlegemma and @Alibaba_Qwen can run.

Also a reminder that if you build your branding on top of open weight models developed by big corps, you might eventually be the de facto owner of that brand if they pull the plug on it. Like llama.cpp @ggml_org.

Huge fumble by Meta.

@onusoz · /2026/06/22 · 05:19 AM

My LLM leaderboard osolmaz-leaderboard.hf.space auto-discovers different variants of model releases, even if they are not linked by base_model.

From this, I found out that @RedHat_AI was the first to release an NVFP4 quantization for qwen3-6-35b-a3b.

Nice to see everything in one place.

@onusoz · /2026/06/22 · 02:24 AM

"big token will bless me with free tokens today inshallah" is not a healthy mindset to nurture.

don't get me wrong, I love the subsidies and the memes

@thsottiaux · Jun 21, 2026

Onur Solmaz · Post · /2026/06/22

Popularity ranking cheatsheet

I recently did some work ranking models on Hugging Face. While doing that, I remembered some concepts I had known years ago from studying recommender systems. But I couldn’t find any personal notes from that time.

So I’m leaving this cheatsheet for my future self, if I ever need it again.

The main idea with popularity metrics is that they are proportional to e.g. total likes/views, and inversely proportional to the time it took to accumulate those likes/views. Different platforms came up with many different ways to calculate this. And while you can model this in a way that maximizes some imaginary objective, what ends up being implemented first is usually the cheapest/most efficient algorithm.

Below are some examples, generated by GPT 5.5 xhigh.

<slop>

The abstract problem is:

\text{rank items by scarce attention}

A platform has many items and a limited front page. It needs to decide what deserves visibility now. That is usually not the same as “best,” “most useful,” or “most popular all time.”

A clean taxonomy:

\text{popular} = \text{received a lot of attention}

\text{hot} = \text{received a lot of attention recently}

\text{trending} = \text{receiving more attention than expected}

Here $A$ means attention: views, downloads, likes, votes, streams, sales, stars, comments, clicks, etc.

Raw popularity

This is the simplest ranking.

S = A

Use it when you want “biggest ever.”

Examples: most downloaded, most viewed, most sold, most starred.

Problem: old items dominate because they had more time.

Velocity

This measures speed of attention.

S = \frac{A}{t}

where $t$ is age.

Use it when you want “how fast is this spreading?”

A more general version, which also avoids dividing by zero at $t = 0$:

S = \frac{A}{(1+t)^\alpha}

If $\alpha = 1$, this is close to attention per unit time. If $\alpha < 1$, old items are penalized more gently. If $\alpha > 1$, new items are favored aggressively.
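As a rough illustration of how $\alpha$ changes the ordering, here is a minimal Python sketch. It assumes age is measured in years and reuses the download figures from the leaderboard note (150M over 1 year versus 200M over 2 years); the function name `velocity_score` and the chosen $\alpha$ values are illustrative only.

```python
def velocity_score(attention: float, age: float, alpha: float = 1.0) -> float:
    """Attention divided by a power of age: S = A / (1 + t)^alpha."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return attention / (1.0 + age) ** alpha


# Illustrative inputs: (total downloads, age in years).
models = {
    "qwen-3-6b": (150e6, 1.0),
    "llama-3-1-8b": (200e6, 2.0),
}

for alpha in (0.5, 1.0, 2.0):
    ranked = sorted(
        models,
        key=lambda name: velocity_score(*models[name], alpha),
        reverse=True,
    )
    print(f"alpha={alpha}: {ranked}")
```

With these numbers the older model still leads at $\alpha = 0.5$ and falls behind at $\alpha = 1$ and above, which is one way to see that the exponent is a policy choice rather than something the data dictates.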

This family is close to what Hacker News describes at a high level: HN says its basic ranking divides points by a power of time since submission, while also applying other factors such as flags, anti-abuse systems, demotions, account/site weighting, and moderator action.¹

Log-scaled velocity

Raw attention often follows a power law: a few items get enormous numbers. So platforms often compress the signal.

S = \frac{\log(1+A)}{(1+t)^\alpha}

This keeps huge items ahead, but prevents them from crushing everything else.

This is usually a better “hotness” formula than plain:

S = \frac{A}{t}

because it rewards scale without making scale the only thing that matters.
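A small sketch of the compression effect, assuming natural logarithms and ages in arbitrary units; `log_velocity` is a name made up for this example.

```python
import math


def log_velocity(attention: float, age: float, alpha: float = 1.0) -> float:
    """Log-scaled velocity: S = log(1 + A) / (1 + t)^alpha."""
    return math.log1p(attention) / (1.0 + age) ** alpha


# Same age, 1000x more attention: the score only roughly doubles.
small = log_velocity(1_000, age=0.0)
huge = log_velocity(1_000_000, age=0.0)
print(huge / small)

# A young, small item can outrank an old, huge one.
young_small = log_velocity(1_000, age=1.0)
old_huge = log_velocity(1_000_000, age=5.0)
print(young_small > old_huge)
```

Whether that last comparison is desirable depends on the product; the log makes freshness matter much more than it does with plain $A/t$.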

Recent-window popularity

Instead of lifetime attention, count only a recent window.

S = A_r

where $A_r$ is recent attention.

Or normalize by window size:

S = \frac{A_r}{w}

where $w$ is the time window.

Examples:

\text{most viewed today}

\text{most streamed this week}

\text{most downloaded in the last 30 days}
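If event timestamps are available, a recent-window count could be sketched as follows. The function name and the half-open window convention (`now - window < t <= now`) are assumptions for this example, not taken from any platform.

```python
from datetime import datetime, timedelta


def recent_window_score(event_times, now, window, per_day=False):
    """Count events inside the trailing window; optionally normalize per day."""
    count = sum(1 for t in event_times if now - window < t <= now)
    if per_day:
        return count / (window.total_seconds() / 86400.0)
    return float(count)


now = datetime(2026, 6, 22, 12, 0)
# Hypothetical download events, expressed as "days ago".
events = [now - timedelta(days=d) for d in (0, 1, 3, 10, 40)]

print(recent_window_score(events, now, timedelta(days=7)))
print(recent_window_score(events, now, timedelta(days=30)))
print(recent_window_score(events, now, timedelta(days=7), per_day=True))
```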

Spotify’s daily and weekly charts are this kind of family, though Spotify also says it uses chart-eligible streams and filtering formulas to protect chart integrity; it does not simply expose raw app stream counts as chart counts.²

Momentum

Momentum compares the current period with the previous period.

S = \frac{A_r + 1}{A_p + 1}

where $A_p$ is previous-period attention.

Example:

S = \frac{\text{downloads this week}+1}{\text{downloads last week}+1}

This finds things that are accelerating.

Problem: small items can look extreme. Going from 1 to 20 is a $20\times$ raw jump (a score of $21/2 = 10.5$ with the $+1$ smoothing), but it may still be tiny in absolute terms.

A safer version mixes ratio and volume:

S = \log(1+A_r)\frac{A_r+1}{A_p+1}
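A quick sketch of both momentum variants, using hypothetical weekly download counts. The function names are made up for illustration.

```python
import math


def momentum(recent: float, previous: float) -> float:
    """Smoothed period-over-period ratio: (A_r + 1) / (A_p + 1)."""
    return (recent + 1.0) / (previous + 1.0)


def volume_weighted_momentum(recent: float, previous: float) -> float:
    """Ratio scaled by log volume: log(1 + A_r) * (A_r + 1) / (A_p + 1)."""
    return math.log1p(recent) * momentum(recent, previous)


tiny = (20, 1)            # 1 -> 20 downloads
large = (15_000, 10_000)  # 10,000 -> 15,000 downloads

for name, fn in (("ratio", momentum), ("weighted", volume_weighted_momentum)):
    print(name, fn(*tiny), fn(*large), fn(*tiny) / fn(*large))
```

In this example the volume weighting narrows the gap between the tiny and the large item but does not reverse it, so a minimum-volume threshold may still be needed in practice.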

Trend detection

Trending is not just “popular.” It usually means “unusually active relative to expectation.”

S = \frac{A_r + 1}{E + 1}

where $E$ is expected attention.

If something normally gets 100 views/day and now gets 10,000, it is trending. If something normally gets 10 million views/day and now gets 10.5 million, it is popular but not necessarily trending.

Another version:

S = A_r - E

The ratio version favors surprise. The difference version favors large absolute surges.
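The two versions can disagree on the same data. A minimal sketch using the numbers from the example above (100 to 10,000 views/day versus 10 million to 10.5 million); how $E$ is estimated is left out here and would be an assumption in any real system.

```python
def trend_ratio(recent: float, expected: float) -> float:
    """Surprise relative to baseline: (A_r + 1) / (E + 1)."""
    return (recent + 1.0) / (expected + 1.0)


def trend_surplus(recent: float, expected: float) -> float:
    """Absolute surge over baseline: A_r - E."""
    return recent - expected


niche = (10_000, 100)                # (recent views/day, expected views/day)
giant = (10_500_000, 10_000_000)

print("ratio:  ", trend_ratio(*niche), trend_ratio(*giant))
print("surplus:", trend_surplus(*niche), trend_surplus(*giant))
```

The ratio ranks the niche item first, while the difference ranks the giant first.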

Google Trends is a useful example of normalization: it divides search interest by total searches for the relevant geography and time range, then scales results from 0 to 100, so large regions do not automatically dominate raw volume rankings.³

Hotness

Hotness combines attention and freshness.

A simple hotness score:

S = \log(1+A) - \lambda t

Popularity pushes up. Age pulls down.
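One property of this additive form is that a fixed age gap can always be offset by a fixed multiplicative factor in attention, namely $e^{\lambda \Delta t}$ applied to $1 + A$. A small sketch, with $\lambda = 0.1$ chosen arbitrarily:

```python
import math


def hotness_additive(attention: float, age: float, lam: float = 0.1) -> float:
    """Additive hotness: S = log(1 + A) - lambda * t."""
    return math.log1p(attention) - lam * age


lam = 0.1
age_gap = 10.0

fresh = hotness_additive(99, age=0.0, lam=lam)
# An item 10 time units older needs (1 + A) larger by exp(lam * age_gap) to tie.
needed_attention = 100 * math.exp(lam * age_gap) - 1
older = hotness_additive(needed_attention, age=age_gap, lam=lam)

print(fresh, older)
```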

Another common form:

S = \frac{\log(1+A)}{(1+t)^\alpha}

This says: “large attention matters, but old attention decays.”

Classic Reddit-style hotness

The old open-source Reddit code had a “hot” formula based on vote balance, logarithmic scaling, and time. In simplified notation:

S = \operatorname{sign}(u-d)\log_{10}(\max(|u-d|,1)) + \frac{T}{45000}

where $u$ is upvotes, $d$ is downvotes, and $T$ is the post's submission time in seconds since a reference epoch, so newer posts start from a higher score. This is specifically the archived open-source Reddit implementation, not a guarantee of current Reddit production ranking.⁴

The important idea: votes matter logarithmically, and time strongly affects ordering. This makes the ranking feel alive.
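A sketch of the simplified formula in Python. The epoch offset and the rounding to 7 decimals are recalled from the archived code and should be treated as assumptions; only the overall shape (sign, $\log_{10}$, seconds divided by 45000) comes from the formula above.

```python
import math

# Assumption: reference epoch offset as commonly quoted from the archived code.
REDDIT_EPOCH_OFFSET = 1134028003


def reddit_hot(ups: int, downs: int, posted_at_unix: float) -> float:
    """sign(u - d) * log10(max(|u - d|, 1)) + T / 45000."""
    balance = ups - downs
    order = math.log10(max(abs(balance), 1))
    sign = (balance > 0) - (balance < 0)
    seconds = posted_at_unix - REDDIT_EPOCH_OFFSET
    return round(sign * order + seconds / 45000, 7)


t = 1_700_000_000  # arbitrary submission time (Unix seconds)

# A 10x larger vote balance is worth the same as being 45000 s (12.5 h) newer.
print(reddit_hot(100, 0, t), reddit_hot(10, 0, t + 45000))
```

Because the time term grows without bound, a post never "decays" in absolute terms; it is simply overtaken by newer posts.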

Time-decayed attention

Instead of using age directly, you can make every attention event fade over time.

S = \sum A_i e^{-\lambda \Delta t_i}

where each attention event $A_i$ contributes less as it gets older.

Plain language: a view today counts more than a view last month.

This is good when you have event-level data.

A simpler approximate version:

S = A_r + \beta A_p

where $0 < \beta < 1$.

Example:

S = \text{attention this week} + 0.5 \times \text{attention last week}
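Both versions can be sketched in a few lines. Here $\lambda$ is expressed through a half-life (an assumption made for readability, with $\lambda = \ln 2 / \text{half-life}$), and the function names are illustrative.

```python
import math


def decayed_attention(events, now: float, half_life: float) -> float:
    """Sum of w_i * exp(-lambda * (now - t_i)) over (timestamp, weight) events."""
    lam = math.log(2) / half_life
    return sum(w * math.exp(-lam * (now - t)) for t, w in events if t <= now)


def two_bucket(recent: float, previous: float, beta: float = 0.5) -> float:
    """Coarse approximation: S = A_r + beta * A_p, with 0 < beta < 1."""
    return recent + beta * previous


# Hypothetical events: (day of event, weight), evaluated on day 14.
events = [(0.0, 1.0), (7.0, 1.0), (14.0, 1.0)]
print(decayed_attention(events, now=14.0, half_life=7.0))

# Hypothetical weekly totals: 100 this week, 60 last week.
print(two_bucket(100, 60))
```

With a 7-day half-life, an event from a week ago counts half as much as one from today, which matches the $\beta = 0.5$ example for weekly buckets.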

Steam’s real-time Top Sellers use this general idea in a revenue context: Steam says it rolls up player spending from the trailing 24 hours and gives extra weight to spending in the last 3 hours, across base game purchases, DLC, and in-game transactions.⁵

Quality-adjusted popularity

Sometimes attention alone rewards clickbait. So platforms mix attention with satisfaction.

S = A q

where $q$ is a quality signal.

Examples of $q$:

\text{like rate}

\text{rating}

\text{completion rate}

\text{return rate}

\text{positive vote share}
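A toy example of how a quality multiplier changes the order, assuming $q$ is normalized to $[0, 1]$ (here a completion rate). The items and numbers are hypothetical.

```python
def quality_adjusted(attention: float, quality: float) -> float:
    """S = A * q, with q assumed to be normalized to [0, 1]."""
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be in [0, 1]")
    return attention * quality


# Hypothetical items: (views, completion rate).
clickbait = quality_adjusted(1_000_000, 0.05)
solid = quality_adjusted(300_000, 0.60)

print(clickbait, solid, solid > clickbait)
```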

YouTube Charts disclose this kind of multi-signal logic: they consider view count, how quickly views are growing, where views come from, topic, age, and performance compared with recent uploads from the same channel; YouTube explicitly says the highest-view-count video is not necessarily ranked first.⁶

Confidence-adjusted ranking

This prevents tiny samples from winning too easily.

Bad ranking:

S = q

This lets an item with 2 perfect ratings beat an item with 10,000 very good ratings.

A Bayesian shrinkage version:

S = \frac{n}{n+k}q + \frac{k}{n+k}\bar{q}

where $n$ is sample size, $q$ is the item’s observed quality, $\bar{q}$ is the global average, and $k$ controls how much evidence you need before trusting the item.

Plain language: with little data, pull the score toward the average.
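A minimal sketch of the shrinkage formula, applied to the "2 perfect ratings versus 10,000 very good ratings" case. The global mean of 3.5 and $k = 50$ are arbitrary values picked for illustration.

```python
def shrunk_quality(q: float, n: int, global_mean: float, k: float = 50.0) -> float:
    """Bayesian shrinkage: S = n/(n+k) * q + k/(n+k) * global_mean, with k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return (n / (n + k)) * q + (k / (n + k)) * global_mean


GLOBAL_MEAN = 3.5  # assumed average rating across all items

two_perfect = shrunk_quality(5.0, n=2, global_mean=GLOBAL_MEAN)
many_good = shrunk_quality(4.6, n=10_000, global_mean=GLOBAL_MEAN)

print(two_perfect, many_good)
```

The item with 2 ratings is pulled almost all the way to the global mean, while the item with 10,000 ratings barely moves.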

IMDb is an example of this family in spirit: IMDb says it publishes weighted vote averages rather than raw averages, that not all votes have the same impact, and that it does not disclose the exact method.⁷

Wilson score ranking

For up/down votes, a common confidence-based formula is the Wilson lower bound.

Let:

p = \frac{u}{n}

where $u$ is positive votes and $n$ is total votes.

Then:

S = \frac{ p + \frac{z^2}{2n} - z\sqrt{\frac{p(1-p)}{n}+\frac{z^2}{4n^2}} }{ 1+\frac{z^2}{n} }

This estimates a conservative lower bound for true positive rate.

Use it when you want “best-rated with enough evidence,” not merely “highest average rating.”
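A sketch of the Wilson lower bound for a binomial proportion, with $z = 1.96$ (roughly a 95% interval) as an assumed default. Returning 0 for items without votes is a choice made for this example.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a positive-vote rate."""
    if total == 0:
        return 0.0  # assumption: unrated items sort last
    p = positive / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / (1 + z * z / total)


# 2 of 2 positive versus 9,000 of 10,000 positive.
print(wilson_lower_bound(2, 2))
print(wilson_lower_bound(9_000, 10_000))
```

The item with a perfect record but only 2 votes ends up well below the item with a 90% rate over 10,000 votes.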

Evan Miller’s “How Not To Sort By Average Rating” popularized this for web rankings, and the archived Reddit code includes a confidence sort using the Wilson method; Stack Overflow also discussed the same family of sorting methods for comments/answers.⁸

Category-normalized popularity

Raw popularity is unfair across categories.

S = \frac{A}{\bar{A}}

where $\bar{A}$ is average attention in that category.

Example: a niche item with 10,000 downloads may be huge in its category, while a general consumer app with 10,000 downloads may be irrelevant.

A velocity version:

S = \frac{A/t}{\bar{A}/\bar{t}}

Use this for “popular relative to peers.”
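A sketch of the basic $A / \bar{A}$ normalization, using the arithmetic mean per category. The item names, categories, and counts are invented for the example; a median or a trimmed mean might be a better baseline when a category has heavy outliers.

```python
from collections import defaultdict
from statistics import mean


def category_normalized(items):
    """items: list of (name, category, attention) -> {name: A / mean(A in category)}."""
    by_category = defaultdict(list)
    for _, category, attention in items:
        by_category[category].append(attention)
    average = {c: mean(values) for c, values in by_category.items()}
    return {
        name: (attention / average[category] if average[category] > 0 else 0.0)
        for name, category, attention in items
    }


items = [
    ("niche-tool", "niche", 10_000),
    ("niche-b", "niche", 1_000),
    ("niche-c", "niche", 1_000),
    ("app-a", "consumer", 10_000),
    ("app-b", "consumer", 5_000_000),
    ("app-c", "consumer", 990_000),
]

scores = category_normalized(items)
print(scores["niche-tool"], scores["app-a"])
```

Both `niche-tool` and `app-a` have 10,000 downloads, but only the first stands out relative to its peers.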

Spotify’s Local Pulse is a real-world example of relative popularity: Spotify says Local Pulse shows songs uniquely popular in a city relative to their overall popularity.²

Composite ranking

Most mature platforms do not use one pure formula. They combine signals.

S = a\log(1+A) + b\log(1+A_r) + cq - d\log(1+t)

where $a,b,c,d$ are weights.

Then platforms add penalties:

S = S - \text{spam penalty} - \text{abuse penalty} - \text{duplicate penalty}
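A sketch of the composite score with penalties subtracted at the end. The default weights below are placeholders with no empirical basis, and `penalties` stands in for whatever spam, abuse, or duplicate signals a platform computes.

```python
import math


def composite_score(
    total: float,
    recent: float,
    quality: float,
    age: float,
    a: float = 1.0,
    b: float = 2.0,
    c: float = 3.0,
    d: float = 1.5,
    penalties: float = 0.0,
) -> float:
    """S = a*log(1+A) + b*log(1+A_r) + c*q - d*log(1+t) - penalties."""
    base = (
        a * math.log1p(total)
        + b * math.log1p(recent)
        + c * quality
        - d * math.log1p(age)
    )
    return base - penalties


# Hypothetical item: 50k lifetime downloads, 2k this week, q = 0.8, 30 days old.
clean = composite_score(50_000, 2_000, 0.8, 30.0)
flagged = composite_score(50_000, 2_000, 0.8, 30.0, penalties=5.0)
print(clean, flagged)
```

In practice the weights would be tuned against some target, such as click-through or editorial judgment, rather than set by hand.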

Product Hunt is explicit that its homepage leaderboard changes based on upvotes, comments, time since submission, and other factors, while withholding exact details to reduce gaming.⁹

Amazon’s book sales ranking is also a composite/decayed-relative system: Amazon says rankings reflect recent and historical activity, recent activity is weighted more heavily, ranks are relative to other books, and rank can change even if the item’s own activity stays constant.¹⁰

Examples out in the wild

Platform	Ranking type	Disclosed logic
Reddit Hot, classic open-source version	Hotness	Vote balance, log scaling, and time term.⁴
Hacker News	Hotness / age decay	Points divided by a power of time, plus flags, anti-abuse, demotions, weighting, moderator action.¹
Product Hunt	Launch hotness	Upvotes, comments, time since submission, and undisclosed anti-gaming factors.⁹
YouTube Charts	Trending / hotness	View count, growth speed, traffic source, topic, video age, channel-relative performance, safety filters.⁶
GitHub Trending	Developer attention	The public page exposes total stars/forks and “stars today”; GitHub does not publish a full ranking formula there.¹¹
Google Trends	Normalized search interest	Search interest divided by total searches for that time/place, scaled 0–100.³
Spotify Charts	Stream popularity with filtering	Chart-eligible streams; local charts; Local Pulse is city popularity relative to overall popularity.²
Steam Top Sellers	Revenue hotness	Trailing 24h revenue, with extra weight on last 3h; includes DLC and in-game transactions.⁵
Amazon Best Sellers Rank	Decayed relative sales/activity	Recent and historical activity, recent activity weighted more heavily, rank relative to peers.¹⁰
IMDb ratings	Weighted reputation	Weighted vote averages, not raw averages; exact method undisclosed.⁷

The design choices

Every attention-ranking system chooses answers to these questions:

Choice	Meaning
What counts as attention?	Views, downloads, stars, likes, votes, sales, comments, plays, installs.
Is old attention still valuable?	Use lifetime totals if yes; decay if no.
Do you care about speed?	Use velocity or recent-window ranking.
Do you care about surprise?	Use trending vs expected baseline.
Do you care about quality?	Mix in ratings, retention, completion, votes, reviews.
Do you need confidence?	Use Bayesian shrinkage or Wilson scoring.
Do categories differ?	Normalize within category, geography, language, genre, or cohort.
Can it be gamed?	Add anti-spam filters, trust weighting, duplicate penalties, anomaly detection.
Is it public or personalized?	Public rankings use global signals; feeds use global signals plus user relevance.

Practical formula families

For all-time popularity:

S = A

For average popularity over lifetime:

S = \frac{A}{t}

For hotness:

S = \frac{\log(1+A)}{(1+t)^\alpha}

For recent popularity:

S = A_r

For momentum:

S = \frac{A_r+1}{A_p+1}

For trending:

S = \frac{A_r+1}{E+1}

For quality-adjusted popularity:

S = A q

For confidence-adjusted quality:

S = \frac{n}{n+k}q + \frac{k}{n+k}\bar{q}

For relative category popularity:

S = \frac{A}{\bar{A}}

For a practical general-purpose front page:

S = a\log(1+A) + b\log(1+A_r) + cq - d\log(1+t)

Then apply filters and penalties.

Best mental model

There are three core ranking concepts:

\textbf{Popular: } S = A

“Has accumulated a lot of attention.”

\textbf{Hot: } S = \frac{\log(1+A)}{(1+t)^\alpha}

“Has a lot of attention for its age.”

\textbf{Trending: } S = \frac{A_r+1}{E+1}

“Is getting more attention than expected.”

Most real platforms are combinations of these, with normalization, confidence adjustment, and anti-gaming rules layered on top.

</slop>