Opinion · AI Models · Product · May 17, 2026 · 9 min read

You don't need every AI model. You need the right five.

2.88 million models on Hugging Face. 359 on the leaderboard. You can do almost any job with five. The case for curation over catalogs.

By Atul

Three numbers that frame the picker

2.88M models hosted on the Hugging Face Hub as of May 2026, up from one million in late 2024. (huggingface.co/models)

10× purchase lift when Iyengar's classic jam study cut the tasting display from 24 jars to 6. (Iyengar & Lepper, JPSP 2000)

5 models you actually need to do almost every job you reach for AI to do. (Source: the end of this post.)

Quick. Name the best image model for product photography. Name the best video model for an eight-second cinematic B-roll. Name the best text model for refactoring a TypeScript file. If you had to Google any of them, the dropdown lost.

The dominant pitch from AI tools in 2026 is “we give you access to everything.” Every provider, every version, every fine-tune. That sounds like generosity; it’s actually a bill. You pay it in decision fatigue, in stale “best model for X” trivia that rots in three weeks, in time spent A/B-ing two options for a task where either would have been fine. A deliberately small, hand-picked list of “use this for this right now” beats a 400-row dropdown every single time. This post is about why.

[Image: a wall of vinyl records arranged in tight rows in a small record shop, dense and selectively curated.] The Criterion Collection of vinyl: thousands of records exist; a good shop sells a few hundred. Photo by Florencia Viadana on Unsplash.

The catalog is genuinely absurd now

Start with the headline number. Hugging Face hosts 2,883,687 models as of May 2026, up from one million in late 2024 and two million in mid-2025. Most are forks, quantizations, and fine-tunes nobody will run. But the “serious” sub-list isn’t small either: the Artificial Analysis leaderboard currently tracks 359 large language models, 224 of them with open weights. And that’s before you even touch image, video, audio, or music.

Five modalities · the “serious” sub-list

| Modality | What you’d see in the dropdown | The shape of the picker |
| --- | --- | --- |
| Text / chat / code | GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, Grok 4, DeepSeek V3.2, Llama 4, Mistral Large, Qwen 3, Command R+, Phi-4 | 359 LLMs tracked on the Artificial Analysis leaderboard. |
| Image | Flux 2 Pro, Midjourney V8, Imagen 4, Ideogram 3, GPT Image 2, DALL-E 3, SD 3.5, Adobe Firefly, Recraft v3 | Different leader for photorealism, stylized art, and text-in-image. |
| Video | Veo 3.1, Kling 3.0, Runway Gen-4.5, Luma, Hailuo, Pika | Veo all-rounder, Runway for fine control, Kling for value. |
| Music / audio | Suno v5, Udio v4, ElevenLabs Music, Stable Audio, AIVA | Suno leads on vocals; Udio leads on producer-grade inpainting. |
| Voice / TTS | ElevenLabs v3, OpenAI Voice, Play.ht, Resemble, OpenVoice | Different winner for emotion, for cloning, for very long form. |

You don’t need to read the table line by line. The point is the feeling. Eight or twelve “serious” options in every modality you might reach for, each with a partisan blog post arguing it’s the best, each with weekly updates that nudge the ranking around the top three. Stand in front of that wall on a Tuesday afternoon with a thing you need to ship by 4pm, and the catalog becomes the problem.

“Access to everything” is a tax

The pitch is intuitive: more choice equals more power. The behavioral research has been pushing back on that intuition for twenty-five years. In 2000, Sheena Iyengar and Mark Lepper set up two tasting displays in a Menlo Park grocery store: one with 24 jams, one with 6. The 24-jar table drew bigger crowds. The 6-jar table converted them at ten times the rate (30% bought a jar versus 3%). Bigger menu, more attention, fewer decisions made. Barry Schwartz turned that finding into a book, The Paradox of Choice. UX designers pair it with a much older result, Hick’s Law: the time it takes to decide grows, roughly logarithmically, with the number of options you can see.
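To put rough numbers on Hick’s Law, here is a minimal sketch of its common form, T = a + b · log2(n + 1). The coefficients `a` and `b` are illustrative placeholders rather than measured values, and the function name is made up for this example.

```python
import math


def hick_decision_time(n_options: int, a: float = 0.2, b: float = 0.15) -> float:
    """Estimate decision time in seconds as T = a + b * log2(n + 1).

    a (base reaction time) and b (seconds per bit of choice) are
    assumed values for illustration only; real ones are fit per task.
    """
    if n_options < 1:
        raise ValueError("need at least one option")
    return a + b * math.log2(n_options + 1)


# Compare the jam tables with two dropdown sizes mentioned in this post.
for n in (6, 24, 47, 400):
    print(f"{n:>3} options -> {hick_decision_time(n):.2f} s")
```

The curve is logarithmic, so it flattens as the menu grows. It also models quick choices among familiar options, which a dropdown of unfamiliar model names arguably is not, so treat the output as a lower bound on the cost rather than a prediction.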

That’s the textbook version. The lived version is worse, because AI picker dropdowns have three properties that compound the problem:

1. The right answer changes constantly. The Artificial Analysis leaderboard reshuffled its top ten three times in Q1 2026 alone; even a routine vote-pipeline change at LMSYS in January shifted Elo scores by 30+ points on models that hadn’t changed at all. A list you compiled six weeks ago is already wrong.
2. Marginal differences dominate the picker. For probably 80% of real tasks you’d open the dropdown for, the top three models in a modality produce output you couldn’t tell apart in a blind test. The picker pretends they’re different. Mostly they aren’t.
3. Default paralysis hits new users hardest. A power user can ignore 90% of the dropdown. A first-time user freezes. The “we have 47 providers” pitch is the worst onboarding experience in software.

Every decision you make at the model picker is a decision you didn’t make about your actual work. That’s the tax. It looks free because nobody invoices you for it.
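For a sense of how small a 30-point Elo gap is, here is a short sketch that converts a rating gap into an expected head-to-head win rate. It assumes the standard Elo logistic curve with a 400-point scale; a given leaderboard may fit its ratings differently, and the function name is hypothetical.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model for a given rating gap.

    Assumes the classic Elo formula: 1 / (1 + 10 ** (-gap / scale)).
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


for gap in (0, 30, 100):
    print(f"{gap:>3}-point gap -> {elo_expected_score(gap):.1%} expected win rate")
```

Under that assumption, a 30-point gap means the higher-rated model wins about 54% of blind comparisons. That is barely better than a coin flip, and it is also the size of the shift the pipeline change produced on models that had not changed.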

Curation is the actual product

The categories that figured out this problem long ago all share a shape. They aggressively exclude.

The Criterion Collection has shipped roughly 1,500 titles since 1984. Netflix has more films in any given month. Criterion is more valued. Wirecutter publishes one pick per category, not a comparison sheet. Pitchfork’s Best New Music is a column, not a directory. The Hacker News front page is thirty items. In each case, the product isn’t the catalog; the product is the curation applied to the catalog. The taste is the value-add. The aggressive exclusion is the feature.

[Image: a chef plating a small composed dish at a restaurant pass, narrow focus on the food.] Five courses, picked. Not forty-seven dishes on a buffet line. Photo by Jay Wennington on Unsplash.

Apply the same lens to AI. A tool that says “here are 47 image models, good luck” is the Netflix back catalog. A tool that says “for product shots use this; for stylized covers use that; for poster typography use the third one” is the Criterion shelf. The first feels generous and is exhausting. The second feels narrow and is liberating. Restaurant menus that exceed forty items usually do so because the kitchen wants to seem competent at everything; every chef who has worked in one will tell you it’s the worst dish on the menu that defines your reputation, not the best.

The frontier moves. Maintenance is the work.

Anyone can publish a list. The hard part is rotating it. Flux 2 Pro leapfrogged Midjourney v6 on photorealism in late 2025; six months earlier the answer was inverted. Veo 3.1 took the cinematic video crown from Sora. OpenAI shipped GPT-5.5 in March 2026 and reclaimed the Intelligence Index top spot from Claude; Claude Opus 4.7 then took back the SWE-bench Pro lead at 64.3%. Anyone still running a serious coding workflow on the “best” model from January 2026 is already behind.

Same job, three different answers in twelve months

| Job | May 2025 | Nov 2025 | May 2026 |
| --- | --- | --- | --- |
| General writing & chat | Claude 3.5 Sonnet | GPT-5 | Claude Opus 4.7 / GPT-5.5 |
| Code refactoring | Claude 3.5 Sonnet | Claude Sonnet 4 | Claude Opus 4.7 (SWE-bench Pro 64.3%) |
| Photorealistic image | Midjourney v6 | Flux 1.1 Pro | Flux 2 Pro |
| Cinematic video clip | Runway Gen-3 | Sora | Veo 3.1 |
| Song with vocals | Suno v3.5 | Suno v4 | Suno v5 |

A list maintained by people whose actual job is to watch the frontier is wildly more valuable than the same list scraped quarterly. This is why the model-picker pattern fails as a UX choice: the dropdown is a snapshot, but the underlying truth is a moving target. Curation has to be a job, not a one-time table in a launch blog post.

What a curated list looks like in May 2026

Concrete is better than abstract. Here’s the small list, by job, as of the day this post went up. It will be wrong by the time you read it, and that’s the point. The list rotates. What doesn’t rotate is the shape of the list: one pick per job, six jobs, justified in a sentence.

The short list · one pick per job · May 2026

| Job | Pick today | Why |
| --- | --- | --- |
| Writing, chat, reasoning | Claude Opus 4.7 | Highest-quality long-form prose; leads SWE-bench Pro at 64.3% so it doubles as the code model. |
| Quick lookups & cheap iteration | Gemini 3.1 Flash | Frontier-adjacent quality at sub-$2 / million tokens. Use when latency or budget matters more than the last 4%. |
| Photo, product shot, hyperreal | Flux 2 Pro | Best photorealism, camera-accurate optics. Beats Midjourney for anything that needs to look like a photograph. |
| Stylized art, posters, covers | Midjourney V8 | Distinct aesthetic + native 2K. Still the right call when the brief is “make it look intentional”. |
| Cinematic video clip (up to 10s) | Veo 3.1 | Strongest all-rounder. Prompt adherence, 4K, native audio. Reach for Kling 3.0 only when you need many cheap iterations. |
| Song with vocals | Suno v5 | Vocal quality bar in 2026. Use ElevenLabs Music if you specifically need cleared commercial rights for YouTube. |

Sources: Artificial Analysis: GPT-5.5 leadership, Midjourney V8 vs Flux 2026, Best AI video generator 2026, Suno vs Udio vs ElevenLabs 2026.

That’s it. Six jobs, six picks. No model picker. No decision-fatigue tax. The trade-off is real (you don’t get to A/B Claude against Gemini for your daily writing), and for almost everyone, almost all the time, it is a bargain. The cost of standing in front of the wall every Tuesday vastly outweighs the cost of using a model that is 4% behind the current leader on a benchmark you don’t actually run. The right question isn’t “which model is best?” The right question is “what am I trying to do?” A good tool answers the second.

[Image: a spotlit museum vitrine holding three small artefacts on a clean white plinth.] A museum vitrine with three objects. The wall behind it would hold a hundred. Photo by Grant Ritchie on Unsplash.

Where curation is the wrong call

Curation is for the 80%, not everyone. Three honest cases where the narrow list breaks down:

1. You have a genuinely specialist need. A medical fine-tune. A Japanese-first text model. A regulated provider on a specific compliance list. A music tool with cleared commercial rights for YouTube monetization: ElevenLabs Music ships with the Merlin and Kobalt deals that Suno lacks. The curated default fails these jobs; you need the broader catalog.
2. Editorial bias is real and worth naming. Any curator’s picks reflect criteria: output quality, latency, cost, availability, safety profile, license terms. Publish the criteria, not just the picks. Readers should be able to disagree with the rubric, not just the choices it produces. We’ve laid ours out in the modality field guide.
3. Curation can calcify. The mitigation is transparency about when the list was last reviewed and what changed. A curated list with no date stamp ages worse than a dropdown.
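One way to make the date stamp enforceable is to store a review date with every pick and flag anything past a review window. The sketch below is a hypothetical structure, not a description of any real tool. The six-week window borrows the “six weeks ago” figure from earlier in this post, and the review dates in the example are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CuratedPick:
    job: str
    model: str
    last_reviewed: date  # date a human last re-checked this pick


def stale_picks(picks, today, max_age=timedelta(weeks=6)):
    """Return the picks whose last review is older than max_age."""
    return [p for p in picks if today - p.last_reviewed > max_age]


# Invented review dates, purely for illustration.
picks = [
    CuratedPick("Song with vocals", "Suno v5", date(2026, 5, 17)),
    CuratedPick("Cinematic video clip", "Veo 3.1", date(2026, 3, 1)),
]
overdue = stale_picks(picks, today=date(2026, 5, 17))
print([p.model for p in overdue])  # prints ['Veo 3.1']
```

A stale flag does not mean the pick is wrong, only that nobody has confirmed it recently, and that is exactly what a reader needs to know.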

Pick a job. Not a model.

This is the bet we made with CSuite. Instead of bolting on every provider’s catalog, we ship a deliberately small, hand-picked roster across text, image, video, and audio. The picks rotate when something genuinely better arrives. You don’t tune dials; you pick a job and the tool routes you to the model that’s currently best for it. It’s the same posture that moving AI local is starting to bring to the rest of the stack: fewer moving parts, owned by you, designed to disappear.
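In its simplest form, “pick a job, not a model” is a lookup keyed by job. The sketch below illustrates that idea only; it is not how CSuite is implemented. The job keys and the `route` function are hypothetical, and the model names are copied from the May 2026 short list.

```python
# Picks copied from the May 2026 short list; the job keys are made up.
SHORT_LIST = {
    "writing": "Claude Opus 4.7",
    "quick-lookup": "Gemini 3.1 Flash",
    "photo": "Flux 2 Pro",
    "stylized-art": "Midjourney V8",
    "video-clip": "Veo 3.1",
    "song": "Suno v5",
}


def route(job: str) -> str:
    """Return the current pick for a job, failing loudly for jobs off the list."""
    try:
        return SHORT_LIST[job]
    except KeyError:
        known = ", ".join(sorted(SHORT_LIST))
        raise ValueError(f"no curated pick for {job!r}; known jobs: {known}") from None


print(route("photo"))  # prints "Flux 2 Pro"
```

Rotating a pick means changing one value in the table while callers keep asking for the same job. Jobs outside the table raise an error instead of guessing, which matches the specialist cases where the broader catalog is the right tool.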

Spotify won by having every song, the argument goes; surely AI should copy the move. Spotify won by having every song and Discover Weekly, an editorial algorithm whose entire purpose is to do the picking for you. The model picker is the part Spotify quietly replaced. The catalog isn’t the product. The taste is.

Next Tuesday at 4pm, you’ll have a thing to ship. The tool that helps you ship it isn’t the one with the longest dropdown. It’s the one where you don’t see a dropdown at all.
