News · AI Models · Video AI · May 21, 2026 · 12 min read

The last three months in AI: Video models

OpenAI killed Sora. The Chinese open-weight stack took the top three on the arena. And a 10-second clip now starts at fifty cents.

By Atul

Spring 2026 · Video models

Feb 20 to May 20

OpenAI killed Sora. The Chinese open-weight stack took the top three, and the floor on a 10-second clip fell to fifty cents.

| Top 3 · Feb 20 | Top 3 · May 20 |
| --- | --- |
| Sora 2 (OpenAI) | HappyHorse-1.0 (Alibaba) |
| Veo 3.1 (Google) | Seedance 2.0 720p (ByteDance) |
| Hailuo 2.3 (MiniMax) | Kling 3.0 Omni (Kuaishou) |

The biggest video-AI story of the quarter happened on a Sunday in late April. On April 26, 2026, OpenAI shut down Sora: the app, the iOS experience, and the consumer product all gone in a single Help Center notice. The Sora 2 model and Videos API are slated to follow on September 24. Disney, which had reportedly signed a billion-dollar partnership with the product, found out less than an hour before the public announcement. Seven months earlier, Sora 2 had been the model every other lab measured itself against.

On the same Sunday, almost to the hour, a 15B-parameter Alibaba video model called HappyHorse-1.0 went live on fal while holding the #1 spot on the Artificial Analysis Video Arena, which it had occupied since its anonymous debut nineteen days earlier. That is the quarter in miniature: the American consumer video product was turned off, and the Chinese open-weight stack took the top three slots on the leaderboard. If you only have an hour for a video-AI update this quarter, spend it with Wan 2.7, Hunyuan Video 1.5, and the Veo 3.1 Lite pricing card. Those three explain almost everything else.

Image: A close-up of a movie clapperboard, slate-style markings on top. Caption: Quarterly roundup, clapperboard style. Photo by Leuchtturm Entertainment on Unsplash.

April 26 was the sliding door

OpenAI’s reasoning for the shutdown was never stated in detail. The reporting points to economics: an estimated burn of around one million dollars a day in inference costs against thin and shrinking usage, a peak-to-trough collapse in users from roughly a million to under five hundred thousand, and the cancellation of the Disney deal. The cost of generating cinematic-tier video at consumer scale, on closed-source infrastructure, turned out to be more than the market was ready to pay. The same quarter, Tencent shipped an open-weight 6-second 720p generator that runs in 75 seconds on a single RTX 4090. That contrast is the story.

The rest of the quarter rhymes. Two American flagships went quiet on the video side (OpenAI by walking away, Meta by never showing up at the frontier) while five Chinese labs shipped genuinely new models, two of them open-weight. The leaderboard’s top three at publication are Alibaba, ByteDance, and Kuaishou; the highest-ranked American model on the no-audio board is xAI’s Grok Imagine, at #4. Twelve months ago, the same board read Sora, Veo, and Runway. That is a structural shift, not a one-quarter blip.

What shipped: three flagships, three Chinese bets

Three new flagship video models shipped in the window, two from Alibaba and one from Tencent. The bets are almost orthogonal: one targets the pro suite, one targets the leaderboard, one targets your laptop.

Three new flagships · Apr 1 to May 4

| Model | Lab | Date | The bet |
| --- | --- | --- | --- |
| Wan 2.7 | Alibaba | Apr 1–6 | Open-weight under Apache 2.0, four-model suite (T2V, I2V, ref-to-video with voice clone, instruction edit), API from $0.10/sec. |
| HappyHorse-1.0 | Alibaba (ATH) | Apr 7 | Closed beta, took #1 on the Video Arena on day one and held it. 1080p, lip-sync in seven languages, weights promised under Apache 2.0. |
| Hunyuan Video 1.5 | Tencent | May 4 | 8.3B params, open-weight, 6-second 720p clips in 75 seconds on a single RTX 4090. A serious video model that runs on the gaming PC under your desk. |

Wan 2.7 arrived as a four-model suite between April 1 and April 6, announced by Alibaba on the 6th and live on Model Studio and wan.video by April 22. The headline is breadth: text-to-video, image-to-video, a reference-to-video model with voice cloning, and instruction-based video editing. All four are Apache 2.0, all four support native audio synced inside the diffusion pass, and all four are exposed via API from $0.10 a second. It also ships a “Thinking Mode” that plans composition before generating, the same chain-of-thought-for-pixels pattern that OpenAI shipped on the image side in April.

HappyHorse-1.0 appeared on the Video Arena anonymously on April 7. By the 10th, CNBC, Bloomberg, and The Information had confirmed Alibaba as the lab behind it, specifically the Future Life Lab inside Alibaba’s Taotian (Tmall + Taobao) group. It is a 15B-parameter unified single-stream transformer that generates video and synced audio from a single prompt, supports lip-sync across seven languages, and runs at roughly 38 seconds for a 1080p clip on a single H100. The team has committed to an Apache 2.0 weight release on GitHub; as of publication, the timing is unstated. It has held #1 on the no-audio arena since its first day. No public model has unseated it.

Hunyuan Video 1.5 is the one that matters most for the long arc. Tencent shipped it on May 4 with full weights and code for Windows and Linux. The headline number is the hardware floor: 8.3B parameters, 6 seconds of 720p video, 75 seconds of inference on a single RTX 4090. That is not a frontier flagship (HappyHorse and Seedance are both visibly ahead on the leaderboard), but it is the first credible open-weight video model that runs on a gaming PC, which is a different category of consequence. On the video side, the case for running models on your own machine moved from “next year” to “today” in the space of one release.
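To put the hardware floor in throughput terms, here is a minimal sketch that turns the quoted figures (a 6-second clip in 75 seconds on one RTX 4090) into a real-time factor and an hourly yield. The function names are illustrative, and the numbers assume back-to-back generation with no loading, queueing, or retry overhead, which a real workflow would add.

```python
# Figures quoted in the article for Hunyuan Video 1.5 on a single RTX 4090.
CLIP_SECONDS = 6          # length of one generated 720p clip
INFERENCE_SECONDS = 75    # wall-clock time to generate it


def realtime_factor(inference_seconds: float, clip_seconds: float) -> float:
    """Seconds of compute needed per second of finished video."""
    return inference_seconds / clip_seconds


def clips_per_hour(inference_seconds: float) -> float:
    """Upper bound on clips per hour, assuming zero overhead between runs."""
    return 3600 / inference_seconds


if __name__ == "__main__":
    factor = realtime_factor(INFERENCE_SECONDS, CLIP_SECONDS)
    hourly_clips = clips_per_hour(INFERENCE_SECONDS)
    print(f"real-time factor: {factor:.1f}x slower than playback")
    print(f"clips per hour:   {hourly_clips:.0f}")
    print(f"video per hour:   {hourly_clips * CLIP_SECONDS:.0f} seconds")
```

At these figures a single card produces a little under five minutes of footage per hour, which is enough for prototyping and small-batch work but well short of real-time.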

Image: A working film set with a camera and crew gathered around a shot. Caption: The cost of standing up a set fell faster than the cost of the talent on it. Photo by Jakob Owens on Unsplash.

What got better: everyone’s mid-tier leveled up

Below the headline launches, the pricing tier compressed and the avatar end of the stack matured fast.

Veo 3.1 Lite went live on Vertex AI and the Gemini API on March 31 at $0.05 a second, the lowest sticker price on any name-brand video API. The same week, Google added a standalone upscaling endpoint (1080p or 4K) that works on any video, AI-generated or not, and wired Veo 3.1 into Google Vids on April 2. For office-tier video (product demos, training clips, social cuts) the workflow is now “type a prompt into Vids,” not “export, edit, upload.”

Seedance 2.0’s global rollout on April 15 took ByteDance’s February launch international (CapCut and Dreamina integrations included) while pointedly skipping the US over the lingering TikTok divestment legislation. Seedance generates 15-second multi-shot clips with stereo dialogue, music, and ambient sound in a single pass and is the leading model on the audio side of the leaderboard.

HeyGen Avatar V shipped on April 8 with the first credible solution to identity drift on long video: a 0.840 face similarity score against the source actor across multi-minute scenes, and phoneme-level lip-sync in 175+ languages from a 15-second phone clip. Synthesia’s Express-2 engine ended the talking-head era for enterprise training in April with full-body gestures and micro-expressions at 1080p/30fps. Hedra opened Live Avatars at $0.05 a minute, with sub-100ms latency for streamed avatar video.

xAI’s Grok Imagine Quality Mode reached general availability on April 3 and shipped an enterprise API on May 6, putting xAI on the no-audio leaderboard at #4, the only American lab in the top five. Pro mode at 1080p was announced for late spring.

Under the hood: four shifts that landed in the same quarter

The catalog aside, the more interesting story is what the labs collectively decided about how a video model should be built.

Audio is in the diffusion pass now. Kling 3.0, Wan 2.7, Seedance 2.0, HappyHorse-1.0, and Veo 3.1 all generate dialogue, ambient sound, and music as part of the same generation step that produces the frames. The post-production audio bolt-on (lip-sync a generated video to a separately generated audio track) is on the way out for new releases. The label every model now uses is “native audio,” and the quality jump versus 2025-era bolt-ons is large enough that the labels are deserved.

Thinking mode is migrating from text to video. Wan 2.7 ships an explicit planning pass that builds a composition plan before the diffusion model runs, exactly the same shape as the reasoning loops in OpenAI’s gpt-image-2 and the major LLM flagships. It is the first video model to expose the pattern in production, and three independent benchmark write-ups attribute its prompt-adherence win to the planning step rather than the backbone.

Open-weight video crossed the consumer-GPU line. Hunyuan Video 1.5’s 8.3B parameter count, FP8 quantisation, and selective tile attention together knock the floor down to a single 24GB consumer card. A 6-second clip in 75 seconds on a $1,500 GPU was not possible at any quality bar at the start of the quarter. It is now.

Real-time video-to-video became a thing. Decart’s MirageLSD re-renders streamed video at 20 FPS with under 100ms of latency: transform a live Zoom into a watercolour, re-skin a live gameplay feed, swap the season on a streaming camera. Mirage is more curiosity than tool today, but it is the first system to run a diffusion model in the live-streaming pipeline at all. Expect that envelope to expand.

Image: A non-linear editing bay with a Premiere Pro timeline view filled with clips. Caption: Under the hood, the pipeline collapsed. Audio, motion, edits, and identity now happen in a single forward pass. Photo by Peter Stumpf on Unsplash.

Trend lines: four patterns across the quarter

Across the catalog, four things rhyme. None of them are obvious from any single launch.

1. Cost per second collapsed. A year ago, generating a 1080p second of AI video without obvious artifacts cost roughly a dollar on Runway and $0.75 on Sora. Veo 3.1 Lite is $0.05. Wan 2.7 starts at $0.10. Kling 3.0 is $0.07 to $0.10 depending on tier. The cost of a one-minute clip in good quality dropped from roughly $60 in Q4 2025 to roughly $3 at the May 20 floor: about 95% in two quarters. The companion piece on cost per task rather than per token is even more applicable to video than to text: the right metric is cost per finished clip, and that number is moving fast.

2. The leaderboard turned Chinese. Of the top five models on the no-audio Video Arena on May 20, four are Chinese-lab work (HappyHorse, Seedance, two Kling tiers) and the fifth is xAI. On the audio side, the top three are HappyHorse, Seedance, and Kling. Veo 3.1 sits at #4. Sora is gone. Runway Gen-4.5 is outside the top ten on the public arena. Twelve months ago the same board read the opposite. This is the most decisively Chinese moment any AI modality has had since the original Qwen open-weight push.

3. Open weights are real on video now. Wan 2.7 is Apache 2.0 and HappyHorse-1.0 has committed to Apache 2.0 weights on GitHub. Hunyuan Video 1.5 runs on a single 4090. There is no longer a practical reason a serious team needs to keep a video API dependency for prototyping or small-batch generation. The supply chain for that capability now goes through Hugging Face rather than a vendor contract. Same trajectory as the text installment’s open-weight crossover a quarter earlier, three months behind it.

4. Avatars matured separately from T2V. HeyGen, Synthesia, Hedra, and ByteDance’s OmniHuman are now operating in a category that is functionally distinct from text-to-video: a real actor’s identity, voice, and gestures, driven by audio or script. Identity consistency, which is the open problem on the T2V side, is the solved problem on the avatar side. Marketing teams sometimes conflate the two; they shouldn’t. A talking-head avatar is for explainers, training, and personalised messaging. A T2V model is for everything else. The two stacks will probably converge by end of year, but as of May they are different products with different price cards.

Quiet quarter for

Four places were unusually silent. OpenAI shipped no new video model and discontinued its existing one; the next move, whatever it is, is not on the public roadmap. Meta shipped no public video work in the window; the Muse Spark turn at Superintelligence Labs has been text-first. Stability AI spent the quarter on 4D research and AMD-optimised builds; no headline T2V drop. Apple shipped on-device video editing improvements in Final Cut and the iOS 26.5 timeline, but no native on-device generation model. Pattern: the labs that were defining video in 2025 are the ones absent from the leaderboard in May 2026.

What to watch May to August

Four things are queued up and worth marking on a calendar.

HappyHorse-1.0 weights drop. Alibaba committed to Apache 2.0 weights on GitHub; the date is unstated. The day they land is the day open-weight catches the closed leaderboard top.

The Sora API shutdown on September 24. The Videos API is still live and still being billed against. Teams with active integrations have until September to migrate. The natural drop-ins are Veo 3.1 (cheaper, native audio, similar shape) or Wan 2.7 (open, half the price).

Veo 4 or whatever Google calls it. Veo 3.1 is six months old and Google’s release cadence has tightened. The text installment caught Gemini 3.5 Flash at I/O. A Veo successor is the obvious next shoe, with industry analysts giving it 70% odds for a summer drop.

EU AI Act enforcement on August 2. The EU AI Act requires AI-generated content to be machine-detectable as synthetic, and C2PA is the de facto interop format. For teams shipping AI-generated video into Europe, the next 70 days are when you start signing your output. The piece on regulations eating cloud AI applies to video with extra force: deepfake liability is the sharper edge of the same wedge.
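For planning purposes, the two hard dates above can be converted into day counts. This is a small sketch that assumes the article's publication date (May 21, 2026) as "today"; the dictionary keys and the helper name are illustrative labels, not official terms.

```python
from datetime import date

# Assumption: count from the article's publication date.
PUBLISHED = date(2026, 5, 21)

# Dates as stated in the article.
DEADLINES = {
    "EU AI Act enforcement": date(2026, 8, 2),
    "Sora Videos API shutdown": date(2026, 9, 24),
}


def days_until(deadline: date, today: date = PUBLISHED) -> int:
    """Calendar days remaining between `today` and `deadline`."""
    return (deadline - today).days


if __name__ == "__main__":
    for label, deadline in DEADLINES.items():
        print(f"{label}: {days_until(deadline)} days")
```

Counted this way, the EU date is 73 days out, in line with the roughly 70 days quoted above, and the Sora API date is 126 days out.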

The leaderboard, as of May 20

For calibration. The top three are tight; everyone outside the top five is closer to the median than to the leader.

Artificial Analysis Video Arena (without audio) · top 5, May 20 2026

| Model | Lab | Elo |
| --- | --- | --- |
| HappyHorse-1.0 | Alibaba | 1357 |
| Seedance 2.0 (720p) | ByteDance | 1273 |
| Kling 3.0 1080p (Pro) | Kuaishou | 1250 |
| Grok Imagine Video | xAI | 1233 |
| Kling 3.0 Omni 1080p | Kuaishou | 1232 |
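One way to read the Elo gaps is as head-to-head preference rates. The sketch below applies the standard Elo expected-score formula (logistic curve, 400-point scale) to the ratings above. Whether the arena computes its ratings exactly this way is an assumption here, so treat the output as a rough guide rather than the leaderboard's own statistic.

```python
# Ratings from the no-audio leaderboard snapshot (May 20, 2026).
LEADERBOARD = {
    "HappyHorse-1.0": 1357,
    "Seedance 2.0 (720p)": 1273,
    "Kling 3.0 1080p (Pro)": 1250,
    "Grok Imagine Video": 1233,
    "Kling 3.0 Omni 1080p": 1232,
}


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B.

    Assumption: classic logistic formula on a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    leader = "HappyHorse-1.0"
    for name, rating in LEADERBOARD.items():
        if name == leader:
            continue
        p = expected_score(LEADERBOARD[leader], rating)
        print(f"{leader} preferred over {name}: {p:.0%}")
```

Under that assumption, the 84-point lead over second place corresponds to the leader being preferred in a little over 60% of pairwise comparisons, and the one-point gap between fourth and fifth is effectively a coin flip.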

Cost per second · snapshot, May 20 2026

| Model | Lab | Notes | Cost per second |
| --- | --- | --- | --- |
| Veo 3.1 Lite | Google | 720p, Vertex AI, Mar 31 | $0.05 |
| Wan 2.7 | Alibaba | API, Apr 1–6 | $0.10 |
| Kling 3.0 (Pro) | Kuaishou | 1080p, on-platform | $0.10 |
| Runway Gen-4.5 (fast) | Runway | Gen-4.5 API | $0.15 |
| Veo 3.1 Standard | Google | Native audio, 1080p | $0.40 |
| Sora 2 | OpenAI | Retired Apr 26 | $0.75 |

Sora’s number is what it was on the day OpenAI took the product down.
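The per-second prices translate into clip costs with simple multiplication. The sketch below uses the snapshot prices listed above. The `takes` parameter (how many generations it takes to get one usable clip) is a hypothetical knob added for illustration, since the article argues that cost per finished clip is the metric that matters.

```python
# Per-second prices from the May 20, 2026 snapshot, in USD.
PRICE_PER_SECOND = {
    "Veo 3.1 Lite": 0.05,
    "Wan 2.7": 0.10,
    "Kling 3.0 (Pro)": 0.10,
    "Runway Gen-4.5 (fast)": 0.15,
    "Veo 3.1 Standard": 0.40,
    "Sora 2 (retired)": 0.75,
}


def clip_cost(price_per_second: float, seconds: float, takes: int = 1) -> float:
    """Cost of one finished clip in USD.

    `takes` is a hypothetical retry count: generations needed per keeper.
    """
    return round(price_per_second * seconds * takes, 2)


def percent_drop(old: float, new: float) -> float:
    """Percentage decrease from `old` to `new`."""
    return (old - new) / old * 100


if __name__ == "__main__":
    for model, price in PRICE_PER_SECOND.items():
        print(f"{model:<22} 10s: ${clip_cost(price, 10):.2f}   "
              f"60s: ${clip_cost(price, 60):.2f}")
    # One-minute clip: roughly $60 at a dollar a second versus $3 at the floor.
    print(f"drop: {percent_drop(60, clip_cost(0.05, 60)):.0f}%")
```

At the $0.05 floor, a 10-second clip comes to fifty cents and a one-minute clip to three dollars. If it takes three attempts to get a keeper, `clip_cost(0.05, 10, takes=3)` puts the same 10-second clip at $1.50.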

A finished 10-second clip from any of those five would look, to the median viewer, broadly fine. The reasons to pick between them are now cost per second, identity consistency across cuts, licensing, availability under your jurisdiction, and which editor your shop already runs. That is a curation question more than a capability one, which is the case made in “You don’t need every AI model.” The companion pieces this quarter are the text installment and the image installment; together they cover the same Feb 20 to May 20 window across the three modalities most teams ship every week.

Next installment in this series: The last three months in AI, August 2026.