Alibaba’s HappyHorse AI Video Generator Just Topped the World

What Happened: Alibaba’s HappyHorse-1.0 appeared anonymously on global video benchmarks on April 7, 2026, climbed to #1 in text-to-video and image-to-video, then Alibaba claimed it on April 10. With Sora discontinued and Seedance paused over copyright disputes, HappyHorse-1.0 is now the highest-ranked AI video model in the world.

An AI video model appeared from nowhere on April 7, 2026. No announcement. No press release.

Just a model called “HappyHorse-1.0” quietly climbing the Artificial Analysis Video Arena leaderboard until it sat at #1 in both text-to-video and image-to-video generation worldwide.

Three days later, Alibaba claimed it.

From what I can tell, this was a deliberate stealth play: drop the model anonymously, let it win on merit in blind human tests, then reveal HappyHorse as an Alibaba AI video generator once the scores were locked in. That strategy paid off. Alibaba’s Hong Kong shares rose 2.12% on the day of the reveal, according to CNBC.

If you work with AI video tools at all, or you have been watching this space closely, the HappyHorse story is worth understanding in full.

Alibaba HappyHorse AI Video Generator

What Happened with HappyHorse-1.0 This Week?

HappyHorse-1.0 is a video generation model from Alibaba’s ATH AI Innovation Unit that topped the Artificial Analysis global leaderboard for both text-to-video and image-to-video on April 10, 2026.

HappyHorse anonymous leaderboard climb to number one

The model first appeared on the benchmark platform around April 7 under no company affiliation. It climbed fast, briefly disappeared from the leaderboard with no official explanation, then returned with Alibaba’s confirmation.

In my reading of the timeline, the disappearance was likely a controlled pause before a planned announcement rather than a technical issue.

The blind test scores are straightforward. In text-to-video with no audio, HappyHorse-1.0 scored 1333 Elo, beating the previous leader by 60 points. In image-to-video with no audio, it hit 1392 Elo, 37 points ahead of its nearest rival.

Here is how it ranks across all four categories:

CategoryHappyHorse-1.0 EloRank
Text-to-Video (no audio)1333#1
Image-to-Video (no audio)1392#1
Text-to-Video (with audio)1205#2
Image-to-Video (with audio)1161#2

The audio-video categories put it at #2. That is worth noting. The model is not the unanimous winner across every dimension, but in silent video generation, which reflects raw visual quality and motion realism, the results are dominant.

Why Is This a Bigger Deal Than the Benchmarks Suggest?

HappyHorse-1.0 is landing at the exact moment Western competition has collapsed, which makes the timing more significant than a standard benchmark win.

Western video AI gap filled by Chinese competitors chart

OpenAI discontinued Sora earlier in 2026 to focus on AGI and enterprise tools. The full breakdown of that decision is in the Sora shutdown coverage, but the short version is that OpenAI calculated video generation was too compute-heavy for the return it delivered. That left a gap.

ByteDance’s Seedance 2.0 was supposed to fill it. Instead, Seedance hit a copyright wall with Hollywood studios and paused its global rollout.

What I find striking is that two of the strongest non-Chinese challengers have now either exited or stalled at roughly the same time, leaving Alibaba positioned to dominate by default.

That puts HappyHorse, Google’s Veo, and Kuaishou’s Kling AI as the realistic top-tier options right now. From the blind test data, HappyHorse has a clear benchmark lead over Kling in the categories that matter most for visual fidelity.

How Does HappyHorse-1.0 Work Under the Hood?

HappyHorse-1.0 uses a 40-layer single-stream Self-Attention Transformer that processes text, video, and audio as one unified token sequence, skipping the Cross-Attention structures most competing models rely on.

From a technical standpoint, this is the most interesting part of the story. Most multimodal video models use separate processing pathways for text, video, and audio, with Cross-Attention bridging them.

HappyHorse drops that entirely. All three token types go into one sequence and attend to each other directly, with the middle 32 of the 40 layers sharing parameters across all modalities.

The design pays off in speed. In my reading of the vendor benchmarks, the model requires only 8 denoising steps and no Classifier-Free Guidance to reach its output quality, where most current video models use 20 to 50 steps. On an H100 GPU, HappyHorse generates a 5-second 256p clip in roughly 2 seconds, and a 1080p version in 38 seconds.

The model runs at an estimated 10B to 30B parameters (the official site claims 15B), supports six languages natively, and is specifically optimized for human-centric scenarios: lip-sync accuracy, facial performance, and realistic body movement. Whether you believe the “fully open-source” claim is a separate question covered in the next section.

What Does This Mean for AI Video Creators Right Now?

HappyHorse-1.0 is not yet publicly available, which means you cannot use it today regardless of where it ranks.

The official landing pages claim the model is “fully open-source” with base, distilled, and upscaling versions released. In practice, the GitHub and HuggingFace links both show “Coming Soon” as of April 10. The gap between that marketing claim and the actual code availability is the most important thing to flag right now.

What I would not do is delay live projects waiting for the release. The timeline is unclear, and the “open-source” claim may turn out to mean something narrower than it sounds, similar to corporate model releases that restrict commercial use. When access does arrive, here are the three things worth checking before committing to it:

  1. Licensing terms: is it MIT/Apache 2.0, or a restricted commercial license?
  2. Content moderation rules: what does Alibaba restrict under Chinese regulatory requirements?
  3. Training data transparency: what footage was used to achieve the lip-sync accuracy it is being praised for?

For the broader picture of how Chinese labs are positioning against Western competitors, the Chinese AI competition piece covers the full context.

What Comes Next for Alibaba’s HappyHorse AI Video Generator?

The next move from Alibaba is the public API and open-source code drop, which will determine whether HappyHorse-1.0 becomes the dominant foundation model for AI video generation in 2026.

Alibaba has confirmed an API rollout is coming. The way I see it, when that opens, developers will face a genuine decision: build on HappyHorse (if the licensing allows it) or stay with Runway and other established options at significantly higher per-second costs.

Chinese AI video tools have been pricing at roughly 4 cents per second generated, which undercuts Western alternatives by a wide margin.

One question worth watching is whether the community assumption that HappyHorse is a rebrand of Alibaba’s Wan 2.7 model is correct. From the technical specs, the architectures do not match.

Wan 2.7 focuses on long-text rendering and “thinking modes.” HappyHorse uses the single-stream Transformer described above. These appear to be two parallel research bets running inside Alibaba at the same time.

The stealth launch strategy worked. The model won a global benchmark before anyone knew who built it. The real test now is whether the open-source release matches the claim, or whether “fully open” turns out to mean the usual corporate fine print.

Leave a Reply

Your email address will not be published. Required fields are marked *