Overview

What is MaineCoon?

The first real-time audio-visual autoregressive model — 22 billion parameters, streaming synchronized output, optimized for social interaction.


47.5 FPS on a single H100 GPU
<3 s first-frame latency
10+ min continuous generation
<$0.001/s generation cost
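To put the headline numbers in context, here is a rough back-of-envelope sketch in Python. It assumes the cost figure is charged per second of generated output, and it uses an illustrative 24 FPS playback rate, which the source does not state. The function names are hypothetical and exist only for this example.

```python
# Back-of-envelope figures derived from the headline numbers.
GENERATION_FPS = 47.5        # stated throughput on a single H100
MAX_COST_PER_SECOND = 0.001  # stated upper bound, in USD per second
SESSION_SECONDS = 10 * 60    # the stated 10-minute continuous run

# ASSUMPTION: the playback frame rate is not given in the source.
# 24 FPS is used purely for illustration.
ASSUMED_PLAYBACK_FPS = 24.0


def session_cost_upper_bound(seconds: float,
                             cost_per_second: float = MAX_COST_PER_SECOND) -> float:
    """Upper bound on cost, assuming billing per second of output."""
    return seconds * cost_per_second


def realtime_headroom(generation_fps: float, playback_fps: float) -> float:
    """How many times faster than playback the model generates frames."""
    return generation_fps / playback_fps


if __name__ == "__main__":
    cost = session_cost_upper_bound(SESSION_SECONDS)
    headroom = realtime_headroom(GENERATION_FPS, ASSUMED_PLAYBACK_FPS)
    print(f"10-minute session: under ${cost:.2f}")
    print(f"Headroom at {ASSUMED_PLAYBACK_FPS:.0f} FPS playback: {headroom:.2f}x")
```

Under these assumptions, a 10-minute session would cost less than $0.60, and generation would run at roughly twice the assumed playback rate.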

Real Output

What MaineCoon actually looks like

Not mockups — these samples are generated directly by the model with synchronized audio and video.


Audio-visual sync

Speech, lip movement, and facial expression produced in a single synchronized output.


MaineCoon is developed by Catnip, a ~10-person AI startup based in China. The model was built from scratch in roughly 2 months by 3 core researchers using an AI-native development approach.

Not your typical video model

Most video AI generates complete clips before you can watch them. MaineCoon streams audio and video together, chunk by chunk — like ChatGPT streams text, but for synchronized sight and sound. You see the first frame within 3 seconds, then output continues in real time.
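The chunk-by-chunk idea can be sketched with a Python generator. This is a toy illustration, not MaineCoon's actual API: the `AVChunk` type, the `stream_chunks` and `play` functions, and every constant (24 FPS, 16 kHz audio, 12 frames per chunk) are assumptions made up for the example. The point is only that each chunk carries audio and video covering the same time span, and that the consumer can start playing before the full clip exists.

```python
from dataclasses import dataclass
from typing import Iterator, List

# ASSUMPTION: all constants are illustrative, not MaineCoon's real settings.
FPS = 24
SAMPLE_RATE = 16_000
FRAMES_PER_CHUNK = 12  # 12 frames at 24 FPS = 0.5 s, i.e. a sub-second chunk


@dataclass
class AVChunk:
    """Hypothetical container for one synchronized slice of output."""
    index: int
    video_frames: List[bytes]    # placeholder frame payloads
    audio_samples: List[float]   # placeholder PCM samples for the same span

    @property
    def duration_s(self) -> float:
        return len(self.video_frames) / FPS


def stream_chunks(num_chunks: int) -> Iterator[AVChunk]:
    """Yield chunks one at a time instead of returning a finished clip."""
    samples_per_chunk = SAMPLE_RATE * FRAMES_PER_CHUNK // FPS
    for index in range(num_chunks):
        frames = [b"\x00" for _ in range(FRAMES_PER_CHUNK)]
        audio = [0.0] * samples_per_chunk
        yield AVChunk(index, frames, audio)


def play(chunks: Iterator[AVChunk]) -> float:
    """Consume chunks as they arrive; return seconds of media played."""
    played = 0.0
    for chunk in chunks:
        # A real client would hand frames and samples to a renderer here.
        played += chunk.duration_s
    return played


if __name__ == "__main__":
    print(f"played {play(stream_chunks(4)):.1f} s of media")
```

Because `stream_chunks` is a generator, `play` receives the first chunk as soon as it is produced, which mirrors how a text model streams tokens rather than returning a finished response.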

Named after the “dog of cats”

Maine Coon cats are known for following their owners everywhere — highly interactive and responsive. The model mirrors this behavior: it doesn't generate everything at once and leave. It stays with you, follows your prompts, and adapts mid-stream.

Three things no prior model did simultaneously

Streaming audio-visual generation: sub-second chunks with joint audio and video output
Industry-leading inference speed: 47.5 FPS on a single H100, ~7× faster than streaming AV peers
Long-duration stability: 10+ minutes of continuous generation with agentic drift correction

Video model or digital human infrastructure?

Both — at different layers. MaineCoon is a foundation model (rendering layer), not a turnkey SaaS like HeyGen. It's the engine that platforms building AI companions, virtual streamers, and interactive NPCs can deploy. Catnip calls the broader vision a Social World Model.

Who built MaineCoon?

Catnip, founded by Yang Shurui (ex-TikTok, ex-PixVerse) with Chief Scientist Xie Zeke (HKUST-GZ). Backed by Sequoia, Morningside Venture Capital, and others.

When was it released?

The project started in March 2025. The technical report was published on arXiv in June 2025 and quickly gained attention from the AI community and media.

Is it open source?

The technical report, Hugging Face model card, and GitHub repository are public. Check the official channels for licensing details.

Experience MaineCoon live

Input a prompt and watch real-time streaming audio-visual generation on the official platform.
