Overview
What is MaineCoon?
The first real-time audio-visual autoregressive model: 22 billion parameters, streaming synchronized output, optimized for social interaction.
Single H100 GPU
First frame latency
Continuous generation
Generation cost
Real Output
What MaineCoon actually looks like
Not mockups: these samples are generated directly by the model with synchronized audio and video.
Audio-visual sync
Speech, lip movement, and facial expression produced in a single synchronized output.
MaineCoon is developed by Catnip, a ~10-person AI startup based in China. The model was built from scratch in roughly 2 months by 3 core researchers using an AI-native development approach.
Not your typical video model
Most video AI generates a complete clip before you can watch it. MaineCoon streams audio and video together, chunk by chunk, the way ChatGPT streams text, but for synchronized sight and sound. You see the first frame within 3 seconds, then output continues in real time.
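To make the chunk-by-chunk idea concrete, here is a minimal Python sketch of a consumer that handles audio-visual chunks as they arrive instead of waiting for a finished clip. It is an illustration only, not MaineCoon's actual API: the `AVChunk` type, the `stream_chunks` and `play` functions, and the chunk size, frame rate, and audio sample rate are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class AVChunk:
    """One streamed unit: video frames plus the audio covering the same time span."""
    index: int
    frames: List[bytes]   # placeholder frame payloads
    audio: List[float]    # placeholder audio samples


def stream_chunks(
    total_chunks: int,
    frames_per_chunk: int = 12,   # assumed: 12 frames at 24 FPS = 0.5 s per chunk
    fps: int = 24,                # assumed playback frame rate
    sample_rate: int = 16000,     # assumed audio sample rate
) -> Iterator[AVChunk]:
    """Stand-in for a model: yields chunks one at a time, each with matching audio."""
    samples_per_chunk = sample_rate * frames_per_chunk // fps
    for i in range(total_chunks):
        frames = [b"frame"] * frames_per_chunk
        audio = [0.0] * samples_per_chunk
        yield AVChunk(index=i, frames=frames, audio=audio)


def play(chunks: Iterator[AVChunk], fps: int = 24) -> float:
    """Consume chunks as they arrive and return the seconds of video handled."""
    played_seconds = 0.0
    for chunk in chunks:
        # A real player would render frames and queue audio here;
        # this sketch only accounts for the duration of each chunk.
        played_seconds += len(chunk.frames) / fps
    return played_seconds


if __name__ == "__main__":
    print(play(stream_chunks(total_chunks=4)))
```

Because `stream_chunks` is a generator, `play` starts working on the first chunk before later chunks exist, which is the property that separates streaming output from clip-at-once generation.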
Named after the "dog of cats"
Maine Coon cats are known for following their owners everywhere, and for being highly interactive and responsive. The model mirrors this behavior: it doesn't generate everything at once and leave. It stays with you, follows your prompts, and adapts mid-stream.
Three things no prior model did simultaneously
- Streaming audio-visual generation: sub-second chunks with joint audio and video output
- Industry-leading inference speed: 47.5 FPS on a single H100, ~7× faster than streaming AV peers
- Long-duration stability: 10+ minutes of continuous generation with agentic drift correction
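As a rough sanity check on the speed figure, the arithmetic can be sketched in Python. The 47.5 FPS throughput and the 10-minute duration come from the list above; the 24 FPS playback rate is an assumption made for illustration, not a published MaineCoon setting.

```python
# Throughput quoted in the list above (single H100).
GENERATION_FPS = 47.5
# Assumed playback frame rate (hypothetical; not stated in the source).
PLAYBACK_FPS = 24.0


def frame_time_ms(fps: float) -> float:
    """Average wall-clock time spent per frame, in milliseconds."""
    return 1000.0 / fps


def realtime_factor(generation_fps: float, playback_fps: float) -> float:
    """How many times faster than playback the frames are generated."""
    return generation_fps / playback_fps


def seconds_to_generate(duration_s: float, generation_fps: float, playback_fps: float) -> float:
    """Wall-clock seconds needed to generate duration_s seconds of video."""
    return duration_s * playback_fps / generation_fps


if __name__ == "__main__":
    print(f"{frame_time_ms(GENERATION_FPS):.2f} ms per generated frame")
    print(f"{realtime_factor(GENERATION_FPS, PLAYBACK_FPS):.2f}x real time")
    print(f"{seconds_to_generate(600, GENERATION_FPS, PLAYBACK_FPS):.1f} s to generate 10 minutes")
```

Any real-time factor above 1 means generation keeps ahead of playback; under the assumed 24 FPS playback rate, the quoted throughput leaves roughly a 2x margin.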
Video model or digital human infrastructure?
Both, at different layers. MaineCoon is a foundation model (rendering layer), not a turnkey SaaS like HeyGen. It's the engine that platforms building AI companions, virtual streamers, and interactive NPCs can deploy. Catnip calls the broader vision a Social World Model.
Who built MaineCoon?
Catnip, founded by Yang Shurui (ex-TikTok, ex-PixVerse) with Chief Scientist Xie Zeke (HKUST-GZ). Backed by Sequoia, Morningside Venture Capital, and others.
When was it released?
The project started in March 2025. The technical report was published on arXiv in June 2025 and quickly gained attention from the AI community and media.
Is it open source?
The technical report, Hugging Face model card, and GitHub repository are public. Check official channels for licensing details.
Experience MaineCoon live
Input a prompt and watch real-time streaming audio-visual generation on the official platform.