🐱MaineCoon AI

Overview

What is MaineCoon?

The first real-time audio-visual autoregressive model: 22 billion parameters, streaming synchronized output, optimized for social interaction.

  • 47.5 FPS on a single H100 GPU
  • <3 s first-frame latency
  • 10+ min of continuous generation
  • <$0.001/s generation cost

Real Output

What MaineCoon actually looks like

Not mockups: these samples are generated directly by the model with synchronized audio and video.

Audio-visual sync

Speech, lip movement, and facial expression produced in a single synchronized output.

MaineCoon is developed by Catnip, a ~10-person AI startup based in China. The model was built from scratch in roughly 2 months by 3 core researchers using an AI-native development approach.

Not your typical video model

Most video AI generates complete clips before you can watch them. MaineCoon streams audio and video together, chunk by chunk, the way ChatGPT streams text, but for synchronized sight and sound. You see the first frame within 3 seconds, then output continues in real time.
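As a rough illustration of the chunk-by-chunk delivery described above, here is a minimal Python sketch. It is not MaineCoon's API: `AVChunk`, `stream_chunks`, and `play` are hypothetical names, and the frames and audio samples are placeholder values. The only point is that each chunk carries video and audio for the same time span and can be consumed as soon as it is yielded.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class AVChunk:
    """One streamed unit: video frames plus the audio covering the same time span."""
    index: int
    frames: List[str]    # placeholder for decoded video frames
    audio: List[float]   # placeholder for audio samples


def stream_chunks(prompt: str, n_chunks: int, frames_per_chunk: int = 4,
                  samples_per_frame: int = 2) -> Iterator[AVChunk]:
    """Hypothetical stand-in for a streaming model: yields one chunk at a time."""
    for i in range(n_chunks):
        frames = [f"{prompt}:frame{i * frames_per_chunk + j}"
                  for j in range(frames_per_chunk)]
        # Audio is produced together with the frames, so the two stay aligned.
        audio = [0.0] * (frames_per_chunk * samples_per_frame)
        yield AVChunk(index=i, frames=frames, audio=audio)


def play(chunks: Iterator[AVChunk]) -> int:
    """Consume chunks as they arrive; returns the number of frames shown."""
    shown = 0
    for chunk in chunks:
        # Each chunk is usable immediately; nothing waits for the full clip.
        shown += len(chunk.frames)
    return shown
```

Calling `next(stream_chunks("hello", 3))` returns the first chunk without generating the remaining two, which is the property that separates streaming from clip-at-once generation.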

Named after the "dog of cats"

Maine Coon cats are known for following their owners everywhere: they are highly interactive and responsive. The model mirrors this behavior: it doesn't generate everything at once and leave. It stays with you, follows your prompts, and adapts mid-stream.

Three things no prior model did simultaneously

  • Streaming audio-visual generation: sub-second chunks with joint audio and video output
  • Industry-leading inference speed: 47.5 FPS on a single H100, ~7× faster than streaming AV peers
  • Long-duration stability: 10+ minutes of continuous generation with agentic drift correction
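To put the headline figures quoted on this page in concrete terms, the short Python sketch below works through the arithmetic. Two things are assumptions, not statements from the source: that the "<$0.001/s" figure is per second of generated output, and the 25 fps playback rate used in the last helper, which is only an illustrative value.

```python
# Figures quoted on this page; treated as given, not independently verified.
FPS = 47.5                    # generation throughput on a single H100
COST_PER_SECOND_USD = 0.001   # stated upper bound on generation cost
SESSION_SECONDS = 10 * 60     # the "10+ minutes" continuous-generation figure


def frame_budget_ms(fps: float) -> float:
    """Average wall-clock time available per generated frame, in milliseconds."""
    return 1000.0 / fps


def session_cost_ceiling_usd(seconds: float, cost_per_second: float) -> float:
    """Upper bound on session cost, given a per-second cost ceiling."""
    return seconds * cost_per_second


def realtime_factor(generation_fps: float, playback_fps: float) -> float:
    """How many times faster than playback frames are generated."""
    return generation_fps / playback_fps


budget = frame_budget_ms(FPS)  # roughly 21 ms per frame
ceiling = session_cost_ceiling_usd(SESSION_SECONDS, COST_PER_SECOND_USD)  # about $0.60
headroom = realtime_factor(FPS, 25.0)  # 25 fps playback is an assumed value
```

Under these assumptions, a 10-minute session would cost less than about $0.60, and generation would run at roughly 1.9 times an assumed 25 fps playback rate.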

Video model or digital human infrastructure?

Both, at different layers. MaineCoon is a foundation model (rendering layer), not a turnkey SaaS like HeyGen. It's the engine that platforms building AI companions, virtual streamers, and interactive NPCs can deploy. Catnip calls the broader vision a Social World Model.

Who built MaineCoon?

Catnip, founded by Yang Shurui (ex-TikTok, ex-PixVerse) with Chief Scientist Xie Zeke (HKUST-GZ). Backed by Sequoia, Morningside Venture Capital, and others.

When was it released?

The project started in March 2025. The technical report was published on arXiv in June 2025 and quickly gained attention from the AI community and media.

Is it open source?

The technical report, Hugging Face model card, and GitHub repository are public. Check the official channels for licensing details.

Experience MaineCoon live

Input a prompt and watch real-time streaming audio-visual generation on the official platform.