🎭 EchoMimicV3: Audio-Driven Human Animation

Transform a portrait photo into a talking video! Upload an image and an audio file to create lifelike, expressive animations. This demo showcases the power of the EchoMimicV3 model.

Key Features:

  • 🎯 High-Quality Lip Sync: Accurate mouth movements that match the input audio.
  • 🎨 Natural Facial Expressions: Generates subtle and natural facial emotions.
  • 🎵 Speech & Singing: Works with both spoken word and singing.
  • ⚡ Efficient: Powered by a compact 1.3B-parameter model.

Core Generation Parameters

  • Inference Steps (slider range 5–50): number of denoising steps.
  • Frames Per Second (FPS) (slider range 10–30): frame rate of the output video.

Classifier-Free Guidance (CFG)

  • Text Guidance Scale (CFG) (slider range 1–10): how strongly the text prompt steers generation.
  • Audio Guidance Scale (aCFG) (slider range 1–10): how strongly the audio steers the motion.
  • Use Dynamic Text CFG: gradually adjusts CFG during generation; can improve quality.
  • Use Dynamic Audio aCFG: gradually adjusts aCFG during generation; can improve quality.
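
Both dynamic options vary the guidance scale over the course of denoising instead of holding it fixed. As an illustration only (the demo's actual schedule is not documented here), a minimal sketch of one common pattern, a linear ramp:

```python
def dynamic_cfg(base_cfg: float, step: int, total_steps: int,
                final_cfg: float = 1.0) -> float:
    """Linearly anneal the guidance scale from base_cfg on the first step
    to final_cfg on the last step -- one common 'dynamic CFG' pattern.
    (Illustrative sketch; not necessarily EchoMimicV3's schedule.)"""
    t = step / max(total_steps - 1, 1)
    return base_cfg + (final_cfg - base_cfg) * t

# Example: text CFG 7.5 annealed over 20 steps: 7.5, 7.16, ..., 1.0
scales = [dynamic_cfg(7.5, s, 20) for s in range(20)]
```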

Performance & VRAM (Chunking)

  • Partial Video Length (Chunk Size) (slider range 49–161): frames generated per chunk; lower values use less VRAM.
  • Overlap Length (slider range 4–16): frames shared between consecutive chunks to smooth the transitions.
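
These two sliders trade VRAM for seam quality: shorter chunks fit in less memory, and the overlapping frames between chunks are blended to hide the joins. A schematic sketch of the splitting logic (illustrative; not the model's exact implementation):

```python
def chunk_frames(total_frames: int, chunk_size: int = 81, overlap: int = 8):
    """Yield (start, end) frame ranges so each chunk overlaps the previous
    one by `overlap` frames; overlapping frames are typically blended to
    hide seams between chunks."""
    start = 0
    while start < total_frames:
        end = min(start + chunk_size, total_frames)
        yield start, end
        if end == total_frames:
            break
        start = end - overlap  # step back to create the overlap

# Example: 200 frames, chunks of 81 with an 8-frame overlap
print(list(chunk_frames(200)))  # [(0, 81), (73, 154), (146, 200)]
```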

Sampler & Scheduler

  • Sampler: algorithm for the diffusion process.
  • Scheduler: controls how noise levels are spaced across the steps.
  • Shift (slider range 1–10): warps the noise schedule (see the sketch below).
  • Audio Scale (slider range 0.5–2).
  • Use Un-IP Mask: inverts the inpainting mask.
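
In many flow-matching video pipelines, Shift warps the noise schedule so that more steps are spent at high noise levels. Whether this demo uses the same convention is an assumption; for illustration, the SD3-style shifting formula:

```python
def shift_sigma(sigma: float, shift: float = 5.0) -> float:
    """SD3-style timestep shifting: warp a noise level sigma in (0, 1]
    toward the high-noise end; larger shift biases sampling toward
    high-noise steps. (Assumed convention, not confirmed for this demo.)"""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

# Example: with shift=5, the midpoint sigma=0.5 maps to ~0.83
print(shift_sigma(0.5))
```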

Negative Guidance (Advanced CFG)

  • Negative Scale (slider range 1–5)
  • Negative Steps (slider range 0–10)

TeaCache (Performance Boost)

  • Enable TeaCache: caches intermediate results to skip redundant computation and speed up generation.
  • TeaCache Threshold (slider range 0–0.2): higher values skip more aggressively (faster, but may reduce quality).
  • TeaCache Offload
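
TeaCache accelerates sampling by reusing cached transformer outputs on steps whose inputs have barely changed since the last full computation. A heavily simplified sketch of the skip test (the real method accumulates timestep-embedding-aware change estimates; names here are illustrative):

```python
class CacheSkipper:
    """Illustrative sketch: skip recomputation while the accumulated
    relative change of a per-step signal stays below a threshold
    (the 'TeaCache Threshold' slider above)."""
    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.accum = 0.0
        self.prev = None

    def should_skip(self, signal: float) -> bool:
        if self.prev is None:          # always compute the first step
            self.prev = signal
            return False
        self.accum += abs(signal - self.prev) / (abs(self.prev) + 1e-8)
        self.prev = signal
        if self.accum < self.threshold:
            return True                # input barely changed: reuse cache
        self.accum = 0.0               # recompute and reset the accumulator
        return False
```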

Riflex (Consistency)

  • Enable Riflex
  • Riflex K (slider range 1–10)

Other

  • Num Skip Start Steps (slider range 0–10)

✨ Click to Try Examples

Each example row preloads a full configuration: a portrait image, an audio clip, a prompt and negative prompt, a seed, and values for all of the advanced parameters listed above.

📋 How to Use

  1. Upload Image: Choose a clear portrait photo (front-facing works best).
  2. Upload Audio: Add an audio file with clear speech or singing.
  3. Adjust Settings (Optional): Fine-tune parameters in the advanced sections for different results. For memory issues, try lowering the "Partial Video Length".
  4. Generate: Click the button and wait for your talking video!

Note: Generation time depends on settings and audio length. It can take a few minutes.
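
If you prefer to call the demo from code, Gradio Spaces expose a client API. A minimal sketch with gradio_client; the Space ID, endpoint name, and parameter names below are assumptions, so check the demo's "Use via API" page for the real signature:

```python
from gradio_client import Client, handle_file

# Hypothetical Space ID -- replace with the actual demo's ID.
client = Client("username/EchoMimicV3")

result = client.predict(
    image=handle_file("portrait.png"),    # clear, front-facing portrait
    audio=handle_file("speech.wav"),      # clear speech or singing
    prompt="a person talking naturally",  # optional text prompt
    num_inference_steps=20,               # 5-50 in the UI
    api_name="/generate",                 # hypothetical endpoint name
)
print(result)  # path to the generated talking video
```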

This demo is based on the EchoMimicV3 repository.