🎭 EchoMimicV3: Audio-Driven Human Animation
Transform a portrait photo into a talking video! Upload an image and an audio file to create lifelike, expressive animations. This demo showcases the power of the EchoMimicV3 model.
Key Features:
- 🎯 High-Quality Lip Sync: Accurate mouth movements that match the input audio.
- 🎨 Natural Facial Expressions: Generates subtle and natural facial emotions.
- 🎵 Speech & Singing: Works with both spoken word and singing.
- ⚡ Efficient: Powered by a compact 1.3B parameter model.
Core Generation Parameters
- Inference Steps (5–50): number of denoising steps per generated chunk; more steps are slower but usually cleaner.
- Frames Per Second (FPS) (10–30): frame rate of the output video.
Classifier-Free Guidance (CFG)
- Text Guidance Scale (CFG) (1–10): how strongly the text prompt steers generation.
- Audio Guidance Scale (aCFG) (1–10): how strongly the audio steers lip sync and expression.
- Use Dynamic Text CFG: gradually adjusts CFG during generation, which can improve quality.
- Use Dynamic Audio aCFG: gradually adjusts aCFG during generation, which can improve quality.
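To make the two guidance scales concrete, here is a minimal, illustrative sketch of one common way a text CFG term and an audio aCFG term are combined in multi-condition diffusion sampling. The function name, defaults, and the cascaded formulation are assumptions for illustration only, not necessarily EchoMimicV3's exact implementation.

```python
# Illustrative only: combining two guidance signals (text CFG and audio aCFG).
# Names and defaults are assumptions, not the EchoMimicV3 code.
import numpy as np

def dual_cfg(noise_uncond, noise_text, noise_audio, cfg_text=4.5, cfg_audio=2.5):
    """Combine unconditional, text-conditioned and audio-conditioned predictions.

    cfg_text / cfg_audio correspond to the "Text Guidance Scale (CFG)" and
    "Audio Guidance Scale (aCFG)" sliders: higher values push the result
    further toward the respective condition.
    """
    return (
        noise_uncond
        + cfg_text * (noise_text - noise_uncond)
        + cfg_audio * (noise_audio - noise_text)
    )

# Toy usage with random arrays standing in for model outputs.
shape = (4, 8, 8)
uncond, text_cond, audio_cond = (np.random.randn(*shape) for _ in range(3))
print(dual_cfg(uncond, text_cond, audio_cond).shape)  # (4, 8, 8)
```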
Performance & VRAM (Chunking)
49 161
4 16
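As a rough mental model of how these two sliders trade memory for run time, here is a small illustrative calculation: a smaller chunk size lowers peak VRAM but increases the number of chunks, since the overlapping frames are regenerated for each neighbouring chunk. The helper name and default values below are assumptions, not the demo's actual code.

```python
# Illustrative sketch: how chunking settings relate to the number of chunks.
# Argument names mirror the sliders above; defaults are assumptions.
import math

def plan_chunks(audio_seconds: float, fps: int = 25,
                partial_video_length: int = 113, overlap_length: int = 8):
    """Estimate how many chunks a clip will be split into.

    Each chunk generates `partial_video_length` frames; consecutive chunks
    share `overlap_length` frames, so every chunk after the first adds
    (partial_video_length - overlap_length) fresh frames.
    """
    total_frames = math.ceil(audio_seconds * fps)
    step = partial_video_length - overlap_length
    if total_frames <= partial_video_length:
        return 1, total_frames
    extra = total_frames - partial_video_length
    return 1 + math.ceil(extra / step), total_frames

# A 12 s clip at 25 fps -> 300 frames -> 3 chunks with the defaults above.
print(plan_chunks(12.0))  # (3, 300)
```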
Sampler & Scheduler
Sampler
Algorithm for the diffusion process.
1 10
0.5 2
Inverts the inpainting mask.
Negative Guidance (Advanced CFG)
- Negative Scale (1–5)
- Negative Steps (0–10)
TeaCache (Performance Boost)
- Enable TeaCache / TeaCache Offload: caches intermediate results to speed up generation.
- TeaCache Threshold (0–0.2): how much change is tolerated before the cache is refreshed.
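For intuition on what the threshold trades off, here is a minimal sketch of threshold-based step caching in general: when the model input changes little between denoising steps, the previous output is reused instead of re-running the transformer, so a higher threshold skips more work but can reduce quality. The class and the stand-in "transformer" below are made up for illustration; this is the general idea, not TeaCache's or EchoMimicV3's actual code.

```python
# Illustrative only: reuse a cached output while the input barely changes.
import numpy as np

class SimpleStepCache:
    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.prev_input = None
        self.cached_output = None

    def should_recompute(self, model_input: np.ndarray) -> bool:
        if self.prev_input is None or self.cached_output is None:
            return True
        # Relative change of the input since the last real forward pass.
        change = np.abs(model_input - self.prev_input).mean() / (
            np.abs(self.prev_input).mean() + 1e-8
        )
        return change >= self.threshold

    def update(self, model_input: np.ndarray, output: np.ndarray) -> None:
        self.prev_input = model_input
        self.cached_output = output

# Toy usage: inputs drift slowly across five "denoising steps".
cache = SimpleStepCache(threshold=0.05)
x = np.ones((4, 4))
for step in range(5):
    x = x + 0.001 * np.random.randn(4, 4)   # small change per step
    if cache.should_recompute(x):
        out = x * 2                          # stand-in for the transformer
        cache.update(x, out)
        print(f"step {step}: recomputed")
    else:
        out = cache.cached_output            # reuse the cached result
        print(f"step {step}: reused cache")
```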
Riflex (Consistency)
- Enable Riflex
- Riflex K (1–10)
Other
- Num Skip Start Steps (0–10)
✨ Click to Try Examples
Each example preloads every input at once: a portrait image, an audio clip, a prompt and negative prompt, a seed, and preset values for the generation parameters described above (steps, CFG/aCFG, FPS, chunking, sampler, negative guidance, TeaCache, and Riflex settings).
📋 How to Use
- Upload Image: Choose a clear portrait photo (front-facing works best).
- Upload Audio: Add an audio file with clear speech or singing.
- Adjust Settings (Optional): Fine-tune the parameters in the advanced sections above for different results. If you run into memory errors, lower the Partial Video Length (Chunk Size).
- Generate: Click the button and wait for your talking video!
Note: Generation time depends on settings and audio length. It can take a few minutes.
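If you prefer to drive the demo from code rather than the browser, a minimal sketch with `gradio_client` might look like the following. The Space id, endpoint name, and argument order are assumptions; check the Space's "Use via API" panel for the real signature.

```python
# Minimal sketch of calling a Gradio Space programmatically.
# The Space id, api_name, and argument order below are assumptions.
from gradio_client import Client, handle_file

client = Client("antgroup/EchoMimicV3")    # hypothetical Space id
result = client.predict(
    handle_file("portrait.png"),           # portrait image
    handle_file("speech.wav"),             # driving audio
    api_name="/generate",                  # hypothetical endpoint name
)
print(result)  # path to the generated video
```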
This demo is based on the EchoMimicV3 repository.