🎭 EchoMimicV3: Audio-Driven Human Animation

Transform a portrait photo into a talking video! Upload an image and an audio file to create lifelike, expressive animations. This demo showcases the power of the EchoMimicV3 model.

Key Features:

  • 🎯 High-Quality Lip Sync: Accurate mouth movements that match the input audio.
  • 🎨 Natural Facial Expressions: Generates subtle and natural facial emotions.
  • 🎵 Speech & Singing: Works with both spoken word and singing.
  • ⚡ Efficient: Powered by a compact 1.3B-parameter model.

Core Generation Parameters

  • Inference Steps (slider range 5–50): number of denoising steps.
  • Frames Per Second (FPS) (slider range 10–30): frame rate of the output video.

Classifier-Free Guidance (CFG)

  • Text Guidance Scale (CFG) (slider range 1–10): how strongly the text prompt steers generation.
  • Audio Guidance Scale (aCFG) (slider range 1–10): how strongly the audio steers the motion.
  • Use Dynamic Text CFG: gradually adjusts CFG during generation; can improve quality.
  • Use Dynamic Audio aCFG: gradually adjusts aCFG during generation; can improve quality.
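
Both dynamic options vary the guidance scale over the course of denoising instead of holding it fixed. As an illustration only (the demo's actual schedule is not documented here), a minimal sketch of one common pattern, a linear ramp:

```python
def dynamic_cfg(base_cfg: float, step: int, total_steps: int,
                final_cfg: float = 1.0) -> float:
    """Linearly anneal the guidance scale from base_cfg on the first step
    to final_cfg on the last step -- one common 'dynamic CFG' pattern.
    (Illustrative sketch; not necessarily EchoMimicV3's schedule.)"""
    t = step / max(total_steps - 1, 1)
    return base_cfg + (final_cfg - base_cfg) * t

# Example: text CFG 7.5 annealed over 20 steps: 7.5, 7.16, ..., 1.0
scales = [dynamic_cfg(7.5, s, 20) for s in range(20)]
```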

Performance & VRAM (Chunking)

  • Partial Video Length (Chunk Size) (slider range 49–161): frames generated per chunk; lower values use less VRAM.
  • Overlap Length (slider range 4–16): frames shared between consecutive chunks to smooth the transitions.
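
These two sliders trade VRAM for seam quality: shorter chunks fit in less memory, and the overlapping frames between chunks are blended to hide the joins. A schematic sketch of the splitting logic (illustrative; not the model's exact implementation):

```python
def chunk_frames(total_frames: int, chunk_size: int = 81, overlap: int = 8):
    """Yield (start, end) frame ranges so each chunk overlaps the previous
    one by `overlap` frames; overlapping frames are typically blended to
    hide seams between chunks."""
    start = 0
    while start < total_frames:
        end = min(start + chunk_size, total_frames)
        yield start, end
        if end == total_frames:
            break
        start = end - overlap  # step back to create the overlap

# Example: 200 frames, chunks of 81 with an 8-frame overlap
print(list(chunk_frames(200)))  # [(0, 81), (73, 154), (146, 200)]
```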

Sampler & Scheduler

  • Sampler: algorithm for the diffusion process.
  • Scheduler: controls how noise levels are spaced across the steps.
  • Shift (slider range 1–10): warps the noise schedule (see the sketch below).
  • Audio Scale (slider range 0.5–2).
  • Use Un-IP Mask: inverts the inpainting mask.
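
In many flow-matching video pipelines, Shift warps the noise schedule so that more steps are spent at high noise levels. Whether this demo uses the same convention is an assumption; for illustration, the SD3-style shifting formula:

```python
def shift_sigma(sigma: float, shift: float = 5.0) -> float:
    """SD3-style timestep shifting: warp a noise level sigma in (0, 1]
    toward the high-noise end; larger shift biases sampling toward
    high-noise steps. (Assumed convention, not confirmed for this demo.)"""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

# Example: with shift=5, the midpoint sigma=0.5 maps to ~0.83
print(shift_sigma(0.5))
```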

Negative Guidance (Advanced CFG)

  • Negative Scale (slider range 1–5)
  • Negative Steps (slider range 0–10)

TeaCache (Performance Boost)

  • Enable TeaCache: caches intermediate results to skip redundant computation and speed up generation.
  • TeaCache Threshold (slider range 0–0.2): higher values skip more aggressively (faster, but may reduce quality).
  • TeaCache Offload
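
TeaCache accelerates sampling by reusing cached transformer outputs on steps whose inputs have barely changed since the last full computation. A heavily simplified sketch of the skip test (the real method accumulates timestep-embedding-aware change estimates; names here are illustrative):

```python
class CacheSkipper:
    """Illustrative sketch: skip recomputation while the accumulated
    relative change of a per-step signal stays below a threshold
    (the 'TeaCache Threshold' slider above)."""
    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.accum = 0.0
        self.prev = None

    def should_skip(self, signal: float) -> bool:
        if self.prev is None:          # always compute the first step
            self.prev = signal
            return False
        self.accum += abs(signal - self.prev) / (abs(self.prev) + 1e-8)
        self.prev = signal
        if self.accum < self.threshold:
            return True                # input barely changed: reuse cache
        self.accum = 0.0               # recompute and reset the accumulator
        return False
```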

Riflex (Consistency)

  • Enable Riflex
  • Riflex K (slider range 1–10)

Other

  • Num Skip Start Steps (slider range 0–10)

✨ Click to Try Examples

Each example row preloads a full configuration: a portrait image, an audio clip, a prompt and negative prompt, a seed, and values for all of the advanced parameters listed above.

📋 How to Use

  1. Upload Image: Choose a clear portrait photo (front-facing works best).
  2. Upload Audio: Add an audio file with clear speech or singing.
  3. Adjust Settings (Optional): Fine-tune parameters in the advanced sections for different results. For memory issues, try lowering the "Partial Video Length".
  4. Generate: Click the button and wait for your talking video!

Note: Generation time depends on settings and audio length. It can take a few minutes.
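
If you prefer to call the demo from code, Gradio Spaces expose a client API. A minimal sketch with gradio_client; the Space ID, endpoint name, and parameter names below are assumptions, so check the demo's "Use via API" page for the real signature:

```python
from gradio_client import Client, handle_file

# Hypothetical Space ID -- replace with the actual demo's ID.
client = Client("username/EchoMimicV3")

result = client.predict(
    image=handle_file("portrait.png"),    # clear, front-facing portrait
    audio=handle_file("speech.wav"),      # clear speech or singing
    prompt="a person talking naturally",  # optional text prompt
    num_inference_steps=20,               # 5-50 in the UI
    api_name="/generate",                 # hypothetical endpoint name
)
print(result)  # path to the generated talking video
```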

This demo is based on the EchoMimicV3 repository.