Hanzo Dev committed
Commit 287780c · 1 Parent(s): fbc526e

Fix voice names (cherry/noah) and add custom voice docs

Files changed (1): README.md +35 -2
README.md CHANGED
@@ -44,11 +44,44 @@ Zen-Dub-Live leverages Zen Omni's unified Thinker-Talker architecture for true e
 **Key**: The entire pipeline is native - audio understanding, translation, AND speech synthesis happen end-to-end. No separate ASR or TTS models needed.
 
 - **First-packet latency**: 234ms (audio) / 547ms (video)
-- **Built-in voices**: `chelsie`, `ethan`, `aiden`
-- **Languages**: 119 text, 19 speech input, 10 speech output
+- **Built-in voices**: `cherry` (female), `noah` (male)
+- **Languages**: 119 text, 19 speech input, 2 speech output voices
 
 See: [Zen Omni Technical Report](https://arxiv.org/abs/2509.17765)
 
+### Adding Custom Voices
+
+Zen-Dub-Live supports voice cloning for anchor-specific voices:
+
+```python
+from zen_dub_live import AnchorVoice
+
+# Clone a voice from reference audio (10-30 seconds recommended)
+custom_voice = AnchorVoice.from_audio(
+    "anchor_audio_sample.wav",
+    name="anchor_01"
+)
+
+# Register for use in pipeline
+pipeline.register_voice(custom_voice)
+
+# Use in session
+session = await pipeline.create_session(
+    anchor_voice="anchor_01",
+    ...
+)
+```
+
+Voice profiles are stored as embeddings and can be saved/loaded:
+
+```python
+# Save voice profile
+custom_voice.save("voices/anchor_01.pt")
+
+# Load voice profile
+anchor_voice = AnchorVoice.load("voices/anchor_01.pt")
+```
+
 ## Overview
 
 Zen-Dub-Live is a real-time AI dubbing platform for broadcast-grade speech-to-speech translation with synchronized video lip-sync. The system ingests live video and audio, translates speech, synthesizes anchor-specific voices, and re-renders mouth regions so that lip movements match the translated speech—all under live broadcast latency constraints.
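
The added docs say voice profiles are stored as embeddings and can be saved and loaded. As a format illustration only, here is a minimal self-contained sketch of that round trip using a hypothetical stand-in dataclass (`VoiceProfile` is an assumed name; the real `AnchorVoice` internals are not shown in this diff, and it uses `.pt` tensor files rather than pickle):

```python
import pickle
from dataclasses import dataclass
from pathlib import Path

# Hypothetical stand-in, NOT the real zen_dub_live API: a voice profile
# reduced to its essentials -- a name plus a speaker-embedding vector
# serialized to disk.
@dataclass
class VoiceProfile:
    name: str
    embedding: list[float]  # speaker embedding from reference audio

    def save(self, path: str) -> None:
        # Ensure the target directory (e.g. voices/) exists before writing.
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @classmethod
    def load(cls, path: str) -> "VoiceProfile":
        with open(path, "rb") as f:
            return pickle.load(f)

# Round trip: save a profile, load it back, confirm it is unchanged.
profile = VoiceProfile(name="anchor_01", embedding=[0.12, -0.34, 0.56])
profile.save("voices/anchor_01.pkl")
restored = VoiceProfile.load("voices/anchor_01.pkl")
print(restored.name)                            # anchor_01
print(restored.embedding == profile.embedding)  # True
```

The design point the README's two snippets imply is that the profile, not the reference audio, is the persisted artifact: cloning from audio happens once, and sessions thereafter reference the stored embedding by name.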