myshell-ai/DreamVoice

Higobeatz committed on Jan 8

Commit

e52c10a

1 Parent(s): b63b1c9

openvoice plugin

Files changed (2)

.ipynb_checkpoints/README-checkpoint.md +142 -0
README.md +3 -3

.ipynb_checkpoints/README-checkpoint.md ADDED

@@ -0,0 +1,142 @@

+---
+language:
+- en
+tags:
+- myshell
+- speech-to-speech
+---
+<!-- might put a [width=2000 * height=xxx] img here, this size best fits git page
+<img src="resources\cover.png"> -->
+<img src="resources/dreamvoice.png">
+# DreamVoice: Text-guided Voice Conversion
+--------------------
+## Introduction
+DreamVoice is an innovative approach to voice conversion (VC) that leverages text-guided generation to create personalized and versatile voice experiences.
+Unlike traditional VC methods, which require a target recording during inference, DreamVoice introduces a more intuitive solution by allowing users to specify desired voice timbres through text prompts.
+For more details, please see our Interspeech paper: [DreamVoice](https://arxiv.org/abs/2406.16314)
+To listen to demos and download the dataset, please visit the DreamVoice homepage: [Homepage](https://haidog-yaqub.github.io/dreamvoice_demo/)
+# Model Usage
+To load the models, first install the required packages:
+```
+pip install -r requirements.txt
+```
+Then you can use the model with the following code:
+- NEW! DreamVoice Plugin for OpenVoice (DreamVG + [OpenVoice](https://github.com/myshell-ai/OpenVoice))
+```python
+import torch
+from dreamvoice import DreamVoice_Plugin
+from dreamvoice.openvoice_utils import se_extractor
+from openvoice.api import ToneColorConverter
+# init dreamvoice
+dreamvoice = DreamVoice_Plugin(device='cuda')
+# init openvoice
+ckpt_converter = 'checkpoints_v2/converter'
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+openvoice = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
+openvoice.load_ckpt(f'{ckpt_converter}/checkpoint.pth')
+# generate speaker
+prompt = 'cute female girl voice'
+target_se = dreamvoice.gen_spk(prompt)
+target_se = target_se.unsqueeze(-1)
+# content source
+source_path = 'examples/test2.wav'
+source_se = se_extractor(source_path, openvoice).to(device)
+# voice conversion
+encode_message = "@MyShell"
+openvoice.convert(
+    audio_src_path=source_path,
+    src_se=source_se,
+    tgt_se=target_se,
+    output_path='output.wav',
+    message=encode_message)
+```
+- DreamVoice Plugin for DiffVC (Diffusion-based VC Model)
+```python
+from dreamvoice import DreamVoice
+# Initialize DreamVoice in plugin mode with CUDA device
+dreamvoice = DreamVoice(mode='plugin', device='cuda')
+# Description of the target voice
+prompt = 'young female voice, sounds young and cute'
+# Provide the path to the content audio and generate the converted audio
+gen_audio, sr = dreamvoice.genvc('examples/test1.wav', prompt)
+# Save the converted audio
+dreamvoice.save_audio('gen1.wav', gen_audio, sr)
+# Save the speaker embedding if you like the generated voice
+dreamvoice.save_spk_embed('voice_stash1.pt')
+# Load the saved speaker embedding
+dreamvoice.load_spk_embed('voice_stash1.pt')
+# Use the saved speaker embedding for another audio sample
+gen_audio2, sr = dreamvoice.simplevc('examples/test2.wav', use_spk_cache=True)
+dreamvoice.save_audio('gen2.wav', gen_audio2, sr)
+```
+- End-to-end DreamVoice VC Model
+```python
+from dreamvoice import DreamVoice
+# Initialize DreamVoice in end-to-end mode with CUDA device
+dreamvoice = DreamVoice(mode='end2end', device='cuda')
+# Provide the path to the content audio and a text prompt (reuse `prompt` from the plugin example above) to generate the converted audio
+gen_end2end, sr = dreamvoice.genvc('examples/test1.wav', prompt)
+# Save the converted audio
+dreamvoice.save_audio('gen_end2end.wav', gen_end2end, sr)
+# Note: End-to-end mode does not support saving speaker embeddings
+# To use a voice generated in end-to-end mode, switch back to plugin mode
+# and extract the speaker embedding from the generated audio
+# Switch back to plugin mode
+dreamvoice = DreamVoice(mode='plugin', device='cuda')
+# Load the speaker audio from the previously generated file
+gen_end2end2, sr = dreamvoice.simplevc('examples/test2.wav', speaker_audio='gen_end2end.wav')
+# Save the new converted audio
+dreamvoice.save_audio('gen_end2end2.wav', gen_end2end2, sr)
+```
+- DiffVC (Diffusion-based VC Model)
+```python
+from dreamvoice import DreamVoice
+# Plugin mode can be used for traditional one-shot voice conversion
+dreamvoice = DreamVoice(mode='plugin', device='cuda')
+# Generate audio using traditional one-shot voice conversion
+gen_tradition, sr = dreamvoice.simplevc('examples/test1.wav', speaker_audio='examples/speaker.wav')
+# Save the converted audio
+dreamvoice.save_audio('gen_tradition.wav', gen_tradition, sr)
+```
+## Reference
+If you find the code useful for your research, please consider citing:
+```bibtex
+@article{hai2024dreamvoice,
+  title={DreamVoice: Text-Guided Voice Conversion},
+  author={Hai, Jiarui and Thakkar, Karan and Wang, Helin and Qin, Zengyi and Elhilali, Mounya},
+  journal={arXiv preprint arXiv:2406.16314},
+  year={2024}
+}
+```

README.md CHANGED

@@ -69,7 +69,7 @@ openvoice.convert(
     message=encode_message)
 ```
-- DreamVoice Plugin for Diffusion-based VC model (DreamVG + DiffVC)
+- DreamVoice Plugin for DiffVC (Diffusion-based VC Model)
 ```python
 from dreamvoice import DreamVoice
@@ -92,7 +92,7 @@ gen_audio2, sr = dreamvoice.simplevc('examples/test2.wav', use_spk_cache=True)
 dreamvoice.save_audio('gen2.wav', gen_audio2, sr)
 ```
-- Diffusion-based End-to-end model (DreamVC)
+- End-to-end DreamVoice VC Model
 ```python
 from dreamvoice import DreamVoice
@@ -115,7 +115,7 @@ gen_end2end2, sr = dreamvoice.simplevc('examples/test2.wav', speaker_audio='gen_
 dreamvoice.save_audio('gen_end2end2.wav', gen_end2end2, sr)
 ```
-- One-shot Voice Conversion (DiffVC)
+- DiffVC (Diffusion-based VC Model)
 ```python
 from dreamvoice import DreamVoice
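
Note on device selection: the OpenVoice plugin example in the added README passes `device='cuda'` to `DreamVoice_Plugin` but falls back to CPU for `ToneColorConverter`. A minimal sketch of choosing the device string once and reusing it follows. The helper `pick_device` is hypothetical (it is not part of dreamvoice or OpenVoice), and whether `DreamVoice_Plugin` accepts `'cpu'` or an indexed string such as `'cuda:0'` is an assumption the source does not confirm.

```python
def pick_device(cuda_available: bool, index: int = 0) -> str:
    """Return a device string: an indexed CUDA device if available, else CPU.

    Hypothetical helper for illustration; not part of dreamvoice or OpenVoice.
    """
    return f"cuda:{index}" if cuda_available else "cpu"


if __name__ == "__main__":
    # torch is only needed to probe for CUDA; fall back to CPU if it is missing.
    try:
        import torch
        available = torch.cuda.is_available()
    except ImportError:
        available = False

    device = pick_device(available)
    print(device)
    # Assumption: both constructors accept the same device string, e.g.
    #   dreamvoice = DreamVoice_Plugin(device=device)
    #   openvoice = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
```

Computing the string once keeps the speaker embedding from `gen_spk` and the one from `se_extractor(...).to(device)` on the same device, which the conversion call presumably requires.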