myshell-ai/DreamVoice

Higobeatz committed on Jan 9

Commit

2315c29

1 Parent(s): 17420f4

openvoice plugin

Files changed (1)

.ipynb_checkpoints/README-checkpoint.md +0 -158

.ipynb_checkpoints/README-checkpoint.md DELETED

@@ -1,158 +0,0 @@
----
-language:
-- en
-tags:
-- myshell
-- speech-to-speech
----
-<!-- might put a [width=2000 * height=xxx] img here, this size best fits git page
-<img src="resources\cover.png"> -->
-<img src="resources/dreamvoice.png">
-# DreamVoice: Text-guided Voice Conversion
---------------------
-## Introduction
-DreamVoice is an innovative approach to voice conversion (VC) that leverages text-guided generation to create personalized and versatile voice experiences.
-Unlike traditional VC methods, which require a target recording during inference, DreamVoice introduces a more intuitive solution by allowing users to specify desired voice timbres through text prompts.
-For more details, see our Interspeech paper: [DreamVoice](https://arxiv.org/abs/2406.16314)
-To listen to demos and download the dataset, visit the DreamVoice homepage: [Homepage](https://haidog-yaqub.github.io/dreamvoice_demo/)
-# How to Use
-To load the models, first install the required packages:
-```
-pip install -r requirements.txt
-```
-Then you can use the model with the following code:
-- NEW! DreamVoice Plugin for OpenVoice (DreamVG + [OpenVoice](https://github.com/myshell-ai/OpenVoice))
-```python
-import torch
-from dreamvoice import DreamVoice_Plugin
-from dreamvoice.openvoice_utils import se_extractor
-from openvoice.api import ToneColorConverter
-# init dreamvoice
-dreamvoice = DreamVoice_Plugin(device='cuda')
-# init openvoice
-ckpt_converter = 'checkpoints_v2/converter'
-openvoice = ToneColorConverter(f'{ckpt_converter}/config.json', device='cuda')
-openvoice.load_ckpt(f'{ckpt_converter}/checkpoint.pth')
-# generate speaker
-prompt = 'young female voice, sounds young and cute'
-target_se = dreamvoice.gen_spk(prompt)
-target_se = target_se.unsqueeze(-1)
-# content source
-source_path = 'examples/test2.wav'
-source_se = se_extractor(source_path, openvoice).to('cuda')
-# voice conversion
-encode_message = "@MyShell"
-openvoice.convert(
-    audio_src_path=source_path,
-    src_se=source_se,
-    tgt_se=target_se,
-    output_path='output.wav',
-    message=encode_message)
-```
-- DreamVoice Plugin for DiffVC (Diffusion-based VC Model)
-```python
-from dreamvoice import DreamVoice
-# Initialize DreamVoice in plugin mode with CUDA device
-dreamvoice = DreamVoice(mode='plugin', device='cuda')
-# Description of the target voice
-prompt = 'young female voice, sounds young and cute'
-# Provide the path to the content audio and generate the converted audio
-gen_audio, sr = dreamvoice.genvc('examples/test1.wav', prompt)
-# Save the converted audio
-dreamvoice.save_audio('gen1.wav', gen_audio, sr)
-# Save the speaker embedding if you like the generated voice
-dreamvoice.save_spk_embed('voice_stash1.pt')
-# Load the saved speaker embedding
-dreamvoice.load_spk_embed('voice_stash1.pt')
-# Use the saved speaker embedding for another audio sample
-gen_audio2, sr = dreamvoice.simplevc('examples/test2.wav', use_spk_cache=True)
-dreamvoice.save_audio('gen2.wav', gen_audio2, sr)
-```
-# Training Guide
-1. Download VCTK and LibriTTS-R.
-2. Download the [DreamVoice dataset](https://haidog-yaqub.github.io/dreamvoice_demo/).
-3. Extract speaker embeddings and cache them in a local path:
-```
-python dreamvoice/train_utils/prepare/prepare_se.py
-```
-4. Modify the training config and train your DreamVoice plugin:
-```
-cd dreamvoice/train_utils/src
-accelerate launch train.py
-```
-# Extra Features
-- End-to-end DreamVoice VC Model
-```python
-from dreamvoice import DreamVoice
-# Initialize DreamVoice in end-to-end mode with CUDA device
-dreamvoice = DreamVoice(mode='end2end', device='cuda')
-# Provide the path to the content audio and a text prompt describing the target voice
-gen_end2end, sr = dreamvoice.genvc('examples/test1.wav', 'young female voice, sounds young and cute')
-# Save the converted audio
-dreamvoice.save_audio('gen_end2end.wav', gen_end2end, sr)
-# Note: End-to-end mode does not support saving speaker embeddings
-# To use a voice generated in end-to-end mode, switch back to plugin mode
-# and extract the speaker embedding from the generated audio
-# Switch back to plugin mode
-dreamvoice = DreamVoice(mode='plugin', device='cuda')
-# Load the speaker audio from the previously generated file
-gen_end2end2, sr = dreamvoice.simplevc('examples/test2.wav', speaker_audio='gen_end2end.wav')
-# Save the new converted audio
-dreamvoice.save_audio('gen_end2end2.wav', gen_end2end2, sr)
-```
-- DiffVC (Diffusion-based VC Model)
-```python
-from dreamvoice import DreamVoice
-# Plugin mode can be used for traditional one-shot voice conversion
-dreamvoice = DreamVoice(mode='plugin', device='cuda')
-# Generate audio using traditional one-shot voice conversion
-gen_tradition, sr = dreamvoice.simplevc('examples/test1.wav', speaker_audio='examples/speaker.wav')
-# Save the converted audio
-dreamvoice.save_audio('gen_tradition.wav', gen_tradition, sr)
-```
-## Reference
-If you find the code useful for your research, please consider citing:
-```bibtex
-@article{hai2024dreamvoice,
-  title={DreamVoice: Text-Guided Voice Conversion},
-  author={Hai, Jiarui and Thakkar, Karan and Wang, Helin and Qin, Zengyi and Elhilali, Mounya},
-  journal={arXiv preprint arXiv:2406.16314},
-  year={2024}
-}
-```
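
Review note: every example in the deleted README hard-codes `device='cuda'`, so the snippets fail on machines without a GPU. A minimal sketch of a fallback follows. The helper name `pick_device` is hypothetical and not part of the `dreamvoice` package, and whether the DreamVoice and OpenVoice models run acceptably on CPU is an assumption the README does not confirm.

```python
def pick_device(prefer: str = "cuda") -> str:
    """Return "cuda" only when PyTorch reports a usable GPU.

    Any other preferred value (for example "cpu") is passed through unchanged.
    Hypothetical helper for illustration; not part of dreamvoice.
    """
    if prefer != "cuda":
        return prefer
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The resulting string could then be passed wherever the README writes `'cuda'`, for example `DreamVoice(mode='plugin', device=device)` or `.to(device)` on the source speaker embedding.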