VibeVoice-Hindi-7B: Open-Source Model Clones Voices, Speaks Fluent Hindi

The open-source community has a new voice: VibeVoice-Hindi-7B. This text-to-speech (TTS) model produces expressive Hindi speech, with multi-speaker support and voice cloning. It is a fine-tune of Microsoft’s VibeVoice that can generate long-form narration and replicate a voice from only a brief audio sample. Both the full model and the LoRA adapters are available. Reddit users highlight its natural-sounding Hindi, its multi-speaker conversations, and its ability to produce audio tracks up to 45 minutes long. The model uses Qwen2.5-7B as its large language model backbone, adapted with LoRA fine-tuning, and a diffusion head for high-quality audio. VibeVoice-Hindi-7B is released under the permissive MIT License, which encourages further development and exploration within the open-source ecosystem. The release was shared in this Reddit discussion: https://old.reddit.com/r/artificial/comments/1ogspvd/p_vibevoicehindi7b_opensource_expressive/
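For readers unfamiliar with why a release would ship LoRA adapters alongside a full model, the sketch below illustrates the general LoRA idea: an adapter stores two small low-rank matrices, and a merged weight is the base weight plus their scaled product. This is a minimal toy illustration in plain Python, not the project's code; the function names (`matmul`, `merge_lora`), the tiny matrix sizes, and all values are assumptions made for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base, lora_b, lora_a, alpha):
    """Return base + (alpha / r) * (lora_b @ lora_a), where r is the adapter rank."""
    rank = len(lora_a)  # lora_a has shape (r, in_features)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Toy 3x3 base weight with a rank-1 adapter (illustrative values only).
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_b = [[1.0], [2.0], [3.0]]  # shape (3, 1)
lora_a = [[0.5, 0.0, -0.5]]     # shape (1, 3)

merged = merge_lora(base, lora_b, lora_a, alpha=2.0)
print(merged)  # [[2.0, 0.0, -1.0], [2.0, 1.0, -2.0], [3.0, 0.0, -2.0]]
```

In this toy case the adapter holds 6 numbers while the base matrix holds 9; at the scale of a 7B-parameter model the gap is far larger, which is why adapters are typically much smaller to download than a full set of merged weights.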