Popularity Leaderboard

Open source text-to-speech popularity leaderboard

OuteTTS

GGUF
LLaMa
Voice cloning

OuteTTS is a novel TTS model that uses pure language modeling on the LLaMa architecture (Oute3-350M-DEV base). It produces quality speech synthesis through crafted prompts and audio tokens, without external adapters or complex setups.

F5-TTS

ConvNeXt V2
F5
E2
Sway Sampling

A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

XTTS-v2

Voice cloning
Cross-language
24 kHz
17 languages

ⓍTTS is a voice generation model that lets you clone voices into different languages using just a 6-second audio clip. It does not need many hours of training data.
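Because the description quotes a 6-second reference clip, it can help to check a clip's length before passing it to a cloning model. The following is a minimal sketch using only the Python standard library; it is not part of XTTS, the names `clip_duration_seconds`, `is_long_enough`, and `MIN_REFERENCE_SECONDS` are hypothetical, and it assumes the clip is an uncompressed PCM WAV file.

```python
import wave

# Assumption: 6 seconds is taken from the description above,
# not from any library constant.
MIN_REFERENCE_SECONDS = 6.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Return True when the clip lasts at least `minimum` seconds."""
    return clip_duration_seconds(path) >= minimum
```

Calling `is_long_enough("reference.wav")` on a hypothetical file would report whether the clip meets the quoted length; other formats such as MP3 would need to be converted to WAV first.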

MaskGCT

Voice cloning
zero-shot

Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

fish-speech-1.4

Voice cloning
zero-shot
Multilingual
Fast

Fish Speech V1.4 is a leading text-to-speech (TTS) model trained on 700k hours of audio data in multiple languages.

Bark

highly realistic
Multilingual

Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects.

parler-tts

fully open-source
lightweight

Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.).

MeloTTS

high-quality
multi-lingual

MeloTTS is a high-quality multi-lingual text-to-speech library by MIT and MyShell.ai.