Popularity Leaderboard

Open-source text-to-speech popularity leaderboard

OuteTTS

GGUF

LLaMa

Voice cloning

OuteTTS is a TTS model that uses pure language modeling on the LLaMa architecture (Oute3-350M-DEV base model). It produces quality speech through crafted prompts and audio tokens, without external adapters or complex setups.

F5-TTS

ConvNeXt V2

Sway Sampling

A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

XTTS-v2

Voice cloning

Cross-language

24 kHz

17 languages

ⓍTTS is a voice generation model that lets you clone voices into different languages from a single 6-second audio clip. It does not need many hours of training data.

MaskGCT

Voice cloning

zero-shot

Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

fish-speech-1.4

Voice cloning

zero-shot

Multilingual

Fast

Fish Speech V1.4 is a leading text-to-speech (TTS) model trained on 700k hours of audio data in multiple languages.

Bark

highly realistic

Multilingual

Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects.

parler-tts

fully open-source

lightweight

Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.).

MeloTTS

high-quality

multi-lingual

MeloTTS is a high-quality, multi-lingual text-to-speech library.
