Voice cloning studio · v3 engine

Clone any voice.
Then direct it like talent.

Timbre captures a speaker from 30 seconds of audio and turns it into a studio-grade voice you can direct, edit, and ship - in 32 languages, at production quality.

No card required · 30s to first clone · SOC 2 · consent-verified
Voices in production at
Northwind · Audible-ish · Polyglot · Cadence FM · Studio Vela · Lumen Games
/ 01 - the workflow

From a voice memo to a finished take in three moves.

No microphone rig, no re-records. Upload a sample, shape the delivery, and export broadcast-ready audio.

STEP 01

Capture the voice

Drop in 30 seconds of clean speech. Timbre models timbre, accent, and breath into a private voiceprint.

STEP 02

Direct the delivery

Dial emotion, pace, and emphasis per line. Add pauses and pronunciations like notes to a session musician.

STEP 03

Export anywhere

Render to WAV, MP3, or stream over the API - synced captions and timestamps included.

WAV · MP3 · API
/ 02 - voice library

Start with 120+ studio voices. Or clone your own.

Hand-tuned presets across narration, advertising, gaming and IVR - each cleared for commercial use.

/ 03 - the platform

A full studio behind every voice.

Everything you need to produce, localize and ship audio at scale - with the controls a real engineer expects.

One voice, 32 languages

Clone once and speak the world. Cross-lingual transfer keeps the speaker's identity intact while rendering each language with a native accent.

English · Español · 日本語 · Français · Deutsch · हिन्दी · Português · العربية · 한국어 · +24

Streaming API

Sub-300ms first byte. Build voice into apps and agents.

# clone → speak
POST /v3/speak
{
  "voice": "vp_8c1",
  "text": "Ship it.",
  "format": "wav"
}
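
As a rough illustration, the request above could be assembled in Python with only the standard library. Only the `/v3/speak` path and the three body fields come from the snippet above. The base URL `https://api.timbre.example`, the bearer-token `Authorization` header, and the helper name `build_speak_request` are assumptions made for this sketch, not documented Timbre details.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the real API host.
BASE_URL = "https://api.timbre.example"


def build_speak_request(
    voice: str, text: str, api_key: str, fmt: str = "wav"
) -> urllib.request.Request:
    """Build (but do not send) a POST /v3/speak request."""
    body = json.dumps({"voice": voice, "text": text, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v3/speak",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the source does not specify one.
            "Authorization": f"Bearer {api_key}",
        },
    )


if __name__ == "__main__":
    req = build_speak_request("vp_8c1", "Ship it.", api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Passing the built request to `urllib.request.urlopen` would send it. The helper stops short of that so it can be inspected or tested without network access.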

Emotion & pacing control

Per-line sliders for tone, intensity and speed - plus phoneme-level pronunciation overrides for names and jargon.

Consent & watermark

Every clone is consent-verified and carries an inaudible watermark for provenance.

Auto-dubbing

Drop in a video and get a lip-aware dub that holds timing across the whole timeline.

99.2% · Voice-match fidelity
<300ms · Streaming latency
32 · Languages & accents
14M+ · Clips generated weekly
/ 04 - pricing

Plans that scale from first take to full studio.

Start free, upgrade when you ship. Every plan includes commercial rights and the full voice library.

Creator

$0/forever

For trying clones and short projects.

  • 1 cloned voice
  • 10,000 characters / mo
  • Full studio voice library
  • MP3 export
Start free

Studio

Most popular
$29/mo

For creators shipping audio every week.

  • 15 cloned voices
  • 500,000 characters / mo
  • Emotion & pacing control
  • WAV + 32-language dubbing
  • Streaming API access
Start 14-day trial

Scale

Custom

For teams and platforms in production.

  • Unlimited voices & seats
  • Dedicated low-latency cluster
  • SSO, audit logs, SLA
  • On-prem & custom models
Talk to sales
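
To get a feel for the monthly character allowances, here is a small sketch that totals a batch of scripts and checks it against a plan. The two quotas come from the plan lists above. How Timbre actually counts characters is not stated, so the sketch assumes one character per Unicode code point, spaces and punctuation included. The name `fits_quota` is illustrative.

```python
# Monthly character allowances taken from the Creator and Studio plans.
PLAN_QUOTAS = {"Creator": 10_000, "Studio": 500_000}


def fits_quota(plan: str, scripts: list[str]) -> tuple[int, bool]:
    """Return (total characters, whether the total fits the plan's monthly quota).

    Assumption: one character = one Unicode code point, spaces included.
    """
    total = sum(len(script) for script in scripts)
    return total, total <= PLAN_QUOTAS[plan]


if __name__ == "__main__":
    # Twelve scripts of 1,500 characters each: 18,000 characters in total.
    episodes = ["x" * 1_500] * 12
    print(fits_quota("Creator", episodes))
    print(fits_quota("Studio", episodes))
```

Under that counting assumption, twelve 1,500-character scripts exceed the Creator allowance but fit comfortably within Studio.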
/ 05 - questions

Good to know before you clone.

How much audio do I need to clone a voice?
Thirty seconds of clean, single-speaker speech is enough for a high-fidelity clone. More audio - up to ten minutes - sharpens accent and emotional range, but the 30-second instant clone is what most people ship with.
Can I only clone my own voice?
You can clone any voice you have explicit, documented consent to use - your own, a hired voice actor's, or that of licensed talent. Every clone passes a consent check and carries an inaudible watermark, and impersonation of public figures without permission is blocked.
Do I own the audio I generate?
Yes. Every plan, including the free tier, comes with full commercial rights to the audio you generate, library voices included. Your reference recordings and voiceprints stay private to your workspace and are never used to train shared models.
How real does it actually sound?
The v3 engine reproduces breath, micro-pauses and prosody, and scores a 99.2% voice match on our blind listening panel. The per-line direction controls let you push a take from neutral narration to a warm, emphatic read without re-recording.
Can I use Timbre in real time?
Yes - the streaming API returns the first audio byte in under 300ms, which is fast enough for live agents, IVR and interactive characters. Batch rendering is available for longer-form narration and dubbing jobs.
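
If you want to check the first-byte figure in your own integration, a sketch like the following times the arrival of the first audio chunk. It assumes your HTTP client exposes the response as a lazy iterator of `bytes` chunks, so that iteration is what triggers the read. The names `first_chunk_latency` and `FIRST_BYTE_BUDGET_S` are illustrative and not part of any Timbre SDK.

```python
import itertools
import time
from typing import Callable, Iterable, Iterator, Tuple

FIRST_BYTE_BUDGET_S = 0.300  # the 300 ms figure quoted above


def first_chunk_latency(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, Iterator[bytes]]:
    """Return (seconds until the first non-empty chunk, iterator over the audio).

    The returned iterator starts with that first chunk, so no audio is lost.
    """
    start = clock()
    stream = iter(chunks)
    for first in stream:
        if first:  # skip empty chunks that carry no audio
            elapsed = clock() - start
            return elapsed, itertools.chain([first], stream)
    raise ValueError("stream ended before any audio arrived")


if __name__ == "__main__":
    fake_stream = iter([b"", b"RIFF", b"data"])  # stand-in for a real response
    latency, audio = first_chunk_latency(fake_stream)
    print(f"within budget: {latency < FIRST_BYTE_BUDGET_S}")
```

The clock is injectable so the measurement can be tested deterministically. In production the default `time.perf_counter` is a reasonable choice for short intervals.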

Your first voice clone is 30 seconds away.

Upload a sample, hear it speak, and ship a finished take today. No card, no studio, no re-records.

JOIN 40,000+ CREATORS & TEAMS PRODUCING WITH TIMBRE