
OpenAI TTS-1-HD

OpenAI TTS-1-HD is the higher-quality variant of OpenAI's text-to-speech API. It ships six built-in voices (alloy, echo, fable, onyx, nova, shimmer) and is tuned for natural prosody in audiobook-length generation, while the lighter TTS-1 variant optimises for streaming voice-agent latency.

Model specs

Vendor: OpenAI
Family: TTS-1
Released: 2023-11
Context window: 4,096 characters (maximum input per request)
Modalities: text, audio
Input price: $30/M characters
Output price: n/a
Pricing as of: 2026-04-20
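
Because each request accepts only a limited amount of text, long inputs have to be split before synthesis. The following is a minimal sketch, assuming the 4,096 limit listed above is counted in characters of input text (as the per-character pricing suggests). The names `chunk_text` and `MAX_CHARS` are hypothetical helpers for illustration, not part of any OpenAI SDK.

```python
import re

# Assumption: the per-request limit is 4,096 characters of input text.
MAX_CHARS = 4096


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate request and the resulting audio files concatenated. Splitting at sentence boundaries keeps pauses natural at the joins.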

Strengths

Natural prosody and clean diction across six voices
Multiple output formats (mp3, opus, aac, flac)
Supports streaming for real-time voice UX
Tight integration with GPT-4.x and GPT-5 via the OpenAI API
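
A request for speech can be sketched as follows, assuming the official `openai` Python package (version 1 or later) is installed and `OPENAI_API_KEY` is set in the environment. The helper names `build_speech_request` and `synthesize` are illustrative and not part of the SDK; the voice and format lists are the ones named on this page.

```python
# Voices and formats as listed on this page.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac"}


def build_speech_request(text: str, voice: str = "alloy",
                         response_format: str = "mp3", hd: bool = True) -> dict:
    """Validate the inputs and return keyword arguments for the speech endpoint."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    return {
        "model": "tts-1-hd" if hd else "tts-1",
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }


def synthesize(text: str, out_path: str, **options) -> None:
    """Stream synthesized audio to out_path (makes a network call)."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    params = build_speech_request(text, **options)
    # Streaming writes the audio to disk as it arrives instead of buffering it all.
    with client.audio.speech.with_streaming_response.create(**params) as response:
        response.stream_to_file(out_path)
```

For example, `synthesize("Chapter one.", "chapter1.mp3", voice="onyx")` would write an MP3 narrated in the onyx voice. Passing `hd=False` switches to TTS-1 for lower latency.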

Limitations

No voice cloning — you pick from a fixed voice list
Fewer languages than ElevenLabs Multilingual v2
Prosody control beyond simple punctuation is limited
Pricing is per character of input text rather than audio duration

Use cases

Audiobook narration and long-form content
Voice agents and IVR prompts
Accessibility — reading documents aloud
Podcast intros and educational narration

Benchmarks

Benchmark	Score	As of
MOS (listener rating)	≈4.3 / 5	2024
Output formats	mp3, opus, aac, flac	2024

Frequently asked questions

What is OpenAI TTS-1-HD?

TTS-1-HD is the high-fidelity variant of OpenAI's text-to-speech API. It produces more natural prosody than TTS-1 at a higher price per input character, and is the usual choice for narration and audiobook workflows.

What voices does OpenAI TTS offer?

OpenAI TTS ships six built-in voices: alloy, echo, fable, onyx, nova, and shimmer. There is no public voice-cloning option in the base API; custom voices require a different provider, such as ElevenLabs.

How is OpenAI TTS priced?

Pricing is per 1,000,000 characters of input text rather than per token. As of April 2026, TTS-1-HD costs roughly $30 per million characters and TTS-1 roughly $15 per million characters.
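
The arithmetic can be sketched in a few lines. This is an illustrative estimate only, using the approximate rates quoted above; the function name `estimate_cost_usd` and the rate table are hypothetical, and actual billing may differ.

```python
# Approximate rates quoted on this page (USD per million input characters).
USD_PER_MILLION_CHARS = {"tts-1": 15.0, "tts-1-hd": 30.0}


def estimate_cost_usd(num_chars: int, model: str = "tts-1-hd") -> float:
    """Estimate the cost of synthesizing num_chars characters of input text."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    rate = USD_PER_MILLION_CHARS[model]
    return num_chars * rate / 1_000_000
```

Under these rates, a 500,000-character manuscript comes to about $15 with TTS-1-HD (`estimate_cost_usd(500_000)`) and about $7.50 with TTS-1.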

When should I use TTS-1 vs TTS-1-HD?

Use TTS-1 for low-latency streaming (voice agents, IVR) where its quality is good enough. Use TTS-1-HD for narration, audiobooks, and pre-recorded content where prosody and naturalness matter most.

Sources

OpenAI — TTS guide — accessed 2026-04-20
OpenAI — Audio API pricing — accessed 2026-04-20