OpenAI TTS-1-HD
OpenAI TTS-1-HD is the higher-quality variant of OpenAI's text-to-speech API. It ships six built-in voices (alloy, echo, fable, onyx, nova, shimmer) and is tuned for natural prosody in audiobook-length generation, while the lighter TTS-1 variant optimises for streaming voice-agent latency.
Model specs
- Vendor: OpenAI
- Family: TTS-1
- Released: 2023-11
- Max input per request: 4,096 characters
- Modalities: text in, audio out
- Input price: $30 per 1M characters
- Output price: n/a
- Pricing as of: 2026-04-20
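Long-form text such as a book chapter will exceed the 4,096 limit listed in the specs, so it has to be sent in pieces. The sketch below shows one way to do that, assuming the limit is counted in characters of input text. The function name `split_for_tts` and the sentence-boundary heuristic are illustrative choices, not part of the OpenAI API.

```python
import re

# Per-request input limit from the specs above (assumed to count characters).
MAX_INPUT_CHARS = 4096


def split_for_tts(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break at sentence boundaries where possible; a single sentence
    longer than `limit` is cut at the limit as a fallback.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Fallback: hard-split a sentence that cannot fit in one request.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized as its own request and the resulting audio files concatenated. Breaking at sentence boundaries keeps pauses natural at the joins.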
Strengths
- Natural prosody and clean diction across six voices
- Multiple output formats (mp3, opus, aac, flac)
- Supports streaming for real-time voice UX
- Tight integration with GPT-4.x and GPT-5 via the OpenAI API
Limitations
- No voice cloning — you pick from a fixed voice list
- Fewer languages than ElevenLabs Multilingual v2
- Prosody control beyond simple punctuation is limited
- Pricing is per character of input text rather than audio duration
Use cases
- Audiobook narration and long-form content
- Voice agents and IVR prompts
- Accessibility — reading documents aloud
- Podcast intros and educational narration
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MOS (listener rating) | ≈4.3 / 5 | 2024 |
| Output formats | mp3, opus, aac, flac | 2024 |
Frequently asked questions
What is OpenAI TTS-1-HD?
TTS-1-HD is the high-fidelity variant of OpenAI's text-to-speech API. It produces more natural prosody than TTS-1 at a higher price per input character, which makes it a common choice for narration and audiobook workflows.
What voices does OpenAI TTS offer?
OpenAI TTS ships six built-in voices: alloy, echo, fable, onyx, nova, and shimmer. There is no public voice-cloning option in the base API — for custom voices, use ElevenLabs.
How is OpenAI TTS priced?
Pricing is per 1,000,000 input characters rather than per token. As of April 2026, TTS-1-HD costs roughly USD 30 per million characters and TTS-1 costs roughly USD 15 per million characters.
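Because billing follows input length, a job's cost can be estimated before any audio is generated. The following is a minimal sketch using the approximate April 2026 rates quoted above; the function name `estimate_cost_usd` is hypothetical, and actual invoices may differ.

```python
# Approximate rates in USD per 1,000,000 input characters (as of April 2026,
# taken from the answer above; check current pricing before relying on them).
USD_PER_MILLION_CHARS = {"tts-1": 15.0, "tts-1-hd": 30.0}


def estimate_cost_usd(num_chars: int, model: str = "tts-1-hd") -> float:
    """Return the estimated cost in USD of synthesizing `num_chars` characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    if model not in USD_PER_MILLION_CHARS:
        raise ValueError(f"unknown model: {model}")
    return num_chars * USD_PER_MILLION_CHARS[model] / 1_000_000


# A 500,000-character manuscript:
print(estimate_cost_usd(500_000, "tts-1-hd"))  # 15.0
print(estimate_cost_usd(500_000, "tts-1"))     # 7.5
```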
When should I use TTS-1 vs TTS-1-HD?
Use TTS-1 for low-latency streaming (voice agents, IVR) where its quality is good enough. Use TTS-1-HD for narration, audiobooks, and pre-recorded content where prosody and naturalness matter most.
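The rule of thumb above can be expressed as a small helper that picks the model and validates the voice and output format against the lists on this page. This is a sketch under stated assumptions: the use-case labels (`"voice_agent"`, `"ivr"`) and the names `build_speech_request` and `synthesize` are hypothetical, and the `synthesize` function assumes a recent version of the official `openai` Python package and an `OPENAI_API_KEY` in the environment.

```python
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
FORMATS = ("mp3", "opus", "aac", "flac")

# Hypothetical labels for latency-sensitive workloads.
LOW_LATENCY_USE_CASES = {"voice_agent", "ivr"}


def build_speech_request(
    text: str,
    use_case: str,
    voice: str = "alloy",
    response_format: str = "mp3",
) -> dict:
    """Build keyword arguments for a speech request, choosing the model by use case."""
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    # Low-latency workloads get tts-1; everything else gets tts-1-hd.
    model = "tts-1" if use_case in LOW_LATENCY_USE_CASES else "tts-1-hd"
    return {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }


def synthesize(params: dict, out_path: str) -> None:
    """Send the request and stream the audio to a file (requires network access)."""
    from openai import OpenAI  # imported lazily so the helper above has no dependencies

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(**params) as response:
        response.stream_to_file(out_path)
```

For example, `build_speech_request("Chapter one.", use_case="audiobook", voice="nova")` selects `tts-1-hd`, while `use_case="ivr"` selects `tts-1`.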
Sources
- OpenAI — TTS guide — accessed 2026-04-20
- OpenAI — Audio API pricing — accessed 2026-04-20