OpenAI TTS-1-HD

OpenAI TTS-1-HD is the higher-quality variant of OpenAI's text-to-speech API. It ships six built-in voices (alloy, echo, fable, onyx, nova, shimmer) and is tuned for natural prosody in audiobook-length generation, while the lighter TTS-1 variant optimises for streaming voice-agent latency.

Model specs

Vendor: OpenAI
Family: TTS-1
Released: 2023-11
Max input per request: 4,096 characters
Modalities: text in, audio out
Input price: $30 per 1M characters
Output price: n/a
Pricing as of: 2026-04-20

Strengths

  • Natural prosody and clean diction across six voices
  • Multiple output formats (mp3, opus, aac, flac)
  • Supports streaming for real-time voice UX
  • Tight integration with GPT-4.x and GPT-5 via the OpenAI API
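To make the voice and format options concrete, here is a minimal sketch of a single generation request using only the Python standard library. The endpoint URL, the JSON field names (`model`, `voice`, `input`, `response_format`), and the `OPENAI_API_KEY` environment variable are assumptions based on OpenAI's public audio API, and the helper names `build_speech_request` and `synthesize` are hypothetical. Check the current API reference before relying on any of them.

```python
import json
import os
import urllib.request

# Voices and formats are the ones listed on this page. The endpoint URL and
# JSON field names are assumptions about OpenAI's audio API and may change.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac"}


def build_speech_request(text, voice="alloy", audio_format="mp3",
                         model="tts-1-hd", api_key=None):
    """Build (but do not send) an HTTP request for one TTS generation."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")
    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": audio_format,
    }
    # Fall back to the environment so the key never has to live in code.
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Calling `synthesize("Chapter one.", "chapter1.mp3", voice="fable")` would write one audio file. Validating the voice and format locally surfaces typos before any billable request is made.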

Limitations

  • No voice cloning — you pick from a fixed voice list
  • Fewer languages than ElevenLabs Multilingual v2
  • Prosody control beyond simple punctuation is limited
  • Pricing is per character of input text rather than audio duration

Use cases

  • Audiobook narration and long-form content
  • Voice agents and IVR prompts
  • Accessibility — reading documents aloud
  • Podcast intros and educational narration
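Long-form narration usually exceeds what a single request accepts, so the text has to be split first. The sketch below assumes the 4,096 limit from the model specs applies per request and is counted in characters. The function name `chunk_text` and the sentence-boundary heuristic are illustrative choices, not part of any OpenAI tooling.

```python
import re

MAX_CHARS = 4096  # assumed per-request input cap, counted in characters


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into pieces of at most max_chars, preferring sentence ends."""
    # Break after ., ! or ? followed by whitespace; a simple heuristic only.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single over-long sentence is hard-split at the character limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio files concatenated in order. Splitting at sentence ends rather than at a fixed offset avoids cutting a word in half, which tends to produce audible glitches at the joins.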

Benchmarks

Benchmark | Score | As of
MOS (listener rating) | ≈4.3 / 5 | 2024
Output formats | mp3, opus, aac, flac | 2024

Frequently asked questions

What is OpenAI TTS-1-HD?

TTS-1-HD is the high-fidelity variant of OpenAI's text-to-speech API. It produces more natural prosody than TTS-1 at a higher price per input character, and it is best suited to narration and audiobook workflows.

What voices does OpenAI TTS offer?

OpenAI TTS ships six built-in voices: alloy, echo, fable, onyx, nova, and shimmer. There is no public voice-cloning option in the base API — for custom voices, use ElevenLabs.

How is OpenAI TTS priced?

Pricing is per 1,000,000 input characters rather than per token. As of April 2026, TTS-1-HD costs roughly USD 30 per million characters and TTS-1 roughly USD 15 per million characters.
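Because billing follows input length, a cost estimate is simple arithmetic on the character count. The sketch below uses the approximate rates quoted above. The function name `estimate_cost` is hypothetical, and the rates should be treated as a snapshot rather than a live price list.

```python
from decimal import Decimal

# Approximate USD rates per 1,000,000 input characters, as quoted on this page.
RATE_PER_MILLION = {"tts-1": Decimal("15"), "tts-1-hd": Decimal("30")}


def estimate_cost(text, model="tts-1-hd"):
    """Return the estimated USD cost of synthesizing text once."""
    return RATE_PER_MILLION[model] * len(text) / Decimal(1_000_000)
```

For example, a 500,000-character manuscript comes to about USD 15 on TTS-1-HD and about USD 7.50 on TTS-1 at these rates. Regenerating a passage is billed again, so retries should be counted in the estimate.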

When should I use TTS-1 vs TTS-1-HD?

Use TTS-1 for low-latency streaming (voice agents, IVR) where quality is good enough. Use TTS-1-HD for narration, audiobooks, and pre-recorded content where prosody and naturalness matter most.

Sources

  1. OpenAI — TTS guide — accessed 2026-04-20
  2. OpenAI — Audio API pricing — accessed 2026-04-20