Overview

The TTS API converts text into speech audio. It exposes a V2 OpenAI-compatible speech endpoint backed by Hidoba TTS.

Base endpoint:

POST /v2/audio/speech

Voice processor

Use the Hidoba voice processor to manage your voices and look up their IDs.

The API currently supports only the ishikawa model; other models currently work only in calls.

Typical Flow

  1. Send text, model, voice, and optional audio settings.
  2. The API validates quota access and calls the model.
  3. The response returns raw audio bytes in the requested format.
  4. Usage is recorded automatically for quota billing.
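
The flow above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference client: the host in BASE_URL is a placeholder, the Bearer-token Authorization header is an assumption based on the OpenAI-compatible shape, and the voice ID is whatever the voice processor shows for your voice. The body fields are the ones this page lists (model, voice, input, response_format, speed).

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real API host.
BASE_URL = "https://api.example.com"


def build_speech_request(text, voice, api_key,
                         response_format="mp3", speed=1.0, model="ishikawa"):
    """Build (but do not send) a POST request for /v2/audio/speech."""
    payload = {
        "model": model,                      # only ishikawa is supported by the API
        "voice": voice,                      # voice ID from the voice processor
        "input": text,                       # text to synthesize
        "response_format": response_format,  # mp3, opus, wav, or pcm
        "speed": speed,
    }
    return urllib.request.Request(
        BASE_URL + "/v2/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer auth, as in other OpenAI-compatible APIs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize_to_file(text, voice, api_key, path, **settings):
    """Send the request and write the raw audio bytes to `path`."""
    request = build_speech_request(text, voice, api_key, **settings)
    with urllib.request.urlopen(request) as response:
        audio = response.read()  # the whole body; streaming is not supported
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call such as synthesize_to_file("Hello", "YOUR_VOICE_ID", "YOUR_API_KEY", "hello.mp3") would then save the returned bytes as-is, since the response body is the audio itself rather than a JSON envelope.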

Features

  • OpenAI-compatible shape: Use model, voice, input, response_format, and speed.
  • Formats: Generate mp3, opus, wav, or raw pcm.
  • Billing included: Processed characters are billed automatically from provider usage.

Important Considerations

  • Maximum input: 2,000 characters per request.
  • Streaming: Not supported by this endpoint.
  • Quota type: Requires a server-time quota.
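
Because each request accepts at most 2,000 characters and the endpoint does not stream, longer text has to be split on the client side and sent as several requests. The helper below is one possible sketch of that splitting. The function name is hypothetical, and it assumes the limit counts characters the same way Python's len() does, which this page does not confirm.

```python
MAX_INPUT_CHARS = 2000  # per-request limit stated above


def split_for_tts(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters.

    Breaks at the last space or newline that fits; falls back to a
    hard cut when a run of text has no whitespace at all.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last whitespace position such that the chunk stays within the limit.
        cut = max(remaining.rfind(" ", 0, limit + 1),
                  remaining.rfind("\n", 0, limit + 1))
        if cut <= 0:
            cut = limit  # no usable whitespace: cut mid-word
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio files played or joined in order. Splitting on sentence boundaries instead of plain whitespace would likely sound more natural, at the cost of a slightly more involved helper.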