
Overview

The TTS API converts text into speech audio. It exposes a V2 OpenAI-compatible speech endpoint backed by Hidoba TTS.

Base endpoint:

POST /v2/audio/speech

Voice processor

Use the Hidoba voice processor to manage your voices and look up their IDs.

The API currently supports only the ishikawa model; other models currently work only in calls.

Typical Flow

Send text, model, voice, and optional audio settings.
The API validates quota access and calls the model.
The response returns raw audio bytes in the requested format.
Usage is recorded automatically for quota billing.
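The flow above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference client: the host name, the Bearer-token authentication scheme, and the helper names `build_speech_request` and `synthesize` are assumptions that do not come from this page. Only the path `/v2/audio/speech`, the field names, and the `ishikawa` model are taken from the text.

```python
import json
import urllib.request

# Assumptions (not from the docs): the host name and Bearer-token auth
# are placeholders. Replace them with the values for your account.
BASE_URL = "https://api.example.com"  # hypothetical host


def build_speech_request(text, voice, api_key, model="ishikawa",
                         response_format="mp3"):
    """Build the POST /v2/audio/speech request without sending it."""
    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
    return urllib.request.Request(
        BASE_URL + "/v2/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, voice, api_key, out_path, **options):
    """Send the request and save the raw audio bytes from the response body."""
    request = build_speech_request(text, voice, api_key, **options)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Under these assumptions, a call such as `synthesize("Hello", voice_id, api_key, "hello.mp3")` would write the returned bytes to `hello.mp3`. Building the request is kept separate from sending it so that the payload can be inspected without network access.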

Features

OpenAI-compatible shape: Use model, voice, input, response_format, and speed.
Formats: Generate mp3, opus, wav, or raw pcm.
Language control: Pass a supported short language code, or use auto to detect the language from the input text.
Billing included: Processed characters are billed automatically from provider usage.
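As a rough sketch of how the request fields listed above fit together, the helper below assembles a JSON-ready payload and rejects formats outside the four listed. The function name `build_payload` is hypothetical, and the `language` field name is an assumption: this page says a language code or `auto` can be passed but does not name the parameter.

```python
ALLOWED_FORMATS = {"mp3", "opus", "wav", "pcm"}  # formats listed above


def build_payload(text, voice, model="ishikawa", response_format="mp3",
                  speed=None, language=None):
    """Assemble the request body, omitting optional fields that are unset."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError("unsupported response_format: " + repr(response_format))
    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
    if speed is not None:
        payload["speed"] = speed
    if language is not None:
        # Assumed field name; the value is a short language code or "auto".
        payload["language"] = language
    return payload
```

Leaving unset optional fields out of the payload lets the server apply its own defaults rather than hard-coding guesses on the client side.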

:::important Important Considerations

Maximum input: 2,000 characters per request.
Streaming: Not supported by this endpoint.
Quota type: Requires a server-time quota.

:::
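Because each request accepts at most 2,000 characters and streaming is not available, longer text has to be split on the client and sent as several requests. The helper below is one possible sketch. The name `split_text` is hypothetical, and it assumes the limit is counted the way Python's `len()` counts characters, which this page does not confirm.

```python
MAX_INPUT_CHARS = 2000  # per-request limit stated above


def split_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters.

    Breaks fall on whitespace where possible. Runs of whitespace are
    collapsed to single spaces as a side effect of word splitting.
    """
    chunks = []
    current = ""
    for word in text.split():
        # Hard-split any single word that is itself longer than the limit.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio joined afterwards. Splitting on sentence boundaries instead of arbitrary whitespace would likely give more natural prosody at the joins, at the cost of a slightly more involved splitter.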
