Messages API v3

Messages API v3 is the rc2-backed OpenAI-compatible text generation endpoint.

Base URL:

https://msg.hidoba.com

Use this API for synchronous Chat Completions or Responses API generations that need Hidoba quota tracking, character prompts, RAG, streaming, fallback models, and usage attribution.

The older Messages API v2 docs cover the legacy /v2/completions text/audio flow. Use Messages API v3 for new OpenAI-compatible text generation integrations.

Typical Flow

  1. Send an OpenAI-compatible request with a quota API key.
  2. rc2 validates the API key, quota, character access, and request lifecycle.
  3. Messages API v3 transforms the request, injects character and RAG context when configured, and routes the request to Bifrost.
  4. The model response is returned directly or streamed.
  5. Usage is recorded automatically; clients do not call billing or usage endpoints.
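
From the client's side, steps 1 and 4 can be sketched with the Python standard library. This is a minimal sketch, not an official SDK. The helper names build_chat_request and iter_stream_text are hypothetical, and so is the JSON Content-Type header. The stream parser also assumes that proxied streams use the standard OpenAI Chat Completions server-sent-event format: "data:" lines carrying choices[].delta.content, ending with "data: [DONE]".

```python
import json
import urllib.request

BASE_URL = "https://msg.hidoba.com"


def build_chat_request(api_key, body, use_x_api_key=False):
    """Build (but do not send) a POST /v3/chat/completions request.

    Hypothetical helper: the quota API key goes in either the
    Authorization: Bearer header or the X-API-Key header.
    """
    if use_x_api_key:
        auth = {"X-API-Key": api_key}
    else:
        auth = {"Authorization": f"Bearer {api_key}"}
    headers = {"Content-Type": "application/json", **auth}  # JSON body assumed
    return urllib.request.Request(
        BASE_URL + "/v3/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and event: lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

To send the request, pass it to urllib.request.urlopen. When the body sets "stream": true, the response object can be iterated line by line and fed to iter_stream_text. No usage or billing call follows, because step 5 happens on the server.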

Features

  • Chat Completions: POST /v3/chat/completions
  • Responses API: POST /v3/responses
  • Compatibility aliases: /v1/chat/completions and /v1/responses
  • Authentication: Authorization: Bearer <quota_api_key> or X-API-Key: <quota_api_key>
  • Characters: Optional GitHub-sourced or inline character definitions under metadata.hidoba.character
  • RAG: Derived from character and server config. Do not send request-level RAG config.
  • Routing: Server-owned provider routing with optional request fallback_model
  • Reasoning controls: OpenRouter-style reasoning options can be passed through
  • Streaming: Standard streaming responses are proxied for supported models
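
The following sketch assembles a Chat Completions request body that uses the character and fallback features above. The build_payload name and every model and character value in it are hypothetical. This page does not define the shape of a character reference or of character_params, so both are passed through as opaque values. The sketch also assumes that fallback_model and the OpenRouter-style reasoning object are top-level request fields.

```python
def build_payload(model, messages, character=None, character_params=None,
                  fallback_model=None, stream=False, reasoning=None):
    """Assemble a Chat Completions body for Messages API v3 (sketch).

    character / character_params are placed under metadata.hidoba and
    treated as opaque values; their exact schema is deployment-specific.
    """
    payload = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True
    if fallback_model is not None:
        payload["fallback_model"] = fallback_model  # assumed top-level field
    if reasoning is not None:
        payload["reasoning"] = reasoning  # OpenRouter-style, e.g. {"effort": "low"}
    hidoba = {}
    if character is not None:
        hidoba["character"] = character
    if character_params is not None:
        hidoba["character_params"] = character_params
    if hidoba:
        payload["metadata"] = {"hidoba": hidoba}
    # No RAG or routing keys: the server derives RAG and routing itself.
    return payload
```

Leaving out RAG and routing settings is deliberate. The server derives them, and it rejects them if they are sent under metadata.hidoba.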

Important Considerations

  • metadata.hidoba may contain only character and character_params.
  • metadata.hidoba.rag, metadata.hidoba.routing, metadata.hidoba.request_id, and unknown metadata.hidoba fields are rejected.
  • metadata.hidoba is stripped before provider-visible model payloads.
  • RAG uses internal retrieval, including dense, SPLADE, and BM25 signals when configured. Clients do not call SPLADE directly.
  • Older character schemas may include a max_new_tokens field. It is ignored and does not set the output-token limit. Set that limit with a request-level token field such as max_completion_tokens, max_tokens, or max_output_tokens.
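
A client can catch the metadata.hidoba rejections listed above before sending a request. The check_hidoba_metadata function below is a hypothetical client-side pre-check that mirrors the documented rule: only character and character_params are allowed. The server remains authoritative, and this sketch does not attempt to reproduce any other validation it performs.

```python
ALLOWED_HIDOBA_KEYS = {"character", "character_params"}


def check_hidoba_metadata(payload):
    """Raise ValueError for metadata.hidoba fields the server would reject.

    Hypothetical pre-check: rag, routing, request_id, and any other key
    outside ALLOWED_HIDOBA_KEYS are flagged.
    """
    hidoba = (payload.get("metadata") or {}).get("hidoba")
    if hidoba is None:
        return  # no Hidoba metadata: nothing to check
    if not isinstance(hidoba, dict):
        raise ValueError("metadata.hidoba must be an object")
    extra = sorted(set(hidoba) - ALLOWED_HIDOBA_KEYS)
    if extra:
        raise ValueError("unsupported metadata.hidoba fields: " + ", ".join(extra))
```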