For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.

Realtime voice APIs

The foundational voice AI model for a borderless world

Stream audio over WebSocket and receive transcription, translation, or synthesized speech with sub-second latency. Built for real-time, multilingual speech; drop-in ready for Python.

Kotoba Technologies raises $11.83M in second seed round

Speech APIs

Realtime and batch speech APIs, one voice model.

Overview

Start here →

Quickstart, authentication, audio formats, and per-capability guides — everything you need to make your first call.

Guides · Quickstart
s2t

Speech to Text →

Stream audio and receive transcription deltas live, or POST a file and poll for a batch transcript.

EN · JA · KO · ZH · WebSocket · REST
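As a rough illustration of the two modes described above, here is a minimal sketch of the client-side pieces they imply: splitting audio into small frames for streaming, folding transcription deltas into a running transcript, and polling a batch job until it finishes. Everything here is an assumption made for illustration and is not taken from the Kotoba API: the 16 kHz, 16-bit mono PCM format, the `{"type": "delta", "text": ...}` message shape, the `"completed"` / `"failed"` status values, and the helper names are all hypothetical. Consult the Speech to Text guide for the real formats.

```python
import time
from typing import Callable, Iterator


def pcm_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20,
               bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Split mono PCM audio into fixed-duration frames for streaming.

    The defaults (16 kHz, 16-bit, 20 ms) are assumptions, not documented values.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    for start in range(0, len(pcm), frame_bytes):
        # The final frame may be shorter than frame_bytes.
        yield pcm[start:start + frame_bytes]


def apply_delta(transcript: str, message: dict) -> str:
    """Append one transcription delta to the running transcript.

    The message shape is hypothetical; non-delta messages are ignored.
    """
    if message.get("type") == "delta":
        return transcript + message.get("text", "")
    return transcript


def poll_until_done(fetch_status: Callable[[], dict], interval_s: float = 1.0,
                    max_attempts: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until it reports a terminal state, then return it.

    fetch_status stands in for whatever request returns the batch job state;
    the "completed" and "failed" values are assumed, not documented.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        sleep(interval_s)
    raise TimeoutError("batch job did not finish in time")
```

In a streaming client, each frame from `pcm_frames` would be sent over the WebSocket connection while incoming messages are passed through `apply_delta`. In a batch client, `fetch_status` would wrap the status request for the uploaded file.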
s2st

Speech to Speech Translation →

Simultaneous translation between languages with sub‑second latency. Voice in, voice out; no waiting for the sentence to end.

EN · JA · KO · ZH · ES · Realtime
t2s

Text to Speech →

Synthesize natural speech from text. Streaming output, designed for agents and devices.

EN · JA · KO · ZH · ES · Streaming
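Streaming output means audio arrives as a sequence of chunks rather than as one file. A minimal sketch of handling that on the client side, assuming the chunks are raw 16-bit mono PCM at 24 kHz (an assumption for illustration; the actual output format is not stated here), is to write them into a WAV container as they arrive. The function name `pcm_chunks_to_wav` is hypothetical and uses only the Python standard library.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(chunks: Iterable[bytes], sample_rate: int = 24000) -> bytes:
    """Wrap streamed PCM chunks in a WAV container and return the file bytes.

    Assumes 16-bit mono PCM; the sample rate default is a guess, not a
    documented value.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Each chunk is appended as soon as it is received.
            wav.writeframes(chunk)
    return buf.getvalue()
```

Any iterable of byte chunks works as input, so the same function can consume a generator that yields audio as it is received from the service. For playback on a device, the chunks would more likely be fed to an audio output buffer directly than saved to a file.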
SDKs

Install and start streaming. One client, three APIs.

Python: pip install kotoba-sdk
