Realtime voice APIs
Stream audio over WebSocket and receive transcription, translation, or synthesized speech with sub-second latency. Built for realtime, multilingual speech, with a drop-in Python client.
Realtime and batch speech APIs, one voice model.
Start here →
Quickstart, authentication, audio formats, and per-capability guides — everything you need to make your first call.
Speech to Text →
Stream audio and receive transcription deltas live, or POST a file and poll for a batch transcript.
Speech to Speech Translation →
Simultaneous translation between languages with sub-second latency. Voice in, voice out; no waiting for the sentence to end.
Text to Speech →
Synthesize natural speech from text. Streaming output, designed for agents and devices.
Install and start streaming. One client, three APIs.
Python:
pip install kotoba-sdk
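
Streaming audio usually means sending it in small, fixed-duration frames rather than as one large payload. The sketch below shows only that framing step and uses the standard library alone; it does not call kotoba-sdk. The helper name `pcm_frames`, the 16 kHz 16-bit mono PCM format, and the 20 ms frame length are assumptions made for illustration, so check the audio formats guide for the values the service actually expects.

```python
from typing import Iterator


def pcm_frames(
    pcm: bytes,
    sample_rate: int = 16_000,  # assumed sample rate, not confirmed by the docs
    sample_width: int = 2,      # bytes per sample (16-bit PCM, assumed)
    channels: int = 1,          # mono, assumed
    frame_ms: int = 20,         # assumed frame duration
) -> Iterator[bytes]:
    """Yield fixed-duration frames of raw PCM; the last frame may be shorter."""
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_bytes = samples_per_frame * sample_width * channels
    if frame_bytes <= 0:
        raise ValueError("frame_ms is too small for this sample rate")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


# One second of silence in the assumed format: 16,000 samples * 2 bytes each.
one_second = bytes(16_000 * 2)
frames = list(pcm_frames(one_second))
print(len(frames), len(frames[0]))  # prints "50 640"
```

Each yielded frame would then be written to the open WebSocket connection as a binary message, in order, as the audio is captured.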