Realtime voice APIs

The foundational voice AI model for a borderless world

Stream audio over WebSocket and receive transcription, translation, or synthesized speech with sub-second latency. Built for real-time, multilingual speech; drop-in ready for Python.

Read the docs

Request access

Speech APIs

Realtime and batch speech APIs, one voice model.

Overview

Start here

Quickstart, authentication, audio formats, and per-capability guides — everything you need to make your first call.

Guides · Quickstart

s2t

Speech to Text

Stream audio and receive transcription deltas live, or POST a file and poll for a batch transcript.

EN · JA · KO · ZH

WebSocket · REST
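The batch path described above (POST a file, then poll) can be sketched as a small polling loop. This is a minimal illustration, not the SDK's API: `poll_until_done`, the `fetch_status` callable, and the `"status"` values `"completed"` and `"failed"` are all hypothetical names chosen for the example. In practice `fetch_status` would wrap whatever request the docs specify for checking a transcription job.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 1.0,
    max_interval_s: float = 10.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a batch job until it reaches a terminal state.

    fetch_status is a hypothetical callable returning the job as a dict
    with a "status" key; the terminal values below are assumptions.
    """
    waited = 0.0
    while True:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job still pending after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        # Back off exponentially so long jobs do not hammer the server.
        interval_s = min(interval_s * 2, max_interval_s)
```

Injecting `sleep` keeps the loop testable without real delays; the doubling interval is a design choice for the sketch, not a documented requirement.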

s2st

Speech to Speech Translation

Simultaneous translation between languages with sub-second latency. Voice in, voice out; no waiting for the sentence to end.

EN · JA · KO · ZH · ES

Realtime
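Streaming "voice in" without waiting for a full sentence generally means sending audio in small frames as it is captured. The sketch below shows one way to slice raw PCM into fixed-duration frames before writing them to a socket. The format (16 kHz, 16-bit, mono) and the 20 ms frame size are assumptions for illustration; this page does not state the required audio format, so check the audio formats guide. `pcm_frames` is a hypothetical helper, not part of the SDK.

```python
from typing import Iterator


def pcm_frames(
    pcm: bytes,
    sample_rate: int = 16000,  # assumed; not specified on this page
    frame_ms: int = 20,        # assumed frame duration
    sample_width: int = 2,     # bytes per sample (16-bit)
    channels: int = 1,
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration frames from raw PCM bytes.

    The final frame may be shorter if the input is not a whole
    number of frames.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_bytes = samples_per_frame * sample_width * channels
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

With these defaults each frame is 640 bytes, so one second of audio becomes 50 frames; smaller frames lower latency at the cost of more messages.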

t2s

Text to Speech

Synthesize natural speech from text. Streaming output, designed for agents and devices.

EN · JA · KO · ZH · ES

Streaming

SDKs

Install and start streaming. One client, three APIs.

Python: pip install kotoba-sdk
