Kotoba Technologies APIs
Low-latency speech APIs for real-time and async applications.
Kotoba APIs are in private alpha, available to selected customers. Request access.
Kotoba Technologies provides two families of speech APIs:
Realtime APIs — WebSocket-based services for streaming speech with sub-second latency:
- ASR — Automatic Speech Recognition. Stream audio and receive transcription deltas live.
- STS — Speech-to-Speech Translation. Translate spoken audio between languages in real time.
- TTS — Text-to-Speech. Synthesize natural-sounding speech from text.
Each realtime API speaks JSON over a single WebSocket connection and is described by an AsyncAPI specification.
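As a rough illustration of what "JSON over a single WebSocket connection" means for a client, the sketch below folds a sequence of JSON text frames into a running transcript. The message types and field names (`session.started`, `transcript.delta`, `text`) are hypothetical placeholders, not taken from the AsyncAPI specification, and the frames are simulated locally rather than read from a socket.

```python
import json


def apply_message(transcript: str, raw: str) -> str:
    """Fold one JSON text frame into the running transcript."""
    msg = json.loads(raw)
    # "type", "transcript.delta" and "text" are hypothetical names.
    if msg.get("type") == "transcript.delta":
        return transcript + msg.get("text", "")
    # Message types this sketch does not model are ignored.
    return transcript


# Simulated frames standing in for what a WebSocket client would receive.
frames = [
    json.dumps({"type": "session.started"}),
    json.dumps({"type": "transcript.delta", "text": "Hello"}),
    json.dumps({"type": "transcript.delta", "text": ", world"}),
]

transcript = ""
for frame in frames:
    transcript = apply_message(transcript, frame)

print(transcript)
```

Keeping the per-message logic in a pure function like this makes it easy to test without a live connection; the real message schema should be taken from the AsyncAPI specification for each API.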
Transcription API — A REST API for batch and offline workflows:
- Transcription — Submit an audio file and poll for the result. Simpler than WebSocket when low latency is not required.
Quickstart
Install the Python SDK (Python 3.10+):
Batch transcription via the REST API:
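A minimal sketch of the submit-and-poll pattern, using only the standard library and not the Kotoba SDK. The HTTP call is stubbed out with a fake status function; the status values (`queued`, `processing`, `completed`, `failed`) and the `text` field are assumptions for illustration, so check the REST reference for the actual endpoint paths and response shape.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> dict:
    """Call fetch_status repeatedly until the job reaches a terminal state.

    fetch_status is any callable returning the job as a dict; in a real
    client it would issue a GET request for the job (hypothetical here).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        # "completed" and "failed" are assumed terminal status values.
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription job did not finish in time")
        time.sleep(interval_s)


# Fake responses standing in for successive polls of a submitted job.
responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "hello"},
])

result = poll_until_done(lambda: next(responses), interval_s=0.0, timeout_s=5.0)
print(result["status"], result.get("text"))
```

Injecting `fetch_status` keeps the polling loop independent of any particular HTTP library, and the explicit deadline prevents a stuck job from blocking the caller indefinitely.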
Streaming TTS:
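A minimal sketch of the receiving side of a streaming synthesis session, using only the standard library and not the Kotoba SDK. It assumes, hypothetically, that audio arrives as base64-encoded chunks inside JSON messages of type `audio.delta` and that an `audio.done` message ends the stream; the actual message names and audio encoding are defined by the TTS AsyncAPI specification. The frames are simulated locally instead of being read from a WebSocket.

```python
import base64
import json
from typing import Iterable, Iterator


def collect_audio(frames: Iterable[str]) -> bytes:
    """Concatenate audio chunks from JSON frames until the stream ends."""
    audio = bytearray()
    for raw in frames:
        msg = json.loads(raw)
        # "audio.delta", "audio" and "audio.done" are hypothetical names.
        if msg.get("type") == "audio.delta":
            audio.extend(base64.b64decode(msg["audio"]))
        elif msg.get("type") == "audio.done":
            break
    return bytes(audio)


def fake_frames() -> Iterator[str]:
    """Simulate the frames a server might send for one utterance."""
    for chunk in (b"\x00\x01", b"\x02\x03"):
        encoded = base64.b64encode(chunk).decode("ascii")
        yield json.dumps({"type": "audio.delta", "audio": encoded})
    yield json.dumps({"type": "audio.done"})


pcm = collect_audio(fake_frames())
print(len(pcm))
```

In a latency-sensitive application each chunk would typically be handed to the audio output as it arrives rather than buffered until the end; the buffering here only keeps the sketch short.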
Each capability has its own Python SDK page with end-to-end snippets: Speech to Text · Speech to Speech Translation · Text to Speech.
What’s next
- Authentication — server-side and browser flows.
- Audio formats — picking the right encoding.
- Capabilities: Speech to Text, Speech to Speech Translation, Text to Speech.
- Async transcription (REST) — submit an audio file and poll for the result (under the ASR tab).