Speech to Text
Stream live transcripts or transcribe finished audio files.
Kotoba’s Speech-to-Text (ASR) lets you turn audio into text in either of two complementary modes:
- Live (WebSocket): push PCM16 chunks as they arrive and read transcript deltas back over the same connection. Built for microphones and any latency-sensitive pipeline where the first words matter before the utterance ends.
- Batch (REST): POST an audio file and poll until the job completes. Best for long files, offline workflows, and anything that doesn’t need partial results.
Both modes support English (en), Japanese (ja), Korean (ko), and Chinese (zh) input.
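For the live mode, the client-side work is mostly slicing raw audio into small PCM16 chunks before they go onto the connection. The sketch below shows one way to do that slicing in plain Python. The helper name `pcm16_chunks`, the 16 kHz mono default, and the 100 ms chunk length are illustrative assumptions, not values taken from the Kotoba spec; check the AsyncAPI reference for the sample rate and framing the service actually expects.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # PCM16 means 16-bit (2-byte) signed samples


def pcm16_chunks(
    audio: bytes,
    sample_rate: int = 16_000,  # assumed default, not confirmed by the source
    channels: int = 1,          # assumed mono
    chunk_ms: int = 100,        # assumed chunk length
) -> Iterator[bytes]:
    """Yield consecutive chunk_ms-long slices of raw PCM16 audio.

    The final chunk may be shorter than the others.
    """
    frame_size = BYTES_PER_SAMPLE * channels
    if len(audio) % frame_size != 0:
        raise ValueError("audio length is not a whole number of PCM16 frames")

    frames_per_chunk = sample_rate * chunk_ms // 1000
    if frames_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")

    step = frames_per_chunk * frame_size
    for start in range(0, len(audio), step):
        yield audio[start:start + step]


if __name__ == "__main__":
    one_second_of_silence = bytes(16_000 * BYTES_PER_SAMPLE)
    for i, chunk in enumerate(pcm16_chunks(one_second_of_silence)):
        print(i, len(chunk))
```

In a real client, each yielded chunk would typically be handed to the streaming call or sent as one binary WebSocket message as soon as it is available, rather than being buffered for the whole utterance.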
Pick a transport
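When weighing the two transports, it may help to see how little client logic batch needs: submit once, then poll until the job reaches a terminal state. The sketch below shows only the polling half, with the HTTP call abstracted away. Everything named here is hypothetical: `wait_for_job`, the `fetch_job` callable, the `"status"` field, and the `"completed"` / `"failed"` values are placeholders for whatever the OpenAPI spec defines.

```python
import time
from typing import Any, Callable, Mapping

# Hypothetical status names; the real ones come from the batch API spec.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[], Mapping[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Call fetch_job() repeatedly until it reports a terminal status.

    fetch_job stands in for "GET the job resource and parse the JSON".
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job()
        if job.get("status") in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError("transcription job did not finish in time")
        sleep(interval_s)


if __name__ == "__main__":
    # Fake job that finishes on the third poll.
    responses = iter([
        {"status": "processing"},
        {"status": "processing"},
        {"status": "completed", "text": "hello"},
    ])
    print(wait_for_job(lambda: next(responses), interval_s=0.01))
```

Live streaming has no equivalent loop, since partial results arrive on the open connection; that difference is usually the deciding factor between the two modes.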
Where to go next
- Python SDK for ASR: client.asr.transcribe(...) for REST batch, client.asr.transcribe_stream(...) for live.
- API reference: Live (WebSocket), the AsyncAPI spec for the streaming channel.
- API reference: Batch (REST), the OpenAPI spec for the async transcription job endpoints.