Speech to Speech Translation | Kotoba Technologies

Kotoba’s Speech-to-Speech Translation (S2ST) takes audio in one language and returns, over a single WebSocket connection, both an incremental transcript of the source and synthesized speech in the target language. Use it for simultaneous translation, live captioning with voice-over, and any scenario that needs sub-second turnaround without splitting the pipeline into separate ASR, MT, and TTS steps.

Supported languages: English (en), Japanese (ja), Korean (ko), Chinese (zh), and Spanish (es).
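
Checking language codes on the client before opening a connection can catch typos early. The following is a minimal sketch, not part of the SDK: `SUPPORTED_LANGUAGES` and `validate_language_pair` are hypothetical names, and the sketch only checks that each code is in the list above. Whether every source/target combination is accepted is not stated here, so it makes no assumption about valid pairs.

```python
# Codes taken from the supported-languages list above.
SUPPORTED_LANGUAGES = {"en", "ja", "ko", "zh", "es"}


def validate_language_pair(source: str, target: str) -> tuple[str, str]:
    """Normalize two language codes and check that both are supported.

    Hypothetical helper: it validates membership only and does not
    check whether the specific source/target pair is available.
    """
    normalized = (source.strip().lower(), target.strip().lower())
    for code in normalized:
        if code not in SUPPORTED_LANGUAGES:
            raise ValueError(
                f"unsupported language {code!r}; "
                f"expected one of {sorted(SUPPORTED_LANGUAGES)}"
            )
    return normalized
```

For example, `validate_language_pair("EN", "ja")` returns `("en", "ja")`, while `validate_language_pair("en", "fr")` raises `ValueError`.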

What you get back

partial_transcript — incremental source-language transcript.
audio_chunk — synthesized target-language audio, streamed as it is produced.
done — emitted when the server has finished processing the committed audio.
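
One way to consume these three events is a small dispatcher that routes each incoming message by its event name. The sketch below is illustrative only and rests on assumptions that the API reference should confirm: that each event arrives as a JSON text frame, that the event name is carried in a `type` field, that transcript text is in `text`, and that audio is base64-encoded in `audio`. `S2STSession` is a hypothetical helper, not an SDK class.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class S2STSession:
    """Collects the results of one translation session (hypothetical helper)."""

    transcript_updates: list = field(default_factory=list)
    audio: bytearray = field(default_factory=bytearray)
    finished: bool = False

    def handle(self, raw: str) -> None:
        """Route one JSON message by its event name."""
        event = json.loads(raw)
        kind = event.get("type")  # assumed discriminator field
        if kind == "partial_transcript":
            # Assumed field name; updates are stored as received, without
            # assuming whether they are deltas or cumulative text.
            self.transcript_updates.append(event["text"])
        elif kind == "audio_chunk":
            # Assumed base64 payload; chunks are appended in arrival order.
            self.audio.extend(base64.b64decode(event["audio"]))
        elif kind == "done":
            self.finished = True
        # Any other event name is ignored.


session = S2STSession()
for frame in (
    '{"type": "partial_transcript", "text": "hello"}',
    '{"type": "audio_chunk", "audio": "AAEC"}',
    '{"type": "done"}',
):
    session.handle(frame)
```

Keeping the dispatcher free of network code lets it be fed from any WebSocket client and tested with canned messages, as in the loop above.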

Where to go next

Python SDK for S2ST — client.s2st.translate(...) for a one-shot file, client.s2st.stream(...) for live audio.
API reference — the AsyncAPI spec for the S2ST WebSocket channel.