Python SDK — s2st
Speech-to-speech translation (S2ST) from Python, streamed over WebSocket.
Install
Requires Python 3.10 or later. The package is imported as kotoba.
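A quick way to confirm these prerequisites is a small preflight check. The sketch below uses only the standard library; the helper names python_ok and module_available are illustrative and not part of the SDK.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this section


def python_ok(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the stated minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def module_available(name: str) -> bool:
    """Return True when a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


print("Python >= 3.10:", python_ok())
print("kotoba importable:", module_available("kotoba"))
```

The same module_available helper can be pointed at any optional dependency to check whether it is installed.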
For live-microphone examples, install with the optional mic extra
(pulls in sounddevice, which needs PortAudio on the system):
Configure
KotobaClient() reads its credentials and per-route URLs from environment
variables. For S2ST you need:
Or pass them in code:
To use other language pairs, register them at runtime:
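Because the client reads its settings from the environment, it can help to fail early when a variable is missing. The following is a minimal sketch that does not depend on the SDK. The variable names in REQUIRED are placeholders invented for illustration, not the real names, and should be replaced with the variables the SDK actually expects.

```python
import os
from typing import Iterable, Mapping


def missing_settings(required: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Placeholder names for illustration only; substitute the variables
# that the SDK documentation actually lists.
REQUIRED = ["EXAMPLE_API_KEY", "EXAMPLE_S2ST_URL"]

# Demonstrate with a plain dict instead of the real environment.
fake_env = {"EXAMPLE_API_KEY": "secret", "EXAMPLE_S2ST_URL": ""}
print(missing_settings(REQUIRED, fake_env))  # ['EXAMPLE_S2ST_URL']
```

Calling missing_settings(REQUIRED) with no second argument checks the real process environment.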
One-shot translation
translate(...) consumes a finished audio file and returns the
translated audio plus the source-side transcript:
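Since translate(...) works on a finished audio file, it can be useful to inspect the file before submitting it. The sketch below reads WAV metadata with the standard-library wave module. It makes no claim about which formats or sample rates the service accepts: the 16 kHz mono input is an arbitrary test value, and wav_info and make_silence are illustrative helpers, not SDK functions.

```python
import io
import wave


def wav_info(data: bytes) -> dict:
    """Return channels, sample width, sample rate and duration of WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit WAV of silence, used here only as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


info = wav_info(make_silence(0.5))
print(info)
```

For a file on disk, pass the result of pathlib.Path(path).read_bytes() to wav_info.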
Streaming translation
Use client.s2st.stream(...) for live audio in and live audio out. Both
transcript deltas and synthesized audio chunks surface as soon as the
server produces them:
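Live streaming usually means feeding audio in small, fixed-duration pieces. As a sketch of that preparation step, independent of the SDK, the generator below slices raw PCM into chunks of roughly 100 ms. The sample rate, sample width and chunk length are assumed values, and iter_pcm_chunks is an illustrative helper rather than part of the client.

```python
from typing import Iterator


def iter_pcm_chunks(pcm: bytes, sample_rate: int = 16000,
                    sample_width: int = 2, channels: int = 1,
                    chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive chunks of about `chunk_ms` milliseconds of raw PCM.

    Each chunk holds a whole number of frames when the input is
    frame-aligned; the final chunk may be shorter than the rest.
    """
    frame_bytes = sample_width * channels
    frames_per_chunk = max(1, sample_rate * chunk_ms // 1000)
    step = frames_per_chunk * frame_bytes
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


# 1.05 s of 16-bit mono silence at 16 kHz.
audio = b"\x00\x00" * 16800
chunks = list(iter_pcm_chunks(audio))
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # 11 3200 1600
```

Smaller chunks lower the time before the first audio reaches the server, at the cost of more messages per second.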
Tuning latency with delay
Both stream(...) and translate(...) accept an optional delay
parameter — an integer in the range 0–20 that controls how many
tokens of context the server buffers before emitting translated audio.
Higher values give the model more lookahead (better translation
quality); lower values reduce latency. Omit it to keep the server
default.
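To catch out-of-range values before a request is sent, the documented range can be checked on the client side. This is a minimal sketch: delay_kwargs is a hypothetical helper, and it assumes the parameter is passed as a keyword argument named delay.

```python
from typing import Optional

DELAY_MIN, DELAY_MAX = 0, 20  # range stated in this section


def delay_kwargs(delay: Optional[int] = None) -> dict:
    """Return {'delay': n} for a valid value, or {} to keep the server default."""
    if delay is None:
        return {}
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(delay, bool) or not isinstance(delay, int):
        raise TypeError("delay must be an integer")
    if not DELAY_MIN <= delay <= DELAY_MAX:
        raise ValueError(f"delay must be between {DELAY_MIN} and {DELAY_MAX}")
    return {"delay": delay}


print(delay_kwargs())    # {}
print(delay_kwargs(5))   # {'delay': 5}
```

The returned dict can be unpacked into a call with **delay_kwargs(5), so omitting the value leaves the server default untouched.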
Async
Both entry points have async equivalents via AsyncKotobaClient:
For a runnable mic demo see examples/s2st_mic_async.py in the SDK repo
(needs the mic extra — sounddevice — and PortAudio).
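The general shape of asynchronous streaming code can be sketched without the SDK. In the example below, fake_translate_stream and mic_chunks are stand-ins invented for illustration. Whether AsyncKotobaClient exposes its results as an async iterator in this way is an assumption, so treat this as a pattern and not as the client's API.

```python
import asyncio


async def mic_chunks(n: int, size: int = 3200):
    """Stand-in audio source producing `n` chunks of silence."""
    for _ in range(n):
        await asyncio.sleep(0)  # yield control, as a real capture loop would
        yield b"\x00" * size


async def fake_translate_stream(chunks):
    """Hypothetical stand-in for a streaming session: one result per chunk."""
    async for chunk in chunks:
        await asyncio.sleep(0)  # simulate a network round trip
        yield f"translated {len(chunk)} bytes"


async def main() -> list[str]:
    results = []
    # Consume results as they arrive instead of waiting for the whole stream.
    async for event in fake_translate_stream(mic_chunks(3)):
        results.append(event)
    return results


results = asyncio.run(main())
print(results)
```

Consuming results with async for keeps the event loop free for other tasks, such as audio playback, while the next chunk is in flight.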
What’s in the box (S2ST)
See the API reference for the on-the-wire protocol that this SDK wraps.