Python SDK — t2s
Text-to-speech from Python, with one-shot and streaming synthesis.
Install
Requires Python 3.10 or later. The package is imported as kotoba.
Configure
KotobaClient() reads its credentials and per-route URLs from environment
variables. For TTS you need:
Or pass them in code:
To use other voices / languages, register the route:
One-shot synthesis
audio.to_wav() converts the underlying float32 24 kHz mono signal to
a playable 16-bit WAV.
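To make the conversion concrete, here is a minimal standard-library sketch of turning float32 samples into a 16-bit mono WAV at 24 kHz. It is not the SDK's implementation of to_wav(); the helper name float32_to_wav and the clip-then-scale-by-32767 rule are assumptions chosen for illustration.

```python
import io
import struct
import wave

SAMPLE_RATE = 24_000  # 24 kHz mono, as described above


def float32_to_wav(samples, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: encode floats in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    # Clip out-of-range values, then scale to the signed 16-bit range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    pcm = struct.pack(f"<{len(ints)}h", *ints)  # little-endian int16

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


wav_bytes = float32_to_wav([0.0, 1.0, -1.0, 2.0])
```

The resulting bytes start with a RIFF header and can be written to a .wav file or handed to any player that accepts 16-bit PCM.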
Available Japanese speakers: ja-man-m02-azawa (male, default) and
ja-woman-f04-me (female). Pass speaker_id=... to override:
Streaming synthesis
The full text is sent in a single frame; the server streams the synthesized audio back chunk by chunk, so you can start playback (or pipe the audio to a speaker or WebRTC track) before the whole utterance has been synthesized:
synthesize_stream(...) flattens the loop when you only want PCM bytes:
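As a sketch of what a consumer of chunked PCM might look like, the following standard-library example writes chunks to a WAV container as they arrive. It makes two assumptions that the text above does not confirm: that chunks are raw 16-bit mono PCM at 24 kHz, and that they arrive as a plain iterable of bytes. The generator fake_chunks is a hypothetical stand-in for the SDK call, and write_stream_to_wav is an illustrative helper, not part of the SDK.

```python
import io
import wave
from typing import Iterable, Iterator


def fake_chunks(total_bytes: int = 9600, chunk_size: int = 4096) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming source: yields silent PCM chunks."""
    silence = bytes(total_bytes)
    for start in range(0, total_bytes, chunk_size):
        yield silence[start:start + chunk_size]


def write_stream_to_wav(chunks: Iterable[bytes], dest,
                        sample_rate: int = 24_000, sample_width: int = 2) -> int:
    """Append each chunk as it arrives; return the number of PCM bytes written.

    Assumes mono audio and a seekable destination (file path or file object),
    so the WAV header can be patched with the final length on close.
    """
    total = 0
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframesraw(chunk)  # no header rewrite per chunk
            total += len(chunk)
    return total


buf = io.BytesIO()
written = write_stream_to_wav(fake_chunks(), buf)
```

In a real program the loop body is also where each chunk would be handed to an audio output device, which is what lets playback begin before the last chunk arrives.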
Async
What’s in the box (TTS)
See the API reference for the on-the-wire protocol that this SDK wraps.