Python SDK: TTS

Text-to-speech from Python, with one-shot and streaming synthesis.

Install

$ pip install kotoba-sdk

Requires Python 3.10 or later. The package is imported as kotoba.

Configure

KotobaClient() reads its credentials and per-route URLs from environment variables. For TTS you need:

Variable           Purpose
KOTOBA_API_KEY     Bearer token sent on WS requests
KOTOBA_TTS_JA_URL  WebSocket URL for Japanese TTS, e.g. wss://.../tts

Or pass them in code:

import kotoba

client = kotoba.KotobaClient(
    api_key="kotoba-...",
    tts_ja_ws_url="wss://.../tts",
)
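If you rely on the environment variables rather than constructor arguments, a quick pre-flight check can report what is missing before any connection is attempted. The sketch below uses only the standard library and the two variable names listed above; missing_tts_vars is a hypothetical helper for illustration, not part of the SDK.

```python
import os

# Variable names taken from the table in the Configure section.
REQUIRED_TTS_VARS = ("KOTOBA_API_KEY", "KOTOBA_TTS_JA_URL")


def missing_tts_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: pass a dict to check something other than os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_TTS_VARS if not env.get(name)]
```

Calling missing_tts_vars() before constructing the client and raising on a non-empty result gives a clearer error than a failed WebSocket handshake would.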

To use other voices / languages, register the route:

kotoba.register_endpoint("tts", None, "ko", "wss://.../tts-ko")

One-shot synthesis

import kotoba

client = kotoba.KotobaClient()
audio = client.tts.synthesize("こんにちは、世界。", language="ja")
audio.to_wav("hello.wav")

audio.to_wav() converts the underlying float32 24 kHz mono signal to a playable 16-bit WAV.
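To make the conversion concrete, here is a minimal standard-library sketch of turning float samples into a 16-bit mono WAV at 24 kHz. It is not the SDK's implementation: float32_to_wav_bytes is a hypothetical name, and the input is assumed to be an iterable of Python floats nominally in [-1.0, 1.0], with out-of-range values clipped.

```python
import io
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, the rate stated in the text above


def float32_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode mono float samples as a 16-bit PCM WAV and return the bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))       # clip out-of-range samples
        ints.append(int(round(s * 32767)))  # scale to the signed 16-bit range
    pcm = struct.pack(f"<{len(ints)}h", *ints)  # little-endian int16

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Halving the sample width from 4 bytes to 2 is what makes the file playable in ordinary audio players, at the cost of some precision relative to the float32 signal.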

Available Japanese speakers: ja-man-m02-azawa (male, default) and ja-woman-f04-me (female). Pass speaker_id=... to override:

audio = client.tts.synthesize(
    "こんにちは、世界。",
    language="ja",
    speaker_id="ja-woman-f04-me",
)

Streaming synthesis

The full text is sent in a single frame; the server streams the synthesized audio back chunk-by-chunk, so you can play (or pipe to a speaker / WebRTC track) without waiting for the utterance to finish:

with client.tts.stream(language="ja") as session:
    session.synthesize("こんにちは。本日はよろしくお願いします。")

    for event in session:
        if event.type == "audio_chunk":
            handle(event.audio)  # handle() is your own callback; float32 PCM @ 24 kHz
        elif event.type == "done":
            break

synthesize_stream(...) flattens the loop when you only want PCM bytes:

# speaker is your own output sink (any object with a write() method)
for pcm in client.tts.synthesize_stream("こんにちは、世界。", language="ja"):
    speaker.write(pcm)
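If you want the whole utterance in memory rather than playing it as it arrives, the event loop from this section can be wrapped in a small collector. The sketch below runs against stand-in events so it works without a server. FakeEvent, collect_audio, and duration_seconds are hypothetical names, and it assumes event.audio is a raw bytes buffer of float32 samples; the SDK may expose a different type.

```python
from dataclasses import dataclass

SAMPLE_RATE = 24_000   # Hz, as stated for the TTS output
BYTES_PER_SAMPLE = 4   # float32


@dataclass
class FakeEvent:
    """Hypothetical stand-in for the SDK's stream events."""
    type: str
    audio: bytes = b""


def collect_audio(events):
    """Concatenate audio chunks until a "done" event arrives."""
    chunks = []
    for event in events:
        if event.type == "audio_chunk":
            chunks.append(event.audio)
        elif event.type == "done":
            break
    return b"".join(chunks)


def duration_seconds(pcm):
    """Length in seconds of a float32 mono buffer at SAMPLE_RATE."""
    return len(pcm) / (BYTES_PER_SAMPLE * SAMPLE_RATE)
```

With a real session, collect_audio(session) would replace the explicit loop, provided the assumption about event.audio holds.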

Async

import asyncio
import kotoba

async def main() -> None:
    async with kotoba.AsyncKotobaClient() as client:
        async with client.tts.stream(language="ja") as session:
            await session.synthesize("こんにちは。")
            async for event in session:
                if event.type == "audio_chunk":
                    await play(event.audio)  # play() is your own coroutine
                elif event.type == "done":
                    break

asyncio.run(main())
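Awaiting playback inside the receive loop means a slow player delays reading the next event. One way to decouple the two, sketched here with stand-in events rather than a real session, is to hand chunks to a separate playback task through an asyncio.Queue. FakeEvent, fake_session, receive, playback, and run are hypothetical names, and the event shape is assumed to match the example above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Hypothetical stand-in for the SDK's stream events."""
    type: str
    audio: bytes = b""


async def fake_session():
    """Stand-in for iterating a streaming session: three chunks, then done."""
    for i in range(3):
        yield FakeEvent("audio_chunk", bytes([i]))
    yield FakeEvent("done")


async def receive(events, queue):
    """Push chunks onto the queue as they arrive; None marks the end."""
    try:
        async for event in events:
            if event.type == "audio_chunk":
                await queue.put(event.audio)
            elif event.type == "done":
                break
    finally:
        await queue.put(None)  # always release the playback task


async def playback(queue, play):
    """Play chunks in arrival order until the end marker."""
    while (chunk := await queue.get()) is not None:
        await play(chunk)


async def run(events, play):
    """Run the receiver and the player concurrently."""
    queue = asyncio.Queue()
    await asyncio.gather(receive(events, queue), playback(queue, play))
```

The queue here is unbounded, so the receiver never waits on the player; passing maxsize to asyncio.Queue would bound memory at the cost of reintroducing backpressure.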

What’s in the box (TTS)

Symbol                                        What
client.tts.synthesize(text, language)         One-shot synthesis → AudioResult
client.tts.stream(language)                   Streaming session you drive manually
client.tts.synthesize_stream(text, language)  Single text → audio-chunk iterator

See the API reference for the on-the-wire protocol that this SDK wraps.