Python SDK — s2t | Kotoba Technologies

Source code, releases, and changelog.

Install

$ pip install kotoba-sdk

Requires Python 3.10 or later. The package is imported as kotoba. For live-microphone examples, install with the optional mic extra (pulls in sounddevice, which needs PortAudio on the system):

$ pip install 'kotoba-sdk[mic]'

Configure

KotobaClient() reads its credentials and per-route URLs from environment variables. For ASR you need:

Variable	Purpose
`KOTOBA_API_KEY`	Bearer token sent on both REST and WS requests
`KOTOBA_ASR_REST_URL`	REST API base URL, e.g. `https://.../v1`
`KOTOBA_ASR_URL`	WebSocket URL for live ASR, e.g. `wss://.../asr`

Pass them in code instead if you’d rather not rely on the environment:

import kotoba

client = kotoba.KotobaClient(
    api_key="kotoba-...",
    url="https://.../v1",            # REST base
    asr_ws_url="wss://.../asr",      # WS
)

See Authentication for the full handshake details.
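Because KotobaClient() reads its settings from the environment, it can help to fail early when a variable is missing. The following is a minimal sketch, not part of the SDK: `missing_asr_env` is a hypothetical helper, and the only names taken from this page are the three variables in the table above.

```python
import os

# Variable names come from the configuration table; everything else is illustrative.
REQUIRED_ASR_VARS = ("KOTOBA_API_KEY", "KOTOBA_ASR_REST_URL", "KOTOBA_ASR_URL")


def missing_asr_env(environ=None):
    """Return the required ASR variables that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ASR_VARS if not env.get(name)]


# Typical use before constructing the client:
#     missing = missing_asr_env()
#     if missing:
#         raise SystemExit("missing environment variables: " + ", ".join(missing))
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to exercise in tests.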

Pick a transport

Transport	Method	Best for
REST	`client.asr.transcribe(...)` (POST + poll)	Batch / file-based work, long files
REST (low-level)	`client.asr.submit_job(...)` + `get_job(...)`	Custom polling, decoupled submit/fetch
WebSocket	`client.asr.stream(...)` / `transcribe_stream(...)`	Live mic, latency-sensitive pipelines

REST batch (`transcribe`)

transcribe() POSTs the file to KOTOBA_ASR_REST_URL, polls until the job is done, and returns the final transcript:

import kotoba

client = kotoba.KotobaClient()
result = client.asr.transcribe("clip.mp3", language="ja", with_timestamps=True)
print(result.text)
for seg in result.segments:
    print(f"{seg.start:6.2f} - {seg.end:6.2f}  {seg.text}")

transcribe() accepts anything soundfile can decode (WAV / FLAC / OGG / MP3 / …). When with_timestamps=True, result.segments is populated with Segment(text, start, end) entries. Polling knobs:

result = client.asr.transcribe(
    "clip.mp3",
    language="ja",
    with_timestamps=True,
    poll_interval=1.0,      # initial GET polling interval (s)
    poll_backoff=1.5,       # interval is multiplied by this factor after each poll
    max_poll_interval=10.0, # upper bound on the polling interval (s)
    timeout=1200.0,         # overall deadline for job completion (s)
)

TranscriptionError is raised on a server-reported failure, and TimeoutError is raised if the deadline elapses first.
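To get a feel for how the polling knobs interact, the sketch below computes the sequence of waits they would produce. It assumes the interval is multiplied by poll_backoff after every poll and capped at max_poll_interval, which is how the parameter comments read; the SDK's actual loop may differ in detail. `poll_schedule` is a hypothetical helper, and its defaults simply mirror the values in the example above.

```python
import itertools


def poll_schedule(poll_interval=1.0, poll_backoff=1.5,
                  max_poll_interval=10.0, timeout=1200.0):
    """Yield the wait before each poll until the timeout budget is spent."""
    elapsed = 0.0
    interval = poll_interval
    while elapsed < timeout:
        # Never sleep past the overall deadline.
        wait = min(interval, timeout - elapsed)
        yield wait
        elapsed += wait
        # Grow the interval geometrically, but keep it under the cap.
        interval = min(interval * poll_backoff, max_poll_interval)


# First seven waits with the values used above.
first_waits = list(itertools.islice(poll_schedule(), 7))
print(first_waits)  # [1.0, 1.5, 2.25, 3.375, 5.0625, 7.59375, 10.0]
```

With these values the interval reaches the 10-second cap on the seventh poll, so a long job is polled roughly every ten seconds for the rest of the 1200-second budget.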

REST low-level (`submit_job` / `get_job`)

If you’d rather poll yourself — for example to drive a job queue or surface progress in a UI — call the two REST endpoints directly:

import time

import kotoba

client = kotoba.KotobaClient()

job = client.asr.submit_job("clip.mp3", language="ja")
print("submitted:", job.id)

# Poll every 2 s until the job reaches a terminal state.
while True:
    status = client.asr.get_job(job.id)
    if status.state == "done":
        print(status.transcription.text)
        break
    if status.state == "error":
        raise RuntimeError(status.error_message)
    time.sleep(2)

JobStatus.state is one of processing | done | error.
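A hand-written loop like the one above runs forever if a job never leaves processing. One way to bound it is a small wrapper with a deadline. This is a sketch under assumptions: `wait_for_job` is a hypothetical helper, and it relies only on the `state` and `error_message` fields shown above. The fetch function is passed in, so the same code works with `client.asr.get_job` or with a stub.

```python
import time


def wait_for_job(fetch, job_id, interval=2.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch(job_id) until the job is done, fails, or the deadline passes."""
    deadline = clock() + timeout
    while True:
        status = fetch(job_id)
        if status.state == "done":
            return status
        if status.state == "error":
            raise RuntimeError(status.error_message)
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status.state} after {timeout} s")
        sleep(interval)


# Intended use with the SDK calls shown above:
#     status = wait_for_job(client.asr.get_job, job.id)
#     print(status.transcription.text)
```

Injecting `sleep` and `clock` is a design choice that lets a job queue or a test replace real waiting with its own scheduling.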

WebSocket streaming (`transcribe_stream`)

For the realtime / mic case — where transcript deltas should surface while audio is still being captured — pass a generator of PCM16 LE mono bytes to transcribe_stream(...). The feeder and receiver run concurrently, so the first delta can fire before your source is exhausted:

# mic_chunks() stands for your own generator of PCM16 LE mono bytes.
for delta in client.asr.transcribe_stream(mic_chunks(), language="en"):
    print(delta, end="", flush=True)
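The audio source only needs to be an iterable of raw PCM16 LE mono bytes. As a stand-in for a microphone, the sketch below slices a 16-bit mono WAV file into fixed-duration chunks using the standard-library `wave` module. `wav_chunks` and its `chunk_ms` parameter are hypothetical names, not SDK functions, and the file is assumed to already be 16-bit mono.

```python
import wave


def wav_chunks(path, chunk_ms=100):
    """Yield raw PCM16 LE mono bytes from a WAV file in chunk_ms slices."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        # Number of frames (samples, since the file is mono) per chunk.
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk


# Intended use in place of mic_chunks():
#     for delta in client.asr.transcribe_stream(wav_chunks("clip.wav"), language="en"):
#         print(delta, end="", flush=True)
```

At 24 kHz a 100 ms chunk is 2400 samples, or 4800 bytes. If the file uses a different rate, it presumably needs to be passed as sample_rate so the session can resample correctly.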

Optional knobs on both stream(...) and transcribe_stream(...):

language — "en", "ja", "ko", or "zh"
sample_rate — defaults to 24 kHz; the session resamples internally
keywords — list of hotword biases, e.g. ["Kotobatech", "LLM"]

Async

Every ASR entry point has an async equivalent via AsyncKotobaClient:

import asyncio
import kotoba

async def main() -> None:
    async with kotoba.AsyncKotobaClient() as client:
        result = await client.asr.transcribe("clip.mp3", language="ja")
        print(result.text)

asyncio.run(main())

The sync wrapper runs an asyncio loop on a background daemon thread, so the underlying transport is identical; only the call style differs.
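The background-loop pattern itself is easy to sketch with the standard library. The code below is a generic illustration of the technique, not the SDK's actual implementation: `BackgroundLoop` and `fake_transcribe` are hypothetical names, and the coroutine only simulates a network call.

```python
import asyncio
import threading


class BackgroundLoop:
    """Run coroutines on an asyncio loop owned by a daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        # Schedule the coroutine on the loop thread and block the caller for its result.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self):
        # Stop the loop from inside its own thread, then release resources.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_transcribe(path):
    """Stand-in for an async SDK call."""
    await asyncio.sleep(0.01)
    return f"transcript of {path}"


bg = BackgroundLoop()
text = bg.run(fake_transcribe("clip.mp3"))  # blocking call, async work underneath
bg.close()
print(text)
```

The calling thread sees an ordinary blocking function while all I/O stays on a single event loop, which is why both call styles can share one transport.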

What’s in the box (ASR)

Symbol	What
`client.asr.transcribe(path, ...)`	REST batch (with optional timestamps)
`client.asr.submit_job(...)` / `get_job(id)`	Low-level REST helpers
`client.asr.stream(...)`	WebSocket session you drive manually
`client.asr.transcribe_stream(iter, ...)`	Generator in → transcript deltas out

See the API reference for the on-the-wire protocol that this SDK wraps: Live (WebSocket) · Batch (REST).