Python SDK — s2t

Speech-to-text from Python, over REST (batch) and WebSocket (streaming).

Install

$ pip install kotoba-sdk

Requires Python 3.10 or later. The package is imported as kotoba. For live-microphone examples, install with the optional mic extra (pulls in sounddevice, which needs PortAudio on the system):

$ pip install 'kotoba-sdk[mic]'

Configure

KotobaClient() reads its credentials and per-route URLs from environment variables. For ASR you need:

Variable             Purpose
KOTOBA_API_KEY       Bearer token sent on both REST and WS requests
KOTOBA_ASR_REST_URL  REST API base URL, e.g. https://.../v1
KOTOBA_ASR_URL       WebSocket URL for live ASR, e.g. wss://.../asr

Pass them in code instead if you’d rather not rely on the environment:

import kotoba

client = kotoba.KotobaClient(
    api_key="kotoba-...",
    url="https://.../v1",        # REST base
    asr_ws_url="wss://.../asr",  # WS
)

See Authentication for the full handshake details.
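Because a missing variable only shows up once a request is made, it can help to check the environment up front. The following is a minimal sketch, not part of the SDK: the helper name missing_asr_env is hypothetical, and it assumes that an empty value should count as missing.

```python
import os

# The three variables the Configure table lists for ASR.
REQUIRED_ASR_VARS = ("KOTOBA_API_KEY", "KOTOBA_ASR_REST_URL", "KOTOBA_ASR_URL")


def missing_asr_env(environ=None) -> list[str]:
    """Return the names of required ASR variables that are unset or empty.

    Hypothetical helper (not part of kotoba-sdk). Pass a dict to check
    something other than the real process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ASR_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_asr_env()
    if missing:
        print("Missing ASR configuration:", ", ".join(missing))
```

Running the check before constructing KotobaClient() turns a late request failure into an early, readable message.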

Pick a transport

Transport         Method                                           Best for
REST              client.asr.transcribe(...) (POST + poll)         Batch / file-based work, long files
REST (low-level)  client.asr.submit_job(...) + get_job(...)        Custom polling, decoupled submit/fetch
WebSocket         client.asr.stream(...) / transcribe_stream(...)  Live mic, latency-sensitive pipelines

REST batch (transcribe)

POSTs the file to KOTOBA_ASR_REST_URL, polls until the job is done, and returns the final transcript:

import kotoba

client = kotoba.KotobaClient()
result = client.asr.transcribe("clip.mp3", language="ja", with_timestamps=True)
print(result.text)
for seg in result.segments:
    print(f"{seg.start:6.2f} - {seg.end:6.2f} {seg.text}")

transcribe() accepts anything soundfile can decode (WAV / FLAC / OGG / MP3 / …). When with_timestamps=True, result.segments is populated with Segment(text, start, end) entries. Polling knobs:

result = client.asr.transcribe(
    "clip.mp3",
    language="ja",
    with_timestamps=True,
    poll_interval=1.0,       # initial GET polling interval (s)
    poll_backoff=1.5,        # multiplied each poll
    max_poll_interval=10.0,
    timeout=1200.0,          # overall deadline for job completion (s)
)
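To see how these knobs interact, the sketch below computes the sequence of waits they would produce. It is an illustration only: the helper name poll_delays is hypothetical, and it assumes max_poll_interval acts as a simple cap on the multiplied interval, which the SDK may implement differently.

```python
from itertools import islice


def poll_delays(poll_interval=1.0, poll_backoff=1.5, max_poll_interval=10.0):
    """Yield successive polling delays in seconds (hypothetical helper).

    Assumes the interval is multiplied by poll_backoff after every poll
    and is never allowed to exceed max_poll_interval.
    """
    delay = min(poll_interval, max_poll_interval)
    while True:
        yield delay
        delay = min(delay * poll_backoff, max_poll_interval)


if __name__ == "__main__":
    # First eight waits for the values used in the example above.
    print(list(islice(poll_delays(1.0, 1.5, 10.0), 8)))
```

Under that assumption the wait grows geometrically and settles at the cap after a handful of polls, so short clips are picked up quickly while long files are not polled aggressively.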

TranscriptionError is raised on a server-reported failure, and TimeoutError if the deadline elapses.
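The Segment(text, start, end) entries returned with with_timestamps=True map naturally onto subtitle formats. As one possible use, the sketch below renders segments as SRT. The helpers srt_timestamp and to_srt are hypothetical and not part of the SDK, and the local Segment dataclass is only a stand-in with the same three fields, used so the example runs without a server.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in for the SDK's segment type (same three fields, assumed)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render an iterable of segments as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([Segment("こんにちは", 0.0, 1.25), Segment("world", 1.25, 2.0)]))
```

With a real result, to_srt(result.segments) should work unchanged as long as each segment exposes text, start, and end.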

REST low-level (submit_job / get_job)

If you’d rather poll yourself — for example to drive a job queue or surface progress in a UI — call the two REST endpoints directly:

import time

import kotoba

client = kotoba.KotobaClient()

# Submit the file, then poll the job until it finishes or fails.
job = client.asr.submit_job("clip.mp3", language="ja")
print("submitted:", job.id)

while True:
    status = client.asr.get_job(job.id)
    if status.state == "done":
        print(status.transcription.text)
        break
    if status.state == "error":
        raise RuntimeError(status.error_message)
    time.sleep(2)  # still "processing"; wait before the next poll

JobStatus.state is one of processing | done | error.
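The loop above never gives up, which is rarely what a job queue wants. A minimal sketch of the same loop with an overall deadline follows. The helper name wait_for_job and its parameters are hypothetical; it takes the status-fetching callable as an argument, so it only assumes the state and error_message attributes already shown above.

```python
import time


def wait_for_job(get_job, job_id, interval=2.0, timeout=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_job(job_id) until the job is done, fails, or times out.

    Hypothetical helper (not part of kotoba-sdk). sleep and clock are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_job(job_id)
        if status.state == "done":
            return status
        if status.state == "error":
            raise RuntimeError(status.error_message)
        # Give up rather than start a wait that would overrun the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {status.state!r} after {timeout}s")
        sleep(interval)
```

It would be called as wait_for_job(client.asr.get_job, job.id, timeout=600.0), after which status.transcription.text is available on the returned object.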

WebSocket streaming (transcribe_stream)

For the realtime / mic case — where transcript deltas should surface while audio is still being captured — pass a generator of PCM16 LE mono bytes to transcribe_stream(...). The feeder and receiver run concurrently, so the first delta can fire before your source is exhausted:

# mic_chunks() stands for your own generator yielding PCM16 LE mono bytes.
for delta in client.asr.transcribe_stream(mic_chunks(), language="en"):
    print(delta, end="", flush=True)

Optional knobs on both stream(...) and transcribe_stream(...):

  • language: "en", "ja", "ko", or "zh"
  • sample_rate: defaults to 24 kHz; the session resamples internally
  • keywords: list of hotword biases, e.g. ["Kotobatech", "LLM"]
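Any generator of PCM16 LE mono bytes can feed transcribe_stream(...), not just a microphone. For testing without audio hardware, the sketch below reads a 16-bit mono WAV file with the standard-library wave module and yields it in fixed-duration chunks. The helper name wav_chunks and the 100 ms chunk size are assumptions made for illustration, not SDK requirements.

```python
import wave
from typing import Iterator


def wav_chunks(path: str, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw PCM16 LE mono bytes from a WAV file, chunk_ms at a time.

    Hypothetical helper (not part of kotoba-sdk). Rejects files that are
    not 16-bit mono, since the stream expects exactly that layout.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono PCM WAV file")
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:  # end of file
                break
            yield data
```

It drops in where mic_chunks() appears above, for example client.asr.transcribe_stream(wav_chunks("clip.wav"), language="en"). If the file is not recorded at 24 kHz, pass its rate through sample_rate.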

Async

Every ASR entry point has an async equivalent via AsyncKotobaClient:

import asyncio
import kotoba

async def main() -> None:
    async with kotoba.AsyncKotobaClient() as client:
        result = await client.asr.transcribe("clip.mp3", language="ja")
        print(result.text)

asyncio.run(main())

The sync wrapper runs an asyncio loop on a background daemon thread, so the underlying transport is identical and only the call style differs.
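For readers unfamiliar with that pattern, here is a minimal sketch of how a blocking call can be layered over a coroutine using a background event loop. It shows the general technique only and is not the SDK's actual implementation; BackgroundLoop and fake_transcribe are hypothetical names.

```python
import asyncio
import threading


class BackgroundLoop:
    """Runs an asyncio event loop on a daemon thread (illustrative only)."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        """Submit a coroutine to the loop and block until it returns."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self) -> None:
        """Stop the loop, wait for the thread to exit, then release the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_transcribe(path: str) -> str:
    """Stand-in for an async SDK call; does no real I/O."""
    await asyncio.sleep(0)
    return f"transcript of {path}"


if __name__ == "__main__":
    loop = BackgroundLoop()
    print(loop.run(fake_transcribe("clip.mp3")))  # blocking call, async underneath
    loop.close()
```

The calling thread blocks on the future while the coroutine runs on the background loop, which is why a sync and an async client built this way can share a single transport implementation.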

What’s in the box (ASR)

Symbol                                    What
client.asr.transcribe(path, ...)          REST batch (with optional timestamps)
client.asr.submit_job(...) / get_job(id)  Low-level REST helpers
client.asr.stream(...)                    WebSocket session you drive manually
client.asr.transcribe_stream(iter, ...)   Generator in → transcript deltas out

See the API reference for the on-the-wire protocol that this SDK wraps: Live (WebSocket) · Batch (REST).