Kotoba Technologies APIs

Low-latency speech APIs for real-time and async applications.

Kotoba APIs are in private alpha, available to selected customers. Request access.

Kotoba Technologies provides two families of speech APIs:

Realtime APIs — WebSocket-based services for streaming speech with sub-second latency:

  • ASR — Automatic Speech Recognition. Stream audio and receive transcription deltas live.
  • STS — Speech-to-Speech Translation. Translate spoken audio between languages in real time.
  • TTS — Text-to-Speech. Synthesize natural-sounding speech from text.
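ASR and STS both consume streamed audio. This page does not document the input format the services expect. The sketch below assumes 16 kHz, 16-bit mono PCM, which is a hypothetical choice, and shows one way to split a raw buffer into fixed-duration frames before sending them. At those settings, a 20 ms frame is 16000 × 0.02 × 2 = 640 bytes.

```python
from typing import Iterator

# Hypothetical input format; check the AsyncAPI spec for what the services accept.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1


def frame_size_bytes(frame_ms: int) -> int:
    """Number of bytes in one frame lasting `frame_ms` milliseconds."""
    samples = SAMPLE_RATE_HZ * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def iter_frames(pcm: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield consecutive fixed-size frames; the last one may be shorter."""
    size = frame_size_bytes(frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    one_second = bytes(frame_size_bytes(1000))  # 1 s of silence
    frames = list(iter_frames(one_second))
    print(len(frames), len(frames[0]))  # 50 640
```

Smaller frames lower latency but increase the number of messages. The frame length is a tuning choice, not a value taken from the Kotoba docs.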

Each realtime API speaks JSON over a single WebSocket connection and is described by an AsyncAPI specification.
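Because every realtime API exchanges JSON messages over one WebSocket, a client's core job is to decode each text frame and branch on the kind of message it carries. The real schema lives in each API's AsyncAPI specification and is not reproduced here. The `type`, `delta`, `text`, and `message` fields and the event names in this sketch are hypothetical placeholders. The function takes frames that have already been received, so it runs without a network connection.

```python
import json
from typing import Iterable


def assemble_transcript(frames: Iterable[str]) -> str:
    """Decode JSON text frames and join transcription deltas.

    Field and event names are hypothetical; consult the AsyncAPI
    specification for the actual message schema.
    """
    parts: list[str] = []
    for frame in frames:
        message = json.loads(frame)
        kind = message.get("type")
        if kind == "transcript.delta":
            parts.append(message["delta"])
        elif kind == "transcript.done":
            # Prefer a final text from the server if one is included.
            return message.get("text", "".join(parts))
        elif kind == "error":
            raise RuntimeError(message.get("message", "unknown error"))
        # Any other message type is ignored in this sketch.
    return "".join(parts)


if __name__ == "__main__":
    frames = [
        json.dumps({"type": "transcript.delta", "delta": "こんにちは、"}),
        json.dumps({"type": "transcript.delta", "delta": "世界。"}),
        json.dumps({"type": "transcript.done"}),
    ]
    print(assemble_transcript(frames))  # こんにちは、世界。
```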

Transcription API — A REST API for batch and offline workflows:

  • Transcription — Submit an audio file and poll for the result. Simpler than WebSocket when low latency is not required.
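The submit-then-poll pattern reduces to a loop: re-check the job's status until it reaches a terminal state or a deadline passes. This page does not list the REST endpoints or their status values. The sketch therefore takes the status check as a callable, and the `"status"` key and its `"completed"` and `"failed"` values are hypothetical. In real use, the callable would issue the HTTP request for the submitted job.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], dict],
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
    max_interval_s: float = 10.0,
) -> dict:
    """Call `get_status` until the job finishes, backing off between calls.

    Assumes (hypothetically) a "status" key whose terminal values are
    "completed" and "failed".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = get_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"transcription failed: {job}")
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError("transcription did not finish in time")
        time.sleep(interval_s)
        interval_s = min(interval_s * 2, max_interval_s)  # exponential backoff


if __name__ == "__main__":
    # A fake status sequence stands in for real HTTP responses.
    responses = iter([
        {"status": "queued"},
        {"status": "running"},
        {"status": "completed", "text": "こんにちは"},
    ])
    print(poll_until_done(lambda: next(responses), interval_s=0.01))
```

Exponential backoff keeps short jobs responsive without flooding the API with requests during long ones.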

Quickstart

Install the Python SDK (Python 3.10+):

$ pip install kotoba-sdk

Batch transcription via the REST API:

import kotoba

client = kotoba.KotobaClient()  # reads KOTOBA_API_KEY + KOTOBA_*_URL from env
result = client.asr.transcribe("clip.mp3", language="ja")
print(result.text)
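The client reads its credentials and endpoints from the environment, so a small pre-flight check can fail fast with a clear message when the API key is missing. `KOTOBA_API_KEY` comes from the comment in the snippet above. The endpoint variables are given only as the pattern `KOTOBA_*_URL`, so the sketch matches that pattern rather than guessing their exact names. `check_kotoba_env` is a hypothetical helper and is not part of `kotoba-sdk`.

```python
import fnmatch
import os
from typing import Mapping


def check_kotoba_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Raise if KOTOBA_API_KEY is unset; return any KOTOBA_*_URL variables.

    Hypothetical helper, not part of kotoba-sdk.
    """
    if not env.get("KOTOBA_API_KEY"):
        raise RuntimeError("KOTOBA_API_KEY is not set")
    # Match the documented pattern instead of hard-coding variable names.
    return {name: value for name, value in env.items()
            if fnmatch.fnmatchcase(name, "KOTOBA_*_URL")}
```

Call `check_kotoba_env()` before `kotoba.KotobaClient()`. Printing the returned mapping shows which endpoint variables the SDK should pick up.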

Streaming TTS:

import kotoba


def handle(audio):
    # Placeholder: replace with your own playback or file-writing code.
    pass


client = kotoba.KotobaClient()
with client.tts.stream(language="ja") as session:
    session.start_response()
    session.append_text("こんにちは、世界。")  # "Hello, world."
    session.commit()
    for event in session:  # events streamed back from the server
        if event.type == "audio_chunk":
            handle(event.audio)
        elif event.type == "done":
            break
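Keeping the event-handling loop separate from the connection makes it testable with fake events. The sketch below consumes any iterable of events shaped like those in the streaming TTS snippet: each has a `type` attribute, and `"audio_chunk"` events also carry an `audio` attribute. It joins the audio chunks and stops at `"done"`. It assumes `event.audio` is raw `bytes`. If the SDK delivers another type, such as base64 text, decode it first.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_audio(events: Iterable) -> bytes:
    """Join "audio_chunk" payloads until a "done" event arrives.

    Assumes event.audio is bytes; other event types are ignored.
    """
    chunks: list[bytes] = []
    for event in events:
        if event.type == "audio_chunk":
            chunks.append(event.audio)
        elif event.type == "done":
            break
    return b"".join(chunks)


if __name__ == "__main__":
    fake_events = [
        SimpleNamespace(type="audio_chunk", audio=b"\x00\x01"),
        SimpleNamespace(type="audio_chunk", audio=b"\x02"),
        SimpleNamespace(type="done"),
        SimpleNamespace(type="audio_chunk", audio=b"\xff"),  # never reached
    ]
    print(collect_audio(fake_events))  # b'\x00\x01\x02'
```

Because the session object in the snippet is iterable, `audio = collect_audio(session)` should be able to stand in for its `for` loop inside the `with` block.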

Each capability has its own Python SDK page with end-to-end snippets: Speech to Text, Speech to Speech Translation, and Text to Speech.

What’s next