Kotoba Technologies APIs

Kotoba APIs are in private alpha, available to selected customers. Request access.

Kotoba Technologies provides two families of speech APIs:

Realtime APIs — WebSocket-based services for streaming speech with sub-second latency:

ASR — Automatic Speech Recognition. Stream audio and receive transcription deltas live.
STS — Speech-to-Speech Translation. Translate spoken audio between languages in real time.
TTS — Text-to-Speech. Synthesize natural-sounding speech from text.

Each realtime API speaks JSON over a single WebSocket connection and is described by an AsyncAPI specification.
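As a rough illustration of what "JSON over a single WebSocket connection" can look like on the client side, the sketch below decodes one text frame and routes it by a `type` field. The `type` field and the `audio_chunk` / `done` names are borrowed from the SDK events in the Quickstart; whether the wire format uses the same names is an assumption, as are the `dispatch` helper and the `seq` field. The AsyncAPI specification for each service is the authoritative source for message shapes.

```python
import json
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


def dispatch(raw: str, handlers: dict[str, Handler]) -> bool:
    """Decode one JSON text frame and route it by its "type" field.

    Returns True if a handler ran, False if the type was not registered.
    """
    message = json.loads(raw)
    handler = handlers.get(message.get("type"))
    if handler is None:
        return False
    handler(message)
    return True


# Hypothetical frames: only "type" mirrors the SDK; "seq" is invented here.
received: list[str] = []
handlers: dict[str, Handler] = {
    "audio_chunk": lambda m: received.append("chunk"),
    "done": lambda m: received.append("done"),
}

dispatch('{"type": "audio_chunk", "seq": 1}', handlers)
dispatch('{"type": "done"}', handlers)
print(received)  # ['chunk', 'done']
```

Keeping the routing table separate from the socket code makes it straightforward to unit-test message handling without opening a connection.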

Transcription API — A REST API for batch and offline workflows:

Transcription — Submit an audio file and poll for the result. Simpler than WebSocket when low latency is not required.
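The submit-and-poll pattern can be sketched generically as follows. This is not the Kotoba REST contract: the `fetch` callable, the `status` field, and the `"completed"` / `"failed"` values are placeholders chosen for illustration, and the real endpoint and response schema should be taken from the Transcription API reference.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch: Callable[[], dict[str, Any]],
    *,
    interval: float = 1.0,
    max_interval: float = 10.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() until the job reports a terminal status.

    The wait between attempts doubles each time, capped at max_interval.
    Raises TimeoutError if no terminal status arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        # Hypothetical terminal states; substitute the documented ones.
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription job did not finish in time")
        sleep(interval)
        interval = min(interval * 2, max_interval)


# Simulated job that finishes on the third status check.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "text": "example transcript"},
])
job = poll_until_done(lambda: next(responses), sleep=lambda seconds: None)
print(job["status"])  # completed
```

In real use, `fetch` would wrap the HTTP request that reads the job status; injecting `sleep` keeps the helper testable without real delays.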

Quickstart

Install the Python SDK (Python 3.10+):

$ pip install kotoba-sdk

Batch transcription via the REST API:

import kotoba

client = kotoba.KotobaClient()  # reads KOTOBA_API_KEY + KOTOBA_*_URL from env
result = client.asr.transcribe("clip.mp3", language="ja")
print(result.text)
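Because the client reads its credentials from the environment, it can help to fail early with a clear message when the key is missing. The helper below is a sketch, not part of the SDK: `require_api_key` is a hypothetical name, and only the `KOTOBA_API_KEY` variable comes from the example above. How `KotobaClient()` itself reacts to a missing key is not covered here.

```python
import os
from collections.abc import Mapping


def require_api_key(environ: Mapping[str, str] = os.environ) -> str:
    """Return KOTOBA_API_KEY from the given mapping, or raise if it is unset or blank."""
    key = environ.get("KOTOBA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "KOTOBA_API_KEY is not set; export it before creating KotobaClient()"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
key = require_api_key({"KOTOBA_API_KEY": "example-key"})
print(len(key))  # 11
```

Calling `require_api_key()` with no arguments checks the live process environment before the client is constructed.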

Streaming TTS:

import kotoba


def handle(audio):
    """Placeholder: play, buffer, or write out one chunk of synthesized audio."""


client = kotoba.KotobaClient()
with client.tts.stream(language="ja") as session:
    session.start_response()
    session.append_text("こんにちは、世界。")
    session.commit()
    # Consume events until the session reports that synthesis is done.
    for event in session:
        if event.type == "audio_chunk":
            handle(event.audio)
        elif event.type == "done":
            break
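The event loop in the streaming example can be factored into a small helper that gathers every chunk into one buffer. This is a sketch under two assumptions that the page does not confirm: that `event.audio` is a `bytes`-like object, and that chunks can simply be concatenated in arrival order. `collect_audio` is a hypothetical helper, and the fake events below stand in for a real session so the logic can be exercised offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_audio(events: Iterable[Any]) -> bytes:
    """Concatenate audio from "audio_chunk" events, stopping at the first "done"."""
    buffer = bytearray()
    for event in events:
        if event.type == "audio_chunk":
            buffer.extend(event.audio)  # assumes bytes-like audio payloads
        elif event.type == "done":
            break
    return bytes(buffer)


# Stand-in events; a real session would yield these from the server.
fake_events = [
    SimpleNamespace(type="audio_chunk", audio=b"\x00\x01"),
    SimpleNamespace(type="audio_chunk", audio=b"\x02\x03"),
    SimpleNamespace(type="done"),
    SimpleNamespace(type="audio_chunk", audio=b"\xff"),  # ignored: arrives after "done"
]
audio = collect_audio(fake_events)
print(len(audio))  # 4
```

Buffering the whole response is convenient for saving to a file; for playback, handling each chunk as it arrives (as in the example above) keeps latency lower.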

Each capability has its own Python SDK page with end-to-end snippets: Speech to Text · Speech to Speech Translation · Text to Speech.

What’s next

Authentication — server-side and browser flows.
Audio formats — picking the right encoding.
Capabilities: Speech to Text, Speech to Speech Translation, Text to Speech.
Async transcription (REST) — submit a file for async transcription (under the ASR tab).