# Interim results
Display partial transcription results in real time by using the `interim_results` parameter.
Interim results provide partial transcripts while the speaker is still talking. The transcript updates progressively as more audio arrives, so you can display text with low latency before the server finalizes a sentence.
In a non-streaming speech-to-text workflow, you send a complete audio file and wait for the full transcript. Streaming works differently: audio flows to the server continuously, and the server sends back transcript data as it processes each chunk.
Interim results are the server's best guess so far. As more audio arrives, the server refines its prediction. Think of it like watching someone type a message in a chat app: you see the words appear in real time, and occasionally earlier words change as the system corrects itself. When the server is confident in a segment, it marks the result as final, and the transcript for that segment won't change.
This makes interim results ideal for live captioning, real-time transcription displays, and any experience where users expect to see text appear as they speak.
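The refinement behavior described above can be simulated in a few lines. This is a sketch, not real server output — the transcripts (including the corrected "Hello word" guess) are invented for illustration:

```python
# Simulated server updates for one utterance: each interim guess
# supersedes the previous one; only the final result is stable.
updates = [
    {"is_final": False, "transcript": "Hello"},
    {"is_final": False, "transcript": "Hello word"},   # early guess, corrected below
    {"is_final": False, "transcript": "Hello world"},
    {"is_final": True,  "transcript": "Hello world, how are you?"},
]

display = ""
for update in updates:
    # Each message replaces the caption rather than appending to it,
    # which is why earlier words can change on screen.
    display = update["transcript"]

# Only final results are safe to persist.
final_text = [u["transcript"] for u in updates if u["is_final"]][0]
```

The key design point: render interim results by replacing the current caption line, not by concatenating messages, since each interim transcript already contains the server's full guess for the segment.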
## Enable or disable interim results
Interim results are enabled by default. To receive only finalized transcripts, set `interim_results=false` in the WebSocket query string:
```
# Default (interim results enabled)
wss://stt-api.subq.ai/v1/listen?encoding=mp3

# Disable interim results
wss://stt-api.subq.ai/v1/listen?encoding=mp3&interim_results=false
```

Disabling interim results is useful when you only need the final, stable transcript — for example, when saving transcripts to a database or feeding text into a downstream process that shouldn't act on uncertain data.
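In a Python client, the query string can be assembled with the standard library. This is a minimal sketch; the helper name `build_listen_url` is illustrative, not part of the API:

```python
from urllib.parse import urlencode

def build_listen_url(encoding: str, interim_results: bool = True) -> str:
    """Build the WebSocket URL for the streaming /v1/listen endpoint."""
    params = {"encoding": encoding}
    # interim_results is enabled server-side by default, so only send
    # the parameter when turning interim results off.
    if not interim_results:
        params["interim_results"] = "false"
    return f"wss://stt-api.subq.ai/v1/listen?{urlencode(params)}"
```

Omitting the parameter when it matches the default keeps URLs short and avoids depending on how the server parses redundant values.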
## How it works
As audio streams in, the server sends `Results` messages. Each message includes fields that indicate the state of the transcript:
| Field | Meaning |
|---|---|
| `is_final: false` | Interim result. The transcript may change as more audio arrives. |
| `is_final: true` | Final result for this audio segment. The transcript is stable. |
| `speech_final: true` | The speaker finished talking (end of utterance). |
A typical sequence looks like:

1. `is_final: false` returns `"Hello"`
2. `is_final: false` returns `"Hello world"`
3. `is_final: true` returns `"Hello world, how are you?"`
4. `speech_final: true` signals the utterance is complete
Interim results let you display text as the speaker talks (like live captions). Final results give you the stable transcript to store or process.
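The display-versus-store split can be sketched as a small message handler. This assumes messages arrive as JSON text over the WebSocket; the function name `handle_message` and the `finals` accumulator are illustrative:

```python
import json

def handle_message(raw: str, finals: list[str]) -> str:
    """Process one Results message and return the text to display.

    Interim results overwrite the live caption, while final results
    are appended to the stable transcript accumulated in `finals`.
    """
    msg = json.loads(raw)
    transcript = msg["channel"]["alternatives"][0]["transcript"]
    if msg["is_final"]:
        finals.append(transcript)  # stable: safe to store or process
        return " ".join(finals)
    # Interim: show the finalized segments plus the server's current guess.
    return " ".join(finals + [transcript])
```

Keeping finalized segments in a separate list means downstream code (storage, NLP) can consume `finals` without ever seeing uncertain interim text.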
## Example response
The following example shows an interim `Results` message:
```json
{
  "type": "Results",
  "is_final": false,
  "speech_final": false,
  "channel": {
    "alternatives": [{
      "transcript": "Hello world",
      "confidence": 0.95
    }]
  }
}
```

## Next steps
- Endpointing to control when sentences are finalized
- Utterance detection to detect end of speaker turns
- WebSocket protocol for the full message reference