# Interim results
Display partial transcription results in real time by using the `interim_results` parameter.
Interim results provide partial transcripts while the speaker is still talking. The transcript updates progressively as more audio arrives, so you can display text with low latency before the server finalizes a sentence.
In a non-streaming speech-to-text workflow, you send a complete audio file and wait for the full transcript. Streaming works differently: audio flows to the server continuously, and the server sends back transcript data as it processes each chunk.
Interim results are the server's best guess so far. As more audio arrives, the server refines its prediction. Think of it like watching someone type a message in a chat app: you see the words appear in real time, and occasionally earlier words change as the system corrects itself. When the server is confident in a segment, it marks the result as final, and the transcript for that segment won't change.
This makes interim results ideal for live captioning, real-time transcription displays, and any experience where users expect to see text appear as they speak.
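The refinement behavior described above can be simulated in a few lines. This is a sketch, not real server output — the transcripts (including the corrected "Hello word" guess) are invented for illustration:

```python
# Simulated server updates for one utterance: each interim guess
# supersedes the previous one; only the final result is stable.
updates = [
    {"is_final": False, "transcript": "Hello"},
    {"is_final": False, "transcript": "Hello word"},   # early guess, corrected below
    {"is_final": False, "transcript": "Hello world"},
    {"is_final": True,  "transcript": "Hello world, how are you?"},
]

display = ""
for update in updates:
    # Each message replaces the caption rather than appending to it,
    # which is why earlier words can change on screen.
    display = update["transcript"]

# Only final results are safe to persist.
final_text = [u["transcript"] for u in updates if u["is_final"]][0]
```

The key design point: render interim results by replacing the current caption line, not by concatenating messages, since each interim transcript already contains the server's full guess for the segment.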
## Enable or disable interim results
Interim results are enabled by default. To receive only finalized transcripts, set `interim_results=false` in the WebSocket query string:
```
# Default (interim results enabled)
wss://stt-api.subq.ai/v1/listen?encoding=mp3

# Disable interim results
wss://stt-api.subq.ai/v1/listen?encoding=mp3&interim_results=false
```

Disabling interim results is useful when you only need the final, stable transcript — for example, when saving transcripts to a database or feeding text into a downstream process that shouldn't act on uncertain data.
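In a Python client, the query string can be assembled with the standard library. This is a minimal sketch; the helper name `build_listen_url` is illustrative, not part of the API:

```python
from urllib.parse import urlencode

def build_listen_url(encoding: str, interim_results: bool = True) -> str:
    """Build the WebSocket URL for the streaming /v1/listen endpoint."""
    params = {"encoding": encoding}
    # interim_results is enabled server-side by default, so only send
    # the parameter when turning interim results off.
    if not interim_results:
        params["interim_results"] = "false"
    return f"wss://stt-api.subq.ai/v1/listen?{urlencode(params)}"
```

Omitting the parameter when it matches the default keeps URLs short and avoids depending on how the server parses redundant values.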
## How it works
As audio streams in, the server sends `Results` messages. Each message includes fields that indicate the state of the transcript:
| Field | Meaning |
|---|---|
| `is_final: false` | Interim result. The transcript may change as more audio arrives. |
| `is_final: true` | Final result for this audio segment. The transcript is stable. |
| `speech_final: true` | The speaker finished talking (end of utterance). |
A typical sequence looks like:

1. `is_final: false` returns `"Hello"`
2. `is_final: false` returns `"Hello world"`
3. `is_final: true` returns `"Hello world, how are you?"`
4. `speech_final: true` signals the utterance is complete
Interim results let you display text as the speaker talks (like live captions). Final results give you the stable transcript to store or process.
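The display-versus-store split can be sketched as a small message handler. This assumes messages arrive as JSON text over the WebSocket; the function name `handle_message` and the `finals` accumulator are illustrative:

```python
import json

def handle_message(raw: str, finals: list[str]) -> str:
    """Process one Results message and return the text to display.

    Interim results overwrite the live caption, while final results
    are appended to the stable transcript accumulated in `finals`.
    """
    msg = json.loads(raw)
    transcript = msg["channel"]["alternatives"][0]["transcript"]
    if msg["is_final"]:
        finals.append(transcript)  # stable: safe to store or process
        return " ".join(finals)
    # Interim: show the finalized segments plus the server's current guess.
    return " ".join(finals + [transcript])
```

Keeping finalized segments in a separate list means downstream code (storage, NLP) can consume `finals` without ever seeing uncertain interim text.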
## Example response
The following example shows an interim `Results` message:
```json
{
  "type": "Results",
  "is_final": false,
  "speech_final": false,
  "channel": {
    "alternatives": [{
      "transcript": "Hello world",
      "confidence": 0.95
    }]
  }
}
```

## Next steps
- Endpointing to control when sentences are finalized
- Utterance detection to detect end of speaker turns
- WebSocket protocol for the full message reference