
Transcribe a file

Transcribe a local audio file using the SubQ API

Use file transcription when you have audio stored locally: recordings, voicemails, podcast episodes, meeting captures, or any other pre-recorded audio. You send the file contents in a single POST request to /v1/listen and receive the complete transcript in the response.

The SubQ API accepts most common audio formats including MP3, WAV, AAC, FLAC, OGG, Opus, WebM, M4A, and raw PCM. For details on how pre-recorded and streaming transcription differ, see Transcription modes.

Prerequisites

  • Python 3.8 or later
  • httpx installed. If you haven't already, follow the setup and installation guide.
  • A local audio file in a supported format (MP3, WAV, FLAC, etc.)

Transcribe a local file

In this example, you read an audio file from disk and send it to the SubQ API for transcription:

transcribe_file.py
import os
import httpx

SUBQ_API_KEY = os.environ["SUBQ_API_KEY"]

# Read the audio file as raw bytes
with open("audio.wav", "rb") as f:
    audio_data = f.read()

# Send the audio to the SubQ API
response = httpx.post(
    "https://stt-api.subq.ai/v1/listen",
    headers={"Authorization": f"Bearer {SUBQ_API_KEY}"},
    content=audio_data,
    timeout=60.0
)

result = response.json()

# Extract and print the transcript
transcript = (
    result.get("results", {})
    .get("channels", [{}])[0]
    .get("alternatives", [{}])[0]
    .get("transcript", "")
)

print(transcript)

Run it:

python transcribe_file.py

How it works

To transcribe a file, you read the audio, send it to the API, and extract the transcript from the response:

  1. Read the file: open("audio.wav", "rb").read() loads the entire file into memory as raw bytes. The API detects the audio format automatically from the file contents, so you don't need to specify a content type.

  2. POST to /v1/listen: httpx.post() sends the audio bytes as the request body to the SubQ API. The Authorization header authenticates the request with your API key using the Bearer scheme. timeout=60.0 sets a 60-second request timeout; increase it for longer files.

  3. Parse the response: The transcript sits inside a nested JSON structure: results → channels → alternatives. The chained .get() calls navigate this safely, returning empty defaults if any key is missing. This avoids KeyError exceptions if the response structure is unexpected.
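The chained .get() navigation from step 3 can be factored into a small reusable helper. A minimal sketch (the helper name is illustrative, not part of any SubQ SDK):

```python
def extract_transcript(result: dict) -> str:
    """Pull the top-ranked transcript out of a /v1/listen response dict,
    falling back to an empty string when any key is missing."""
    return (
        result.get("results", {})
        .get("channels", [{}])[0]
        .get("alternatives", [{}])[0]
        .get("transcript", "")
    )

# Full response: returns the transcript text
ok = {"results": {"channels": [{"alternatives": [{"transcript": "hello world"}]}]}}
print(extract_transcript(ok))  # hello world

# Unexpected payload (e.g. an error body): degrades to ""
print(extract_transcript({"error": "invalid audio"}))
```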

Prerequisites

  • Node.js 18 or later
  • A local audio file in a supported format (MP3, WAV, FLAC, etc.)

Transcribe a local file

In this example, you read an audio file from disk and send it to the SubQ API for transcription:

transcribe_file.js
import { readFileSync } from "node:fs";

const SUBQ_API_KEY = process.env.SUBQ_API_KEY;

// Read the audio file as raw bytes
const audioData = readFileSync("audio.wav");

// Send the audio to the SubQ API
const response = await fetch("https://stt-api.subq.ai/v1/listen", {
  method: "POST",
  headers: { "Authorization": `Bearer ${SUBQ_API_KEY}` },
  body: audioData,
});

const result = await response.json();

// Extract and print the transcript
const transcript = result?.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
console.log(transcript);

Run it:

node transcribe_file.js

How it works

To transcribe a file, you read the audio, send it to the API, and extract the transcript from the response:

  1. Read the file: readFileSync("audio.wav") loads the entire file into memory as a Buffer. The API detects the audio format automatically from the file contents, so you don't need to specify a content type.

  2. POST to /v1/listen: fetch() sends the audio bytes as the request body to the SubQ API. The Authorization header authenticates the request with your API key using the Bearer scheme.

  3. Parse the response: The transcript sits inside a nested JSON structure: results → channels → alternatives. Optional chaining (?.) navigates this safely, returning undefined if any key is missing. The ?? "" fallback ensures you always get a string.

Prerequisites

  • Go 1.21 or later
  • A local audio file in a supported format (MP3, WAV, FLAC, etc.)

Transcribe a local file

In this example, you read an audio file from disk and send it to the SubQ API for transcription:

transcribe_file.go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	apiKey := os.Getenv("SUBQ_API_KEY")

	// Read the audio file as raw bytes
	audioData, err := os.ReadFile("audio.wav")
	if err != nil {
		fmt.Println("Error reading file:", err)
		return
	}

	// Send the audio to the SubQ API
	req, _ := http.NewRequest("POST", "https://stt-api.subq.ai/v1/listen", bytes.NewReader(audioData))
	req.Header.Set("Authorization", "Bearer "+apiKey)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)

	if resp.StatusCode != 200 {
		fmt.Printf("Error %d: %s\n", resp.StatusCode, string(body))
		return
	}

	// Extract and print the transcript
	var result struct {
		Results struct {
			Channels []struct {
				Alternatives []struct {
					Transcript string `json:"transcript"`
				} `json:"alternatives"`
			} `json:"channels"`
		} `json:"results"`
	}
	if err := json.Unmarshal(body, &result); err != nil {
		fmt.Println("Error parsing response:", err)
		return
	}
	if len(result.Results.Channels) == 0 || len(result.Results.Channels[0].Alternatives) == 0 {
		fmt.Println("No transcript in response")
		return
	}
	fmt.Println(result.Results.Channels[0].Alternatives[0].Transcript)
}

Run it:

go run transcribe_file.go

How it works

To transcribe a file, you read the audio, send it to the API, and extract the transcript from the response:

  1. Read the file: os.ReadFile("audio.wav") loads the entire file into memory as a byte slice. The API detects the audio format automatically from the file contents, so you don't need to set a content type.

  2. POST to /v1/listen: http.NewRequest builds a POST request with the audio bytes as the body. The Authorization header authenticates the request with your API key using the Bearer scheme.

  3. Parse the response: You read the response body with io.ReadAll and unmarshal it into a struct that mirrors the JSON structure. Go's typed struct fields give you compile-time safety when you access Results.Channels[0].Alternatives[0].Transcript.

Prerequisites

  • Rust 1.70 or later with reqwest, tokio, and serde_json in your Cargo.toml. If you haven't already, follow the setup and installation guide.
  • A local audio file in a supported format (MP3, WAV, FLAC, etc.)

Transcribe a local file

In this example, you read an audio file from disk and send it to the SubQ API for transcription:

transcribe_file.rs
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("SUBQ_API_KEY").expect("SUBQ_API_KEY not set");

    // Read the audio file as raw bytes
    let audio_data = tokio::fs::read("audio.wav").await?;

    // Send the audio to the SubQ API
    let client = reqwest::Client::new();
    let response = client
        .post("https://stt-api.subq.ai/v1/listen")
        .header("Authorization", format!("Bearer {}", api_key))
        .body(audio_data)
        .send()
        .await?;

    let status = response.status();
    let text = response.text().await?;

    if !status.is_success() {
        eprintln!("Error {}: {}", status, text);
        return Ok(());
    }

    // Extract and print the transcript
    let json: Value = serde_json::from_str(&text)?;
    let transcript = json["results"]["channels"][0]["alternatives"][0]["transcript"]
        .as_str()
        .unwrap_or("");
    println!("{}", transcript);

    Ok(())
}

Run it:

cargo run --bin transcribe_file

How it works

To transcribe a file, you read the audio, send it to the API, and extract the transcript from the response:

  1. Read the file: tokio::fs::read("audio.wav") loads the entire file into a Vec<u8>. The API detects the audio format automatically from the file contents, so you don't need to set a content type.

  2. POST to /v1/listen: reqwest::Client::new().post(url) builds a POST request. The .header("Authorization", ...) call adds Bearer authentication, and .body(audio_data) attaches the raw audio bytes.

  3. Parse the response: You deserialize the response body into a serde_json::Value and index it to extract the transcript. The as_str().unwrap_or("") chain handles the case where a field is missing.

Response structure

The API returns a JSON response. The key fields are:

Field                      Type    Description
results.channels           array   One entry per audio channel. Mono audio has one channel.
channels[].alternatives    array   Ranked transcript alternatives. The first alternative ([0]) has the highest confidence.
alternatives[].transcript  string  The full transcript text for this alternative.
alternatives[].confidence  float   Confidence score from 0.0 to 1.0.
alternatives[].words       array   Per-word details including word, start, end, and confidence.

Example response:

{
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "Houston we've had a problem",
            "confidence": 0.98,
            "words": [
              { "word": "houston", "start": 0.08, "end": 0.56, "confidence": 0.99 },
              { "word": "we've", "start": 0.56, "end": 0.72, "confidence": 0.97 }
            ]
          }
        ]
      }
    ]
  }
}
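Given a parsed response shaped like the example above, the per-word details are straightforward to walk. A sketch using the sample payload:

```python
import json

# The example response from above, embedded as a string for illustration
payload = """{"results": {"channels": [{"alternatives": [{
  "transcript": "Houston we've had a problem",
  "confidence": 0.98,
  "words": [
    {"word": "houston", "start": 0.08, "end": 0.56, "confidence": 0.99},
    {"word": "we've", "start": 0.56, "end": 0.72, "confidence": 0.97}
  ]}]}]}}"""

result = json.loads(payload)
# The first alternative has the highest confidence
best = result["results"]["channels"][0]["alternatives"][0]

print(best["transcript"])
for w in best["words"]:
    print(f'{w["start"]:.2f}-{w["end"]:.2f}  {w["word"]} ({w["confidence"]})')
```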

You can customize transcriptions with query parameters for smart formatting, speaker diarization, language detection, and more.
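The mechanics of adding query parameters look like this sketch. The parameter names below are placeholders to show the shape of the request, not documented SubQ flags; check the API reference for the real names.

```python
from urllib.parse import urlencode

# Hypothetical query parameters -- illustrative names only; consult the
# SubQ API reference for the flags /v1/listen actually supports.
params = {
    "smart_format": "true",
    "diarize": "true",
    "detect_language": "true",
}

url = f"https://stt-api.subq.ai/v1/listen?{urlencode(params)}"
print(url)
```

With httpx you don't need to build the URL by hand: pass the same dict as `params=params` to httpx.post() and the client encodes the query string for you.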

If the request fails, the API returns an HTTP error status code with a description of the problem.

Next steps