
Speak Live Transcription API

Complete Integration Guide

Written by Speak Ai
Updated this week

The Speak Live Transcription API provides real-time speech-to-text capabilities over WebSocket connections. You can stream audio data and receive live transcription results, or subscribe to an existing transcription session and receive its results as they arrive.

Quick Start

1. Get Your API Key

First, you'll need an API key from your Speak dashboard. This key authenticates your requests and tracks usage.

2. Choose Your Connection Method

We support two connection types:

  • Socket.IO (Recommended): Easier to implement, better error handling

  • WebSocket: Lightweight, direct connection with no additional client library


Socket.IO Integration (Recommended)

Step 1: Connect to the Server

// Using Socket.IO client
import { io } from 'socket.io-client';

const socket = io('wss://your-speak-server.com/v1/live', {
  query: {
    'speak-api-key': 'your-api-key-here',
    'sourceLanguage': 'en',        // Optional: language code
    'folderId': 'your-folder-id',  // Optional: organize transcripts
    'mediaType': 'audio/webm'      // Optional: specify audio format
  }
});

Step 2: Start Live Streaming

// Start the transcription session
socket.emit('start-live-stream');

// Listen for successful setup
socket.on('metadata', (response) => {
  console.log('Media session created:', response.mediaId);
  console.log('Ready to send audio data');
});

Step 3: Send Audio Data

// Send audio chunks as they become available
socket.emit('audio-data', audioBuffer);

// Example with microphone input
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const mediaRecorder = new MediaRecorder(stream);

    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        // Convert to a byte array and send (Buffer is Node-only,
        // so use Uint8Array in the browser)
        event.data.arrayBuffer().then(buffer => {
          socket.emit('audio-data', new Uint8Array(buffer));
        });
      }
    };

    mediaRecorder.start(1000); // Send chunks every second
  });

Step 4: Receive Transcription Results

// Listen for transcription results
socket.on('transcript', (response) => {
  console.log('New word:', response.word.text);
  console.log('Confidence:', response.word.confidence);
  console.log('Speaker ID:', response.word.speakerId);
  console.log('Timing:', response.word.instances);
});

// Listen for errors
socket.on('error', (error) => {
  console.error('Transcription error:', error);
});

Step 5: Stop Transcription

// Stop the transcription session
socket.emit('stop-transcription');

// Listen for close confirmation
socket.on('close', (response) => {
  console.log('Transcription stopped');
  socket.disconnect();
});


WebSocket Integration

Step 1: Connect to WebSocket

// Direct WebSocket connection
const ws = new WebSocket('wss://your-speak-server.com:8083/v1/live-bot?speak-api-key=your-api-key-here');

ws.onopen = () => {
  console.log('WebSocket connected');
  // Send start-live-stream message
  ws.send(JSON.stringify({
    event: 'start-live-stream'
  }));
};

Step 2: Send Audio Data

// Once the setup (metadata) response has been received (see the message
// handler in the next step), send raw audio buffers as binary frames
ws.send(audioBuffer);
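Step 3: Receive Results and Stop

Responses arrive as JSON text frames in the formats shown under Response Format below. Here is a minimal sketch for handling them and ending the session; note that sending a 'stop-transcription' event over the raw WebSocket mirrors the Socket.IO flow and is an assumption here:

// Handle incoming JSON messages
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  if (message.type === 'metadata') {
    console.log('Media session created:', message.mediaId);
  } else if (message.type === 'transcript') {
    console.log('New word:', message.word.text);
  }
};

// When finished, end the session and close the connection
// (the 'stop-transcription' event name is assumed from the Socket.IO flow)
ws.send(JSON.stringify({ event: 'stop-transcription' }));
ws.close();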


Subscribing to Existing Media Sessions


You can subscribe to receive transcription results from existing sessions without sending audio data.

Step 1: Connect for Subscription

const socket = io('wss://your-speak-server.com/v1/live', {
  query: {
    'speak-api-key': 'your-api-key-here'
  }
});

Step 2: Subscribe to Media

// Subscribe to a specific media session
socket.emit('subscribe-to-media', 'media-id-here');

// Listen for subscription confirmation
socket.on('metadata', (response) => {
  if (response.status === 'subscribed') {
    console.log('Successfully subscribed to media session');
  }
});

Step 3: Receive Results

// Receive transcription results from the subscribed session
socket.on('transcript', (response) => {
  console.log('Received word:', response.word.text);
});

// Unsubscribe when done
socket.emit('unsubscribe-from-media', 'media-id-here');

Audio Format Requirements

Supported Formats

  • WebM (recommended)

  • MP4

  • WAV

  • PCM

Recommended Settings

  • Sample Rate: 16kHz or 48kHz

  • Channels: Mono (1 channel)

  • Bitrate: 128kbps or higher

  • Chunk Size: 1-2 seconds
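
These settings can be requested in the browser through getUserMedia constraints and MediaRecorder options. A minimal sketch, assuming WebM support in the browser (browsers may ignore or approximate some constraints):

// Request mono audio and record WebM at the recommended bitrate
navigator.mediaDevices.getUserMedia({
  audio: {
    channelCount: 1,    // Mono
    sampleRate: 16000   // 16 kHz (the browser may approximate this)
  }
}).then(stream => {
  const mediaRecorder = new MediaRecorder(stream, {
    mimeType: 'audio/webm',       // Recommended format
    audioBitsPerSecond: 128000    // 128 kbps or higher
  });

  mediaRecorder.ondataavailable = (event) => {
    if (event.data.size > 0) {
      event.data.arrayBuffer().then(buffer => {
        socket.emit('audio-data', new Uint8Array(buffer));
      });
    }
  };

  mediaRecorder.start(1000); // 1-second chunks
});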


Response Format

Transcript Response

{
  "type": "transcript",
  "mediaId": "media-123",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "word": {
    "id": 1,
    "text": "hello",
    "confidence": 0.95,
    "language": "en",
    "speakerId": "speaker-1",
    "instances": {
      "startInSec": 1.2,
      "endInSec": 1.5
    }
  },
  "message": "New word received"
}

Metadata Response

{
  "type": "metadata",
  "mediaId": "media-123",
  "folderId": "folder-456",
  "userId": "user-789",
  "name": "Live Transcription Session",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "message": "Media session initialized"
}


Best Practices

1. Connection Management

  • Always handle connection errors and reconnection

  • Implement exponential backoff for reconnection attempts

  • Monitor connection health
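
The Socket.IO client has built-in reconnection that backs off between attempts. A sketch of enabling it and monitoring connection health (the delay and attempt values here are illustrative, not requirements of the API):

const socket = io('wss://your-speak-server.com/v1/live', {
  query: { 'speak-api-key': 'your-api-key-here' },
  reconnection: true,
  reconnectionAttempts: 10,     // Stop retrying after 10 attempts
  reconnectionDelay: 1000,      // Start at 1 second between attempts
  reconnectionDelayMax: 30000,  // Back off up to 30 seconds
  randomizationFactor: 0.5      // Add jitter between attempts
});

// Monitor connection health
socket.on('disconnect', (reason) => {
  console.warn('Disconnected:', reason);
});

socket.io.on('reconnect_attempt', (attempt) => {
  console.log('Reconnection attempt', attempt);
});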

2. Audio Quality

  • Use consistent audio settings throughout the session

  • Avoid changing audio format mid-session

  • Ensure a stable internet connection

3. Error Handling

  • Implement proper error handling for all events

  • Log errors for debugging

  • Provide user feedback for connection issues
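
A sketch of centralizing error handling and surfacing it to the user; showStatus is a hypothetical UI helper, so replace it with your own notification code:

// Report connection and transcription errors to the user
// (showStatus is a hypothetical UI helper, not part of the API)
socket.on('connect_error', (err) => {
  console.error('Connection failed:', err.message);
  showStatus('Unable to reach the transcription server. Retrying...');
});

socket.on('error', (error) => {
  console.error('Transcription error:', error);
  showStatus('A transcription error occurred. See the console for details.');
});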


Support

If you encounter any issues or need help with integration:

  1. Review the error messages in your browser console.

  2. Contact our support team with your API key and error details.
