
Speak Live Transcription API

Complete Integration Guide

Written by Speak Ai
Updated this week

The Speak Live Transcription API provides real-time speech-to-text capabilities over WebSocket connections. You can stream audio data and receive live transcription results, or subscribe to an existing transcription session and receive its results as they arrive.

Quick Start

1. Get Your API Key

First, you'll need an API key from your Speak dashboard. This key authenticates your requests and tracks usage.

2. Choose Your Connection Method

We support two connection types:

  • Socket.IO (Recommended): Easier to implement, better error handling

  • WebSocket: Lightweight, direct connection with no additional client library


Socket.IO Integration (Recommended)

Step 1: Connect to the Server

// Using Socket.IO client
import { io } from 'socket.io-client';

const socket = io('wss://your-speak-server.com/v1/live', {
  query: {
    'speak-api-key': 'your-api-key-here',
    'sourceLanguage': 'en',        // Optional: language code
    'folderId': 'your-folder-id',  // Optional: organize transcripts
    'mediaType': 'audio/webm'      // Optional: specify audio format
  }
});

Step 2: Start Live Streaming

// Start the transcription session
socket.emit('start-live-stream');

// Listen for successful setup
socket.on('metadata', (response) => {
  console.log('Media session created:', response.mediaId);
  console.log('Ready to send audio data');
});

Step 3: Send Audio Data

// Send audio chunks as they become available
socket.emit('audio-data', audioBuffer);

// Example with microphone input
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const mediaRecorder = new MediaRecorder(stream);

    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        // Convert to a byte array and send (Buffer is Node-only,
        // so use Uint8Array in the browser)
        event.data.arrayBuffer().then(buffer => {
          socket.emit('audio-data', new Uint8Array(buffer));
        });
      }
    };

    mediaRecorder.start(1000); // Send chunks every second
  });

Step 4: Receive Transcription Results

// Listen for transcription results
socket.on('transcript', (response) => {
  console.log('New word:', response.word.text);
  console.log('Confidence:', response.word.confidence);
  console.log('Speaker ID:', response.word.speakerId);
  console.log('Timing:', response.word.instances);
});

// Listen for errors
socket.on('error', (error) => {
  console.error('Transcription error:', error);
});

Step 5: Stop Transcription

// Stop the transcription session
socket.emit('stop-transcription');

// Listen for close confirmation
socket.on('close', (response) => {
  console.log('Transcription stopped');
  socket.disconnect();
});


WebSocket Integration

Step 1: Connect to WebSocket

// Direct WebSocket connection
const ws = new WebSocket('wss://your-speak-server.com:8083/v1/live-bot?speak-api-key=your-api-key-here');

ws.onopen = () => {
  console.log('WebSocket connected');
  // Send start-live-stream message
  ws.send(JSON.stringify({
    event: 'start-live-stream'
  }));
};

Step 2: Send Audio Data

// Once the setup (metadata) response has been received (see the message
// handler in the next step), send raw audio buffers as binary frames
ws.send(audioBuffer);
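Step 3: Receive Results and Stop

Responses arrive as JSON text frames in the formats shown under Response Format below. Here is a minimal sketch for handling them and ending the session; note that sending a 'stop-transcription' event over the raw WebSocket mirrors the Socket.IO flow and is an assumption here:

// Handle incoming JSON messages
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  if (message.type === 'metadata') {
    console.log('Media session created:', message.mediaId);
  } else if (message.type === 'transcript') {
    console.log('New word:', message.word.text);
  }
};

// When finished, end the session and close the connection
// (the 'stop-transcription' event name is assumed from the Socket.IO flow)
ws.send(JSON.stringify({ event: 'stop-transcription' }));
ws.close();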


Subscribing to Existing Media Sessions


You can subscribe to receive transcription results from existing sessions without sending audio data.

Step 1: Connect for Subscription

const socket = io('wss://your-speak-server.com/v1/live', {
  query: {
    'speak-api-key': 'your-api-key-here'
  }
});

Step 2: Subscribe to Media

// Subscribe to a specific media session
socket.emit('subscribe-to-media', 'media-id-here');

// Listen for subscription confirmation
socket.on('metadata', (response) => {
  if (response.status === 'subscribed') {
    console.log('Successfully subscribed to media session');
  }
});

Step 3: Receive Results

// Receive transcription results from the subscribed session
socket.on('transcript', (response) => {
  console.log('Received word:', response.word.text);
});

// Unsubscribe when done
socket.emit('unsubscribe-from-media', 'media-id-here');

Audio Format Requirements

Supported Formats

  • WebM (recommended)

  • MP4

  • WAV

  • PCM

Recommended Settings

  • Sample Rate: 16kHz or 48kHz

  • Channels: Mono (1 channel)

  • Bitrate: 128kbps or higher

  • Chunk Size: 1-2 seconds
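
These settings can be requested in the browser through getUserMedia constraints and MediaRecorder options. A minimal sketch, assuming WebM support in the browser (browsers may ignore or approximate some constraints):

// Request mono audio and record WebM at the recommended bitrate
navigator.mediaDevices.getUserMedia({
  audio: {
    channelCount: 1,    // Mono
    sampleRate: 16000   // 16 kHz (the browser may approximate this)
  }
}).then(stream => {
  const mediaRecorder = new MediaRecorder(stream, {
    mimeType: 'audio/webm',       // Recommended format
    audioBitsPerSecond: 128000    // 128 kbps or higher
  });

  mediaRecorder.ondataavailable = (event) => {
    if (event.data.size > 0) {
      event.data.arrayBuffer().then(buffer => {
        socket.emit('audio-data', new Uint8Array(buffer));
      });
    }
  };

  mediaRecorder.start(1000); // 1-second chunks
});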


Response Format

Transcript Response

{
  "type": "transcript",
  "mediaId": "media-123",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "word": {
    "id": 1,
    "text": "hello",
    "confidence": 0.95,
    "language": "en",
    "speakerId": "speaker-1",
    "instances": {
      "startInSec": 1.2,
      "endInSec": 1.5
    }
  },
  "message": "New word received"
}

Metadata Response

{
  "type": "metadata",
  "mediaId": "media-123",
  "folderId": "folder-456",
  "userId": "user-789",
  "name": "Live Transcription Session",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "message": "Media session initialized"
}


Best Practices

1. Connection Management

  • Always handle connection errors and reconnection

  • Implement exponential backoff for reconnection attempts

  • Monitor connection health
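
The Socket.IO client has built-in reconnection that backs off between attempts. A sketch of enabling it and monitoring connection health (the delay and attempt values here are illustrative, not requirements of the API):

const socket = io('wss://your-speak-server.com/v1/live', {
  query: { 'speak-api-key': 'your-api-key-here' },
  reconnection: true,
  reconnectionAttempts: 10,     // Stop retrying after 10 attempts
  reconnectionDelay: 1000,      // Start at 1 second between attempts
  reconnectionDelayMax: 30000,  // Back off up to 30 seconds
  randomizationFactor: 0.5      // Add jitter between attempts
});

// Monitor connection health
socket.on('disconnect', (reason) => {
  console.warn('Disconnected:', reason);
});

socket.io.on('reconnect_attempt', (attempt) => {
  console.log('Reconnection attempt', attempt);
});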

2. Audio Quality

  • Use consistent audio settings throughout the session

  • Avoid changing audio format mid-session

  • Ensure a stable internet connection

3. Error Handling

  • Implement proper error handling for all events

  • Log errors for debugging

  • Provide user feedback for connection issues
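
A sketch of centralizing error handling and surfacing it to the user; showStatus is a hypothetical UI helper, so replace it with your own notification code:

// Report connection and transcription errors to the user
// (showStatus is a hypothetical UI helper, not part of the API)
socket.on('connect_error', (err) => {
  console.error('Connection failed:', err.message);
  showStatus('Unable to reach the transcription server. Retrying...');
});

socket.on('error', (error) => {
  console.error('Transcription error:', error);
  showStatus('A transcription error occurred. See the console for details.');
});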


Support

If you encounter any issues or need help with integration:

  1. Review the error messages in your browser console.

  2. Contact our support team with your API key and error details.
