The Speak Live Transcription API provides real-time speech-to-text over WebSocket connections. You can stream audio and receive live transcription results, or subscribe to an existing transcription session and receive its results without sending audio yourself.
Quick Start
1. Get Your API Key
First, you'll need an API key from your Speak dashboard. This key authenticates your requests and tracks usage.
2. Choose Your Connection Method
We support two connection types:
Socket.IO (Recommended): Easier to implement, better error handling
WebSocket: Lightweight, direct WebSocket connection
Socket.IO Integration (Recommended)
Step 1: Connect to the Server
// Using Socket.IO client
import { io } from 'socket.io-client';
const socket = io('wss://your-speak-server.com/v1/live', {
  query: {
    'speak-api-key': 'your-api-key-here',
    'sourceLanguage': 'en',        // Optional: language code
    'folderId': 'your-folder-id',  // Optional: organize transcripts
    'mediaType': 'audio/webm'      // Optional: specify audio format
  }
});
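The optional parameters above can be assembled by a small helper so that unset options are simply left out of the query. This is a minimal sketch: `buildLiveQuery` is a hypothetical name, not part of the Speak API, and only the parameter names shown above are used.

```javascript
// Hypothetical helper: builds the Socket.IO `query` object from the
// parameters shown above, skipping optional values that are not set.
function buildLiveQuery({ apiKey, sourceLanguage, folderId, mediaType }) {
  if (!apiKey) {
    throw new Error('A Speak API key is required');
  }
  const query = { 'speak-api-key': apiKey };
  const optional = { sourceLanguage, folderId, mediaType };
  for (const [name, value] of Object.entries(optional)) {
    if (value !== undefined && value !== null && value !== '') {
      query[name] = value;
    }
  }
  return query;
}

// Usage:
// const socket = io(url, { query: buildLiveQuery({ apiKey, sourceLanguage: 'en' }) });
```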
Step 2: Start Live Streaming
// Start the transcription session
socket.emit('start-live-stream');
// Listen for successful setup
socket.on('metadata', (response) => {
  console.log('Media session created:', response.mediaId);
  console.log('Ready to send audio data');
});
Step 3: Send Audio Data
// Send audio chunks as they become available
socket.emit('audio-data', audioBuffer);
// Example with microphone input
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const mediaRecorder = new MediaRecorder(stream);
    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        // Convert the Blob to an ArrayBuffer and send it as binary.
        // (Node's Buffer is not available in browsers.)
        event.data.arrayBuffer().then(buffer => {
          socket.emit('audio-data', buffer);
        });
      }
    };
    mediaRecorder.start(1000); // Emit a chunk every second
  })
  .catch(err => console.error('Microphone access failed:', err));
Step 4: Receive Transcription Results
// Listen for transcription results
socket.on('transcript', (response) => {
  console.log('New word:', response.word.text);
  console.log('Confidence:', response.word.confidence);
  console.log('Speaker ID:', response.word.speakerId);
  console.log('Timing:', response.word.instances);
});
// Listen for errors
socket.on('error', (error) => {
  console.error('Transcription error:', error);
});
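Because results arrive one word at a time, an application usually needs to join them into readable text. The sketch below groups consecutive words by speaker. It assumes words arrive in spoken order and uses only the `word.text` and `word.speakerId` fields shown above; `createTranscriptAssembler` is a hypothetical name.

```javascript
// Sketch: collects word-level results into one line of text per speaker turn.
function createTranscriptAssembler() {
  const turns = []; // [{ speakerId, words: [...] }]
  return {
    addWord(response) {
      const word = response && response.word;
      if (!word || typeof word.text !== 'string') return; // ignore malformed events
      const last = turns[turns.length - 1];
      if (last && last.speakerId === word.speakerId) {
        last.words.push(word.text); // same speaker: extend the current turn
      } else {
        turns.push({ speakerId: word.speakerId, words: [word.text] });
      }
    },
    toText() {
      return turns.map(t => `${t.speakerId}: ${t.words.join(' ')}`).join('\n');
    },
  };
}

// Usage:
// const assembler = createTranscriptAssembler();
// socket.on('transcript', (response) => assembler.addWord(response));
```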
Step 5: Stop Transcription
// Stop the transcription session
socket.emit('stop-transcription');
// Listen for close confirmation
socket.on('close', (response) => {
  console.log('Transcription stopped');
  socket.disconnect();
});
WebSocket Integration
Step 1: Connect to WebSocket
// Direct WebSocket connection
const ws = new WebSocket('wss://your-speak-server.com:8083/v1/live-bot?speak-api-key=your-api-key-here');
ws.onopen = () => {
  console.log('WebSocket connected');
  // Send start-live-stream message
  ws.send(JSON.stringify({
    event: 'start-live-stream'
  }));
};
Step 2: Send Audio Data
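// Send audio data as binary after the setup response arrives.
// (Reassigning ws.onopen here would replace the Step 1 handler.)
ws.onmessage = (event) => {
  // Assumption: setup is confirmed by a JSON text message with "type": "metadata"
  const message = typeof event.data === 'string' ? JSON.parse(event.data) : null;
  if (message && message.type === 'metadata') {
    ws.send(audioBuffer); // Send raw audio buffer
  }
};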
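With a plain WebSocket there are no named events, so incoming messages have to be told apart by hand. The following sketch assumes the server sends JSON text frames carrying the `type` field shown in the Response Format section. That is an assumption about the WebSocket endpoint, and `parseServerMessage` is a hypothetical helper.

```javascript
// Sketch: classifies a raw WebSocket message without throwing on bad input.
// Assumes JSON text frames with a "type" of "transcript" or "metadata".
function parseServerMessage(data) {
  if (typeof data !== 'string') {
    return { kind: 'binary', payload: data };
  }
  let payload;
  try {
    payload = JSON.parse(data);
  } catch (err) {
    return { kind: 'invalid', payload: data };
  }
  const type = payload && typeof payload === 'object' ? payload.type : undefined;
  if (type === 'transcript' || type === 'metadata') {
    return { kind: type, payload };
  }
  return { kind: 'unknown', payload };
}

// Usage:
// ws.addEventListener('message', (event) => {
//   const { kind, payload } = parseServerMessage(event.data);
//   if (kind === 'transcript') console.log(payload.word.text);
// });
```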
Subscribing to Existing Media Sessions
You can subscribe to receive transcription results from existing sessions without sending audio data.
Step 1: Connect for Subscription
const socket = io('wss://your-speak-server.com/v1/live', {
  query: {
    'speak-api-key': 'your-api-key-here'
  }
});
Step 2: Subscribe to Media
// Subscribe to a specific media session
socket.emit('subscribe-to-media', 'media-id-here');
// Listen for subscription confirmation
socket.on('metadata', (response) => {
  if (response.status === 'subscribed') {
    console.log('Successfully subscribed to media session');
  }
});
Step 3: Receive Results
// Receive transcription results from the subscribed session
socket.on('transcript', (response) => {
  console.log('Received word:', response.word.text);
});
// Unsubscribe when done
socket.emit('unsubscribe-from-media', 'media-id-here');
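The subscribe and unsubscribe steps can be paired in one helper so that cleanup is hard to forget. This is a sketch only: `subscribeToMedia` is a hypothetical wrapper around the events shown above, and filtering on `response.mediaId` assumes transcript events carry the `mediaId` field shown in the Response Format section.

```javascript
// Hypothetical helper: subscribes to a media session and returns a function
// that undoes the subscription. `socket` is any object with Socket.IO-style
// on/off/emit methods.
function subscribeToMedia(socket, mediaId, onWord) {
  const handler = (response) => {
    // Skip words that are explicitly tagged with a different session.
    if (response && response.mediaId && response.mediaId !== mediaId) return;
    if (response && response.word) onWord(response.word);
  };
  socket.on('transcript', handler);
  socket.emit('subscribe-to-media', mediaId);

  let active = true;
  return function unsubscribe() {
    if (!active) return; // safe to call more than once
    active = false;
    socket.off('transcript', handler);
    socket.emit('unsubscribe-from-media', mediaId);
  };
}

// Usage:
// const unsubscribe = subscribeToMedia(socket, 'media-id-here', (w) => console.log(w.text));
// ...later: unsubscribe();
```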
Audio Format Requirements
Supported Formats
WebM (recommended)
MP4
WAV
PCM
Recommended Settings
Sample Rate: 16 kHz or 48 kHz
Channels: Mono (1 channel)
Bitrate: 128 kbps or higher
Chunk Size: 1-2 seconds
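To get a feel for what these settings mean per chunk, the sizes can be estimated with simple arithmetic. The sketch below is illustrative: both function names are hypothetical, the default of 2 bytes per sample assumes 16-bit PCM (which is not specified above), and encoded sizes are approximate because real encoders vary their output.

```javascript
// Bytes in one chunk of uncompressed PCM audio.
// bytesPerSample = 2 assumes 16-bit samples (an assumption, not a documented requirement).
function pcmChunkBytes(sampleRateHz, channels, chunkSeconds, bytesPerSample = 2) {
  return Math.round(sampleRateHz * channels * bytesPerSample * chunkSeconds);
}

// Approximate bytes in one chunk of encoded audio at a constant bitrate.
// 1 kbps = 1000 bits per second; 8 bits per byte.
function encodedChunkBytes(bitrateKbps, chunkSeconds) {
  return Math.round((bitrateKbps * 1000 / 8) * chunkSeconds);
}

console.log(pcmChunkBytes(16000, 1, 1));  // 32000 bytes: 16 kHz mono, 1-second chunk
console.log(pcmChunkBytes(48000, 1, 2));  // 192000 bytes: 48 kHz mono, 2-second chunk
console.log(encodedChunkBytes(128, 1));   // 16000 bytes: 128 kbps, 1-second chunk
```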
Response Format
{
  "type": "transcript",
  "mediaId": "media-123",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "word": {
    "id": 1,
    "text": "hello",
    "confidence": 0.95,
    "language": "en",
    "speakerId": "speaker-1",
    "instances": {
      "startInSec": 1.2,
      "endInSec": 1.5
    }
  },
  "message": "New word received"
}
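As an illustration of how these fields fit together, the sketch below turns one transcript message into a display line. `formatWord` is a hypothetical helper, and treating `confidence` as a fraction between 0 and 1 is an assumption based on the sample value above.

```javascript
// Sketch: formats a transcript message as "[start-end] speaker: text (confidence%)".
// Assumes `confidence` is a fraction in the range 0..1.
function formatWord(message) {
  const { text, confidence, speakerId, instances } = message.word;
  const start = instances.startInSec.toFixed(2);
  const end = instances.endInSec.toFixed(2);
  const percent = Math.round(confidence * 100);
  return `[${start}s-${end}s] ${speakerId}: ${text} (${percent}%)`;
}

const sample = {
  type: 'transcript',
  mediaId: 'media-123',
  word: {
    id: 1,
    text: 'hello',
    confidence: 0.95,
    language: 'en',
    speakerId: 'speaker-1',
    instances: { startInSec: 1.2, endInSec: 1.5 },
  },
};
console.log(formatWord(sample)); // [1.20s-1.50s] speaker-1: hello (95%)
```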
Metadata Response
{
  "type": "metadata",
  "mediaId": "media-123",
  "folderId": "folder-456",
  "userId": "user-789",
  "name": "Live Transcription Session",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "message": "Media session initialized"
}
Best Practices
1. Connection Management
Always handle connection errors and reconnection
Implement exponential backoff for reconnection attempts
Monitor connection health
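Exponential backoff can be sketched as a delay that doubles with each failed attempt, up to a cap, with random jitter so that many clients do not reconnect at the same instant. The function name and the default values below are illustrative assumptions, not Speak recommendations. The Socket.IO client also has built-in reconnection settings (`reconnectionDelay`, `reconnectionDelayMax`, `randomizationFactor`), so a helper like this is mainly useful for the plain WebSocket connection.

```javascript
// Sketch: delay before reconnection attempt number `attempt` (0-based).
// The ceiling doubles each attempt and is capped at maxMs; the returned
// delay is a random value between 0 and that ceiling ("full jitter").
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000, random = Math.random) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Usage with a hypothetical connect() function that opens the WebSocket:
// let attempt = 0;
// ws.onclose = () => setTimeout(connect, backoffDelayMs(attempt++));
// ws.onopen = () => { attempt = 0; };
```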
2. Audio Quality
Use consistent audio settings throughout the session
Avoid changing audio format mid-session
Ensure a stable internet connection
3. Error Handling
Implement proper error handling for all events
Log errors for debugging
Provide user feedback for connection issues
Support
If you encounter any issues or need help with integration:
Check our API documentation
Review the error messages in your browser console
Contact our support team with your API key and error details