Upload & Transcribe

Upload audio and video files to Speak AI via MCP tools or CLI. Supports 70+ languages, automatic speaker identification, and real-time status tracking.

Written by Speak AI

Speak AI's MCP server includes tools for uploading files, tracking processing status, and retrieving transcripts. Use them from Claude, ChatGPT, Cursor, the CLI, or any MCP-compatible client.

Upload tools

  • upload_and_analyze — upload a file by URL and run AI analysis in one step (recommended for most workflows)

  • upload_media — upload a file by URL, transcription only

  • upload_local_file — upload from a local file path (CLI and local MCP server only)

  • get_signed_upload_url — get a pre-signed URL for direct upload from your own application
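For readers wiring these tools into their own MCP client, here is a minimal sketch of the request a client sends to invoke a tool. The Model Context Protocol uses a JSON-RPC 2.0 message with the method `tools/call`. The argument name `url` and the example address are assumptions for illustration only; the tool's input schema, as reported by the server, defines the real parameter names.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument name `url`: check the tool's input schema for the real one.
request = build_tool_call(
    "upload_and_analyze",
    {"url": "https://example.com/recording.mp3"},
)
print(json.dumps(request, indent=2))
```

Most MCP clients (Claude, Cursor, and so on) build this message for you, so this is mainly useful when debugging or writing a custom client.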

Status and retrieval tools

  • get_media_status — check whether a file has finished processing

  • get_transcript — retrieve the full transcript with speaker labels and timestamps

  • get_captions — download the captions or subtitle file

  • list_supported_languages — see all 70+ supported transcription languages
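If you call these tools from your own code rather than through a chat assistant, a typical pattern is to poll get_media_status until processing finishes and only then request the transcript. The sketch below shows that loop under stated assumptions: `fetch_status` is a hypothetical callable standing in for whatever your client uses to call get_media_status, and the status strings ("pending", "processing", "processed", "failed") are illustrative, not confirmed values.

```python
import time
from typing import Callable


def wait_until_processed(
    fetch_status: Callable[[str], str],
    media_id: str,
    done_states: frozenset = frozenset({"processed"}),  # assumed status value
    failed_states: frozenset = frozenset({"failed"}),   # assumed status value
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status(media_id) until a done state, a failure, or a timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(media_id)
        if status in done_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"media {media_id} failed with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"media {media_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Stand-in for a real get_media_status call, for demonstration only.
_fake_statuses = iter(["pending", "processing", "processed"])
final = wait_until_processed(
    lambda _media_id: next(_fake_statuses),
    "12345",
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
print(final)  # processed
```

Passing `sleep` as a parameter keeps the loop testable without real delays. In production, leave the default so the loop waits between checks.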

Example prompts (via Claude or ChatGPT)

  • "Upload and transcribe this recording: [URL]"

  • "Check the status of media ID 12345"

  • "Get the transcript for my most recent upload"

  • "Transcribe this recording in Spanish: [URL]"

Tips

  • Use upload_and_analyze when you want AI insights right away: it combines upload, transcription, and AI Chat in one call.

  • Supported formats: MP3, MP4, M4A, WAV, WEBM, MOV, OGG, and more. Max file size: 2 GB (Individual), 5 GB (Team).

  • Speaker identification: Recordings from virtual meetings (Zoom, Google Meet, Teams) are labeled with participant names automatically. Uploaded recordings get generic labels (Speaker 0, Speaker 1, Speaker 2); rename them in the app, or ask your AI assistant to rename them after transcription.
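Before uploading a local file, it can save a failed attempt to check its size against the limits listed above. This is a small sketch, not part of Speak AI's tooling: the plan keys `"individual"` and `"team"` are hypothetical names, and it assumes decimal gigabytes (1 GB = 1,000,000,000 bytes), which may differ from how the service measures size.

```python
import os

GB = 1000 ** 3  # assumption: decimal gigabytes; the service may measure differently

# Limits taken from the tip above; the dictionary keys are hypothetical plan names.
MAX_UPLOAD_BYTES = {"individual": 2 * GB, "team": 5 * GB}


def fits_plan_limit(size_bytes: int, plan: str) -> bool:
    """Return True if a file of size_bytes is within the plan's upload limit."""
    return size_bytes <= MAX_UPLOAD_BYTES[plan.lower()]


def check_local_file(path: str, plan: str) -> bool:
    """Check a file on disk against the plan's upload limit."""
    return fits_plan_limit(os.path.getsize(path), plan)


print(fits_plan_limit(3 * GB, "individual"))  # False
print(fits_plan_limit(3 * GB, "team"))        # True
```

The check is client-side only; the server remains the final authority on what it accepts.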


Still stuck? Chat with us below or email [email protected].
