First month for free!

Speech-to-Text API

Powered by Whisper v3 - Convert audio to text quickly and reliably.

Speaker diarization - Automatically detect who is speaking.

Just $0.50 per 3 hours of speech - Lowest price on the market.

Use our audio-to-text API to build AI-powered features such as automatically generated subtitles, podcast summaries, or audio chats. Our API uses the latest Whisper large-v3 model to deliver accurate transcriptions with minimal latency at a competitive price. Transcribe 30 minutes of audio in under one minute. More than 100 languages are supported.
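To get a feel for what the quoted figures mean for a given recording, here is a rough estimator. It is a sketch only: it assumes that billing and processing time scale linearly with audio length, which this page does not state, and the function names are made up for illustration.

```javascript
// Figures quoted on this page.
const PRICE_USD = 0.5;                    // price per block of speech
const PRICE_BLOCK_SECONDS = 3 * 60 * 60;  // one block = 3 hours of speech
const SPEEDUP = 30;                       // 30 minutes of audio in under one minute

// Assumption: cost is pro-rated linearly by audio duration.
function estimateCostUsd(audioSeconds) {
  return (audioSeconds / PRICE_BLOCK_SECONDS) * PRICE_USD;
}

// Assumption: the quoted speed ratio holds for any audio length.
function estimateProcessingSeconds(audioSeconds) {
  return audioSeconds / SPEEDUP;
}

const podcastSeconds = 45 * 60; // a 45-minute episode
console.log(estimateCostUsd(podcastSeconds).toFixed(3));   // "0.125"
console.log(estimateProcessingSeconds(podcastSeconds));    // 90
```

Under these assumptions a 45-minute episode costs about 12.5 cents and takes roughly a minute and a half to process.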

API Usage

Our OpenAI-compatible API makes it easy to switch. If you haven't already, create an API key to authenticate your requests.

const body = new FormData();
body.append('file', '');
// instead of providing a URL you can also upload a file object:
// body.append('file', new Blob([await fs.readFile('/path/to/audio.mp3')]));
body.append('language', 'english');
body.append('response_format', 'json');

fetch('', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: body
})
  .then(response => response.json())
  .then(data => {
    // with response_format 'json', the transcript is in data.text
    console.log(data);
  })
  .catch(error => {
    console.error('Error:', error);
  });
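The request above can also be wrapped in a small reusable helper. The version below is a sketch under assumptions: `buildTranscriptionForm` and `transcribe` are hypothetical names, `endpointUrl` and `apiKey` must be supplied by the caller, and it assumes that non-JSON response formats come back as plain text. It relies on the global `fetch`, `FormData`, and `Blob` available in browsers and in Node.js 18 or later.

```javascript
// Hypothetical helper: collects the form fields used in the request above.
// `file` may be a URL string or a Blob holding the audio bytes.
function buildTranscriptionForm({ file, language = 'english', responseFormat = 'json' }) {
  if (typeof file !== 'string' && !(file instanceof Blob)) {
    throw new TypeError('file must be a URL string or a Blob');
  }
  const form = new FormData();
  form.append('file', file);
  form.append('language', language);
  form.append('response_format', responseFormat);
  return form;
}

// Hypothetical wrapper: the endpoint URL and API key are not defined here
// and must be passed in by the caller.
async function transcribe(endpointUrl, apiKey, options) {
  const response = await fetch(endpointUrl, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(options),
  });
  if (!response.ok) {
    // Surface HTTP errors instead of trying to parse an error page.
    throw new Error(`Transcription request failed with status ${response.status}`);
  }
  // Assumption: only the 'json' format returns a JSON body.
  const wantsJson = !options.responseFormat || options.responseFormat === 'json';
  return wantsJson ? response.json() : response.text();
}
```

Checking `response.ok` before parsing keeps authentication or quota errors from surfacing as confusing JSON parse failures.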

API Response

Choose between different response formats to get the transcript in the format that best suits your needs. VTT and SRT are file formats that include timestamps and can be used to display subtitles in video players.
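To illustrate how the two subtitle formats encode time, the sketch below formats an offset in seconds as an SRT or VTT timestamp and assembles a single SRT cue. The helper names and the `{ start, end, text }` segment shape are assumptions made for this example, not part of the API response.

```javascript
// Formats a time offset in seconds as HH:MM:SS plus milliseconds.
// SRT puts a comma before the milliseconds, WebVTT a period.
function formatTimestamp(seconds, format = 'srt') {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  const separator = format === 'vtt' ? '.' : ',';
  return `${pad(h)}:${pad(m)}:${pad(s)}${separator}${pad(ms, 3)}`;
}

// Builds one SRT cue from a hypothetical segment { start, end, text }.
function toSrtCue(index, segment) {
  const range = `${formatTimestamp(segment.start)} --> ${formatTimestamp(segment.end)}`;
  return `${index}\n${range}\n${segment.text}\n`;
}

console.log(formatTimestamp(3.5));            // "00:00:03,500"
console.log(formatTimestamp(3661.25, 'vtt')); // "01:01:01.250"
console.log(toSrtCue(1, { start: 0, end: 3.5, text: 'Hello world' }));
```

When you request SRT or VTT directly, the API returns the finished file, so a helper like this is only needed if you build subtitles from your own timing data.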

{"text": "Artificial intelligence is the intelligence of machines or software, as opposed to the intelligence of humans or animals. It is also the field of study in computer science that develops and studies intelligent machines."}
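A JSON response like the one above can be reduced to the transcript string with a small validating helper. This is a sketch; `extractTranscript` is a hypothetical name, and it assumes the body has the single `text` field shown in the sample.

```javascript
// Parses a JSON response body and returns the transcript string.
// Throws if the body is not an object with a string `text` field.
function extractTranscript(rawBody) {
  const data = JSON.parse(rawBody);
  if (!data || typeof data.text !== 'string') {
    throw new Error('Response does not contain a "text" field');
  }
  return data.text.trim();
}

// Shortened version of the sample response shown above.
const sample = '{"text": "Artificial intelligence is the intelligence of machines or software."}';
const transcript = extractTranscript(sample);
console.log(transcript);
console.log(`${transcript.split(/\s+/).length} words`);
```

Validating the field up front gives a clear error if a different response format was requested by mistake.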

API Parameters

The POST request accepts the following parameters:
