Process audio files for transcription or translation with enhanced language support. Supports multiple audio formats and provides detailed word-level timestamps and speaker diarization.
JWT token for authentication
TEXT FIELD: This is a string field (not a file upload). Provide the audio as a base64-encoded string. First convert your audio file (.mp3, .wav, .flac) to base64, then paste the resulting string here.
"base64_encoded_audio_content"
Language code (e.g. 'en' for English)
"en"
Task type - transcribe in source language or translate to English
Allowed values: transcribe, translate
"transcribe"
Optional starting text prompt for context
"Meeting transcript between John and Sarah:"
Number of parallel sequences evaluated
Range: 1 <= x <= 5
Number of best sequences considered
Range: 1 <= x <= 5
Include word-level timestamps
Enable speaker diarization
Enable voice activity detection filter
Exclude timestamps from output
Enable streaming output
Minimum number of speakers to detect (0 for automatic)
Range: x >= 0
Maximum number of speakers to detect (0 for automatic)
Range: x >= 0
Number of audio samples processed in one batch
Range: 0 <= x <= 24
Penalty for longer sequences (1.0 means no penalty)
Range: x >= 0
Beam search patience factor
Range: 0 <= x <= 1
Minimum duration of silence for a break (seconds)
Range: x >= 0
Minimum duration for speech detection (seconds)
Range: x >= 0
Voice activity detection onset threshold
Range: 0 <= x <= 1
Voice activity detection offset threshold
Range: 0 <= x <= 1
Additional padding at segment end (seconds)
Range: x >= 0
Additional padding at segment start (seconds)
Range: x >= 0
Maximum duration to process (seconds)
Range: x >= 0
Successful transcription
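
The audio field expects a base64 string rather than a file upload, and several numeric options carry range limits, so a small client-side helper may be useful. The following is a minimal sketch, not the service's own client. This reference does not show the actual parameter names, so every key used here ("audio", "language", "task", "beam_size", "best_of", "min_speakers", "max_speakers", "batch_size", "length_penalty", "patience", "vad_onset", "vad_offset") is a hypothetical placeholder to be replaced with the real field names. Only the ranges are taken from the descriptions above.

```python
import base64
from pathlib import Path

# Hypothetical field names: the reference lists descriptions and ranges but
# not the real parameter names. A high bound of None means "no upper limit".
RANGES = {
    "beam_size": (1, 5),        # number of parallel sequences evaluated
    "best_of": (1, 5),          # number of best sequences considered
    "min_speakers": (0, None),  # 0 means automatic
    "max_speakers": (0, None),  # 0 means automatic
    "batch_size": (0, 24),      # audio samples processed in one batch
    "length_penalty": (0, None),
    "patience": (0, 1),         # beam search patience factor
    "vad_onset": (0, 1),        # voice activity detection onset threshold
    "vad_offset": (0, 1),       # voice activity detection offset threshold
}


def encode_audio(path):
    """Read an audio file and return its contents as a base64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def out_of_range(options):
    """Return the names of options whose values fall outside RANGES."""
    bad = []
    for name, value in options.items():
        if name not in RANGES:
            continue  # booleans and strings are not range-checked here
        low, high = RANGES[name]
        if value < low or (high is not None and value > high):
            bad.append(name)
    return bad


def build_body(audio_path, **options):
    """Assemble a request body, rejecting out-of-range numeric options."""
    bad = out_of_range(options)
    if bad:
        raise ValueError(f"out-of-range options: {', '.join(bad)}")
    # "audio", "language", and "task" are placeholder keys as well.
    return {
        "audio": encode_audio(audio_path),
        "language": "en",
        "task": "transcribe",
        **options,
    }
```

Calling build_body("meeting.wav", beam_size=5, patience=0.5) would return a dictionary ready to serialize as JSON. How the JWT token is attached to the request (header name and scheme) is not stated in this reference, so it is left out of the sketch.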