Streaming Speech-to-Text API
WebSocket-based streaming speech recognition for instant transcription. Ideal for live broadcasting, voice assistants, and real-time captioning, with support for multiple languages.
Advanced Features
Speaker Diarization: Differentiate speakers in a single audio channel using voiceprint information.
Smart Formatting: Improve readability by applying additional formatting. When enabled, dates, times, and numbers are displayed in conventional formats.
Filler Word Removal: Filter out filler words to improve the readability of transcribed speech.
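To make the idea of filler word removal concrete, here is a minimal Python sketch of what such a post-processing step might look like. It is not the service's implementation: the `FILLERS` list and the `remove_fillers` helper are hypothetical names chosen for illustration, and a production system would likely use the recognizer's own word-level output rather than a regular expression.

```python
import re

# Hypothetical, illustrative filler list; the real service's list is not documented here.
FILLERS = ("um", "uh", "er", "hmm")

# Match a whole filler word, an optional trailing comma or period, and trailing spaces.
_FILLER_PATTERN = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*",
    re.IGNORECASE,
)


def remove_fillers(text: str) -> str:
    """Return text with standalone filler words stripped out."""
    cleaned = _FILLER_PATTERN.sub("", text)
    # Collapse any doubled spaces left behind by the removal.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Because the pattern uses word boundaries, words that merely contain a filler (such as "umbrella" or "her") are left untouched.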
Live Transcript
Lower Latency
Streams interim results in real time and provides the final transcript upon completion, with endpointing latency as low as 500 ms.
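A client typically has to treat interim and final results differently: interim text is a provisional hypothesis that later messages replace, while final text is committed. The Python sketch below shows one way to do that. The message shape (a JSON object with `text` and `is_final` fields) and the `TranscriptAssembler` class are assumptions made for illustration, not the documented format of this API.

```python
import json


class TranscriptAssembler:
    """Accumulates final segments and tracks the latest interim hypothesis."""

    def __init__(self):
        self.final_segments = []  # committed text, never revised
        self.interim = ""         # provisional text, replaced on every update

    def feed(self, raw_message: str) -> str:
        """Consume one JSON message and return the text to display."""
        # Assumed (hypothetical) message shape: {"text": "...", "is_final": bool}
        msg = json.loads(raw_message)
        text = msg.get("text", "")
        if msg.get("is_final", False):
            # A final result commits the segment and clears the interim text.
            self.final_segments.append(text)
            self.interim = ""
        else:
            # An interim result overwrites the previous hypothesis.
            self.interim = text
        return self.display()

    def display(self) -> str:
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

In a live client, each message received from the WebSocket connection would be passed to `feed`, and the returned string would be rendered as the current caption.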
Higher Accuracy
High recognition accuracy, with code-switching support for Chinese-English and Japanese-English speech.

Features
Multi-Domain Support
Offers models optimized for call centers, with enhanced accuracy.
Smart Punctuation & Formatting
Automatic punctuation prediction and text format optimization to generate natural, readable transcripts.
Custom Vocabulary
Boost accuracy for proper nouns like names, places, and organizations with custom hot words.
Speaker Diarization
Distinguish between speakers through voiceprint information.
Filler Word Removal
Filter out filler words to improve the readability of transcribed speech.
Use Cases
Provide real-time subtitles for live events like seminars to enhance the viewer experience.
Enable voice input for various scenarios such as in-car navigation and chat applications, maximizing hands-free operation.
Transcribe customer service calls in real time, making it easier to record and analyze customer needs and improve service quality.
Transcribe meetings in real time, quickly generating meeting minutes with speaker labels and timestamps.
Improve the efficiency of medical documentation through real-time speech recognition, reducing paperwork for healthcare professionals.
Enhance learning engagement with live captions for training sessions, helping students grasp concepts more effectively.
Enable instant voice control for smart home devices and IoT applications with responsive command recognition.
Perform real-time transcription in the courtroom to ensure the accuracy and integrity of court transcripts.









