Streaming
Streaming recognition with real-time transcripts
Up to 37 hours
Start for free, scale seamlessly, and pay only for what you use. DolphinVoice's billing model has no fixed monthly fees and uses tiered rates.
Speech-to-Text
※ All prices are in Japanese yen (excluding tax). Volume discounts are available for enterprises.
※ The prices above apply to connections with "Audio Storage". Stored audio is used only for our internal R&D purposes.
※ For connections with "No Audio Storage", the server stores no audio data or transcripts. The unit price for this option is 50% higher than the base prices shown above (1.5 times the base rate).
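As a worked example of the "No Audio Storage" surcharge, the minimal sketch below applies the 1.5 multiplier to a base unit price. The function name `unit_price` and the base rate of 20 JPY are hypothetical, chosen only for illustration; actual base rates, billing units, and tier boundaries come from the price table, and no rounding rule is assumed.

```python
from decimal import Decimal

# Per the pricing notes: "No Audio Storage" connections cost 50% more
# than the base ("Audio Storage") unit price, i.e. 1.5 times the base.
NO_STORAGE_MULTIPLIER = Decimal("1.5")


def unit_price(base_price_yen: Decimal, audio_storage: bool = True) -> Decimal:
    """Return the unit price in JPY, excluding tax.

    base_price_yen is a hypothetical input; real base rates and tiers
    come from the price table. No rounding rule is applied here.
    """
    if audio_storage:
        return base_price_yen
    return base_price_yen * NO_STORAGE_MULTIPLIER


# Hypothetical base rate of 20 JPY per billing unit, for illustration only.
print(unit_price(Decimal("20")))                       # 20
print(unit_price(Decimal("20"), audio_storage=False))  # 30.0
```

`Decimal` is used instead of `float` so that yen amounts are computed exactly.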
| Features | Streaming | Pre-recorded (Standard) | Pre-recorded (VIP) |
|---|---|---|---|
| Basic Specs | | | |
| Input Limits | Duration: up to 37 hours | Audio: 1 GB; video: 2 GB; duration: 5 hours | Audio: 1 GB; duration: 5 hours |
| Supported Formats | WAV/PCM/MP3 | WAV/PCM/OPUS/MP3/MP4/M4A/AMR/3GP/AAC | WAV/PCM/OPUS/MP3/AMR/3GP/AAC |
| Sampling Rate | 16 kHz, 8 kHz | 16 kHz, 8 kHz | 16 kHz, 8 kHz |
| Language Support | Japanese, English, Chinese (including mixed Chinese-English and mixed Japanese-English) | Japanese, English, Chinese (including mixed Chinese-English and mixed Japanese-English) | Japanese, English, Chinese (including mixed Chinese-English and mixed Japanese-English) |
| Pre-recorded Speech-to-Text | | | |
| Processing Speed | - | 1 hour of audio in as little as 10 minutes | 1 hour of audio in as little as 2 minutes |
| Output Formats | - | Script / Subtitle | Script / Subtitle |
| Speaker Diarization | - | | |
| Speech Rate Calculation | - | | |
| Advanced Features | | | |
| Inverse Text Normalization (ITN) | | | |
| Hotwords | | | |
| Forced Correction | | | |
| Forbidden Words | | | |
| Disfluency Detection | | | |
| Word-level Information | | | |
| Intermediate Results | | | |
| Amplitude Gain | | | |
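The limits in the table can be checked on the client before a job is submitted. The following is a minimal sketch, not part of any DolphinVoice SDK or API: the function `check_input`, the mode names, and the reading of "1 GB" as 2^30 bytes are assumptions. Because the table does not say which formats count as video, the caller passes `is_video`; since the VIP column lists no video limit, the sketch flags video input for VIP (also an assumption).

```python
from dataclasses import dataclass
from typing import Optional

GB = 1024 ** 3  # assumption: "1 GB" read as 2**30 bytes


@dataclass(frozen=True)
class Limits:
    formats: frozenset[str]
    max_audio_bytes: Optional[int]  # None: no size limit listed
    max_video_bytes: Optional[int]  # None: no video limit listed
    max_seconds: int


# Values transcribed from the feature table; mode names are hypothetical.
LIMITS = {
    "streaming": Limits(frozenset({"wav", "pcm", "mp3"}), None, None, 37 * 3600),
    "standard": Limits(
        frozenset({"wav", "pcm", "opus", "mp3", "mp4", "m4a", "amr", "3gp", "aac"}),
        1 * GB, 2 * GB, 5 * 3600,
    ),
    "vip": Limits(
        frozenset({"wav", "pcm", "opus", "mp3", "amr", "3gp", "aac"}),
        1 * GB, None, 5 * 3600,
    ),
}
SAMPLE_RATES_HZ = {8000, 16000}


def check_input(mode: str, fmt: str, sample_rate_hz: int, duration_s: float,
                size_bytes: int = 0, is_video: bool = False) -> list[str]:
    """Return a list of problems; an empty list means no listed limit is exceeded."""
    lim = LIMITS[mode]
    problems = []
    if fmt.lower() not in lim.formats:
        problems.append(f"format {fmt} not listed for {mode}")
    if sample_rate_hz not in SAMPLE_RATES_HZ:
        problems.append(f"sample rate {sample_rate_hz} Hz not listed")
    if duration_s > lim.max_seconds:
        problems.append(f"duration {duration_s:.0f} s exceeds {lim.max_seconds} s")
    # Pick the size cap that applies; streaming has no size cap in the table.
    cap = lim.max_video_bytes if is_video else lim.max_audio_bytes
    if is_video and cap is None:
        problems.append(f"no video size limit listed for {mode}")
    elif cap is not None and size_bytes > cap:
        problems.append(f"size {size_bytes} B exceeds {cap} B")
    return problems


print(check_input("standard", "MP3", 16000, 2 * 3600, size_bytes=200 * 1024 ** 2))  # []
print(check_input("vip", "mp4", 44100, 6 * 3600))  # three problems: format, rate, duration
```

Returning a list instead of raising on the first failure lets a client report every problem with a file at once.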