Pricing

Start for free, scale seamlessly, and pay only for what you use. DolphinVoice charges no fixed monthly fee and bills usage at tiered rates.

Speech Recognition

Speech-to-Text

Streaming: 72 JPY/hr

Streaming recognition with real-time transcript
Up to 37 hours per connection

Pre-recorded (Standard): 36 JPY/hr

Transcribe audio & video files
1-hour audio in as fast as 10 minutes

Pre-recorded (VIP): 72 JPY/hr

Transcribe files at amazing speed
1-hour audio in as fast as 2 minutes

※ All prices are in Japanese Yen (tax inclusive).

※ Enterprise customers can take advantage of volume discounts (for over 5,000 hours per month).
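
As a rough illustration of the pay-as-you-go arithmetic, the sketch below multiplies audio duration by the listed hourly rate. It assumes charges scale proportionally with duration; this page does not state the billing increment or rounding rule, and the free allowance, tiered rates, and volume discounts are not modeled because their values are not given here. The dictionary keys and the function name `estimate_cost_jpy` are illustrative and not part of any DolphinVoice API.

```python
from decimal import Decimal

# Hourly rates in JPY (tax inclusive), copied from the price list above.
# The rate-to-service mapping is inferred from each card's description and
# the processing speeds in the feature table; the keys are made-up labels.
RATES_JPY_PER_HOUR = {
    "streaming": Decimal("72"),
    "prerecorded_standard": Decimal("36"),
    "prerecorded_vip": Decimal("72"),
}


def estimate_cost_jpy(service: str, audio_seconds: float) -> Decimal:
    """Estimate the charge in JPY, assuming strictly proportional billing."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = Decimal(str(audio_seconds)) / Decimal(3600)
    return RATES_JPY_PER_HOUR[service] * hours
```

Under these assumptions, 90 minutes of streaming audio (`estimate_cost_jpy("streaming", 5400)`) comes to 108 JPY, and the same audio at the standard pre-recorded rate comes to 54 JPY.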

Features	Streaming	Pre-recorded (Standard)	Pre-recorded (VIP)
Basic Specs
Audio Limitation	Up to 37 hours per connection	Audio: 1GB, Video: 2GB, Duration: 5 hours	Audio: 1GB, Duration: 5 hours
Supported Formats	WAV/PCM/MP3	WAV/PCM/OPUS/MP3/MP4/M4A/AMR/3GP/AAC	WAV/PCM/OPUS/MP3/AMR/3GP/AAC
Sampling Rate	16kHz, 8kHz	16kHz, 8kHz	16kHz, 8kHz
Language Support	Japanese, English, Chinese (supports mixed Chinese-English, mixed Japanese-English)	Japanese, English, Chinese (supports mixed Chinese-English, mixed Japanese-English)	Japanese, English, Chinese (supports mixed Chinese-English, mixed Japanese-English)
Pre-recorded Speech-to-Text
Processing Speed	-	1-hour audio in as fast as 10 minutes	1-hour audio in as fast as 2 minutes
Output Formats	-	Script / Subtitle	Script / Subtitle
Speaker Diarization	-
Speech Rate Calculation	-
Advanced Features
Inverse Text Normalization (ITN)
Hotwords
Forced Correction
Forbidden Words
Disfluency Detection
Word-level Information
Intermediate Results
Amplitude Gain
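
The pre-recorded limits in the Basic Specs rows can be checked before uploading a file. The following is a minimal sketch, assuming "GB" means 10^9 bytes (the table does not say) and that the caller already knows whether the file is video. The names `Tier`, `TIERS`, and `check_file` are hypothetical helpers for illustration, not DolphinVoice identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional

GB = 10 ** 9  # assumption: decimal gigabytes; the table does not specify
MAX_DURATION_S = 5 * 3600  # 5-hour duration limit for pre-recorded files


@dataclass(frozen=True)
class Tier:
    formats: frozenset
    max_audio_bytes: int
    max_video_bytes: Optional[int]  # None: the table lists no video limit


# Values transcribed from the "Audio Limitation" and "Supported Formats" rows.
TIERS = {
    "standard": Tier(
        frozenset("wav pcm opus mp3 mp4 m4a amr 3gp aac".split()), 1 * GB, 2 * GB
    ),
    "vip": Tier(frozenset("wav pcm opus mp3 amr 3gp aac".split()), 1 * GB, None),
}


def check_file(
    tier_name: str,
    extension: str,
    size_bytes: int,
    duration_s: float,
    is_video: bool = False,
) -> List[str]:
    """Return a list of problems; an empty list means the file fits the table."""
    tier = TIERS[tier_name]
    problems = []
    ext = extension.lower().lstrip(".")
    if ext not in tier.formats:
        problems.append(f"format '{ext}' is not listed for {tier_name}")
    if duration_s > MAX_DURATION_S:
        problems.append("duration exceeds 5 hours")
    if is_video:
        if tier.max_video_bytes is None:
            problems.append(f"no video limit is listed for {tier_name}")
        elif size_bytes > tier.max_video_bytes:
            problems.append("video file exceeds 2 GB")
    elif size_bytes > tier.max_audio_bytes:
        problems.append("audio file exceeds 1 GB")
    return problems
```

For example, `check_file("vip", "mp4", 100, 60)` reports that MP4 is not listed for the VIP tier, while a one-hour, 500 MB MP3 on the standard tier returns no problems. The sampling-rate requirement (16kHz or 8kHz) is not checked here.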

Get started with DolphinVoice's Speech AI services today and experience industry-leading speech recognition and synthesis technology.