Pricing

Start for free, scale seamlessly, and pay only for what you use. DolphinVoice's billing model has no monthly fixed fees and offers tiered rates.

Free

  • Access to industry-leading speech recognition and synthesis models
  • Speech Recognition (Speech-to-Text) Free Trial: 600 minutes/month
  • Speech Synthesis (Text-to-Speech) Free Trial: 10,000 characters/month
  • Concurrency Limits: 10 for streaming, 2 for pre-recorded
  • Developer docs, email-based support, and resources to help you build

Pay as you go

  • Experience faster file transcription services
  • Customized SLAs for your actual workload
  • Dedicated technical support with rapid response
  • Enjoy tiered rates for monthly usage over 30,000 hours

Speech Recognition

Speech-to-Text

Streaming

90 JPY/hr

Streaming recognition with real-time transcripts
Audio limit: up to 37 hours

Pre-recorded (Standard)

54 JPY/hr

Transcribe audio & video files
1-hour audio in as fast as 10 minutes

Pre-recorded (VIP)

72 JPY/hr

Transcribe files at amazing speed
1-hour audio in as fast as 2 minutes

※ All prices are in Japanese Yen (excluding tax). Volume discounts are available for enterprises.

※ The prices above are for connections with "Audio Storage". The saved audio will be used for our company's internal R&D purposes only.

※ For connection requests with "No Audio Storage", the server will not store any audio data or transcripts. The unit price for this option is 50% higher than the base prices shown above.
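
The arithmetic behind the notes above can be sketched in a few lines of Python. This is only an illustrative estimate built from the listed base rates and the 50% "No Audio Storage" surcharge; the function name `estimate_cost_jpy` and the mode keys are hypothetical, not part of any DolphinVoice API. It does not model tax, the free trial allowance, volume discounts, tiered rates, or billing granularity, since the page does not specify how those are applied.

```python
from decimal import Decimal

# Base rates in JPY per hour (pre-tax, with "Audio Storage"), taken from the price list.
# The dictionary keys are illustrative labels, not official identifiers.
BASE_RATE_JPY_PER_HOUR = {
    "streaming": Decimal("90"),
    "prerecorded_standard": Decimal("54"),
    "prerecorded_vip": Decimal("72"),
}

# "No Audio Storage" is priced 50% above the base rate.
NO_STORAGE_MULTIPLIER = Decimal("1.5")


def estimate_cost_jpy(mode: str, hours: float, audio_storage: bool = True) -> Decimal:
    """Estimate the pre-tax cost in JPY for `hours` of audio in the given mode."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    rate = BASE_RATE_JPY_PER_HOUR[mode]
    if not audio_storage:
        rate = rate * NO_STORAGE_MULTIPLIER
    # Convert via str() so that values like 0.5 do not carry float noise into Decimal.
    return rate * Decimal(str(hours))


if __name__ == "__main__":
    print(estimate_cost_jpy("streaming", 100))
    print(estimate_cost_jpy("streaming", 100, audio_storage=False))
```

Under these assumptions, 100 hours of streaming comes to 9,000 JPY with audio storage and 13,500 JPY without it, before tax.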

Features

Basic Specs

Audio Limitation
  • Streaming: Up to 37 hours
  • Pre-recorded (Standard): Audio 1GB, Video 2GB, Duration 5 hours
  • Pre-recorded (VIP): Audio 1GB, Duration 5 hours

Supported Formats
  • Streaming: WAV/PCM/MP3
  • Pre-recorded (Standard): WAV/PCM/OPUS/MP3/MP4/M4A/AMR/3GP/AAC
  • Pre-recorded (VIP): WAV/PCM/OPUS/MP3/AMR/3GP/AAC

Sampling Rate
  • Streaming: 16kHz, 8kHz
  • Pre-recorded (Standard): 16kHz, 8kHz
  • Pre-recorded (VIP): 16kHz, 8kHz

Language Support
  • Streaming: Japanese, English, Chinese (supports mixed Chinese-English, mixed Japanese-English)
  • Pre-recorded (Standard): Japanese, English, Chinese (supports mixed Chinese-English, mixed Japanese-English)
  • Pre-recorded (VIP): Japanese, English, Chinese (supports mixed Chinese-English, mixed Japanese-English)

Pre-recorded Speech-to-Text

Processing Speed
  • Streaming: -
  • Pre-recorded (Standard): 1-hour audio in as fast as 10 minutes
  • Pre-recorded (VIP): 1-hour audio in as fast as 2 minutes

Output Formats
  • Streaming: -
  • Pre-recorded (Standard): Script / Subtitle
  • Pre-recorded (VIP): Script / Subtitle

Speaker Diarization
  • Streaming: -

Speech Rate Calculation
  • Streaming: -

Advanced Features

  • Inverse Text Normalization (ITN)
  • Hotwords
  • Forced Correction
  • Forbidden Words
  • Disfluency Detection
  • Word-level Information
  • Intermediate Results
  • Amplitude Gain
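
Before uploading a pre-recorded file, it can help to check it against the format, size, and duration limits listed above. The following is a minimal client-side sketch, not an official SDK: the names `TierLimits`, `LIMITS`, and `check_file` are hypothetical, 1GB is assumed to mean 2^30 bytes (the page does not say), and the caller is assumed to know whether a file is video, since containers such as 3GP can hold either.

```python
from dataclasses import dataclass
from typing import List, Optional

GB = 1024 ** 3  # assumption: the page does not say whether 1GB is 10^9 or 2^30 bytes
HOUR = 3600     # seconds


@dataclass(frozen=True)
class TierLimits:
    formats: frozenset
    max_audio_bytes: int
    max_video_bytes: Optional[int]  # None: no video limit is listed for this tier
    max_duration_s: int


# Values copied from the feature list; the tier keys are illustrative labels.
LIMITS = {
    "standard": TierLimits(
        formats=frozenset({"wav", "pcm", "opus", "mp3", "mp4", "m4a", "amr", "3gp", "aac"}),
        max_audio_bytes=1 * GB,
        max_video_bytes=2 * GB,
        max_duration_s=5 * HOUR,
    ),
    "vip": TierLimits(
        formats=frozenset({"wav", "pcm", "opus", "mp3", "amr", "3gp", "aac"}),
        max_audio_bytes=1 * GB,
        max_video_bytes=None,
        max_duration_s=5 * HOUR,
    ),
}


def check_file(tier: str, extension: str, size_bytes: int,
               duration_s: float, is_video: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the file fits the listed limits."""
    limits = LIMITS[tier]
    problems = []
    ext = extension.lower().lstrip(".")
    if ext not in limits.formats:
        problems.append(f"format '{ext}' is not listed for {tier}")
    if is_video and limits.max_video_bytes is None:
        problems.append(f"no video size limit is listed for {tier}")
    else:
        max_bytes = limits.max_video_bytes if is_video else limits.max_audio_bytes
        if size_bytes > max_bytes:
            problems.append(f"size {size_bytes} bytes exceeds {max_bytes} bytes")
    if duration_s > limits.max_duration_s:
        problems.append(f"duration {duration_s}s exceeds {limits.max_duration_s}s")
    return problems


if __name__ == "__main__":
    # A 1.5GB, one-hour MP4 video on the Standard tier.
    print(check_file("standard", ".mp4", int(1.5 * GB), 1 * HOUR, is_video=True))
    # An M4A file on the VIP tier, whose format list does not include M4A.
    print(check_file("vip", "m4a", 10_000_000, 60))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a file in a single pass.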

Need a custom solution? Speak with our sales team for tailored enterprise pricing.

Get started with DolphinVoice's Speech AI services today and experience industry-leading speech recognition and synthesis technology.