A Brief Look at ITN Technology in Speech Recognition
Inverse Text Normalization (ITN) is the process of converting the "normalized" textual output from AI speech recognition back into a "non-normalized" form that matches written conventions.

What is ITN?
Inverse Text Normalization (ITN) is the process of converting the "normalized" textual form produced by AI speech recognition into a "non-normalized" textual form. For example, when you say "three point five four", the ASR system may output the literal string "three point five four"; ITN transforms it into "3.54" to conform to written conventions.
"Normalized" here is defined relative to the modeling units used to train the recognition model. In an English ASR model the basic unit is an English character, so the character sequence "three point five four" is regarded as the normalized form, whereas "3.54" is the non-normalized form.
ITN in DolphinVoice
When you use the DolphinVoice Speech Recognition API, ITN is enabled by default. You can toggle it explicitly with the parameter enable_inverse_text_normalization. DolphinVoice’s ITN module handles the following transformations:
- Numbers & Symbols: With ITN, spoken numbers and symbols are turned into written form, e.g. "twenty percent" → "20%".
- Currencies & Units: With ITN, spoken currency and unit phrases are converted to standard written forms, e.g. "twenty dollars" → "$20".
- Dates & Times: With ITN, diverse spoken date/time expressions are normalized to consistent written forms, e.g. "April twenty-third, twenty twenty-five" → "April 23, 2025".
Examples:
| ITN Disabled | ITN Enabled |
|---|---|
| twenty percent | 20% |
| one thousand two hundred thirty-four dollars | $1,234 |
| April third | April 3 |
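The toggle itself is a single request parameter. Below is a minimal sketch of how a request might look from Python; only the enable_inverse_text_normalization parameter name comes from this article, while the endpoint URL, authentication header, and response field names are hypothetical placeholders — consult the API docs for the actual request format.

```python
# Minimal sketch of toggling ITN in a recognition request.
# Only enable_inverse_text_normalization is taken from the article above;
# the endpoint URL, auth header, and response field are hypothetical placeholders.
import requests

API_URL = "https://api.dolphin-ai.jp/v1/asr"   # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                        # hypothetical credential

def transcribe(audio_path: str, itn: bool = True) -> str:
    """Send an audio file for recognition, with ITN switched on or off."""
    with open(audio_path, "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},   # hypothetical auth scheme
            files={"audio": f},
            data={"enable_inverse_text_normalization": str(itn).lower()},
        )
    resp.raise_for_status()
    return resp.json().get("text", "")           # hypothetical response field

# With ITN (default) the result reads "3.54"; without it, "three point five four".
# print(transcribe("sample.wav", itn=True))
```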
How ITN is Implemented
DolphinVoice implements ITN with Finite State Transducers (FSTs), defining a series of transformation rules.
An FST is an extended finite state machine that not only performs state transitions but can also emit characters or symbols during each transition. A transducer consists of a set of states and the transitions between them, with each transition carrying an input and an output. The FST maps input to output through these states and transformation rules, where each rule pairs an input expression with its corresponding output expression.
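As a toy illustration of this structure (not DolphinVoice's actual rule set), the sketch below encodes a handful of transitions as a Python dictionary keyed by (state, input token), each yielding a next state and an output string, then walks them to turn "three point five four" into "3.54".

```python
# A toy finite state transducer: transitions map (state, input token) to
# (next state, output string). This is a simplified illustration of the idea,
# not DolphinVoice's production rule set.
TRANSITIONS = {
    ("start", "three"):  ("number", "3"),
    ("start", "five"):   ("number", "5"),
    ("number", "point"): ("decimal", "."),
    ("decimal", "five"): ("decimal", "5"),
    ("decimal", "four"): ("decimal", "4"),
}

def run_fst(tokens):
    """Walk the transducer over the token stream, emitting one output per transition."""
    state, output = "start", []
    for tok in tokens:
        state, out = TRANSITIONS[(state, tok)]
        output.append(out)
    return "".join(output)

print(run_fst("three point five four".split()))  # -> 3.54
```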
When an FST is used for text transformation, the input text stream passes through the transducer, and every matching rule fires to produce the corresponding output. Some transformations depend on context, such as currency symbols or units: converting "twenty dollars" to "$20", for example, requires taking the surrounding word "dollars" into account. Because an FST maintains state, it can remember this context, which enables more complex transformations.
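The sketch below illustrates this idea of state-held context in plain Python rather than a real FST toolkit: a pending number is kept as part of the transducer's state and is only written out once the following token reveals whether it is a currency amount.

```python
# Context-handling sketch: the transducer remembers a pending number in its state
# and decides how to write it only after seeing what follows. "twenty dollars"
# becomes "$20", while a bare "twenty" stays "20". A toy illustration only.
NUMBER_WORDS = {"twenty": "20", "five": "5"}

def itn_with_context(tokens):
    out, pending = [], None              # pending holds a number awaiting context
    for tok in tokens:
        if tok in NUMBER_WORDS:
            pending = NUMBER_WORDS[tok]
        elif tok == "dollars" and pending is not None:
            out.append("$" + pending)    # context "dollars" -> currency notation
            pending = None
        else:
            if pending is not None:
                out.append(pending)      # no currency context: emit plain number
                pending = None
            out.append(tok)
    if pending is not None:
        out.append(pending)
    return " ".join(out)

print(itn_with_context("i paid twenty dollars".split()))  # -> i paid $20
print(itn_with_context("twenty people came".split()))     # -> 20 people came
```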
The advantage of this approach is that as usage scenarios expand, more complex syntactic and semantic rules can be added to continuously refine the rule set. Machine learning methods can also be used to optimize the FST rule set to accommodate more complex natural language processing needs. As the technology advances, ITN will increasingly combine rule-based and data-driven methods, further improving its accuracy and applicability.
Why ITN Matters
ITN dramatically improves readability. Everyday spoken language is expressively flexible, but when that flexibility is transcribed verbatim it can produce text that is unintuitive or at odds with reading habits. This is especially evident for dates, times, currencies, and percentages, where spoken expressions differ significantly from their written forms; numbers transcribed word for word, for example, can easily confuse readers. By converting these colloquial expressions into conventional written forms, ITN makes the generated text easier to understand and use. Information is delivered more precisely, and readers can grasp the meaning without extra cognitive effort, significantly enhancing both the readability and the usability of the output.
FAQs
DolphinAI, K.K. is SOC 2 Type 1 and ISMS (ISO/IEC 27001) certified, providing high-accuracy speech recognition in a secure environment, with an average daily usage of approximately 7,000 hours. In the call center industry, DolphinVoice's services have been officially integrated and commercialized in Cloopen's SimpleConnect platform. We have also collaborated with Sanntsu Telecom Service Corporation to jointly develop and launch the AI Call Memo Service.
For inquiries regarding access to the speech recognition system or related questions, feel free to reach out.
Get started now
- Log in to DolphinVoice – start your free trial
- Browse the API docs – technical specs & guides
- Visit our website – service details & case studies
About the Author
Masahiro Asakura / Andy Yan
- CEO, DolphinAI, K.K.
- Former Director of Global Business, Advanced Media Inc. (8 years)
- 12 years of hands-on experience deploying voice-AI solutions
- Track Record: Supported voice AI deployment for over 30 enterprises
- Domains: ASR, TTS, call center AI, AI meeting minutes, voice-interaction devices
- Markets: Japan, Mainland China, Taiwan, Hong Kong
- Publications: 100+ technical articles
Public Presentations
- "AI New Forces · Product Open Day" by Tokyo Generative AI Development Community (October 25, 2025)
- "TOPAI International AI Ecosystem Frontier Private Salon" by TOPAI & Inspireland Incubator (July 29, 2025)
- "Global AI Conference & Hackathon" by WaytoAGI (June 7, 2025)
Contact
Email: mh.asakura@dolphin-ai.jp
LinkedIn: https://www.linkedin.com/in/14a9b882/
About DolphinAI, K.K.
An AI company specializing in speech recognition, speech synthesis and voice dialogue technologies for Japanese and other languages.
Product: DolphinVoice (Voice-Interactive AI SaaS Platform)
Key Features: ASR (Japanese, Mandarin, English, Mandarin-English mixed, Japanese-English mixed), TTS (Japanese, Mandarin, English)
Usage: Approximately 7,000 hours per day in call center and AI meeting minutes scenarios
■ Security & Compliance
- ISMS (ISO/IEC 27001) Certified
- SOC 2 Type 1 Report Obtained
- Details
■ Contact
(+81) 03-6161-7298