Concatenated Voice Technology
Put simply, concatenated voice technology is the best choice for public safety applications, where a mispronounced word could send a fire/EMS crew to the wrong address during an emergency.
Concatenated voice technology is based on a pre-recorded audio database. A voice talent records the necessary information (street names, incident types, etc.), and the Locution Systems voice “engine” breaks the recordings apart and stores the fragments in the audio database. When a call comes through, the engine assembles the specific word “bits” and audio phrases it needs into sentences that make up the emergency dispatch.
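The assembly step described above can be pictured with a minimal sketch. This is not Locution Systems' implementation: the `AUDIO_DB` contents, the `assemble_dispatch` function, and the placeholder byte values are all hypothetical, chosen only to show fragments being looked up and joined in order.

```python
# Hypothetical stand-in for the pre-recorded audio database: each key is a
# recorded fragment, each value is placeholder data standing in for its audio.
AUDIO_DB = {
    "structure fire": b"\x01\x02",
    "at": b"\x03",
    "1200": b"\x04\x05",
    "main street": b"\x06\x07\x08",
}

SILENCE = b"\x00"  # placeholder for a short pause between fragments


def assemble_dispatch(fragments, db=AUDIO_DB, gap=SILENCE):
    """Join pre-recorded fragments, in order, into one audio stream."""
    missing = [f for f in fragments if f not in db]
    if missing:
        # Nothing is generated on the fly: an unrecorded fragment is an error.
        raise KeyError(f"no recording for: {missing}")
    return gap.join(db[f] for f in fragments)


announcement = assemble_dispatch(["structure fire", "at", "1200", "main street"])
```

In this sketch, a fragment that was never recorded raises an error rather than being guessed at, which mirrors why pronunciation errors are unlikely with pre-recorded audio.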
The “Pros” of Concatenated Speech Technology
- Natural sounding because it’s based on actual recordings of a human voice
- Clear and easy to understand
- Sentences are spoken with natural rhythm and pitch
- Little chance for pronunciation errors because the audio is pre-recorded
- Recordings are created in a noise-free environment
The “Cons” of Concatenated Speech Technology
- There aren’t many
- Concatenated speech technology requires the development of a pre-recorded audio database
- It requires disk space, RAM, and processing power. For example, an audio database with 15,000 entries would require about 2 gigabytes of space, which today’s PCs easily accommodate.
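To put the storage figure in perspective, here is a rough back-of-the-envelope calculation. The 15,000 entries and 2 gigabytes come from the example above; the audio format (uncompressed 16-bit mono at 16 kHz) is purely an assumption for illustration, since the actual recording format is not stated.

```python
ENTRIES = 15_000
TOTAL_BYTES = 2 * 10**9  # "about 2 gigabytes", taken here as decimal GB


def bytes_per_entry(entries=ENTRIES, total_bytes=TOTAL_BYTES):
    """Average storage per database entry."""
    return total_bytes / entries


# Assumed format (not from the source): 16-bit mono PCM sampled at 16 kHz.
ASSUMED_BYTES_PER_SECOND = 16_000 * 2


def seconds_per_entry(entries=ENTRIES, total_bytes=TOTAL_BYTES):
    """Average clip length implied by the assumed audio format."""
    return bytes_per_entry(entries, total_bytes) / ASSUMED_BYTES_PER_SECOND


avg_bytes = bytes_per_entry()      # about 133 KB per entry
avg_seconds = seconds_per_entry()  # roughly 4 seconds under the assumed format
```

Under these assumptions, each entry averages about 133 KB, or a few seconds of audio. A different sample rate or a compressed format would change the per-entry length accordingly.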
The other option for automated voice technology is synthesized speech. However, synthesized speech has some serious drawbacks, including:
- Can sound robotic and unnatural
- Requires correctly spelled, well-punctuated, unambiguous text to generate accurate vocalizations
- Mispronounces and misconstrues words quite easily
- Can convey only limited emphasis and emotion
