JBI Studios' Blog on Voice-Over, Dubbing, and Multimedia Localization.

How to Use Text-To-Speech (TTS) Voice-Over for Document Accessibility

Document accessibility is one of the more common uses for text-to-speech voice-over. Why? Because TTS is a great tool for producing large amounts of voice audio in a short amount of time. But it’s also a relatively new tool, one that comes with a unique set of challenges, particularly for multimedia localization projects.

This post details what you must know to make documents accessible in multiple languages using text-to-speech.

[Average read time: 4 minutes]

What is document accessibility exactly?

The Americans with Disabilities Act of 1990 mandated that public documents be made accessible to people with disabilities. This includes benefits letters, jury summonses, health care program guidebooks, web sites, or any other text-based information provided by government agencies or regulated businesses. There are many ways to make documents accessible to the blind or sight-impaired, including large print versions, translation into Grade II Braille, and of course, voicing – whether in-person, by phone, or through human and text-to-speech recordings.

What does text-to-speech offer accessibility?

Two things: speed and pricing. Reducing user wait-times is crucial for true accessibility. And TTS is generally more cost-effective than human voice-over, especially for internal corporate, instructional and e-Learning translation projects. These two factors make it very attractive for institutions that are mandated to provide audio for their texts.

So, what do you need to know to produce TTS voice-over?

As mentioned earlier, TTS has a unique set of challenges – let's look at them.

1. Documents must be re-formatted for TTS

Unlike human voice-over talents, TTS voices have a very limited ability to gauge context. For example, a human voice talent would have little problem reading the following text:

[Image: audio script text, a headline followed by a mid-sized paragraph, with hidden symbols turned on, revealing a hard paragraph return at the end of each line (highlighted in yellow).]

However, a TTS voice would have issues. Why? Because of the paragraph returns at the end of each line (highlighted in yellow).

A human understands that the heading is one phrase – it’s obvious. The TTS English voiceover font, however, would interpret each line as a separate phrase. Same for the paragraph that follows. And unfortunately, most PDFs have paragraph symbols added to the end of each line during conversion – as well as a host of other formatting elements that re-flow text and trip up TTS systems.

What can you do about this? First, allow time for an audio script formatting round before production. Second, try to get document source files (Word, InDesign, etc.) if possible – that may lower the script formatting cost and timeline. And third, remember that you may have to do this for each language of a localization project.
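To make the formatting round more concrete, here is a minimal Python sketch of one clean-up step: joining lines that were split by hard returns. It assumes that real paragraph breaks are marked by a blank line, which won't be true of every converted PDF, and the function name `unwrap_hard_returns` is only illustrative. It doesn't handle words hyphenated across lines, so a human review pass is still needed.

```python
import re

def unwrap_hard_returns(text: str) -> str:
    """Join lines broken by hard returns, keeping blank-line paragraph breaks.

    Assumption: a blank line marks a real break between blocks (heading,
    paragraph); a single line break inside a block is a conversion artifact.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split into blocks wherever one or more blank lines appear.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Inside each block, collapse every run of whitespace (including the
    # unwanted line breaks) into a single space.
    unwrapped = [" ".join(block.split()) for block in blocks]
    return "\n\n".join(unwrapped)

sample = (
    "Jury Duty\nParking Instructions\n\n"
    "Please park in the\nvisitor lot on the\nnorth side."
)
print(unwrap_hard_returns(sample))
```

With this sample, the two-line heading becomes one phrase and the three-line paragraph becomes one sentence, which is what the TTS voice needs in order to read them without pausing mid-phrase.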

2. Dealing with lists

Lists require labor-intensive formatting for TTS output. And of course, many official documents contain them. Sample ballots list candidates and polling places. Health care plan guides usually list every doctor or hospital available to a patient. And corporate directories list all employees at an organization. If your document has a list, allow more time for script formatting.
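Part of that list formatting can be scripted. The sketch below is one possible approach, not a standard method: it strips bullet or number markers and ends each entry with a period, on the assumption that the TTS voice pauses at sentence-final punctuation. The function name `list_to_sentences` and the marker pattern are illustrative choices.

```python
import re

# Leading list markers: "-", "*", a bullet character, or "1." / "1)" style
# numbers, followed by whitespace.
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")

def list_to_sentences(lines):
    """Turn raw list lines into standalone sentences for a TTS script."""
    sentences = []
    for line in lines:
        item = BULLET.sub("", line).strip()
        if not item:
            continue  # skip blank lines between entries
        if item[-1] not in ".!?":
            item += "."  # sentence-final punctuation to cue a pause
        sentences.append(item)
    return sentences

polling_places = ["- Main Library, 12 Oak St", "2) City Hall", "", "\u2022 Fire Station 4."]
for sentence in list_to_sentences(polling_places):
    print(sentence)
```

Entries with columns (for example a name, a specialty and a phone number) usually need more than this, such as connecting words between fields, which is where much of the manual effort goes.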

3. File segmentation

If you’re working with a long document, it may be necessary to break it up into smaller audio units. The smaller audio files will be easier to use and more portable, whether you’re delivering the audio over the internet or on hard media. Finally, the smaller files will make it easier to manage non-Latin languages – for example, when tracking Chinese voiceover pick-ups against the script.
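As a rough illustration of segmentation, the Python sketch below groups paragraphs into segments that stay under a character limit, breaking only at paragraph boundaries. The 1,500-character default and the name `segment_script` are arbitrary assumptions; a real project would pick the limit based on the target audio length, and might prefer to break at section headings instead.

```python
def segment_script(paragraphs, max_chars=1500):
    """Group paragraphs into segments of at most max_chars characters.

    Breaks happen only between paragraphs. A single paragraph longer than
    max_chars is kept whole in its own segment rather than cut mid-sentence.
    """
    segments, current, size = [], [], 0
    for para in paragraphs:
        # Close the current segment if this paragraph would push it over.
        if current and size + len(para) > max_chars:
            segments.append(current)
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        segments.append(current)
    return segments

script = ["Welcome to jury service.", "Parking is on the north side.", "Bring photo ID."]
for number, segment in enumerate(segment_script(script, max_chars=55), start=1):
    print(f"segment_{number:03d}: {len(segment)} paragraph(s)")
```

Using the same segment numbers for every language in a localization project makes it simpler to match each audio file back to its script section.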

4. Phonetics

TTS has particular trouble with acronyms, number sequences, names and abbreviations. Depending on the accuracy needed for a project, you may have to create phonetic guides. In some languages it’s easier – for example, for Japanese voiceover projects the terms can be transliterated in katakana. Other languages may require developing a phonetics key with non-standard spellings – this would be done for French voiceover TTS, for example.
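A phonetics key with non-standard spellings can be applied to a script automatically. The sketch below is a simplified illustration in Python: it swaps whole terms for respellings before the text goes to the TTS voice. The guide entries and the function name `apply_phonetic_guide` are made-up examples, and the right respelling depends on the voice and the language. Some engines also accept their own lexicon formats, which this sketch does not cover.

```python
import re

def apply_phonetic_guide(text, guide):
    """Replace each whole term in `guide` with its respelling.

    Matching is case-sensitive and only hits whole terms, so "ADA" is
    replaced but the first three letters of "ADAPT" are not.
    """
    if not guide:
        return text
    # Try longer terms first so multi-word entries win over shorter ones.
    terms = sorted(guide, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(term) for term in terms) + r")(?!\w)"
    )
    return pattern.sub(lambda match: guide[match.group(1)], text)

# Hypothetical guide entries, for illustration only.
guide = {"ADA": "A D A", "Dr.": "Doctor", "St.": "Street"}
print(apply_phonetic_guide("Dr. Lee explains the ADA at 12 Oak St. today.", guide))
```

A plain lookup like this cannot tell "St." for "Street" from "St." for "Saint", so ambiguous terms still need a human check against the script.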

5. Not all languages have the same TTS support

TTS support in English is excellent. Engines in this language have even become quite good at recognizing textual and emotional context, and there are many TTS voices from which to choose. But very few other languages have this level of support – for example, there are drastically fewer Latin American Spanish voiceover TTS fonts available. Keep this in mind, especially if you have a large suite of languages.

TTS recordings won’t be “perfect”

This is crucial to remember for TTS projects. Voice fonts sound uncanny, and often have audible, jarring noises, as syllables and phonemes are digitally stitched together. They will trip up over homographs like lead (“to lead a tour”) and lead (“lead pipe”). And they’ll have particular difficulty with foreign-language terms (for example, English words in Brazilian Portuguese voiceover scripts). On top of that, each language has a particular set of pronunciation challenges, meaning that you can’t develop project-wide phonetics guidelines.

That said, a lot of documentation won't require perfect or human-sounding pronunciation. If jurors need parking instructions, they won’t care that the audio file has digital glitches, is read in a monotone, or doesn’t get the pronunciation of a street name just so. What they do care about is getting the right information, right when they need it, and with enough accuracy so that it's useful. For this kind of application, and for sheer speed and cost-effectiveness, it’s hard to beat TTS.


