What You Must Know about Voice-Over & Audio Files for Localization

There are literally hundreds of digital file formats that support voice-over, music and other kinds of audio recordings – everything from AAC to ZVD. They all have different feature sets, and work with different software. The sheer number of distinct file extensions is mind-boggling. So how can multimedia localization professionals navigate through all of the different formats? Start here.

This post will list what you need to know about audio file formats for voice-over and audio localization.

Why so many voice-over & audio file formats?

For one main reason: many of the thousands of early developers of digital audio software created proprietary formats that worked specifically with their own audio editing, recording, or playback programs. In fact, many audio formats are really just shells that contain the same kind of audio data, plus instructions that tell their respective programs how to interact with it. These kinds of files are called “wrapper files,” since they “wrap around,” or contain, another file. Moreover, some files used in audio work don’t contain an audio stream at all – they exist only to store information about audio held in a separate file, anything from dubbing marker time-codes to sound effects to any other data that an audio editing program might produce.

Of the formats that do have audio in them, there are three basic kinds:

  • Uncompressed – As the name suggests, uncompressed audio files are basically sound waves captured in a digital format, with every sample stored in full and nothing discarded. These files are relatively large, and are used mainly in audio recording, video editing and post-production.
  • Compressed (lossy) – These formats allow for much smaller file sizes, but discard some audio data (and quality) in the process. The trade-off isn’t terrible – modern encoders are good enough that most people can’t tell the difference between compressed and uncompressed audio at reasonable settings, and the smaller files transfer much more quickly over the internet. Compressed file formats, in fact, are the reason we have Spotify and iPods. Most deliverables for corporate audio applications are compressed as well.
  • Lossless Compression – These formats reduce file size without losing quality or any actual audio information. How is this possible? An uncompressed file uses the same amount of data for every second of audio, whether it’s silence, a single voice, or an orchestra. Lossless compression formats describe quiet, repetitive or predictable passages more compactly, in a way that lets the player rebuild the original audio exactly. That makes them a great choice for file storage on large audio and video localization projects.
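
The idea behind lossless compression can be illustrated with a short Python sketch. It uses zlib, a general-purpose compressor from the standard library, purely as a stand-in: FLAC uses its own audio-specific method, and the 8 kHz sample rate and 440 Hz tone here are arbitrary choices made for the example. The point is only that two clips of equal length take the same space uncompressed, shrink by different amounts when compressed, and come back byte-for-byte identical.

```python
import math
import struct
import zlib

SAMPLE_RATE = 8000  # samples per second; kept low so the example runs quickly


def pcm_silence(seconds: int) -> bytes:
    """One channel of 16-bit PCM silence: every sample is zero."""
    return b"\x00\x00" * (SAMPLE_RATE * seconds)


def pcm_tone(seconds: int, freq_hz: float = 440.0) -> bytes:
    """One channel of 16-bit PCM containing a plain sine tone."""
    samples = (
        int(20000 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(SAMPLE_RATE * seconds)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


silence = pcm_silence(2)
tone = pcm_tone(2)

# Uncompressed, both clips take exactly the same space.
print("raw sizes:", len(silence), len(tone))

for name, raw in (("silence", silence), ("tone", tone)):
    packed = zlib.compress(raw)
    # Lossless: decompressing returns the identical bytes.
    assert zlib.decompress(packed) == raw
    print(name, len(raw), "->", len(packed), "bytes")
```

Silence shrinks to almost nothing, the tone shrinks less, and neither loses a single sample. Real voice recordings sit somewhere in between, which is why lossless savings vary from file to file.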

Which audio file formats will you actually use?

The good news is that if you work in corporate, marketing or entertainment multimedia translation, you’ll probably use four formats for the majority of your projects.

WAV (.wav)

WAV is a wrapper format created by Microsoft and IBM for use with PCs. Because it’s a wrapper format, it can theoretically contain many kinds of audio, but it’s almost always used with PCM, the most common uncompressed audio stream format. In fact, most uncompressed audio formats are really just wrappers for PCM. Entertainment and dubbing projects generally deliver WAV, since it works well with post-production editing software.
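
To make the “wrapper” idea concrete, here is a minimal Python sketch using the standard library’s wave module. It builds a tiny WAV in memory and then reads back what the wrapper says about the PCM stream inside. The 48 kHz, 16-bit, mono settings are illustrative assumptions, not a recommended deliverable spec.

```python
import io
import wave

# Build a tiny WAV in memory: half a second of 16-bit mono silence at 48 kHz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(1)      # mono
    wav_out.setsampwidth(2)      # 2 bytes per sample = 16-bit
    wav_out.setframerate(48000)  # 48 kHz
    wav_out.writeframes(b"\x00\x00" * 24000)

wav_bytes = buffer.getvalue()

# The wrapper starts with a header that names the container type.
print(wav_bytes[:4], wav_bytes[8:12])

# The header also describes the PCM stream it wraps.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav_in:
    channels = wav_in.getnchannels()
    bit_depth = wav_in.getsampwidth() * 8
    sample_rate = wav_in.getframerate()
    audio_bytes = wav_in.getnframes() * wav_in.getsampwidth() * channels

print(channels, bit_depth, sample_rate)

# Everything that is not audio data is wrapper.
header_size = len(wav_bytes) - audio_bytes
print("wrapper bytes:", header_size)
```

In this example the wrapper is only a few dozen bytes; nearly all of the file is the PCM audio itself.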

AIFF (.aif)

AIFF is the Mac counterpart to WAV – it was created by Apple as a PCM wrapper file. It is also commonly used for entertainment and marketing deliverables, and for CD audio.

MP3 (.mp3)

MP3 is by far the most commonly used compressed format in the world. MP3 compression maintains an impressive level of audio quality in very small file sizes. Most voice-over audio deliverables for corporate and e-Learning translation use this format.

FLAC (.flac)

This is a lossless compression format, and it’s usually requested by clients who want to back up their files using as little server space as possible, on projects like audiobook narration or telephone directories, which produce a huge amount of audio.

You can see these four formats in the following screen shot:

Screen shot of computer folder with four versions of a 45-second sample voice-over audio file. The WAV version is 4,433 KB; the AIFF is 4,219 KB; the MP3 is 706 KB; and the FLAC is 1,901 KB.

Note that the file sizes for WAV and AIFF are almost identical, since they’re “wrapping” around effectively the same audio stream. Note also how much smaller the FLAC and MP3 files are.
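
Sizes like these can be estimated with simple arithmetic. The sketch below is a rough calculation in Python; the recording settings for the sample file are not stated, so the 48 kHz / 16-bit / mono PCM and 128 kbps constant-bitrate MP3 figures are assumptions chosen because they land close to the sizes in the screen shot. The function names are hypothetical helpers written for this example.

```python
def pcm_size_bytes(seconds, sample_rate_hz, bit_depth, channels):
    """Size of the raw PCM audio data, ignoring the wrapper's header."""
    return seconds * sample_rate_hz * (bit_depth // 8) * channels


def cbr_size_bytes(seconds, bitrate_kbps):
    """Approximate size of a constant-bitrate compressed stream."""
    return seconds * bitrate_kbps * 1000 / 8


def to_kb(size_bytes):
    """Convert bytes to kilobytes as file managers usually report them."""
    return size_bytes / 1024


DURATION_S = 45  # length of the sample voice-over file

# Assumed settings, not confirmed by the sample file itself.
pcm_kb = to_kb(pcm_size_bytes(DURATION_S, 48000, 16, 1))
mp3_kb = to_kb(cbr_size_bytes(DURATION_S, 128))

print(f"PCM estimate: {pcm_kb:.0f} KB")
print(f"MP3 estimate: {mp3_kb:.0f} KB")
```

Under these assumptions the PCM estimate comes out at about 4,219 KB and the MP3 estimate at about 703 KB. There is no equivalent formula for FLAC, because its output size depends on the content of the recording rather than on a fixed bitrate.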

Proprietary or lesser-known formats

Occasionally, you’ll have to deliver a proprietary or lesser-known format. Some, like OGG, are well-known in particular industries, but not household names. OGG is used regularly in video game localization, as a way to get around licensing fees for other formats. Likewise, many LMS platforms and online video players use proprietary formats specifically to get around licensing fees. And sometimes clients request support file formats – the kind that have information about the audio, but not the audio itself. For example, many producers request markers in a MIDI file, which in this use is essentially a small data file holding time-coded events that correspond to an audio file.

For that reason, it’s crucial to get thorough file specifications from your client before the project starts. Remember that many audio formats are wrappers, meaning that the same format can contain different kinds of audio streams. As with multimedia localization in general, detailed deliverable specifications are the key to seamless product integration – and, of course, to delivering on budget and on time.
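
Once you have a written spec, checking deliverables against it can be automated. The following Python sketch shows one possible approach for WAV files, again using the standard wave module. The WavSpec class, the check_wav and make_wav functions, and the spec values are all hypothetical, made up for this example; a real client spec may cover more fields (file naming, loudness, head and tail silence, and so on).

```python
import io
import wave
from dataclasses import dataclass


@dataclass(frozen=True)
class WavSpec:
    sample_rate_hz: int
    bit_depth: int
    channels: int


def check_wav(file, spec):
    """Return a list of mismatches between a WAV file and the spec."""
    with wave.open(file, "rb") as wav:
        actual = WavSpec(
            wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()
        )
    problems = []
    for field in ("sample_rate_hz", "bit_depth", "channels"):
        want, got = getattr(spec, field), getattr(actual, field)
        if want != got:
            problems.append(f"{field}: expected {want}, found {got}")
    return problems


def make_wav(sample_rate_hz, channels):
    """Create a short 16-bit silent WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * channels * 100)
    buffer.seek(0)
    return buffer


# Hypothetical client spec: 48 kHz, 16-bit, mono.
client_spec = WavSpec(sample_rate_hz=48000, bit_depth=16, channels=1)

# A file recorded at 44.1 kHz stereo fails on two counts.
problems = check_wav(make_wav(44100, 2), client_spec)
for problem in problems:
    print(problem)
```

An empty list means the file matches the spec on these three fields. The same function accepts a file path in place of the in-memory buffer.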

