If you're just starting out in the voice-over/dubbing industry, it can feel like everyone is speaking a foreign language. Sample rate, bit depth, mixed/unmixed audio...these are just some of the terms used by audio engineers and project managers that can be confusing.
For voice-over and dubbing projects at JBI Studios, we make sure that our clients understand what deliverables they will be receiving and take the time to explain any technical terms.
In this blog post, we will follow Emily, a fictitious client, as she places an order, and we will explain the terminology along the way. We hope this clears away some of the mystery about audio recording and helps you on your journey in the voice-over industry.
[Average read time: 3 minutes]
Emily Requests a Quote - Audio Mixing and Dubbing Terms
Emily is a small business owner with a 10-minute employee training video in English that she wants voiced over in Mandarin Chinese. She's in a chat with James from JBI Studios, who has been helping her.
Emily: Hi James, I just had a few questions about some terms. What's the difference between unmixed and mixed audio?
James: Hi Emily! Sure thing. Unmixed audio is audio that has not been combined with any other voice, music, or sound effects tracks. For you, it would just be the Mandarin voice-over audio. Mixed audio would be the voice-over audio combined with the other tracks.
Emily: You had mentioned UN-Style voice-over vs. lip-sync dubbing as potential options. Can you elaborate?
James: Definitely. In UN-Style voice-over, the source language (English in your video) plays for 1-2 seconds at normal volume. It is then lowered to about 20% volume when the foreign language voice-over starts (in your case, the Mandarin voice-over).
Emily: So you hear the original speaker in the background?
James: That's correct. It keeps the authenticity of the source material by imitating live interpretation. It's commonly used for documentaries, news reports, and corporate year-end videos. It would also be considered mixed audio, since we'd place the Mandarin voice over the English voice. Here's an example of UN-Style voice-over:
James: As for lip-sync dubbing, the original speaker's voice is removed completely. We would then record with the foreign language voice talent to closely match the character's lip movements in the video. See below for an example:
James: As you can see from the examples above, both UN-Style and lip-sync dubbing are used for videos where the speaker is on-screen. If the speaker is not on-screen, which is common for videos with narration, then we would simply replace the original narration with the new recording.
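To make James's description of UN-Style mixing a little more concrete, here is a minimal Python sketch. It is an illustration only, not JBI Studios' actual workflow: the function name `mix_un_style` is hypothetical, tracks are assumed to be mono lists of floats between -1.0 and 1.0, "about 20% volume" is read as a linear gain of 0.2, and the volume drops instantly where a real mix would normally fade.

```python
def mix_un_style(source, voice_over, vo_start_index, ducked_gain=0.2):
    """Mix a voice-over onto a source track, UN-Style (illustrative sketch).

    source, voice_over: mono tracks as lists of floats in -1.0..1.0
    vo_start_index: sample index in `source` where the voice-over begins
    ducked_gain: assumed linear gain for the source under the voice-over
    """
    mixed = []
    for i, src in enumerate(source):
        j = i - vo_start_index  # position inside the voice-over track
        if 0 <= j < len(voice_over):
            # Voice-over is playing: lower the original and add the new voice.
            out = src * ducked_gain + voice_over[j]
        else:
            # No voice-over here: the original plays at normal volume.
            out = src
        # Keep the result inside the valid sample range.
        mixed.append(max(-1.0, min(1.0, out)))
    return mixed


# Tiny example: 6 samples of "English", 2 samples of "Mandarin" starting at index 2.
english = [0.5] * 6
mandarin = [0.25, 0.25]
print(mix_un_style(english, mandarin, vo_start_index=2))
```

The unmixed deliverable in this picture is just `mandarin` on its own; the mixed deliverable is what the function returns.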
Technical Specs for Audio Recordings
After giving it some thought, Emily decides to go with UN-Style voice-over for her video. However, there are still some technical terms she's not too clear about.
Emily: Hi James, I noticed you recommended 48 kHz and 16 bit recording. To be honest, I'm not familiar with these terms. Do you mind explaining them?
James: Sure thing. 48 kHz, or 48,000 hertz, is a sample rate. The sample rate is how many samples (i.e. snapshots) of the audio are taken each second. Using video as an analogy, we can think of the sample rate as the number of pictures taken per second in a movie (frames per second). If the frame rate in a movie is too low, the images will stutter and the illusion of motion will be lost. Similarly, if the sample rate of an audio recording is too low, the higher frequencies are lost and the recording sounds dull, as if something is missing.
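As a quick aside, the arithmetic behind the sample rate can be sketched in a few lines of Python. The function names are made up for illustration. The second function relies on the standard sampling rule of thumb that a recording can only represent frequencies up to half its sample rate.

```python
def samples_in(duration_seconds, sample_rate_hz):
    """Number of samples (snapshots) taken per channel over a given duration."""
    return duration_seconds * sample_rate_hz


def highest_frequency_hz(sample_rate_hz):
    """Highest frequency a recording at this sample rate can represent."""
    return sample_rate_hz / 2


print(samples_in(1, 48_000))         # 48000 samples in one second
print(samples_in(10 * 60, 48_000))   # 28800000 samples in a 10-minute video
print(highest_frequency_hz(48_000))  # 24000.0
```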
Emily: Ah, I think I get it. So if the sample rate is like the frame rate, then 16 bit...
James: 16 bit would be like the resolution of each frame. 16 bit, 24 bit, and 32 bit are common bit depths. If you were to look at one frame from a movie, how detailed is that frame's image? For audio, bit depth measures how much detail each sample captures. So a 24 bit recording captures more detail than a 16 bit recording, the trade-off being that 24 bit recordings have a larger file size. Generally, 16 bit audio is high enough quality for nearly all uses (e-Learning, corporate videos, etc.). For a visual, here's an image of a digital sound wave of someone speaking, recorded at 44.1 kHz, 16 bit.
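As a side note to James's explanation, the detail versus file size trade-off can be estimated with simple arithmetic. The sketch below uses hypothetical function names and assumes uncompressed mono PCM audio (the kind of data stored in a WAV file), ignoring the small file header. Compressed formats such as MP3 would come out much smaller.

```python
def levels(bit_depth):
    """Number of distinct values each sample can take at this bit depth."""
    return 2 ** bit_depth


def pcm_size_bytes(duration_seconds, sample_rate_hz, bit_depth, channels=1):
    """Approximate size of uncompressed PCM audio, ignoring any file header."""
    bytes_per_sample = bit_depth // 8
    return duration_seconds * sample_rate_hz * bytes_per_sample * channels


print(levels(16))  # 65536
print(levels(24))  # 16777216

# Emily's 10-minute video at 48 kHz, mono, in megabytes (1 MB = 1,000,000 bytes):
print(pcm_size_bytes(600, 48_000, 16) / 1_000_000)  # 57.6
print(pcm_size_bytes(600, 48_000, 24) / 1_000_000)  # 86.4
```

Under these assumptions, moving from 16 bit to 24 bit makes the file 50% larger.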
Emily: Nice, I'm more of a visual learner myself. So, why do you recommend 48 kHz instead of 44.1 kHz for video?
James: 44.1 kHz is the standard for audio CDs, which are traditionally 44.1 kHz and 16 bit. 48 kHz became the standard for audio in video in part because of the popularity of Digital Audio Tape (DAT), which Sony introduced in 1987. DAT could record audio at 48 kHz, a nice even number compared to the awkward 44.1 kHz. 48 kHz also proved to work well with common video frame rates, and it has been the industry standard for audio recorded for video ever since.
Emily: Wow, you know your audio history.
James: Haha, thanks to the internet.
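One way to see James's point that 48 kHz works well with common video frame rates is to count how many audio samples fit into a single video frame. The Python sketch below is only an illustration: the function name is made up, and it covers only whole-number frame rates (24, 25, and 30 fps), not fractional broadcast rates such as 29.97 fps.

```python
from fractions import Fraction


def samples_per_frame(sample_rate_hz, frames_per_second):
    """Exact number of audio samples that play during one video frame."""
    return Fraction(sample_rate_hz, frames_per_second)


for fps in (24, 25, 30):
    for rate in (48_000, 44_100):
        spf = samples_per_frame(rate, fps)
        # A denominator of 1 means the samples divide evenly into each frame.
        note = "whole number" if spf.denominator == 1 else "not a whole number"
        print(f"{rate} Hz at {fps} fps: {float(spf)} samples per frame ({note})")
```

At 48 kHz every frame gets a whole number of samples at all three rates (2000, 1920, and 1600), while 44.1 kHz gives 1837.5 samples per frame at 24 fps.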
We hope the example above with Emily and James helped you gain some more knowledge about audio terms. Other key topics, such as understanding different audio formats, are covered in a previous JBI blog on audio files for localization. To summarize, here are the terms we've learned:
- Unmixed vs mixed audio
- UN-Style voice-over
- Lip-sync dubbing
- Sample rate
- Bit depth
In addition to audio terminology, there are other key things to keep in mind when asking for a quote. Download the free checklist below to make sure you cover all your bases.