1
00:00:00,000 --> 00:00:02,100
I watch a lot of movies and TV 

2
00:00:02,100 --> 00:00:03,666
on the train, at home

3
00:00:03,666 --> 00:00:05,433
[overlapping] at the movies,
while working out, while doing dishes

4
00:00:05,433 --> 00:00:05,933
in the bath...

5
00:00:05,966 --> 00:00:07,400
But no matter where I'm watching

6
00:00:07,400 --> 00:00:10,033
I find myself constantly doing this one thing.

7
00:00:10,133 --> 00:00:11,633
[unintelligible]

8
00:00:11,633 --> 00:00:12,600
What?

9
00:00:14,066 --> 00:00:15,633
[unintelligible]

10
00:00:16,166 --> 00:00:17,300
[exasperated sigh]

11
00:00:18,300 --> 00:00:19,766
[unintelligible]

12
00:00:20,500 --> 00:00:21,633
[exasperated sigh]

13
00:00:22,933 --> 00:00:24,500
[unintelligible]

14
00:00:25,466 --> 00:00:26,433
Oh.

15
00:00:27,233 --> 00:00:29,666
It turns out this isn't unusual.

16
00:00:29,900 --> 00:00:31,200
We polled our YouTube audience

17
00:00:31,200 --> 00:00:34,500
and about 57% of people 
said that they feel like 

18
00:00:34,500 --> 00:00:37,733
they can't understand the dialogue 
in the things that they watch

19
00:00:37,733 --> 00:00:39,333
unless they're using subtitles.

20
00:00:39,766 --> 00:00:42,900
But it feels like this hasn't 
always been the case.

21
00:00:43,333 --> 00:00:46,600
So to figure out what was going on, 
I made a call.

22
00:00:46,700 --> 00:00:48,666
Hi, my name is 
Austin Olivia Kendrick.

23
00:00:48,666 --> 00:00:51,566
I am a professional dialogue editor 
for film and TV.

24
00:00:51,566 --> 00:00:55,333
I basically perform audio surgery 
on actors' words.

25
00:00:55,366 --> 00:00:57,000
Do you watch with subtitles?

26
00:00:57,266 --> 00:00:58,533
I–

27
00:00:58,533 --> 00:01:00,000
I do, actually.

28
00:01:00,000 --> 00:01:01,433
I do a lot of the time. 

29
00:01:01,433 --> 00:01:02,200
So...

30
00:01:02,200 --> 00:01:04,900
Why do you think that we all feel like
we need subtitles now?

31
00:01:05,132 --> 00:01:08,200
I get asked this question all the time.

32
00:01:08,400 --> 00:01:09,166
All the time.

33
00:01:09,166 --> 00:01:10,800
It's something that is...

34
00:01:11,066 --> 00:01:13,266
It doesn't have a simple, 
straightforward answer.

35
00:01:13,666 --> 00:01:16,100
It's very layered and very complex. 

36
00:01:16,100 --> 00:01:19,400
And after talking to Austin 
for almost 2 hours, it's true.

37
00:01:19,400 --> 00:01:22,266
It's a very layered and complex topic.

38
00:01:22,266 --> 00:01:25,466
But everything kept pointing back
to one main thing.

39
00:01:26,100 --> 00:01:28,533
Technology that got us from this...

40
00:01:28,533 --> 00:01:30,600
–I'll get you, my pretty.
–You should be kissed and often.

41
00:01:30,600 --> 00:01:32,533
No, Richard, no. What has happened...

42
00:01:32,533 --> 00:01:33,133
To this. 

43
00:01:33,133 --> 00:01:33,833
Mom, I just woke up. 

44
00:01:33,833 --> 00:01:35,166
...little slim-waisted birdy...

45
00:01:35,166 --> 00:01:36,300
[unintelligible]

46
00:01:38,400 --> 00:01:40,266
Let's start with microphones.

47
00:01:40,266 --> 00:01:42,566
I'm going to use this clip from 
"Singin' in the Rain"

48
00:01:42,566 --> 00:01:44,300
to show how mics used to work.

49
00:01:44,300 --> 00:01:46,533
Here's the mic, you talk towards it. 

50
00:01:46,533 --> 00:01:49,533
The sound goes through the cable 
to the box.

51
00:01:49,833 --> 00:01:53,366
A man records it on a big record in wax.

52
00:01:54,866 --> 00:01:57,933
This scene illustrates some of 
the difficulties and intricacies 

53
00:01:57,933 --> 00:01:59,366
of early sound recordings.

54
00:01:59,800 --> 00:02:02,066
Mics were big, bulky, temperamental

55
00:02:02,066 --> 00:02:04,600
and required creative solutions to be hidden.

56
00:02:04,966 --> 00:02:07,766
They were wired and recorded 
onto hard memory

57
00:02:07,766 --> 00:02:09,900
like wax and eventually tape.

58
00:02:10,265 --> 00:02:12,500
No matter how many actors 
were in a scene

59
00:02:12,500 --> 00:02:14,700
all sound got recorded to one track.

60
00:02:15,066 --> 00:02:17,400
So performers had to be diligently focused

61
00:02:17,400 --> 00:02:19,066
and facing a certain angle

62
00:02:19,066 --> 00:02:21,166
so that their words could be picked up.

63
00:02:21,166 --> 00:02:22,266
Otherwise...

64
00:02:22,766 --> 00:02:23,833
[muted noise, as if from far away]

65
00:02:24,500 --> 00:02:25,233
[sudden sound up]

66
00:02:25,233 --> 00:02:26,266
[muted noise, as if from far away]

67
00:02:26,266 --> 00:02:26,933
[sudden sound up]

68
00:02:27,066 --> 00:02:29,133
You couldn't hear a thing.

69
00:02:29,133 --> 00:02:31,300
But technology's improved 
to the point where

70
00:02:31,300 --> 00:02:34,166
microphones don't impede performance 
as much anymore.

71
00:02:34,533 --> 00:02:37,933
They've become better, smaller, wireless...

72
00:02:38,100 --> 00:02:39,900
and we use more of them

73
00:02:39,900 --> 00:02:42,466
to ensure that performances get captured.

74
00:02:42,566 --> 00:02:45,700
What we typically are working with
from production dialogue

75
00:02:45,700 --> 00:02:47,066
is 2 boom microphones

76
00:02:47,066 --> 00:02:49,666
and then every actor has at least 
one lavaliere microphone

77
00:02:49,666 --> 00:02:51,566
hidden somewhere on them.

78
00:02:51,900 --> 00:02:54,633
These shrinking mics have 
given actors the flexibility

79
00:02:54,633 --> 00:02:57,366
to be more naturalistic in their performances.

80
00:02:57,366 --> 00:02:58,800
They no longer need to project

81
00:02:58,800 --> 00:03:00,566
so that their words reach the mic.

82
00:03:00,566 --> 00:03:02,700
They can speak softly,
knowing that the tiny mic

83
00:03:02,700 --> 00:03:04,833
hidden on their body will pick up 
what they're saying.

84
00:03:05,566 --> 00:03:08,733
And my personal favorite example 
of this performance shift

85
00:03:08,733 --> 00:03:10,933
is Alec Baldwin on 30 Rock.

86
00:03:11,233 --> 00:03:14,200
In a 2011 speech slash roast,

87
00:03:14,233 --> 00:03:17,000
Tina Fey says that he speaks so quietly

88
00:03:17,000 --> 00:03:19,866
that she can't hear him when
she's standing next to him.

89
00:03:20,100 --> 00:03:23,233
"And then you play the film back 
and it's there somehow."

90
00:03:23,700 --> 00:03:27,066
Just listen to this whisper-off
between him and Will Arnett.

91
00:03:27,066 --> 00:03:28,166
I'm not afraid of you. 

92
00:03:28,166 --> 00:03:28,900
Yeah.

93
00:03:29,900 --> 00:03:31,066
Well, you should be. 

94
00:03:31,066 --> 00:03:34,100
Let's just see how it all 
shakes out in the meeting. 

95
00:03:34,100 --> 00:03:37,000
Naturalism isn't always the best 
for intelligibility, though.

96
00:03:37,400 --> 00:03:40,700
Take Tom Hardy, an actor that 
I personally love

97
00:03:40,700 --> 00:03:43,566
but who famously is a mumbler. 

98
00:03:44,766 --> 00:03:46,866
????????????

99
00:03:47,800 --> 00:03:48,766
I mean...

100
00:03:48,766 --> 00:03:50,433
the mic picked that line up fine.

101
00:03:50,433 --> 00:03:52,933
Like, we can definitely hear that he's talking,

102
00:03:52,933 --> 00:03:54,533
he's saying something.

103
00:03:54,533 --> 00:03:56,633
But once that mumble gets recorded

104
00:03:56,633 --> 00:03:59,100
it's on a dialogue editor's shoulders

105
00:03:59,100 --> 00:04:01,500
to make it as intelligible as possible.

106
00:04:01,800 --> 00:04:05,000
And that was a lot harder when 
everything was analog.

107
00:04:05,300 --> 00:04:07,066
While you could pick the best takes

108
00:04:07,066 --> 00:04:08,700
and physically splice them together,

109
00:04:09,000 --> 00:04:12,566
if some piece of dialogue was
truly impossible to understand

110
00:04:12,566 --> 00:04:16,466
then actors would come in and
rerecord those specific lines

111
00:04:16,466 --> 00:04:20,933
in a process called ADR or 
automated dialogue replacement

112
00:04:20,933 --> 00:04:23,466
which you can see Meryl Streep 
do in this scene

113
00:04:23,466 --> 00:04:25,133
from "Postcards from the Edge".

114
00:04:28,233 --> 00:04:30,700
There isn't enough money in the world
to further a cause like yours.

115
00:04:30,933 --> 00:04:32,966
That still gets done today, but...

116
00:04:33,100 --> 00:04:34,700
ADR also costs money

117
00:04:34,700 --> 00:04:37,700
because you're not only 
paying for the actor's time

118
00:04:37,700 --> 00:04:39,900
you're paying for the engineer's time

119
00:04:39,900 --> 00:04:41,166
and then the editor's time.

120
00:04:41,166 --> 00:04:45,033
So we try to do ADR, frankly, 
as little as possible.

121
00:04:45,033 --> 00:04:48,900
And so a lot of her job is 
making words sound better.

122
00:04:48,900 --> 00:04:50,600
The show I'm currently working on

123
00:04:50,600 --> 00:04:53,466
I remember in the middle 
of this one word

124
00:04:53,466 --> 00:04:55,700
there was just this loud metal clang

125
00:04:55,700 --> 00:04:57,366
that I couldn't remove.

126
00:04:57,366 --> 00:05:00,133
So I had to go in and I had to 
find an alternate take of it

127
00:05:00,133 --> 00:05:02,033
that fit and then I had to fit it...

128
00:05:02,300 --> 00:05:05,366
to the movement of her mouth in that moment

129
00:05:05,366 --> 00:05:06,633
and then push it in.

130
00:05:06,633 --> 00:05:09,133
And once she's done with it
it's sent off to a mixer

131
00:05:09,133 --> 00:05:11,766
who works to make sure 
the frequencies of the sound effects

132
00:05:11,766 --> 00:05:13,233
and music don't overlap

133
00:05:13,233 --> 00:05:15,266
with the frequencies of the human voice

134
00:05:15,266 --> 00:05:17,766
something that's only possible 
now that the world

135
00:05:17,766 --> 00:05:19,000
has moved away from tape

136
00:05:19,000 --> 00:05:20,900
and into digital recordings.

137
00:05:20,900 --> 00:05:22,300
That is a big challenge.

138
00:05:22,300 --> 00:05:26,100
Carving out those frequencies, 
that space...

139
00:05:26,400 --> 00:05:28,600
amongst every other element 
of the mix

140
00:05:28,600 --> 00:05:31,500
for the dialogue to be able to punch through

141
00:05:31,500 --> 00:05:35,400
and not be all muddied up by any other sounds

142
00:05:35,400 --> 00:05:37,833
that exist in that band of frequencies.

143
00:05:37,833 --> 00:05:39,066
But even with all that work

144
00:05:39,066 --> 00:05:42,300
lines of dialogue can still be hard to understand.

145
00:05:42,400 --> 00:05:45,000
The kind of feeling has been

146
00:05:45,000 --> 00:05:48,300
if you want your movie to feel 
quote unquote cinematic

147
00:05:48,300 --> 00:05:52,733
you have to have wall-to-wall 
bombastic, loud sound.

148
00:05:52,800 --> 00:05:54,133
A lot of people will ask like

149
00:05:54,133 --> 00:05:56,533
"Why don't you just turn the dialogue up?"

150
00:05:56,966 --> 00:05:58,900
Like, just turn it up.

151
00:05:58,900 --> 00:06:02,066
And... if only it was that simple.

152
00:06:02,466 --> 00:06:04,833
Because a big thing that we want to preserve

153
00:06:04,833 --> 00:06:06,666
is a concept called dynamic range.

154
00:06:06,666 --> 00:06:10,033
The range between your quietest sound
and your loudest sound.

155
00:06:10,033 --> 00:06:14,633
If you have your dialogue
at the same volume

156
00:06:14,633 --> 00:06:17,133
as an explosion that immediately follows it,

157
00:06:17,133 --> 00:06:19,233
the explosion is not going to feel as big.

158
00:06:19,500 --> 00:06:22,733
You need that contrast in volume

159
00:06:22,733 --> 00:06:25,466
in order to give your ear a sense of scale.

160
00:06:25,500 --> 00:06:28,633
But the thing is, you can only make something so loud

161
00:06:28,633 --> 00:06:30,100
before it gets distorted.

162
00:06:30,100 --> 00:06:32,666
So if you want to create 
that wide dynamic range

163
00:06:32,666 --> 00:06:36,066
you have no choice but to push 
those quieter sounds lower

164
00:06:36,066 --> 00:06:39,066
instead of pushing the louder sounds louder.

165
00:06:39,100 --> 00:06:43,033
So explosions go up and dialogue comes down.

166
00:06:43,033 --> 00:06:46,033
Which brings us to the
Christopher Nolan of it all. 

167
00:06:46,033 --> 00:06:48,300
[loud music layered over]
A separate structure within the others—

168
00:06:48,300 --> 00:06:50,266
[Tom Hardy mumbling into 
a face mask]

169
00:06:50,266 --> 00:06:52,166
[rocket blasters layered over]
Pushing out of orbit!

170
00:06:52,166 --> 00:06:54,600
Nearly every film of his has been criticized

171
00:06:54,600 --> 00:06:55,966
for its hard-to-hear dialogue

172
00:06:55,966 --> 00:06:58,233
that essentially begs for subtitles.

173
00:06:58,533 --> 00:07:02,500
But as this headline explains,
he likes it that way.

174
00:07:02,500 --> 00:07:05,433
According to an interview in a book 
called The Nolan Variations

175
00:07:05,433 --> 00:07:07,733
he said that he gets a lot of complaints.

176
00:07:07,900 --> 00:07:10,066
Even from other filmmakers who would say

177
00:07:10,333 --> 00:07:13,233
"I just saw your film and the dialogue is inaudible."

178
00:07:13,766 --> 00:07:17,266
"The truth was it was kind of 
the whole enchilada

179
00:07:17,266 --> 00:07:19,333
of how we had chosen to mix it."

180
00:07:19,333 --> 00:07:22,566
And in his 2017 interview 
with Indiewire, he said

181
00:07:22,566 --> 00:07:24,600
"We made the decision 
a couple of films ago

182
00:07:24,600 --> 00:07:27,900
that we weren't going to mix films 
for substandard theaters."

183
00:07:28,200 --> 00:07:31,800
And this is kind of the crux of the matter.

184
00:07:31,800 --> 00:07:34,433
The content that we watch here 

185
00:07:34,433 --> 00:07:35,366
and here and here

186
00:07:35,366 --> 00:07:37,566
is not mixed for us, primarily.

187
00:07:37,933 --> 00:07:41,433
Rerecording mixers mix 
for the widest surround

188
00:07:41,433 --> 00:07:43,600
sound format that is available.

189
00:07:43,600 --> 00:07:45,000
Typically, for big release films,

190
00:07:45,000 --> 00:07:46,666
that is Dolby Atmos...

191
00:07:46,666 --> 00:07:50,233
which has true 3D sound 
up to 128 channels.

192
00:07:50,433 --> 00:07:52,566
The thing is, if you're not at 
a movie theater

193
00:07:52,566 --> 00:07:55,200
that can showcase the best sound 
Hollywood has to offer...

194
00:07:55,600 --> 00:07:58,000
you can't experience all of those channels.

195
00:07:58,000 --> 00:08:02,666
So after the movie is mixed for 
the 128 Atmos tracks

196
00:08:02,666 --> 00:08:05,666
somebody has to create a separate version 
of the film's audio

197
00:08:05,666 --> 00:08:11,166
where all those same sounds live 
on one or two or five tracks.

198
00:08:11,166 --> 00:08:12,900
This is called downmixing.

199
00:08:12,900 --> 00:08:16,366
Downmixing is the process of taking that biggest mix

200
00:08:16,366 --> 00:08:18,633
and folding it down into formats

201
00:08:18,633 --> 00:08:20,433
with fewer channels available to them.

202
00:08:20,433 --> 00:08:25,133
So say Atmos down to 7.1 or 
7.1 down to 5.1

203
00:08:25,133 --> 00:08:26,733
or 5.1 down to stereo

204
00:08:26,733 --> 00:08:27,900
stereo down to mono.

205
00:08:29,900 --> 00:08:34,933
Unlike old TVs that were gigantic and 
had a ton of space for speakers

206
00:08:34,933 --> 00:08:36,600
TVs today are super thin.

207
00:08:36,600 --> 00:08:38,666
This one that I have
in my living room

208
00:08:38,666 --> 00:08:41,433
is about the same thickness as my iPhone.

209
00:08:41,832 --> 00:08:45,000
So even though it's outputting the same 
mono or stereo sound

210
00:08:45,000 --> 00:08:48,900
that an older TV might, it's still going 
to sound worse

211
00:08:48,900 --> 00:08:51,533
because you have to have 
tiny little speakers

212
00:08:51,533 --> 00:08:54,266
to fit into this tiny, sleek form factor.

213
00:08:54,766 --> 00:08:59,500
These tiny speakers are also usually 
on the back of the TV.

214
00:08:59,500 --> 00:09:02,600
So the downmixed version of this movie that went from

215
00:09:02,600 --> 00:09:05,700
128 channels down to just 2

216
00:09:05,700 --> 00:09:09,433
is going to sound even muddier
when it's pointing away from you.

217
00:09:09,800 --> 00:09:13,200
And when you're watching on your phone or a laptop...

218
00:09:13,666 --> 00:09:15,766
it's generally not much better.

219
00:09:16,066 --> 00:09:20,200
When you combine not great speakers, 
naturalistic mumbly performances

220
00:09:20,200 --> 00:09:24,466
dynamic range featuring bombastic sound
over dialogue and a flattened mix...

221
00:09:24,833 --> 00:09:27,166
It's no wonder we have trouble hearing

222
00:09:27,166 --> 00:09:28,433
what's going on.

223
00:09:28,433 --> 00:09:31,766
And it seems like the industry knows this 
because TVs today

224
00:09:31,766 --> 00:09:34,066
are shipping with all kinds of settings

225
00:09:34,066 --> 00:09:36,400
built in, like this intelligence mode.

226
00:09:36,900 --> 00:09:39,900
You can put on active voice amplification

227
00:09:39,900 --> 00:09:41,766
in hopes of making that dialogue track

228
00:09:41,766 --> 00:09:44,033
come through just a little bit clearer.

229
00:09:44,366 --> 00:09:47,933
But of course, that's more Band-Aid than it is solution.

230
00:09:48,100 --> 00:09:51,900
The way movies get mixed likely isn't 
going to revert back to

231
00:09:51,900 --> 00:09:54,100
super pristine dialogue.

232
00:09:54,466 --> 00:09:58,766
So the solutions we have are, one: buy better speakers

233
00:09:58,766 --> 00:10:02,033
and only go to theaters that have 
impeccable sound.

234
00:10:02,200 --> 00:10:04,100
Two: take a chill pill

235
00:10:04,100 --> 00:10:06,633
and try to just worry a little bit less

236
00:10:06,633 --> 00:10:09,600
about picking up every single word that gets said.

237
00:10:09,800 --> 00:10:11,133
Or, three...

238
00:10:11,433 --> 00:10:13,633
Just keep the subtitles on.

239
00:10:15,266 --> 00:10:17,266
For people who are deaf or hard of hearing

240
00:10:17,266 --> 00:10:20,566
subtitles make movies and TV shows accessible

241
00:10:20,566 --> 00:10:24,033
and this accessibility has just expanded in recent years.

242
00:10:24,033 --> 00:10:25,600
Laws have been passed to ensure

243
00:10:25,600 --> 00:10:27,800
that movie theaters have at least 
a few screenings a week

244
00:10:27,800 --> 00:10:29,033
with captions.

245
00:10:29,033 --> 00:10:31,766
Pretty much every streaming service 
has standardized them

246
00:10:31,766 --> 00:10:34,233
and speech recognition technology

247
00:10:34,233 --> 00:10:36,366
has made them accessible in pretty much

248
00:10:36,366 --> 00:10:39,000
every YouTube video and TikTok.
[Which is partially how our subtitles are made!]

249
00:10:39,000 --> 00:10:41,800
Plus, they're super easy to toggle on and off.