Replay Gain - A Proposed Standard

Calibration

Finding a standard

Having calculated a representative RMS energy value for the audio file, we now need to reference this to a real world sound pressure level. The audio industry doesn't have any standard for listening level, but the movie industry has worked to an 83dB standard for years.

What the standard actually states is that a single channel pink noise signal, with an RMS energy level of -20 dB relative to a full scale sinusoid should be reproduced at 83 dB SPL (measured using a C-weighted, slow averaging SPL meter). In simple terms, this means that everyone can set their volume control to the same (known, calibrated) gain.
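To make "-20 dB relative to a full scale sinusoid" concrete, here is a minimal Python sketch (the original scripts are MATLAB; the function names here are purely illustrative). It assumes samples are floats normalised so that full scale is 1.0, which means a full-scale sinusoid has an RMS of 1/sqrt(2).

```python
import math

def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

# Assumption: full scale is +/-1.0, so a full-scale sinusoid has RMS 1/sqrt(2).
FULL_SCALE_SINE_RMS = 1.0 / math.sqrt(2.0)

def level_db_re_full_scale_sine(samples):
    """Level in dB relative to the RMS of a full-scale sinusoid."""
    return 20.0 * math.log10(rms(samples) / FULL_SCALE_SINE_RMS)

# A sinusoid with a peak of 0.1 (one tenth of full scale) sits 20 dB
# below a full-scale sinusoid. 1000 whole cycles fit in the buffer.
n = 48000
quiet_sine = [0.1 * math.sin(2.0 * math.pi * 1000.0 * i / n) for i in range(n)]
level = level_db_re_full_scale_sine(quiet_sine)
```

A pink noise file whose level comes out at -20 dB on this scale is the signal that should measure 83 dB SPL in a calibrated room.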

ASIDE: This number (83dB SPL) wasn't picked at random. It represents a comfortable average listening level, determined by professionals from years of listening. That reference level of -20dB pink noise isn't random either. It causes the calibrated average level to be 20dB less than the peak level. In other words, it leaves 20dB of headroom for louder than average signals. So, if CDs were mastered this way, the average level would be around -20dB FS, leaving lots of room for the dramatic peaks which make music exciting.
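The headroom arithmetic can be written out in a few lines of Python. This is only a sketch: it assumes a perfectly linear, calibrated playback chain, and the names are illustrative rather than anything from the ReplayGain scripts.

```python
REFERENCE_LEVEL_DB = -20.0  # level of the calibration pink noise
REFERENCE_SPL_DB = 83.0     # SPL at which that noise is reproduced

def spl_in_calibrated_system(level_db):
    """SPL produced by a signal at level_db, assuming a linear playback chain."""
    return REFERENCE_SPL_DB + (level_db - REFERENCE_LEVEL_DB)

# Headroom between the calibrated average level and full scale (0 dB).
headroom_db = 0.0 - REFERENCE_LEVEL_DB

# A passage that reaches full scale would be reproduced 20 dB above 83 dB SPL.
full_scale_spl = spl_in_calibrated_system(0.0)
```

So music averaging -20 dB plays back at about 83 dB SPL, while the loudest possible passages reach about 103 dB SPL.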

An ideal world...

NOW (are you still with me?) if the mastering engineer set the levels on a CD using that calibrated volume control setting, that CD will sound best at that volume. If all CDs were mastered in such a way, they'd all sound best at that volume. If you (as a listener) didn't want to listen at that particular volume setting, you could always turn it down, but all CDs would still sound equally "turned down" at your preferred setting. You wouldn't have to change the volume setting between discs.

Reality check! We know CDs aren't made like this. There is NO audio standard replay level. So, here's the clever bit - here's the whole point of this website...

Fixing a non-ideal world

We know the level should average around 83dB SPL, and we know a -20dB pink noise signal will give 83dB SPL in a calibrated system. So, we send the pink noise signal through the ReplayGain program, and store the result (let's call it ref_Vrms). For every CD we process, the difference between the calculated value for that CD and ref_Vrms tells you how much you need to scale the signal in order to make it average 83dB.
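As a rough Python sketch of that last step, assuming both values are expressed in dB (the subtraction only makes sense on a log scale). The two example levels are made-up numbers for illustration, not measurements, and the function names are hypothetical.

```python
def replay_gain_db(ref_level_db, file_level_db):
    """Gain in dB that brings the file's level to the reference level."""
    return ref_level_db - file_level_db

def db_to_scale(gain_db):
    """Convert a gain in dB to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)

def apply_gain(samples, gain_db):
    """Scale every sample by the linear factor for gain_db."""
    scale = db_to_scale(gain_db)
    return [x * scale for x in samples]

# Hypothetical values: the pink noise reference measures -20 dB and a
# loudly mastered CD track measures -8 dB on the same scale.
gain_db = replay_gain_db(-20.0, -8.0)  # negative: the track must be turned down
scale = db_to_scale(gain_db)           # linear factor applied to each sample
quieter = apply_gain([0.5, -0.5, 1.0], gain_db)
```

A negative gain means the track is louder than the reference and gets turned down; a quietly mastered track would get a positive gain instead.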

The actual process is quicker to do than to say!

One complication

The system calibration uses a single channel of pink noise, reproduced through a single loudspeaker, but music is played through both loudspeakers. So, though we use 1 channel of pink noise to calibrate the system gain, the ideal level of the music is actually the loudness when both speakers are in use. ReplayGain is therefore calibrated to 2 channels of pink noise, because that's how loud we'd like the music to sound. In practice, we just have a monophonic pink noise wavefile, and ReplayGain automatically assumes it is played through both speakers, as it would for any monophonic file.
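One way to picture the mono handling is the Python sketch below. It assumes that a stereo file's energy is taken as the average of the two channels' mean-square values, and that a mono file is treated as the same signal on both channels. That combining rule is an assumption made for illustration; this section does not spell it out.

```python
def mean_square(samples):
    """Mean of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)

def combined_mean_square(channels):
    """Energy of a mono or stereo file, treating mono as dual mono.

    Assumption: stereo energy is the average of the left and right
    mean-square values.
    """
    if len(channels) == 1:
        # Mono: pretend the same signal is played through both speakers.
        channels = [channels[0], channels[0]]
    if len(channels) != 2:
        raise ValueError("expected one or two channels")
    left, right = channels
    return (mean_square(left) + mean_square(right)) / 2.0

mono = [0.1, -0.2, 0.3, -0.4]
as_mono = combined_mean_square([mono])
as_dual_mono = combined_mean_square([mono, mono])
```

Under this rule the mono pink noise file gives exactly the same result as a stereo file carrying that noise on both channels, which is the "2 channels of pink noise" calibration described above.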

Implementation

ReplayGainScript.m loads a .wav file containing -20dB FS pink noise from disc, and processes this via ReplayGain.m, storing the result as a reference. The reference wavefile is available here. The relevant lines of code are:

      % Calculate perceived loudness of -20dB FS RMS pink noise
      % This is the SMPTE reference signal. It calibrates to:
      % 0dB on a studio meter / mixing desk
      % 83dB SPL in a listening environment (THIS IS WHAT WE'RE USING HERE)
      [ref_Vrms]=replaygain('ref_pink.wav',a1,b1,a2,b2);
        ...
      % Calculate the perceived loudness of the file using "replaygain" function
      % Subtract this from reference loudness to give replay gain adjustment to 83 dB reference
      Vrms=ref_Vrms-replaygain(current_file,a1,b1,a2,b2);

That's it. That's the value we store. To yield the actual replay gain, just add 83dB.

Suggestions and further work

Once the block length and percentile are standardised (I've chosen 50ms and 95%, but further testing may show these values can be tweaked), there is no need to calculate ref_Vrms every time the program is run - the value can be calculated once, then assigned directly.
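For reference, a stripped-down Python sketch of a statistic with those two parameters: split the signal into 50ms blocks, take each block's level in dB, sort the levels, and pick the one 95% of the way up. It deliberately leaves out the filtering that ReplayGain.m applies first (the a1, b1, a2, b2 coefficients), and the exact percentile indexing is an assumption, so treat it as an illustration rather than the real calculation.

```python
import math

BLOCK_SECONDS = 0.050  # block length chosen in the text
PERCENTILE = 0.95      # percentile chosen in the text

def block_levels_db(samples, sample_rate):
    """Mean-square level in dB of each complete 50ms block."""
    block_len = int(round(BLOCK_SECONDS * sample_rate))
    levels = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        ms = sum(x * x for x in block) / block_len
        # Small offset avoids log10(0) on digital silence.
        levels.append(10.0 * math.log10(ms + 1e-10))
    return levels

def representative_level_db(samples, sample_rate):
    """Level that 95% of the blocks do not exceed (indexing is an assumption)."""
    levels = sorted(block_levels_db(samples, sample_rate))
    if not levels:
        raise ValueError("signal is shorter than one block")
    index = min(len(levels) - 1, int(PERCENTILE * len(levels)))
    return levels[index]

# Toy signal: 90 quiet blocks followed by 10 loud blocks at a 1 kHz sample rate.
toy = [0.1] * (90 * 50) + [0.5] * (10 * 50)
toy_level = representative_level_db(toy, 1000)
```

With both parameters frozen, running this kind of calculation on the reference pink noise always gives the same number, so it could be stored as a constant instead of being recomputed on every run.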