The human ear does not perceive all frequencies as equally loud. For example, a full-scale sine wave at 1 kHz sounds much louder than a full-scale sine wave at 10 kHz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher-Munson curves).
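The filtering step can be sketched with a generic IIR filter routine. This is not the actual equal loudness filter. We assume, as this filter is commonly described, that it is a cascade of a higher-order Yule-Walker IIR stage and a 2nd-order Butterworth high-pass. The sketch reproduces only the Butterworth high-pass stage, built from the standard bilinear-transform biquad formulas, and does not give coefficients for the Yule-Walker stage. The 150 Hz cutoff and the function names are assumptions chosen for illustration.

```python
import math

def butterworth_highpass(cutoff_hz, sample_rate):
    """Return normalised (b, a) coefficients of a 2nd-order Butterworth
    high-pass biquad (bilinear-transform form). Cutoff is an assumption."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    q = 1.0 / math.sqrt(2.0)              # quality factor giving a Butterworth response
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def iir_filter(b, a, x):
    """Direct-form I IIR filter; a[0] is assumed to be 1. Stages can be
    cascaded by feeding the output of one call into the next."""
    x_prev = [0.0] * len(b)          # x[n], x[n-1], ... (most recent first)
    y_prev = [0.0] * (len(a) - 1)    # y[n-1], y[n-2], ... (most recent first)
    out = []
    for sample in x:
        x_prev = [sample] + x_prev[:-1]
        y = sum(bk * xk for bk, xk in zip(b, x_prev))
        y -= sum(ak * yk for ak, yk in zip(a[1:], y_prev))
        y_prev = [y] + y_prev[:-1]
        out.append(y)
    return out
```

For example, `iir_filter(*butterworth_highpass(150.0, 48000), samples)` removes DC and very low frequencies while passing mid and high frequencies almost unchanged.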
Next, the energy in each short section of the signal is determined by calculating the root mean square (RMS) of the waveform over every 50 ms block.
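A minimal sketch of the 50 ms RMS step follows. It assumes mono floating-point samples in the range -1.0 to 1.0 and expresses each block's level in dB relative to full scale. The function name, the dB conversion, and the small floor that avoids taking the logarithm of zero are illustrative choices, not part of the original description. A trailing partial block is ignored, and stereo handling is not shown.

```python
import math

def block_rms_db(samples, sample_rate, block_ms=50.0):
    """Split samples into consecutive blocks of block_ms milliseconds and
    return the RMS level of each complete block in dB relative to 1.0."""
    block_len = int(sample_rate * block_ms / 1000.0)
    levels = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        mean_square = sum(s * s for s in block) / block_len
        # 10*log10(mean square) equals 20*log10(RMS); the floor avoids log10(0).
        levels.append(10.0 * math.log10(max(mean_square, 1e-10)))
    return levels
```

At 48 kHz each block is 2400 samples, so one second of audio yields 20 block levels.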
Where the average energy level of a signal varies over time, the louder moments contribute most to our perception of overall loudness. For example, in human speech over half the time is silence, yet this does not affect the perceived loudness of the talker at all! For this reason, the RMS values are sorted from loudest to quietest, and the value 5% of the way down the list (the 95th percentile) is chosen to represent the overall perceived loudness of the signal.
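The selection step can be sketched as below. The way the index is rounded, `int(len * 0.05)`, is an assumption. Implementations may round or interpolate differently, and the function name is hypothetical.

```python
def loudness_from_blocks(levels, fraction=0.05):
    """Sort block levels from loudest to quietest and return the value
    `fraction` of the way down the list (the 95th percentile by default)."""
    if not levels:
        raise ValueError("need at least one block level")
    ordered = sorted(levels, reverse=True)
    index = min(int(len(ordered) * fraction), len(ordered) - 1)
    return ordered[index]
```

In a speech-like case where 60 of 100 blocks are silent (-100 dB) and 40 blocks sit at -20 dB, the function still returns -20 dB, so the silences do not pull the loudness estimate down.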
A suitable average replay level is 83 dB SPL. The SMPTE has defined a calibration relating the energy of a digital signal to the real-world replay level. Using this calibration, we subtract the measured level of the signal from the desired (calibrated) level to give the difference, and we store this difference in the audio file.
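The subtraction can be sketched as follows. `REFERENCE_DB` is a hypothetical placeholder for the digital level that the SMPTE calibration maps to 83 dB SPL, measured the same way as the signal. It is not the value derived from that calibration. The conversion from a dB gain to a linear scale factor on replay is the standard `10 ** (dB / 20)` relation.

```python
# Hypothetical placeholder: digital level (dB re full scale) that the
# calibration maps to 83 dB SPL. Not the real calibrated value.
REFERENCE_DB = -20.0

def replay_gain_difference(loudness_db, reference_db=REFERENCE_DB):
    """Difference stored in the file: desired (calibrated) level minus
    the measured loudness of the signal, both in dB."""
    return reference_db - loudness_db

def gain_to_scale(gain_db):
    """Convert a gain in dB to the linear factor applied to samples on replay."""
    return 10.0 ** (gain_db / 20.0)
```

With this placeholder, a track measured at -14 dB gets a stored difference of -6 dB, which corresponds to scaling the samples by roughly 0.5 on replay.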
The 83 dB calibration level can be added to the difference from the previous calculation to yield the actual Replay Gain. NOTE: we store the difference, NOT the actual Replay Gain.
The whole process has been implemented in MATLAB. The required files are ReplayGainScript.m, which batch-processes .wav files, and ReplayGain.m, which carries out most of the actual calculation. You will also need EqualLoudFilt.m to design the equal loudness filter.
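For readers without MATLAB, the batch-processing part can be sketched in Python using the standard `wave` module. This is not a translation of ReplayGainScript.m. It handles only 16-bit PCM files, and it averages the channels to mono, which is a simplification. The analysis function is passed in by the caller. `read_wav_mono` and `batch_process` are hypothetical names.

```python
import array
import glob
import os
import sys
import wave

def read_wav_mono(path):
    """Read a 16-bit PCM .wav file and return (samples, sample_rate), with
    channels averaged to mono and samples scaled to the range [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path}: only 16-bit PCM is handled in this sketch")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    data = array.array("h", raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        data.byteswap()
    mono = []
    for i in range(0, len(data), channels):
        frame = data[i:i + channels]
        mono.append(sum(frame) / (channels * 32768.0))
    return mono, rate

def batch_process(folder, analyse):
    """Apply analyse(samples, rate) to every .wav file in folder and return
    a dict mapping each file name to its result."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.wav"))):
        samples, rate = read_wav_mono(path)
        results[os.path.basename(path)] = analyse(samples, rate)
    return results
```

The `analyse` callback is where the filtering, 50 ms RMS, percentile, and calibration steps would be chained.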
This calculation is currently being implemented in C and C++ by two volunteers.