Version 3.0

This page gives a comprehensive overview of various topics related to audio compression. Because many external pages link to this page, we decided to maintain it, and we recommend it as a comprehensive overview that can be printed for off-line reading. Please send questions and comments to amm_info@iis.fhg.de

Q: O.K., Layer-3 is obviously a key to many applications. Where are its limitations?
A: Well, MPEG Layer-3 is a perceptual audio coding scheme, exploiting the properties of the human ear, and trying to maintain the original sound quality as far as possible.

Q: You mentioned the codec delay. May I have some figures?
A: Well, the standard gives some figures of the theoretical minimum delay:

Q: What is "MPEG"?
A: MPEG is the "Moving Picture Experts Group", working under the joint direction of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). This group works on standards for the coding of moving pictures and audio. MPEG has created its own homepage, providing information on the what, where, when and how of the standards.

Q: Are MPEG-3 and Layer-3 the same thing?
A: No! Layer-3 is a powerful audio coding scheme which certainly is part of the MPEG standard. Layer-3 is defined within the audio part of both existing international standards, MPEG-1 and MPEG-2.

Q: How do I get the MPEG documents?
A: Well, you may contact ISO, or you can order them from your national standards body. E.g., in Germany, please contact DIN.

Q: Is some public C source available?
A: Well, there is "public C source" available on various sites, e.g. at ftp://ftp.iis.fhg.de/pub/layer3/public_c/. This code has been written mainly for explanation purposes, so do not expect too much performance.

What about Layer-1, Layer-2, Layer-3?

Q: Talking about MPEG audio, I always hear "Layer 1, 2 and 3". What does it mean?
A: MPEG describes the compression of
audio signals using high performance perceptual coding schemes. It
specifies a family of three audio coding schemes, simply called Layer-1,
Layer-2, and Layer-3. From Layer-1 to Layer-3, encoder complexity and
performance (sound quality per bitrate) are increasing.

Q: So we have a family of three audio coding schemes. What does the MPEG standard define, exactly?
A: For each Layer, the standard specifies the bitstream format and the decoder. To allow for future improvements, it does not specify the encoder, but an informative chapter gives an example of an encoder for each Layer.

Q: What do the three audio Layers have in common?
A: All Layers use the same basic structure. The coding scheme can be described as "perceptual noise shaping" or "perceptual subband / transform coding". The encoder analyzes the spectral components of the audio signal by calculating a filterbank (transform) and applies a psychoacoustic model to estimate the just noticeable noise-level. In its quantization and coding stage, the encoder tries to allocate the available number of data bits in a way that meets both the bitrate and masking requirements. The decoder is much less complex. Its only task is to synthesize an audio signal out of the coded spectral components.
- All Layers use the same analysis filterbank (polyphase with 32 subbands). Layer-3 adds an MDCT to increase the frequency resolution.
- All Layers use the same "header information" in their bitstream, to support the hierarchical structure of the standard.
- All Layers have a similar sensitivity to bit errors. They use a bitstream structure that contains parts that are more sensitive to bit errors ("header", "bit allocation", "scalefactors", "side information") and parts that are less sensitive ("data of spectral components").
- All Layers support the insertion of program-associated information ("ancillary data") into their audio data bitstream.
- All Layers may use 32, 44.1 or 48 kHz sampling frequency.
- All Layers are allowed to work with similar bitrates:
The last two statements refer to MPEG-1; with MPEG-2, there is an extension for the sampling frequencies and bitrates (see below).

Q: What are the main differences between the three Layers, from a global view?
A: From Layer-1 to Layer-3, complexity increases (mainly true for the encoder), overall codec delay increases, and performance increases (sound quality per bitrate).

Q: What are the main differences between MPEG-1 and MPEG-2 in the audio part?
A: MPEG-1 and MPEG-2 use the same family of audio codecs, Layer-1, -2 and -3. The new audio features of MPEG-2 are:
- a "low sample rate extension" to address very low bitrate applications with limited bandwidth requirements (the new sampling frequencies are 16, 22.05 or 24 kHz, the bitrates extend down to 8 kbps)
- a "multichannel extension" to address surround sound applications with up to 5 main audio channels (left, center, right, left surround, right surround) and optionally 1 extra "low frequency enhancement (LFE)" channel for subwoofer signals; in addition, a "multilingual extension" allows the inclusion of up to 7 more audio channels

Q: Is this all compatible with each other?
A: Well, more or less, yes - with the exception of the low sample rate extension. Obviously, a pure MPEG-1 decoder is not able to handle the new half sample rates.

Q: You mean: compatible!? With all these extra audio channels? Please explain!
A: Compatibility has been a major topic during the MPEG-2 definition phase. The main idea is to use the same basic bitstream format as defined in MPEG-1, with the main data field carrying two audio signals (called L0 and R0) as before, and the ancillary data field carrying the multichannel extension information.
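The two compatibility points made so far can be illustrated with a toy model. This is only a sketch, not the real bitstream syntax: `ToyFrame`, `mpeg1_decode` and `mpeg2_decode` are hypothetical names, and the payloads are placeholder strings rather than coded audio. It assumes nothing beyond what is said above: a pure MPEG-1 decoder handles only the full sample rates and reads only the main data field, while an MPEG-2 decoder also accepts the half sample rates and reads the extension information.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Sampling frequencies named in the text, in Hz.
MPEG1_SAMPLE_RATES_HZ = (32000, 44100, 48000)
# The MPEG-2 low sample rate extension uses half of each MPEG-1 rate.
MPEG2_LOW_SAMPLE_RATES_HZ = tuple(rate // 2 for rate in MPEG1_SAMPLE_RATES_HZ)


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for one audio frame, not the real bitstream syntax."""
    sample_rate_hz: int
    main_data: Tuple[str, str]            # placeholders for the signals L0 and R0
    ancillary_data: Optional[str] = None  # placeholder for multichannel extension info


def mpeg1_decode(frame: ToyFrame) -> Tuple[str, str]:
    """Model of a pure MPEG-1 decoder: full sample rates only, main data only."""
    if frame.sample_rate_hz not in MPEG1_SAMPLE_RATES_HZ:
        raise ValueError(f"{frame.sample_rate_hz} Hz is not an MPEG-1 sampling frequency")
    return frame.main_data  # the ancillary data field is simply skipped


def mpeg2_decode(frame: ToyFrame) -> Tuple[Tuple[str, str], Optional[str]]:
    """Model of an MPEG-2 decoder: accepts both rate families, reads the extension."""
    allowed = MPEG1_SAMPLE_RATES_HZ + MPEG2_LOW_SAMPLE_RATES_HZ
    if frame.sample_rate_hz not in allowed:
        raise ValueError(f"{frame.sample_rate_hz} Hz is not supported in this toy model")
    return frame.main_data, frame.ancillary_data


surround = ToyFrame(48000, ("L0", "R0"), ancillary_data="multichannel extension")
low_rate = ToyFrame(24000, ("L0", "R0"))

print(mpeg1_decode(surround))       # only L0 and R0 are recovered
print(mpeg2_decode(surround))       # L0, R0 and the extension information
print(MPEG2_LOW_SAMPLE_RATES_HZ)    # the half sample rates
```

Passing `low_rate` to `mpeg1_decode` raises a `ValueError` in this model, which mirrors the exception for the low sample rate extension mentioned above.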
Without going further into details, two terms should be explained here:
- "forwards compatible": the MPEG-2 decoder has to accept any MPEG-1 audio bitstream (that represents one or two audio channels)
- "backwards compatible": the MPEG-1 decoder should be able to decode the audio signals in the main data field (L0 and R0) of the MPEG-2 bitstream

"Matrixing" may be used to get the surround information into L0 and R0:

L0 = left signal + a * center signal + b * left surround signal
R0 = right signal + a * center signal + b * right surround signal

Therefore, an MPEG-1 decoder can reproduce a comprehensive downmix of the full 5-channel information. An MPEG-2 decoder uses the multichannel extension information (3 more audio signals) to reconstruct the five surround channels.

Q: In your footnotes, you indicate the use of some "non-ISO" extension inside your Fraunhofer codec, called "MPEG 2.5", to further improve the performance at very low bitrates (e.g. 8 kbps mono). What do you mean by this?
A: Oh, yes. Well, the MPEG-2 standard allows bitrates as low as 8 kbps, for the low sample rate extension. At such a low bitrate, the useful audio bandwidth has to be limited anyway, e.g. to 3 kHz. Therefore, the actual sample rate could be reduced, e.g. to 8 kHz. The lower the sample rate, the better the frequency resolution, the worse the time resolution, and the better the ratio between control information and audio payload inside the bitstream format. As the MPEG-2 standard defines 16 kHz as lowest sample rate, we introduced a further extension, again dividing the low sample rates of MPEG-2 by 2, i.e. we introduced 8, 11.025, and 12 kHz - and we named this extension to the extension "MPEG 2.5". "Layer-3" performs significantly better with 8 kbps @ 8 kHz or 16 kbps @ 11 kHz than with 8 or 16 kbps @ 16 kHz.

Advanced Features of Layer-3 - or: Why does Layer-3 perform so well?

Q: Well, I read your statement about "CD-like" performance, achieved at a data reduction of 4:1 (or 384 kbps total bitrate) with Layer-1, 6..8:1 (or 256..192 kbps total bitrate) with Layer-2, and 12..14:1 (or 128..112 kbps total bitrate) with Layer-3. Can you explain a little further?
A: Well, each audio Layer extends the features of the Layer with the lower number. The simplest form is Layer-1. It has been designed mainly for the DCC (Digital Compact Cassette), where it is used at 384 kbps (called "PASC"). Layer-2 has been designed as a trade-off between complexity and performance. It achieves a good sound quality at bitrates down to 192 kbps. Below that, sound quality suffers. Layer-3 has been designed for low bitrates right from the start. It adds a number of "advanced features" to Layer-2:
- the frequency resolution is 18 times higher, which allows a Layer-3 encoder to adapt the quantisation noise much better to the masking threshold
- only Layer-3 uses entropy coding (like MPEG video) to further reduce redundancy
- only Layer-3 uses a bit reservoir (like MPEG video) to suppress artefacts in critical moments
- Layer-3 may use more advanced joint-stereo coding methods

Q: I see. Now, tell me more about sound quality. How do you assess that?
A: Today, there is no alternative to expensive listening tests. During the ISO-MPEG process, a number of international listening tests have been performed, with a lot of trained listeners. All these tests used the "triple stimulus, hidden reference" method and the "CCIR impairment scale" to assess the sound quality. The listening sequence is "ABC", with A = original, BC = pair of original / coded signal in random order, and the listener has to evaluate both B and C with a number between 1.0 and 5.0.
The meaning of these values is:
5.0 = transparent (this should be the original signal)
4.0 = perceptible, but not annoying (first differences noticeable)
3.0 = slightly annoying
2.0 = annoying
1.0 = very annoying

Q: Listening tests are certainly an expensive task. Is there really no alternative?
A: Well, at least not today.
Tomorrow may be different. To assess sound quality with perceptual codecs,
all traditional "quality" parameters (like signal-to-noise
ratio, total harmonic distortion, bandwidth) are rather useless, as any
codec may introduce noise and distortions as long as these do not affect
the perceived sound quality. So, listening tests are necessary, and, if
carefully prepared and performed, they lead to rather reliable results.

Q: O.K., back to these listening tests and the performance evaluation. Come on, tell me some results.
A: Well, for more details you should study one of these AES papers or the MPEG documents. For MPEG Layer-3, the main result is that its performance was always superior at low bitrates (64 kbps per audio channel or below). Well, this is not completely surprising, as MPEG Layer-3 uses the same tool set as Layer-2, but with some additional advanced coding features that all address the demands of very low bitrate coding. One impressive example is the ISO-MPEG listening test carried out in September 94 at NTT Japan (doc. ISO/IEC JTC1/SC29/WG11 N0848, 11 Nov. 94). Another interesting result is the conclusion of the task group TG 10/2 within the ITU-R, which recommends the use of low bit-rate audio coding schemes for digital sound-broadcasting applications (ITU-R doc. BS.1115).

Q: Very interesting! Tell me more about this recommendation!
A: The task group TG 10/2 finished its work in 10/93. The recommendation defines three fields of broadcast applications and recommends:
- Layer-2 with 180 kbps per channel for distribution and contribution links (20 kHz bandwidth, no audible impairments with up to 5 cascaded codecs)
- Layer-2 with 128 kbps per channel for emission (20 kHz bandwidth)
- MPEG Layer-3 with 60 (120) kbps for mono (stereo) signals for commentary links (15 kHz bandwidth)
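The bitrates quoted on this page, including those of the ITU-R recommendation, can be related to uncompressed audio with a little arithmetic. The sketch below assumes 16-bit linear PCM at 48 kHz as the reference format; that reference is an assumption of this example and is not stated in the text. The helper names `pcm_kbps` and `reduction_ratio` are hypothetical.

```python
# Assumed reference format (not stated in the text): 16-bit linear PCM at 48 kHz.
PCM_BITS_PER_SAMPLE = 16
PCM_SAMPLE_RATE_HZ = 48000


def pcm_kbps(channels: int) -> float:
    """Bitrate of the assumed uncompressed reference, in kbps."""
    return PCM_SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE * channels / 1000


def reduction_ratio(coded_kbps: float, channels: int) -> float:
    """Data reduction ratio of a coded bitrate against the assumed PCM reference."""
    return pcm_kbps(channels) / coded_kbps


# (label, coded bitrate in kbps, number of channels the bitrate covers)
EXAMPLES = [
    ("Layer-1, 384 kbps total (stereo)", 384, 2),
    ("Layer-2, 256 kbps total (stereo)", 256, 2),
    ("Layer-2, 192 kbps total (stereo)", 192, 2),
    ("Layer-3, 128 kbps total (stereo)", 128, 2),
    ("Layer-3, 112 kbps total (stereo)", 112, 2),
    ("ITU-R distribution/contribution, Layer-2, 180 kbps per channel", 180, 1),
    ("ITU-R emission, Layer-2, 128 kbps per channel", 128, 1),
    ("ITU-R commentary, Layer-3, 60 kbps mono", 60, 1),
]

for label, kbps, channels in EXAMPLES:
    print(f"{label}: about {reduction_ratio(kbps, channels):.1f}:1")
```

Under this assumption the stereo figures come out at 4.0, 6.0, 8.0, 12.0 and about 13.7 to 1, which is consistent with the 4:1, 6..8:1 and 12..14:1 ratios quoted earlier on this page. A different reference, such as 44.1 kHz, would give slightly lower ratios.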