CPU Architecture
The Intel Celeron 1.8 GHz uses the IA- While using more pipelines has several advantages, it also has a major drawback: to keep handling software instructions, the processor has to guess which instruction will come next whenever the code contains a test. With a pipelined CPU, the instructions that follow a test must be started before the processor knows the test result, so that the pipeline is continually fed. To decide which instructions to run, the CPU uses a ‘branch prediction’ mechanism: most of the time the CPU runs instructions it has already run before, so it can probably guess the result ahead of time. The Celeron has a BTB (branch target buffer) four times larger than the Pentium III's; it stores the history of previous test results in 4 KB of memory, which helps the processor make its decisions. If the CPU encounters a test that has already run, it takes the same branch as before in order to speed up its work. Pentium 4 and Celeron processors achieve more than 94% successful predictions (against only 90% for a Pentium III, which Intel claims to be a gain of 33%). But when a prediction fails, the speculatively started work is discarded and all the pipelines are flushed so that the CPU can restart the operation: this obviously slows down the whole computer.
The Celeron also supports ‘out of order’ execution, so that the ALUs are not blocked as they can be when instructions run strictly in order. Like every P6-based processor, the Celeron comes with two arithmetic logic units and one floating point unit, a design known as superscalar architecture (Pentium CPUs were the first to use it). The NetBurst architecture brings a major enhancement to the superscalar design, known as the Rapid Execution Engine: both the ALU (Arithmetic Logic Unit) and the AGU (Address Generation Unit, which computes the addresses where data are loaded and stored) run at twice the CPU frequency, so the processor can now handle four instructions per cycle rather than two before. For example, the Rapid Execution Engine on a 1.8 GHz Intel Celeron processor runs at 3.6 GHz.
For those of you who don’t know, ALU is the name given to the integer unit that manages math operations like dividing, adding and multiplying, as well as logical operators like ‘OR’, ‘AND’, ‘XOR’, etc. Just like every superscalar processor worthy of the name, the Pentium 4 still includes a ‘Micro Operation Operand’ Unit that works with simple instructions managed directly by the processor: most of the time, x86 instructions are converted into µOps.
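
To make the branch prediction idea above more concrete, here is a minimal Python sketch of a predictor that simply replays the last outcome of each test. It is a toy model, not Intel's actual BTB design: the class name `LastOutcomePredictor`, the 4096-entry capacity, the branch address `0x400` and the "not taken" default are all assumptions made for illustration.

```python
from collections import OrderedDict


class LastOutcomePredictor:
    """Toy predictor: remembers the last outcome seen for each branch address."""

    def __init__(self, capacity=4096):
        # Hypothetical entry count; the real BTB organisation is not modelled.
        self.capacity = capacity
        self.history = OrderedDict()

    def predict(self, branch_addr):
        # A branch never seen before is guessed "not taken" in this sketch.
        return self.history.get(branch_addr, False)

    def update(self, branch_addr, taken):
        # Record the real outcome and evict the oldest entry when full.
        self.history[branch_addr] = taken
        self.history.move_to_end(branch_addr)
        if len(self.history) > self.capacity:
            self.history.popitem(last=False)


def run(predictor, trace):
    """Return (hits, flushes) for a list of (branch address, taken) pairs."""
    hits = flushes = 0
    for addr, taken in trace:
        if predictor.predict(addr) == taken:
            hits += 1
        else:
            flushes += 1  # a wrong guess costs a pipeline flush
        predictor.update(addr, taken)
    return hits, flushes


# A loop test that is taken 9 times, then not taken once when the loop exits.
loop_trace = [(0x400, True)] * 9 + [(0x400, False)]
hits, flushes = run(LastOutcomePredictor(), loop_trace)
print(hits, flushes)  # 8 2
```

Even this naive scheme guesses 8 of the 10 tests correctly, because a loop test almost always repeats its previous outcome; the two misses are the first iteration and the loop exit.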
With the 486 DX4 and the Pentium, Intel introduced cache memory directly on the chip: it was a first, and it boosted performance. The Pentium III took this concept further by integrating on-die cache memory. The Celeron's cache memory has evolved too: the L1 cache now includes an 8 KB data cache (which is quite small when you know the PIII included a 32 KB one), while the L1 Instruction Cache was renamed Instruction Trace Cache since it has changed considerably as well. The Celeron L1 cache is four-way set associative with 64-byte cache lines, and thanks to its dual-port design it can load and store data at the same time.
The Trace Cache now stores instructions after they are converted from x86 into micro-ops, in the order they should be run, which saves processor cycles if a bad branch prediction occurs (since the alternative path is already stored in it). This also gives faster access to the most used instructions and avoids the problems the Pentium III could have with complex x86 instructions that were handled by slow decoders. The Trace Cache can store 12,000 micro-ops, which corresponds to an approximate size of 92 or 96 KB (Intel didn’t specify the exact size). Once µOps are in the trace cache, the Celeron can easily check for dependencies to make its branch predictions and ensure that the pipelines are continuously supplied with data: the trace cache can feed the pipeline with 6 µOps every 2 clocks.
The L1 cache access time is now about 1.4 nanoseconds (twice as fast as the Pentium III) and the bandwidth now reaches 41.7 GB/s (against 14.9 GB/s for a Pentium III). The L2 cache has also been enhanced. It reaches 128 KB and runs at the full frequency of the CPU (and not, as on the Pentium II or the first Pentium III, at half the nominal frequency of the CPU). We can only regret that the Celeron comes with such a limited L2 cache size.
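
The L1 data cache figures quoted above (8 KB, four-way set associative, 64-byte lines) are enough to work out how many sets the cache has and how an address would be split. The Python sketch below assumes a conventional tag / set index / line offset layout; that layout, the function name `split_address` and the sample address are illustrative assumptions, not details published in this article.

```python
CACHE_BYTES = 8 * 1024  # 8 KB L1 data cache
WAYS = 4                # four-way set associative
LINE_BYTES = 64         # 64-byte cache lines

# Each set holds WAYS lines, so the number of sets follows directly.
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # bits selecting a byte in a line
INDEX_BITS = NUM_SETS.bit_length() - 1     # bits selecting a set


def split_address(addr):
    """Split an address into (tag, set index, line offset).

    Assumes the textbook layout: low bits are the offset, the next bits
    pick the set, and the remaining high bits form the tag.
    """
    offset = addr & (LINE_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


print(NUM_SETS, OFFSET_BITS, INDEX_BITS)  # 32 6 5
print(split_address(0x12345))             # (36, 13, 5)
```

With only 32 sets of four lines each, two addresses that are 2 KB apart land in the same set, which gives a feel for how quickly such a small cache fills up.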
As a reminder, the more you reduce the L2 cache memory size, the more overall performance decreases.
Quad Pumped Front Side Bus
Being an entry-level processor, the Intel Celeron 1.8 GHz uses a front side bus of only 400 MHz: a Quad Pumped 64-bit bus clocked at 100 MHz, for an overall bandwidth of 3051 MB/s. Intel used a technical trick so that the FSB performs four 64-bit transfers per cycle, making it work like a normal “400 MHz” one. Not only does this bus improve performance, it also speeds up the exchange of data between the CPU, the memory and the rest of the system components.
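
The bus figures above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation; the function name `fsb_bandwidth_bytes_per_s` is made up for the example. It suggests that the 3051 MB/s figure counts megabytes as 2^20 bytes, since the same bandwidth is 3200 MB/s in decimal megabytes.

```python
BASE_CLOCK_HZ = 100_000_000  # 100 MHz bus clock
TRANSFERS_PER_CLOCK = 4      # "Quad Pumped"
BUS_WIDTH_BITS = 64


def fsb_bandwidth_bytes_per_s(clock_hz, transfers_per_clock, width_bits):
    """Peak bandwidth = clock x transfers per clock x bytes per transfer."""
    return clock_hz * transfers_per_clock * width_bits // 8


effective_mhz = BASE_CLOCK_HZ * TRANSFERS_PER_CLOCK // 1_000_000
bandwidth = fsb_bandwidth_bytes_per_s(
    BASE_CLOCK_HZ, TRANSFERS_PER_CLOCK, BUS_WIDTH_BITS
)

print(effective_mhz)            # 400
print(bandwidth // 10**6)       # 3200 (decimal MB/s)
print(int(bandwidth / 2**20))   # 3051 (MB/s with 1 MB = 2**20 bytes)
```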