A reader of my previous article asked why I didn't use a dual-socket Core 2 Duo system for the scaling comparison. The reason is simple: I couldn't find a single pair of SPEC 2006 results in which a single-socket and a dual-socket Core 2 Duo machine use the same CPU clock rate, memory technology, compiler, and operating system, so that a scientifically valid comparison could be made.
In this article I will relax a bit and not require exact matches among the candidate systems. I will use four x86_64 system models to show the "number of cores" and "clock rate" scaling of both Intel Core 2 Duo and AMD Opteron (K8).
Below are the system settings and their SPEC2006_rate scores. I use "ds" for dual-socket dual-core (4 cores), "qc" for single-socket quad-core (4 cores), and "dc" for single-socket dual-core (2 cores) -
Nothing is better than a picture to illustrate complicated data. Below is the performance graph of these systems. Green lines are for Fujitsu/AMD; blue lines for Fujitsu/Intel; red lines for Acer/Intel -
I couldn't resist the temptation, so below is a list of observations that I have to make:
First, with 2 cores, Core 2 Duo is undoubtedly the winner on both SPECint_rate and SPECfp_rate. With 4 cores, however, K8 becomes the better choice for SPECfp_rate. The more powerful a system is, the greater K8's advantage, due to its better "number of cores" scalability.
Second, Intel's FSB (front-side bus) is a bottleneck for 4 cores, even at 1066MHz. This is evident from the left-most points of the 4-core Core 2 systems (C2ds and C2qc), whose scores fall below the clock-scaling trend set by the remaining points. The system settings show that these lower-than-expected results come precisely from the systems with the 1066MHz FSB (vs. 1333MHz).
Third, the MCM quad-core could be a good cost- and power-saving option for single-socket home users and low-end servers. It almost matches the dual-socket Opteron on integer performance, although its floating-point performance still leaves something to be desired.
Fourth, the MCM quad-core does not scale well at or beyond 2.67GHz. You may cry: look, the 2.67GHz C2Q has an even lower SPECfp_rate than the 2.40GHz C2Q! There must be something wrong with the Fujitsu systems! Unfortunately, no. As of May 2007, all the reported 2.67GHz C2Q SPECfp_rate results I can find are "lower than expected." (The highest among them is 33.9 - less than 1% higher - but it uses FB-DIMM, unlike the other systems presented here.) This is probably why Intel is so late in introducing a higher-clocked Core 2 Quad - if it is not (much) better, why bother?
Fifth, the "clock rate" scaling of K8 performance slows down at 2.8GHz, especially for SPECfp_rate. Since all the Fujitsu Primergy RX330 systems are identical except for the CPU clock rate, the only explanation is that the larger processor-memory speed gap makes a higher CPU frequency less effective. Core 2 does not experience the same slowdown, probably due to its larger cache and better load/store circuits.
Sixth, doubling the L2 cache size helps Core 2 Duo about as much as one speed grade (0.16GHz). This is seen in the "jump" in the single-socket Core 2 Duo results (C2dc), where the left two points with 2MB L2 sit one step lower than the right three points with 4MB L2.
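For readers who want to put a number on the "number of cores" and "clock rate" scaling discussed in these observations, here is a minimal sketch. It assumes the simplest possible definition: the measured speedup divided by the ideal (linear) speedup. The function name `scaling_efficiency` and the scores in the example are hypothetical placeholders chosen for illustration; they are not taken from the SPEC results above.

```python
def scaling_efficiency(score_lo, score_hi, res_lo, res_hi):
    """Return the fraction of ideal linear scaling that was achieved.

    score_lo, score_hi: benchmark scores (e.g. SPECfp_rate) of the
        smaller and the larger configuration.
    res_lo, res_hi: the resource being scaled, either core count or
        clock rate in GHz, for the same two configurations.

    1.0 means perfect linear scaling; lower values mean the extra
    cores or clock cycles are partly wasted (for example, waiting on
    the memory system).
    """
    if min(score_lo, score_hi, res_lo, res_hi) <= 0:
        raise ValueError("scores and resources must be positive")
    speedup = score_hi / score_lo  # what was measured
    ideal = res_hi / res_lo        # what linear scaling would give
    return speedup / ideal


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    cores = scaling_efficiency(20.0, 34.0, 2, 4)      # 2 -> 4 cores
    clock = scaling_efficiency(30.0, 33.0, 2.4, 2.8)  # 2.4 -> 2.8GHz
    print(f"core scaling efficiency:  {cores:.2f}")
    print(f"clock scaling efficiency: {clock:.2f}")
```

Under this definition, a system whose score drops when the clock rate goes up, as in the fourth observation, shows up as an efficiency well below 1.0.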