A personal record of understanding, deciphering, speculating and predicting the development of modern microarchitecture designs.

Tuesday, May 15, 2007

Multi-core scalability (lacking) of Intel Core 2 Duo

It's been over 9 months since Intel released the Core 2 Duo processors. Praise for this processor and its multi-chip module (MCM) quad-core brother, the Core 2 Quad, abounds on the Internet. With this line of processors, Intel is going back to its roots: marketing to, and profiting from, personal (as opposed to high-performance or big-server) computing. In other words, while Core 2 Duo/Quad works great for home projects (video encoding, gaming, etc.), it does not scale well to larger, heavier-duty setups.

Enough talk; let's see some proof from the industry-standard SPECint_rate and SPECfp_rate benchmarks. We will look only at the base scores from the new SPEC CPU2006 benchmark suite.

First we look at how well Core 2 Quad scales from Core 2 Duo:
    [SPECint_rate_base2006]
  • Intel Xeon 3060 2.4GHz, 2 cores/1 chip, 1066MHz FSB - 26.0
  • Intel Xeon X3220 2.4GHz, 4 cores/1 chip, 1066MHz FSB - 43.4 (1.67x)
    [SPECfp_rate_base2006]
  • Intel Xeon 3060 2.4GHz, 2 cores/1 chip, 1066MHz FSB - 22.4
  • Intel Xeon X3220 2.4GHz, 4 cores/1 chip, 1066MHz FSB - 33.5 (1.50x)
The above shows that if you buy a Core 2 Quad, you really get just 3.3 cores' worth of performance on average integer workloads, and only 3 cores' worth on floating-point. In other words, the architecture already lacks scalability at four cores.
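The "effective cores" arithmetic is simple: divide the 4-core score by the 2-core score, then multiply by the 2 baseline cores. A minimal Python sketch using only the scores listed above (the function names are illustrative, not part of any SPEC tool):

```python
# Scaling factor and "effective cores" from the SPECrate base scores above
# (Xeon 3060, 2 cores -> Xeon X3220, 4 cores).

def scaling(base_score: float, scaled_score: float) -> float:
    """Ratio of the larger configuration's score to the smaller one's."""
    return scaled_score / base_score


def effective_cores(base_score: float, base_cores: int, scaled_score: float) -> float:
    """How many baseline cores' worth of throughput the larger system delivers."""
    return scaling(base_score, scaled_score) * base_cores


results = {
    "SPECint_rate_base2006": (26.0, 43.4),
    "SPECfp_rate_base2006": (22.4, 33.5),
}

for metric, (duo, quad) in results.items():
    print(f"{metric}: {scaling(duo, quad):.2f}x, "
          f"{effective_cores(duo, 2, quad):.1f} of 4 cores")
```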

In contrast, let's look at how well AMD's Opteron (K8) scales to multiple cores:
    [SPECint_rate_base2006]
  • AMD Opteron 854 2.8GHz, 2 cores/2 chips - 22.3
  • AMD Opteron 854 2.8GHz, 4 cores/4 chips - 41.4 (1.86x)
  • AMD Opteron 2210 1.8GHz, 2 cores/1 chip - 17.3
  • AMD Opteron 2210 1.8GHz, 4 cores/2 chips - 34.3 (1.98x)
    [SPECfp_rate_base2006]
  • AMD Opteron 854 2.8GHz, 2 cores/2 chips - 24.1
  • AMD Opteron 854 2.8GHz, 4 cores/4 chips - 45.6 (1.89x)
  • AMD Opteron 2210 1.8GHz, 2 cores/1 chip - 17.6
  • AMD Opteron 2210 1.8GHz, 4 cores/2 chips - 34.8 (1.98x)
What we see here is that, for a total of 4 cores per system, not only dual-core Opterons but even single-core Opterons connected by cHT (coherent HyperTransport) links scale much better than the two Core 2 Duo dies sitting on the Core 2 Quad's MCM. Note that the absolute numbers in the different cases above are not directly comparable to each other, since they use different CPU clock rates, memory technologies, operating systems, and compilers; only the scaling ratio within each pair is meaningful.
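One way to put the Intel and AMD 2-to-4-core results on a common footing is to divide each measured ratio by the ideal 2.00x. The sketch below does that for the six pairs quoted so far; the "efficiency" metric and the labels are this sketch's own framing, and only ratios, never absolute scores, are compared. Sorted from worst to best, the two MCM entries land at the bottom.

```python
# Scaling efficiency = measured speedup / ideal speedup, for the
# 2-to-4-core pairs listed above. Labels are descriptive only.

pairs = [
    # (label, metric, 2-core score, 4-core score)
    ("Xeon 3060 -> X3220 (MCM)", "int", 26.0, 43.4),
    ("Xeon 3060 -> X3220 (MCM)", "fp", 22.4, 33.5),
    ("Opteron 854, 2 -> 4 chips", "int", 22.3, 41.4),
    ("Opteron 854, 2 -> 4 chips", "fp", 24.1, 45.6),
    ("Opteron 2210, 1 -> 2 chips", "int", 17.3, 34.3),
    ("Opteron 2210, 1 -> 2 chips", "fp", 17.6, 34.8),
]

IDEAL_SPEEDUP = 4 / 2  # doubling the core count


def efficiency(small: float, large: float, ideal: float = IDEAL_SPEEDUP) -> float:
    """Fraction of the ideal speedup actually achieved."""
    return (large / small) / ideal


# Print worst-scaling configurations first.
for label, metric, small, large in sorted(pairs, key=lambda p: efficiency(p[2], p[3])):
    print(f"{label:28s} {metric:3s} {efficiency(small, large):6.1%}")
```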

Now let's look at how well Core 2 Duo scales to a multi-core, multi-processor setup:
    [SPECint_rate_base2006]
  • Intel Xeon X5355 2.67GHz, 4 cores/1 chip, 1333MHz FSB - 45.9
  • Intel Xeon X5355 2.67GHz, 8 cores/2 chips, 1333MHz FSB - 78.0 (1.70x)
    [SPECfp_rate_base2006]
  • Intel Xeon X5355 2.67GHz, 4 cores/1 chip, 1333MHz FSB - 33.9
  • Intel Xeon X5355 2.67GHz, 8 cores/2 chips, 1333MHz FSB - 56.3 (1.66x)
Again, the scalability is sorely lacking: you get only 6.8 and 6.6 cores' worth of performance from an 8-core setup for integer and floating-point code, respectively.
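The 6.8 and 6.6 figures follow the same recipe with 4 baseline cores instead of 2. This sketch also reports how many cores' worth of throughput the 8-core system falls short by; that "lost" number is simply 8 minus the effective count, an illustrative framing rather than a SPEC metric.

```python
# Effective cores for the Xeon X5355 results above (1 chip -> 2 chips):
# speedup x 4 baseline cores, plus the shortfall versus 8 installed cores.

X5355 = {
    "SPECint_rate_base2006": (45.9, 78.0),
    "SPECfp_rate_base2006": (33.9, 56.3),
}

BASE_CORES, TOTAL_CORES = 4, 8


def effective_cores(base: float, scaled: float, base_cores: int = BASE_CORES) -> float:
    """Baseline cores' worth of throughput delivered by the larger system."""
    return scaled / base * base_cores


for metric, (one_chip, two_chips) in X5355.items():
    eff = effective_cores(one_chip, two_chips)
    print(f"{metric}: {eff:.1f} effective cores, "
          f"{TOTAL_CORES - eff:.1f} cores' worth lost")
```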

In contrast, let's look at how Opteron scales from 4 cores to 8. This time we use only dual-core Opteron processors for the comparison:
    [SPECint_rate_base2006]
  • AMD Opteron 2222SE 3.0GHz, 4 cores/2 chips - 44.6
  • AMD Opteron 2222SE 3.0GHz, 8 cores/4 chips - 84.4 (1.89x)
    [SPECfp_rate_base2006]
  • AMD Opteron 2222SE 3.0GHz, 4 cores/2 chips - 47.3
  • AMD Opteron 2222SE 3.0GHz, 8 cores/4 chips - 89.8 (1.90x)
Unsurprisingly, for a total of 8 cores per system, the dual-core Opterons also scale much better than the quad-core Xeons.
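A side-by-side view of the two 4-to-8-core comparisons, again computed only from the listed scores (a sketch; the dictionary layout is arbitrary):

```python
# 4-to-8-core speedups for the Xeon X5355 and Opteron 2222SE results above.
# The ideal speedup for doubling the core count is 2.00x.

data = {
    ("Xeon X5355", "int"): (45.9, 78.0),
    ("Xeon X5355", "fp"): (33.9, 56.3),
    ("Opteron 2222SE", "int"): (44.6, 84.4),
    ("Opteron 2222SE", "fp"): (47.3, 89.8),
}

speedups = {key: eight / four for key, (four, eight) in data.items()}

for metric in ("int", "fp"):
    intel = speedups[("Xeon X5355", metric)]
    amd = speedups[("Opteron 2222SE", metric)]
    print(f"{metric}: Xeon {intel:.2f}x vs Opteron {amd:.2f}x "
          f"(gap {amd - intel:+.2f}x)")
```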

What is interesting above is that, for Core 2 Duo, the 4-to-8-core scaling is actually better than the 2-to-4-core scaling. This is probably because the 8-core system has a 25% faster FSB (1333MHz vs. 1066MHz), plus a chipset intelligent enough to separate traffic to and from the two quad-core processors (rather than the dumb MCM connection the Core 2 Quad has internally). This also shows that (1) Intel's FSB design is the bottleneck for multi-core scaling even at quad-core, and (2) the MCM quad-core is an even worse approach to scaling performance across multiple cores.
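Both quantitative points in the paragraph above can be checked from numbers already given: the FSB clock ratio, and Intel's 2-to-4 versus 4-to-8 speedups. A short sketch, treating 1066 and 1333 as the nominal FSB rates (the exact rates, 1066.67 and 1333.33 MT/s, give essentially the same ratio):

```python
# 1. How much faster is the 1333MHz FSB than the 1066MHz one?
# 2. Does Intel's 4->8 scaling really beat its 2->4 scaling?
# Nominal FSB rates; exact rates are 1066.67 and 1333.33 MT/s
# (quad-pumped 266.67MHz and 333.33MHz bus clocks).

fsb_gain = 1333 / 1066 - 1

two_to_four = {"int": 43.4 / 26.0, "fp": 33.5 / 22.4}    # Xeon 3060 -> X3220
four_to_eight = {"int": 78.0 / 45.9, "fp": 56.3 / 33.9}  # X5355, 1 -> 2 chips

print(f"FSB clock gain: {fsb_gain:.0%}")
for metric in ("int", "fp"):
    better = four_to_eight[metric] > two_to_four[metric]
    print(f"{metric}: 2->4 {two_to_four[metric]:.2f}x, "
          f"4->8 {four_to_eight[metric]:.2f}x, 4->8 better: {better}")
```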

In conclusion: Intel's Core 2 Duo could well be the fastest processor for home computers (or dual-core, single-processor servers), where it is worth paying a bit more for faster video encoding and AI-intensive gaming. On the other hand, the hard evidence above shows that for servers that scale to 4 cores or higher, today's dual-core Opteron is a far better choice. This is probably due to both Opteron's Direct-Connect architecture and its integrated memory controller, which AMD implemented in 2003 and which Intel will follow with its next major processor release (Nehalem) in late 2008.

4 comments:

Ho Ho said...

Is it just a coincidence that for Intel you chose mostly Windows and for AMD mostly Linux platforms? Also, are those results the highest in their category, or simply chosen at random?

Anonymous said...

Why no comparison of Intel 2-core/1P to 4-core/2P to look at scaling? That would seem to be the best apples-to-apples comparison with AMD 2-core/1P to 4-core/2P.


"or servers that scale to 4 cores or higher, today's dual-core Opteron is a far better choice."

Is this also true from a cost perspective? SW licensing starts getting pretty pricey, and you are comparing against a current AMD solution that has twice as many sockets (and therefore licenses).

There is no data in 4-socket in above AMD is the best choice, but to look at spec_rate_base benchmarks ONLY and conclude AMD is a "far better choice" would seem to be a bit narrow-minded.

Anonymous said...

8 Intel cores are cheaper than 8 AMD cores.

abinstein said...

Ho Ho -
"Is it just a coincidence that for Intel you chose mostly Windows and for AMD mostly Linux platforms?"

Good question. No, I didn't make any effort to choose different operating systems for different processors, but I did try to find comparisons where identical settings were used for each processor except for the number of cores.

This could have an effect if the Windows and Linux schedulers differ greatly in their ability to scale to 4 or 8 cores, though I believe that is not very likely.
