A personal record of understanding, deciphering, speculating and predicting the development of modern microarchitecture designs.

Monday, September 10, 2007

Scalability counts!

As I have said in this article, Intel's new Core 2 line of processors has good cores but a poor system architecture. The poor scalability of the FSB means that Core 2, without extensive, expensive, and power-hungry chipset support, is only suitable for low-end personal enjoyment.

Take a look at this AnandTech benchmark. I'd note first that AnandTech is hardly an AMD-favoring online "journal"; thus we can expect its report to be at worst Intel-biased and at best neutral (which I'm hoping for here). At any rate, the benchmark picture is reproduced below:

The comparison between Barcelona (Opteron 2350 2.0GHz) and Clovertown (Xeon E5345 2.33GHz) couldn't be clearer: FSB is an outdated system architecture for today's high-end computing, and scalability does matter for server & workstation grade performance. While AMD's quad-core Opteron at 2.0GHz is slower than Intel's quad-core Xeon at 2.33GHz on the single-socket test, the situation is reversed when going to a dual-socket setup, the one used by most workstations and entry-level servers.

The same phenomenon is also observed in this page, where AMD's quad-core Opteron, despite the Xeon's roughly 17% clock rate advantage, performs increasingly better than Intel's quad-core Xeon as the number of cores in use grows (picture reproduced below). Again, when it comes to server & workstation performance, scalability counts.
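For readers who want to check the arithmetic themselves, here is a minimal sketch in Python. The two clock rates come from the parts compared above; the helper names and the sample scores in the last lines are my own placeholders, not AnandTech's figures. Note that the clock gap comes out as about 14% or about 17% depending on which chip is taken as the baseline.

```python
def clock_deficit(slow_ghz, fast_ghz):
    """Fraction by which the slower clock trails the faster one."""
    return 1.0 - slow_ghz / fast_ghz


def clock_advantage(slow_ghz, fast_ghz):
    """Fraction by which the faster clock leads the slower one."""
    return fast_ghz / slow_ghz - 1.0


def scaling_efficiency(score_base, score_scaled, core_ratio):
    """Measured speedup divided by the ideal speedup.

    Assumes higher scores are better. 1.0 means perfect scaling;
    lower values mean cores are waiting on something shared,
    such as a bus or memory.
    """
    return (score_scaled / score_base) / core_ratio


OPTERON_2350_GHZ = 2.0   # from the comparison above
XEON_E5345_GHZ = 2.33    # from the comparison above

print(f"Opteron trails by {clock_deficit(OPTERON_2350_GHZ, XEON_E5345_GHZ):.1%}")
print(f"Xeon leads by {clock_advantage(OPTERON_2350_GHZ, XEON_E5345_GHZ):.1%}")

# Hypothetical scores, only to show how the function is used:
# going from 4 to 8 cores (core_ratio = 2) with a 1.8x higher score.
print(f"Scaling efficiency: {scaling_efficiency(100.0, 180.0, 2):.0%}")
```

To compare the two platforms, plug the single-socket and dual-socket scores from the charts into `scaling_efficiency` for each chip; the one with the higher efficiency is the one that scales better, regardless of which is faster on a single socket.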

2 comments:

Ho Ho said...

Just a little note. In that WinRar benchmark Xeon doesn't scale noticeably worse than Barcelona; Barcelona simply gets an additional boost when four or more cores are used. I'm not sure what could cause it, but my first guess would be that some threads are being run on the other CPU, NUMA works in WinRar (kind of odd), and it can use more of the system's memory bandwidth.

For Xeon, scaling drops slightly when going from 2->4 cores, but for some reason it rises slightly with 4->8 cores.

abinstein said...

It's actually not difficult to explain.

From single to dual core, Barcelona scales less well, probably due to clock gating. The L3 cache is attached to the northbridge and is slowed down when traffic is low. From dual to quad core the traffic picks up, and the northbridge in Barcelona runs mostly at full speed.
