- IPC can be used to describes how good a CPU is.
- IPC is roughly proportional to pipeline width of the CPU.
- IPC of modern CPUs are high (>>1).
- Amdahl's law says CPU with higher IPC will have higher single-threaded performance.
- ...
Myth #1: IPC described as a single value
A common problem of all the "statements" above is that they all refer to IPC as if it is some intrinsic property determined by the CPU microarchitecture. In fact, IPC is a property determined not just by the CPU, but more by the program from algorithm down to instruction scheduling. For example, it is very possible for a CPU1 to have higher IPC than CPU2 running program A, but lower IPC running program B.
Thus, saying "CPU1 has higher (or lower) IPC than CPU2" has to be inaccurate, especially when the two processors have different microarchitectures.
Myth #2: Higher IPC means better
Many people believe higher IPC means higher (single-thread) performance. This is as wrong as when people thought higher clock rate means higher performance. Still, many believe higher IPC is better because the CPU can run as fast with slower clock rate. This seems an over-reaction to the Pentium 4, which had very high clock rate but moderate performance compared to Athlon64/Opteron.
The problem with this type of thinking is that the relation between IPC and clock rate is really a tradeoff. Like any tradeoff relation, you don't get optimal results by sliding towards either edge. With microarchitecture and circuit-level advancements, both clock rate and/or IPC can be increased. Which one to improve should depend on the design and application of the processor, and it's definitely not always (not even usually) IPC.
Myth #3: IPC is proportional to CPU pipeline width
We see many arguments like below on the Internet--
- Core 2 can issue up to 4 x86 instructions per cycle, so it should have an IPC close to 4.
- Nehalem brings [this or that features] to circumvent the decode limit, so it's IPC is 25% or 33% higher than Core 2.
- K10 (AMD Family 10h) can only decode 3 x86 instructions per cycle, so its IPC has "bottleneck" at the instruction decode.
In the case of Core 2 and Nehalem, we actually know for sure that the statements above are false. IPC of Core 2 Duo running SPEC CPU2006 was measured in this paper. The values were between 0.4 to 1.8 among all sub-benchmarks, with average only around 1.0, no where near its 4-way decoder width.
If we compare actual SPECint measurements of Core 2 (22.6) with Nehalem (25.1 or 27.8), we see that Nehalem has 11% to 23% higher single-thread performance after taking into account potentially 20% turbo frequency. Thus Nehalem's IPC for SPECint is at most ~20% higher than Core 2, and most likely much less when exclude the turbo mode effect. In other words, if Core 2's IPC for SPECint sub-benchmarks were 0.4~1.8, then Nehalem's should be between 0.5~2.1. Both are far below what is implied by their 4-way pipelines or any sexy-sound marketing features.
Myth #4: Amdahl's law favors CPU designed for higher IPC
This is the strangest argument that I have seen on the Internet, because it is completely the opposite of truth. The main thing that Amdahl's law says is that performance improvement is intrinsically limited by the available parallelism in a program. In the context of single-threaded programs, this means that performance at the same clock rate is limited by the Instruction-Level Parallelism (ILP) available in the program.
Some people see that "limited by the ILP" part and immediately relate it to a CPU designed for higher IPC. The problem here is that, according to Amdahl's law, the ILP is limited by the program, not the CPU. In other words, if your program has low ILP, it will not run fast no matter how high an IPC the CPU was designed for. Thus in fact Amdahl's law favors a CPU designed for higher clock rate but lower IPC than the available ILP in the program.
Furthermore, the available ILP in a program is also a strong function of the window size and the branch prediction accuracy. Both are very difficult to increase in the uber-complex microarchitectures of modern CPUs. That is why features such as SIMD (SSE and AVX), SMT, and turbo frequency are used in Nehalem to improve
Conclusion
IPC is very useful when one wants to optimize his program for a particular system. It is one of the most important metrics that profiling produces. But like any metric, generalizing its implication outside of its intended usage context is usually meaningless and even misleading.
3 comments:
Informative! A+
@Vik: thanks!
Great job! made me change the way i thought of IPC.
jdwii
Post a Comment