Operating Systems Are Unable To Report CPU Utilization Properly, According To Brendan Gregg, Netflix’s Senior Performance Architect

Ever since Intel added hyperthreading and AMD introduced SMT, operating systems have been unable to report CPU utilization properly. A physical core can run two hardware threads, yet the tools that report CPU utilization count each thread as if it were a full core. Stalling adds to the confusion. Most people believe that when CPU utilization reads 90%, the CPU is busy, but it is possible that the CPU is stalled, waiting on memory, and not doing any useful work.
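To see how a single %CPU figure can hide stalls, here is a minimal Python sketch comparing two workloads that the OS would both report as 90% utilized. All counter values are made up for illustration, and the `Sample` type and helper names are hypothetical. On real hardware, cycles and instructions retired come from hardware performance counters (on Linux, `perf stat` can report them). Instructions per cycle (IPC) exposes the difference that utilization alone hides.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical readings for one sampling interval (illustrative values only)."""
    busy_time_s: float   # time the OS saw a thread running on the CPU
    interval_s: float    # length of the sampling interval
    cycles: int          # CPU cycles elapsed while busy
    instructions: int    # instructions retired while busy


def cpu_utilization(s: Sample) -> float:
    """What tools like top report: the fraction of the interval that was not idle."""
    return s.busy_time_s / s.interval_s


def ipc(s: Sample) -> float:
    """Instructions retired per cycle; a low value means many cycles were stalled."""
    return s.instructions / s.cycles


# Two made-up workloads with identical "busy" time but very different real work.
compute_bound = Sample(busy_time_s=0.9, interval_s=1.0,
                       cycles=2_700_000_000, instructions=5_400_000_000)
memory_bound = Sample(busy_time_s=0.9, interval_s=1.0,
                      cycles=2_700_000_000, instructions=540_000_000)

for name, s in [("compute_bound", compute_bound), ("memory_bound", memory_bound)]:
    print(f"{name}: utilization={cpu_utilization(s):.0%} ipc={ipc(s):.2f}")
```

Both lines print a utilization of 90%, but the memory-bound workload retires a tenth as many instructions per cycle, because most of its "busy" cycles are spent waiting.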

You may have seen this yourself, especially if you create content in Photoshop or do any kind of rendering, where performance can grind to a halt. While there are ways to counter this issue, if you have been working with computers for some time, chances are you have noticed that a CPU reported at 100% utilization is not necessarily maxed out.

According to Brendan Gregg, Netflix’s Senior Performance Architect:

To summarize my responses: I’m not talking about iowait at all (that’s disk I/O), and there are actionable items if you know you are memory bound (see above). But is CPU utilization actually wrong, or just deeply misleading? I think many people interpret high %CPU to mean that the processing unit is the bottleneck, which is wrong (as I said earlier). At that point, you don’t yet know, and it is often something external.

This issue is due to the CPU-DRAM gap: CPU manufacturers such as AMD and Intel have scaled clock speeds much faster than DRAM manufacturers have reduced access latency, so each trip to memory costs a core more and more cycles. When you see the CPU at 100% utilization, you may assume the CPU is the bottleneck when the real culprit is DRAM.
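The arithmetic behind the gap is simple: cycles spent waiting equal latency in nanoseconds times clock speed in GHz (cycles per nanosecond). The minimal sketch below holds an assumed DRAM latency of 80 ns fixed (an illustrative figure, not a measurement) while the clock rises; `stall_cycles` is a hypothetical helper.

```python
def stall_cycles(dram_latency_ns: float, clock_ghz: float) -> float:
    """Cycles a core waits on one uncached DRAM access: ns * (cycles per ns)."""
    return dram_latency_ns * clock_ghz


# Assumed, illustrative latency; real values vary by platform and memory type.
DRAM_LATENCY_NS = 80.0

for ghz in (1.0, 2.0, 4.0):
    cycles = stall_cycles(DRAM_LATENCY_NS, ghz)
    print(f"{ghz:.1f} GHz -> {cycles:.0f} cycles stalled per cache miss")
```

Doubling the clock doubles the cycles lost per miss, and the OS still counts every one of those stalled cycles as "busy".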

CPU manufacturers have tried to close this gap by adding larger and smarter caches, but that approach only goes so far: over the years, CPUs have also scaled with additional cores, hyperthreading, and multiple CPUs, all of which put more load on DRAM. Other reasons why CPU utilization can be misleading include:

  • Temperature trips stalling the processor.
  • Turbo Boost varying the clock rate.
  • The kernel varying the clock rate with SpeedStep.
  • The problem with averages: 80% utilized over 1 minute, hiding bursts of 100%.
  • Spin locks: the CPU is utilized, and has high IPC, but the app is not making logical forward progress.
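The "problem with averages" point can be shown with a few lines of arithmetic. This minimal Python sketch uses hypothetical per-second samples: 48 seconds at 100% and 12 seconds idle. The one-minute average comes out at 80%, while the peak and the count of saturated seconds reveal the bursts the average hides.

```python
def mean(xs: list[float]) -> float:
    """Arithmetic mean, as a 1-minute utilization average would compute it."""
    return sum(xs) / len(xs)


# Hypothetical per-second utilization over one minute (1.0 means 100%).
per_second = [1.0] * 48 + [0.0] * 12

saturated = sum(1 for x in per_second if x >= 1.0)
print(f"1-minute average: {mean(per_second):.0%}")   # looks like headroom
print(f"peak: {max(per_second):.0%}")                # the burst the average hides
print(f"seconds at 100%: {saturated}")
```

Reporting a peak or a high percentile alongside the average is one common way to keep such bursts visible.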

Gregg's original post covers the matter in detail. For more information regarding PC hardware, stay tuned to SegmentNext.

Let us know what you think about CPU utilization becoming more misleading over the years, and what you think should be done to fix the issue.