CPU Performance & Efficiency: SPEC2006

We’re moving on to SPEC2006, analysing the single-threaded performance of the new Cortex-A77 cores. As the new CPU runs at the same clock as the A76-derived design of the Snapdragon 855, any improvements we see today are likely due to the A77’s IPC gains, the doubled L3 cache, and the enhancements to the chip’s memory controllers and memory subsystem.

Disclaimer About Power Figures Today:

The power figures presented today were captured using the same methodology we generally use on commercial devices; however, this year we’ve noted a large discrepancy between the figures reported by the QRD865’s fuel gauge and the device’s actual power consumption, generally by a factor of roughly 3x. We’ve reached out to Qualcomm, and in quick testing they confirmed a discrepancy of more than 2.5x. Furthermore, this year’s QRD865 phones again suffered from excessive idle power figures of over 1.3W.

I’ve attempted to compensate for this in the data as best I could, but the figures published today are merely preliminary and of lower confidence than usual. For what it’s worth, last year the QRD855 data was within 5% of the commercial phones’ measurements. We’ll naturally be re-testing everything once we get our hands on final commercial devices.

In the SPECint2006 suite, we’re seeing noticeable performance improvements in most tests, with some benchmarks posting larger-than-expected increases. The biggest improvements are in the memory-intensive workloads: 429.mcf, which is bound by DRAM latency, sees a massive improvement of up to 46% compared to the Snapdragon 855.

What’s interesting is that some execution-bound benchmarks, such as 456.hmmer, see a 28% uplift. The A77 adds a fourth ALU, representing a 33% increase in throughput for simple integer operations, and I don’t doubt that this is a major reason for the improvements seen here.
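As a rough sanity check, Amdahl’s law bounds how much of a 33% ALU-throughput increase can show up in a benchmark score. The sketch below assumes, purely for illustration, that the extra ALU is the only change and that a fixed fraction of runtime is limited by simple-ALU throughput; the fractions in the loop are hypothetical, not measured.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when only `fraction` of runtime is sped up by `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    # Amdahl's law: the unaffected part keeps its time, the affected part shrinks.
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Going from three to four simple-integer ALUs raises peak ALU throughput by 4/3 (~33%).
ALU_GAIN = 4 / 3

# Hypothetical fractions of runtime limited purely by simple-ALU throughput.
for fraction in (0.25, 0.50, 0.75, 0.875, 1.00):
    print(f"ALU-bound fraction {fraction:.3f}: {amdahl_speedup(fraction, ALU_GAIN):.3f}x")
```

Under this simplified model the gain is capped at 1.33x, and a 28% uplift from the extra ALU alone would need about 87.5% of runtime to be ALU-bound, so other A77 changes plausibly contribute as well.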

The improvements aren’t universal, though: 400.perlbench even shows a slight regression for reasons that aren’t clear, and 403.gcc saw a smaller 12% increase. These benchmarks are likely bound by other aspects of the microarchitecture.
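SPEC’s composite scores are geometric means of per-benchmark ratios, and because the geometric mean of per-benchmark speedups equals the ratio of the two chips’ geometric means, large wins such as 429.mcf and small regressions such as 400.perlbench combine into a single suite-level uplift. A minimal sketch; the speedup values and labels below are hypothetical placeholders, not this review’s data.

```python
import math


def geometric_mean(values) -> float:
    """Geometric mean, computed in log space to avoid overflow and underflow."""
    vals = list(values)
    if not vals or any(v <= 0 for v in vals):
        raise ValueError("need a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Hypothetical per-benchmark speedups (new score / old score), NOT data from this review.
speedups = {
    "memory_latency_bound": 1.40,
    "alu_bound": 1.25,
    "frontend_bound": 1.10,
    "slight_regression": 0.98,
}
print(f"suite-level speedup: {geometric_mean(speedups.values()):.3f}x")
```

The geometric mean is used so that no single benchmark with an outsized ratio dominates the composite, and so that the result does not depend on which machine is chosen as the reference.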

If the numbers are correct, the power consumption and energy efficiency roughly match our expectations for the microarchitecture. Power has gone up with performance, but because the workloads finish in less time, energy usage has remained roughly flat. In several tests, efficiency has actually improved compared to the Snapdragon 855, but we’ll have to wait for commercial devices before drawing definitive conclusions.
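The reasoning here rests on energy being average power multiplied by runtime. A minimal sketch with made-up numbers (not measurements from this review): a core that is 25% faster while drawing 20% more power finishes in 80% of the time and uses about 4% less energy. Energy only breaks even when the power increase matches the performance increase.

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Energy used by a run: average power (W) multiplied by runtime (s)."""
    return avg_power_w * runtime_s


def energy_ratio(power_ratio: float, perf_ratio: float) -> float:
    """New/old energy for a fixed workload, since runtime scales as 1 / perf_ratio."""
    return power_ratio / perf_ratio


# Hypothetical figures for illustration only, not data from this review.
old_j = energy_joules(avg_power_w=2.0, runtime_s=100.0)
new_j = energy_joules(avg_power_w=2.0 * 1.20, runtime_s=100.0 / 1.25)
print(f"old: {old_j:.1f} J, new: {new_j:.1f} J")
print(f"energy ratio: {energy_ratio(1.20, 1.25):.2f}")  # below 1.0 means less energy used
```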

In the SPECfp2006 suite, we’re also seeing very varied improvements. The biggest change is in 470.lbm, which has a very large hot loop and is hungry for memory bandwidth. I think the A77’s new MOP cache helps a lot with instruction throughput here, while the improved memory subsystem makes the massive 65% performance jump possible.

Arm had advertised IPC improvements of ~25% and ~35% for the integer and FP suites of SPEC2006. On the integer side, we’re indeed hitting 25% on the Snapdragon 865 compared to the S855; on the FP side, however, we fall a bit short at around 29%. The gains strongly depend on the SoC, and particularly on its memory subsystem. Compared to the Kirin 990’s A76 implementation, the increases are only 20% and 24%, but HiSilicon’s chip also has a stronger memory subsystem, which already lets its A76 cores gain considerably more performance than the A76-derived cores in the S855.
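Because performance is the product of IPC and clock frequency, equal clocks mean a measured speedup can be read directly as an IPC gain. The sketch below applies that to the advertised and measured figures quoted above; the 2.84 GHz value is an assumed peak clock, and any value shared by both chips gives the same result. Note that this "IPC" is platform-level, so it also folds in cache and memory-subsystem effects, which is why the S855 and Kirin 990 comparisons differ.

```python
def ipc_ratio(perf_ratio: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """IPC ratio implied by a performance ratio, since performance = IPC x clock."""
    return perf_ratio * old_clock_ghz / new_clock_ghz


CLOCK_GHZ = 2.84  # assumed peak clock shared by both chips; it cancels out

advertised = {"int": 1.25, "fp": 1.35}  # Arm's claims, as quoted above
measured = {"int": 1.25, "fp": 1.29}    # Snapdragon 865 vs. Snapdragon 855, as quoted above

for suite in ("int", "fp"):
    ipc = ipc_ratio(measured[suite], CLOCK_GHZ, CLOCK_GHZ)
    shortfall = 1.0 - ipc / advertised[suite]
    print(f"{suite}: IPC gain {ipc - 1:.0%}, shortfall vs. claim {shortfall:.1%}")
```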

The overall SPEC2006 results are very good for the Snapdragon 865. Performance lands exactly where Qualcomm advertised, with a 25% increase in SPECint2006 and a 29% increase in SPECfp2006. On the integer side, the A77 still trails Apple’s Monsoon cores in the A11, but the new Arm design now manages to trounce them in the FP suite. Arm’s microarchitectures are still some way from catching up to Apple’s latest designs, but if Arm keeps up this 25-30% yearly improvement rate, it should get there in a few more iterations.
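The "few more iterations" estimate is a matter of compound growth. A minimal sketch, assuming a hypothetical 1.8x performance gap that is not a figure from this review: at 25-30% per year the gap closes in three generations if the leader stands still, and in five if the leader (hypothetically) improves by 10% per year while the follower gains 25%.

```python
from typing import Optional


def generations_to_close(
    gap: float,
    follower_gain: float,
    leader_gain: float = 0.0,
    max_generations: int = 100,
) -> Optional[int]:
    """Yearly generations until the follower matches a leader that is `gap` times faster.

    Returns None if the gap never closes within `max_generations`.
    """
    if gap <= 1.0:
        return 0  # already level or ahead
    relative = (1.0 + follower_gain) / (1.0 + leader_gain)
    if relative <= 1.0:
        return None  # the follower is not gaining ground
    generations, ratio = 0, 1.0
    while ratio < gap and generations < max_generations:
        ratio *= relative  # one more year of compounded relative improvement
        generations += 1
    return generations if ratio >= gap else None


# Hypothetical 1.8x gap, chosen for illustration; not a figure from this review.
for rate in (0.25, 0.30):
    print(f"{rate:.0%}/year, leader static: {generations_to_close(1.8, rate)} generations")
print(f"25%/year vs. leader at 10%/year: {generations_to_close(1.8, 0.25, 0.10)} generations")
```

Iterating instead of taking a logarithm avoids off-by-one rounding when the gap happens to be an exact power of the yearly gain.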

The power and energy efficiency figures, which again should be taken with a grain of salt, are also very much in line with expectations. Power has slightly increased with performance this generation, but thanks to that performance increase, energy efficiency has remained relatively flat or even seen a slight improvement.

178 Comments

  • Andrei Frumusanu - Monday, December 16, 2019

    You forgot I'm a member of the Illuminati, half mole-people from my dad's side and half lizard-man from my mother's side. I love my monthly deep state paycheck alongside the Apple subsidies I get for spreading their narrative. Wait till people find out the earth is really flat.
  • Quantumz0d - Monday, December 16, 2019

    LOL. Lawyer manipulation is for their Class Actions KB fiasco, Touch Disease, Error 53... not you (just clarifying), and idk if you know Louis Rossman on YT. If not, I suggest watching him to see how the fleecing is done and how the consumer is always kept in the dark. The revelations of their stranglehold on supplying HW IC chips to repair services and their lobbying against repair are enough to understand and gauge the fundamental pillars of a company and its ethics.

    Sorry, I take ethics and choice/liberty into account over utopian performance and an elitist/luxury status quo stance.
  • Andrei Frumusanu - Monday, December 16, 2019

    I pleaded with you to not go into tangential rants for this article again, yet here we are.
  • Andrei Frumusanu - Monday, December 16, 2019

    > How? Just like Geekbench, different compilers are used. Different distribution of loads are made.

    Please explain to me what the hell "different distributions of loads are made" is meant to mean? You have zero technical rationale behind such statements. All the comparisons here were made with the Clang/LLVM compilers on all platforms - bar the ISA, there is exactly zero difference in the workload logic between the platforms, and Apple's toolchain isn't doing something completely different either that it would suddenly invalidate the comparison.

    > You are showing Apple A13 (LOL A13 is faster than the fastest AMD or Intel chip) using Jurassic Spec benchmark?

    Yes I am because that is the reality of the matter.

    > We are talking about efficiency here, your beloved Apple chip is sucking twice the power than SD855 or SD865 per workload.

    And it's finishing the workload more than twice as fast, ending up being *almost* as efficient in terms of the energy used by the computation. What matters here is the energy efficiency, not the power efficiency, and in this regard Apple's devices are top of the line.

    > While your chart if showing Apple has twice the performance vs SD865, the phone doesn't tell lies.

    What's even your point here? Of course the iPhones are significantly faster at loading webpages.

    Return here when you have an actual factual argument to present, because right now you just have been repeating complete nonsense.
  • joms_us - Monday, December 16, 2019

    > Please explain to me what the hell "different distributions of loads are made" is meant to mean? You have zero technical rationale behind such statements. All the comparisons here were made with the Clang/LLVM compilers on all platforms - bar the ISA, there is exactly zero difference in the workload logic between the platforms, and Apple's toolchain isn't doing something completely different either that it would suddenly invalidate the comparison.

    The compiler may be the same, but the scheduler of tasks in Android and Windows are different than in iOS. Many background apps are running simultaneously on Android and Windows machines, how about iOS? Frozen apps? LOL

    >Yes I am because that is the reality of the matter.

    Only matters to you, not in the outside world. If you really think A9 has better IPC than Ryzen or Skylake, why don't you join the Apple engineers, build the fastest gaming/productivity PC with an Apple A9 chip and sell it like hotcakes? No? Cannot be? Even Apple doesn't claim their SoC is faster than even a low-end desktop today LOL. They're even milking customers with overpriced Macs with "Intel" inside.

    > And it's finishing the workload more than twice as fast, ending up being *almost* as efficient in terms of the energy used by the computation. What matters here is the energy efficiency, not the power efficiency, and in this regard Apple's devices are top of the line.

    What matters is how fast it can finish the whole task not each micro-workload nonsense. If I want to zip and upload a file or encode and upload a video, I only care how fast it will finish the whole task. If I want to play games, do I care how fast the damn phone will compute the vector, pixel location, math operations etc? I only care how elegant, smooth and fast the gaming experience will be.

    iPhone is not twice as fast at loading any web page, any consumer app or even exporting or transcoding videos. Different apps yield different results; you are showing one worthless primitive benchmark where iPhone is fast, but out there, hundreds or thousands of different apps and websites are showing the opposite results.

    Here is one or two for you, one is showing twice the performance over the other =D

    https://youtu.be/ay9V5Ec8eiY?t=529

    https://youtu.be/DtSgdrKztGk?t=432
  • Andrei Frumusanu - Monday, December 16, 2019

    > the scheduler of tasks in Android and Windows are different than in iOS.

    The scheduler isn't any different, because the scheduler doesn't do anything when there's only a single thread on a core to be run. There is literally no scheduling.

    > If you really think A9 has better IPC than Ryzen or Skylake

    Correction, I don't really just think it, I know it.

    > What matters is how fast it can finish the whole task not each micro-workload nonsense.

    The whole SPEC suite takes exactly an hour to complete, so quit with the micro nonsense if you have no idea what's even being tested here.

    > Here is one or two for you, one is showing twice the performance over the other =D

    Both phones don't even use the freaking CPU when transcoding videos - they're both offloaded using the dedicated fixed function video encoders much like you can offload encoding on desktop PCs to your GPU's encoders, instead of doing it inefficiently on the CPU.

    You have absolutely ZERO understanding of what's going on here.
  • joms_us - Monday, December 16, 2019

    > The scheduler isn't any different, because the scheduler doesn't do anything when there's only a single thread on a core to be run. There is literally no scheduling.

    Then the SoC is not maximized but underperforming.

    > Correction, I don't really just think it, I know it.

    Sure you do, now where is the fastest processor on this planet? Where is our A9-powered gaming PC LOL.

    > The whole SPEC suite takes exactly an hour to complete, so quit with the micro nonsense if you have no idea what's even being tested here.

    Just goes to show how primitive your tool is. 2020 is just around the corner, and here you are still using a 2006 tool. This is like claiming Wolfdale is faster than Ryzen because it can finish 1M SuperPI faster LOL.
  • Dug - Monday, December 16, 2019

    You really don't have any argument because you really aren't sure what you are talking about.
  • joms_us - Monday, December 16, 2019

    Am I, or are you? Isn't it clear that SPEC results don't translate to the real world? Where is the double performance shown here? Show us proof that the iPhone has twice the performance; I've posted links with two Android phones decimating the iPhone 11.

    Sure you can claim all day you want that iPhone is the fastest phone via SPEC LOL, I'd rather see it translate to actual performance, not imaginary numbers.
  • cha0z_ - Monday, December 23, 2019

    You clearly have no idea what you are talking about. Dunno why Andrei dedicated so much of his time trying to explain to you in primitive language what's going on (so you can understand).
