SoC Analysis: On x86 vs ARMv8

Before we get to the benchmarks, I want to spend a bit of time talking about the impact of CPU architectures at a moderate level of technical depth. At a high level, there are a number of peripheral issues when it comes to comparing these two SoCs, such as the quality of their fixed-function blocks. But when you look at what consumes the vast majority of the power, it turns out that the CPU is competing with things like the modem/RF front-end and GPU.


x86-64 ISA registers

Probably the easiest place to start when we’re comparing things like Skylake and Twister is the ISA (instruction set architecture). This subject alone is probably worthy of an article, but the short version for those that aren't really familiar with this topic is that an ISA defines how a processor should behave in response to certain instructions, and how these instructions should be encoded. For example, if you were to add together two integers held in the EAX and EDX registers, x86-32 dictates that this would be encoded as 01d0 in hexadecimal. In response to this instruction, the CPU would add whatever value was in the EDX register to the value in the EAX register and leave the result in the EAX register.
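
To make the encoding concrete, here is a minimal Python sketch that decodes only this one two-byte form of add. The function name and the register table are illustrative assumptions, and a real x86 decoder handles vastly more cases than this.

```python
# Register numbers as used by the 3-bit fields of the x86-32 ModRM byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_add_rm32_r32(code: bytes) -> str:
    """Decode the register-to-register form of ADD (opcode 0x01) only."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x01:
        raise ValueError("only opcode 0x01 (ADD r/m32, r32) is handled")
    mod = modrm >> 6            # top two bits: addressing mode
    reg = (modrm >> 3) & 0b111  # middle three bits: source register
    rm = modrm & 0b111          # low three bits: destination register
    if mod != 0b11:
        raise ValueError("only register-direct operands are handled")
    # Intel syntax: destination first, then source.
    return f"add {REG32[rm]}, {REG32[reg]}"


print(decode_add_rm32_r32(bytes.fromhex("01d0")))  # add eax, edx
```

The second byte, d0, is 11 010 000 in binary: register-direct mode, EDX as the source, and EAX as the destination, which is why the sum ends up in EAX.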


ARMv8 A64 ISA Registers

The fundamental difference between x86 and ARM is that x86 is a relatively complex ISA, while ARM is relatively simple by comparison. One key difference is that ARM dictates that every instruction is a fixed number of bits. In the case of ARMv8-A and ARMv7-A, all instructions are 32 bits long unless you're in Thumb mode, in which case instructions are 16 bits long, but the same sort of trade-offs that come from a fixed-length instruction encoding still apply. Thumb-2 mixes 16-bit and 32-bit instructions and is therefore a variable-length ISA, so in some sense the trade-offs of variable-length encoding apply there as well. It’s important to make a distinction between instruction and data here, because even though AArch64 uses 32-bit instructions the register width is 64 bits, which is what determines things like how much memory can be addressed and the range of values that a single register can hold. By comparison, Intel’s x86 ISA has variable-length instructions. In both x86-32 and x86-64/AMD64, each instruction can range anywhere from 8 to 120 bits long depending upon how the instruction is encoded.

At this point, it might be evident that on the implementation side of things, a decoder for x86 instructions is going to be more complex. For a CPU implementing the ARM ISA, because the instructions are of a fixed length the decoder simply reads instructions 2 or 4 bytes at a time. On the other hand, a CPU implementing the x86 ISA would have to determine how many bytes to pull in at a time for an instruction based upon the preceding bytes.
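
The difference can be sketched in a few lines of Python. This is a toy model, not a real decoder: the variable-length format below is invented for illustration (the first byte of each instruction holds its total length) and is not how x86 encodes length, and both function names are assumptions.

```python
from typing import List


def fixed_boundaries(code: bytes, width: int = 4) -> List[int]:
    """Fixed-length ISA: every start offset is known without reading any bytes."""
    return list(range(0, len(code), width))


def variable_boundaries(code: bytes) -> List[int]:
    """Toy variable-length ISA: each start depends on the previous instruction."""
    offsets = []
    pc = 0
    while pc < len(code):
        length = code[pc]  # toy rule: first byte is the instruction's total length
        if length == 0 or pc + length > len(code):
            raise ValueError(f"malformed instruction at offset {pc}")
        offsets.append(pc)
        pc += length       # the next start is only known after this step
    return offsets


print(fixed_boundaries(bytes(16)))
print(variable_boundaries(bytes([1, 3, 0, 0, 2, 0])))
```

In the fixed-width case every boundary can be computed independently, so several instructions can be handed to parallel decoders at once. In the variable-length case each boundary depends on the one before it, which is the serial dependency that real x86 front-ends spend extra logic to work around.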


A57 Front-End Decode, Note the lack of uop cache

While it might sound like the x86 ISA is just clearly at a disadvantage here, it’s important to avoid oversimplifying the problem. Although the decoder of an ARM CPU already knows how many bytes it needs to pull in at a time, this inherently means that unless all 2 or 4 bytes of the instruction are used, each instruction contains wasted bits. While it may not seem like a big deal to “waste” a byte here and there, this can actually become a significant bottleneck in how quickly instructions can get from the L1 instruction cache to the front-end instruction decoder of the CPU. The major issue here is that due to RC delay in the metal wire interconnects of a chip, increasing the size of an instruction cache inherently increases the number of cycles that it takes for an instruction to get from the L1 cache to the instruction decoder on the CPU. If a cache doesn’t have the instruction that you need, it could take hundreds of cycles for it to arrive from main memory.


x86 Instruction Encoding

Of course, there are other issues worth considering. For example, in the case of x86, the instructions themselves can be incredibly complex. One of the simplest examples is the add instruction, where either the source or the destination can be in memory, although not both at once. An example of this might be addq (%rax,%rbx,2), %rdx, which could take 5 CPU cycles to complete on something like Skylake. Of course, pipelining and other tricks can make the throughput of such instructions much higher, but that's another topic that can't be properly addressed within the scope of this article.


ARMv3 Instruction Encoding

By comparison, the ARM ISA has no direct equivalent to this instruction. Looking at our example of an add instruction, ARM would require a load instruction before the add instruction. This has two notable implications. The first is that this once again is an advantage for an x86 CPU in terms of instruction density because fewer bits are needed to express a single instruction. The second is that for a “pure” CISC CPU you now have a barrier for a number of performance and power optimizations as any instruction dependent upon the result from the current instruction wouldn’t be able to be pipelined or executed in parallel.
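
As a rough illustration of the difference, the Python sketch below models the x86 instruction from earlier, addq (%rax,%rbx,2), %rdx, next to a load-then-add split in the style of a load/store ISA. Registers and memory are plain dictionaries, the "tmp" register and both function names are hypothetical, and the second function is not a literal A64 instruction sequence.

```python
MASK64 = (1 << 64) - 1  # wrap results to 64 bits, as the hardware registers would


def x86_add_from_memory(regs: dict, mem: dict) -> None:
    """One instruction: rdx += memory[rax + rbx * 2]."""
    address = (regs["rax"] + regs["rbx"] * 2) & MASK64
    regs["rdx"] = (regs["rdx"] + mem[address]) & MASK64


def load_store_add(regs: dict, mem: dict) -> None:
    """Load/store style: memory is read by a separate load before the add."""
    address = (regs["rax"] + regs["rbx"] * 2) & MASK64
    regs["tmp"] = mem[address]                              # step 1: load
    regs["rdx"] = (regs["rdx"] + regs["tmp"]) & MASK64      # step 2: register-only add


memory = {0x1008: 32}
a = {"rax": 0x1000, "rbx": 4, "rdx": 10}
b = dict(a)
x86_add_from_memory(a, memory)
load_store_add(b, memory)
print(a["rdx"], b["rdx"])
```

Both paths produce the same value in rdx. The split version costs more instruction bytes, but the load and the add become separate operations that a core can schedule independently.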

The final issue here is that x86 just has an enormous number of instructions that have to be supported due to backwards compatibility. Part of the reason why x86 became so dominant in the market was that code compiled for the original Intel 8086 would work with any future x86 CPU, but the original 8086 didn’t even have memory protection. As a result, all x86 CPUs made today still have to start in real mode and support the original 16-bit registers and instructions, in addition to 32-bit and 64-bit registers and instructions. Of course, to run a program in 8086 mode is a non-trivial task, but even in the x86-64 ISA it isn't unusual to see instructions that are identical to the x86-32 equivalent. By comparison, ARMv8 is designed such that you can only execute ARMv7 or AArch32 code across exception boundaries, so practically programs are only going to run one type of code or the other.

Back in the 1980s up to the 1990s, this became one of the major reasons why RISC was rapidly becoming dominant as CISC ISAs like x86 ended up creating CPUs that generally used more power and die area for the same performance. However, today ISA is basically irrelevant to the discussion due to a number of factors. The first is that beginning with the Intel Pentium Pro and AMD K5, x86 CPUs were really RISC CPU cores with microcode or some other logic to translate x86 CPU instructions to the internal RISC CPU instructions. The second is that decoding of these instructions has been increasingly optimized around only a few instructions that are commonly used by compilers, which makes the x86 ISA practically less complex than what the standard might suggest. The final change here has been that ARM and other RISC ISAs have gotten increasingly complex as well, as it became necessary to enable instructions that support floating point math, SIMD operations, CPU virtualization, and cryptography. As a result, the RISC/CISC distinction is mostly irrelevant when it comes to discussions of power efficiency and performance as microarchitecture is really the main factor at play now.

408 Comments

  • zodiacfml - Saturday, January 23, 2016 - link

    Anandtech needs more people. Where is that video which records the latency of the Apple pencil or SP4? Aren't musicians and sound engineers be interested in the tablet for simple creation of music which I meant, audio should be tested? If testing methodology of Wi-Fi has problems, wouldn't it be nice to test if one could play or edit a high bit rate video saved from a high performance NAS? The device is a small niche but Anandtech could put some more analysis just for the entertainment/education value of it.
  • JoshHo - Saturday, January 23, 2016 - link

    Regarding stylus latency, the videos would be quite boring as it's nothing more than a straight line with the stylus. I've simply taken those videos and done multiple trials and averaged times to determine the approximate latency of the stylus system.

    We would like to properly test speaker and 3.5mm output. We're working on these things but it looks like 3.5mm output testing is quite difficult.

    We are also working on WiFi testing. This one will prove to be quite interesting as well.
  • zodiacfml - Saturday, January 23, 2016 - link

    Thanks. I just thought the device deserves more analysis and work based on the amount of interest and comments here. I have one more critique on camera testing. Why is it not possible to have a static object or studio for camera testing since Anandtech constantly review mobile devices which will make the tests faster to produce and output to be easily comparable between devices?
  • name99 - Sunday, January 24, 2016 - link

    One more issue. When you test storage throughput, do you use traditional file IO or memory mapped files? Apple has ALREADY indicated a strong preference that developers use memory mapped files, and as we move to a world on NVM living more or less directly on the memory bus, memory mapped IO will become SUBSTANTIALLY more performant than traditional file IO.
    It seems to me incumbent that your testing become prepared for this new world today (maybe by running tests both ways and reporting both speeds, or the higher speed); otherwise at some point soon (and it may be as soon as two or three years) Apple or Samsung or MS are going to ship the first consumer device using NVM, and your storage performance tests are just going to look dumb because you're not simply not accessing the storage properly.
  • dontlistentome - Saturday, January 23, 2016 - link

    5 hours to charge? If you started a working day on this with a flat battery and worked on it for 8 hours, would the battery even have charged by the end of the day?
  • digiguy - Saturday, January 23, 2016 - link

    As an ipad pro owner (128GB wifi), I'll give my opinion after owning it for around 2 months and reading this review (plus many others before this one, none as detailed, the best so far had been that of notebookcheck) and 166 comments. I also own a Surface pro 3, a Surface 2, a galaxy note 8, an ipad air and ipad mini 2, plus a few convertibles and a few laptops (no Macs however, Windows only). As expected, in the comments there was the traditional battle full OS vs mobile OS. Microsoft has proven how hard is to make a full OS easy to use on a tablet (some people here don't seem to understand what a titanic effort would be making OSX and its app good for tablets). Of course MS itself cannot control most apps and impose a touch friendly version. They tried the route of a mobile OS with RT but unfortunately it failed. It has to be said that Metro itself had some serious shortcomings, like the lack of a decent touch optimized file manager, onscreen keyboard issues etc. It's not easy to transform a desktop OS into a touch optimized OS and I understand why Apple has not and certainly will never try to make OSX for tablets. Same for Google, they tried to make chrome for touch with pixel c, but gave up and used android. Having said that, let's come to why I bought the ipad pro (especially while owning an SP3). First of all, screen size, I wanted something bigger to display documents in true A4 size, and the additional inch plus the better 4:3 ratio achieve that. The alternative would have been the surface book, but it's too expensive for just this (and has too compromises to replace my asus ultrabook, let alone my desktop replacement). Second reason was IOS music apps. IOS is the only mobile platform that can be used to a decent extent professionally by musicians. And this is great for sound libraries that can be used for example directly on the music rest of a piano/keyboard while connected to it via midi. Or to replace a mixer etc, where touch is essential. 
You can do this on Windows tablets, but software is not well optimized for touch and you often need anti-piracy dongles etc. so that a single USB port is not enough. None of that is necessary on IOS. Ipad pro sound, the best for any tablet, makes it useful without having to plug an external speaker in some circumstances (ex hotel room for working on music creation). Also an Ipad pro can act as a secondary monitor with duet display. And at it's size it can become useful, contrary to other ipads. So to sum up, screen size (and quality), high quality touch apps (for use cases in which touch is very important) without need for antipiracy dongles (widespread for music software) and sound. And this without mentioning the pencil (I am not an artist and only need to annotate PDFs, and for that I use my SP3, so haven't bought the pencil yet or the keyboard for that matter). Now, the shotcomings of ipad pro: Lack of a kickstand (with variable angles), lack of a pencil holder. Both can be solved by spending another 80$ for a urban armor gear case, with which the ipad pro is still lighter than SP3 with type cover. Lack of a file manager. This can be solved (to a decent extend) by buying a software called imazing. That's another 40$ but gives you a proper file manager and the possibility to copy file and folders from a pc to ipad. Other than that, some apps allow to sync you dropbox folders to ipad. Lack of storage expansion. Again spend the money for the 128GB version. As for SP3, screen is reflective, but, as for SP3, a matt screen protector works great and make the screen even more beautiful (no fingerprints, colors look even better without reflections). IOS not optimized enough for 12.9 inches, yet. No solution yet, we can only wait for IOS 10. So, with money you can fix many of the shortcomings, but is the over 1000$ necessary for that, justified? I would say probably not yet. 
But by buying the ipad pro I made a sort of bet on Apple to optimize IOS for better multitasking etc. and on IOS developers to continue making pro apps (especially for music, in my case), while already taking advantage of what it already offers. And the sheer power of this machine, so far not completely used, should make it a future-proof device, much more than other ipads (ready for IOS 10, 11, etc and for new powerful apps). What about Surface pro 3? Well to be honest, other than for annotating, I use it mainly as a very portable laptop on the go (only bring the 14 inches ultrabook when out for several days) with a nice, but not absolutely necessary, touch screen and nice pen input for taking handwritten notes. So mainly as a very convenient laptop rather than a tablet (as probably most Surface pro owners do too). As a tablet for the bed or for checking emails etc. on the go, my android phone or one of my 8 inch tablets are the most convenient devices....
  • Klug4Pres - Saturday, January 23, 2016 - link

    One of the better walls of text I have read, thank you.
  • digiguy - Saturday, January 23, 2016 - link

    thanks! well, I myself was impressed by how long it was... I only realized after I posted it.... ;-)
  • id4andrei - Saturday, January 23, 2016 - link

    Damn man, insert some spaces between ideas. Segmentation.
  • digiguy - Saturday, January 23, 2016 - link

    Yeah, right, sorry, the writing box is so small that I didn't think about layout. Next time I'll write in Word first and then copy...