Apple's M1 Pro, M1 Max SoCs Investigated: New Performance and Efficiency Heights
by Andrei Frumusanu on October 25, 2021 9:00 AM EST - Posted in: Laptops, Apple, MacBook, Apple M1 Pro, Apple M1 Max
Last week, Apple unveiled their new generation MacBook Pro laptop series, a new range of flagship devices that brings significant updates for the company’s professional and power-user oriented user-base. The new devices particularly differentiate themselves in that they are powered by two new entries in Apple’s own silicon line-up, the M1 Pro and the M1 Max. We covered the initial reveal in last week’s overview article of the two new chips, and today we’re getting our first glimpses of the performance we can expect from the new silicon.
The M1 Pro: 10-core CPU, 16-core GPU, 33.7bn Transistors
Starting off with the M1 Pro, the smaller sibling of the two, the design appears to be a new implementation of the first-generation M1 chip, but this time designed from the ground up to scale to larger configurations and higher performance. In our view the M1 Pro is the more interesting of the two designs, as it offers nearly everything that power users will deem important in a generational upgrade.
At the heart of the SoC we find a new 10-core CPU setup in an 8+2 configuration, with 8 performance Firestorm cores and 2 efficiency Icestorm cores. We had indicated in our initial coverage that Apple’s new M1 Pro and Max chips appear to use a similar, if not the same, generation of CPU IP as the M1, rather than the newer-generation cores used in the A15. We can seemingly confirm this, as we see no apparent changes in the cores compared to what we discovered on the M1.
The CPU cores clock up to 3228MHz peak, but their frequency varies depending on how many cores are active within a cluster: 3132MHz with two cores active, and 3036MHz with three or four. I say “per cluster” because the 8 performance cores in the M1 Pro and M1 Max indeed consist of two 4-core clusters, each with its own 12MB L2 cache and each able to clock its CPUs independently of the other. It is therefore possible to have four active cores in one cluster at 3036MHz and one active core in the other cluster running at 3.23GHz.
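The per-cluster clock behaviour described above can be modelled as a simple lookup. This is a minimal sketch, assuming the peak frequency depends only on the number of active cores within each 4-core cluster; the function names are hypothetical, and real DVFS behaviour also reacts to thermals and load.

```python
from __future__ import annotations

# Peak P-core clocks in MHz, keyed by active cores in one cluster (figures from the text).
P_CLUSTER_FREQ_MHZ = {1: 3228, 2: 3132, 3: 3036, 4: 3036}
CORES_PER_CLUSTER = 4


def cluster_peak_mhz(active_cores: int) -> int:
    """Peak clock of a 4-core P-cluster with `active_cores` cores busy."""
    if not 1 <= active_cores <= CORES_PER_CLUSTER:
        raise ValueError("a busy P-cluster has 1 to 4 active cores")
    return P_CLUSTER_FREQ_MHZ[active_cores]


def p_core_clocks(active_per_cluster: tuple[int, int]) -> list[int]:
    """Clock of every active P-core, given the active-core count of each cluster.

    Each cluster picks its frequency independently; an idle cluster
    (zero active cores) contributes nothing.
    """
    clocks: list[int] = []
    for active in active_per_cluster:
        if active:
            clocks.extend([cluster_peak_mhz(active)] * active)
    return clocks


# The case from the text: four busy cores in one cluster, one in the other.
print(p_core_clocks((4, 1)))  # [3036, 3036, 3036, 3036, 3228]
```

Because the lookup is per cluster, spreading four threads as 2+2 would, under this model, yield 3132MHz on every core rather than 3036MHz.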
The two E-cores in the system clock at up to 2064MHz. Unlike the M1, which has four, there are only two of them this time around; however, Apple still gives them the full 4MB of L2 cache, the same as on the M1 and A-series-derived chips.
One major feature of both chips is their much-increased memory bandwidth: the M1 Pro features a 256-bit LPDDR5 memory interface at 6400MT/s, corresponding to 204GB/s of bandwidth. This is significantly higher than the M1’s 68GB/s, and also generally higher than competing laptop platforms, which still rely on 128-bit interfaces.
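The bandwidth figures follow directly from bus width and data rate: bytes per transfer (bus width / 8) times transfers per second. A minimal sketch of that arithmetic; the M1’s 128-bit LPDDR4X-4266 configuration is our assumption for illustration and is not stated in this section.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s.

    bytes per transfer = bus width / 8; transfers per second = MT/s * 1e6.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Assumed configurations: (bus width in bits, data rate in MT/s).
configs = {
    "M1 (128-bit LPDDR4X-4266, assumed)": (128, 4266),
    "M1 Pro (256-bit LPDDR5-6400)": (256, 6400),
    "M1 Max (512-bit LPDDR5-6400)": (512, 6400),
}
for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

This prints roughly 68.3, 204.8 and 409.6GB/s; these are peak interface rates, not measured throughput, and doubling the bus width at the same data rate doubles the result.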
We’ve been able to identify the “SLC”, or system level cache as we call it, as 24MB on the M1 Pro and 48MB on the M1 Max. That is a bit smaller than we initially speculated, but it makes sense given the SRAM die area, and it still represents a 50% increase over the per-block SLC on the M1.
The M1 Max: A 32-Core GPU Monstrosity at 57bn Transistors
Above the M1 Pro we have Apple’s second new M1 chip, the M1 Max. The M1 Max is essentially identical to the M1 Pro in terms of architecture and in many of its functional blocks – but what sets the Max apart is that Apple has equipped it with much larger GPU and media encode/decode complexes. Overall, Apple has doubled the number of GPU cores and media blocks, giving the M1 Max virtually twice the GPU and media performance.
The GPU and memory interfaces are by far the most differentiated aspects of the chip: instead of a 16-core GPU, Apple doubles things up to a 32-core unit. On the M1 Max we tested today, the GPU runs at up to 1296MHz, quite fast for what we consider mobile IP, but still significantly slower than GPUs in the conventional PC and console space, which can now run at up to around 2.5GHz.
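Core count and clock together give a theoretical peak shader throughput. A minimal sketch, assuming 128 FP32 ALUs per GPU core (so 32 cores give 4096 ALUs) and one fused multiply-add, counted as 2 FLOPs, per ALU per clock; both are assumptions rather than figures measured here.

```python
def peak_fp32_tflops(gpu_cores: int, alus_per_core: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 rate: each ALU retires one FMA (2 FLOPs) per clock."""
    flops_per_clock = gpu_cores * alus_per_core * 2
    return flops_per_clock * clock_mhz * 1e6 / 1e12


# Assumed: 128 FP32 ALUs per core; 1296MHz is the peak clock we observed.
m1_max = peak_fp32_tflops(gpu_cores=32, alus_per_core=128, clock_mhz=1296)
print(f"M1 Max peak FP32: {m1_max:.1f} TFLOPS")  # M1 Max peak FP32: 10.6 TFLOPS
```

A peak figure like this says nothing about how efficiently a given game or driver keeps those ALUs busy; it is only a ceiling for back-of-the-envelope comparisons.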
Apple also doubles up on the memory interfaces, using a whopping 512-bit wide LPDDR5 memory subsystem, unheard of in an SoC and rare even amongst historical discrete GPU designs. This gives the chip a massive 408GB/s of bandwidth; how accessible this bandwidth is to the various IP blocks on the chip is one of the things we’ll be investigating today.
The memory controller caches total 48MB on this chip, which can theoretically amplify the memory bandwidth available to various SoC blocks while also reducing off-chip DRAM traffic, and thus the chip’s power and energy usage.
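The “amplified bandwidth” argument can be illustrated with a toy model: if a fraction h of memory requests hit in the SLC, DRAM only sees the remaining (1 - h), so demand can exceed DRAM bandwidth by a factor of 1/(1 - h) before DRAM saturates. This is an idealized sketch of our own, not Apple’s design; the 25% hit rate is purely hypothetical, and the model ignores the SLC’s own bandwidth limits.

```python
def _check_hit_rate(hit_rate: float) -> None:
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")


def dram_traffic_gbs(demand_gbs: float, slc_hit_rate: float) -> float:
    """DRAM traffic remaining when a fraction `slc_hit_rate` of requests hit in the SLC."""
    _check_hit_rate(slc_hit_rate)
    return demand_gbs * (1.0 - slc_hit_rate)


def max_demand_gbs(dram_peak_gbs: float, slc_hit_rate: float) -> float:
    """Idealized ceiling on block-side demand before DRAM saturates.

    Ignores the SLC's own bandwidth limit, so it only bounds the DRAM side.
    """
    _check_hit_rate(slc_hit_rate)
    if slc_hit_rate == 1.0:
        return float("inf")  # every request served by the cache: DRAM never limits
    return dram_peak_gbs / (1.0 - slc_hit_rate)


# Hypothetical 25% hit rate; not a measured value for the M1 Max.
print(dram_traffic_gbs(400.0, 0.25))  # 300.0
print(max_demand_gbs(408.0, 0.25))    # 544.0
```

Every request served from the SLC is also one that never toggles the off-chip interface, which is where the power and energy savings come from.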
Apple’s die shot of the M1 Max was initially a bit puzzling, as we weren’t sure whether it represents physical reality: especially at the bottom of the chip, there appears to be a doubled-up NPU, something Apple doesn’t officially disclose. A doubled-up media engine makes sense, as that is one of the chip’s advertised features; however, until we can get a third-party die shot confirming that this is indeed how the chip looks, we’ll refrain from speculating further in this regard.
493 Comments
caribbeanblue - Saturday, October 30, 2021
Lol, you're just a troll at this point.
sharath.naik - Monday, October 25, 2021
The only reason the M1 falls behind the RTX 3060 is because the games are emulated... if native, the M1 will match the 3080. This is remarkable... time for others to shift over to the same shared high-bandwidth memory on chip.
vlad42 - Monday, October 25, 2021
Go back and reread the article. Andrei explicitly mentioned that the games were GPU-bound, not CPU-bound. Here are the relevant quotes:
Shadow of the Tomb Raider:
"We have to go to 4K just to help the M1 Max fully stretch its legs. Even then the 16-inch MacBook Pro is well off the 6800M. Though we’re definitely GPU-bound at this point, as reported by both the game itself, and demonstrated by the 2x performance scaling from the M1 Pro to the M1 Max."
Borderlands 3:
"The game seems to be GPU-bound at 4K, so it’s not a case of an obvious CPU bottleneck."
web2dot0 - Tuesday, October 26, 2021
I heard otherwise on M1-optimized games like WoW.
AshlayW - Tuesday, October 26, 2021
4096 ALUs at 1.3GHz vs 6144 ALUs at 1.4-1.5GHz? What makes you think Apple's GPU is magic sauce?
Ppietra - Tuesday, October 26, 2021
Not going to argue that Apple's GPU is better, however the number of ALUs and the clock speed don't tell the whole story.
Sometimes it can be faster not because it can do more work but because it reduces some bottlenecks and because it works in a smarter way (by avoiding work that is not necessary for the end result).
jospoortvliet - Wednesday, October 27, 2021
Thing is also that the game devs didn't write their games for, or test them on, these GPUs and drivers. Nor did Apple write or optimize their drivers for these games. Both of these can easily make high-double-digit differences, so being 50% slower on a fully new platform without any optimizations and running half-emulated code is very promising.
varase - Thursday, November 4, 2021
Apple isn't interested in producing chips - they produce consumer electronics products.
If they wanted to they could probably trash AMD and Intel by selling their silicon - but customers would expect them to remain static and support their legacy stuff forever.
When Apple finally decided ARMv7 was unoptimizable, they wrote 32 bit support out of iOS and dropped those logic blocks from their CPUs in something like 2 years. No one else can deprecate and shed baggage so quickly which is how they maintain their pace of innovation.
halo37253 - Monday, October 25, 2021
Apple's GPU isn't magic. It is not going to be any more efficient than what Nvidia or AMD have.
Clearly an Apple GPU that only uses around 65 watts is going to compete with an Nvidia or AMD GPU that only uses around 65 watts in actual usage.
Apple clearly has a node advantage at work here. That being said, it is clear that when it comes to actual workloads like games, Apple still has some work to do efficiency-wise, as an Nvidia chip in the same performance/watt range, on an older and less power-efficient node, is still able to do better.
Apple's GPU is a compute champ and great for workloads that the average user will never see. This is why the M1 Pro makes a lot more sense than the M1 Max. The M1 Max seems like it will do fine for light gaming, but the cost of that chip must be crazy. It is a huge chip. Would love to see one in a Mac mini.
misan - Monday, October 25, 2021
Just replace GPU with CPU and you will see how devoid of logic your argument is.
Apple has much more experience in low-power GPU design. Their silicon is specifically optimized for low-power usage. Why wouldn't it be more efficient than the competitors?
Besides, Andrei's tests already confirm that your claims are pure speculation without any factual basis. Look at the power usage tests for GFXBench: almost three times lower power consumption with a better overall result.
These GPUs are incredible rasterizers. It's just that you look at bad-quality game ports and decide that they reflect the maximum reachable performance. Sure, GFXBench is crap, then look at Wild Life Extreme. That result translates to 20k points. That's on par with the mobile RTX 3070 at 100W.