Name: Assessing Cavium's ThunderX2: The Arm Server Dream Realized At Last
Item: Assessing Cavium's ThunderX2: The Arm Server Dream Realized At Last
Author: Johan De Gelas

Assessing Cavium's ThunderX2: The Arm Server Dream Realized At Last

by Johan De Gelas on 5/23/2018 9:00 AM EST

Post Your Comment
Please log in or sign up to comment.

Comments Locked

97 Comments

Back to Article

Davenreturns - Wednesday, May 23, 2018 - link
In the spec table for the AMD EPYC 7601 you have max sockets 4 and PCIe 3.0 lanes as 64. I thought the max sockets was 2 and that the total number of PCIe 3.0 lanes was 128 (64 in a dual socket machine).
davegraham - Wednesday, May 23, 2018 - link
max sockets is 2 and PCIe lanes is 128 (64 from each 7601 for a combined total of 128; remember, each 7601 has 128 PCIe lanes by themselves. 64 from each are ganged together for IF in a 2P system).
davegraham - Wednesday, May 23, 2018 - link
*are not *is
Davenreturns - Wednesday, May 23, 2018 - link
But in a single socket motherboard system, the total PCIe lanes available from one EPYC processor is 128 which I think we are both saying is correct.
Davenreturns - Wednesday, May 23, 2018 - link
The reason I think these two corrections are important and should be addressed by the author is the way the players in the market are competing. The table should read 128 PCIe lanes and 2 sockets max for EPYC. One only needs to look at AMD's EPYC One socket page to understand why it is important.

https://www.amd.com/en/products/epyc-7000-series-1...

The page is filled with marketing trying to convince customers that you are actually getting a two socket server in just one socket. And yes 128 PCIe lanes are available to the customer in these one socket products as part of the reasoning.

The max number of sockets is also important. AMD and probably Cavium are both arguing that 90% of the market only needs 1 or 2 sockets. Intel doesn't agree and provides 4 or more socket configurations.

The one socket argument centers around the I/O and memory channels available in the AMD processor. Even though the table just might have typos, reviewers around the web had a hard time believing that a single chip offered 128 lanes of PCIe connectivity and I found a lot of misinformation. It continues today.
DanNeely - Wednesday, May 23, 2018 - link
AFAIK even for intel 1/2 socket machines are around 90% of their sales. They're just selling enough total server chips in total that catering to the sliver of the market that does want 4/8way configurations is still worth their time.
Arnulf - Sunday, May 27, 2018 - link
Profit margins in that market segment are likely to be way higher so it's worth it for Intel as long as there is no competition, forcing prices downwards.
Ryan Smith - Wednesday, May 23, 2018 - link
You are correct. Thanks for pointing that out.
Davenreturns - Wednesday, May 23, 2018 - link
Thanks so much, Ryan.
vanilla_gorilla - Wednesday, May 23, 2018 - link
"This is because the customers who have invested in expensive enterprise software (Oracle, SAP) are less sensitive to cost on the hardware side, so they are much less likely to change to a new hardware platform."

I don't really follow the logic here. Just because you spend a lot more money on software doesn't mean you wouldn't try to save money on hardware. You don't only focus on one related expense because it's larger.
Gunbuster - Wednesday, May 23, 2018 - link
Because it's hard to explain the critical line of business software or database is having some unknown edge case issue because you thought look at me I'm so smart and saved 1% of the project cost using unproven low penetration hardware.
daanno2 - Wednesday, May 23, 2018 - link
I'm guessing you've never dealt with expensive enterprise software before. They are mostly licensed per-core, so getting the absolute best performance per core, even if the CPU is 2-3x more expensive, is worth it. At the end of the day, the CPUs might be <5% of the total cost.
SirPerro - Wednesday, May 23, 2018 - link
You can swallow a big risk if the benefit is 75% of the cost. Hey, it's definitely worth the try.

If your hardware makes up for 5% of the cost, saving a 3% of the total budget is not worth the risk of migration.
FunBunny2 - Thursday, May 24, 2018 - link
"You can swallow a big risk if the benefit is 75% of the cost. Hey, it's definitely worth the try."

the EOL of today's machines, the amortization schedules must be draconian. only if a 'different' server pays off in dozens of months, not years, will it have chance. to the extent that enterprise software is a C/C++ and *nix codebase, porting won't be onerous. but, I'm willing to guess, even Oracle code isn't all that parallel, so throwing a truckload of teeny cpu at it won't necessarily work.
name99 - Thursday, May 24, 2018 - link
The bigger problem here is the massive uncertainty around the meaning of the word "server" and thus the target for these new ARM CPUs.
Some people seem to think "server" means primarily boxes that run SAP or ORACLE, but I think it's clear that the ARM ecosystem has little interest in that, at least right now.

What's of much more interest is racks on racks of CPUs running commodity (LAMP) or homegrown software, ie data warehouses and HPC. I'm not even sure the Java benchmarks being run are of much interest to this market. The things that matter are the sorts of things Cloudflare was measuring when they tested Centriq -- memcached, nginx, transforming one type of data into another (compression/decompression, encrypt/decrypt, transcode,...) at massive throughput.
That's where I'd expect to see the big sales of the ARM "server" cores -- to Cloudflare, Baidu, Google, and so on.

Also now that Marvell is in the game, will be interesting to see the extent to which they pull this downward, into their traditional sorts of markets like infrastructure network and storage control (eg to go into network appliances and NAS boxes).
Ed469546 - Wednesday, June 13, 2018 - link
Some of the commercial software you pay per core. Intel had the best single threaded performance mening power license costs.

Interesting question is how the Thunderx2 cores are counted in this case: one core can run 4 threads.
andrewaggb - Wednesday, May 23, 2018 - link
I wonder what workloads they are targeting? High throughput with poor single threaded results is somewhat limiting.
peevee - Wednesday, May 23, 2018 - link
Web app servers. VM servers. Hadoop/Spark nodes. All benefit more from having more threads running in parallel instead of each request waiting or switching contexts.

If you are concerned about single-thread performance on 256-thread server (as 2-CPU server with this CPU will provide) AT ALL, you choose outrageously wrong hardware for the task to begin with. Go buy a 2-core i3. Practically the only test in this article which matters is Critical jOPS (assuming the used quality of service metric was configured realistically).
GeekyMcGeekface - Friday, May 25, 2018 - link
I’m building a cluster now with a few hundred Raspberry Pi’s because scale up is expensive and stupid. By distributing across a pool of clusters, I can handle far more memory bandwidth and compute. Consider 100 Raspberry PIs have 400 64-bit cores and 100GB of RAM. Total cost $3500 + power, mounting and switches.

Running three clusters of those with Kubernetes, Couchbase and Azure Functions provides 1200 64-bit cores, about 100GB of extremely high performance storage, incredible failover and a map-reduce environment to die for.

Add some 64GB MicroSD cards and an object storage system to the cluster and there’s 12TB of cold storage (4TB when made redundant).

Pay a service fee to some sweatshop in the Eastern Block to do the labor intensive bits and you can build a massively parallel, almost impossible to crash, CI/CD friendly, multi-tenant, infinitely scalable PaaS... for less than the cost of the RAM for a single one of the servers here.

The only expensive bits in the design are the Netscalers.

Oh... and the power foot print is about the same as one of these servers.

I honestly have no idea what I what I would use a server like these in a new design for.
jospoortvliet - Wednesday, May 30, 2018 - link
single-core performance with your pi's is considerably lower, as is inter-core bandwidth. If your tasks require little inter-process communication you're good but with highly interdependent compute it won't perform well. But for specific tasks, yes, it might be very cost effective.
Eris_Floralia - Wednesday, May 23, 2018 - link
The L2$ for SKX should be 1MB (256+768KiB), 16-way.
Ryan Smith - Wednesday, May 23, 2018 - link
Right you are. Thanks!
danjme - Wednesday, May 23, 2018 - link
Mental.
Duncan Macdonald - Wednesday, May 23, 2018 - link
The CPU may be much cheaper than the equivalent Intel CPU - however on the price of a complete server there would be almost no difference as the vast majority of the price of a server is in other items (RAM, storage, network, software etc). To take a significant share, the performance needs to be better than Intel CPUs on both a per thread and a per socket basis. Potential users will look at this CPU - see that it is not faster than Intel on a per thread basis and is also not X86-64 compatible and turn away with a shrug. A price difference of under 5% for a complete server is not enough to justify the risks of going from x86-64 to ARM.
BurntMyBacon - Thursday, May 24, 2018 - link
Perhaps you are correct and the lack of per thread performance will not allow Cavium to take a "significant' share of the market from Intel. However, at this point, getting even a small amount of market penetration in the server market is a significant achievement for an ARM vendor. This processor doesn't need to take a "significant" share from Intel to be successful. It just needs to establish a solid foothold. Given the data, I think it has a good chance of succeeding in that.

The bigger question in my mind is how Intel will respond. They already have the ability to make a many lite core accelerator as demonstrated by the Xeon Phi line. Will they bring this tech to their CPU lineup, create a new accelerator based on this tech to handle applications that use many light threads, create a new many small core CPU based on Goldmont Plus (or Tremont), or will they consider the ARM threat insignificant enough to ignore.
boeush - Wednesday, May 23, 2018 - link
"(*) EPYC and Xeon E5 V4 are older results, run on Kernel 4.8 and a slightly older Java 1.8.0_131 instead of 1.8.0_161. Though we expect that the results would be very similar on kernel 4.13 and Java 1.8.0_161"

What about Spectre/Meltdown mitigation patches? Were they in effect for 'older' results?
boeush - Wednesday, May 23, 2018 - link
To elaborate: if those numbers really are from July 2017, then they don't reflect true performance in a server context any longer (servers are where Spectre/Meltdown patches would be applied most.). Since the performance impact of Spectre/Meltdown is greatest on speculative execution and memory loads/prefetching, I'd guess those super-aggressive memory subsystem performance numbers, as well as single thread IPC advantages that Intel's CPUs claim in your benchmarks, are not really entirely applicable any longer.
HStewart - Wednesday, May 23, 2018 - link
Spectre has been proved to effect other CPU's than Intel and even effects ARM and AMD.,

Image on this article states that this CPU supports Fully Out of Order execution. So with my understanding of Spectre that this CPU also has issues.

To be honest I not sure how much the whole Spectre/Meltdown stuff is in this real world. It probably cause more harm in the computer industry than help.
Manch - Thursday, May 24, 2018 - link
Commentor: Blah Blah Blah Spectre?
HStewart: Shill Shill Shill must defend Intel by any means...
lmcd - Thursday, May 24, 2018 - link
Commentor: reasonable position taken
Manch: *banned for unreasonable, offensive comments*
imaheadcase - Wednesday, May 23, 2018 - link
I really think Anandtech needs to branch into different websites. Its very strange and unappealing to certain users to have business/consumer/random reviews/phone info all bunched together.

Ever since anand actually left it really did venture into more a business/insider based website with random stuff thrown in. It is in no way a bad thing, its just like this review for instance would not appeal to %95 of readers normally. Everyone likes technology naturally that comes to this website, but its a fine line between talking about high end server components that are out of reach to people who just read the article on the mini-itx gaming motherboard. lol
Andrei Frumusanu - Wednesday, May 23, 2018 - link
You're always free to skip articles, nobody's forcing you to read it.
boeush - Wednesday, May 23, 2018 - link
I guess he'd prefer the site content to be grouped in some manner roughly mirroring market segmentation. For instance: consumer, professional, enterprise, exotic/HPC. As opposed to jumbling everything together. Personally, I don't mind - but then, I'm not known for obsessive-compulsive organizing, either :)
BurntMyBacon - Thursday, May 24, 2018 - link
Given the large differences in tech, focus, needs, and trends, I wouldn't mind breaking out Phones and perhaps servers into their own sections. I think there is more than enough overlap to keep consumer and professional desktop/laptop/workstation together, but that is entirely up to how deeply you want to divide things up. On the other hand, you'll want all of it to show up on the front page in some form, or it'll look like the site doesn't have much activity. Perhaps separate pipelines for each category could work. That all said, I don't really mind just skipping over articles that don't interest me. :)
imaheadcase - Thursday, May 24, 2018 - link
Please, that is just lazy excuse. Even news websites have catagory based on the news you interested in. Anandtech literally had a review of a gaming motherboard then a high end server thing, and newz feed gets filled with phone and other news.
name99 - Thursday, May 24, 2018 - link
God, you must REALLY hate Twitter then...

I argue with Andrei a lot, but every so often he writes a sentence like "You're always free to skip articles, nobody's forcing you to read it" that makes me want to clap him on the back and say "yes, YOU get it" :-)
Threska - Sunday, May 27, 2018 - link
Taken to it's logical extreme the front page could be a dumping ground cesspool and the retort would be "you don't have to wade through any of it" which sounds witty but doesn't solve anything, but over time would lead to the predictable outcome of people leaving.
imaheadcase - Sunday, May 27, 2018 - link
I do hate twitter, but because it has no valid purpose other than to get customer service done faster with companies because it reflects more on them because public venue. Its mostly just a rant inducing place, or a place that is basically just texting anyways since everyone just wants you to send a DM.

The whole idea of saying "you are free to skip it" is kinda silly thing to say on the internet now. Especially since more and more you can filter things according to what you want. Not only that, but with the tight competition with views from tech websites its in best interest to have more options.

Even the layout of website never changed. I mean have you ever been to website without a adblocker on? They don't even advertise tech related stuff on it. Its just stupid clickbait stuff.

Keep in mind, this is not a complaint about articles itself, its just how they are posted. I love this site, been coming to it ever since i built first pc when i was a kid. But its focus is all over the place now vs years ago out what its posting. I'm half thinking one day i will see a review of electronic toothbrush then next day new CPU.
GreenReaper - Monday, June 4, 2018 - link
I'd be fine with that, as long as it was the best darn toothbrush in town!
Threska - Sunday, May 27, 2018 - link
Accessing through RSS might be a better solution especially with a good reader. Just needs accurate tags to match.
imaheadcase - Sunday, May 27, 2018 - link
Yah i tried that for a bit, it worked ok. But was not foolproof, it missed some stuff.
repoman27 - Wednesday, May 23, 2018 - link
Just to provide a counter point, this article made my day. And that’s coming entirely from intellectual curiosity—I don’t plan on deploying any servers with these chips in the near future. I always enjoy Johan’s writing, and was really looking forward to seeing how ThunderX2 would stack up. Many people are convinced that ARM is really only suitable in low power / mobile scenarios, but this is the chip that may finally prove otherwise. That has significant ramifications for the entire industry (including the consumer space), especially when you consider that Cavium could put out a TSMC 10nm or even 7nm shrink of ThunderX2 before Intel can get off of 14nm.
HStewart - Wednesday, May 23, 2018 - link
This does not proved that ARM is suitable in higher end space - look at the core specific speed - it extremely low compare to Intel and AMD server chips. Keep in mind it takes 128 total cores - running at 4SMT system. And what about other operations - what about Virtual Machine situation - where you have many virtual x86 machines on VMWare server,

How about high end mathematical and vector logic?

It does seem like ARM can run more threads - but maybe Intel or AMD has never had the need to

I think this latest Core battle is silly - I think it really not the number of cores you have but combination of type and speed of cores along with number of cores.
Wilco1 - Wednesday, May 23, 2018 - link
It certainly does prove that Arm can do high end servers - the results clearly show IPC/GHz is very close on SPECINT. Base clock speeds are the same as the Intel cores, and that's the speed the server runs at when not idle. But there are more cores as you say, so who will win is obvious.

Now imagine a next-gen 7nm version before Intel manages 10nm. Not a pretty picture, right?
HStewart - Wednesday, May 23, 2018 - link
Ok I have learn to agree to disagree with some people

Can this server run the VMWare server

https://kb.vmware.com/s/article/1003882

The answer is no - just one example - many more,

On 10nm - it not number that matters - it technology behind it - Intel supposely has a i3 and Y based for CannonLake coming this year - probably more.
Wilco1 - Wednesday, May 23, 2018 - link
There are plenty of VMs for Arm, so virtualization is not an issue.

10nm will be behind 7nm even if it ends up as originally promised and not using relaxed rules to become viable for volume production.
ZolaIII - Thursday, May 24, 2018 - link
When optimized for SIMD NEON extension things changed dramatically. All tho NEON isn't exactly the best SIMD never the less number's speak for them self.
https://blog.cloudflare.com/neon-is-the-new-black/
Tho Centriq is a bit pricier, bit overly slower than this but main point is it whose built on comparable lithography to current Intel's 14nm. So you get cheaper hardware, which can be packaged tighter & will consume much less power while being compatible regarding the performance. Triple win situation (initial cost, cost of ownership and scaling) but it still isn't turn key one whit isn't crucial for big vendor server farms anyway.
name99 - Thursday, May 24, 2018 - link
ARM (and this particular chip) aren't trying to solve every problem in the world. They're trying to offer a better (cheaper) solution for a PARTICULAR subset of customers.

If you think such customers don't exist, then why do you think Intel has such a wide range of Xeons, including eg all those Xeon Silvers that only turbo up to 3GHz? Or Xeon Gold's that max out at 2.8GHz?
lmcd - Thursday, May 24, 2018 - link
Second page: supports SR-IOV, which is important for KVM and Xen. If you're not aware, Xen and KVM are powerful virtualization solutions that cover the feature set of VMWare quite nicely.
HStewart - Wednesday, May 23, 2018 - link
"I really think Anandtech needs to branch into different websites. Its very strange and unappealing to certain users to have business/consumer/random reviews/phone info all bunched together."

I different in this - I don't think AnandTech should concentrate on just gaming in focus - this is rather old school - I am not sure about mobile phones in the mess of all this

But comparing ARM cpu's to Intel/AMD is interesting subject. It basically RISC vs CISC discussion - yes RISC can do operations quicker in some cases - but by definition of the architecture they are Reduce in what they do. Fox example it would take RISC a ton of instructions to executed a single AVX style operation.

This article is closest I have seen in comparing ARM vs x86 base machines - but even though I see some holes - it comes close - but having just be Linux based leaves out why people purchase such machine - I think Virtual Machine server is huge - but like everything else on the internet that is just an opinion
Wilco1 - Wednesday, May 23, 2018 - link
You might want to study RISC and CISC first before making any claims. RISC doesn't use more instructions than CISC. Vector instructions are actually quite similar on most ISAs. In fact I would say the Neon ones are more powerful and more general due to being well designed rather than added ad-hoc.
HStewart - Wednesday, May 23, 2018 - link
The following site explain the difference using a simple multiply action, where a CISC architecture can do in single instruction, RISC would need to use multiple instructions

http://www.firmcodes.com/difference-risc-sics-arch...

of course as time move on RISC chips added more complex operations and CISC also found ways to breaking more complex CISC instruction in smaller RISC like microcode increasing the chip ability to multitask the pipeline.
Wilco1 - Thursday, May 24, 2018 - link
The example was about load/store architecture, not multiply. In reality almost all instructions use registers (even on CISCs) since memory is too slow, so it's not a good example of what happens in actual code. The number of executed instructions on large applications is actually very close. The key reason is that compilers avoid all the complex instructions on x86 and mostly use register operations, not memory.
Kevin G - Tuesday, May 29, 2018 - link
Raw instruction counts isn't a good metric to determine the difference between RISC and CISC, especially as both have evolved to include various SIMD and transactional extensions.

The big thing for RISC is that it only supports a handful of instruction formats, generally all of the same length (traditionally 4 bytes)* and have alignment rules in place. x86 on the other hand leverages a series of prefixes to enhance instructions which permits length up to 15 bytes. On the flip side, there are also x86 instructions that consume a single byte. This also means x86 doesn't have the alignment rules that RISC chips generally adhere to.
*ARM does offer some compressed instruction formats in Thumb/Thumb2 but they those are also of a fixed length. 16 bit Thumb instructions are half size as 32 bit ARM instructions and have alignment rules as well.

Modern x86 is radically different internally than its philosophical lineage. x86 instructions are broken down into micro-ops which are RISC-like in nature. These decoded instructions are now being cached to bypass the complex and power hungry decode stages. Compare this to some ARM cores where some instructions do not have to be decoded. While having a simpler decode doesn't directly help with performance, it does impact power consumption.

However, I would differ and say that ARM's FPU and vector history has been rather troubled. Initially ARM didn't specify a FPU but rather a method to add coprocessors. This lead to 3rd parties producing ARM cores with incompatible FPUs. It wasn't until recently that ARM themselves put their foot down and mandated NEON as the one to rule them all, especially in 64 bit mode.
peevee - Wednesday, May 23, 2018 - link
The whole RISC vs CISC distinction is outdated for at least 20 years. Both now include a shi(p)load of instruction far outnumbering original CISC processors like 68000 and 8088 (from the epoch of the whole CISC vs RISC discussion), and both have a lot of architectural registers (which on speculative OoO CPUs are not even the same as real register files). ARMv8 for example includes NEON instructions, which is like... "AVX-128" (or SSE3 or smth).

A lot of instructions means that both have to have huge decoders, which limits how small the CPU can be (because any reduction in other hardware which decrease performance faster than cost). For 64-bit ARMv8.2 it is very unlikely than an implementation can be made smaller than A55, and it is a huge core (in transistors) compared to even Pentium, let alone 8088.
HStewart - Wednesday, May 23, 2018 - link
I think the big difference between SIMD technologies - even though ARM has included they are not as wide as instructions as Intel or AMD. The following link appears to have a good comparison of chip SIMD comparison in size, To me in looks like AMD is on AVX level 8/16 instead of 16/32 in current chips while ARM including Neon is 4 Wide which is actually less than Core 2 SSE instructions from 10 years ago.

https://stackoverflow.com/questions/15655835/flops...

It also interesting to note Ryzen stats - which I heard that AMD implement AVX 256 by combine two 128 together

One thing is that both Intel and AMD CPUs have grown a long ways since 20 years ago. In fact even todays Atom's can out rune most core-2 CPU's from 10 years - not my Xeon 5160 however.
ZolaIII - Thursday, May 24, 2018 - link
It's 2x128 NEON SIMD per ARM A75 core which goes into your smartphone.
Even with smaller SIMD utilising TBL QC Centriq is able to beat up an Xerox Gold.
https://blog.cloudflare.com/neon-is-the-new-black/
Wilco1 - Thursday, May 24, 2018 - link
Modern Arm cores have 2-3 128-bit SIMD units, so 16-24 SP FLOPS/cycle. About half of Skylake theoretical flops, and yet they can match or beat Skylake on many HPC codes. Size is not everything...
peevee - Thursday, May 24, 2018 - link
"ARM including Neon is 4 Wide which is actually less than Core 2 SSE instructions from 10 years ago"

How is it less? It is the same 128 bits, 2x64 or 4x32 or 2x16...

And AMD combines 2 AVX-256 operations (not 2 128-bit SSEs) to get AVX-512.
patrickjp93 - Friday, May 25, 2018 - link
AMD does NOT have AVX-512. They combine 2 128s into a 256 on Ryzen, ThreadRipper, and Epyc.
name99 - Thursday, May 24, 2018 - link
For crying out loud!
At the very least, if you want to pursue this obsession regarding vectors, look at ARM's SVE (Scalable Vector Extensions). THAT is where ARM is headed in the vector space.
Fujitsu is implementing these for the cores of its next HPC machines, and they will likely roll out into other ARM cores (maybe Apple first? but who can be sure?) over the next few years.

To the extent that Cavium has any interest in competing in HPC, if/when they choose to do so it will be on the basis of an SVE implementation, not on the basis of NEON.

Meanwhile ARMv8 NEON is very much the equivalent of SSE. Not AVX, no, but SSE (in all its versions) yes.
tuxRoller - Thursday, May 24, 2018 - link
Nice comment.
BTW, centriq (rip) only supports(ed) aarch64. I've no idea how much die space that saved, though.
Wilco1 - Thursday, May 24, 2018 - link
There is Cortex-A35, smallest AArch64 core so far with FP and Neon.

However there are still big differences between RISC and CISC. For example it's not feasible for CISC to get anywhere near the same size/perf/power. The mobile Atom debacle has clearly shown it's not feasible to match small and efficient RISCs even with a better process and many billions of dollars...
peevee - Thursday, May 24, 2018 - link
It is not 8.2.
lmcd - Wednesday, January 23, 2019 - link
Necro but worth for historic reasons: A35 is AArch32 but ARMv8
ZolaIII - Thursday, May 24, 2018 - link
It would took them a same. AVX is a SIMD FP extension to the prime architectural instruction set same as NEON and cetera. The strict difference between CISC and RISC architecture is long gone and today's one's are combined & further more implement IVIL SIMDs and more & more of DSP components as MAC's. The train only starts on prime integer instruction set (where by the way ARM is stellar) and then switches it's worker's to FP extensions and accelerated blocks of different kinds. The same way lintel grow up AVX to 512 bit in current use NEON can be scaled up & beyond. Fuitsu worked with ARM on 1024 & 2048 NEON SIMD blocks couple of years ago. Still if you think how FP is a best way to do it you are wrong, DSP's use CP and it's much more efficient power & performance wise but less scalable.

On what would you like server's to be compared? Almost 90% of enterprise servers run on Linux, even Microsoft is earning more money this day's on Linux than from selling Windows desktop & server's combined.
You are very ignorant person. Why do you coment about the things you don't know anything about?
Ryan Smith - Thursday, May 24, 2018 - link
"I really think Anandtech needs to branch into different websites. Its very strange and unappealing to certain users to have business/consumer/random reviews/phone info all bunched together."

Although I appreciate the feedback, I must admit that we enjoy doing a variety of things. There are a lot of cool things happening in the technology world, not all of which are in the consumer space. So rare articles like these - and we only publish a few a year - let us keep tabs on what's going on in some of those other markets.
HStewart - Wednesday, May 23, 2018 - link
I would think that a lot of this depends what type of applications are running on server. Highly mathematical and especially any with Vectors will be likely different. Also there is no support for Windows based servers which limits which applications can be done - so my guess this will be useless if desiring a VMWave server.

But it is interesting that it takes a 4SMT to compete with x86 based servers from Intel and AMD and with more cores 32 vs 22/28 depending on version.
Wilco1 - Wednesday, May 23, 2018 - link
You're right, on floating point and vectors the results are different. To be precise - even more impressive. See the last page for example where it soundly beats Skylake on OpenFoam and a few other HPC benchmarks. Hence the huge interest from all the HPC companies.

Note Windows has been running on Arm for quite some time. Microsoft runs Windows Server both on Centriq and ThunderX2. See eg. https://www.youtube.com/watch?v=uF1B5FfFLSA for more info.
HStewart - Wednesday, May 23, 2018 - link
Windows on ARM is DOA,
Wilco1 - Wednesday, May 23, 2018 - link
That's your uninformed opinion... Microsoft has different plans.
ZolaIII - Thursday, May 24, 2018 - link
Windows is DOA anyway. M$ makes more money this day's on Linux then it does on Window's combined. Only thing making it still alive is MS Office but even that will change in couple of years.
Wilco1 - Thursday, May 24, 2018 - link
Calling Windows dead when it ships on 95+% of PCs sold is eh... a little bit premature. Get back to me when 50+% of PCs ship with Linux instead of Windows.
ZolaIII - Friday, May 25, 2018 - link
Get back to me when windows ships with 5% in; servers, embedded, router's, smartphones...
jimbo2779 - Thursday, May 24, 2018 - link
In what way is it making more from Linux?
ZolaIII - Friday, May 25, 2018 - link
https://www.computerworld.com/article/3271085/micr...
Even your Windows PC, Office and everything else from Microsoft this day's is backed up by a cloud which is Linux based.
defaultluser - Wednesday, May 23, 2018 - link
Page 11 has "Apache Spark and Energy Consumption" in the title, but the page only contains
Apache Spark results. WHERE IS THE ENERGY CONSUMPTION?

We need power consumption tests during benchmarks to show if the architecture has better perf/watt than Intel. Otherwise, why did you publish this obviously incomplete article?
Ryan Smith - Wednesday, May 23, 2018 - link
Whoops. Sorry, that was a small section that was moved to page 5.
ruthan - Wednesday, May 23, 2018 - link
Well, where is the most important chart performance per dollar comparison with x86 solution?

That virtualization support, is some arm specific yes i we need feature and proprietary hell like Lpars.. or its finally support Vmware? - that means virtualization.

Where is could it run Crysis test?
HStewart - Wednesday, May 23, 2018 - link
VMWare is not currently support - and probably not for a long time - unless they ran in emulation mode and it would slower than Atom

https://kb.vmware.com/s/article/1003882
DrizztVD - Wednesday, May 23, 2018 - link
It amazes me how the one big advantage ARM could have is the power efficiency, yet no power efficiency numbers in this review? It's like someone just isn't thinking about what can best showcase the ARM advantage and testing it.
boeush - Thursday, May 24, 2018 - link
You must have missed this bit:

"So as is typically the case for early test systems, we are not able to do any accurate power comparisons.

In fact, Cavium claims that the actual systems from HP, Gigabyte and others will be far more power efficient."

This was an early (and apparently quite buggy, especially from the power management standpoint) test system. It's not representative of final production systems in these respects, so doing what you request on it would only put a very crude lower bound on efficiency, at best.

That's why the final section of the write-up has a title ending in ": so far"... (obviously, there will be more to come if/when real production-quality systems are available for benchmarking/analysis.)
ZolaIII - Thursday, May 24, 2018 - link
It's broken currently on the MB. If you want to see real power/performance metrics for a SoC made on comparable lithography to the lintels 14 nm (aka TSMC 10nm) & with optimised software read this:
https://blog.cloudflare.com/neon-is-the-new-black/
drwho9437 - Wednesday, May 23, 2018 - link
Thanks Johan, I've been reading since Ace's. I can't believe it has been more almost 20 years. Even though I don't work in this market I still read everything you write.
JohanAnandtech - Friday, May 25, 2018 - link
It was indeed almost 20 years ago that I published my first article about the K6-2 vs Pentium MMX. And Anand's star was about to rise with the launch of the K6-3 :-).
Spatz - Wednesday, May 30, 2018 - link
Wow. Aces hardware... that used to be my go to for hardware reviews back in the day. I can’t believe your still at it! This article was great. Keep up the good work.
beginner99 - Thursday, May 24, 2018 - link
So it for sure is an option. however I d not get the focus on price. The CPU cost is a small fraction of the total server cost and a tiny if infrastructure cost (network, HVAC,...) is included. Add to that the software and data running on that server and if your CPU is 5% faster at same power it costing $5000 more might be totally worth it.
Apple Worshipper - Thursday, May 24, 2018 - link
Errmm... does ARM feature SMT now?
Ryan Smith - Thursday, May 24, 2018 - link
Not in Arm's own cores. But in Cavium's ThunderX2, yes.
sgeocla - Thursday, May 24, 2018 - link
What's up with EPYC comparison missing in almost all benchmarks?
EPYC has been out for a while and the only benchmarks are from almost a year ago?
JohanAnandtech - Thursday, May 24, 2018 - link
I have been trouble shooting a Java problem for the last 3 weeks now - for some reason our specific EPYC test system has some serious performance issues after we upgraded to kernel 4.13. This might be a hardware/firmware... issue. I don't know. I just know that the current tests are not accurate.
junky77 - Thursday, May 24, 2018 - link
What? A 2.5GHZ ARM core is around 60-70% of a 3.8GHZ Skylake core?? For 3.8GHZ, the ARM is probably at least as fast?
Wilco1 - Thursday, May 24, 2018 - link
Probably around 90% since performance doesn't scale linearly with frequency. Note these are throughput parts so won't clock that high. However a 7nm version might well reach 3GHz.
AJ_NEWMAN - Thursday, May 24, 2018 - link
If Caviums tweaked 16nm hits 3GHz - it would to be unreasonable to aim for 4GHz for a 7nm part.

With 2.3 times as many transistors available - it will be interesting to see what else they beef up?

HIgher IPC? 64 cores? 16 memory controllers? CCIX - or perhaps they will compete with Fujitsu and add some Supercomputer centric hardware?

AJ
meta.x.gdb - Thursday, May 31, 2018 - link
Wonder why the VASP code limped along on ThunderX2 while OpenFOAM saw such gains. I'm pretty familiar with both codes. VASP is mostly doing density functional theory, which is FFT-heavy...
Meteor2 - Tuesday, June 26, 2018 - link
All I want to say (all I can say) is that Anandtech has some of the best writers and commenters in this field. Fantastic article, and fantastic discussion.
paldU - Saturday, July 7, 2018 - link
A typo in Page 2. "it terms of performance per dollar" should be " in terms of performance per dollar".

Assessing Cavium's ThunderX2: The Arm Server Dream Realized At Last

Post Your Comment

97 Comments

Back to Article

Davenreturns - Wednesday, May 23, 2018 - link

davegraham - Wednesday, May 23, 2018 - link

davegraham - Wednesday, May 23, 2018 - link

Davenreturns - Wednesday, May 23, 2018 - link

Davenreturns - Wednesday, May 23, 2018 - link

DanNeely - Wednesday, May 23, 2018 - link

Arnulf - Sunday, May 27, 2018 - link

Ryan Smith - Wednesday, May 23, 2018 - link

Davenreturns - Wednesday, May 23, 2018 - link

vanilla_gorilla - Wednesday, May 23, 2018 - link

Gunbuster - Wednesday, May 23, 2018 - link

daanno2 - Wednesday, May 23, 2018 - link

SirPerro - Wednesday, May 23, 2018 - link

FunBunny2 - Thursday, May 24, 2018 - link

name99 - Thursday, May 24, 2018 - link

Ed469546 - Wednesday, June 13, 2018 - link

andrewaggb - Wednesday, May 23, 2018 - link

peevee - Wednesday, May 23, 2018 - link

GeekyMcGeekface - Friday, May 25, 2018 - link

jospoortvliet - Wednesday, May 30, 2018 - link

Eris_Floralia - Wednesday, May 23, 2018 - link

Ryan Smith - Wednesday, May 23, 2018 - link

danjme - Wednesday, May 23, 2018 - link

Duncan Macdonald - Wednesday, May 23, 2018 - link

BurntMyBacon - Thursday, May 24, 2018 - link

boeush - Wednesday, May 23, 2018 - link

boeush - Wednesday, May 23, 2018 - link

HStewart - Wednesday, May 23, 2018 - link

Manch - Thursday, May 24, 2018 - link

lmcd - Thursday, May 24, 2018 - link

imaheadcase - Wednesday, May 23, 2018 - link

Andrei Frumusanu - Wednesday, May 23, 2018 - link

boeush - Wednesday, May 23, 2018 - link

BurntMyBacon - Thursday, May 24, 2018 - link

imaheadcase - Thursday, May 24, 2018 - link

name99 - Thursday, May 24, 2018 - link

Threska - Sunday, May 27, 2018 - link

imaheadcase - Sunday, May 27, 2018 - link

GreenReaper - Monday, June 4, 2018 - link

Threska - Sunday, May 27, 2018 - link

imaheadcase - Sunday, May 27, 2018 - link

repoman27 - Wednesday, May 23, 2018 - link

HStewart - Wednesday, May 23, 2018 - link

Wilco1 - Wednesday, May 23, 2018 - link

HStewart - Wednesday, May 23, 2018 - link

Wilco1 - Wednesday, May 23, 2018 - link

ZolaIII - Thursday, May 24, 2018 - link

name99 - Thursday, May 24, 2018 - link

lmcd - Thursday, May 24, 2018 - link

HStewart - Wednesday, May 23, 2018 - link

Wilco1 - Wednesday, May 23, 2018 - link

HStewart - Wednesday, May 23, 2018 - link

Wilco1 - Thursday, May 24, 2018 - link

Kevin G - Tuesday, May 29, 2018 - link

peevee - Wednesday, May 23, 2018 - link

HStewart - Wednesday, May 23, 2018 - link

ZolaIII - Thursday, May 24, 2018 - link

Wilco1 - Thursday, May 24, 2018 - link

peevee - Thursday, May 24, 2018 - link

patrickjp93 - Friday, May 25, 2018 - link

name99 - Thursday, May 24, 2018 - link

tuxRoller - Thursday, May 24, 2018 - link

Wilco1 - Thursday, May 24, 2018 - link

peevee - Thursday, May 24, 2018 - link

lmcd - Wednesday, January 23, 2019 - link

ZolaIII - Thursday, May 24, 2018 - link

Ryan Smith - Thursday, May 24, 2018 - link

HStewart - Wednesday, May 23, 2018 - link

Wilco1 - Wednesday, May 23, 2018 - link

HStewart - Wednesday, May 23, 2018 - link

Wilco1 - Wednesday, May 23, 2018 - link

ZolaIII - Thursday, May 24, 2018 - link

Wilco1 - Thursday, May 24, 2018 - link

ZolaIII - Friday, May 25, 2018 - link

jimbo2779 - Thursday, May 24, 2018 - link

ZolaIII - Friday, May 25, 2018 - link