27 Comments
personne - Monday, October 13, 2014 - link
This review should be called Intel Xeon E5-2687W v3 and E5-2650 v3 on Windows Review. I'd think a large number of these servers would be used for other operating systems.
Ian Cutress - Monday, October 13, 2014 - link
I have some Linux benchmarks in the pipeline that I'm testing but aren't ready for prime time yet.
I'll need to get some CPUs back in my office to test with that though, these Xeons are usually only loaner samples and it gets difficult to retest them.
personne - Monday, October 13, 2014 - link
Thanks. I admit it really aggravates me, in 2014, to see screenshots of applications as some sort of qualifier. So I hope you can generate some really useful discrete data for a critical audience.
Marthisdil - Monday, October 13, 2014 - link
I think a large number of these servers will be used in ESX (or other hypervisor) hosts, so these benchmarks don't really mean a ton.
Flunk - Tuesday, October 14, 2014 - link
This review is all workstation loads, so it's not that helpful even if you are using Windows. I think most of the Windows Systems these very pricey Xeons end up in will be servers. IIS, database and active directory performance testing would be more appropriate.
elerick - Monday, October 13, 2014 - link
I do find some value in the benchmarks provided by this review. When a review of a high-end workstation with DDR4 includes gaming benchmarks, it shows that game engines do not take advantage of the extra bandwidth. The only factors are CPU architecture @ frequency + graphics cards.
I would have liked to see how this CPU handles server applications and storage such as ZFS. More and more converged infrastructure is becoming hardware vendor agnostic. ESXi 6 has some pretty cool features that make sense with Supermicro hardware taking advantage of the latest CPU.
iwod - Monday, October 13, 2014 - link
I think next year Xeon will be much more interesting with 14nm. I am hoping to see an increase from 12 to 16, and 18 to 24/32 cores, along with much cheaper DDR4.
Jon Tseng - Monday, October 13, 2014 - link
Hey Ian, any more thoughts on power consumption vs. Ivy Bridge in day-to-day use, not just load?
To me the obvious advantage of Grantley on paper is bringing all that Haswell power-gating/idle goodness to the server environment. The technology which lets Haswell spin out battery life in a laptop should also deliver energy and cost savings in a DC - which matters given power consumption (this is assuming your DC has decent periods of under-utilization - i.e. not an HPC plant!).
Curious if any thoughts/data on this... J
isa - Monday, October 13, 2014 - link
I feel personally threatened by the "idea-limited" constraint. I resemble that remark. But I compensate with kool LEDs on my PC.
Carl Bicknell - Monday, October 13, 2014 - link
One thing that really needs spelling out is the clock speed under full load on all cores. That's much more informative than giving the default or the range.
For the 2687W it's 3.2GHz default, and 3.4GHz with turbo on all cores. That's pretty disappointing, Intel.
JarredWalton - Monday, October 13, 2014 - link
For ten cores I wouldn't expect a huge bump over the "minimum guaranteed" speed. It's one thing to boost a few cores by a large amount, but the whole problem with multi-core designs is that if you load up all the cores then either you have massive power consumption or you need to curtail the clocks. Honestly, running ten cores at 100% and still hitting 3.1GHz is impressive in my book -- and it still consumes up to 160W.
Carl Bicknell - Monday, October 13, 2014 - link
I got my numbers a bit wrong: the 2687W is 3.1 GHz default and 3.2 GHz all cores on turbo, according to wikipedia.
That's disappointing.
Apart from anything else, they've managed to get their best 12 (yes twelve!) core CPU (E5-2690 v3) to operate at 3.1 GHz turbo all cores in a 135 W design.
With two fewer cores and an extra 25 watts I'd hope for more than a mere 100 MHz improvement.
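To put rough numbers on that comparison, here is a minimal sketch that uses only the figures quoted in this thread: 10 cores at 3.2 GHz all-core turbo in 160 W for the E5-2687W v3, and 12 cores at 3.1 GHz in 135 W for the E5-2690 v3. The "core-GHz per watt" metric is an assumption made purely for illustration. TDP is not measured power draw, and clock speed is not throughput.

```python
# Crude comparison: aggregate all-core clock per watt of TDP.
# The figures are the ones quoted in this thread, not measurements.
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    cores: int
    all_core_turbo_ghz: float
    tdp_w: float

    def core_ghz(self) -> float:
        """Sum of clocks across all cores when every core is loaded."""
        return self.cores * self.all_core_turbo_ghz

    def core_ghz_per_watt(self) -> float:
        """Aggregate clock divided by TDP (an illustrative metric only)."""
        return self.core_ghz() / self.tdp_w


cpus = [
    Cpu("E5-2687W v3", cores=10, all_core_turbo_ghz=3.2, tdp_w=160.0),
    Cpu("E5-2690 v3", cores=12, all_core_turbo_ghz=3.1, tdp_w=135.0),
]

for cpu in cpus:
    print(f"{cpu.name}: {cpu.core_ghz():.1f} core-GHz, "
          f"{cpu.core_ghz_per_watt():.3f} core-GHz/W")
```

On these quoted figures the 12-core part comes out ahead on both aggregate clock and clock per watt, while the 2687W keeps a 100 MHz edge per core.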
NovoRei - Monday, October 13, 2014 - link
Ian, could you comment on performance with pure AVX2 and mixed AVX instructions and where the W version stands?
Thanks.
Laststop311 - Monday, October 13, 2014 - link
4100 for an 18 core, I'll take 2
ruthan - Tuesday, October 14, 2014 - link
I would like to see benchmarks of some of those low power models - 6/12 or 12/24 - 55W and 65W.
pokazene_maslo - Tuesday, October 14, 2014 - link
Is it possible to override turbo boost to force all cores to run at maximum turbo frequency? (E5-2687W-v3 running all cores at 3.5GHz)
alpha754293 - Tuesday, October 14, 2014 - link
Well, the thing with these "big" multicore systems is no different than testing a large SMP system. You have to use programs for applications where it makes sense to use them. For engineering analyses and simulations, even HOW a problem is divided up (from a single, much larger problem) can have an impact on not only the speed of the analysis/simulation, but also the accuracy of the simulation, and you have to have a pretty sound understanding of the math and physics involved in order to make the best determination.
And for some applications, there is such a thing as TOO many cores (where you've divided up a problem so much that each piece is now so small that it can't fully load a core up anymore, and the process of dividing and re-assembling the results takes an extremely large amount of time). (You can run into that with some FEA analysis.)
I was working with Johan and studying a whole slew of parameters using LS-DYNA to study how the various ways of decomposing a problem can have an impact on the crash test simulation results, and how swap performance means EVERYTHING when it comes to mechanical engineering simulations.
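The "too many cores" effect described above can be sketched with a toy model. This is an illustration under stated assumptions, not LS-DYNA behaviour: the work is assumed to split perfectly across cores, the cost of dividing and re-assembling is assumed to grow linearly with the number of domains, and `work_s` and `overhead_per_core_s` are made-up values. It covers only run time, not the accuracy effects mentioned above.

```python
# Toy model: wall time = compute split across cores + decomposition overhead
# that grows with the number of domains. All numbers are hypothetical.

def wall_time(cores: int, work_s: float, overhead_per_core_s: float) -> float:
    """Estimated wall time in seconds when the problem is split over `cores`."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    compute = work_s / cores                      # perfectly divisible work
    overhead = overhead_per_core_s * (cores - 1)  # no split cost on one core
    return compute + overhead


def best_core_count(max_cores: int, work_s: float,
                    overhead_per_core_s: float) -> int:
    """Core count (1..max_cores) that gives the lowest modelled wall time."""
    return min(range(1, max_cores + 1),
               key=lambda n: wall_time(n, work_s, overhead_per_core_s))


if __name__ == "__main__":
    work_s, overhead_s = 3600.0, 25.0  # hypothetical: 1 h of work, 25 s per extra domain
    for n in (1, 4, 8, 12, 16, 24, 36):
        print(f"{n:2d} cores: {wall_time(n, work_s, overhead_s):7.1f} s")
    print("fastest at", best_core_count(36, work_s, overhead_s), "cores")
```

With these assumed parameters the modelled run time bottoms out at 12 cores and gets worse beyond that, so a 36-core run is slower than a 12-core one. Real solvers have far more complicated overheads, so the crossover point has to be measured per model.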
mapesdhs - Thursday, October 16, 2014 - link
Oddly enough this can be the case with animation rendering as well. I know a movie studio which uses a system that can exclude cores from a render pipeline so there is more RAM and cache bandwidth available with fewer cores. This can matter because sometimes complex film renders can use huge amounts of data. Someone at SPI told me one frame of a big movie can involve 500GB of data.
Interesting how the same issue can crop up in such widely different fields.
Ian.
RAMdiskSeeker - Tuesday, October 14, 2014 - link
Could you please test these motherboards for supporting ECC unbuffered DIMMs, reporting that ECC is active, and overclocking potential with ECC DIMMs? It would be good to know whether Xeon chips on non-server motherboards can use ECC.
nutral - Tuesday, October 14, 2014 - link
What still is strange to me is that there is still no workstation cpu focused on a workstation with single threaded software. Wouldn't an i7 cpu still be much faster than this workstation cpu?
TiGr1982 - Wednesday, October 15, 2014 - link
This new workstation CPU, Xeon E5-2687W v3, as we see, is intended for multithreaded software.
There are actually workstation CPUs better suited to single-threaded software: these are Xeon E3, e.g., Xeon E3-1286 v3 (3.7/4.1 GHz) and slower and cheaper models below it.
These are essentially "professionalized" Core i7s for LGA1150.
Being the same silicon as Core i7s for LGA1150, these E3s have their own downsides, however: only 32 GB of RAM and only 8 MB of L3 cache.
And the very fastest in single-threaded tasks is Core i7-4790K at 4.0/4.4 GHz, but it lacks ECC memory support.
hrrmph - Tuesday, October 14, 2014 - link
I would like to encourage Ian and AT in general to continue to split the coverage (as they have been doing recently) for dual-socketed platforms into the "low-end" enthusiast / workstation segment, and the "high-end" more heavy-duty server / enterprise segment.
Ian's recent articles hitting this from the "low-end" enthusiast / workstation angle have been really helpful to me, even though I've already been part-time "playing" with dual-socketed systems for some time, both as an educational exercise and a personal curiosity endeavor.
In particular, the effects of NUMA aware software on dual-socketed system performance are of great interest.
I've also noticed a lot of negative feedback to Ian's articles that I think is unwarranted. It's mostly from folks who want Ian to do more complex testing of more complex tasks that are primarily enterprise related. That's all well and good, but as I understand it, that is the job of the "other half" of AT to do.
Ian and AT doing dual-socketed articles on "low-end" Windows builds is exactly what we need to help people know whether or not they would like to "step-up" from X99-E. It also is helpful so that folks know what they are really getting into if they go the dual-socketed route. As Ian pointed out in recent articles there are still some things that X99-E will do better and going into dual-socketed computing all "starry-eyed" isn't necessarily the best way to approach it.
If there is anything that AT could use, it's actually even more comparative testing of X99 Haswell-E versus the C6xx Haswell-EP from a Windows workstation user's perspective. It would be great to see which taskings favored which platform in actual testing.
Everyone has an opinion, but actually doing it is the best way to demonstrate what works and what doesn't.
mapesdhs - Thursday, October 16, 2014 - link
Entirely agree! Good summary.
Btw, disappointing to see the threaded CB R15 result for the 2687W is only 30% better than an oc'd 3930K (mine @ 4.7 gives 1221). Does confirm that to really best a 1-socket oc'd i7, one really has to move to a multi-socket platform, and then of course it boils down to whether the sw is written to match (e.g. is Handbrake written as well as it could be?)
Ian.
PS. I hasten to add, I'm a different Ian. :D
SanX - Tuesday, October 14, 2014 - link
"And remember this rule Pinocchio for the rest of your life -- two processors with the factor of 1.5 difference are equal"
colonelclaw - Wednesday, October 15, 2014 - link
Any chance you could include V-Ray in future benchmarks? It's multi-application and multi-platform and very popular in the CGI world.
mapesdhs - Thursday, October 16, 2014 - link
And of course c-ray, which scales extremely well with multiple cores.
Ian.
otherwise - Monday, November 17, 2014 - link
In the future, is there any chance you can add a benchmark that stresses single-threaded integer performance? I'd love to see how much Int performance has changed from generation to generation, but most sites (including this one) seem to focus on FP performance.