We have been spoiled. Since the introduction of the Nehalem-based Xeon 5500 in March 2009, Intel has been increasing the core counts of their Xeon CPUs by nearly 50% almost every 18 months. We went from four to six cores with the Xeon 5600 in June 2010. Sandy Bridge-EP (Xeon E5-2600, March 2012) increased the core count to 8. That is only 33% more cores, but each core was substantially faster than those of the previous generation. Ivy Bridge-EP (Xeon E5-2600 v2, launched September 2013) increased the core count from 8 to 12, and Haswell-EP (Xeon E5-2600 v3, September 2014) surprised us with an 18-core flagship SKU.

However, it could not go on forever. Sooner or later Intel would need to slow down on adding cores, for both power and die-space reasons, and today Intel has finally pumped the brakes a bit.

Launching today is the latest generation of Intel's Xeon E5 processors, the Xeon E5 v4 series. Fifteen months after Intel's Broadwell architecture and 14nm process first reached consumers, Broadwell has finally reached the multi-socket server space with Broadwell-EP. Like past EP cores, Broadwell-EP is the bigger, badder sibling of the consumer Broadwell parts, offering more cores, more memory bandwidth, more cache, and more server-focused features. And thanks to the jump from their 22nm process to their current-generation 14nm process, Intel gets to reap the benefits of a smaller, denser process.

Getting back to core counts: even with the jump to 14nm, Intel has played it more conservatively. Compared to the Xeon E5 v3 (Haswell-EP), the Xeon E5 v4 (Broadwell-EP) makes a smaller jump, going from 18 cores to 24 cores, an increase of 33%. Even then, "only" 22 of those cores are enabled on the new Xeon E5 v4 SKUs, so we won't get to see everything Broadwell-EP is capable of right away.
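As a quick sanity check of the percentages quoted above, the generation-over-generation arithmetic fits in a few lines of Python. This is a minimal sketch: the core counts are the flagship figures from the text, while the list and function names are illustrative choices of ours, not anything from Intel.

```python
# Flagship core counts per Xeon EP generation, as quoted in the text.
# Westmere-EP is the codename of the Xeon 5600; Broadwell-EP's 24 is the full die.
GENERATIONS = [
    ("Xeon 5500 (Nehalem-EP)", 4),
    ("Xeon 5600 (Westmere-EP)", 6),
    ("Xeon E5-2600 (Sandy Bridge-EP)", 8),
    ("Xeon E5-2600 v2 (Ivy Bridge-EP)", 12),
    ("Xeon E5-2600 v3 (Haswell-EP)", 18),
    ("Xeon E5 v4 (Broadwell-EP die)", 24),
]


def pct_increase(old: int, new: int) -> float:
    """Percentage increase going from `old` cores to `new` cores."""
    return 100.0 * (new - old) / old


def growth_per_generation(gens):
    """Pair each generation with its predecessor; return (name, rounded % gain)."""
    return [
        (name, round(pct_increase(prev_cores, cores)))
        for (_, prev_cores), (name, cores) in zip(gens, gens[1:])
    ]


if __name__ == "__main__":
    for name, pct in growth_per_generation(GENERATIONS):
        print(f"{name}: +{pct}%")
    # Launch SKUs enable only 22 of the 24 cores, so the real-world step is smaller.
    print(f"Xeon E5 v4 (22 cores enabled): +{round(pct_increase(18, 22))}%")
```

Running it prints +50%, +33%, +50%, +50% and +33% for the successive generations, then +22% for the 22 enabled cores: relative to Haswell-EP's 18 cores, the shipping parts take a noticeably smaller step than the roughly 50% cadence of earlier generations.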

Meanwhile, the highest turbo clock speed is still 3.6 GHz, base clocks have dropped by one or two speed bins, and the per-core improvements are very modest (about 5%). Performance-wise, this is consequently probably the least spectacular product refresh we have seen in many years.

Still, on paper there is enough to make the Broadwell version of the Xeon E5 attractive. It fits in the same LGA 2011-3 socket. Few people will upgrade in place from Xeon E5 v3s to Xeon E5 v4s, but using the same platform means lower costs for the server vendors and more software maturity (drivers, etc.) for buyers.


They look very different but fit in the same socket: Xeon E5 v4 on top, Xeon E5 v3 at the bottom

Broadwell also has several features that make it a more attractive processor for virtualized servers. Finer-grained control over how applications share the uncore (caches and memory bandwidth) helps avoid scenarios where low-priority applications slow down high-priority ones. Meanwhile, quite a few improvements have been made to help I/O-intensive applications run more smoothly on top of a virtualization layer. Most businesses run their applications virtualized, virtualization is still the key ingredient of fast-growing cloud services (Amazon, Digital Ocean, Azure...), and more and more telecom operators are starting to virtualize their services, so these new features will definitely be put to good use. And of course, Intel made quite a few subtle but noteworthy tweaks to keep the HPC crowd (mostly simulation and scientific computing software) happy.

But don't make the mistake of thinking that virtualization and HPC are the only candidates for the new Xeons with up to 22 cores. The newest generation of data analytics frameworks has made enormous performance strides by easing the network and storage bandwidth bottlenecks. One example is Apache Spark, which can crunch through terabytes of data much more efficiently than its grandparent Hadoop by making better use of RAM. To get results out of a massive heap of text data, for example, you can use some of the most advanced statistical and machine learning algorithms. Mix machine learning with data mining and you get an application that is incredibly CPU-hungry but does not need the latest and fastest NVMe-based SSDs to keep the CPU busy.

Yes, we are proud to present our new benchmark based on Apache Spark in this review. Combining analytics software with machine learning to get deeper insights is one of the most exciting trends in the enterprise world, and it is also one of the reasons why even a 22-core Broadwell is still not fast enough.

Broadwell-EP: The 14nm Xeon E5

  • xrror - Tuesday, April 5, 2016

    Even at 3.3 GHz, though, they shouldn't be that slow. I'm taking a guess: if this was a student lab, and they bothered to specifically order Xeon (or Opteron back in the day) workstations, I'm guessing this was a CAD/CAM lab or something running a boatload of expensive licensed software (like Autodesk, SolidWorks, etc.), and some of that software constantly thrashes the hard drive.

    And I doubt your school could spring for SSDs in them (because workstation SKU == you pay dearly for OEM workstation "certified" drives).

    This is all guesswork, though, and I'm not trying to defend it. It does suck when you have what should be a sweet machine choking for whatever reason, and you're there trying to get your assignments done and you just want to smash the screen cause it just chhhuuuuuuggggsss... ;p
  • SkipPerk - Friday, April 8, 2016

    I have seen this many times, even in the for-profit sector. I once saw a compute cluster that was choking on a server with slow storage. They had a 10 Gb network and fast Xeon machines running on flash, but the primary storage was too slow. When they got a proper SAN, it was an order-of-magnitude improvement.

    Back in the day storage was often the bottleneck, but it still comes up today.
  • someonesomewherelse - Thursday, September 1, 2016

    We ran everything in virtual machines with the actual disk images not stored locally... and the LANs in the classrooms were 100 Mbit; idk about the connection from the classroom to the server with the images. How's that for slow?

    I would have loved it if our stuff was as "slow" as yours. The Wi-Fi in the classrooms was very fast too... especially since I doubt anyone bothered turning off their torrents. (Well, it's completely understandable: you are going to watch the new episode of your favorite show once you are back home, and not everyone had (well, has, though most people can get it now) FTTH with at least a 100 Mbit line. Ideally symmetrical, but some ISPs are too greedy with upload speeds, so 300/50 is cheaper than 100/100, and good luck getting 1000/1000 on a residential package. The hardware isn't the problem, since you can get 1000/1000 with a commercial (aka overpriced) package using the same hardware; basically I would just need to sign a new contract, send it back, and enjoy the faster line in one business day or less.) Well, at least there are no bandwidth caps (if I didn't read foreign boards, I'd think no ISP could put caps on non-mobile connections without losing all its customers), and we have no DMCA (or anything similar) and, afaik, no plans for one either. If they tried to pass such a law, I can imagine there'd be enough support for a referendum, which would win by a huge majority. Even better, the methods used to catch people downloading torrents are illegal anyway, so any evidence obtained with them or derived from them is inadmissible, and just by presenting it you have admitted to several crimes which the police and prosecution are obliged to investigate and prosecute... copyright infringement, however, is a civil matter.
  • donwilde1 - Tuesday, April 5, 2016

    One of the more interesting Intel features, in my opinion, is that Broadwell carries an on-board encryption engine with its own interpreter similar to a small-memory, embedded JVM. This enables full Trusted Boot capability, which I view as a necessity in today's hackable world. Would you consider a follow-on article on this? The project was a clean-room development called BeiHai, done in China.
  • JamesAnthony - Wednesday, April 6, 2016

    From what I can tell from looking over the benchmarks, there is not much of an increase in per-core performance going from the v1 CPUs to the v4 CPUs.
    If you look at the benchmarks and consider that you are comparing 16 cores to 44 cores, the 44-core setup is not 2.75x faster.

    So while your overall speed goes up, your work accomplished per core is not increasing at the same rate.

    Why does this matter? Software licensing costs: as you add cores, things get very expensive very quickly. So if your software costs (which can easily exceed the hardware costs) go up with each core you add, but the work done does not, your cost/performance ratio quickly gets worse.

    For quite a few people, the E5-2667 v2 CPU with 8 cores at 3.5 GHz (4 GHz turbo) comes out around the best value for the software licensing cost.

    So while Intel puts out processors that overall can do more work than the previous ones, the move to per-core software licensing is making them a negative value proposition. This is why people keep wanting higher-clocked, lower-core-count processors, but we seem to have been stuck around 3.5 GHz for many years.
  • SkipPerk - Friday, April 8, 2016

    Although you are right about workstations, so much of the demand is for generic virtualized machines. Many buyers are fine with 2 GHz and as many cores as they can get. They provision as little RAM as the spec requires and spin up the cheapest single-core, dual-thread, 2 GB RAM VM they can. This is how call centers work, not to mention many low-level office jobs. They do not care about performance because this is more than enough.

    I have had specialty applications where prosumer 6-core or 8-core CPUs were the better deal (usually liquid cooled and overclocked), but not many buyers are licensing insanely expensive analytical software by the core.
  • SeanJ76 - Sunday, April 10, 2016

    @Xeon chips!! TOTAL GARBAGE!
  • legolasyiu - Wednesday, April 20, 2016

    The ASUS workstation/server boards for v4 are very stable, and they support a 10% OC. I am very interested in how the processor performs with those boards.
  • Bulat Ziganshin - Saturday, May 7, 2016

    >This increases AES (symmetric) encryption performance by 20-25%

    PCLMULQDQ implements part of Galois field multiplication, and Broadwell actually improved only the GCM part of the AES-GCM algorithm. Neither AES itself nor other popular symmetric encryption algorithms became faster.
  • oceanwave1000 - Monday, May 9, 2016

    This article mentioned that the Broadwell-EP E5 v4 family has 3 die configurations. I got the 306 mm² and 454 mm² ones. Did anyone catch the third one?

    Thanks.
