Today Intel is announcing some of its plans for its future Xeon Scalable platform. The company has already stated that after this year's Cascade Lake processors it will bring another generation of 14nm products, called Cooper Lake, followed by its first generation of 10nm Xeons, Ice Lake. Today's announcement covers Cooper Lake's core count, form factor, and platform.

Today Intel is confirming that it will bring its 56-core Xeon Platinum 9200 family to Cooper Lake, so developers can take advantage of its new bfloat16 instructions with a high core count. On top of this, Intel is also stating that the new CPUs will be socketed, unlike the 56-core Cascade Lake CPUs, which are BGA only. Socketing the product means that a new socket is required, which Intel has confirmed will also support Ice Lake. According to one of our sources, this will be an LGA4189 product.
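For readers unfamiliar with the format: bfloat16 is simply the top 16 bits of an IEEE-754 float32, keeping the full 8-bit exponent but only 7 mantissa bits. A minimal sketch of the conversion (using plain truncation; real hardware typically rounds to nearest-even):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits.
    Hardware usually applies round-to-nearest-even; truncation is
    shown here only to illustrate the format."""
    bits32 = struct.unpack('<I', struct.pack('<f', x))[0]
    return bits32 >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    """Expand bfloat16 bits back to float32 by zero-filling the low 16 bits."""
    return struct.unpack('<f', struct.pack('<I', b << 16))[0]

# Round-tripping loses mantissa precision but preserves the exponent range,
# which is why bfloat16 is attractive for deep-learning training workloads.
y = bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159))
```

Because the exponent field matches float32, bfloat16 covers the same dynamic range, trading only precision; this is the key difference from IEEE fp16.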

Based on our research, it should be noted that we expect bfloat16 support to be present only in Cooper Lake and not Ice Lake. Intel has stated that the 56-core version of Cooper Lake will be in a similar format to its 56-core Cascade Lake, which we take to mean two dies on the same chip, limited to 2S deployments. However, based on our expectations for Ice Lake Xeon parts, we have come to understand that there will be eight memory channels in the single-chip design, and perhaps up to 16 memory channels with the dual-die 56-core version. (It will be interesting to see 16 channels at 2DPC on a 2S motherboard, given that 12 channels * 2DPC * 2S barely fits into a standard 19-inch chassis.)
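To make the chassis-space concern concrete, a quick back-of-the-envelope count of physical DIMM slots (the figures are straightforward arithmetic from the channel counts discussed above, not Intel-confirmed board specs):

```python
# DIMM slot counts at 2 DIMMs per channel (2DPC) on a 2-socket (2S) board.
dimms_per_channel = 2
sockets = 2

slots_12ch = 12 * dimms_per_channel * sockets  # today's tight fit: 48 slots
slots_16ch = 16 * dimms_per_channel * sockets  # hypothetical dual-die: 64 slots

extra_slots = slots_16ch - slots_12ch  # 16 additional DIMM slots to route
```

Finding room (and trace routing) for 16 more full-size DIMM slots on a standard-width board is the practical hurdle the parenthetical above alludes to.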

Intel’s Lisa Spelman, VP of the Data Center Group and GM of Xeon, stated in an interview with AnandTech last year that Cooper Lake will launch in 2019, with Ice Lake as a ‘fast follow-on’ expected in the middle of 2020. That is not a confirmation that the 56-core version of Cooper Lake will arrive in 2019, but it is the general cadence Intel is expected to follow for both families.

At Intel’s Architecture Day in December 2018, Sailesh Kottapalli showed off an early sample of Ice Lake Xeon silicon. At the time I was skeptical, given that Intel’s 10+ process still looked like it was having yield issues with small quad-core chips, let alone large Xeon-like designs. Cooper Lake on 14nm should easily be able to be rolled into a dual-die design, like Cascade Lake, so it will be interesting to see where 10nm Ice Lake Xeon will end up.

Intel states that 56-core based Cascade Lake-AP Xeon Scalable systems are currently available as part of pre-built systems from major OEMs such as Atos, HPE, Lenovo, Penguin Computing, Megware, and authorized resellers. Given that Cooper Lake 56-core will be socketed, I would imagine that the design should ultimately be more widely available.

50 Comments

  • mode_13h - Friday, August 9, 2019 - link

    I don't believe you've ever seen the inside of an H.264 encoder or even read the spec.
  • mode_13h - Friday, August 9, 2019 - link

    GPUs actually do branching very efficiently. You just need to be sure all of the SIMD lanes take the same branch.

    Anyway, I think your point about GPUs being in-order shows you don't understand the point of them. They are massively parallel. That's where they get their speed--not from OoO tricks, like CPUs use.
  • Phynaz - Tuesday, August 6, 2019 - link

    Ignorant much?
  • npz - Tuesday, August 6, 2019 - link

    You want to reply with some cogent points to prove to me that a GPU can do what a CPU does?
  • mode_13h - Friday, August 9, 2019 - link

    If you actually look at quorm's question, a lot of the cases that are challenging for GPUs are also bad for AVX-512.
  • npz - Tuesday, August 6, 2019 - link

    As far as "wasting transistors" you can say that for literally any feature that either cpu or gpu does not use for any particular program. For example with GPU code not dealing with graphics, are you going to say that the transistors spent on the geometry, lighting and pixel rendering pipeline are "wasted"?
  • dullard - Tuesday, August 6, 2019 - link

    Yes. Financial simulations for example (see the MC Libor Swaption Portfolio): https://www.xcelerit.com/computing-benchmarks/insi...

    Of course the GPU wins by a big margin in other software. You need to know what you are going to do and use the appropriate hardware for it. Computer simulations tend to run poorly on GPUs, but can benefit greatly by AVX 512: https://www.simutechgroup.com/images/easyblog_arti...
  • HollyDOL - Wednesday, August 7, 2019 - link

    Huh, an adapter with this pin count sounds a bit scary... and expensive
  • KurtL - Wednesday, August 7, 2019 - link

    16 memory channels for the socketed 56-core version? Don't hold your breath for that. I think they will only export 8 of the channels to the socket, for various reasons. First, the current Xeon Platinum 9200 series have a BGA with 5903 contacts, significantly more than the 4189 pins which I would expect LGA4189 to have. And only part of the pin increase compared to the Skylake/Cascade Lake LGA3647 socket can be used for the extra memory channels as I would expect power draw to go up also (especially if you want to feed those 56 cores at a reasonable clock speed). And I would guess they may want to export some more PCIe channels also to compete with the higher number of channels supported by AMD? So there is no way you can also pass 16 memory channels through that socket. Moreover, if the 2 die-on-socket Cooper Lake would have more memory channels on the same socket as the single die Cooper Lake or Ice Lake, you'd still need different motherboards. So what would be the point of that exercise? You could then go as well with a socket with even more pins to better satisfy the power needs of a 2 die Cooper Lake socket.
  • Kevin G - Wednesday, August 7, 2019 - link

    Intel could be reverting back to a serial memory interface that would require an additional buffer chip on the motherboard to fan out to normal DIMM modules. Most of those implementations have traditionally permitted a doubling of channel count vs. parallel solutions at the time. However, such a schema would not be compatible with currently planned boards.
