Today Intel is announcing some of its plans for its future Xeon Scalable platform. The company has already said that after this year's Cascade Lake series of processors, it will bring out another generation of 14nm products, called Cooper Lake, followed by its first generation of 10nm Xeons, Ice Lake. Today's announcement covers Cooper Lake's core count, its form factor, and the platform.

Today Intel is confirming that it will bring its 56-core Xeon Platinum 9200 family to Cooper Lake, so developers can take advantage of its new bfloat16 instructions with a high core count. On top of this, Intel is also stating that the new CPUs will be socketed, unlike the 56-core Cascade Lake CPUs, which are BGA-only. Socketing the product means a new socket is required, and Intel has confirmed that the same socket will also support Ice Lake. According to one of our sources, this will be an LGA4189 product.
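For those unfamiliar with the format: bfloat16 keeps fp32's sign bit and 8-bit exponent but truncates the mantissa to 7 bits, so a bf16 value is essentially the top half of an fp32 value, trading precision for the same dynamic range. The scalar sketch below illustrates the format only; Intel's actual AVX-512 BF16 instructions operate on vectors, and the helper names here are ours:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// bfloat16 is, in effect, the top 16 bits of an IEEE-754 float:
// 1 sign bit, 8 exponent bits, 7 mantissa bits. Same range as fp32,
// roughly 2-3 decimal digits of precision. (NaN handling omitted.)
static uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    // Round to nearest even before truncating the low 16 bits.
    bits += 0x7FFF + ((bits >> 16) & 1);
    return static_cast<uint16_t>(bits >> 16);
}

static float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;  // low bits zeroed
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

int main() {
    float x = 3.14159265f;
    uint16_t h = float_to_bf16(x);
    // Prints: 3.1415927 -> 0x4049 -> 3.1406250
    std::printf("%.7f -> 0x%04X -> %.7f\n", x, static_cast<unsigned>(h),
                bf16_to_float(h));
}
```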

Based on our research, it should be noted that we expect bfloat16 support to be present only in Cooper Lake, not Ice Lake. Intel has stated that the 56-core version of Cooper Lake will come in a similar format to its 56-core Cascade Lake, which we take to mean two dies on the same chip, limited to 2S deployments. However, based on our expectations for Ice Lake Xeon parts, we have come to understand that the single-die design will have eight memory channels, and the dual-die 56-core version perhaps up to 16. (It will be interesting to see 16 channels at 2DPC on a 2S motherboard: that works out to 16 * 2 * 2 = 64 DIMM slots, when 12 channels * 2DPC * 2S, i.e. 48 slots, already barely fits into a standard 19-inch chassis.)

Intel's Lisa Spelman, VP of the Data Center Group and GM of Xeon, stated in an interview with AnandTech last year that Cooper Lake will launch in 2019, with Ice Lake as a 'fast follow-on' expected in the middle of 2020. That is not a confirmation that the 56-core version of Cooper Lake will arrive in 2019, but it is the general cadence Intel is expected to follow for both families.

At Intel's Architecture Day in December 2018, Sailesh Kottapalli showed off an early sample of Ice Lake Xeon silicon. At the time I was skeptical, given that Intel's 10+ process still looked to be having yield issues with small quad-core chips, let alone large Xeon-class designs. Cooper Lake on 14nm should be straightforward to roll into a dual-die design, like Cascade Lake, so it will be interesting to see where 10nm Ice Lake Xeon ends up.

Intel states that 56-core Cascade Lake-AP Xeon Scalable systems are currently available as part of pre-built systems from major OEMs such as Atos, HPE, Lenovo, Penguin Computing, and Megware, as well as from authorized resellers. Given that the 56-core Cooper Lake will be socketed, I would imagine that the design should ultimately be more widely available.

Comments
  • quorm - Tuesday, August 6, 2019

    Is there any workload that runs better on AVX-512 than on a GPU?
  • npz - Tuesday, August 6, 2019

    x265 encoder.
    GPU-appropriate workloads are things that can be broken down into small, simple kernels and highly parallelized. Literally something like a "dumb" matrix multiplication done 4096 times concurrently.
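(To make "GPU-appropriate" concrete: the textbook case is an embarrassingly parallel kernel in which every element is computed independently, with no branching and no shared state. A minimal, purely illustrative C++ sketch, where each loop iteration could be one GPU thread:)

```cpp
#include <cstdio>
#include <vector>

// The kind of workload GPUs excel at: the same short, branch-free
// operation applied independently to every element. Each iteration
// could be one GPU thread; none depends on any other.
void saxpy(float a, const std::vector<float>& x,
           const std::vector<float>& y, std::vector<float>& out) {
    for (size_t i = 0; i < x.size(); ++i) {
        out[i] = a * x[i] + y[i];  // no branches, no cross-element state
    }
}

int main() {
    std::vector<float> x(4096, 1.0f), y(4096, 2.0f), out(4096);
    saxpy(3.0f, x, y, out);
    std::printf("out[0] = %f\n", out[0]);  // 5.0
}
```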
  • azfacea - Tuesday, August 6, 2019

    Bullshit. A GPU shader language does an awful lot more than "dumb" matrix multiplication, and you won't be writing raw shaders anyway; you'll be using the compilers and toolsets Nvidia gives you. Just because they destroy AVX at number crunching doesn't mean they can't do anything else.
    A software encoder is desirable because it's flexible and easily available. YouTube and Amazon use ASICs anyway.

    Wasting transistors on AVX-512 does not benefit 99% of the market. It's a complete ripoff. It only helps Intel pretend they have something they don't.
  • npz - Tuesday, August 6, 2019

    A shader is limited to performing one op at a time, in order. You can do more types of operations, but they're all the linear-algebra types, plus a few bitwise operators. A shader unit won't do any out-of-order execution. Last I read, at least up until Pascal, the GPU itself can't do recursion. Context switching, i.e. register save/restore? Stupidly expensive. Likewise for branching.

    And show me where YouTube or Amazon uses ASICs for encoding. There isn't such an implementation, and I KNOW they use ffmpeg with x264 and x265, along with the VP8/VP9 toolset, for YouTube.
  • npz - Tuesday, August 6, 2019

    Encoding isn't pure number crunching either. It's things like searching through forward and backward frames, searching within frames, and developing *heuristics* for psychovisual analysis.
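(To make "searching through frames" concrete, here is a minimal sketch of block-matching motion estimation, one of the searches at the heart of an encoder: for each 16x16 block, scan candidate offsets in a reference frame for the lowest sum of absolute differences. All names are illustrative, and real encoders such as x264 replace the exhaustive loop with heuristic search patterns:)

```cpp
#include <climits>
#include <cstdint>
#include <cstdlib>

// Sum of absolute differences between a 16x16 block in the current
// frame and a candidate block in the reference frame.
static int sad16x16(const uint8_t* cur, const uint8_t* ref, int stride) {
    int sad = 0;
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x)
            sad += std::abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}

struct MV { int dx, dy; };

// Exhaustive search over a small window; returns the best motion vector.
// Assumes the reference frame is padded so the whole window is valid.
// Real encoders use heuristic patterns (diamond, hex, UMH) with early
// termination -- exactly the branchy logic that maps poorly to GPUs.
static MV full_search(const uint8_t* cur, const uint8_t* ref,
                      int stride, int range) {
    MV best{0, 0};
    int best_sad = INT_MAX;
    for (int dy = -range; dy <= range; ++dy) {
        for (int dx = -range; dx <= range; ++dx) {
            int sad = sad16x16(cur, ref + dy * stride + dx, stride);
            if (sad < best_sad) { best_sad = sad; best = {dx, dy}; }
        }
    }
    return best;
}
```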
  • npz - Tuesday, August 6, 2019

    The x264 developer has stated himself that GPU programming is very limited when it comes to actually implementing the full x264 feature set in hardware. You simply can't do it!
  • mode_13h - Friday, August 9, 2019

    This is silly. Nobody is implementing an entire H.264 encoder on a GPU (or at least not in software). You can run some of the most expensive parts on the GPU, but not stuff like the bitstream encoder.

    Just because you can't implement the full thing on a GPU doesn't mean the GPU doesn't add a lot of value.
  • npz - Tuesday, August 6, 2019

    And let me tell you again, with more emphasis, that branching (something heuristic search algorithms are full of) absolutely kills a GPU.
  • Kevin G - Wednesday, August 7, 2019

    Modern GPUs can do branching... just very poorly in terms of performance compared to CPUs. However, it is part of their feature set, so it should be possible to at least code it and run it as a proof of concept, albeit slowly.

    I'm very curious when that developer stated that, as the GPGPU feature set in GPUs is still expanding. It may be old enough that GPUs have caught up to 'good enough' in this area and the initial complaint no longer applies.
  • mode_13h - Friday, August 9, 2019

    They're actually better at branching than general-purpose CPUs, in terms of efficiency. Just not if you're using predication rather than true branches.
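(To make the predication point in this sub-thread concrete: on a SIMD/SIMT machine, a divergent branch is often handled by executing both sides for every lane and masking out the inactive results, so the cost is the sum of both paths rather than just the taken one. Below is a minimal, purely illustrative scalar simulation of that behavior; it is not real GPU code:)

```cpp
#include <array>
#include <cstdio>

// Simulates how a SIMT warp handles a divergent branch via predication:
// every lane "executes" both the then- and else-paths, and a per-lane
// mask decides which result each lane keeps. Work done = both paths.
int main() {
    constexpr int kWarp = 8;                   // toy warp of 8 lanes
    std::array<int, kWarp> x{3, -1, 4, -1, 5, -9, 2, -6};
    std::array<int, kWarp> out{};

    for (int lane = 0; lane < kWarp; ++lane) {
        bool mask = x[lane] >= 0;              // per-lane predicate
        int then_result = x[lane] * 2;         // computed by ALL lanes
        int else_result = -x[lane];            // also computed by ALL lanes
        out[lane] = mask ? then_result : else_result;
    }

    for (int v : out) std::printf("%d ", v);   // 6 1 8 1 10 9 4 6
    std::printf("\n");
}
```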
