Today Arm is announcing four new products in its NPU, GPU and DPU portfolio. The company is branding its in-house machine learning processor IPs as the Ethos line-up, detailing more of the existing N77 and also revealing the smaller N57 and N37 siblings in the family. To top things off, the company is also readying its first mid-range GPU IP based on the brand-new Valhall architecture, the new Mali-G57. Finally, we’re seeing the release of a new mid-range DPU in the form of the Mali-D37.

Introducing the Ethos NPU Family

Arm’s NPU IP offering was first announced early last year, with its architecture detailed a few months later, and has until now been publicly known just as “the Arm Machine Learning processor”. At TechCon this year, Arm has officially branded the IP as the Ethos line-up, with the N77 being the main product that had previously been referred to under the Arm MLP name.

Microarchitecturally, the newly branded Ethos-N77 publicly changes its specs compared to what was revealed last year by allowing for a configurable 1 to 4MB SRAM implementation, whereas last year’s disclosure had it scaling up to 1MB only. Arm explains that customers needed more memory bandwidth for processing on these mesh networked NPUs, as DRAM bandwidth doesn’t scale up in the premium segment as fast as the core count does. The flagship IP offers up to 4TOPS of processing power at a 1GHz clock and has a respectable 5TOPS/W efficiency.

Arm is able to use the same building blocks across the different IPs. The NPUs all share the same MAC computation engine (MCE) and programmable layer engine (PLE). The MCE consists of 128 MAC units, as disclosed last year, and is paired with a PLE. An MCE and a PLE, plus SRAM, make up a computation engine (CE), and this is the scaling block that differs between the N77, N57 and N37, which come in 16x, 8x and 4x configurations in terms of CE count.
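As a quick sanity check, the quoted peak throughput can be roughly reproduced from the CE counts. The sketch below is our own back-of-the-envelope arithmetic, not an Arm formula: it assumes each MAC counts as two operations (a multiply and an add, the usual convention behind TOPS figures), and it assumes the smaller N57 and N37 run at the same 1GHz clock and the same 5TOPS/W efficiency as the flagship, which this announcement does not confirm. The function names are ours.

```python
# Back-of-the-envelope check of the Ethos peak-throughput figures.
# Assumption (not stated in the article): one MAC = 2 operations
# (multiply + add), the usual convention for TOPS numbers.

MACS_PER_MCE = 128   # from the article: 128 MAC units per MCE
OPS_PER_MAC = 2      # assumption, see above
CLOCK_HZ = 1e9       # from the article: 1GHz (assumed for all three parts)

# CE counts per product, as given in the article (one MCE per CE).
CE_COUNT = {"Ethos-N77": 16, "Ethos-N57": 8, "Ethos-N37": 4}


def peak_tops(ce_count, clock_hz=CLOCK_HZ):
    """Theoretical peak throughput in tera-operations per second."""
    return ce_count * MACS_PER_MCE * OPS_PER_MAC * clock_hz / 1e12


def implied_power_watts(tops, tops_per_watt=5.0):
    """Power implied by a throughput figure at a given TOPS/W efficiency."""
    return tops / tops_per_watt


if __name__ == "__main__":
    for name, ces in CE_COUNT.items():
        tops = peak_tops(ces)
        watts = implied_power_watts(tops)
        print(f"{name}: {ces} CEs -> {tops:.2f} TOPS peak, "
              f"~{watts:.2f} W at 5 TOPS/W (assumed)")
```

Under these assumptions the 16-CE configuration works out to roughly 4.1TOPS, in line with the "up to 4TOPS" figure quoted for the N77, and to a power draw of under 1W at the quoted efficiency. The N57 and N37 numbers the script prints are simple linear extrapolations and should be read as estimates only.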

The mid-range and low-end variants are being released a lot faster than is usual for new IP technologies because Arm is seeing a lot more interest in doing ML in cost-constrained devices where every mm² of silicon is important. In particular, features like smartphone face unlocking or DTV resolution upscaling are becoming commodity features.

The new NPUs have already been licensed and delivered to customers.

Revealing the Mali-G57 - First Mid-range Valhall Based GPU

Earlier this year, Arm announced the new Valhall architecture in the new Mali-G77, which we’re expecting to see in SoCs next year. The new GPU architecture is a major departure from the Bifrost-based GPUs we’ve seen over the last three years, as Arm has completely revamped its graphics ISA and computation microarchitecture.

Today, Arm reveals that the company is adopting the new Valhall architecture in the mid-range, starting off with the new Mali-G57. We currently don’t have many details on exactly what the finer microarchitectural configurations of the new GPU look like, but we’re very likely looking at something that will be very similar to the G77, scaled down similar to how the G52 looked like compared to the G72.

Improvements compared to a G52 with three execution engines per core (3EE) promise 1.3x better performance in a similar core configuration, 30% better energy efficiency, and 30% better silicon density (due to the better performance).

Mali-D37 DPU - Bringing High-End Features To the Mid-Range

Finally, to wrap things up, Arm is now bringing to market a new mid-range DPU in the form of the Mali-D37.

The new IP is based on the “Komeda” architecture, which was first introduced in the Mali-D71 and its follow-up, the Mali-D77, announced this year. The new DPU targets resolutions of 2K and FHD and promises to take up less than 1mm² on 16nm.

12 Comments

  • rpg1966 - Wednesday, October 23, 2019

    "Improvements compared to a G52 with three execution engines per core (3EE) promise 1.3x better performance in a similar core configuration, 30% better energy efficiency, and 30% better silicon density"

    Every new chip seems to have a comment like this. But what does it mean? Do you get all those benefits all the time, or does the designer have to pick which improvement they require (i.e. you can have extra speed OR extra energy efficiency)?
  • boozed - Wednesday, October 23, 2019

    I always read that to be a 30% energy efficiency improvement for the same level of performance.

    These words are from the marketing... people. It's in their nature to stretch the truth as far as it'll go. Or to be ambiguous, hence the mixing of "1.3x" with "+30%".
  • close - Wednesday, October 23, 2019

    You should also read it as "up to". I'm sure there are cases where the claimed improvements are close to 0 and if past experience showed us anything, sometimes the "new thing" can even perform worse even if just in corner cases.
  • boozed - Wednesday, October 23, 2019

    100%
  • Krysto - Wednesday, October 23, 2019

    99.99% of the time you should read the "AND" as "OR".

    1.3x better performance OR 30% better energy efficiency OR 30% better silicon density.

    In practice, most chip designers do a mix of those improvements, such as:

    1.05x increase in performance AND 15% better efficiency AND 10% better silicon density (roughly speaking).
  • ET - Wednesday, October 23, 2019

    The 'or' is only true for performance and energy efficiency. Silicon density is unrelated (and rarely advertised). I think it would be more correct to say that you can select two out of the three, but it's also not a perfect description.

    Some valid combinations are:

    - 30% smaller area and 30% lower power for the same performance as previous gen.
    - Smaller area and higher performance for the same power, using higher clocks.
    - Same area and higher performance and somewhat lower power, using more units but middle clocks.
  • levizx - Wednesday, October 23, 2019

    "how the G52 looked like compared to the G72"

    No it's not. G52 was a massive step up from G72 (other than absolute max core count). G52 is architecturally similar to G76.
  • SydneyBlue120d - Wednesday, October 23, 2019

    AV1 encoding support?
  • tuxRoller - Thursday, October 24, 2019

    Yes!

    No. Wrong ip. Arm handles codec support with their vpu. This article is looking at the arm gpu & dpu announcements.
  • name99 - Wednesday, October 23, 2019

    "a respectable 5TOPS/W efficiency"

    Hah.
    Compare (about Spring Hill) "This works out to power-efficiency of 2.0 to 4.8 TOPs/w ... It’s also considerably higher than anything on the market today."
    https://fuse.wikichip.org/news/2837/intel-spring-h...

    Let's see which we can actually buy first...
