Intel has recently updated its developer documentation for instruction set extensions, and in the process has disclosed both new instructions and the codename of its next-generation low-power processor microarchitecture. Dubbed "Tremont", the forthcoming processor core looks set to replace Goldmont Plus in upcoming Atom, Celeron, and Pentium Silver-branded SoCs.

According to the Intel Architecture Instruction Set Extensions (ISE) and Future Features Programming Reference document, the Goldmont Plus microarchitecture will not be the end of the road for Intel’s low-cost/low-power cores. In the coming years it will be succeeded by the microarchitecture codenamed Tremont and its successors. On the manufacturing side, nothing has officially been disclosed, but our current suspicion is that Tremont-based processors will be made using the company’s 10nm process technology. To date we haven't seen Intel use its enhanced 14nm+ and 14nm++ process technologies to make SoCs for entry-level and energy-efficient PCs, as the original 14nm process provides better density, so it seems unlikely that Intel would start now.

A key question about Tremont is what architectural improvements it will bring. While Intel's document specifies the new instructions, it offers no general architectural insight. Intel's trend since Silvermont has been to gradually widen its out-of-order execution design: starting with two-wide (Silvermont), moving to three-wide (Goldmont), and then to a three-wide front-end paired with a four-wide allocation and retirement backend (Goldmont Plus). So Intel may well continue down this route, as it still has a number of tricks left in its bag from the Core family, and wider designs mesh well with the high density of its 10nm processes, which favor more complex processors.

As for the ISE improvements, Tremont will support the CLWB, GFNI (SSE-based), ENCLV, and Split Lock Detection instruction set extensions, which are also set to arrive with Intel’s Ice Lake processors. Tremont will also bring CLDEMOTE, direct store, and user wait instructions (detailed in the table below). Unlike the former group, these are unique to Tremont and are not scheduled to be supported by Ice Lake (or any other documented Intel core).
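Software normally checks for extensions like these at run time before using them. The Python sketch below assumes the raw EBX and ECX values of CPUID leaf 7 (sub-leaf 0) have already been obtained by some native means, which is not shown, and decodes the feature bits for the instructions discussed here using the bit positions from Intel's Software Developer's Manual. The helper name `decode_leaf7` is hypothetical. ENCLV, PTWRITE, and Split Lock Detection are left out because they are not reported in these two registers.

```python
# Sketch: decode CPUID leaf 7 (EAX=07H, ECX=0) feature bits for the
# instructions discussed in the article. Obtaining the raw EBX/ECX values
# requires executing CPUID natively; that step is assumed, not shown.

# Bit positions per Intel's Software Developer's Manual.
LEAF7_EBX_BITS = {
    "CLFLUSHOPT": 23,
    "CLWB": 24,
}
LEAF7_ECX_BITS = {
    "UMIP": 2,
    "WAITPKG": 5,  # user wait: TPAUSE, UMONITOR, UMWAIT
    "GFNI": 8,
    "RDPID": 22,
    "CLDEMOTE": 25,
    "MOVDIRI": 27,
    "MOVDIR64B": 28,
}


def decode_leaf7(ebx: int, ecx: int) -> set:
    """Return the names of the listed features whose bits are set (hypothetical helper)."""
    features = set()
    for name, bit in LEAF7_EBX_BITS.items():
        if (ebx >> bit) & 1:
            features.add(name)
    for name, bit in LEAF7_ECX_BITS.items():
        if (ecx >> bit) & 1:
            features.add(name)
    return features


# Example with synthetic register values: CLWB (EBX bit 24),
# GFNI (ECX bit 8) and CLDEMOTE (ECX bit 25).
print(sorted(decode_leaf7(1 << 24, (1 << 8) | (1 << 25))))
# ['CLDEMOTE', 'CLWB', 'GFNI']
```

On Linux, kernels that recognize these features list them as lowercase flags (for example `clwb`, `gfni`, `cldemote`, `movdiri`, `movdir64b`, `waitpkg`) in `/proc/cpuinfo`, which is often the simpler check in practice.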

New Instruction Set Extensions of Goldmont Plus and Tremont CPUs

| Microarchitecture | Instruction | Full name | Purpose | Description |
|---|---|---|---|---|
| Goldmont Plus | PTWRITE | Write Data to a Processor Trace Packet | Debugging | Writes data supplied by software into a Processor Trace packet. |
| Goldmont Plus | UMIP | User-Mode Instruction Prevention | Security | Prevents execution of certain instructions (SGDT, SIDT, SLDT, SMSW and STR) when the Current Privilege Level (CPL) is greater than 0. Without it, user-space applications could read system-wide settings such as the global and local descriptor tables, the task register and the interrupt descriptor table. |
| Goldmont Plus | RDPID | Read Processor ID | General | Reads the processor ID value that the operating system stores in the IA32_TSC_AUX MSR, letting software quickly determine which logical processor it is running on (for example, to pick per-CPU data). |
| Tremont | CLWB | Cache Line Write Back | Performance | Writes back modified data of a cache line, similar to CLFLUSHOPT, but avoids invalidating the line from the cache (it may instead transition the line to a non-modified state). CLWB attempts to minimize the compulsory cache miss if the same data is accessed again soon after the line is flushed. |
| Tremont | GFNI (SSE) | Galois Field New Instructions | Security | SSE-based acceleration of Galois Field affine transformation algorithms. |
| Tremont | ENCLV | | Security | Further enhancement of SGX version 1 capabilities. |
| Tremont | CLDEMOTE | Cache Line Demote | Performance | Enables the CPU to demote a cache line with a specific address from the nearest cache to a more distant cache without writing it back to memory. Speeds up access to this line by other cores within a CPU. |
| Tremont | MOVDIRI, MOVDIR64B | Direct stores | Performance | |
| Tremont | TPAUSE, UMONITOR, UMWAIT | User wait | Power | Lets software put the CPU into a lightweight power-saving state until a monitored event occurs or a time limit expires. |
| Tremont | Split Lock Detection | | | |

Source: Intel Architecture Instruction Set Extensions and Future Features Programming Reference (pages 12 and 13)
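To make the GFNI entry more concrete: GF2P8AFFINEQB, one of the GFNI instructions, treats each byte as a vector of 8 bits, multiplies it by an 8x8 bit matrix and XORs in a constant byte, all over GF(2). The Python sketch below models that per-byte operation after the `affine_byte` pseudocode in Intel's documentation. It is an illustrative software model rather than how the hardware works internally, and the function names are hypothetical; in C, the SSE form is exposed through the `_mm_gf2p8affine_epi64_epi8` intrinsic.

```python
def parity(v: int) -> int:
    """1 if v has an odd number of set bits, else 0."""
    return bin(v).count("1") & 1


def affine_byte(matrix: int, x: int, b: int) -> int:
    """Compute A*x XOR b over GF(2) for one byte (modeled on Intel's affine_byte).

    matrix: 64-bit value holding the 8x8 bit matrix A; result bit i uses
            byte (7 - i) of this value as its row.
    x:      the input byte.
    b:      the imm8 constant XORed into the result.
    """
    result = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF
        bit = parity(row & x) ^ ((b >> i) & 1)
        result |= bit << i
    return result


def gf2p8affineqb(data: bytes, matrices: list, b: int) -> bytes:
    """Apply affine_byte per 64-bit lane: each group of 8 bytes uses its own matrix.

    Hypothetical model of the 128-bit SSE form, so len(data) should be 16
    and len(matrices) should be 2.
    """
    out = bytearray()
    for lane, matrix in enumerate(matrices):
        for x in data[8 * lane: 8 * lane + 8]:
            out.append(affine_byte(matrix, x, b))
    return bytes(out)


IDENTITY = 0x0102040810204080     # leaves each byte unchanged
BIT_REVERSE = 0x8040201008040201  # reverses the bit order of each byte

print(hex(affine_byte(IDENTITY, 0x5A, 0x00)))     # 0x5a
print(hex(affine_byte(IDENTITY, 0x5A, 0xFF)))     # 0xa5
print(hex(affine_byte(BIT_REVERSE, 0x03, 0x00)))  # 0xc0
```

Because any GF(2)-linear mixing of a byte's bits can be written as such a matrix, one instruction can replace chains of shifts, masks and table lookups. Its sibling GF2P8AFFINEINVQB first inverts each byte in GF(2^8) before applying the affine step, which is how the AES S-box is defined.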

The fact that Intel is readying its “Future Tremont and later” microarchitectures shows that even after withdrawing from smartphone SoCs, the company still sees plenty of applications for its low-power/low-cost Atom cores. There is still a notable market for budget PCs, as well as embedded and semi-embedded markets for items like IoT edge devices, all of which Intel intends to keep serving with its line of smaller, cheaper cores. Meanwhile, steady ILP and performance improvements, along with the introduction of new ISEs to these microarchitectures, show that Intel wants these cores to offer competitive performance against other low-cost processors while maintaining near feature-set parity with Intel's high-performance cores.


Sources: Intel, WikiChip


  • StevoLincolnite - Monday, April 23, 2018

    The entire tablet market is in decline and has been for almost 2 years straight.

    2-in-1s seem to have picked up some of the slack, though; people are pretty content with the amount of capability on offer for something as simple as Facebook.
  • Speedfriend - Tuesday, April 24, 2018

    The entire tablet market isn't in decline. Per IDC, detachable tablets (which are mostly Windows-based) started growing again in the second half of last year. I have started to notice a lot more people at work moving from laptops to detachable tablets for their portables. My Surface Pro is a fantastic computer to have and I am unlikely to buy a laptop personally again.
  • PeachNCream - Tuesday, April 24, 2018

    Agreed with this. Detachables are a growth segment, but a lot of that growth is due to the fact that the segment saw very few sales to begin with, so even a comparatively small numeric increase will look like a substantial percentage growth rate. That said, I think the tablet market as a whole has reached a saturation point, and further sales are generally going to customers purchasing replacements rather than customers entering the market for the first time. There's also a perception among consumers that their hardware, particularly ARM-based platforms like the iPad and various Android variants, isn't getting big enough improvements over previous generations to justify a purchase. I'd say we're looking at stagnation and stability at the moment, and I suggest taking a wait-and-see approach before putting netbook-style nails into the tablet coffin.
  • name99 - Tuesday, April 24, 2018

    iPad sales since 2015 are best characterized as flat, not declining.
    https://www.statista.com/statistics/269915/global-...

    There'll presumably be a bump this year from the new (very nicely priced) $330 iPad.
    There may also be bumps (or declines...) depending on whether Apple does or does not update the Pro line this year, and what they do with the mini (revamp it? cancel it?)
  • HStewart - Monday, April 23, 2018

    One interesting thing is the MOVDIR* instructions, which deal with atomically storing 64 bytes of memory fast. Why would this be used? Well, 64 bytes is the same as 512 bits, so maybe it's used with AVX512, or possibly some kind of encryption algorithm, or as a way to quickly pass data between two CPUs in a multi-CPU setup.
  • mode_13h - Monday, April 23, 2018

    Or they just want to avoid a cache miss in the target by copying entire cachelines at a time.
  • satai - Tuesday, April 24, 2018

    Can we get a big.LITTLE-style design, where the successor core architecture trades some power efficiency for more raw single-thread performance?
  • name99 - Tuesday, April 24, 2018

    big.LITTLE becomes problematic when the two cores are not identical in various ways.
    One obvious way they need to be identical is the instruction set. Another, less obvious, way is in some of the cache details (e.g. protocol, cache line lengths).

    But if there is one thing Intel loves, it is making incomprehensible minor differences to every damn CPU they ship. Which means it's not clear to me if they have ANY pair of cores that actually form a usefully matched big.LITTLE set.
  • iwod - Tuesday, April 24, 2018

    Goldmont Plus is getting very close to Sandy Bridge IPC; I am hoping this is another step closer.

    Does anyone have figures for the transistor count or die size of Goldmont Plus and Kaby Lake or Sandy Bridge?
  • dealcorn - Tuesday, April 24, 2018

    Is Tremont vulnerable to branch predictor issues?
