Maxwell: Designed For Energy Efficiency

While Maxwell doesn’t come with a significant overhaul of its high level feature set, the same cannot be said for its low level design. In fact the consistency at a high level belies just how much work NVIDIA has done under the hood to improve their efficiency. Maxwell isn’t a complete overhaul of NVIDIA’s designs, nor is it even as aggressive as Kepler was when it eliminated Fermi’s hot clocks in favor of a wider design, but it has a number of changes that are important to understanding the architecture and, more importantly, to understanding how NVIDIA is achieving their efficiency goals.

Broadly speaking, with Maxwell NVIDIA is almost solely focused on improving energy efficiency and performance per watt. This extends directly from NVIDIA’s mobile first design strategy for Maxwell, where the company needs to maximize energy efficiency in order to compete and win within the mobile space. If NVIDIA can bring down their energy consumption, then due to the power limiting factor we mentioned earlier they can use that recovered power overhead to further improve their performance. This is again especially noticeable in SoC-class products and discrete mobile GPUs, given the low power budgets these platforms provide.

To a lesser extent NVIDIA is also focused on space efficiency. GPU production costs and space efficiency go hand-in-hand, so there’s an interest in improving the density of their designs with Maxwell. This is especially the case when the earlier power savings allow for a wider GPU with a larger number of functional units within the same power envelope. Denser designs allow NVIDIA to offer performance similar to that of larger Kepler GPUs (e.g. GK106) with a smaller Maxwell GPU.

To achieve this NVIDIA has taken a number of steps, some of which they’ve shared with us at a high level and some of which they haven’t. NVIDIA is taking a bit of a “secret sauce” approach to Maxwell from a design level, so while we know a fair bit about its execution model we don’t know quite as much about the little changes that add up to Maxwell’s energy and space savings. However NVIDIA tells us that overall they’ve been able to outright double their performance-per-watt on Maxwell versus Kepler, which is nothing short of amazing given the fact that all of this is being done on the same 28nm process as Kepler.
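To put the performance-per-watt claim in concrete terms, here is a minimal sketch of what a doubling means for a power-limited GPU. The 60W budget and the normalized Kepler baseline are hypothetical values chosen purely for illustration; only the 2x ratio comes from NVIDIA's claim.

```python
def performance_at_power_cap(perf_per_watt: float, power_budget_watts: float) -> float:
    """Performance of a power-limited chip: efficiency times the available budget."""
    return perf_per_watt * power_budget_watts


# Hypothetical values, for illustration only.
POWER_BUDGET_WATTS = 60.0    # assumed board power cap, not a product spec
KEPLER_PERF_PER_WATT = 1.0   # normalized baseline

# NVIDIA's claimed doubling of performance-per-watt.
MAXWELL_PERF_PER_WATT = 2.0 * KEPLER_PERF_PER_WATT

kepler_perf = performance_at_power_cap(KEPLER_PERF_PER_WATT, POWER_BUDGET_WATTS)
maxwell_perf = performance_at_power_cap(MAXWELL_PERF_PER_WATT, POWER_BUDGET_WATTS)

# Option 1: keep the power budget and take the gain as performance.
speedup_at_same_power = maxwell_perf / kepler_perf

# Option 2: keep Kepler-level performance and take the gain as lower power.
power_for_kepler_perf = kepler_perf / MAXWELL_PERF_PER_WATT

print(f"Speedup at the same power budget: {speedup_at_same_power:.1f}x")
print(f"Power needed for Kepler-level performance: {power_for_kepler_perf:.0f}W")
```

In practice a real product will land somewhere between the two options, trading part of the gain for performance and part for lower power.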

We’ll go over execution flow and the other gritty details on the next page, but for now let’s start with a look at NVIDIA’s Streaming Multiprocessor designs for Kepler (SMX) and Maxwell (SMM).

Immediately we can see a significant difference in the layout between the SMX and the new SMM. Whereas the SMX was for all practical purposes a large, flat design with 4 warp schedulers and 15 different execution blocks, the SMM has been heavily partitioned. Physically each SMM is still one contiguous unit, not really all that different from an SMX. But logically the execution blocks which each warp scheduler can access have been greatly curtailed.

The end result is that in an SMX the 4 warp schedulers would share most of their execution resources and work out which warp was on which execution resource for any given cycle. But on an SMM, the warp schedulers are removed from each other and given complete dominion over a far smaller collection of execution resources. No longer do warp schedulers have to share FP32 CUDA cores, special function units, or load/store units, as each of those is replicated across each partition. Only texture units and FP64 CUDA cores are shared.

Of the changes NVIDIA made to reduce power consumption, this is among the greatest. Shared resources, though extremely useful when you have the workloads to fill them, do have drawbacks. They’re wasting space and power if not fed, the crossbar to connect all of them is not particularly cheap on a power or area basis, and there is additional scheduling overhead from having to coordinate the actions of those warp schedulers. By forgoing the shared resources NVIDIA loses out on some of the performance benefits from the design, but what they gain in power and space efficiency more than makes up for it.
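As a rough illustration of why the crossbar gets expensive, consider a toy counting model of scheduler-to-execution-block paths. This is our own simplification and not NVIDIA's actual wiring: it assumes every scheduler in the shared design can reach every block (an upper bound, since the SMX schedulers share most but not all resources), that every block in the partitioned design belongs to exactly one scheduler, and that the block count is the same in both cases.

```python
def scheduler_to_block_links(schedulers: int, blocks: int, shared: bool) -> int:
    """Count scheduler-to-block paths in a toy interconnect model.

    shared=True:  every scheduler can reach every block (full crossbar).
    shared=False: each block is wired to exactly one scheduler.
    """
    if shared:
        return schedulers * blocks
    return blocks


SCHEDULERS = 4   # warp schedulers per SM
BLOCKS = 15      # SMX execution block count; reused for the partitioned
                 # case purely for comparison (an assumption)

shared_links = scheduler_to_block_links(SCHEDULERS, BLOCKS, shared=True)
partitioned_links = scheduler_to_block_links(SCHEDULERS, BLOCKS, shared=False)

print(f"Shared design (upper bound): {shared_links} paths")
print(f"Partitioned design:          {partitioned_links} paths")
```

The point is only the scaling: in the shared case the path count grows with schedulers times blocks, while in the partitioned case it grows with the block count alone.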

NVIDIA hasn’t given us hard numbers on SMM power efficiency, but for space efficiency a single 128 CUDA core SMM can deliver 90% of the performance of a 192 CUDA core SMX at a much smaller size.
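Working through those figures gives a sense of the per-core gain. The short calculation below uses only the numbers above (128 versus 192 CUDA cores, 90% of the performance); the variable names are our own.

```python
SMX_CORES = 192            # CUDA cores per Kepler SMX
SMM_CORES = 128            # CUDA cores per Maxwell SMM
SMM_RELATIVE_PERF = 0.90   # SMM performance as a fraction of an SMX

# The SMM has two-thirds as many CUDA cores as the SMX.
core_ratio = SMM_CORES / SMX_CORES

# Performance delivered per core, relative to an SMX core.
per_core_gain = SMM_RELATIVE_PERF / core_ratio

print(f"SMM core count relative to SMX: {core_ratio:.1%}")
print(f"Per-core performance relative to SMX: {per_core_gain:.2f}x")
```

In other words, delivering 90% of the performance with two-thirds of the cores works out to roughly 35% more performance per CUDA core.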

Moving on, along with the SMM layout changes NVIDIA has also made a number of small tweaks to improve the IPC of the GPU. The scheduler has been rewritten to avoid stalls and otherwise behave more intelligently. Furthermore by achieving higher utilization of their existing hardware, NVIDIA doesn’t need as many functional units to hit their desired performance targets, which in turn saves on space and ultimately power consumption.

While on the subject of efficiency, NVIDIA has also been working on memory efficiency. From a performance perspective GDDR5 is very powerful, however it’s also very power hungry, especially in comparison to DDR3. With GM107 in particular being a 128-bit design that would need to compete with the likes of the 192-bit GK106, NVIDIA has massively increased the amount of L2 cache they use, from 256KB on GK107 to 2MB on GM107. This reduces the amount of traffic that needs to cross the memory bus, reducing both the power spent on the memory bus and the need for a larger memory bus altogether.
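To see how a larger cache can stand in for a wider bus, here is a back-of-the-envelope sketch. It assumes both buses run at the same memory clock, that the workload is purely bandwidth-bound, and that DRAM traffic scales with the L2 miss rate. The 30% baseline hit rate is a made-up number for illustration, not a measured figure.

```python
def required_hit_rate(baseline_hit_rate: float,
                      narrow_bus_bits: int,
                      wide_bus_bits: int) -> float:
    """L2 hit rate a narrow-bus GPU needs so that its DRAM traffic fits in
    the same time as a wide-bus GPU running at baseline_hit_rate.

    Assumes equal memory clocks, so bandwidth scales with bus width.
    """
    bandwidth_ratio = narrow_bus_bits / wide_bus_bits
    allowed_miss_rate = (1.0 - baseline_hit_rate) * bandwidth_ratio
    return 1.0 - allowed_miss_rate


# L2 growth from GK107 (256KB) to GM107 (2MB), in KB.
l2_growth = (2 * 1024) / 256

# Hypothetical hit rate on the 192-bit part, for illustration only.
ASSUMED_BASELINE_HIT_RATE = 0.30

needed_hit_rate = required_hit_rate(ASSUMED_BASELINE_HIT_RATE, 128, 192)

print(f"L2 capacity increase: {l2_growth:.0f}x")
print(f"Hit rate needed on the 128-bit bus: {needed_hit_rate:.1%}")
```

Under these assumptions the 128-bit part has to cut its miss rate by a third to keep pace with the 192-bit part, which is the kind of gap a much larger L2 is meant to close.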

Increasing the amount of cache always represents an interesting tradeoff since cache is something of a known quantity and is rather dense, but it’s only useful if there are memory stalls or other memory operations that it can cover. Consequently we often see cache implemented in relation to whether there are any other optimizations available. In some cases it makes more sense to use the transistors to build more functional units, and in other cases it makes sense to build the cache. After staying relatively stagnant on their cache sizes for so long, it looks like the balance has finally shifted and the cache increase makes the most sense for NVIDIA.

Of course even these changes are relatively high level from an ASIC perspective. There’s always the possibility for low-level changes and NVIDIA has followed through on these too. Case in point, both NVIDIA and AMD have been steadily improving their clock gating capabilities, and with Maxwell NVIDIA has taken another step in their designs. NVIDIA isn’t telling us just how fine grained their gating is now for Maxwell, but it’s a finer granularity than it was on Kepler. Given the new SM design, the most likely change is the ability to control the individual partitions and/or the functional units within those partitions, but this is just supposition on our part.

Finally there’s the lowest of low level optimizations, which is transistor level optimization. Again NVIDIA hasn’t provided a ton of details here, but they tell us they’ve gone through at the transistor level to squeeze out additional energy efficiency wherever they could find it. Given that TSMC 28nm is now a very mature process with well understood abilities and quirks, NVIDIA should be able to design and build their circuits to a tighter tolerance now than they would have been able to when working on GK107 over 2 years ago.


  • EdgeOfDetroit - Tuesday, February 18, 2014 - link

    This card (EVGA 750 Ti OC) is replacing a 560Ti for me. It's slower but it's not my primary game machine anymore anyways. I'll admit I was kinda bummed when the 700 series stopped at the 760, and now that the 750 is here, it's like they skipped the true successor to the 560 and 660. I can probably still get something for my 560Ti, at least.
  • rhx123 - Tuesday, February 18, 2014 - link

    I wonder if we'll get the 750Ti or even the 750 in a half height config.

    It would be nice for HTPCs given the power draw, but I'm not optimistic.
    There's still nothing really decent in the half height Nvidia camp.
  • Frenetic Pony - Tuesday, February 18, 2014 - link

    "it is unfortunate, as NVIDIA carries enough market share that their support (or lack thereof) for a feature is often the deciding factor whether it’s used"

    Not this time. Both the Xbone and PS4 are fully feature compliant, as are GCN 1.1 cards, heck even GCN 1.0 has a lot of the features required. With the new consoles, especially the PS4, selling incredibly well these are going to be the baseline, and if you buy a NVIDIA card without it, you'll be SOL for the highest end stuff.

    Just another disappointment with Maxwell, when AMD is already beating Nvidia price for performance wise very solidly. Which is a shame, I love their steady and predictable driver support and well designed cooling set ups. But if they're not going to compete, especially with the rumors of how much Broadwell supposedly massively improves on Intel's mobile stuff, well then I just don't know what to say.
  • Rebel1080 - Tuesday, February 18, 2014 - link

    Can we all come to a consensus by declaring the 8th console generation an epic bust!!! When the seventh generation consoles (PS3/XB360) made their debut it took Nvidia and AMD 12-18 months to ship a mainstream GPU that could match or exceed their performance. This generation it only took 3 months at 2/3rds the price those cards sold at (3870/8800GT).

    It's pretty condemning that both Sony and MSFT's toy boxes are getting spanked by $119-149 cards. Worst of all, the cards are now coming from both GPU companies, which I'm sure has Nvidia all smiles.
  • FearfulSPARTAN - Tuesday, February 18, 2014 - link

    Really an epic bust.... Come on now, we all knew from the start they were not going to be bleeding edge based on the specs. They were not going for strong single threaded performance, they were aiming for well threaded good enough CPU performance, and the GPUs they had were average at their current time. However considering the PS4 and X1 are selling very well, calling the entire gen a bust already is just stupid. You don't need high performance for consoles when you have developers coding to scrape every bit of performance they can out of your hardware, that's something we don't have in the PC space and why most gamers are not using those cards that just met last gen console performance seven years ago.
  • Rebel1080 - Tuesday, February 18, 2014 - link

    They're selling well for the same reasons iTards keep purchasing Apple products even though they only offer incremental updates on both hardware and less on software. It's something I like to call "The Lemming Effect".

    Developers code to the metal but that only does so much and then you end up having to compromise the final product via lower res, lower fps, lower texture detail. Ironically I was watching several YouTube videos of current gen games (BF3&4, Crysis 3, Grid 2, AC4) running at playable fps between 720p & 900p on a Radeon 3870.
  • oleguy682 - Tuesday, February 18, 2014 - link

    Except that unlike Apple, Sony and Microsoft are selling each unit at a loss once the BOM, assembly, shipping, and R&D are taken into consideration. The PS3 was a $3 billion loss in the first two years it was available. The hope is that licensing fees, add-ons, content delivery, etc. will result in enough revenue to offset the investment, subsidize further R&D, and leave a bit left over for profit. Apple, on the other hand, is making money on both the hardware and the services.

    And believe it or not, there are a lot more console gamers than PC gamers. Gartner estimates that in 2012, PC gaming made up only $14 billion of the $79 billion gaming market. This does include hardware, in which the consoles and handheld devices (likely) get an advantage, but 2012 was before the PS4 and Xbone were released.

    So while it might be off-the-shelf for this generation, it was never advertised as anything more than a substantial upgrade over the previous consoles, both of which were developed in the early 2000s. In fact, they were designed for 1080p gaming, and that's what they can accomplish (well, maybe not the Xbone if recent reports are correct). Given that 2160p TVs (because calling it 4K is dumb and misleading) are but a pipe dream for all but the most well-heeled of the world and that PCs can't even come close to the performance needed to drive such dense displays (short of spending $1,000+ on GPUs alone), there is no need to over-engineer the consoles to do something that won't be asked of them until they are near EOL.
  • Rebel1080 - Tuesday, February 18, 2014 - link

    PC Gaming is growing faster globally than the console market because purchasing consoles in many nations is extremely cost prohibitive due to crushing tariffs. Figure that in 3yrs time both Intel and AMD will have IGPs that will trounce the PS4 and will probably sell for under $99 USD. PC hardware is generally much more accessible to people living in places like Brazil, China and India compared to consoles. It would actually cost less to build a gaming PC if you live there.

    The console market is the USA, Japan and Western Europe, as the economies of these nations continue to decline (all 3 are still in recession) people who want to game without spending a ton will seek lower cost alternatives. With low wattage cards like the 750Ti suddenly every Joe with a 5yr old Dell/HP desktop can now have console level gaming for a fraction of the cost without touching any of his other hardware.
  • Rebel1080 - Tuesday, February 18, 2014 - link
  • oleguy682 - Wednesday, February 19, 2014 - link

    Brazil is only Brazil. It does not have any bearing on China or India or any other developing nation as they all choose their own path on how they tax and tariff imports. Second, throwing a 750Ti into a commodity desktop (the $800-1,200 variety) from 3 years ago, let alone 5, is unlikely to result in performance gains that would turn it into a full-bore 1080p machine that can run with the same level of eye-candy as a PS4 or XBone. The CPU and memory systems are going to be huge limiting factors.

    As far as the PC being a faster growing segment, the Gartner report from this fall thinks that PC gaming hardware and software will rise from the 2012 baseline of 18.3% of spending to 19.4% of spending in 2015. So yes, it will grow, but it's such a small share already that it barely does anything to move the needle in terms of where gaming goes. In contrast, consoles are expected to grow from 47.4% to 49.6% of spending. The losing sectors are going to be handheld gaming, eaten mostly by tablets and smartphones. PCs aren't dying, but they aren't thriving, regardless of what Brazil does with PS4 imports in 2014.
