Maxwell: Designed For Energy Efficiency

While Maxwell doesn’t come with a significant overhaul of its high-level feature set, the same cannot be said for its low-level design. In fact the consistency at a high level belies just how much work NVIDIA has done under the hood to improve efficiency for Maxwell. Maxwell isn’t a complete overhaul of NVIDIA’s designs, nor is it even as aggressive as Kepler was when it eliminated Fermi’s hot clocks in favor of a wider design, but it has a number of changes that are important to understanding the architecture and, more importantly, to understanding how NVIDIA is achieving their efficiency goals.

Broadly speaking, with Maxwell NVIDIA is almost solely focused on improving energy efficiency and performance per watt. This extends directly from NVIDIA’s mobile-first design strategy for Maxwell, where the company needs to maximize energy efficiency in order to compete and win within the mobile space. If NVIDIA can bring down their energy consumption, then due to the power limiting factor we mentioned earlier they can use that recovered power headroom to further improve their performance. This is again especially noticeable in SoC-class products and discrete mobile GPUs, due to the low power budgets these platforms provide.
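
To put the power-limited argument in concrete terms, here is a minimal sketch in Python. It assumes sustained throughput is simply performance per watt multiplied by a fixed power budget. The function name and all of the figures are ours and purely illustrative; they are not NVIDIA data or measurements.

```python
def sustained_performance(perf_per_watt: float, power_budget_w: float) -> float:
    """Throughput a power-limited GPU can sustain, in arbitrary work units."""
    return perf_per_watt * power_budget_w


# Hypothetical figures for illustration only; not measurements.
budget_w = 60.0
baseline = sustained_performance(perf_per_watt=1.0, power_budget_w=budget_w)
improved = sustained_performance(perf_per_watt=1.5, power_budget_w=budget_w)

# With the budget held fixed, the efficiency gain shows up entirely as performance.
print(improved / baseline)
```

Under this simple model, any improvement in efficiency translates one-for-one into performance as long as the power budget is the limiting factor.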

To a lesser extent NVIDIA is also focused on space efficiency. GPU production costs and space efficiency go hand-in-hand, so there’s an interest in improving the density of their designs with Maxwell. This is especially the case when the earlier power savings allow for a wider GPU with a larger number of functional units within the same power envelope. Denser designs allow NVIDIA to offer performance similar to that of larger Kepler GPUs (e.g. GK106) with a smaller Maxwell GPU.

To achieve this NVIDIA has taken a number of steps, some of which they’ve shared with us at a high level and some of which they haven’t. NVIDIA is taking a bit of a “secret sauce” approach to Maxwell from a design level, so while we know a fair bit about its execution model we don’t know quite as much about the little changes that add up to Maxwell’s energy and space savings. However NVIDIA tells us that overall they’ve been able to outright double their performance-per-watt on Maxwell versus Kepler, which is nothing short of amazing given the fact that all of this is being done on the same 28nm process as Kepler.

We’ll go over execution flow and the other gritty details on the next page, but for now let’s start with a look at NVIDIA’s Streaming Multiprocessor designs for Kepler (SMX) and Maxwell (SMM).

Immediately we can see a significant difference in the layout between the SMX and the new SMM. Whereas the SMX was for all practical purposes a large, flat design with 4 warp schedulers and 15 different execution blocks, the SMM has been heavily partitioned. Physically each SMM is still one contiguous unit, not really all that different from an SMX. But logically the execution blocks which each warp scheduler can access have been greatly curtailed.

The end result is that in an SMX the 4 warp schedulers would share most of their execution resources and work out which warp was on which execution resource for any given cycle. But on an SMM, the warp schedulers are separated from each other and given complete dominion over a far smaller collection of execution resources. No longer do warp schedulers have to share FP32 CUDA cores, special function units, or load/store units, as each of those is replicated across each partition. Only texture units and FP64 CUDA cores are shared.
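
The difference between the two layouts can be sketched as a small Python model. This is our own simplification, not NVIDIA's description: it assumes an SMM splits its 128 FP32 cores evenly across 4 warp schedulers, and it treats the SMX as fully shared, which glosses over the "most of" above. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLayout:
    """Simplified description of one Streaming Multiprocessor."""
    name: str
    warp_schedulers: int
    fp32_cores: int
    partitioned: bool  # True if each scheduler owns a private slice of cores

    def cores_reachable_per_scheduler(self) -> int:
        """FP32 cores a single warp scheduler can issue work to."""
        if self.partitioned:
            # Cores are split evenly and are private to one scheduler.
            return self.fp32_cores // self.warp_schedulers
        # Simplification: treat every core as shared by every scheduler.
        return self.fp32_cores


SMX = SMLayout("SMX (Kepler)", warp_schedulers=4, fp32_cores=192, partitioned=False)
SMM = SMLayout("SMM (Maxwell)", warp_schedulers=4, fp32_cores=128, partitioned=True)

for sm in (SMX, SMM):
    print(sm.name, sm.cores_reachable_per_scheduler())
```

In this model each SMM scheduler sees far fewer cores than an SMX scheduler does, which is the point: less to arbitrate over, and no crossbar spanning the whole SM.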

Of the changes NVIDIA made to reduce power consumption, this is among the greatest. Shared resources, though extremely useful when you have the workloads to fill them, do have drawbacks. They waste space and power if not fed, the crossbar to connect all of them is not particularly cheap on a power or area basis, and there is additional scheduling overhead from having to coordinate the actions of those warp schedulers. By forgoing the shared resources NVIDIA loses out on some of the performance benefits from the design, but what they gain in power and space efficiency more than makes up for it.

NVIDIA hasn’t given us hard numbers on SMM power efficiency, but for space efficiency a single 128 CUDA core SMM can deliver 90% of the performance of a 192 CUDA core SMX at a much smaller size.
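
Those figures imply a per-core gain that is easy to work out. The short Python sketch below only rearranges the numbers quoted above (90% of the performance from 128 cores instead of 192); the variable names are ours.

```python
from fractions import Fraction

SMX_CORES = 192
SMM_CORES = 128
SMM_RELATIVE_PERF = Fraction(9, 10)  # an SMM delivers 90% of an SMX

# Per-core throughput of an SMM relative to an SMX.
per_core_gain = SMM_RELATIVE_PERF * SMX_CORES / SMM_CORES
print(float(per_core_gain))  # 1.35
```

In other words, each Maxwell CUDA core would be doing roughly 35% more work than a Kepler core, going by NVIDIA's own figures.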

Moving on, along with the SMM layout changes NVIDIA has also made a number of small tweaks to improve the IPC of the GPU. The scheduler has been rewritten to avoid stalls and otherwise behave more intelligently. Furthermore by achieving higher utilization of their existing hardware, NVIDIA doesn’t need as many functional units to hit their desired performance targets, which in turn saves on space and ultimately power consumption.

While on the subject of performance efficiency, NVIDIA has also been working on memory efficiency. From a performance perspective GDDR5 is very powerful, but it’s also very power hungry, especially in comparison to DDR3. With GM107 in particular being a 128-bit design that would need to compete with the likes of the 192-bit GK106, NVIDIA has massively increased the amount of L2 cache they use, from 256KB in GK107 to 2MB on GM107. This reduces the amount of traffic that needs to cross the memory bus, reducing both the power spent on the memory bus and the need for a larger memory bus altogether.
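
As a rough illustration of why a larger L2 helps, the Python sketch below models memory bus traffic as the share of L2 requests that miss. Only the two cache sizes come from the text. The hit rates, request counts, and function name are hypothetical, since NVIDIA has not published such figures.

```python
def dram_traffic_bytes(l2_requests: int, bytes_per_request: int, l2_hit_rate: float) -> float:
    """Bytes that must cross the memory bus because they missed in L2."""
    if not 0.0 <= l2_hit_rate <= 1.0:
        raise ValueError("l2_hit_rate must be between 0 and 1")
    return l2_requests * bytes_per_request * (1.0 - l2_hit_rate)


# Cache sizes from the text: 256KB on GK107, 2MB on GM107.
l2_growth = (2 * 1024) // 256
print(l2_growth)  # 8

# Hypothetical hit rates, chosen only to show the shape of the effect.
small_cache = dram_traffic_bytes(1000, 32, l2_hit_rate=0.25)
large_cache = dram_traffic_bytes(1000, 32, l2_hit_rate=0.75)
print(small_cache, large_cache)
```

Whatever the real hit rates are, every request served from L2 is one that never has to touch the comparatively power-hungry GDDR5 bus.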

Increasing the amount of cache always represents an interesting tradeoff since cache is something of a known quantity and is rather dense, but it’s only useful if there are memory stalls or other memory operations that it can cover. Consequently we often see cache implemented in relation to whether there are any other optimizations available. In some cases it makes more sense to use the transistors to build more functional units, and in other cases it makes sense to build the cache. After staying relatively stagnant on their cache sizes for so long, it looks like the balance has finally shifted and the cache increase makes the most sense for NVIDIA.

Of course even these changes are relatively high level from an ASIC perspective. There’s always the possibility for low-level changes and NVIDIA has followed through on these too. Case in point, both NVIDIA and AMD have been steadily improving their clock gating capabilities, and with Maxwell NVIDIA has taken another step in their designs. NVIDIA isn’t telling us just how fine grained their gating is now for Maxwell, but it’s a finer granularity than it was on Kepler. Given the new SM design, the most likely change is the ability to control the individual partitions and/or the functional units within those partitions, but this is just supposition on our part.

Finally there’s the lowest of low-level optimizations, which is transistor-level optimization. Again NVIDIA hasn’t provided a ton of details here, but they tell us they’ve gone through at the transistor level to squeeze out additional energy efficiency wherever they could find it. Given that TSMC 28nm is now a very mature process with well understood abilities and quirks, NVIDIA should be able to design and build their circuits to a tighter tolerance now than they would have been able to when working on GK107 over 2 years ago.

  • TheJian - Wednesday, February 19, 2014 - link

    You are only able to say AMD is winning with 265 because of magical pricing that probably won't exist, just like 290x/290 are not $550/400, which we now know are really $709/550 (amazon's lowest pricing, which is below newegg on 290, about even on 290x - amazon has 290x in stock). At least most are out of stock now, so maybe they're selling better or it really is just a shortage of chips like PNY says. I'm guessing the shortage of chips that can run 1ghz is causing problems and higher pricing on 290/290x, not selling like crazy. If they were selling like crazy to miners etc AMD would have had a quarter like NV had where GPU revenues rose 14% on the backs of HIGH-END gpu sales in a 11% down PC market. The 290/290x are AMD's high end, yet selling out means zero profits? So High-End isn't selling much and is just a shortage of 1ghz chips then right? 10mil console chips were all of AMD's gpu profits (10mil x $12 each=120mil pretty much exactly AMD's profits).

    I don't see how Anandtech etc can say crypto mining is insane, when AMD's quarterly report shows miners must not be buying them much at all after the first rush at launch. Otherwise they would have made more than just console money. AMD said they get low double digits on consoles now (so not mid which would be 15%, if closer to 15% profits would be like 150mil), which is 8.2mil units already purchased in retail consoles, and another ~2mil in transit or already in MS/Sony being boxed up for more retail boxes. AMD gets their money long before we see it on the shelf in a console (hence the ~2mil in transition). MS/Sony don't pay AMD AFTER the console sales, they pay them BEFORE it gets anywhere near a box on a shelf. So AMD has already sold more than the shelf sales show.

    On the flipside, NV has a quarter with ~$145mil in profits and basically ALL of that is from GPU's. So again, if AMD is selling out (in any volume that is) how come they didn't have $240mil in profits or something like this? Why no profits from GPU's showing up? There must be a real shortage that isn't due to them selling out like crazy, but instead due to manufacturing chips that can do 1ghz without throttling. This is the ONLY assumption that fits the financial data. Miners are NOT buying these in massive quantities. AMD just can't make enough to satisfy anyone, thus the price goes through the roof and companies like PNY say they can't get chips. In turn this causes AMD's MSRP to basically be magical fairy dust pricing which may not be REALITY for many months to come :( Your price perf story doesn't fit. AMD is winning nothing.

    Let me know when you can buy a 290 or 290x for $400 or $550. Let me know when you can buy a 265 for $149. This may turn out to be real for 265 but it isn't now and Anandtech shouldn't be comparing cards that don't even exist yet and pricing is unknown. I mean the reviews of 290/290x said "these are awesome buys", blah blah, but at $709 for 290x isn't it a terrible deal with 780TI OC models going for the same exact price but winning by 20%?
    OC 290x vs. OC 780ti.
    "The current street price of the ASUS R9 290X DirectCU II OC is $699.99 at several etailers (if you can find it in stock), representing a significant bump from its MSRP of $569.99. If you were to compare the two cards at MSRP, then the 20% performance difference between these could easily be accounted for with the 20% difference in price. However, at the current street pricing, the MSI GeForce GTX 780 Ti GAMING 3G simply slaps the ASUS R9 290X DirectCU II OC around with a large rainbow trout."

    Slaps AMD around like a rainbow trout? OK, OC & price contest settled then. Custom cooling won't magically trump 780TI and MSRP for 290x of $550 and reality for a card that can actually do AMD's magical ref speeds and up is a $150 difference. I don't know why anyone would buy a 265 for compute (or any card in this category, stupid to benchmark this for these low end models), so maybe AMD will actually get to $150 on them. But giving reviews based on pricing we now know may not happen on cards that aren't even available yet vs. a HARD LAUNCH with OC models already out far above what is tested here by anandtech is a bit of a pipe dream at best.
    "In short, you'll have to pardon our skepticism that Radeon R7 265 will show up on time and at the price point AMD is claiming. We've seen fingers pointed at gun-shy add-in board partners, performance-thirsty cryptocurrency miners, price-gouging retailers, and foundries unable to keep up with supply. But at the end of the day, we're left wondering why AMD is setting prices if it can't control what you pay for its hardware? After piling praise onto the Radeon R9 280X at $300 and 290X at $550, it's our credibility on the line now, and we've been burnt too many times to give you guidance on a card you can't buy yet."

    Just one of 3-4 of their paragraphs outlining the pricing problems and AMD's magical prices :) You get a whole page dedicated to AMD's pricing issues at tomshardware...LOL. Tomshardware is worried about credibility claiming AMD's pricing is real.

    Even anandtech says it's probably magical pricing, so why compare the 265 to 750ti as if 265 will actually be $149?
    "but unless something changes to bring the other Pitcairn cards back down to their MSRPs, then $149 for 265 may be an unreasonable expectation"

    "The lack of selection has done no favors for the pricing, leading to 260 prices starting at $125. This is $15 above MSRP – a significant difference for this segment of the market – and just a stone’s throw away from the 260X at current prices."

    More comments about lack of 250's etc also. AMD can't seem to put a card out at MSRP. How do you come to the conclusion AMD wins at price perf, when no card is MSRP? Reality check please pal. If 260x is supposed to be $120 for new MSRP how is this possible given we already have regular 260 at $125? Again magical pricing is used for your statements not REALITY.

    Current pricing on 290x/780TI are the same, and 780TI smoke it slapping it around like a rainbow trout. :) Not sure how you get AMD is winning from all of these comments. Yeah if you include magical pricing that may ONE DAY exist, but not REALITY for right now. You keep living in your fantasy world, I'll just stay in reality thanks. I fail to see how AMD will be able to keep up with R&D NV is clearly investing in GPU's. I'm not sure why anandtech even bothered to benchmark the ref design, when they admit NOBODY will be shipping them.
    "NVIDIA’s partners will be launching with custom cards from day-one, and while NVIDIA has put together a reference board for testing and validation purposes, the partners will not be selling that reference board. Instead we’ll be seeing semi-custom and fully-custom designs; everyone has their own cooler, a lot of partners will be using the NVIDIA reference PCB, and others will be rolling out their own PCBs too."

    So why test them? That isn't reality as they clearly point out.
    I don't get it. Further showing their AMD love they left the Zotac out of most high end benchmarks and only used ref. What? After clearly stating REF won't even be sold why KEY on REF designs in your benchmarks? Oh right, they only have an AMD portal on anandtech...Never mind...LOL. I get it. :)
    For $5-10 more over a REAL MSRP on 750ti you get 1176/1255 or 1202/1281 which are both WAY over stock.

    If maxwell is designed for mobile gaming so "who cares" then AMD is designed for compute/mining crap that has just about NOTHING to do with gaming so "who cares" too right? I mean if you want to win synthetic crap buy AMD. If you want to win in gaming buy NV. It would seem NV has the right idea for their audience. Also I wasn't aware Intel has ever put anything out in GPU that is damned good. LOL. They can't even catch BROKE AMD's Kaveri.

    You're basing your price perf comment on pricing that is not REAL. Get back to us when AMD puts out something that sells at their MSRP. Until then their PRICE is FAKE, and price to performance crap is meaningless unless you talk in terms of REAL pricing. In which case NV looks great as hardocp shows.

    "On a pure price/performance basis, the GTX 750 series is not competitive. If you’re in the sub-$150 market and looking solely at performance, the Radeon R7 260 series will be the way to go."
    Fantasy pricing makes this comment moot.

    "With that said however, we will throw in an escape clause: NVIDIA has hard availability today, while AMD’s Radeon R7 265 cards are still not due for about another 2 weeks. Furthermore it’s not at all clear if retailers will hold to their $149 MSRP due to insane demand from cryptocoin miners; if that happens then NVIDIA’s competition is diminished or removed entirely, and NVIDIA wins on price/performance by default."

    That comment is REALITY. We know they won't be MSRP if recent history is any indication. They have been so wrong that tomshardware can't recommend anything on MSRP now. Anyone making comparisons on MSRP for AMD at this point is not credible. Hardocp, tomshardware etc all note it's currently FAKE pricing. Until that changes any site should be writing reviews with REAL pricing in the recommendations/conclusions and people like you should just avoid using phrases like "price performance" ;) AMD is losing everything on price performance when using REAL pricing. I don't know why anyone even quotes MSRP. It's merely a SUGGESTED price. You should just quote the lowest newegg or amazon price as that is the lowest you can get MSRP or not. IF it's NOT a hard launch so you can get that pricing, you shouldn't be reviewing something (since I can't REALLY buy it and have no idea of the REAL price). Get it?

    NV needs to say Titan Black is MSRP of $550 I guess and start acting like AMD. Reviews would have to be written with MSRP conclusions for them then too right? Has anyone gotten a 290 for $400 or 290x for $550? Doubtful. Time for NV to join the lying game and soft launches with pricing that may not come for months?

    From anandtech's 265 article:
    "Finally, for the time being NVIDIA’s competition is going to be spread out, leaving a lack of direct competition for AMD’s latest arrivals. With the retirement of GTX 650 Ti Boost, NVIDIA doesn’t currently have a product directly opposite 265 at $149, with GTX 660 well above it and GTX 650 Ti well below it. On the other hand, NVIDIA’s closest competition for 260 as it stands is GTX 650 Ti, though this would shift if 260X cards quickly hit their new $119 MSRP."

    Totally false considering all AMD pricing is fake right? If 260x QUICKLY hits $119 new MSRP? ROFL. Until they HIT that pricing shut up please. NV doesn't have a direct competitor to $149 265? Yeah because it won't REALLY be $149...LOL. 270 shows it, 260 shows it, 290, 290x, 280, 280x...jeez. Does AMD have a card that really is MSRP? So anandtech is comparing the NV stack to a magically priced fairy dust AMD stack right? That seems a bit unfair considering all the data on AMD pricing currently and EVERY site commenting on it. They continue to heap praise on AMD in reviews based on fake pricing. You can't write a whole page on how bad AMD's pricing situation is then go ahead and write conclusions and recommendations on that fake pricing as if it is real or will be real at some point MAYBE. Misleading the public at best, which is why tomshardware/hardocp etc backed off now. NV wins by default, until AMD puts out a real MSRP card. Anandtech still gives AMD the benefit of the doubt giving NV an escape clause...It should be the other way around. AMD needs that clause as no card they've released recently is MSRP.
  • ninjaquick - Thursday, February 20, 2014 - link

    AMD's MSRP is very real, the issue is supply cannot meet demand. AMD only sees the money they make selling cards to vendors, if they had direct supply control (AMD badged products) they could be making money hands over fists with the inflation, but AMD isn't in the position to do that. They sell their chipsets at the stipulated pricepoint to their vendors, and the vendors then either pass on the savings, or squeeze supply and drive up prices for retailers. If they pass on the savings, then ultimately retailers are capitalizing on the consumer's willingness to pay more, but AMD, ultimately, will see no profits from this, and their sales will be hurt by lower flow due to high prices.

    Plenty of people bought 2XX series cards for their MSRP before LiteCoin made the prices skyrocket. To date, only the 290 series is still hyper inflated. And the MSRPs being determined there are from the vendors, not AMD.
  • chiechien - Friday, February 28, 2014 - link

    The 280x are still priced 50% to 100% over MSRP, too. They're supposed to be $300, but you can't find cheaper than $450, with $5-600 quite common. The R9 270x runs about $50-$100 over MSRP (25-50%).
  • Zetbo - Thursday, February 20, 2014 - link

    I just bought 4096MB Asus Radeon R9 290 DirectCU II OC Aktiv PCIe 3.0 x16 for 397,53eur. I think it's fair price.
  • vision33r - Sunday, March 9, 2014 - link

    That tiny fraction currently buys more GPUs than the avg consumer. Thus the demand for AMD's high end GPUs.
  • A5 - Tuesday, February 18, 2014 - link

    If you really need full FP64, get whoever is paying you to buy a Tesla card.
  • extide - Tuesday, February 18, 2014 - link

    Or go with a big-GCN card :)
  • A5 - Tuesday, February 18, 2014 - link

    Or that, assuming your code isn't locked in to CUDA.
  • ddriver - Tuesday, February 18, 2014 - link

    Thank god it is not. Running about 50 TFLOPS here, nice cheap radeons, no tesla overpriced junk thank you very much nvidia.
  • Morawka - Tuesday, February 18, 2014 - link

    where you buying your radeons? they are overpriced price gouged to hell, Steaming hot thermals but sure it does fp 64 great go get em tiger!!!
