NVIDIA Publishes Statement on GeForce GTX 970 Memory Allocation
by Ryan Smith on January 24, 2015 8:00 PM EST
On our forums and elsewhere over the past couple of weeks there has been quite a bit of chatter on the subject of VRAM allocation on the GeForce GTX 970. To quickly summarize a more complex issue, various GTX 970 owners had observed that the GTX 970 was prone to topping out its reported VRAM allocation at 3.5GB rather than 4GB, and that meanwhile the GTX 980 was reaching 4GB allocated in similar circumstances. This unusual outcome was at odds with what we know about the cards and the underlying GM204 GPU, as NVIDIA’s specifications state that the GTX 980 and GTX 970 have identical memory configurations: 4GB of 7GHz GDDR5 on a 256-bit bus, split amongst 4 ROP/memory controller partitions. In other words, there was no known reason that the GTX 970 and GTX 980 should be behaving differently when it comes to memory allocation.
GTX 970 Memory Allocation (Image Courtesy error-id10t of Overclock.net Forums)
Since then there has been some further investigation into the matter using various tools written in CUDA in order to try to systematically confirm this phenomenon and to pinpoint what is going on. Those tests seemingly confirm the issue – the GTX 970 has something unusual going on after 3.5GB VRAM allocation – but they have not come any closer to explaining just what is going on.
Finally, more or less the entire technical press has been pushing NVIDIA on the issue, and this morning they have released a statement on the matter, which we are republishing in full:
The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.
We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.
Here’s an example of some performance data:
GeForce GTX 970 Performance

| Game | Settings | GTX980 | GTX970 |
|---|---|---|---|
| Shadows of Mordor | <3.5GB setting = 2688x1512 Very High | 72fps | 60fps |
| Shadows of Mordor | >3.5GB setting = 3456x1944 | 55fps (-24%) | 45fps (-25%) |
| Battlefield 4 | <3.5GB setting = 3840x2160 2xMSAA | 36fps | 30fps |
| Battlefield 4 | >3.5GB setting = 3840x2160 135% res | 19fps (-47%) | 15fps (-50%) |
| Call of Duty: Advanced Warfare | <3.5GB setting = 3840x2160 FSMAA T2x, Supersampling off | 82fps | 71fps |
| Call of Duty: Advanced Warfare | >3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on | 48fps (-41%) | 40fps (-44%) |
On GTX 980, Shadows of Mordor drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.
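For readers who want to check the arithmetic in NVIDIA's statement, the percentage drops follow directly from the frame rates in the table. The snippet below is a minimal sketch of that calculation; the names `RESULTS`, `percent_drop`, and `summarize` are our own illustrative choices and not anything NVIDIA published, and the frame rates are simply copied from the table above.

```python
# Frame rates (fps) copied from NVIDIA's table.
# Each tuple is (below-3.5GB setting, above-3.5GB setting).
RESULTS = {
    "Shadows of Mordor": {"GTX 980": (72, 55), "GTX 970": (60, 45)},
    "Battlefield 4": {"GTX 980": (36, 19), "GTX 970": (30, 15)},
    "Call of Duty: Advanced Warfare": {"GTX 980": (82, 48), "GTX 970": (71, 40)},
}


def percent_drop(before, after):
    """Relative frame rate loss, in percent, going from `before` to `after`."""
    return (before - after) / before * 100.0


def summarize(results):
    """Return (game, GTX 980 drop, GTX 970 drop, gap in percentage points)."""
    rows = []
    for game, cards in results.items():
        drop_980 = percent_drop(*cards["GTX 980"])
        drop_970 = percent_drop(*cards["GTX 970"])
        rows.append((game, drop_980, drop_970, drop_970 - drop_980))
    return rows


if __name__ == "__main__":
    for game, d980, d970, gap in summarize(RESULTS):
        print(f"{game}: GTX 980 -{d980:.1f}%, GTX 970 -{d970:.1f}%, "
              f"gap {gap:.1f} points")
```

Rounding each drop to the nearest whole percent reproduces the figures in the table. Note that the "difference" NVIDIA quotes is a gap in percentage points between two relative drops, and that it describes average frame rates only; it says nothing about frame time consistency.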
Before discussing the content of the message, it’s probably best to explain the nature of the message itself. As is almost always the case when issuing blanket technical statements to the wider press, NVIDIA has opted for a simpler, high level message that’s light on technical details in order to make the content of the message accessible to more users. For NVIDIA and their customer base this makes all the sense in the world (and we don’t resent them for it), but it goes without saying that “fewer crossbar resources to the memory system” does not come close to fully explaining the issue at hand, why it’s happening, and how in detail NVIDIA is handling VRAM allocation. Meanwhile, technical users and technical press such as ourselves would like more information, and while we can’t speak for NVIDIA, rarely is NVIDIA’s first statement their last statement in these matters, so we do not believe this is the last we will hear on the subject.
In any case, NVIDIA’s statement affirms that the GTX 970 does materially differ from the GTX 980. Despite the outward appearance of identical memory subsystems, there is an important difference here that makes a 512MB partition of VRAM less performant or otherwise decoupled from the other 3.5GB.
Being a high level statement, NVIDIA’s focus is on the performance ramifications – mainly, that there generally aren’t any – and while we’re not prepared to affirm or deny NVIDIA’s claims, it’s clear that this only scratches the surface. VRAM allocation is a multi-variable process; drivers, applications, APIs, and OSes all play a part here, and just because VRAM is allocated doesn’t necessarily mean it’s in use, or that it’s being used in a performance-critical situation. Using VRAM for an application-level resource cache and actively loading 4GB of resources per frame are two very different scenarios, for example, and would certainly be impacted differently by NVIDIA’s split memory partitions.
For the moment with so few answers in hand we’re not going to spend too much time trying to guess what it is NVIDIA has done, but from NVIDIA’s statement it’s clear that there’s some additional investigating left to do. If nothing else, what we’ve learned today is that we know less than we thought we did, and that’s never a satisfying answer. To that end we’ll keep digging, and once we have the answers we need we’ll be back with a deeper answer on how the GTX 970’s memory subsystem works and how it influences the performance of the card.
93 Comments
boarsmite - Sunday, January 25, 2015 - link
"...the GTX 970 has something usual going on after 3.5GB VRAM allocation – but they have not come any closer in explaining just what is going on."
I think you meant UNusual?
limitedaccess - Sunday, January 25, 2015 - link
Will you be following up with Nvidia and inquiring about the behavior of other Maxwell GPUs (980m, 970m, 965m, 750)? Perhaps also how Kepler behaves as well?
htwingnut - Sunday, January 25, 2015 - link
It isn't just about overall performance, but about consistent performance. Does it cause stuttering or micro-pauses? I've seen CLI and CrossFire performance with end FPS showing 60FPS+, on par with where it should be but it stuttered like it was running at 10-15FPS. Unbelievable. nVidia needs to provide the technical details to this because we're not all a bunch of laymen sheep.
Gothmoth - Sunday, January 25, 2015 - link
no it does not result in any stuttering.. not that i noticed.
and i have a 4k monitor and actually can make use of 4GB.
for me that sounds like a pure theoretical problem.. based on low level benchmark results.
blown out of proportion by ATI fanboys and people who have nothing better to do than to worry about a 1-3% performance difference.
someone please post here how i can make my 970 stutter with >3.5GB compared to 3 GB vram used.... im eager to test that.
Black Obsidian - Sunday, January 25, 2015 - link
Perhaps you should check out the post by nuoh_my_god, which is on the bottom of page 3 of comments as I type this. He reports stuttering and other issues, and explains under exactly what conditions it happens. Test away.
D. Lister - Monday, January 26, 2015 - link
@Ryan Smith
[On GTX 980, Shadows of Mordor drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and...]
I'm sorry Ryan, but that example has done more to rile up the neophytes then it has to shed a light on any potential hardware problems of the 970. Case in point, from: http://www.anandtech.com/bench/product/1068?vs=133...
Grid 2 performance
290 vs 280
[2560x1440 - Max Quality + 4x MSAA]
290: 80.2fps
280: 58.9fps
[1920x1080 - High Quality + 4x MSAA]
290: 194.6
280: 159.9
performance delta
290: 41.2% (-58.8)
280: 36.8% (-63.2)
Which is a difference of 4.4%.
Perhaps some frame rate/variance tests conducted while the VRAM usage hovers roughly between 3.25 and 3.75GB would be useful for the proverbial digging.
Galatian - Monday, January 26, 2015 - link
While I agree that it is probably a non-issue for most use cases, the fact remains that it is false advertisement. I just went over the nvidia website. They state 4GB of RAM with a bandwidth of 224 GB/s for both the 980 and 970. Now it is clear that for technical reasons the 970 actually can only achieve this bandwidth with 3,5 GB of RAM. I'm not sure how anybody can claim this is NOT false or misleading advertisement
SloppySlim - Monday, January 26, 2015 - link
NVidia stepping on their D!@ks .
it's a 4 gig card with 3.5 fast addressing , and .5 gig indirect translation due to the missing SMs ?
I'm guessing there's a corner case hardware/software bug that can result in 'up to'™ a 70% frame rate drop .
hopefully a firmware update can fix it , but the non-disclosure of the effect of the missing SMs leaves a sour taste .
jnieuwerth - Monday, January 26, 2015 - link
The problem is very easily observable in CoD: Advanced Warfare.. play on max settings and see what happens when you use the grapple hook over long distances (i.e. a lot of texture loading within a very short time). I didn't know what it was and thought it was just poor optimization of the PC port but now it seems my new 970 GTX was the cause. Very annoying indeed.
chizow - Monday, January 26, 2015 - link
Interesting discussion, thanks for the details Ryan. I've often wondered at what cost these culled functional units extoll on overall GPU performance. As enthusiasts we often try to pinpoint and isolate performance deltas with readily known variables like clockspeeds, SPs, ROPs, TMUs, memory bus and we often see performance is not quite linear, or as expected based on these specs alone.
But what about the unknowns? I guess we have a better understanding now, although I am not sure it really matters. In the end, this may help explain some of the differences in performance for fully functional SKUs based on the same ASIC, ie. GTX 480 vs 580, GTX 780 vs. 780Ti. I guess we now understand that cutting SM modules and functional units carries additional costs, which is not surprising since those vias to the crossbar would also be severed, incurring performance penalties.
In the end it sounds pretty simple without overcomplicating things: the cheaper SKUs will incur performance penalties and will be slower, if you want full performance, pay for the fully performing ASIC.
Even happier I picked up the 980 over the 970, now. anyways!