In the continual progression of GPU technology, we've seen GPUs become increasingly useful at generalized tasks as they have added flexibility for game designers to implement more customized and more expansive graphical effects. What started out as a simple fixed-function rendering process, where texture and vertex data were fed into a GPU and pixels were pushed out, has evolved into a system where a great deal of processing takes place inside the GPU. The modern GPU can be used to store and manipulate data in ways that go far beyond just quickly figuring out what happens when multiple textures are mixed together.

What GPUs have evolved into today are devices increasingly similar to CPUs in the breadth of what they can do, while still specializing in a subset of abilities. Starting with Shader Model 2.0 on cards like the Radeon 9700, and continuing with Shader Model 3.0 and today's latest cards, GPUs have become floating-point powerhouses, able to perform most floating-point calculations many times faster than a CPU — a necessity, as 3D rendering is a very FP-intensive process. At the same time, GPUs have added programming constructs such as looping and branching, previously found only on CPUs but crucial for programmers to make effective use of GPU resources. In short, today's GPUs have in many ways become extremely powerful floating-point processors that have been used for 3D rendering but little else.

Both ATI and NVIDIA have been looking to put the expanded capabilities of their GPUs to good use, with varying success. So far, the only programs to effectively tap this power, other than applications and games requiring 3D rendering, have also been video related: video decoders, encoders, and video effect processors. In short, the GPU has been underutilized; many tasks are floating-point hungry without being visual in nature, and such programs have made little use of the GPU so far.

Meanwhile, the academic world has for years been designing and using custom-built floating-point hardware for its own research purposes. The class of hardware relevant to today's topic, the stream processor, is an extremely powerful floating-point processor that operates on whole blocks of data at once, where a CPU carries out only a handful of numerical operations at a time. CPUs have implemented some stream processing with instruction sets like SSE and 3DNow!+, but these efforts still pale in comparison to what custom hardware can do. The same progress was happening on GPUs, only in a different direction, and until recently GPUs remained untapped as anything other than a graphics tool.
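To make the scalar-versus-stream contrast concrete, here is a minimal C sketch using SSE intrinsics (the function names are illustrative only): the scalar loop issues one floating-point add per iteration, while the SSE version performs four adds with a single instruction. Real stream processors take this much further, applying whole computational kernels to large blocks of data.

```c
#include <xmmintrin.h>  /* SSE intrinsics, available on x86 CPUs since the Pentium III */

/* Scalar version: the CPU issues one floating-point add at a time. */
void add4_scalar(const float *a, const float *b, float *out) {
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* SSE version: a single instruction operates on a block of four floats --
   a small taste of the "whole blocks at once" stream-processing model. */
void add4_sse(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);   /* load four floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* four adds in one op */
}
```

A GPU extends this idea across dozens of parallel units, each working on its own block of the data stream.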

Today's GPUs have evolved into their own class of stream processors, sharing much in common with researchers' customized hardware, because the 3D rendering process is itself a streaming task. The key difference is that while GPU designers have cut a few corners, omitting functionality a custom processor would include but 3D rendering does not need, by and large they have developed stream processors just as fast as custom hardware yet, thanks to economies of scale, many times cheaper than a custom design.

It's here that ATI is looking for new ideas on what to run on its GPUs as part of its new stream computing initiative. The academic world is full of such ideas, champing at the bit to run its experiments on more than a handful of customized hardware designs. One such application, and the star of today's announcement, is Folding@Home, a Stanford research project designed to simulate protein folding in order to unlock the secrets of diseases caused by flawed protein folding.

Comments Locked



  • photoguy99 - Sunday, October 1, 2006 - link

    Basic research is like that - it may take many years to benefit from it.

    Look at Einstein, his work was fundamental research but the benefits are still being realized 100 years later.

    So even if they have major breakthroughs they may be at such a foundational level that the actual cure for Alz. comes 25 years later.

    Nature of the beast.
  • JarredWalton - Sunday, October 1, 2006 - link

    Published Results

    Current research includes: Alzheimer's, Cancer, Huntington's Disease, Osteogenesis Imperfecta, Parkinson's Disease, Ribosome and antibiotics.

    And of course, there's always the Folding@Home FAQ.

    Do they know in advance that all of these issues are related to protein folding? No, but I'd assume they have good cause to suspect it. The problem is that it takes time; breakthrough results might not materialize soon, next year, or even for 5-10 years. Should research halt just because the task is difficult? Personally, I think FAH has a far greater chance of impacting the world during my lifetime than SETI@Home.

  • Baked - Sunday, October 1, 2006 - link

    I wonder if a X1600 card will work. I've tried both the graphics and command line versions of F@H on my new system, but both had problems connecting to the F@H server. Hopefully this new F@H version will work.
  • JarredWalton - Sunday, October 1, 2006 - link

    The next step is to extend to the X1800, and probably from there to the X1600. Beyond that, my guess is the G70 chips are next up.
  • smitty3268 - Sunday, October 1, 2006 - link

    I assume the new client uses the cpu + gpu, and not just the gpu? Also, it would be nice to have some sort of explanation for the poor nvidia performance in the next article. Is it just their architecture, or has Folding@Home been getting assistance from ATI and not NVidia?

    This doesn't make much sense:

    Additionally, as processors have recently hit a cap in terms of total speed in megahertz, AMD and Intel have been moving to multiple-core designs, which introduce scaling problems for the Folding@Home design and is not as effective as increasing clockspeeds.

    The Folding@Home design is quite obviously a massively parallel design, as shown by the fact that hundreds of thousands of computers are all working on the same problem. Therefore, doubling the number of cores would double the amount of work being done, and this seems to be happening faster than the old incremental speed bumps.

    Otherwise, it was a good article.
  • z3R0C00L - Monday, October 2, 2006 - link

    It's simple.. F@H uses dynamic branching calculations, and nVIDIA GPUs are technologically inferior to ATi VPUs when it comes to shading performance and branching performance.

    As such.. nVIDIA's mighty GeForce 7950GX2 would perform much like an ATi Radeon X1600XT. In other words.. too slow.
  • tygrus - Monday, October 9, 2006 - link

    The Nvidia FP hardware is fast enough, but the overall design doesn't fit well with the software (task) design of F@H. For other tasks the Nvidia GPUs may be very fast. The next Nvidia GPU and API will hopefully be better.

    The CPU handles the data transformation and setup before sending work to the GPU for the accelerated portion. Then the CPU oversees the return of data from the GPU. The CPU also looks after the log, text console, disk reads/writes, internet uploads and downloads, and other system overheads.

    More information is available from ???

    I just found a really great article which covers the public release:


    Talk w/Vijay Pande

    ATI is currently 8X faster than Nvidia. Nvidia has our code, running it internally, hope we can close the gap. But even 4X difference is large, and ATI is getting faster all of the time.

    Lot of work goes into qualifying GPUs internally so they can run.

    Making apps like this run on a GPU requires a lot of development work. Currently, science is best served by using ATI chips. Nv may come in future.


    The CPU has to poll the GPU to find out if it has finished a block and needs help (data from GPU->CPU, etc.). This takes a context switch and CPU time waiting for the reply (a nanosecond-scale wait, not a fixed number of cycles). Any actual work is done in the remaining time slice or more. The faster the GPU, the more it demands of the CPU. Slow the CPU by half and you may slow the GPU by up to half.
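The CPU-side polling cost described above can be sketched in C. Note that `gpu_work_done()` here is a hypothetical stub standing in for a real driver query; in an actual client the loop body would do leftover CPU work (logging, network I/O) between checks.

```c
/* Hypothetical stand-in for a GPU driver query -- a real client would ask
   the GPU runtime whether the current block has finished. Here the "GPU"
   simply reports completion after a fixed number of checks. */
static int checks = 0;
static int gpu_work_done(void) { return ++checks >= 3; }

/* The CPU-side loop: poll, do other work, repeat. Each poll costs a
   context switch plus wait time, so a faster GPU leaves the CPU spending
   proportionally more of its time servicing the GPU. */
int polls_until_done(void) {
    int polls = 0;
    while (!gpu_work_done())
        polls++;  /* in a real client, leftover CPU work happens here */
    return polls;
}
```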
  • Ryan Smith - Sunday, October 1, 2006 - link

    The new client "uses" the CPU like all applications do, but the core is GPU-based, so it won't push the CPU like the CPU-only client does; I don't know to what degree, however.

    As for the Nvidia stuff, we only know what the Folding team tells us. They made it clear that the Nvidia cards do not show the massive gains that ATI's cards do when they try to implement their GPU code on Nvidia's cards. Folding@Home has been getting assistance from Nvidia, but they also made it clear that this is something they can do without help, so the problem is in the design of the G7x in executing their code.

    As for the core stuff, this is something the Folding team explicitly brought up with us. The analogy they used is trying to bring together 2000 grad students to write a PhD thesis in one day; it doesn't scale like that. They can add cores up to a certain point, but the returns are diminishing versus faster methods of processing. This is directly a problem for Folding@Home, which is why they are putting effort into stream processing, which can offer the gains they need.
  • smitty3268 - Sunday, October 1, 2006 - link

    Does the G7x have as much support for 32-bit floats as ATI does? It seems like I read somewhere that one of the two had moved to 32-bit exclusively while the other was still much faster at 16/24-bit FP. Could that be why they aren't seeing the same performance from NVidia?
  • Clauzii - Monday, October 2, 2006 - link

    Probably that, and the fact that the big ATI models contain 48 shaders, which really beefs up the calculations!
