Today at the annual Hot Chips conference, AMD’s new CTO Mark Papermaster unveiled the first details about the Steamroller x86 CPU core.

Steamroller is the third instantiation of AMD’s Bulldozer architecture, first conceived in the mid-2000s and finally brought to market in late 2011. Committed to this architecture for at least one more design after Steamroller, AMD has settled on roughly yearly updates to the architecture. For 2012 we have the introduction of Piledriver, the optimized Bulldozer derivative that formed the CPU foundation for AMD’s Trinity APU. By the end of the year we’ll also see a high-end desktop CPU without processor graphics based on Piledriver.

Piledriver saw a switch to hard-edge flip-flops, which allowed for a considerable decrease in power consumption at the cost of more careful design and validation work. Performance didn’t change, but AMD saw a 10% to 20% reduction in active power. Piledriver also brought some scheduling efficiency improvements, with better prefetching and branch prediction as its two other major design changes.

Steamroller is designed to keep the ball rolling. It takes the fundamentals of the Bulldozer/Piledriver architectures and offers a healthy set of evolutionary improvements on top of them. In Intel-speak, Steamroller wouldn’t be a tick, as it isn’t accompanied by a significant process change (28nm bulk is pretty close to 32nm SOI), but it’s not a tock either, as the architecture is enhanced but largely unchanged. Steamroller fits somewhere in between those two extremes.
 

Front End Improvements

 
One of the biggest issues with the front end of Bulldozer and Piledriver is the shared fetch and decode hardware. This table from our original Bulldozer review helps illustrate the problem:
 
Front End Comparison
                                  AMD Phenom II         AMD FX            Intel Core i7
Instruction Decode Width          3-wide                4-wide            4-wide
Single Core Peak Decode Rate      3 instructions        4 instructions    4 instructions
Dual Core Peak Decode Rate        6 instructions        4 instructions    8 instructions
Quad Core Peak Decode Rate        12 instructions       8 instructions    16 instructions
Six/Eight Core Peak Decode Rate   18 instructions (6C)  16 instructions   24 instructions (6C)
 
Steamroller addresses this by duplicating the decode hardware in each module. Now each core has its own 4-wide instruction decoder, and both decoders can operate in parallel rather than alternating every other cycle. Don’t expect a doubling of performance since it’s rare that a 4-issue front end sees anywhere near full utilization, but this is easily the single largest performance improvement from all of the changes in Steamroller. 
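
To make the arithmetic behind the table concrete, here is a minimal sketch of how peak decode rate scales when two cores share one decoder versus each core having its own. The function name and its parameters are our own illustration, not anything AMD has published, and the model assumes cores are populated two per module.

```python
def peak_decode_rate(cores, decode_width, cores_per_decoder=1):
    """Peak instructions decoded per cycle across `cores` active cores.

    Hypothetical helper for illustration: assumes every decoder is
    `decode_width` wide and is shared by `cores_per_decoder` cores.
    """
    # Ceiling division: a partially filled module still has one decoder.
    decoders = -(-cores // cores_per_decoder)
    return decoders * decode_width


core_counts = (1, 2, 4, 8)

# Bulldozer/Piledriver: the two cores in a module share one 4-wide decoder.
shared = [peak_decode_rate(n, 4, cores_per_decoder=2) for n in core_counts]

# Steamroller: each core gets its own 4-wide decoder.
dedicated = [peak_decode_rate(n, 4) for n in core_counts]

print(shared)     # [4, 4, 8, 16], matching the AMD FX column above
print(dedicated)  # [4, 8, 16, 32]
```

These are peak figures only. As noted above, a 4-wide front end rarely runs anywhere near full utilization, so the real-world gain will be much smaller than the doubling the peak numbers suggest.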
 
The penalties are pretty obvious: area goes up, as does power consumption. However, the tradeoff is likely worth it, and both of these downsides can be offset in other areas of the design as you’ll soon see.

Steamroller inherits the perceptron branch predictor from Piledriver, but in an improved form for better performance (mostly in server workloads). The branch target buffer is also larger, which contributes to a reduction in mispredicted branches by up to 20%. 
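
For readers unfamiliar with the idea, below is a minimal sketch of a textbook perceptron branch predictor. It is not AMD's implementation, whose details have not been disclosed. The class name, table size, history length and the branch address used in the example are all assumptions chosen for illustration; the training threshold follows the commonly cited 1.93 * history_length + 14 rule from the academic literature.

```python
class PerceptronPredictor:
    """Textbook perceptron branch predictor (illustrative, not AMD's design)."""

    def __init__(self, history_len=8, table_size=64):
        self.table_size = table_size
        # Training threshold commonly used in the literature.
        self.threshold = int(1.93 * history_len + 14)
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history: +1 for taken, -1 for not taken.
        self.history = [1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        # A non-negative dot product means "predict taken".
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when confidence is below the threshold.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        # Shift the actual outcome into the global history.
        self.history = self.history[1:] + [t]


# Example: a branch at a made-up address that alternates taken/not taken.
predictor = PerceptronPredictor()
mispredicts = 0
for i in range(200):
    taken = (i % 2 == 0)
    # Only count mispredictions once the predictor has had time to train.
    if i >= 150 and predictor.predict(0x40) != taken:
        mispredicts += 1
    predictor.update(0x40, taken)
print(mispredicts)
```

The appeal of this style of predictor is that each weight learns how strongly one past branch outcome correlates with the current branch, so longer histories can be tracked with far less storage than a table indexed by every history pattern.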
 

Execution Improvements

 
AMD streamlined the large, shared floating point unit in each Steamroller module. There’s no change in the execution capabilities of the FPU, but there’s a reduction in overall area. The MMX unit now shares some hardware with the 128-bit FMAC pipes. AMD wouldn’t offer many specifics beyond saying that the shared hardware only applies to mutually exclusive MMX/FMA/FP operations and thus shouldn’t result in a performance penalty.
 
The reduction of pipeline resources is supposed to deliver the same throughput at lower power and area, basically a smarter implementation of the Bulldozer/Piledriver FPU. 

There’s no change to the integer execution units themselves, but other changes in the design improve integer performance.
 
The integer and floating point register files are bigger in Steamroller, although AMD isn’t being specific about how much they’ve grown. Load operations (two operands) are also compressed so that they only take a single entry in the physical register file, which helps increase the effective size of each RF. 
 
The scheduling windows also increased in size, which should enable greater utilization of existing execution resources. 
 
Store to load forwarding sees an improvement. Steamroller is better than its predecessors at detecting interlocks, cancelling the load, and getting the data from the store.
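
As a rough illustration of what store to load forwarding means, here is a minimal software model of the concept. It is a simplified sketch and not a description of Steamroller's load/store unit: the class and method names are hypothetical, addresses are matched exactly, and real hardware also has to handle partial overlaps, differing access sizes and speculation.

```python
class StoreQueue:
    """Toy model of in-flight stores that can forward data to later loads."""

    def __init__(self, memory):
        self.memory = memory   # dict of address -> value (committed state)
        self.pending = []      # in-flight stores, oldest first

    def store(self, address, value):
        # The store waits in the queue until it is drained to memory.
        self.pending.append((address, value))

    def load(self, address):
        # Search from youngest to oldest so the most recent store wins.
        for addr, value in reversed(self.pending):
            if addr == address:
                return value, True    # data forwarded from the store
        return self.memory.get(address, 0), False   # read from memory

    def drain(self):
        # Commit all pending stores to memory in program order.
        for addr, value in self.pending:
            self.memory[addr] = value
        self.pending.clear()


memory = {0x100: 1}
queue = StoreQueue(memory)
queue.store(0x100, 7)
queue.store(0x100, 9)

forwarded = queue.load(0x100)   # hits the youngest pending store
missed = queue.load(0x200)      # no pending store, falls back to memory
queue.drain()

print(forwarded, missed, memory[0x100])
```

The faster and more reliably the hardware can spot that a load depends on a store still in flight, the less often the load has to wait for the store to reach the cache first.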

126 Comments

  • Origin64 - Thursday, August 30, 2012 - link

    Just like Phenom II was what Phenom should've been, but by then it was too late. AMD is always a generation behind.
    In the notebook market this isn't much of a problem, Intel's even further behind there, but in the mid-end desktop chips it shows. Which is a shame, because mid-end Intel is way too expensive.

    Although I still insist that this Bulldozer/steamroller/whatever architecture will have its 15 months of fame when games start running on octocores.
  • Spunjji - Thursday, August 30, 2012 - link

    Pretty much what you said at the start there. God knows I miss AMD being competitive in the CPU market, but in anything but a "value" sense I don't see them bringing that game for another 2/3 years, if ever.
  • Dracconus - Friday, November 30, 2012 - link

    If you think that AMD has "always been a generation behind" then you're seriously mistaken. The AMD Athlon 64 series STOMPED the living HELL out of the Pentium 4 series processors and cost less. AMD WAS good at single threaded applications, but they started focusing on the growth abilities of the 64 bit architecture, and lost track of what would in the end be most important. You can't fault a company for looking to the future, and attempting to expand their horizons. Had software developers thought more about the future of hardware instead of the present limitations then things would have gone in AMD's favor CONSIDERABLY.

    Intel has good processors, we'll give them that. But where they have ALWAYS lacked is price-performance ratio. They don't scale, overclock, cool, or deal with heat as well, and up until the I5 series they BARELY managed to give two shits about power consumption.
    Yes, Intel is better for RAW performance, but quite frankly, how many average gamers are going to be able to afford a 2 thousand dollar processor just to play their favorite game in six years? NONE How many enthusiasts...plenty.
    AMD serves a greater portion of the population, and they know it. They just got freaking lazy, and it started to show.
    They have a chance to pick it back up, and it's up to them to admit they slipped up, but don't get fooled. Even IF AMD slips, they'll still have budget minded consumers worried about price to performance ratios, and will ALWAYS have customers as long as they're in business solely because of the economic standstill the world is in.
  • yankeeDDL - Tuesday, August 28, 2012 - link

    Any idea of when could the first legitimate benchmark start to surface?
    The lack of competition in the CPU market is not healthy for users, that's for sure.
    I'd love to see AMD back in the game in other areas, in addition to Netbooks (with Brazos) and Value (Llano offers pretty good bang for the bucks).
  • SpamHammer - Wednesday, August 29, 2012 - link

    I fail to see how it's been negative to users? Have you seen the cost to performance ratio of Intel's Sandy Bridge and Ivy Bridge chips?? I mean, seriously! When the Core i5 2500k came out, UN-overclocked, it was able to go toe-to-toe with the $1,000 Intel Core Extreme from the generation before! All this from a chip that runs $220?? That's insane value!

    The value is only furthered when you take into account its low thermal output, and its high overhead for overclocking. I have mine OC'd to 4.1GHz, and that's not even beginning to stretch it. I've seen them OC'd to 4.5GHz regularly on air cooling. This isn't "hard" or "the exception to the rule"; it is the norm for these chips. And that's just Sandy Bridge! Ivy Bridge offers a 10-15% improvement right out of the box!

    Hell, the Core i3 2100, running *only* $120, despite being just a dual-core chip, is able to easily wipe the floor with even AMD's octo-core Bulldozer and Piledriver chips, in just about every gaming and synthetic benchmark, despite that chip costing nearly twice as much!
    It powers my brother's gaming PC, and he's able to run Battlefield 3 on Ultra at 1600x900 (his monitor's resolution) with 50+FPS!
    Thanks to all this "non competitive consumer screwing" you're preaching about, I was able to build his entire rig, sans monitor, for $469 shipped!
    I mean, you couldn't ask for a better time to buy new PC gear!
  • thehat2k5 - Wednesday, August 29, 2012 - link

    "Battlefield 3 on Ultra at 1600x900 (his monitor's resolution) with 50+FPS!" " for $469 shipped!" Yeah right. Maybe if there was a 75% off sale on video cards where you bought it. BF3 on ultra requires at least a $500 video card, regardless of how much you cheap out on a current cpu.
    Hell, if you came into my shop with a budget of $469 shipped, i have 7 employees that will laugh at you and kindly hand you a business card from Best Buy with two letters on the back....HP.
    that said, the best bang for the buck gaming cpu is the AMD FX4100 for about $140. Why go weak i3 dual core when you can go mid range quad from AMD for $20 more. I like your fairy tale, almost as much as I like some of the ones in the bible.
  • StevoLincolnite - Wednesday, August 29, 2012 - link

    A $500 video card just for Battlefield 3? Seriously? Lol? With that kind of ignorance, I would never want to buy from your shop.
  • thehat2k5 - Wednesday, August 29, 2012 - link

    If you want it running on Ultra in the middle of a firefight at min. 60fps, you bet! Considering our customers are buying 21.5" LCD's with resolutions of 1980x1050 as a minimum.
    Even our customers would laugh at the claim of BF3 on Ultra for $469. Sorry guys, i'm no AMD "fanboy", but around here we call a spade a spade. This dudes claim is fantasy based on a bath salts hallucination.
  • taltamir - Wednesday, August 29, 2012 - link

    Your customers are buying 1980x1050 resolution monitors but he explicitly stated his brother is running on a 1600x900
    Also he said 50FPS+ not 60FPS steady like you are claiming.
  • thehat2k5 - Wednesday, August 29, 2012 - link

    I would like to personally see that running in Ultra at 50fps, even on a 1600x900. If i can sell BF3 Ultra desktops for under $500, i'm going to open a few more stores and put Best Buys computer department out of business lol
