Java Performance

The SPECjbb 2015 benchmark has "a usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations." It uses the latest Java 7 features and makes use of XML, compressed communications, and messaging with security.

We tested with four groups of transaction injectors and backends. We use the "Multi JVM" test because it is more realistic: running multiple VMs on a single server is very common practice.

The Java version was OpenJDK 1.8.0_91. We applied relatively basic tuning to mimic real-world use, while aiming to fit everything inside a server with 128 GB of RAM:

"-server -Xmx24G -Xms24G -Xmn16G -XX:+AlwaysPreTouch -XX:+UseLargePages"

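As a rough sanity check on that memory target, the sketch below adds up the heap these flags reserve. It assumes that the 24 GB heap applies to one JVM per group, i.e. four such JVMs; that split is our assumption and is not stated above. The function name `heap_budget` is purely illustrative.

```python
def heap_budget(jvm_count, heap_gb, ram_gb):
    """Return (total heap reserved, RAM left over), both in GB."""
    total_heap = jvm_count * heap_gb
    return total_heap, ram_gb - total_heap

# Assumption: four JVMs (one per group) each run with -Xmx24G -Xms24G.
total_heap, headroom = heap_budget(jvm_count=4, heap_gb=24, ram_gb=128)

# -Xmn16G sets the young generation to 16 GB of the 24 GB heap.
young_fraction = 16 / 24

print(f"heap reserved: {total_heap} GB, headroom: {headroom} GB")
print(f"young generation share of heap: {young_fraction:.0%}")
```

Under that assumption, 96 GB is reserved as heap, which leaves 32 GB for the remaining JVMs, off-heap memory and the operating system.
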
The graph below shows the maximum throughput numbers for our Multi JVM SPECjbb test.

SPECJBB 2015-Multi Max-jOPS

The Critical-jOPS metric is a throughput metric under a response-time constraint.
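
As we understand the SPECjbb 2015 documentation, critical-jOPS is reported as the geometric mean of the throughput sustained at five 99th-percentile response-time limits (10, 25, 50, 75 and 100 ms). The sketch below illustrates only that final averaging step. The per-limit jOPS values are made up for illustration and are not results from our tests, and the names `SLA_LIMITS_MS` and `critical_jops` are hypothetical.

```python
import math

# Response-time limits in milliseconds (99th percentile),
# per our reading of the SPECjbb 2015 documentation.
SLA_LIMITS_MS = (10, 25, 50, 75, 100)

def critical_jops(jops_at_sla):
    """Geometric mean of the jOPS sustained at each response-time limit."""
    values = [jops_at_sla[limit] for limit in SLA_LIMITS_MS]
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-limit throughput, NOT measured data.
example = {10: 10_000, 25: 20_000, 50: 30_000, 75: 35_000, 100: 40_000}
print(round(critical_jops(example)))
```

Because it is a geometric mean, a system that collapses at the tight 10 ms limit is penalized heavily even if it sustains high throughput at 100 ms.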

The 8-core Tyan POWER8 server offers about 72% of the performance of the 10-core IBM S812LC. That is not too bad, as the latter not only has 25% more cores, but its chip can also boost 16% higher. In total, the IBM POWER8 CPU inside the 2U S812LC offers about 45% greater processing power ("35 GHz" vs "24 GHz") and delivers about 40% better performance. So compared to the S812LC, the 1U Tyan delivers very decent performance.
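
The "35 GHz" and "24 GHz" figures are simply core count multiplied by clock speed. A minimal sketch of the arithmetic, assuming per-core clocks of 3.5 GHz and 3.0 GHz (inferred from those totals and the core counts, not quoted above):

```python
def aggregate_ghz(cores, clock_ghz):
    """Crude 'processing power' proxy: core count times clock speed."""
    return cores * clock_ghz

# Clock speeds are inferred assumptions: 35 GHz / 10 cores and 24 GHz / 8 cores.
s812lc = aggregate_ghz(cores=10, clock_ghz=3.5)  # IBM S812LC (2U)
tyan = aggregate_ghz(cores=8, clock_ghz=3.0)     # Tyan (1U)

capacity_ratio = s812lc / tyan  # cores x GHz advantage of the S812LC
measured_ratio = 1 / 0.72       # the Tyan scores about 72% of the S812LC

print(f"cores x GHz: {s812lc:.0f} vs {tyan:.0f}")
print(f"capacity advantage: {capacity_ratio - 1:.0%}")
print(f"measured advantage: {measured_ratio - 1:.0%}")
```

The two advantages land within a few points of each other, which is why the 1U system's result looks reasonable relative to its core count and clock.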

But Intel is the one to beat. And by caging the POWER8 inside a 1U chassis, performance has dropped below that of the power-sipping (90W TDP!) Xeon E5-2640 v4.

SPECJBB 2015-Multi Critical-jOPS

Meanwhile, our next benchmark is a good reminder that OpenJDK 8's performance is not optimal for the POWER8. The IBM JDK does not offer much better throughput unless you start tuning frantically. However, it does increase the most important score, critical-jOPS, with reasonable tuning.

However, while the more powerful 2U POWER8 can still keep up with Intel's best and most expensive (only 9% slower), the frequency-capped CPU inside the Tyan 1U fails to impress, as it trails the less expensive and less power-hungry Xeon E5-2640 v4 by a large margin.


28 Comments


  • Zzzoom - Friday, February 24, 2017 - link

    "As important as performance per watt is, several markets – HPC, Analytics, and AI chief among them – consider performance the most important metric. Wattage has to be kept under control, but that is it."

    What a load of garbage.
  • JohanAnandtech - Saturday, February 25, 2017 - link

    And now maybe some arguments that substantiate your opinion?
  • SarahKerrigan - Sunday, February 26, 2017 - link

    In HPC specifically, power consumption is a major issue. This was the entire root of the success of the Blue Gene line back in the day, and why NEC is shifting its supercomputing CPUs to progressively more efficient cores instead of higher-performance cores now (SX-9: 102.4GF/core; SX-ACE: 64GF/core). HPC is sensitive to running cost, and power dissipation is a critical factor in that.
  • Zzzoom - Monday, February 27, 2017 - link

    Go read the 7+ years worth of materials from the EE HPC Working Group.
  • JohanAnandtech - Wednesday, March 1, 2017 - link

    In a system with 2-4 GPUs, 512 GB of RAM, the TDP of the CPU is not a dealbreaker. I can agree that some HPC markets are more sensitive to perf/watt; but I have seen a lot of examples where raw performance per dollar was just as important.
  • Zzzoom - Wednesday, March 1, 2017 - link

    POWER8 TDP is 45W-102W higher per socket than the highest spec Xeon E5. That's 90W-204W higher per node where each node consumes 1500W-2000W, or 6-10% total on a site with a multi-million dollar power bill that went to great lengths to bring down the PUE by a similar amount. So for anyone to pick POWER8 it has to do better on energy to solution through its unique features, or be considerably cheaper (ha!). POWER8's advantage is NVLink, but TSUBAME3 going with Intel+PLX switches on top of NVLink shows that it's not that big of a deal.
    Anyway, the efficiency requirements on the CORAL procurements are pretty strict so scale-out POWER9+Volta will have to shed a lot of weight.
  • Zzzoom - Wednesday, March 1, 2017 - link

    I forgot about the memory buffers. It's even worse.
  • mystic-pokemon - Sunday, March 5, 2017 - link

    Guys, I know shit ton of stuff about a server Johan listed above. He has a point when he says Power consumption is only so much important.
    In short, when you combine all aspects to TCO model: POWER8 server delivers most optimal TCO value
    We consider all the following into our TCO model
    a) Cost of ownership of the server
    b) Warranty (Lesser than conventional server, different model of operations)
    c) What it delivers: how many independent threads (SMT8 on POWER8, remember? 192 hardware threads), how much memory bandwidth (230 GB/s), how much total memory capacity in 1 server (1 TB with 32 GB)
    d) For a public cloud use-case, how many VMs (with x HW threads and x memory cap / bw ) can you deliver on 1 POWER8 server compared to other servers in fleet today ? Based on above stats, a lot .
    e) Data center floor lease cost in DC ( 24 of these servers in 1 Rack, much denser. Average the lease over age of server: 3 years ). This includes all DC services like aggers, connectivity and such.
    f) Cost per KWH in the specific DC ( 1 Rack has nominal power 750W)

    All this combined, POWER has good TCO. It's a massively parallel server, which is where the major advantage comes from. Choose your workload wisely. That's why companies continue to work on it.

    I am talking about all this without actually combining it with CAPI over PCIe and OpenCAPI. Get it? POWER is not going anywhere.
  • Michael Bay - Friday, February 24, 2017 - link

    I think at this point in time Intel has more to fear from goddamn ARM than IBM in server space.
    Okay, maybe AMD as well.
  • JohanAnandtech - Friday, February 24, 2017 - link

    Personally I think OpenPOWER is a viable competitor, but in the right niches (In memory databases, GPU accelerated + NVlink HPC). Just don't put that MHz beast in a far too small 1U cage. :-)
