Intel's Xeon Cascade Lake vs. NVIDIA Turing: An Analysis in AI

Name: Intel's Xeon Cascade Lake vs. NVIDIA Turing: An Analysis in AI
Item: Intel's Xeon Cascade Lake vs. NVIDIA Turing: An Analysis in AI
Author: Johan De Gelas

by Johan De Gelas on July 29, 2019 8:30 AM EST

56 Comments | Add A Comment

56 Comments

AI Is More Than Deep Learning

At a high level, while deep learning is a form of artificial intelligence, the converse isn't always true; an application implementing AI does not necessarily use deep learning. Many AI applications use “conventional statistical” or “traditional” machine learning. After all, Support Vector Machines, Logistic Regression, K-nearest, Naive Bayes, and decision trees still make a lot of sense to use in automating information classification, especially if you don’t have a lot of data.

For example, Conditional Random Field (CRF) is used in natural language processing, and a lot of recommendation engines are based upon Boltzman Machines, Alternate Least Square (ALS), and so on. Case in point: one of most demanding and unique benchmarks – our "big data" benchmark – uses an ALS algorithm as recommendation engine ("collaborative filtering").

Of course, the use of neural networks – itself a whole field of study – is booming, and their use tends to dominate the latest AI applications. Neural networks are also among the most demanding workloads, requiring lots of processing power and signing expensive (and well-published) hardware contracts. All of which contrasts heavily with logistic regression, which remains the most used machine learning method, and also happens to need much less processing.

The reason for these difference in processing power requirements is, in turn, actually pretty simple. To quote Wouter Gevaert, an AI expert at the university department I work in:

“Each Neuron in a Neural Network can be considered like a logistic regression unit. Therefore, a Neural Network is like a massive amount of logistic regressions” (When you use sigmoid as activation function)

With all of that said, however, while neural networks are the most processing-intensive of AI technologies (especially with a large number of layers), there are several traditional machine learning techniques that also require a lot of processing power. Support Vector Machines with their complex transformations also tend to require a lot of computational time, for example. And in our Spark test, the Stanford NER system is based on a supervised CRF model using a labeled collection of English data. In that test, it has to crunch through a massive amount of unstructured text – several hundreds of gigabytes.

And of course, most of the analytical queries are still written in good old SQL. For structured and semi-structured data, for OLAP cubes etc., SQL code is still prevalent. As a single SQL query is nowhere near as parallel as Neural Networks – in many cases they are 100% sequential – the CPU is best tool for the job.

So in practice, most data (pre) processing and lots of AI software is still running on a CPU. GPUs mostly run massively parallel HPC applications and neural networks, an important market to be sure, but still only a piece of the larger AI market. This is one of the reasons why NVIDIA closed on $3 Billion of datacenter revenue last year, while Intel’s datacenter group made $20 Billion. Yes, Intel’s number includes Networking and storage. Yes, that includes several other markets than HPC and Data analytics. But still, a significant part of that revenue is going to be based upon servers that store and process data for analytics.

Compounding this whole picture, however, is not just revenue, but opportunities for growth. NVIDIA has been seeing massive growth in the datacenter market, while Intel has only seen single digit growth. Customer needs are continuing to shift as new technologies become available; the battle for the data analytics market has begun, and it is intensifying.

Enter the Era of AI Convolutional, Recurrent, & Scalability

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

56 Comments

View All Comments

C-4 - Monday, July 29, 2019 - link
It's interesting that optimizations did so much for the Intel processors (but relatively less for the AMD ones). Who made these optimizations? How much time was devoted to doing this? How close are the algorithms to being "fully optimized" for the AMD and nVidia chips?
quorm - Monday, July 29, 2019 - link
I believe these optimizations largely take advantage of AVX512, and are therefore intel specific, as amd processors do not incorporate this feature.
RSAUser - Monday, July 29, 2019 - link
As quorm said, I'd assume it's due to AVX512 optimizations, the next generation of AMD Epyc CPU's should support it, and I am hoping closer to 3GHz clock speeds on the 64 core chips, since it seems the new ceiling is around the 4GHz mark for 16 all-core.

It will be an interesting Q3/Q4 for Intel in the server market this year.
SarahKerrigan - Monday, July 29, 2019 - link
Next generation? You mean Rome? Zen2 doesn't have any AVX512.
HStewart - Tuesday, July 30, 2019 - link
I believe AMD AVX 2 is dual-128 bit instead of 256bit - so AVX 512 would probably be quad 128bit .
jospoortvliet - Tuesday, July 30, 2019 - link
That’s not really how it works, in the sense that you explicitly need to support the new instructions... and amd doesn’t (plan to, as far as we know).
Qasar - Tuesday, July 30, 2019 - link
from wikipedia :
" AVX2 is now fully supported, with an increase in execution unit width from 128-bit to 256-bit. "

" AMD has increased the execution unit width from 128-bit to 256-bit, allowing for single-cycle AVX2 calculations, rather than cracking the calculation into two instructions and two cycles."
which is from here : https://www.anandtech.com/show/14525/amd-zen-2-mic...

looks like AVX2 is single 256 bit :-)
name99 - Monday, July 29, 2019 - link
Regarding the limits of large batches: while this is true in principle, the maximum size of those batches can be very large, is hard to predict (at leas right now) and there is on-going work to increase the sizes, This link describes some of the issue and what’s known:

http://ai.googleblog.com/2019/03/measuring-limits-...

I think Intel would be foolish to pin many hopes on the assumption that batch scaling will soon end the superior performance of GPUs and even more specialized hardware...
brunohassuna - Monday, July 29, 2019 - link
Some information about energy consumption would very useful in comparisons like that
ozzuneoj86 - Monday, July 29, 2019 - link
My first thought when clicking this article was how much more visibly-complex CPUs have gotten in the past ~35 years.

Compare the bottom of that Xeon to the bottom of a CLCC package 286:
https://en.wikipedia.org/wiki/Intel_80286#/media/F...

And that doesn't even touch the difference internally... 134,000 transistors to 8 million and from 16Mhz to 4,000Mhz. The mind boggles.

Intel's Xeon Cascade Lake vs. NVIDIA Turing: An Analysis in AI

AI Is More Than Deep Learning

Post Your Comment

56 Comments

View All Comments

C-4 - Monday, July 29, 2019 - link

quorm - Monday, July 29, 2019 - link

RSAUser - Monday, July 29, 2019 - link

SarahKerrigan - Monday, July 29, 2019 - link

HStewart - Tuesday, July 30, 2019 - link

jospoortvliet - Tuesday, July 30, 2019 - link

Qasar - Tuesday, July 30, 2019 - link

name99 - Monday, July 29, 2019 - link

brunohassuna - Monday, July 29, 2019 - link

ozzuneoj86 - Monday, July 29, 2019 - link

Log in

Don't have an account? Sign up now