02:34PM EDT - Xilinx, the manufacturer of FPGAs, announced its new Versal AI engine last year as a way of moving FPGAs into the AI domain. This talk is set to expand on those announcements.

02:36PM EDT - Xilinx device categories: FPGA, SoC, ACAP. Versal is ACAP

02:37PM EDT - ACAP = Adaptive Compute Acceleration Platform

02:37PM EDT - Scalar processors, programmable logic, AI engines/DSP engines, onboard networking

02:38PM EDT - Currently shipping samples to early customers

02:38PM EDT - TSMC 7nm, 37B transistors, 855 Mb on-die memory, 400 engine cores, 785 IOs, 44 SerDes

02:39PM EDT - Versal NOC - vertical NOC and horizontal NOC

02:39PM EDT - Packetized NOC, Aggressive clock gating

02:40PM EDT - >1 Tbps bidirectional bandwidth per row, >0.5 Tbps bidirectional bandwidth per column

02:40PM EDT - Always have virtual channels in the NOC

02:40PM EDT - Compute acceleration is all about data movement

02:41PM EDT - Unified Memory Subsystem

02:43PM EDT - 2nd gen CCIX

02:44PM EDT - Coherent Home Node, and L2 cache

02:45PM EDT - CCIX ESM - supports PCIe Gen 5 x16

02:45PM EDT - Versal Processor System

02:45PM EDT - in all Versal devices

02:46PM EDT - Dual core A72

02:46PM EDT - Dual Core R5 RPU w/lockstep

02:46PM EDT - Platform Management Control

02:47PM EDT - Crypto accelerators for security

02:47PM EDT - 10 Gbps Debug and trace interface

02:48PM EDT - 50 Gbps config interface

02:48PM EDT - Hardware RoT

02:48PM EDT - SHA3 engine, AES, RSA, AES-key

02:48PM EDT - Programmable logic - 900K LUTs, 2M LC, and 1.8M FLOPs

02:48PM EDT - 4x larger CLB -> 64 flip flops

02:48PM EDT - 158 Mb of URAM and BRAM - 50% lower power than prev gen

02:48PM EDT - Versal Programmable Logic DSP, DSP58

02:48PM EDT - 1958x DSP58 with FP support

02:49PM EDT - 400 AI Engine Tiles, 133 TOPS in INT8

02:49PM EDT - Non-blocking interconnect mesh

02:49PM EDT - 12.5MB L1 distributed memory

02:51PM EDT - In AI engine: 32b Scalar RISC Processor, 2 scalar ops/stream access, 512-bit SIMD vector processor, vec128int8 or vec8fp32, 7+ OPs per cycle VLIW

02:51PM EDT - 128 INT8 MACs per cycle per core
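
A quick sanity check on how the quoted figures fit together: 400 tiles at 128 INT8 MACs per cycle, set against the 133 TOPS peak, imply a clock speed. The sketch below works that out. It assumes one MAC is counted as two operations (a common convention, but not something stated in the talk), and the function names are ours for illustration only.

```python
# Figures quoted in the talk
AI_ENGINE_TILES = 400
INT8_MACS_PER_CYCLE_PER_TILE = 128
PEAK_INT8_TOPS = 133

# Assumption (not from the talk): one multiply-accumulate counts as two ops
OPS_PER_MAC = 2


def ops_per_cycle(tiles, macs_per_tile, ops_per_mac=OPS_PER_MAC):
    """Total INT8 operations the whole AI engine array can issue per clock cycle."""
    return tiles * macs_per_tile * ops_per_mac


def implied_clock_ghz(peak_tops, tiles, macs_per_tile, ops_per_mac=OPS_PER_MAC):
    """Clock frequency (GHz) at which the array would reach the quoted peak TOPS."""
    ops_per_second = peak_tops * 1e12
    return ops_per_second / ops_per_cycle(tiles, macs_per_tile, ops_per_mac) / 1e9


if __name__ == "__main__":
    total = ops_per_cycle(AI_ENGINE_TILES, INT8_MACS_PER_CYCLE_PER_TILE)
    clock = implied_clock_ghz(
        PEAK_INT8_TOPS, AI_ENGINE_TILES, INT8_MACS_PER_CYCLE_PER_TILE
    )
    print(f"INT8 ops per cycle across the array: {total}")
    print(f"Implied AI engine clock: {clock:.2f} GHz")
```

Under that assumption the numbers work out to an AI engine clock of roughly 1.3 GHz. This is an inference from the slide figures, not a frequency Xilinx stated in the talk.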

02:51PM EDT - Suited for signal processing workloads

02:52PM EDT - Data movement is done through orchestrated DMA

02:53PM EDT - Memory caches support multicast/broadcast

02:53PM EDT - Very efficient transmission of streamed data

02:56PM EDT - Versal offers deterministic performance and low latency

02:56PM EDT - Map the compute onto different parts of the ACAP

02:57PM EDT - Software programmable framework

02:58PM EDT - Trying to abstract the programming from hardware into regular C++

02:58PM EDT - Frameworks like MXNet, TensorFlow, Caffe

02:59PM EDT - First Xilinx 7nm device, 133 TOPS, PCIe Gen4 and CCIX

02:59PM EDT - Adaptable heterogeneous system architecture

03:00PM EDT - Q&A

03:00PM EDT - Q: HBM? A: Coming

03:01PM EDT - Q: Support CCIX? Support for Gen-Z and CXL? You're members of those consortiums. A: No official plan on Gen-Z. CXL is still new.

03:01PM EDT - That's a wrap. Next talk is Intel's Spring Crest.
