MONDAY July 20, 1:30pm - 3:00pm
Tutorial 8 Part 1: Compute Acceleration using FPGAs
Vinod Kathail - Xilinx Inc., San Jose, CA
Ashish Sirasao - Xilinx Inc., San Jose, CA
Zhiru Zhang - Cornell Univ., Ithaca, NY
Kumar Deepak - Xilinx Inc., San Jose, CA

Field Programmable Gate Arrays (FPGAs) provide significant flexibility in implementing compute-intensive applications, offering advantages in throughput, latency, and energy efficiency over general-purpose CPUs and GPUs. Over the last decade, FPGAs have evolved into highly configurable SoCs with on-chip general-purpose CPUs, domain-specific programmable accelerators, and flexible connectivity options. Recently, Xilinx introduced a new heterogeneous compute architecture, the adaptive compute acceleration platform (ACAP), which delivers the best of all three worlds: world-class vector and scalar processing elements tightly coupled to next-generation programmable logic (PL), all tied together with a high-bandwidth network-on-chip that provides memory-mapped access to all three processing element types. This tightly coupled hybrid architecture allows more dramatic customization and greater performance gains than any one implementation alone. This advancement on the device side is accompanied by higher-level programming approaches with a broad set of tools, flows, libraries, and off-the-shelf platforms, which have made FPGAs significantly easier to use for a wide range of compute-intensive applications. Today, FPGAs can be programmed in C, C++, and OpenCL, and can be deployed for large-scale data processing and machine learning using frameworks such as Spark and TensorFlow. The Xilinx toolchain supports multiple entry methods for every type of developer: certain applications (such as AI machine learning inference) can be coded at the framework level (e.g., Caffe, TensorFlow); others can be coded in C using pre-optimized libraries (e.g., filters for 5G radio); and traditional hardware developers can still port their existing RTL to ACAP via the traditional RTL entry flow.
As a result, FPGAs are being deployed to accelerate a broad range of applications, including deep learning, genomics, HPC workloads, security, encryption and decryption, compression and decompression, video analytics, database queries and analytics, graph processing, storage acceleration, and network acceleration.

This tutorial will cover the following areas:

1. Xilinx ACAP architecture

2. Algorithm and application examples which use FPGAs to accelerate compute-intensive workloads

3. Challenges in using the FPGA architecture for general purpose computing

4. Higher-level programming models, Compiler technology, Tools, Libraries, and Design flows

5. Large-scale data processing deployment of FPGAs

6. Future FPGA architectures and programming models

Vinod Kathail Bio: Kathail is a Xilinx Fellow and Chief Architect for the Vitis Development Environment. He also leads the company-wide focus on embedded vision, including machine learning usage in edge and endpoint applications. At Xilinx, he initiated the software programmability effort for the Zynq family and developed and drove the adoption of SDSoC early on. Prior to joining Xilinx, Vinod was the founding CEO and later CTO of Synfora, a high-level synthesis startup. Vinod brings over 25 years of experience in heterogeneous programming environments, high-performance parallel and VLIW architectures, parallelizing compilers, and high-level synthesis, working in both research labs (HP Labs) and startups.
Vinod received an ScD in Electrical Engineering and Computer Science from MIT. He holds over 25 patents, and he has authored numerous research publications.

Ashish Sirasao Bio: Sirasao (M. Tech, EE, IIT Mumbai, 1993) is a Fellow Engineer in the Xilinx Central Engineering team. He is currently involved in defining and implementing hardware and software architectures for high-performance accelerators in Deep Learning, Data Analytics, Computer Vision, and Video Codecs on Xilinx FPGAs.

Zhiru Zhang Bio: Zhang is an Associate Professor in the School of ECE at Cornell University. His current research investigates new algorithms, design methodologies, and automation tools for heterogeneous computing. His research has been recognized with a Google Faculty Research Award (2019), the DAC Under-40 Innovators Award (2018), the Rising Professional Achievement Award from the UCLA Henry Samueli School of Engineering and Applied Science (2018), a DARPA Young Faculty Award (2015), the IEEE CEDA Ernest S. Kuh Early Career Award (2015), an NSF CAREER Award (2015), the Ross Freeman Award for Technical Innovation from Xilinx (2012), and multiple best paper awards and nominations. Prior to joining Cornell, he was a co-founder of AutoESL, a high-level synthesis start-up later acquired by Xilinx.

Kumar Deepak Bio: Deepak is a Distinguished Engineer in the Data Center Group (DCG) at Xilinx.  He has over 20 years of experience in architecting and developing large-scale complex software and hardware systems.  Currently, he is leading solutions for Database Acceleration using Alveo.  He received his B.S in Electronics and Communication Engineering from Indian Institute of Technology, Kharagpur.