MONDAY July 20, 4:00pm - 5:30pm
Tutorial 13: Co-designing Hardware-Software-Algorithms for Next Generation AI Systems
Jungwook Choi - Hanyang Univ., Seoul, Republic of Korea
Swagath Venkataramani - IBM T.J. Watson Research Center, Yorktown Heights, NY

Deep Neural Networks (DNNs) have achieved superhuman levels of algorithmic performance on many Artificial Intelligence (AI) tasks involving images, videos, text and natural language. However, their superior accuracy comes at an unprecedented computational cost, outstripping the capabilities of modern CPU and GPGPU platforms. This has resulted in an AI efficiency gap, and bridging it is pivotal to the ubiquitous adoption of DNNs. To this end, this tutorial summarizes three key approaches adopted by IBM Research (and broadly by others in the research community) to improve the computational efficiency of DNNs.

The first approach is the use of hardware accelerators tailored to leverage the computational characteristics of DNNs. Specifically, we will describe the RAPID AI core, which embodies a dataflow architecture and is fabricated in 14nm technology. RAPID is designed with a 2D systolic array of processing elements to efficiently execute convolutions and matrix multiplications. It also possesses a 1D array of special function units tailored to operations with low FLOPs/Byte, such as activation, normalization and pooling functions. The RAPID core can be programmed to orchestrate various dataflows between the processing elements and the memory hierarchy, balancing the trade-off between flexibility and efficiency.

The second approach involves building custom compilers to map DNN workloads onto accelerators. DNNs are static dataflow graphs, and their computations can be encapsulated using a few (tens of) primitives. While this lends itself well to hardware acceleration, DNNs also exhibit abundant heterogeneity across layers, which makes each layer computationally unique and requires it to be programmed differently. To this end, we present DEEPTOOLS, a suite of software extensions that leverage and work within popular deep learning frameworks (e.g., TensorFlow). Given a DNN workload description and the target system specification, DEEPTOOLS carries out aggressive performance optimizations such as graph modifications (e.g., node ordering/fusion), optimized data-structure placement, tiling, loop ordering and program generation.

The third approach, approximate computing, leverages the error-resilient nature of DNNs and executes a chosen subset of computations in an approximate manner. Based on comprehensive algorithmic studies, we will describe systematic methodologies to quantize various DNN data-structures viz. activations, deactivations, weights and weight gradients with little to no loss in accuracy. We also investigate custom number representations for DNNs, which further reduce compute and memory requirements.

Through an end-to-end AI system evaluation (including silicon measurements) on multiple state-of-the-art workloads, we demonstrate over an order of magnitude improvement in compute efficiency by leveraging the above approaches. We believe the tenets described in this tutorial are critical in enabling next-generation AI compute platforms.

Swagath Venkataramani bio: Venkataramani is a Research Staff Member at the IBM T.J. Watson Research Center in Yorktown Heights, NY. He received his Ph.D. in Electrical and Computer Engineering from Purdue University in 2016. His research interests include hardware and software optimizations for machine/deep learning and approximate computing. His research has received multiple best paper awards and nominations and has been supported through prestigious fellowships. He is a member of the ACM.

Jungwook Choi bio: Choi is an assistant professor at Hanyang University in South Korea. His main research interest is the efficient implementation of deep learning algorithms. He received his B.S. and M.S. degrees in Electrical and Computer Engineering from Seoul National University, South Korea, in 2008 and 2010, respectively, and his Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign, USA, in 2015. He worked at the IBM T.J. Watson Research Center as a Research Staff Member from 2015 to 2019. He has received several research awards, such as the DAC 2018 Best Paper Award, and has actively contributed to academic activities, serving on the Technical Program Committees of DATE 2018-2020 (co-chair) and DAC 2018-2020 and on the Technical Committee (DiSPS) of the IEEE Signal Processing Society.