Lingchuan Meng - Arm, Ltd., San Jose, CA
The promise of the Internet of Things (IoT) is to improve the quality of services using information inferred from data collected across a large pool of users. To make this a reality while remaining sustainable at large scale, the computational effort of the sensemaking process should be distributed over the largest possible number of end nodes. This entails deploying complex data analytics based on Deep Neural Networks (DNNs) on low-power devices with limited computational resources and small storage capacity. The objective of this tutorial is to provide a comprehensive overview of the most effective training and optimization methods for reducing the complexity of DNNs, so that inference engines can fit on tiny cores while still preserving enough expressive power to extract useful information from raw data. The tutorial is structured in four parts. The first part provides an overview of the problem, the key figures describing the complexity of state-of-the-art DNNs, and the limiting factors that prevent their use on resource-constrained compute nodes; practical use cases that adopt off-the-shelf RISC cores will be discussed. The second part focuses on static optimization methods applied at design time to reduce the cardinality (size) of DNNs. Traditional techniques from the machine learning domain, such as quantization and pruning, as well as more recent strategies based on network- and operator-topology restructuring and search, will be introduced, with special emphasis on their integration into automated flows. The third part presents dynamic optimization methods, a new branch of design strategies intended to give DNNs the ability to adapt to the surrounding context. Finally, the fourth part addresses issues related to the deep learning frameworks and libraries available on the market, presenting a comparative analysis across several figures of merit, such as performance, quality of design, variability, and stability.
This last part includes short live demos in which attendees will learn how to flash DNNs onto Arm embedded CPUs and MCUs.
Andrea Calimera bio: Calimera is an Associate Professor of Computer Engineering at Politecnico di Torino, Torino, Italy. Prior to that, he was an Assistant Professor at the same institution. He received an MSc in Electronic Engineering and a PhD in Computer Engineering, both from Politecnico di Torino. His research interests cover Electronic Design Automation of digital circuits and embedded systems, with emphasis on optimization techniques for low-power and reliable ICs, dynamic energy/quality management, logic synthesis for emerging devices, and design flows and methodologies for emerging computing paradigms. He was a visiting professor in Singapore, first at the School of Electrical and Computer Engineering of the National University of Singapore (in 2015 and 2016) and then at the School of Computer Science and Engineering of Nanyang Technological University, Singapore (in 2017 and 2018), contributing to research projects on design automation for ultra-low-power digital ICs and emerging technologies. Andrea Calimera is a member of the International Federation for Information Processing (IFIP), and he has served on the technical program committees of many EDA and VLSI conferences, including the Design, Automation and Test in Europe conference (DATE) and the International Conference on Computer-Aided Design (ICCAD). He is a member of the IEEE, an Associate Editor of the IEEE Transactions on Circuits and Systems II, and an Associate Editor of the MDPI AI journal.
Lingchuan Meng bio: Meng is a Principal Software Engineer working on ML model optimization and deployment in the Arm Machine Learning Group. He has also contributed to Arm’s NPU design by inventing new algorithms for activation compression and convolution kernels. Prior to joining Arm, he worked at Qualcomm Research, where he developed the Snapdragon Math Libraries. Lingchuan received his PhD in Computer Science from Drexel University, Philadelphia, where his research contributed to automatic code generation and to algorithms for DSP and polynomial arithmetic.