Predictive Data Locality Optimization for Higher-Order Tensor Computations
Automating locality optimization is still an open problem for compiler writers. Compiler-based approaches, guided by analytical cost models, have achieved some success in matching high-performance libraries on a restricted set of computations such as general matrix multiply (GEMM). Library-based approaches, on the other hand, raise scalability concerns. Recent developments in convolutional neural networks have produced an explosion of models, each with a different combination of parameters. Manually tuning each of these configurations can take many months of development effort. Further, these operations are called repeatedly during machine learning training, which necessitates highly optimized implementations. 2D convolutional operators are unique in that they consist of 7-deep loop nests in which different loops carry reuse for different tensors, making it hard to identify an optimal loop ordering. We devise a machine learning-based compiler that learns a regression model correlating performance with loop order. We integrate this model with traditional compiler analyses for transformations such as loop unrolling and vectorization, relying on the Multi-Level Intermediate Representation (MLIR) compiler framework. We achieve average speedups of 1.67× and 1.41× for the 2D convolution forward and weight update kernels, respectively. We also reach 0.88× and 0.96× of the performance of oneDNN’s best-performing implementation, which applies additional data layout transformations.
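To make the "7-deep loop nest" concrete, here is a minimal Python sketch of a direct 2D convolution forward pass whose loop order is a parameter. It is an illustration only, not the compiler or the regression model described in the abstract. The index names (n, k, c, p, q, r, s), the unit-stride, no-padding setup, and the helpers `reuse_loops` and `conv2d` are assumptions made for this example. A loop is treated as carrying reuse for a tensor when its index does not appear in that tensor's subscripts; the partial sliding-window reuse of the input across p/r and q/s is not modeled.

```python
from itertools import product

# Assumed index names: n batch, k output channels, c input channels,
# p/q output rows/cols, r/s filter rows/cols.
LOOPS = ("n", "k", "c", "p", "q", "r", "s")

# Loop indices that appear in each tensor's subscripts.
ACCESS = {
    "output": {"n", "k", "p", "q"},
    "input": {"n", "c", "p", "q", "r", "s"},
    "weight": {"k", "c", "r", "s"},
}


def reuse_loops(tensor):
    """Loops whose index is absent from the tensor's subscripts (full reuse)."""
    return [loop for loop in LOOPS if loop not in ACCESS[tensor]]


def conv2d(inp, wgt, sizes, order=LOOPS):
    """Direct convolution (stride 1, no padding); `order` lists loops outermost first."""
    out = [[[[0.0] * sizes["q"] for _ in range(sizes["p"])]
            for _ in range(sizes["k"])] for _ in range(sizes["n"])]
    # product() varies its last range fastest, so order[-1] is the innermost loop.
    for idx in product(*(range(sizes[loop]) for loop in order)):
        i = dict(zip(order, idx))
        n, k, c, p, q, r, s = (i[loop] for loop in LOOPS)
        out[n][k][p][q] += inp[n][c][p + r][q + s] * wgt[k][c][r][s]
    return out
```

Calling `reuse_loops` for each tensor shows that the output, input, and weight tensors are each reused along different loops, so no single loop can be placed innermost to favor all three. Every permutation of the seven loops computes the same result, which leaves 7! = 5040 candidate orderings even before tiling is considered.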
Mon 21 Jun (displayed time zone: Eastern Time, US & Canada)
16:45 - 19:15
16:45, 60m, Talk: Machine Learning for Autotuning Production Machine Learning Compilers (MAPS)
17:45, 30m, Talk: Pure, Low-Level Tensor Program Rewriting via Access Patterns (Representation Pearl) (MAPS). Gus Henry Smith (University of Washington), Andrew Liu (University of Washington), Steven Lyubomirsky (University of Washington, USA), Scott Davidson (University of Washington), Joseph McMahan (University of Washington), Michael Bedford Taylor (University of Washington), Luis Ceze (University of Washington), Zachary Tatlock (University of Washington, Seattle)
18:15, 30m, Talk: ControlFlag: A Self-supervised Idiosyncratic Pattern Detection System for Software Control Structures (MAPS)
18:45, 30m, Talk: Predictive Data Locality Optimization for Higher-Order Tensor Computations (MAPS). Tharindu Patabandi (University of Utah), Anand Venkat, Abhishek Kulkarni (Intel), Pushkar Ratnalikar (Intel Labs), Mary Hall (University of Utah), Justin Gottschlich (Intel Labs / Penn)