Awesome machine learning for compilers and program optimisation

This repository contains a curated list of awesome research papers, datasets and tools for applying machine learning techniques to compilers and program optimisation.

Papers

Survey and Tutorials

Machine Learning in Compiler Optimisation - Zheng Wang and Michael O'Boyle, Proceedings of the IEEE, 2018
A survey on compiler autotuning using machine learning - Ashouri, Amir H., William Killian, John Cavazos, Gianluca Palermo, and Cristina Silvano, ACM Computing Surveys (CSUR), 2018
A survey of machine learning for big code and naturalness - Allamanis, Miltiadis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton, ACM Computing Surveys (CSUR), 2018

Tuning Compiler Options and Passes

A Collaborative Filtering Approach for the Automatic Tuning of Compiler Optimisations - Cereda, Stefano, Gianluca Palermo, Paolo Cremonesi, and Stefano Doni, LCTES 2020.
AutoPhase: Compiler phase-ordering for HLS with deep reinforcement learning - Qijing Huang, Ameer Haj-Ali, William Moses, John Xiang, Ion Stoica, Krste Asanovic, and John Wawrzynek. FCCM 2019.
Micomp: Mitigating the compiler phase-ordering problem using optimization sub-sequences and machine learning - Amir H. Ashouri, Andrea Bignoli, Gianluca Palermo, Cristina Silvano, Sameer Kulkarni, and John Cavazos. ACM Transactions on Architecture and Code Optimization (TACO) 2017.
Learning to superoptimize programs - Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H.S. Torr, and Pushmeet Kohli. ICLR 2017.
Mitigating the compiler optimization phase-ordering problem using machine learning - Sameer Kulkarni and John Cavazos, OOPSLA 2012
MILEPOST GCC: machine learning based research compiler - Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson et al., 2008
Rapidly selecting good compiler optimizations using performance counters - John Cavazos, Grigori Fursin, Felix Agakov, Edwin Bonilla, Michael FP O'Boyle, and Olivier Temam. CGO 2007.
Using machine learning to focus iterative optimization - Agakov, Felix, Edwin Bonilla, John Cavazos, Björn Franke, Grigori Fursin, Michael FP O'Boyle, John Thomson, Marc Toussaint, and Christopher KI Williams. CGO 2006.

Instruction-level Optimisation

NeuroVectorizer: end-to-end vectorization with deep reinforcement learning - Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Yakun Sophia Shao, Krste Asanovic, and Ion Stoica. CGO 2020.
Compiler Auto-Vectorization with Imitation Learning - Charith Mendis, Cambridge Yang, Yewen Pu, Saman P. Amarasinghe, Michael Carbin. NeurIPS 2019.
Learning to schedule straight-line code - J. Eliot B. Moss, Paul E. Utgoff, John Cavazos, Doina Precup, Darko Stefanovic, Carla E. Brodley, and David Scheeff. NeurIPS 1998.

Auto-tuning

TVM: An automated end-to-end optimizing compiler for deep learning - Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan et al., OSDI 2018
Cobayn: Compiler autotuning framework using bayesian networks - Amir Hossein Ashouri, Giovanni Mariani, Gianluca Palermo, Eunjung Park, John Cavazos, and Cristina Silvano, ACM Transactions on Architecture and Code Optimization (TACO), 2016.
Autotuning algorithmic choice for input sensitivity - Yufei Ding, Jason Ansel, Kalyan Veeramachaneni, Xipeng Shen, Una-May O'Reilly, and Saman Amarasinghe. PLDI 2015
Fast: A fast stencil autotuning framework based on an optimal-solution space model - Yulong Luo, Guangming Tan, Zeyao Mo, and Ninghui Sun. ACM Transactions on Architecture and Code Optimization (TACO), 2015.
GPU performance and power tuning using regression trees - Wenhao Jia, Elba Garza, Kelly A. Shaw, and Margaret Martonosi. SC 2015.
Opentuner: An extensible framework for program autotuning - Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O'Reilly, and Saman Amarasinghe, PACT 2014

Parallelism Mapping and Task Scheduling

Code Mapping in Heterogeneous Platforms Using Deep Learning and LLVM-IR - Francesco Barchi, Gianvito Urgese, Enrico Macii, and Andrea Acquaviva. DAC 2019.
Improving spark application throughput via memory aware task co-location: A mixture of experts approach - Vicent Sanz Marco, Ben Taylor, Barry Porter, and Zheng Wang. Middleware 2017.
Quasar: resource-efficient and QoS-aware cluster management - Christina Delimitrou and Christos Kozyrakis. ASPLOS 2014.
Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems - Zheng Wang, Dominik Grewe, and Michael O'Boyle. ACM Transactions on Architecture and Code Optimization (TACO), 2014.
Integrating profile-driven parallelism detection and machine-learning-based mapping - Zheng Wang, Georgios Tournavitis, Björn Franke, and Michael FP O'Boyle. ACM Transactions on Architecture and Code Optimization (TACO), 2014.
Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms - Yuan Wen, Zheng Wang, and Michael FP O'Boyle. HiPC 2015.
Smart, adaptive mapping of parallelism in the presence of external workload - Murali Krishna Emani, Zheng Wang, and Michael O'Boyle. CGO 2013.
Partitioning streaming parallelism for multi-cores: a machine learning based approach - Zheng Wang and Michael O'Boyle. PACT 2010.
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping - Chi-Keung Luk, Sunpyo Hong, and Hyesoon Kim. MICRO 2009.
Mapping parallelism to multi-cores: a machine learning based approach - Zheng Wang and Michael O'Boyle. PPoPP 2009.

Domain-specific Optimisation

Bridging the gap between deep learning and sparse matrix format selection - Yue Zhao, Jiajia Li, Chunhua Liao, and Xipeng Shen. PPoPP 2018.

Languages and Compilation

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines - Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe, PLDI 2013.
PetaBricks: a language and compiler for algorithmic choice - Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, and Saman Amarasinghe, PLDI 2009.

Cost Models

Learning to Optimize Halide with Tree Search and Random Programs - Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, and Jonathan Ragan-Kelley. ACM Trans Graph, 38(4), 2019.
Ithemal: Accurate, portable and fast basic block throughput estimation using deep neural networks - Charith Mendis, Alex Renda, Saman Amarasinghe, and Michael Carbin. ICML 2019.
Absinthe: Learning an Analytical Performance Model to Fuse and Tile Stencil Codes in One Shot - Tobias Gysi, Tobias Grosser, and Torsten Hoefler. PACT 2019.

Learning Program Representation

Compiler-based graph representations for deep learning models of code - Alexander Brauckmann, Andrés Goens, Sebastian Ertel, and Jeronimo Castrillon. CC 2020.
code2seq: Generating sequences from structured representations of code - Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. ICLR 2019.
code2vec: Learning distributed representations of code - Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. POPL 2019.
Neural Code Comprehension: A Learnable Representation of Code Semantics - Tal Ben-Nun, Alice Shoshana Jakobovits, and Torsten Hoefler. NeurIPS 2018.
End-to-end deep learning of optimization heuristics - Chris Cummins, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. PACT 2017.
Using graph-based program characterization for predictive modeling - Eunjung Park, John Cavazos, and Marco A. Alvarez. CGO 2011.
Automatic feature generation for machine learning based optimizing compilation - Hugh Leather, Edwin Bonilla, and Michael O'Boyle. CGO 2009.

Enabling ML in Compilers

Synthesizing benchmarks for predictive modeling - Chris Cummins, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. CGO 2017.
Minimizing the cost of iterative compilation with active learning - William Ogilvie, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. CGO 2017.

Talks

Saman Amarasinghe, Compiler 2.0: Using Machine Learning to Modernize Compiler Technology. LCTES 2020.

Software

programl - LLVM and XLA IR program representation for machine learning.
NeuroVectorizer - Using deep reinforcement learning (RL) to predict optimal vectorization compiler pragmas (paper).
TVM - Open Deep Learning Compiler Stack for CPU, GPU and specialized accelerators (paper; slides).
clgen - Benchmark generator using LSTMs (paper).
OpenTuner - Framework for building domain-specific multi-objective program autotuners (paper; slides)

Benchmarks and Datasets

BHive - A Benchmark Suite and Measurement Framework for Validating x86-64 Basic Block Performance Models (paper).
cBench - 32 C benchmarks with datasets and driver scripts.
DeepDataFlow - 469k LLVM-IR files and 8.6B data-flow analysis labels for classification.
devmap - 650 OpenCL benchmark features and CPU/GPU classification labels.

Conferences

Contributions

See Contributions.md. TL;DR: send me (@zwang4) a pull request.
