Neural networks, deep learning papers
Feedforward Neural Networks (FNN)
Convolutional Neural Networks (CNN)
  - One of the earliest precursors of convolutional nets - [Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position](https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf) (1980) K. Fukushima
 
  - A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects (2020) Zewen Li, Wenjie Yang, Shouheng Peng, Fan Liu
 
  - Flexible, High Performance Convolutional Neural Networks for Image Classification (2011) Dan C. Ciresan, Ueli Meier, Jonathan Masci, Luca M. Gambardella, Jürgen Schmidhuber
 
Recurrent Neural Networks (RNN)
Unsupervised
  - Competitive learning

  - Autoencoders
    
      - Modular learning in neural networks (1987) D.H. Ballard
 
      - Extracting and composing robust features with denoising autoencoders (2008) P. Vincent, H. Larochelle, Y. Bengio, P.A. Manzagol
 
      - From Deep Learning book - Autoencoders (ch. 14) (2016) Ian Goodfellow, Yoshua Bengio, Aaron Courville
 
      - An Introduction to Variational Autoencoders (2019) Diederik P. Kingma, Max Welling
 
      - Contractive Auto-Encoders: Explicit Invariance During Feature Extraction (2011) S. Rifai, P. Vincent, X. Muller, X. Glorot, Y. Bengio
 
      - Deep AutoRegressive Networks (2014) Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, Daan Wierstra

  - Denoising Autoencoders
 
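The denoising-autoencoder idea from the entries above fits in a few lines: corrupt the input, but train the network to reconstruct the clean original. A minimal NumPy sketch with a single linear layer and tied weights; the data sizes, noise level, and learning rate are illustrative assumptions, not taken from any of the papers.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))            # toy data: 256 samples, 8 features
W = rng.normal(scale=0.1, size=(8, 4))   # encoder weights; decoder is the tied W.T

def reconstruct(X, W):
    # Encode down to 4 dimensions, decode back with the tied transpose
    return (X @ W) @ W.T

loss_before = np.mean((reconstruct(X, W) - X) ** 2)

lr = 0.01
for _ in range(200):
    Xn = X + 0.1 * rng.normal(size=X.shape)  # corrupt the input...
    E = reconstruct(Xn, W) - X               # ...but reconstruct the clean X
    # Gradient of the squared error w.r.t. the tied weights
    # (the constant factor 2 is folded into the learning rate)
    W -= lr * (Xn.T @ (E @ W) + E.T @ (Xn @ W)) / len(X)

loss_after = np.mean((reconstruct(X, W) - X) ** 2)
```

With a linear decoder this recovers a PCA-like subspace; the papers above use nonlinear encoders and richer corruption processes.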
  - VAE Variational autoencoders

  - SOM Self-organizing maps
 
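A self-organizing map update can be sketched in one step: find the best matching unit (BMU) and pull it and its grid neighbours toward the input. The grid size, learning rate, and neighbourhood radius below are illustrative assumptions.

```python
import numpy as np

def som_step(grid, x, lr=0.5, radius=1.0):
    # Best matching unit (BMU): the map node whose weights are closest to x
    d = np.linalg.norm(grid - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(d), d.shape)
    # Gaussian neighbourhood on the 2-d grid, centred on the BMU
    ii, jj = np.indices(d.shape)
    g = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * radius ** 2))
    # Pull every node toward x, weighted by its neighbourhood strength
    grid += lr * g[..., None] * (x - grid)
    return grid

rng = np.random.default_rng(0)
grid = rng.random((5, 5, 3))       # 5x5 map of 3-d weight vectors
x = np.array([0.9, 0.1, 0.1])
before = np.linalg.norm(grid - x, axis=2).min()
grid = som_step(grid, x)
after = np.linalg.norm(grid - x, axis=2).min()
```

Full training repeats this over many inputs while shrinking `lr` and `radius`.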
  - Cresceptron (Max-Pooling layers)

Generative Adversarial Networks (GAN)
Bayesian Neural Networks (BNN)
  - A Practical Bayesian Framework for Backpropagation Networks (1992) David J. C. MacKay
 
  - Bayesian Learning for Neural Networks (1995) R.M. Neal
 
  - Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks (1995) David J. C. MacKay
 
  - Practical Variational Inference for Neural Networks (2011) Alex Graves
 
  - Weight Uncertainty in Neural Networks (2015) Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra
 
  - Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (2016) Y. Gal, Z. Ghahramani
 
  - Stochastic Gradient Descent as Approximate Bayesian Inference (2017) S. Mandt, M.D. Hoffman, D.M. Blei
 
  - Deep neural networks as Gaussian Processes (2018) Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein
 
  - Noisy Natural Gradient as Variational Inference (2018) Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse
 
  - Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam (2018) Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava
 
  - Understanding Priors in Bayesian Neural Networks at the Unit Level (2019) Mariia Vladimirova, Jakob Verbeek, Pablo Mesejo, Julyan Arbel
 
  - Bayesian Deep Learning and a Probabilistic Perspective of Generalization (2020) Andrew Gordon Wilson, Pavel Izmailov
 
Weightless Neural Networks (WNN)
  - Based on Random Access Memory (RAM) nodes
 
  - Advances in Weightless Neural Systems (2014) F.M.G. França, M. De Gregorio, P.M.V. Lima, W.R. de Oliveira
 
  - WiSARD (Wilkie, Stonham and Aleksander's Recognition Device)
 
  - PLN Probabilistic Logic Nodes
 
  - GSN Goal Seeking Neurons
 
  - GRAM Generalizing RAM
 
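The RAM-node idea behind these models can be sketched with a WiSARD-style discriminator: each RAM node watches a fixed random tuple of input bits, memorises the addresses it sees during training, and the score of an input is how many RAM nodes recognise their address. The input length and tuple size below are illustrative assumptions.

```python
import numpy as np

class Discriminator:
    def __init__(self, n_bits, tuple_size, seed=0):
        rng = np.random.default_rng(seed)
        order = rng.permutation(n_bits)
        # Partition the shuffled bit positions into fixed tuples
        self.tuples = [order[i:i + tuple_size]
                       for i in range(0, n_bits, tuple_size)]
        self.rams = [set() for _ in self.tuples]  # RAM node = set of seen addresses

    def _addresses(self, bits):
        # The bits each RAM node sees form its memory address
        return [tuple(bits[t]) for t in self.tuples]

    def train(self, bits):
        for ram, addr in zip(self.rams, self._addresses(bits)):
            ram.add(addr)

    def score(self, bits):
        # Number of RAM nodes that recognise their address
        return sum(addr in ram
                   for ram, addr in zip(self.rams, self._addresses(bits)))
```

A classifier uses one discriminator per class and predicts the class whose discriminator scores highest.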
Activation functions
  - Sigmoid
 
  - HardSigmoid
 
  - SiLU, dSiLU
 
  - Tanh, HardTanh
 
  - Softmax
 
  - Softplus
 
  - Softsign
 
  - ReLU Rectified Linear Unit

  - LReLU Leaky ReLU

  - PReLU Parametric ReLU

  - RReLU Randomized ReLU

  - SReLU

  - ELU

  - PELU

  - SELU
 
  - Maxout
 
  - Mish

  - Swish
 
  - ELiSH
 
  - HardELiSH
 
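Several of the activations listed above take only a line each. A NumPy sketch; the leaky-ReLU slope and ELU α are common default choices, an assumption rather than anything fixed by the list.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified Linear Unit: identity for positive inputs, zero otherwise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: small fixed slope alpha on the negative side
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Exponential Linear Unit: smooth saturation at -alpha for negative inputs
    # (np.minimum guards the unused branch of np.where against overflow)
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def silu(x):
    # SiLU (Swish with beta = 1): x * sigmoid(x)
    return x * sigmoid(x)

def mish(x):
    # Mish: x * tanh(softplus(x)); log1p(exp(x)) is a naive softplus
    return x * np.tanh(np.log1p(np.exp(x)))
```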
Inference
  - Weight guessing
 
  - Vanishing gradient problem (Wiki)
 
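The vanishing gradient problem can be shown in a few lines: the gradient through a chain of sigmoids is a product of local derivatives, each at most 0.25, so it shrinks exponentially with depth. The unit-weight chain below is an illustrative simplification.

```python
import numpy as np

def chain_gradient(depth, x=0.5):
    # Gradient through `depth` stacked sigmoids with unit weights:
    # a product of local derivatives, each at most 0.25
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(x)
        grad *= s * (1.0 - s)  # derivative of the sigmoid at this layer
        x = s                  # output feeds the next layer
    return grad
```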
  - Double descent

  - BP Back-propagation

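Back-propagation through a single dense layer with a sigmoid is just the chain rule, and can be verified with a finite-difference check. The shapes and target below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # input
W = rng.normal(size=(2, 3))       # weights of one dense layer
y = np.array([1.0, 0.0])          # target

def forward(W):
    h = 1.0 / (1.0 + np.exp(-(W @ x)))    # sigmoid activation
    return 0.5 * np.sum((h - y) ** 2), h  # squared-error loss

loss, h = forward(W)
# Backward pass: chain rule through loss -> sigmoid -> matrix product
delta = (h - y) * h * (1.0 - h)   # dLoss/d(pre-activation)
grad_W = np.outer(delta, x)       # dLoss/dW

# Finite-difference check of one gradient entry
eps = 1e-6
Wp = W.copy()
Wp[0, 0] += eps
numeric = (forward(Wp)[0] - loss) / eps
```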
  - Pruning - reduces computational cost and can improve generalization
    
      - Optimal Brain Damage (1990) Yann Le Cun, John S. Denker, Sara A. Solla
 
      - Learning both Weights and Connections for Efficient Neural Networks (2015) Song Han, Jeff Pool, John Tran, William J. Dally
 
      - Pruning Convolutional Neural Networks for Resource Efficient Inference (2017) Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz
 
      - Learning Sparse Neural Networks through L0 Regularization (2018) Christos Louizos, Max Welling, Diederik P. Kingma

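The simplest baseline behind the papers above is magnitude pruning: drop the weights with the smallest absolute value. A NumPy sketch; the tie-breaking rule (prune ties at the threshold) is an assumption.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the `sparsity` fraction of weights with smallest magnitude;
    # ties at the threshold are pruned as well.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask
```

In practice the mask is kept and training continues on the surviving weights (prune-retrain cycles).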
  - Pretraining

  - Dropout

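Dropout in its common "inverted" form fits in a few lines: keep each unit with probability `keep` and scale the survivors by `1/keep`, so the expected activation is unchanged and inference needs no rescaling. The keep probability below is an illustrative choice.

```python
import numpy as np

def dropout(x, keep=0.8, rng=None, train=True):
    # Inverted dropout: identity at inference time
    if not train:
        return x
    if rng is None:
        rng = np.random.default_rng()
    mask = (rng.random(x.shape) < keep) / keep
    return x * mask
```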
Compression
  - Knowledge Distillation

      - A large "teacher" network transfers its knowledge to a smaller "student" network

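The usual distillation objective trains the student to match the teacher's softened class probabilities (softmax at temperature T > 1). A sketch of that loss term; the temperature and the T² scaling convention follow common practice and are assumptions here.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T gives a softer distribution
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Cross-entropy between the teacher's and student's softened outputs,
    # scaled by T**2 to keep gradient magnitudes comparable across T
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -T ** 2 * np.sum(p * np.log(q + 1e-12))
```

In full training this term is combined with the ordinary cross-entropy on the true labels.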
  - Neural Network Pruning

      - Removing unimportant weights

  - Quantization

      - Reducing the number of bits used to store the weights

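A common starting point is symmetric per-tensor quantization: map weights to integers in [-127, 127] with a single scale, then de-quantize for use. The bit-width, rounding, and per-tensor (rather than per-channel) scale are illustrative choices.

```python
import numpy as np

def quantize(w, n_bits=8):
    # Symmetric uniform quantization with one per-tensor scale
    qmax = 2 ** (n_bits - 1) - 1                # 127 for 8 bits
    scale = max(np.abs(w).max() / qmax, 1e-12)  # guard against all-zero w
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights; error is at most scale / 2 per entry
    return q.astype(np.float32) * scale
```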
  - Software