- Visualizing Data using t-SNE - January 22, 2020
- Batch Normalization - Accelerating Deep Network Training by Reducing Internal Covariate Shift - January 15, 2020
- Cyclical Learning Rates for Training Neural Networks - January 14, 2020
- Automatically Inferring Data Quality for Spatiotemporal Forecasting - January 10, 2020
- Visualizing the Loss Landscape of Neural Nets - January 7, 2020
- On the Variance of the Adaptive Learning Rate and Beyond - December 27, 2019
- Deep Residual Learning for Image Recognition - December 23, 2019
- QSGD - Communication-Efficient SGD via Gradient Quantization and Encoding - December 3, 2019
- A Closer Look at Deep Learning Heuristics - Learning Rate Restarts, Warmup and Distillation - December 1, 2019
- Don't Decay the Learning Rate, Increase the Batch Size - November 28, 2019
- Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour - November 26, 2019
- SAGA - A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives - November 23, 2019
- Accelerating Stochastic Gradient Descent using Predictive Variance Reduction - November 22, 2019
- Scaling SGD Batch Size to 32K for ImageNet Training - November 3, 2019
- SGD - General Analysis and Improved Rates - November 2, 2019
- Adam - A Method for Stochastic Optimization - October 30, 2019
- SGDR - Stochastic Gradient Descent with Warm Restarts - October 27, 2019