- Visualizing Data using t-SNE - January 22, 2020
- Batch Normalization - Accelerating Deep Network Training by Reducing Internal Covariate Shift - January 15, 2020
- Cyclical Learning Rates for Training Neural Networks - January 14, 2020
- Automatically Inferring Data Quality for Spatiotemporal Forecasting - January 10, 2020
- Visualizing the Loss Landscape of Neural Nets - January 7, 2020
- On the Variance of the Adaptive Learning Rate and Beyond - December 27, 2019
- Deep Residual Learning for Image Recognition - December 23, 2019
- QSGD - Communication-Efficient SGD via Gradient Quantization and Encoding - December 3, 2019
- A Closer Look at Deep Learning Heuristics - Learning Rate Restarts, Warmup and Distillation - December 1, 2019
- Don't Decay the Learning Rate, Increase the Batch Size - November 28, 2019
- Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour - November 26, 2019
- SAGA - A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives - November 23, 2019
- Accelerating Stochastic Gradient Descent using Predictive Variance Reduction - November 22, 2019
- Scaling SGD Batch Size to 32K for ImageNet Training - November 3, 2019
- SGD - General Analysis and Improved Rates - November 2, 2019
- Adam - A Method for Stochastic Optimization - October 30, 2019
- SGDR - Stochastic Gradient Descent with Warm Restarts - October 27, 2019