Adaptive Weight Decay (Apple Machine Learning Research)
We propose adaptive weight decay, which automatically tunes the hyper-parameter for weight decay during each training iteration. For classification problems, we propose changing the value of the weight decay hyper-parameter on the fly based on the strength of updates from the classification loss (i.e., gradient…
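To make the idea concrete, here is a minimal sketch of one training step in which the weight decay coefficient is recomputed on the fly. The excerpt is cut off before it states the exact rule, so everything specific here is an assumption for illustration: the coefficient is taken to be a constant (the hypothetical `ratio`) times the norm of the classification-loss gradient divided by the norm of the weights. The names `l2_norm`, `adaptive_weight_decay_step`, `ratio`, and `eps` are likewise invented for this sketch and should not be read as the authors' implementation.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def adaptive_weight_decay_step(weights, loss_grad, lr=0.1, ratio=0.02, eps=1e-12):
    """One plain SGD step with a decay coefficient recomputed every call.

    ASSUMPTION (not taken from the source): the coefficient is
        lam = ratio * ||loss_grad|| / ||weights||
    so a stronger update from the classification loss yields stronger decay.
    """
    # eps guards against division by zero when all weights are zero.
    lam = ratio * l2_norm(loss_grad) / (l2_norm(weights) + eps)
    # Total gradient = classification-loss gradient + decay term lam * w.
    new_weights = [w - lr * (g + lam * w) for w, g in zip(weights, loss_grad)]
    return new_weights, lam


if __name__ == "__main__":
    w = [3.0, 4.0]   # ||w|| = 5
    g = [0.6, 0.8]   # ||g|| = 1
    w_next, lam = adaptive_weight_decay_step(w, g)
    print("decay coefficient:", lam)
    print("updated weights:", w_next)
```

Under this assumed rule, doubling the loss gradient doubles the decay coefficient for that step, whereas a fixed weight decay hyper-parameter would stay constant regardless of how strong the loss update is.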