Stochastic gradient descent optimizer using second-order information.

# References
- Ye, C., Yang, Y., Fermuller, C., & Aloimonos, Y. (2017). On the Importance of Consistency in Training Deep Neural Networks. arXiv preprint arXiv:1708.00631.
# Arguments
lr: float >= 0. Learning rate.
momentum: float >= 0. Parameter that accelerates SGD in the relevant direction and dampens oscillations.
decay: float >= 0. Learning rate decay over each update.
nesterov: boolean. Whether to apply Nesterov momentum.
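
# Example

A minimal usage sketch, assuming the Keras 2-style `keras.optimizers.SGD` signature documented above (the toy model is hypothetical, and newer Keras versions spell `lr` as `learning_rate`):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Hypothetical toy model: one softmax layer over flattened 28x28 inputs.
model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])

# SGD with momentum, per-update learning-rate decay, and Nesterov momentum.
# With decay > 0, the effective rate becomes lr / (1 + decay * iterations).
opt = SGD(lr=0.01, momentum=0.9, decay=1e-6, nesterov=True)

model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
```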