Conditioning of a function = ratio between the largest and smallest eigenvalues of its Hessian.
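
A minimal sketch of that definition, assuming a toy quadratic whose Hessian is constant, symmetric and positive definite. The helper name `condition_number` and the matrix `H` are illustrative choices, not from the original post:

```python
import numpy as np

def condition_number(hessian):
    """Ratio of the largest to the smallest eigenvalue of a symmetric
    positive definite Hessian (hypothetical helper for illustration)."""
    eigvals = np.linalg.eigvalsh(hessian)  # eigenvalues in ascending order
    return eigvals[-1] / eigvals[0]

# Assumed example: f(x) = 0.5 * (1 * x0**2 + 100 * x1**2),
# whose Hessian is the constant matrix diag(1, 100).
H = np.diag([1.0, 100.0])
kappa = condition_number(H)
print(kappa)
```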

Higher conditioning => harder to minimize the function

Gradient descent gets faster on functions with decreasing conditioning L/mu (L = largest, mu = smallest Hessian eigenvalue) 👇
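
One way to see this numerically is a rough sketch like the following, which assumes a 2D quadratic f(x) = 0.5 * (mu * x0**2 + L * x1**2) with mu = 1, a fixed step size of 1/L, and a starting point of (1, 1). The function name `gd_iterations`, the tolerance, and the tested values of L/mu are all illustrative assumptions:

```python
import numpy as np

def gd_iterations(kappa, tol=1e-6, max_iter=100_000):
    """Count gradient descent steps needed to bring ||x|| below tol
    on f(x) = 0.5 * (mu * x0**2 + L * x1**2), with mu = 1 and L = kappa."""
    mu, L = 1.0, float(kappa)
    hess_diag = np.array([mu, L])   # diagonal of the (constant) Hessian
    x = np.ones(2)                  # assumed starting point
    step = 1.0 / L                  # fixed step size 1/L
    for k in range(max_iter):
        if np.linalg.norm(x) < tol:
            return k
        grad = hess_diag * x        # gradient of the quadratic
        x = x - step * grad
    return max_iter

# Iteration counts for a few assumed conditioning values
results = {kappa: gd_iterations(kappa) for kappa in (1, 10, 100, 1000)}
for kappa, iters in results.items():
    print(f"L/mu = {kappa:>4}: {iters} iterations")
```

The iteration count should shrink as L/mu decreases, since the slowest coordinate contracts by a factor of (1 - mu/L) per step.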
