Why is the gradient vector the direction of steepest ascent?
Part 1:
Let f(x, y) be a function of two variables, so that its graph z = f(x, y) is a surface in 3D space.
Let’s define a vector G whose components are the partial derivatives of f with respect to each of its variables: G = (∂f/∂x, ∂f/∂y).
Since we know the components of G, we can find its direction whenever we need to.
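As a rough illustration of Part 1, here is a minimal Python sketch that approximates the components of G with central differences and then reads off its direction as an angle. The function f, the point (1, 2), and the helper name numerical_gradient are assumptions picked for this example; they are not part of the argument.

```python
import math

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 + 3*x*y

def numerical_gradient(func, x, y, h=1e-6):
    # Central differences approximate the two partial derivatives at (x, y).
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy

# By hand: df/dx = 2x + 3y and df/dy = 3x, so G is close to (8, 3) at (1, 2).
G = numerical_gradient(f, 1.0, 2.0)

# Direction of G as an angle from the positive x-axis, in radians.
direction = math.atan2(G[1], G[0])
print(G, direction)
```

Knowing both components is enough to recover the direction, which is all that Part 1 claims.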
Part 2:
Let U be a unit vector that points in the direction of steepest ascent, and write its components as U = (a, b). (Note: U is just a placeholder; we don’t know its actual components yet.)
Now, calculate the directional derivative of f along this vector: Du = a·(∂f/∂x) + b·(∂f/∂y).
Explanation: the partial derivative along the x direction tells us how fast f changes per unit change in x, with everything else held fixed. Since moving 1 unit along U changes x by a units, we multiply the rate of change of f along x by a. The same holds for y and b. The sum of these two contributions is the change in f per unit step in the direction of U. (Strictly speaking, these are rates of change, so the reasoning is exact only for very small steps.)
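The same result can be written out as a short derivation. This is a sketch that assumes f is differentiable at the point and that U has components (a, b); the step parameter t is a symbol introduced here for illustration. Moving a distance t along U takes (x, y) to (x + ta, y + tb), and the chain rule gives:

```latex
\begin{aligned}
D_U f(x, y) &= \left.\frac{d}{dt}\, f(x + ta,\; y + tb)\right|_{t=0} \\
            &= \frac{\partial f}{\partial x}\,a + \frac{\partial f}{\partial y}\,b
\end{aligned}
```

Each term is a partial derivative scaled by how much the corresponding variable changes per unit step along U.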
Going further:
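The directional derivative Du is therefore the dot product of the two vectors G and U: Du = G · U. (Note that we still have not used the name “gradient vector”.)
The dot product of two vectors is G · U = |G| |U| cos θ, where θ is the angle between them. Since U is a unit vector, this equals |G| cos θ, which is largest when the two vectors point in the same direction (θ = 0) and is 0 when they are orthogonal.
U could be a unit vector along any direction, but for the directional derivative Du to be maximum, U must have the same direction as G (assuming G is not the zero vector); otherwise the dot product is not at its maximum.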
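To check this numerically, here is a small Python sketch that tries many unit vectors around the circle and picks the one with the largest dot product with G. The function f, its hand-computed partial derivatives, the point (1, 2), and the number of sampled directions are all assumptions made for this example.

```python
import math

def gradient(x, y):
    # Partial derivatives, computed by hand, of the hypothetical example
    # function f(x, y) = x**2 + 3*x*y: df/dx = 2x + 3y, df/dy = 3x.
    return (2 * x + 3 * y, 3 * x)

def directional_derivative(g, angle):
    # U = (cos(angle), sin(angle)) is a unit vector, and Du = G . U.
    a, b = math.cos(angle), math.sin(angle)
    return g[0] * a + g[1] * b

def best_angle(g, steps=3600):
    # Brute force: sample unit vectors around the circle and keep the
    # angle whose directional derivative is the largest.
    angles = [2 * math.pi * k / steps for k in range(steps)]
    return max(angles, key=lambda t: directional_derivative(g, t))

g = gradient(1.0, 2.0)                 # (8.0, 3.0)
theta_best = best_angle(g)             # direction found by the search
theta_grad = math.atan2(g[1], g[0])    # direction of G itself
max_rate = directional_derivative(g, theta_best)

print(theta_best, theta_grad)          # the two angles should nearly agree
print(max_rate, math.hypot(g[0], g[1]))  # the maximum rate should be close to |G|
```

The search only samples a finite set of directions, so the two angles agree up to the sampling resolution rather than exactly.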
Conclusion:
So, for the function to increase as fast as possible at a point, we should move along the unit vector that has the same direction as G. And G is defined as the vector whose components are the partial derivatives of f with respect to its variables.
The actual definition of the gradient vector is the same as how we defined G. So, G is the gradient vector.
Hence, the gradient vector points in the direction of steepest ascent.
Note: This explanation reflects my own understanding, and I cannot say that all of it is correct. So, please let me know in the comments if any part of it is incorrect or lacks foundation.





