Suppose this is our loss function. Given a weight \(w\) (shoe size), you get a corresponding loss (feeling). Clearly, the point marked by the red dashed line is the optimal solution (the shoes you ultimately buy).

The shopping assistant or you would choose an initial \(w\) as the first attempt.

After trying, based on your feelings (loss = 6), you get the corresponding feedback (gradient = -7). Then, the assistant with a learning rate of 0.1 will suggest that you move 0.7 units to the right.

Following its guidance, you obtain a new shoe size and continue to try again.

Based on your feedback (gradient), the assistant gives you a new suggestion.

Try again!

Again!

And again!

Until you find the right pair of shoes.