## Introduction

This article aims to provide the reader with a basic understanding of what local maxima are and how they can be avoided when training machine learning models.

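Local maxima are points in the parameter space of a function where the function value is at least as large as at all nearby points, although it need not be the largest value overall (that is the global maximum). For a differentiable function, the gradient is zero at any local maximum in the interior of its domain. The converse is false, because local minima and saddle points also have a zero gradient. Maximizing a function f is the same as minimizing −f, so the same ideas apply to the local minima of a loss function.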
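Two standard examples where the gradient is zero but the point is not a local maximum:

$$
f(x) = x^3:\quad f'(0) = 0,\ \text{yet } f(-\varepsilon) < f(0) < f(\varepsilon) \text{ for every } \varepsilon > 0
$$

$$
g(x, y) = x^2 - y^2:\quad \nabla g(0, 0) = (0, 0),\ \text{yet } (0, 0) \text{ is a saddle point (a minimum along } x \text{, a maximum along } y)
$$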

Avoiding local maxima entirely is not always possible when training machine learning models, but some methods can reduce their impact. For example, a larger learning rate, a different starting point, or a different optimization algorithm can sometimes lead to a better solution.

## What are local maxima?

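A local maximum of a function of one variable is a point where the function changes from increasing to decreasing. In other words, it’s a “high point” in the graph of the function relative to the points around it. A local minimum is the opposite: a “low point,” where the function changes from decreasing to increasing.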

## How can local maxima be avoided?

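There are a few ways to reduce the chance of ending up in a poor local maximum. One is to start the search from a different point, or from several points while keeping the best result. Another is to change the function being optimized, for example by reformulating the problem so that the objective is concave, since every local maximum of a concave function is also a global maximum. The last is to use a different optimization algorithm.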
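As a minimal sketch of the “different start point” idea, the Python below runs simple gradient ascent from several random starting points and keeps the best end point. The test function `f(x) = -x**4 + 4*x**2 + x`, the step size, the search range, and the number of restarts are illustrative assumptions, not part of the original text. This function has a local maximum near x ≈ -1.35 and its global maximum near x ≈ 1.47.

```python
import random

def f(x):
    # Illustrative objective: local maximum near x = -1.35, global maximum near x = 1.47
    return -x**4 + 4 * x**2 + x

def df(x):
    # Derivative of f, written out by hand
    return -4 * x**3 + 8 * x + 1

def climb(x, lr=0.01, steps=2000):
    # Plain gradient ascent: repeatedly step uphill along the derivative
    for _ in range(steps):
        x += lr * df(x)
    return x

def random_restarts(n_starts=10, low=-2.0, high=2.0, seed=0):
    # Run the local search from several random start points and keep the highest end point
    rng = random.Random(seed)
    ends = [climb(rng.uniform(low, high)) for _ in range(n_starts)]
    return max(ends, key=f)

best = random_restarts()
print(round(best, 3), round(f(best), 3))
```

A single run started at x = -2 would stop at the local maximum near -1.35; with several restarts, at least one start is likely to fall in the basin of the global maximum, and the best end point should then lie near x ≈ 1.47.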

### Identify your problem areas

There are two ways to identify problem areas that may contain local maxima. The first is to plot the function and look for places where the graph has a “bump” (practical only for functions of one or two variables). The second is to use the first derivative test (or the second derivative test, if you’re calculus-savvy).
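For a function of one variable, looking for bumps can be automated by evaluating the function on a fine grid and flagging sampled points that are higher than both of their neighbours. The sketch below assumes an illustrative function `f(x) = -x**4 + 4*x**2 + x`, a grid from -3 to 3, and a spacing of 0.01; the helper `sampled_peaks` is hypothetical. The peaks it reports are only accurate to the grid spacing, and a bump narrower than the spacing can be missed.

```python
def f(x):
    # Illustrative function with two "bumps" (local maxima)
    return -x**4 + 4 * x**2 + x

def sampled_peaks(func, low, high, step):
    # Evaluate func on an evenly spaced grid
    n = int(round((high - low) / step))
    xs = [low + i * step for i in range(n + 1)]
    ys = [func(x) for x in xs]
    # An interior grid point is a candidate local maximum if it is higher than both neighbours
    return [xs[i] for i in range(1, n) if ys[i - 1] < ys[i] > ys[i + 1]]

peaks = sampled_peaks(f, -3.0, 3.0, 0.01)
print([round(p, 2) for p in peaks])
```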

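To use the first derivative test, find the function’s derivative and set it equal to zero. The points where this holds are your critical points. A critical point is a local maximum if the derivative changes sign from positive to negative as you move through it, and a local minimum if it changes from negative to positive. To use the second derivative test, evaluate the second derivative at each critical point: if it is negative, the point is a local maximum; if it is positive, a local minimum; if it is zero, the test is inconclusive. Setting the second derivative equal to zero does not classify critical points; it only finds candidate points of inflection, where the curvature may change sign.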
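The two tests can be sketched in Python for the illustrative function `f(x) = x**3 - 3*x`, chosen (as an assumption) because its derivatives are easy to write by hand: `f'(x) = 3*x**2 - 3` and `f''(x) = 6*x`. The hypothetical helper `zeros` locates zeros of a derivative by scanning for sign changes and refining each one with bisection; each critical point is then classified with both tests.

```python
def f(x):
    return x**3 - 3 * x

def df(x):
    # First derivative of f
    return 3 * x**2 - 3

def d2f(x):
    # Second derivative of f
    return 6 * x

def zeros(g, low, high, n=60, tol=1e-12):
    # Split [low, high] into n intervals, look for sign changes of g,
    # and refine each bracketed zero by bisection
    width = (high - low) / n
    roots = []
    for i in range(n):
        a, b = low + i * width, low + (i + 1) * width
        if g(a) == 0:
            roots.append(a)
        elif g(a) * g(b) < 0:
            lo, hi = a, b
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
    return roots

def first_derivative_test(x, h=1e-4):
    # Compare the sign of f' just left and just right of the critical point
    left, right = df(x - h), df(x + h)
    if left > 0 > right:
        return "local maximum"
    if left < 0 < right:
        return "local minimum"
    return "no extremum or inconclusive"

def second_derivative_test(x):
    # Use the sign of f'' at the critical point
    c = d2f(x)
    if c < 0:
        return "local maximum"
    if c > 0:
        return "local minimum"
    return "inconclusive"

for x in zeros(df, -3.0, 3.0):
    print(round(x, 6), first_derivative_test(x), second_derivative_test(x))

# Zeros of f'' are only candidate inflection points, not maxima or minima
print("inflection candidates:", zeros(d2f, -3.0, 3.0))
```

For this function the local maximum at x = -1 (where f = 2) is not the global maximum, since f grows without bound as x increases.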

### Use gradient descent

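Gradient descent is an iterative optimization algorithm that repeatedly takes a step in the direction of steepest descent, the negative gradient. To search for a maximum of f, apply it to −f, which is the same as gradient ascent on f. On its own it converges to a stationary point near where it starts, typically a local optimum, so it does not guarantee finding the global maximum. It helps most when combined with other measures, such as several starting points or stochastic variants (for example stochastic gradient descent), whose noisy updates can sometimes carry the search past shallow local optima.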
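A minimal sketch of gradient ascent in one dimension, written as gradient descent on −f. The objective `f(x) = -x**4 + 4*x**2 + x`, the learning rate, the iteration cap, and the stopping tolerance are illustrative assumptions. Running it from two starting points shows that the result depends on where the search begins.

```python
def f(x):
    # Illustrative objective: local maximum near x = -1.35, global maximum near x = 1.47
    return -x**4 + 4 * x**2 + x

def grad_neg_f(x):
    # Gradient of the function being minimized, -f
    return 4 * x**3 - 8 * x - 1

def gradient_descent_on_neg_f(x, lr=0.01, steps=2000, tol=1e-10):
    # Step against the gradient of -f, which moves uphill on f
    for _ in range(steps):
        step = lr * grad_neg_f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

for start in (-2.0, 2.0):
    x = gradient_descent_on_neg_f(start)
    print(f"start {start:+.1f} -> x = {x:.3f}, f(x) = {f(x):.3f}")
```

The run started at -2 stops at the local maximum near x ≈ -1.35, while the run started at 2 reaches the global maximum near x ≈ 1.47.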

### Use conjugate gradient

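Another option is the conjugate gradient method. Instead of following the steepest-ascent direction at every iteration, it combines the current gradient with the previous search direction so that successive directions are conjugate to one another. On a quadratic function of n variables it reaches the optimum in at most n iterations (in exact arithmetic), and nonlinear variants often converge faster than plain gradient methods on poorly conditioned problems. It is still a local method, however, so like gradient ascent it can converge to a local maximum.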
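A minimal sketch of the linear conjugate gradient method, assuming the simplest setting where it is exact: maximizing the concave quadratic g(x) = b·x − ½ x·Ax for a symmetric positive-definite matrix A. Its maximizer solves Ax = b, and the residual r = b − Ax is the gradient of g, so the first search direction is the steepest-ascent direction and each later one mixes in the previous direction. The matrix `A` and vector `b` are arbitrary illustrative values; a non-quadratic objective would need a nonlinear variant with a line search.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=None):
    # Maximizes g(x) = b.x - 0.5 * x.A.x, i.e. solves A x = b, for symmetric positive-definite A
    if max_iter is None:
        max_iter = len(b)  # at most n steps in exact arithmetic
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # gradient of g at x
    p = list(r)  # first direction: steepest ascent
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # exact line search along p for a quadratic
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction: current gradient plus a multiple of the previous direction
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b, [0.0, 0.0])
print(x)  # close to [1/11, 7/11]
```

For this 2 × 2 problem the method reaches the maximizer in two iterations, matching the at-most-n-steps property.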

### Use Newton’s Method

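Newton’s method is an iterative process that uses both the first and second derivatives of the function. Starting from a guess x0, it computes the next point as `x1 = x0 - f'(x0) / f''(x0)`, the stationary point of the quadratic approximation to f around x0, and repeats until the desired accuracy is reached. Near a solution it typically converges much faster than gradient methods, but it converges to whichever critical point is nearby, which may be a local minimum or a saddle point rather than a maximum. Checking that f''(x) < 0 at the result confirms a local maximum.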
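A minimal sketch of Newton’s method for finding a critical point, using the illustrative function `f(x) = -x**4 + 4*x**2 + x` (an assumption; its derivatives are written out by hand). Started at x = 2 it converges to the global maximum near x ≈ 1.47, but started at x = 0 it converges to the local minimum near x ≈ -0.13, which is why the sign of f'' is checked afterwards.

```python
def df(x):
    # First derivative of the illustrative f(x) = -x**4 + 4*x**2 + x
    return -4 * x**3 + 8 * x + 1

def d2f(x):
    # Second derivative of the same f
    return -12 * x**2 + 8

def newton_critical_point(x, tol=1e-12, max_iter=50):
    # Newton's method applied to f'(x) = 0: x_next = x - f'(x) / f''(x)
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def classify(x):
    # Second derivative test at the point Newton's method returned
    c = d2f(x)
    if c < 0:
        return "local maximum"
    if c > 0:
        return "local minimum"
    return "inconclusive"

for start in (2.0, 0.0):
    x = newton_critical_point(start)
    print(f"start {start:+.1f} -> x = {x:.4f} ({classify(x)})")
```

The sketch does not guard against f''(x) = 0, where the Newton step is undefined.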

## Conclusion

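The discussion above shows that a careful choice of starting points and optimization algorithm can reduce, though not eliminate, the risk of ending up in a local maximum. Derivative-based methods such as gradient ascent, conjugate gradient, and Newton’s method are local: they converge to an optimum near their starting point, which may not be the global one. Population-based, derivative-free methods such as evolutionary algorithms explore the search space more broadly and may be more likely to escape local optima, at the cost of many more function evaluations. Neither family guarantees finding the global optimum.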