Introduction to Ceres for Non-Linear Optimization

Imagine you're dropped onto a vast, rugged landscape with hills, valleys, ridges, and plains. Your goal is to find the lowest point in a valley (or the highest point on a hill). But there’s a twist : You are blindfolded. This is the essence of non-linear optimization.

Non-linear optimization is the process of finding the minimum or maximum of a function where the objective function or the constraints (or both) are non-linear. Handling Non-linear optimizations manually has its own set of challenges, to name a few :

Multiple Local Optima: Unlike a single bowl-shaped curve, a non-linear function can have many "hills" and "valleys."
- A Local Minimum is a point that is lower than all nearby points, but it might not be the absolute lowest point on the entire landscape.
- The Global Minimum is the absolute lowest point you are trying to find.
  The big challenge is that an algorithm might find a local minimum and get "stuck," thinking it's the best solution, while a much better solution exists elsewhere.
Curvature and Saddle Points: Some points are flat (gradient is zero), but they are neither a minimum nor a maximum. They are like a mountain pass or a saddle. It's tricky for algorithms to know if they should move on or stop here.
Constraints: Most of the time when we are solving real world problems in which we are bound by some constraints on the possible solutions. Manually designed processes have to keep them in account.

However, the Ceres library we are going to discuss, being especially designed for helping non-linear optimizations, can handle all these challenges while maintaining a layer of abstraction and simplicity. You just have to provide the function, the data points and a few more steps and ceres will solve it for you.

This is what makes it extremely useful in applications like 3D Computer Vision and Robotics where it is often used for tasks like Bundle Adjustment. Needless to say, since it was developed by google, and battle-tested over a long time period, it is fast and robust for most usecases.

General Flow of a Ceres Program

Define Your "Cost Function" : You describe what "error" you're trying to minimize.
Provide starting values for all variables you're optimizing
Set Up the Optimization Problem : This includes adding each cost function term to the overall problem and specifying all parameters.
Choose Your Solver Strategy : This means :
- Selecting which algorithm to use (Trust Region, Line Search, etc. we will discuss them as we go)
- Setting convergence criteria.
- Configuring linear solver settings.
Run the Solver : Ceres takes over and begins the iterative process:
- Evaluates current position
- Computes gradients (slopes)
- Determines best direction and step size
- Updates parameters
- Repeats until convergence

In this article, we will perform this exact process on a Gaussian Function, defined by three parameters :

$\alpha$ (amplitude): The height of the curve's peak.
$\beta$ (mean): The position of the center of the peak.
$c$ (std. deviation): The "width" of the curve.

The formula is:

$$f(x; \alpha, \beta, c) = \alpha \cdot \exp\left(-\frac{(x - \beta)^2}{2c^2}\right)$$

Data

For the sake of simplicity we will generate synthetic data. To make sure it is more or less around the distribution in concern, we will use the default_random_engine generator and noise_distribution method from random header.

First we will define the true values we are gonna consider for our synthetic data.. Then use noise_distribution for generating synthetics noise (and store it in a normal_distribution container.

Now, we will use the actual formula on the x values to get the actual y values, but add this gaussian noise to make it imitate real data.

    const double a_true = 10.0;
    const double b_true = 5.0;
    const double c_true = 1.0;

    std::vector<double> data_x;
    std::vector<double> data_y;

    std::default_random_engine generator;
    // Noise with a mean of 0 and std. dev. of 0.2
    std::normal_distribution<double> noise_distribution(0.0, 0.2); 

    for (double x = 0.0; x <= 10.0; x += 0.1) {
        double y_perfect = a_true * std::exp(-(x - b_true) * (x - b_true) / (2.0 * c_true * c_true));
        double noise = noise_distribution(generator);

        data_x.push_back(x);
        data_y.push_back(y_perfect + noise);
    }

Cost Function

As said, now is the time to define the cost functions.

Ceres uses its own type for automatic differentiation : Jet (which you won’t see here). But to help ceres use the same function for calculating the cost (in double) and automatically differentiate it using ceres::Jet we have to create a template for the error calculating operator.

And to bundle the logic (the operator()) and the data, together into a single object, we will make a struct. This turns both into a single functor which AutoDiffCostFunction knows how to use.

struct GaussianResidual {
    GaussianResidual(double x, double y) : x_(x), y_(y) {}

    template <typename T>
    bool operator()(const T* const a, const T* const b,
                    const T* const c, T* residual) const {
        const T x = T(x_);
        const T y = T(y_);
        const T exponent = -(x - b[0]) * (x - b[0]) / (T(2.0) * c[0] * c[0]);
        const T y_predicted = a[0] * ceres::exp(exponent);
        residual[0] = y - y_predicted;
        return true;
    }

private:
    const double x_;
    const double y_;
};

Initializing Variables and Problem Construction

The first part is simple. We have to initialize the values of the variables we are optimizing (since you have to be standing somewhere if you are trying to find the minima of the terrain). For now, we just have to keep this different from the true values we decided ahead, but in real setting, the true values are not known beforehand, and there might be multiple heuristics to decide a good guess based on the application.

    double a = 3.0;
    double b = 2.0;
    double c = 0.5;

Now we have to build a problem, for which we will use the ceres::Problem class.Its job is to build and store a graph that defines your optimization. This graph has two main components:

Parameters (The "Nodes"): These are the variables we want to find (a, b, and c in our example). They are added to the problem implicitly when we add a residual block.
Residual Blocks (The "Edges"): These are the individual cost functions or error terms that connect the parameters (the ones we just discussed). In our example, each of our 101 noisy data points becomes one residual block.

This graph will be used by ceres to later solve the problem by minimizing the sum of the errors from all individual residual blocks as small as possible. So this is what ‘building the problem’ means in context of ceres program flow.

    ceres::Problem problem;

    for (size_t i = 0; i < data_x.size(); ++i) {
        ceres::CostFunction* cost_function =
            new ceres::AutoDiffCostFunction<GaussianResidual, 1, 1, 1, 1>(
                new GaussianResidual(data_x[i], data_y[i])
            );

        // For the second parameter we have specified no loss function (i.e. use standard L2)
        problem.AddResidualBlock(cost_function, nullptr, &a, &b, &c); 
    }


    // Adding Constraints: (We know 'a' (amplitude) and 'c' (width) must be positive.)
    problem.SetParameterLowerBound(&a, 0, 0.0);
    problem.SetParameterLowerBound(&c, 0, 1e-6); // (to avoid division by zero)

Also, the <GaussianResidual, 1, 1, 1, 1> means that we are using the GaussianResidual struct we just created. The first 1 means that the residual has size 1 (a single number). The rest of the 1s denote size of the parameters (single parameter).

Choosing Solver Strategy

It supports several solver strategies, each suited for different types of optimization problems. You don’t have to know everything about them, since that’s the point of using high-level libraries like ceres. you just need to know some level of internal mechanisms and which use cases they are meant for:

Strategy	Description	Use case
Trust Region Strategy	At each iteration, it restricts the step size to a "trust region" (a bounded area in parameter space) where the quadratic model is trusted to approximate the true objective function well. The region is adjusted based on how well the model predicts the actual reduction in the objective.	Preferred for problems with strong non-linearity or when the Hessian matrix is not positive definite.
Line Search Strategy	It searches along a descent direction (e.g., gradient or Newton direction) for a step size that sufficiently reduces the objective function. The step size is determined by satisfying the Wolfe conditions or similar criteria.	Often used when the problem is large and the Hessian is sparse or structured.
Dogleg Strategy	It interpolates between the steepest descent direction and the Newton direction. Chooses a point along this "dogleg" path that minimizes the objective within the trust region.	Useful when the Newton step is too large or not a descent direction.
Levenberg-Marquardt Strategy	Well, we’ve talked about this one a lot in our previous blog since this is actually used in 3D applications like Camera Calibration. It interpolates between the Gauss-Newton method (for small damping) and gradient descent (for large damping) by adjusting a damping parameter.	Useful when the Newton step is too large or not a descent direction.

Right now we will be choosing the Trust Region Strategy (since it is default). Each of these strategies have a few configurations. For an analogy, think of these strategies to be like functions and these configurations to be like their parameters (this is very similar to how we are going to use them anyways). We can choose different sub-strategies to define how the optimization is gonna work. For example, our Trust Region strategy has 3 different components :

Linear Solver Type
Preconditioner Type

The linear solver is used to solve the linear system arising from the Newton step within the trust region. Options include:

DENSE_QR : which uses QR decomposition. Best for small to medium problems (typically < 1000 parameters).
DENSE_NORMAL_CHOMSKY : uses Chomsky factorization. Faster but less stable for ill-conditioned problems.
SPARSE_NORMAL_CHOMSKY : same thing for sparse matrices.
CGNR : Conjugate Gradient on the Normal equations. Memory-efficient for very large problems.

From this, you could’ve already guessed that we are gonna use DENSE_QR. We won’t be using any pre-conditioner type since they are used in Iterative solvers for better convergence.

    ceres::Solver::Options options;
    options.linear_solver_type = ceres::DENSE_QR; 

    // To print a summary of the optimization to the console
    options.minimizer_progress_to_stdout = true;

Now what remains is to just ask ceres to solve. It’s solvin’ time :

    // This is a data object that gets filled after the solver finishes. 
    // It contains all the key statistics about the solve 
    // which can be used for conditional logic.
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);

This completes the whole code. I added some additional cout statements on my own while making this. You will have to build this with something like CMake. This is the expected terminal output (with a few added print statements) :

--- Ceres Gaussian Curve Fit ---
Generated 101 noisy data points.

Initial parameters:
a: 3 (True: 10)
b: 2 (True: 5)
c: 0.5 (True: 1)
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  9.199685e+02    0.00e+00    1.96e+01   0.00e+00   0.00e+00  1.00e+04        0    7.30e-04    8.02e-04
   1  8.856522e+02    3.43e+01    9.28e+00   0.00e+00   8.52e-01  1.54e+04        1    1.48e-03    2.70e-03
   2  8.823675e+02    3.28e+00    2.14e+01   7.08e-01   1.00e+00  4.61e+04        1    1.45e-03    4.16e-03
   3  8.615956e+02    2.08e+01    1.16e+02   1.34e+00   3.91e-01  4.56e+04        1    2.90e-03    7.08e-03
   4  7.530873e+02    1.09e+02    9.32e+01   7.78e+01   1.32e-01  3.26e+04        1    1.46e-03    8.54e-03
   5  7.527508e+02    3.37e-01    1.11e+02   1.39e+01   1.72e-03  1.64e+04        1    3.62e-03    1.22e-02
   6  7.518145e+02    9.36e-01    1.38e+02   1.82e+01   4.68e-03  8.31e+03        1    3.59e-03    1.58e-02
   7  7.517820e+02    3.24e-02    1.38e+02   3.79e-01   1.39e-04  4.16e+03        1    2.88e-03    1.86e-02
   8  7.506548e+02    1.16e+00    1.87e+02   3.02e+01   5.56e-03  2.11e+03        1    2.91e-03    2.16e-02
   9  7.496389e+02    1.02e+00    1.88e+02   5.34e+00   2.12e-03  1.06e+03        1    3.60e-03    2.52e-02
  10  7.487047e+02    9.34e-01    1.89e+02   2.35e+00   1.94e-03  5.35e+02        1    3.63e-03    2.88e-02
  11  7.478047e+02    9.00e-01    1.89e+02   1.46e+00   1.87e-03  2.69e+02        1    3.57e-03    3.24e-02
  12  7.469411e+02    8.64e-01    1.89e+02   1.02e+00   1.79e-03  1.35e+02        1    3.65e-03    3.60e-02
  13  7.461288e+02    8.12e-01    1.89e+02   7.52e-01   1.68e-03  6.79e+01        1    3.62e-03    3.97e-02
  14  7.453968e+02    7.32e-01    1.89e+02   5.54e-01   1.52e-03  3.41e+01        1    3.61e-03    4.33e-02
  15  7.447889e+02    6.08e-01    1.89e+02   3.91e-01   1.27e-03  1.71e+01        1    3.65e-03    4.69e-02
  16  7.443549e+02    4.34e-01    1.89e+02   2.43e-01   9.25e-04  8.56e+00        1    2.88e-03    4.98e-02
  17  7.361931e+02    8.60e+00    1.95e+02   8.29e+00   1.95e-02  4.53e+00        1    2.88e-03    5.27e-02
  18  6.707383e+02    6.55e+01    2.30e+02   2.23e+00   1.06e-01  3.04e+00        1    2.18e-03    5.49e-02
  19  3.579940e+02    3.13e+02    1.96e+02   4.49e+00   5.49e-01  3.04e+00        1    1.46e-03    5.64e-02
  20  1.454938e+02    2.13e+02    7.15e+01   1.47e+00   1.01e+00  9.13e+00        1    1.45e-03    5.78e-02
  21  4.862529e+01    9.69e+01    2.16e+02   2.55e+00   8.25e-01  1.26e+01        1    1.44e-03    5.93e-02
  22  2.342913e+00    4.63e+01    8.25e+00   1.51e+00   9.93e-01  3.78e+01        1    1.43e-03    6.07e-02
  23  1.837245e+00    5.06e-01    5.05e-01   2.37e-01   9.91e-01  1.13e+02        1    1.45e-03    6.22e-02
  24  1.835576e+00    1.67e-03    2.64e-03   1.60e-02   9.98e-01  3.40e+02        1    1.45e-03    6.36e-02

Ceres Solver Report: Iterations: 25, Initial cost: 9.199685e+02, Final cost: 1.835576e+00, Termination: CONVERGENCE
-----------------------------------
True parameters:
a: 10, b: 5, c: 1

Final solved parameters:
a: 9.96887
b: 4.99752
c: 1.00349

As you can see, the final solved parameters are extremely close the true ones, which shows that our optimization worked perfectly.

Introduction to Ceres for Non-Linear Optimization

General Flow of a Ceres Program

Data

Cost Function

Initializing Variables and Problem Construction

Choosing Solver Strategy

Comments

Camera Calibration

Stereo Camera Calibration

More from this blog

TensorRF Explained: A milestone in Novel View Synthesis

Stereo Camera Calibration

Iterative Closest Point Registration with Open3D C++

Introduction to CUDA Programming

Command Palette

General Flow of a Ceres Program

Data

Cost Function

Initializing Variables and Problem Construction

Choosing Solver Strategy

Comments

Camera Calibration

Stereo Camera Calibration

More from this blog