Skip to main content

Command Palette

Search for a command to run...

Stereo Camera Calibration

Updated
16 min read
Stereo Camera Calibration
A

Aspiring Computer Vision engineer, eager to delve into the world of AI/ML, cloud, Computer Vision, TinyML and other programming stuff.

Calibration means using lots of available data to estimate the intrinsic properties of the measuring instrument and correcting the error (or accounting for it) if any. In context of Camera Calibration, these intrinsic parameters are focal lengths, principal points (both often estimated together through the fundamental matrix) and distortion coefficients (radial and tangential).

Stereo means two. Stereo Cameras have two lenses, two mimic the nature of human eyes. This helps them to estimate depth. This article follows from one of my previous articles where we discussed the theory for monocular camera calibration. This is an important precursor, since before we can run the stereo calibration, we have to calibrate each camera individually using monocular calibration methods.

Other than this, it is also important to have some idea of the work-flow of how non-linear optimization is performed using ceres-solver since the goal is to handle most of the logic and mathematics ourselves and use OpenCV’s provided methods as less as possible.

This following project is just going to be the culmination of these two. Also, we will only discuss some major snippets of the code (mostly the ones which are corresponding to the core logic of stereo calibration) and the full code link will be provided at the end of the article.

The dataset we are going to use is composed of chessboard images taken from different angles, 20 for each camera. It is very small (15.5 Mb) and perfect for this solo project, so you can download it from kaggle using the provided link.


The Maths behind Stereo Calibration

To be honest, there isn’t much if one is acquainted with the mathematics involved in monocular calibration. We use 3x3 rotation matrices \(\mathsf{R}\) and 3x1 translation vectors \(\mathsf{t}\) to describe how to move points between different 3D coordinate systems.

  • World-to-Left Pose (\(R_l, t_l\)): This transforms a 3D world point (\(P_w\)) into the left camera's 3D coordinate system (\(P_l\)).

$$P_l = R_l P_w + t_l$$

  • Left-to-Right Pose (\(R_s, t_s\)): This is the stereo extrinsic we want to find. It transforms a point from the left camera's system (\(P_l\)) to the right camera's system (\(P_r\)).

$$P_r = R_s P_l + t_s$$

So the main crux is to assume one of the cameras as the base and transform the other into the base’s coordinates. That is, instead of optimizing the right camera's world pose (\(R_r, t_r\)) directly, we derive it using the left camera's pose and the stereo pose.

This links all the parameters together.

  1. We start with the two equations from above: \(P_l = R_l P_w + t_l\) and \(P_r = R_s P_l + t_s\)

  2. We substitute the first equation into the second:

$$P_r = R_s (R_l P_w + t_l) + t_s$$

  1. We expand this to get the complete transformation from the world (\(P_w\)) to the right camera (\(P_r\)):

$$P_r = (R_s R_l) P_w + (R_s t_l + t_s)$$

This gives us the mathematical definition for the right camera's world pose:

  • Rotation: \(R_r = R_s R_l\) (This is a 3x3 matrix multiplication)

  • Translation: \(t_r = R_s t_l + t_s\) (A matrix-vector multiply plus a vector addition)


The Ceres Cost Function

If you have worked with ceres before or read our previous blog, it should be intuitive that we are gonna need a functor style cost function for ceres to proceed with optimization. We will have to make one for both monocular and stereo calibration, but we will show the stereo one only.

Top-level struct + constructor

struct StereoReprojectionError {
    StereoReprojectionError(const Eigen::Vector3d& world_point,
                           const Eigen::Vector2d& left_pixel,
                           const Eigen::Vector2d& right_pixel)
        : world_point_(world_point), left_pixel_(left_pixel), right_pixel_(right_pixel) {}

There isn’t much to say about this except that the world_point_ is the 3D point in the world (observed / known). The left_pixel_, right_pixel_ are the observed pixel coordinates in left and right images. Similarly, we know that an operator signature must follow :

 template <typename T>
bool operator()(const T* const left_intrinsics, const T* const left_dist_coeffs,
                const T* const right_intrinsics, const T* const right_dist_coeffs,
                const T* const stereo_rvec, const T* const stereo_tvec,
                const T* const left_rvec, const T* const left_tvec, T* residuals) const {

Build intrinsics matrices

    Eigen::Matrix<T,3,3> K_l = Eigen::Matrix<T,3,3>::Zero();
    K_l(0,0)=left_intrinsics[0]; K_l(1,1)=left_intrinsics[1];
    K_l(0,2)=left_intrinsics[2]; K_l(1,2)=left_intrinsics[3];
    K_l(0,1)=left_intrinsics[4]; K_l(2,2)=T(1.0);

    Eigen::Matrix<T,3,3> K_r = Eigen::Matrix<T,3,3>::Zero();
    K_r(0,0)=right_intrinsics[0]; K_r(1,1)=right_intrinsics[1];
    K_r(0,2)=right_intrinsics[2]; K_r(1,2)=right_intrinsics[3];
    K_r(0,1)=right_intrinsics[4]; K_r(2,2)=T(1.0);

    // Packing distortion coefficients
    Eigen::Matrix<T,5,1> dl, dr;
    dl << left_dist_coeffs[0], left_dist_coeffs[1], left_dist_coeffs[2], 
            left_dist_coeffs[3], left_dist_coeffs[4];
    dr << right_dist_coeffs[0], right_dist_coeffs[1], right_dist_coeffs[2], 
            right_dist_coeffs[3], right_dist_coeffs[4];

    // Convert world point to templated type
    Eigen::Matrix<T,3,1> wp(T(world_point_.x()), T(world_point_.y()), T(world_point_.z()));
  • Constructs the left camera intrinsic matrix \(K_l\)​ from the parameter vector. The matrix constructed is:

$$K_l = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

(skew term is often zero in practice). Then we have same pattern for the right camera as well. Then we pack distortion coefficients and convert world points to templated type.

(The distortion model implemented next is the common radial+tangential model, except, the code uses the ordering [k1,k2,k3,p1,p2])

Left camera handling

T left_cam[3];
ceres::AngleAxisRotatePoint(left_rvec, wp.data(), left_cam);
left_cam[0] += left_tvec[0];
left_cam[1] += left_tvec[1];
left_cam[2] += left_tvec[2];

AngleAxisRotatePoint rotates a given input using angle-axis rotation (here left_rvec is the angle axis and wp is the point to be rotated. left_cam is the container for the output). Once the rotation is done, we add the left_tvec translation, thus effectively performing :

$$\mathbf{X}\ell = R\ell \mathbf{X}w + \mathbf{t}\ell$$

Then, we check for points behind camera or zero depth

if (left_cam[2] <= T(std::numeric_limits<double>::epsilon())) {
    residuals[0]=residuals[1]=residuals[2]=residuals[3]=T(std::numeric_limits<double>::max());
    return true;
}

If \(Z_\ell \le \epsilon\) (point in or behind imaging plane), set all residuals to a very large value and return. This effectively penalizes invalid configurations (since these residual later are gonna add up to become the error; because of invalid configurations error will suddenly so optimizer will avoid them).

Now we have to perform normalization and handle distortions :

T xl = left_cam[0]/left_cam[2], yl = left_cam[1]/left_cam[2];

T rl2 = xl*xl + yl*yl;
T rad_l = T(1.0) + dl(0)*rl2 + dl(1)*rl2*rl2 + dl(2)*rl2*rl2*rl2;
T xl_rad = xl*rad_l, yl_rad = yl*rad_l;
T xl_dist = xl_rad + T(2.0)*dl(3)*xl*yl + dl(4)*(rl2 + T(2.0)*xl*xl);
T yl_dist = yl_rad + dl(3)*(rl2 + T(2.0)*yl*yl) + T(2.0)*dl(4)*xl*yl;

$$x = \frac{X_\ell}{Z_\ell}, \quad y = \frac{Y_\ell}{Z_\ell}$$

We perform regular normalization (you must’ve seen this before in homogeneous coordinates) then we calculate the square of radius so that we can use it to further calculate radial distortions using standard formulae :

$$\text{radial} = 1 + k_1 r^2 + k_2 r^4 + k_3 r^6$$

where dl(0)=k1, dl(1)=k2, dl(2)=k3.

Coming to tangential distortion terms :

$$\begin{align} x_{\text{dist}} &= x \cdot \text{radial} + 2 p_1 x y + p_2 (r^2 + 2 x^2) \\ y_{\text{dist}} &= y \cdot \text{radial} + p_1 (r^2 + 2 y^2) + 2 p_2 x y \end{align}$$

where dl(3) is \(p_1\)​ and dl(4) is \(p_2\)​ as discussed.

Next, we have to convert these intrinsics to pixels in order to calculate residuals:

$$\begin{align} u_\ell &= f_x \cdot x_{\text{dist}} + s \cdot y_{\text{dist}} + c_x \\ v_\ell &= f_y \cdot y_{\text{dist}} + c_y \end{align}$$

For residuals :

$$\begin{align} r_{u_\ell} &= u_{\ell_{\text{obs}}} - u_{\ell_{\text{pred}}} \\ r_{v_\ell} &= v_{\ell_{\text{obs}}} - v_{\ell_{\text{pred}}} \end{align}$$

T pred_xl = K_l(0,0)*xl_dist + K_l(0,1)*yl_dist + K_l(0,2);
T pred_yl = K_l(1,1)*yl_dist + K_l(1,2);

residuals[0] = T(left_pixel_.x()) - pred_xl;
residuals[1] = T(left_pixel_.y()) - pred_yl;

Compute stereo (right) camera pose from left pose + stereo transform

// initialize R_stereo and R_left and use stereo_rvec and left_rvec angle axes to rotate them
T R_stereo[9];
ceres::AngleAxisToRotationMatrix(stereo_rvec, R_stereo);

T R_left[9];
ceres::AngleAxisToRotationMatrix(left_rvec, R_left);

// R_right = R_stereo * R_left
T R_right[9];
for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) {
        R_right[i * 3 + j] = T(0);
        for (int k = 0; k < 3; ++k)
            R_right[i * 3 + j] += R_stereo[i * 3 + k] * R_left[k * 3 + j];
    }
}

What we are primarily doing here is preparing \(R_{\text{stereo}}\) by : \(R_\text{right} = R_\text{stereo} \cdot R_\ell\). If you notice, this is the exact step which we talked about as the crux of the method before. There is no pre-made \(R_{\text{right}}\) as such since we have to adjust for the coordinates in the order : world → left and left → right.

R_stereo is the rotation that maps left-camera coordinates into right-camera coordinates which helps us create a single \(R_{\text{right}}\). To reiterate our initial discussing with current variables, if \(\mathbf{X}_\ell = R_\ell \mathbf{X}_w + \mathbf{t}_\ell\) (from before), then:

$$\mathbf{X}r = R\text{stereo} \mathbf{X}\ell + \mathbf{t}\text{stereo}$$

so the world→right mapping is:

$$\begin{align} \mathbf{X}\ell &= R\ell \mathbf{X}w + \mathbf{t}\ell \\ \mathbf{X}r &= R\text{stereo} \mathbf{X}\ell + \mathbf{t}\text{stereo} \\ \\ \text{After Substitution: } \\ \\ \mathbf{X}r &= R\text{stereo} (R_\ell \mathbf{X}w + \mathbf{t}\ell) + \mathbf{t}\text{stereo} \\ \mathbf{X}r &= (R\text{stereo} R\ell) \mathbf{X}w + (R\text{stereo}\mathbf{t}\ell + \mathbf{t}\text{stereo}) \end{align}$$

Now that this is established, the implementation is pretty simple :

// t_right = R_stereo * left_tvec + stereo_tvec (Just established)
T t_right[3];
t_right[0] = R_stereo[0] * left_tvec[0] + R_stereo[1] * left_tvec[1] + R_stereo[2] * left_tvec[2];
t_right[1] = R_stereo[3] * left_tvec[0] + R_stereo[4] * left_tvec[1] + R_stereo[5] * left_tvec[2];
t_right[2] = R_stereo[6] * left_tvec[0] + R_stereo[7] * left_tvec[1] + R_stereo[8] * left_tvec[2];
t_right[0] += stereo_tvec[0];
t_right[1] += stereo_tvec[1];
t_right[2] += stereo_tvec[2];

// and
// right_cam = R_right * wp + t_right
T right_cam[3];
right_cam[0] = R_right[0] * wp(0) + R_right[1] * wp(1) + R_right[2] * wp(2);
right_cam[1] = R_right[3] * wp(0) + R_right[4] * wp(1) + R_right[5] * wp(2);
right_cam[2] = R_right[6] * wp(0) + R_right[7] * wp(1) + R_right[8] * wp(2);
right_cam[0] += t_right[0];
right_cam[1] += t_right[1];
right_cam[2] += t_right[2];

if (right_cam[2] <= T(std::numeric_limits<double>::epsilon())) {
    residuals[2] = residuals[3] = T(std::numeric_limits<double>::max());
    return true;
}

This is the same pattern as before. Which means, we have to do the same thing for normalizing coordinates, handling distortions and eventually residuals as well.

T xr = right_cam[0]/right_cam[2], yr = right_cam[1]/right_cam[2];
T rr2 = xr*xr + yr*yr;
T rad_r = T(1.0) + dr(0)*rr2 + dr(1)*rr2*rr2 + dr(2)*rr2*rr2*rr2;
T xr_rad = xr*rad_r, yr_rad = yr*rad_r;
T xr_dist = xr_rad + T(2.0)*dr(3)*xr*yr + dr(4)*(rr2 + T(2.0)*xr*xr);
T yr_dist = yr_rad + dr(3)*(rr2 + T(2.0)*yr*yr) + T(2.0)*dr(4)*xr*yr;

T pred_xr = K_r(0,0)*xr_dist + K_r(0,1)*yr_dist + K_r(0,2);
T pred_yr = K_r(1,1)*yr_dist + K_r(1,2);

residuals[2] = T(right_pixel_.x()) - pred_xr;
residuals[3] = T(right_pixel_.y()) - pred_yr;

return true;

These equations identical to earlier, but with right-frame symbols.


Ceres Program Flow

Again, this will be intuitive if you’ve read the previous ceres blog or you are already accustomed to how it is used.

void runStereoOptimization(
    const std::vector<std::vector<Eigen::Vector3d>>& world_points_all_eigen,
    const std::vector<std::vector<Eigen::Vector2d>>& left_image_points_all_eigen,
    const std::vector<std::vector<Eigen::Vector2d>>& right_image_points_all_eigen,
    double left_intrinsics[5], double left_dist_coeffs[5],
    double right_intrinsics[5], double right_dist_coeffs[5],
    double stereo_rvec[3], double stereo_tvec[3],
    std::vector<std::vector<double>>& left_rvecs,
    std::vector<std::vector<double>>& left_tvecs) {

    // Check for potential issues in the data
    for (size_t i = 0; i < left_image_points_all_eigen.size(); ++i) {
        if (left_image_points_all_eigen[i].size() != right_image_points_all_eigen[i].size() ||
            left_image_points_all_eigen[i].size() != world_points_all_eigen[i].size()) {
            std::cerr << "Mismatch in point counts for image pair " << i << std::endl;
            return;
        }
    }

    ceres::Problem problem;
    ceres::LossFunction* loss_function = new ceres::HuberLoss(1.0);

    for (size_t i = 0; i < left_image_points_all_eigen.size(); ++i) {
        for (size_t j = 0; j < left_image_points_all_eigen[i].size(); ++j) {
            ceres::CostFunction* cost_function =
                new ceres::AutoDiffCostFunction<StereoReprojectionError, 4,
                                                5, 5, 5, 5, 3, 3, 3, 3>(
                    new StereoReprojectionError(world_points_all_eigen[i][j],
                                                left_image_points_all_eigen[i][j],
                                                right_image_points_all_eigen[i][j],
                                                i * left_image_points_all_eigen[i].size() + j));
            problem.AddResidualBlock(cost_function, loss_function,
                                     left_intrinsics, left_dist_coeffs,
                                     right_intrinsics, right_dist_coeffs,
                                     stereo_rvec, stereo_tvec,
                                     left_rvecs[i].data(), left_tvecs[i].data());
        }
    }

    problem.SetParameterBlockConstant(left_intrinsics);
    problem.SetParameterBlockConstant(left_dist_coeffs);
    problem.SetParameterBlockConstant(right_intrinsics);
    problem.SetParameterBlockConstant(right_dist_coeffs);

    ceres::Solver::Options options = makeDefaultCeresOptions();
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);

    size_t num_points = 0;
    for (const auto& vec : left_image_points_all_eigen) num_points += vec.size();
    size_t num_residuals = 4 * num_points;                     // left x,y + right x,y
    double final_rms = std::sqrt(2.0 * summary.final_cost / num_residuals);

    std::cout << "Stereo Optimization: " << summary.BriefReport() << "\n"
              << "   Final stereo RMS (px): " << final_rms << "\n";
}

There are 3 main things to notice here.

First is We are using the Huber Loss function. It goes like this :

$$\text{Huber}(e, \delta) = \begin{cases} \frac{e^2}{2} & \text{if } |e| \le \delta \\ \delta(|e| - \frac{\delta}{2}) & \text{if } |e| > \delta \end{cases}$$

It gives quadratic penalties for small errors, which are smooth. But for outliers it gives linear penalties which are smooth rather than heavy. This means it is less sensitive to outlier. For instance :

If you have reprojection errors of [0.5, 1.2, 0.8, 5.0, 0.3] pixels With Huber loss (δ=1.0) :

  • 0.5 → squared penalty (small influence)

  • 1.2 → linear penalty (moderate influence)

  • 5.0 → linear penalty (same slope as 1.2, not 25x worse!)

This prevents a single bad point with 5.0 pixel error from having disproportionate influence on the final solution.

Second, the numbers inside AutoDiffCostFunction. You may already know that they represent the size of the parameters entered. But I will just specify what they are for :

  • 4: Output dimension - number of residuals per cost function (2 for each camera)

  • Parameter block dimensions (the eight 5, 5, 5, 5, 3, 3, 3, 3):

    1. 5: Left camera intrinsics (fx, fy, cx, cy, and skew) {Same for Right Camera later)

    2. 5: Left camera distortion coefficients (k1, k2, p1, p2, k3) {Same for Right Camera later}

    3. 3: Stereo rotation vector and then translation vector

    4. 3: Left camera rotation vector for current image and then translation vector for current image

Third thing to notice is that this time, we’ve blocked some parameters using SetParameterBlockConstant. This tells ceres to not optimize them. That’s because they were the output of monocular calibration of each camera and we trust the results of our own code.

Besides all this, the only thing to note is that this makeDefaultCeresOptions() is a self-defined function which I wrote in a separate utils files, and you will have to actually define ceres options as we normally do. This is the function definition :

inline ceres::Solver::Options makeDefaultCeresOptions() {
    ceres::Solver::Options opt;
    opt.max_num_iterations = 1000;
    opt.function_tolerance = 1e-8;
    opt.gradient_tolerance = 1e-8;
    opt.parameter_tolerance = 1e-8;
    opt.linear_solver_type = ceres::DENSE_SCHUR;
    return opt;
}

You can see the details of the optimization process here. Also, you can see that we are using TRUST_REGION with linear solver type DENSE_SCHUR. You can also use the Levenberg-Marquardt strategy (as we discussed previously) which is also standard for Camera Calibration, but since this dataset is small (and we are getting decent results), we can go with this.

(I won’t explain methods which are parts of the utils file except a few since they are intuitive if you have used the libraries in concern before and know the fundamentals of 3D computer vision.)


The Driver Code

Again, this might be the complete driver code, but all the code presented in this document wouldn’t suffice alone since monocular calibration, and other trivial parts are not explained here. You can check the whole code provided later.

The first thing we will discuss is data loading. Or what to do after data loading to be precise. This is important since this is the only place we are using OpenCV functions for things other than data loading.

It is easy to detect corners for the world, i.e from the chessboard image, since we know the dimensions and corner counts already. But for cameras to detect them in their coordinate system, we will have to use OpenCV’s findChessboardCorners and cornerSubPix methods. You can implement them by just following the official repo’s code but that would be highly redundant since they aren’t part of the calibration mathematics itself.

    // Defined on dataset page
    const cv::Size board_size(11, 7);
    const float square_size_mm = 30.0f;

    // Assuming data loading is done here

    std::vector<std::vector<cv::Point2f>> left_image_points_all_cv, right_image_points_all_cv;
    std::vector<std::vector<cv::Point3f>> world_points_all_cv;

    std::vector<cv::Point3f> object_points_template;
    for (int i = 0; i < board_size.height; ++i)
        for (int j = 0; j < board_size.width; ++j)
            object_points_template.emplace_back(j * square_size_mm,
                                                i * square_size_mm, 0.0f);

        std::vector<cv::Point2f> lcorn, rcorn;
        bool lfound = detectChessboardCorners(left_img,  board_size, lcorn);
        bool rfound = detectChessboardCorners(right_img, board_size, rcorn);

        if (lfound && rfound) {
            left_image_points_all_cv .push_back(lcorn);
            right_image_points_all_cv.push_back(rcorn);
            world_points_all_cv.push_back(object_points_template);
        } else {
            std::cout << "Skipping pair " << i << " (corners not in both)\n";
        }
    }

The detectChessboardCorners method I formed is an abstraction over those two and is included in a separate utils file :

inline bool detectChessboardCorners(const cv::Mat& image, const cv::Size& board_size,
                                    std::vector<cv::Point2f>& corners) {
    cv::Mat gray;
    if (image.channels() == 3) cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);
    else gray = image.clone();

    bool found = cv::findChessboardCorners(gray, board_size, corners,
        cv::CALIB_CB_ADAPTIVE_THRESH | cv::CALIB_CB_NORMALIZE_IMAGE | cv::CALIB_CB_FAST_CHECK);
    if (found) {
        cv::cornerSubPix(gray, corners, cv::Size(11, 7), cv::Size(-1, -1),
            cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.1));
    }
    return found;
}

Next step is to run the monocular calibration code which will give us the required matrices and vectors for stereo calibration. This is how it will go :

    Eigen::Matrix3d K_left, K_right;
    Eigen::VectorXd dist_left(5), dist_right(5);
    std::vector<Eigen::Matrix3d> R_left_all, R_right_all;
    std::vector<Eigen::Vector3d> t_left_all, t_right_all;

    std::cout << "Calibrating left camera...\n";
    calibrateMonocular(world_points_all_cv, left_image_points_all_cv,
                       K_left, dist_left, R_left_all, t_left_all);

    std::cout << "Calibrating right camera...\n";
    calibrateMonocular(world_points_all_cv, right_image_points_all_cv,
                       K_right, dist_right, R_right_all, t_right_all);

Before we proceed, our world points, and left and right camera points are all OpenCV points (in OpenCV’s format). But ceres needs them to be in Eigen vectors and matrices for optimization. So I used another utility function to convert them named cvPointsToEigen. It goes like this :

// function overloading used to handle both 2d and 3d points
inline std::vector<Eigen::Vector2d> cvPointsToEigen(const std::vector<cv::Point2f>& pts) {
    std::vector<Eigen::Vector2d> out;
    out.reserve(pts.size());
    for (const auto& p : pts) out.emplace_back(p.x, p.y);
    return out;
}

inline std::vector<Eigen::Vector3d> cvPointsToEigen(const std::vector<cv::Point3f>& pts) {
    std::vector<Eigen::Vector3d> out;
    out.reserve(pts.size());
    for (const auto& p : pts) out.emplace_back(p.x, p.y, p.z);
    return out;
}

Now we can begin the real thing.

Driver Code : Stereo Part

We start by making an initial guess for the optimization to improve with iterations. We can get a decent initial guess for the stereo rotation by averaging the relative rotations calculated from the monocular poses.

Rright=Rrel⋅Rleft Rright⋅Rleft⊤=Rrel⋅Rleft⋅Rleft⊤[ Post-multiplying with Rleft⊤ ] Rright⋅Rleft⊤=Rrel⋅Rleft⋅Rleft−1[ Rleft is orthogonal, so Rleft⊤=Rleft−1] ⟹Rrel=Rright⋅Rleft⊤

This is how we will get the relative rotations. Then we will convert them to quaternions for averaging, since quaternion averaging is numerically more stable, before converting them back to matrix.

 Eigen::Matrix3d R_stereo;
    Eigen::Vector3d T_stereo;

    std::vector<Eigen::Matrix3d> R_relative_candidates;
    for (size_t i = 0; i < R_left_all.size(); ++i) {
        const Eigen::Matrix3d& Rl = R_left_all[i];
        const Eigen::Matrix3d& Rr = R_right_all[i];

        Eigen::Matrix3d R_rel = Rr * Rl.transpose();
        R_relative_candidates.push_back(R_rel);
    }

    Eigen::Quaterniond q_avg(0,0,0,0);
    // Use the first quaternion as the reference for alignment
    Eigen::Quaterniond q_first(R_relative_candidates[0]);
    q_avg.coeffs() += q_first.coeffs();

    for (size_t i = 1; i < R_relative_candidates.size(); ++i) {
        Eigen::Quaterniond q(R_relative_candidates[i]);
        // Ensure consistent sign (antipodal "hemisphere" of the quaternion)
        if (q.coeffs().dot(q_first.coeffs()) < 0) {
            q.coeffs() = -q.coeffs();
        }
        q_avg.coeffs() += q.coeffs();
    }
    q_avg.coeffs() /= R_relative_candidates.size();
    q_avg.normalize();
    R_stereo = q_avg.toRotationMatrix();

    T_stereo << 250.0, 0.0, 0.0; 
    double baseline = T_stereo.norm();

As for the translation baseline, a simple 250mm, purely horizontal guess will suffice. There is nothing much to discuss after this as this is simple ceres code. We have to gather all the data in correct containers and data types, then put it all in the runStereoOptimization method we created earlier.

    double left_intrinsics[5]  = {K_left(0,0), K_left(1,1), K_left(0,2), K_left(1,2), K_left(0,1)};
    double right_intrinsics[5] = {K_right(0,0),K_right(1,1),K_right(0,2),K_right(1,2),K_right(0,1)};
    double left_dist_coeffs[5]  = {dist_left(0), dist_left(1), dist_left(2), dist_left(3), dist_left(4)};
    double right_dist_coeffs[5] = {dist_right(0),dist_right(1),dist_right(2),dist_right(3),dist_right(4)};

    Eigen::Vector3d stereo_rvec_e = rotationToRodrigues(R_stereo);
    double stereo_rvec[3] = {stereo_rvec_e(0), stereo_rvec_e(1), stereo_rvec_e(2)};
    double stereo_tvec[3] = {T_stereo(0), T_stereo(1), T_stereo(2)};

    std::vector<std::vector<double>> left_rvecs(left_image_points_all_eigen.size(),
                                                std::vector<double>(3));
    std::vector<std::vector<double>> left_tvecs(left_image_points_all_eigen.size(),
                                                std::vector<double>(3));

    for (size_t i = 0; i < left_image_points_all_eigen.size(); ++i) {
        Eigen::Vector3d rv = rotationToRodrigues(R_left_all[i]);
        left_rvecs[i] = {rv(0), rv(1), rv(2)};
        left_tvecs[i] = {t_left_all[i](0), t_left_all[i](1), t_left_all[i](2)};
    }

    runStereoOptimization(world_points_all_eigen,
                          left_image_points_all_eigen,
                          right_image_points_all_eigen,
                          left_intrinsics, left_dist_coeffs,
                          right_intrinsics, right_dist_coeffs,
                          stereo_rvec, stereo_tvec,
                          left_rvecs, left_tvecs);

    K_left  = intrinsicsToMatrix(left_intrinsics);
    K_right = intrinsicsToMatrix(right_intrinsics);
    dist_left  << left_dist_coeffs[0], left_dist_coeffs[1], left_dist_coeffs[2],
                  left_dist_coeffs[3], left_dist_coeffs[4];
    dist_right << right_dist_coeffs[0],right_dist_coeffs[1],right_dist_coeffs[2],
                  right_dist_coeffs[3],right_dist_coeffs[4];

    R_stereo = rodriguesToRotation(Eigen::Map<Eigen::Vector3d>(stereo_rvec));
    T_stereo = Eigen::Map<Eigen::Vector3d>(stereo_tvec);

(Note : rotationToRodrigues and intrinsicsToMatrix are from utils file).


Conclusion

If you’ve successfully implemented and executed this code, you will get the final RMS to be:

Final stereo RMS (px): 2.30629

I compared it with an straightforward OpenCV stereoCalibrate method and it reported 14.13 px. Now I suspect that the OpenCV result can also be improved if it is optimized a little through arguments, also it surely gonna be more robust, but this is just for a baseline for our little project.

As said before, you can find the whole code here along with the dataset, so you just have the create a build direct and do cmake .. && make.

Code : https://codeberg.org/SilverGrace26/stereo-calibration

If you’ve read this far and this blog has helped you, or there is any critical error or discrepancy, please comment down below.