Iterative Closest Point Registration with Open3D C++

Iterative Closes Point Registration or simply ICP registration is a procedure used to align 3-Dimensional Point Clouds. It is a major step in the complete pipeline of 3D-Reconstruction and is often used for making 3D models from point clouds with different camera angles.

There are certain nuances in ICP depending on the use case, but some fundamental aspects remain the same. First, we decide a source cloud and a target cloud out of the provided two point clouds. The goal is to find the best spatial transformation that moves the source points to match the target points as closely as possible :

ICP procedure :

Initial Alignment: Start with an initial guess (e.g., using centroids or manual alignment).
Closest Point Association: For each point in the source cloud, find the closest point in the target cloud (often with nearest neighbor search).
Error Minimization: Compute a rigid transformation (rotation R and translation t) that minimizes the error between the matched points.
Iteration: Repeat steps 2–3 until convergence (error falls below a threshold or max iterations reached).

https://youtu.be/LcghboLgTiA?si=wSuYN0zpZOOUKYWx

It is fairly easy to understand the procedure, except maybe, for the third point : The Error Metric.

Before we get to that, let’s reiterate an underlying logic. The two point clouds can be said to be existing in two different coordinates. Also, we are sure that the shape and size of the point cloud is not changing during the whole process, nor is it shearing. It is just re-aligning. This means that this is a rigid transformation.

A rigid transformation can be characterized by two things, a rotation and a translation (as said in point 3).

$$p' = \mathbf{R}p + \mathbf{t}$$

where $\mathsf{R}$ is a rotation matrix (orthogonal, $\mathsf{R}^\top \mathsf{R} = \mathsf{I}$) and $\mathsf{t}$ is a translation vector.

You should keep this in mind throughout the process. From all kinds of alignment, local or global, or any registration process mentioned in this document, we are expecting a transformation matrix that looks like this :

$$T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Anyways, from this, we can already guess, that for a point $p_i$ on the source point cloud and the corresponding closest point $q_i$ on the target point cloud, the loss function should be looking something like this :

$$E(R,t) = \frac{1}{N}\sum_{i=1}^{N} ||(Rp_i + t) - q_i||^2$$

(It is just MSE from regular Deep Learning).

Optimizing for this over the iterations leads to the Euclidean Distance between the source and the target’s closest points getting smaller and smaller. If we use this error metric, it is known as point-to-point ICP. However, there is also a different one (as shown in the video above), it is called point-to-plane ICP.

To understand, let’s have a look at these two planes. As you can see, they are parallel but separated by a small vertical distance (distance along normal). Point-to-plane ICP says that “we should value the normal distance between planes more than the euclidean distance between two points”.

Doing this makes it a surface fitting problem instead of just a point correspondence problem, which is also our goal. One more reason for this is that Euclidean distance as a loss penalizes heavily even when the normal distance between the two planes is almost zero. Even slight misalignment in direction can result in derailing the smooth optimization.

Instead of that, point-to-point ICP focuses on optimizing for almost zero normal distance (ideally zero) and tackles tangential alignment through global optimization. This will be more clear if you have a look at the Loss function :

$$E(R,t) = \frac{1}{N}\sum_{i=1}^{N} [\left((Rp_i + t) - q_i\right) \cdot n_i]^2$$

(Since $\mathsf{R}$ and $\mathsf{t}$ are shared across all points, the solution must balance the alignment of all points, including their tangential positions).

We won’t delve into further mathematics, since that deserves separate attention. Besides, this error metric can be handled through non-linear optimization libraries like ceres and g2o.

Both point-to-point and point-to-plane are used in difference circumstances, but point-to-plane, for the above discussed reasons is considered more robust and used more often, provided data has smooth surfaces or well-estimated normals.

Applications of pt-to-pt	Applications of pt-to-plane
Initial alignment of noisy or sparse point clouds	Applications requiring fast convergence with accuracy
Alignment of point clouds without reliable normals	Alignment of point clouds with reliable normals available
Real-time applications	High-precision applications
Rough alignment of multiple point clouds	Fine alignment of smooth surfaces

(Note that the point-to-plane version is actually converging earlier than than point-to-point one in the video).

Before we start with the code, it is important to discuss one more thing essential to our code.

Fast Point Feature Histogram

Fast point feature histogram or FPFH is a feature descriptor just like SIFT and ORB but for 3D points. It describes the local geometric surface around a 3D point by creating a fingerprint of the surface. It's robust to variations in point density and noise.

The key idea is that a point on the bunny's ear in the source cloud should have a very similar FPFH fingerprint to the corresponding point on the bunny's ear in the target cloud.

To create the "fingerprint" for a single point (let's call it $\mathsf{P}$), the algorithm does this:

Neighbors' Relationships (SPFH)
- First, it looks at all the neighbors of $\mathsf{P}$ within a specific radius.
- For each neighbor, it calculates the geometric relationship between their normals (e.g., angles and distances).
- It summarizes all these relationships into a simple histogram. This is called an SPFH (Simple Point Feature Histogram). This SPFH only describes the 1-hop neighborhood of $\mathsf{P}$.
Combine Neighborhoods (FPFH)
- This is the clever part. To get the final FPFH for point $\mathsf{P}$, the algorithm combines the SPFH of $\mathsf{P}$with the SPFHs of all its neighbors.
- This means the final FPFH fingerprint for $\mathsf{P}$ contains information about its neighbors and its neighbors' neighbors (a 2-hop neighborhood). This makes the fingerprint much more descriptive and robust.

From here onward, we will discuss the code for using Open3D for performing ICP registration on Stanford Bunny Dataset (also used in the video).

So when I was trying to do this code for myself, I was getting some substandard results. So I have applied some modifications with a lot of trial and error which resulted in better alignment. We will discuss them as we go.

First thing is to make sure that you’ve Open3D and CMake. You can install it from your package manager o compile it from source if you want to choose certain features and abstain from others. If you do compile from source, please make sure it is discoverable by CMake or enter correct Prefix paths.

Algorithm with Recovery and custom metric

First, we must create the main component, i.e the Alignment loop. We know that we are supposed to minimize the error metric (which we will later use with a squared-root, which makes it RMSE) but we also have to account for Fitness.

$$\text{Fitness} = \frac{\text{No. of Source points with correspondence}}{\text{Total No. of Source points}}$$

A good alignment means both, Fitness >= 0.9 and low RMSE (depends on scale of the images).

The usual process of ICP goes like this :

(This might be overwhelming to look at, but if you look closely this just and expanded version of what we listed in the ICP Procedure section. Also, this will get clearer as we go through the code)

As you can see, a straightforward pipeline has no step for ‘Initial Alignment‘ and assumes that the scans roughly align. However, this version is a bit different as we will see now.

Phase 1: Preparation (Loading & Pre-processing)

Before we can align anything, we must load and prepare all our individual point cloud scans.

    o3d::utility::random::Seed(0);

    double voxel_size = 0.0015;
    double icp_thresh = voxel_size * 1.5;
    double normal_radius = voxel_size * 4.0;
    double final_normal_radius = voxel_size * 2.0;

    o3d::io::ReadPointCloudOption read_opts;
    std::string root_path = "../../Datasets/bunny/data/"; 
    // I wrote this temporarily, you can edit this to the correct path to the your dataset

    std::vector<std::string> file_paths = {
        "bun000.ply", "chin.ply", "bun045.ply", "bun090.ply",
        "bun180.ply", "ear_back.ply", "bun270.ply", "bun315.ply",
        "top2.ply", "top3.ply", 
    };
    // The order in which you read the point clouds does matter. 
    // This is one of the sequences which work decently. 
    // You can find more on the internet or find the best sequence by pairwise matching 
    // in a nested for loop by comparing the same error metric we do later for multi-image.

    std::vector<std::shared_ptr<o3d::geometry::PointCloud>> pcds_processed;

    for (auto& file_path : file_paths) {
        auto pcd = std::make_shared<o3d::geometry::PointCloud>();
        std::string path = root_path + file_path;

        std::cout << "Point Cloud read: " << path << std::endl;

        auto pcd_down = pcd->VoxelDownSample(voxel_size);

        pcd_down->EstimateNormals(
            o3d::geometry::KDTreeSearchParamHybrid(normal_radius, 30));
        pcd_down->OrientNormalsConsistentTangentPlane(30);

        pcds_processed.push_back(pcd_down);
    }

We define key values that control the entire pipeline, such as voxel_size (for downsampling) and icp_thresh (for alignment). Most of these can be defined in some relation to the voxel_size. This is helpful as we won’t have to change everything again and again if we change voxel_size .

As we know, point-to-plane uses normal distance from planes for calculating rmse, but other than that, FPFH too, uses surface normal to understand and estimate local geometry. Therefore the data processing will include :

Downsampling: This is critical for performance and for creating a uniform point density, which helps with the alignment algorithms.
Estimating and Orienting Normals: We calculate the surface normals for every point and make them point "outward" consistently.

Phase 2: Sequential Alignment & Merging

We now take our list of processed clouds and "stitch" them together, one by one. The process for this has been discussed before. The code shows that :

FPFH is used for initial alignment in the ComputeInitialAlignment method.
Recovery Attempts are being made in our version since we are using RunRegistrationWithRecovery.

We have added an additional ‘attempt of recovery’ for each point cloud. The idea is that if an alignment doesn’t meet the conditions (fitness and rmse), we will try again with a lenient initial_threshold. If this still results in a bad alignment, then we will recompute the initial guess from global model. If that doesn’t work, then we will redo the global model guess with more lenient initial_threshold.

This sure increases execution time, but that doesn’t matter for a small dataset of 10 images. In return, we get a very solid result for registration. In return, the pipeline is more robust, and recovers from failure to provide good results.

The rest of the process is exactly similar to the standard one which we have discussed before.

std::shared_ptr<o3d::geometry::PointCloud> AlignAndMergeSequence(
    const std::vector<std::shared_ptr<o3d::geometry::PointCloud>>& pointclouds,
    double icp_thresh, double voxel_size) {

    auto merged_pcd_full = std::make_shared<o3d::geometry::PointCloud>(*pointclouds[0]);
    auto merged_pcd = std::make_shared<o3d::geometry::PointCloud>(*pointclouds[0]);

    Eigen::Matrix4d global_transform = Eigen::Matrix4d::Identity();

    merged_pcd->EstimateNormals(o3d::geometry::KDTreeSearchParamHybrid(voxel_size * 4, 30));

    for (size_t i = 1; i < pointclouds.size(); ++i) {
        const auto& source = *pointclouds[i];
        const auto& target_local = *pointclouds[i-1];
        const auto& target_global = *merged_pcd;

        // Finding Initial Guess
        Eigen::Matrix4d initial_guess_transform;
        try {
            initial_guess_transform = ComputeInitialAlignment(source, target_local, voxel_size);
        } catch (...) {
            // if FPFH fails, we will use Identity matrix as fallback
            initial_guess_transform = Eigen::Matrix4d::Identity();
        }

        // This is the local alignment to refine the initial_guess_tansform
        auto reg_local = o3d::pipelines::registration::RegistrationICP(
            source, target_local, icp_thresh * 2.0, initial_guess_transform,
            o3d::pipelines::registration::TransformationEstimationPointToPlane(),
            o3d::pipelines::registration::ICPConvergenceCriteria(1e-7, 1e-7, 30));

        // we use the local alignment and the global_transform to get the initial_global_guess
        Eigen::Matrix4d initial_global_guess = global_transform * reg_local.transformation_;
        std::cout << "  Local alignment fitness: " << reg_local.fitness_ << std::endl;

        // The following blocks peform alignment against the global point cloud
        auto criteria = o3d::pipelines::registration::ICPConvergenceCriteria(1e-7, 1e-7, 100);
        if (i == pointclouds.size() - 1) {
             criteria = o3d::pipelines::registration::ICPConvergenceCriteria(1e-7, 1e-7, 200);
        }

        auto reg_global = RunRegistrationWithRecovery(
            source, target_global, initial_global_guess, icp_thresh, voxel_size, criteria
        );

        global_transform = reg_global.transformation_;

        std::cout << "Final Alignment | Fitness: " << reg_global.fitness_ 
                  << " | RMSE: " << reg_global.inlier_rmse_ << std::endl;

        auto source_transformed = std::make_shared<o3d::geometry::PointCloud>(source);
        source_transformed->Transform(global_transform);

        *merged_pcd_full += *source_transformed;

        merged_pcd = merged_pcd_full->VoxelDownSample(voxel_size);
        merged_pcd->EstimateNormals(o3d::geometry::KDTreeSearchParamHybrid(voxel_size * 4, 30));
        merged_pcd->OrientNormalsConsistentTangentPlane(30);
    }

    return merged_pcd_full;
}

But to understand the modifications further, it is important to look inside the ComputeInitialAlignment and RunRegistrationWithRecovery methods.

Compute Initial Alignment Method :

Eigen::Matrix4d ComputeInitialAlignment(
    const o3d::geometry::PointCloud &source,
    const o3d::geometry::PointCloud &target,
    double voxel_size,
    double max_corr_dist_multiplier) {

    double fpfh_radius = voxel_size * 10.0;
    auto source_fpfh = o3d::pipelines::registration::ComputeFPFHFeature(source, o3d::geometry::KDTreeSearchParamHybrid(fpfh_radius, 100));
    auto target_fpfh = o3d::pipelines::registration::ComputeFPFHFeature(target, o3d::geometry::KDTreeSearchParamHybrid(fpfh_radius, 100));


    auto checker_edge = o3d::pipelines::registration::CorrespondenceCheckerBasedOnEdgeLength(0.85);

    double max_correspondence_distance = voxel_size * max_corr_dist_multiplier;
    auto checker_dist = o3d::pipelines::registration::CorrespondenceCheckerBasedOnDistance(max_correspondence_distance);

    std::vector<std::reference_wrapper<const o3d::pipelines::registration::CorrespondenceChecker>> checkers;
    checkers.push_back(checker_edge);
    checkers.push_back(checker_dist);

    auto result = o3d::pipelines::registration::RegistrationRANSACBasedOnFeatureMatching(
        source, target, *source_fpfh, *target_fpfh, true, max_correspondence_distance,
        o3d::pipelines::registration::TransformationEstimationPointToPoint(false), 4, checkers,
        o3d::pipelines::registration::RANSACConvergenceCriteria(6000000, 0.999));

    return result.transformation_;
}

Since the FPFH feature requires neighbors, we use KDTreeSearchParamHybrid to maintain a balance between radius and maximum nearest neighbors (whichever comes first). The maximum_corr_dist_multiplier is 3.0 in my code (not shown cause it is used as a default value and thus set in the .hpp file). This helps RANSAC distinguish between inliers and outliers. This is something you can try experimenting with. Other than this, there are two notable things in this method :

Edge Checker
- It examines two pairs of correspondences (e.g., src_A ↔ tgt_A and src_B ↔ tgt_B), measuring the distances between source points (dist_source) and target points (dist_target).
- Basically, it ensures that dist_source ≥ 85% of dist_target AND dist_target ≥ 85% of dist_source.
- Purpose: Validates geometric consistency between edge pairs by comparing distances between corresponding points.
Distance Checker
- This one examines only a single pair of corresponding points (e.g., src_A ↔ tgt_A), then, applies the current RANSAC transformation to the source point, and computes the Euclidean distance between the transformed source point and its target counterpart.
- The validation criteria is if the calculated distance falls below a specified threshold (typically voxel_size × 1.5).
- Purpose: Serves as an inlier verification mainly, by testing how well a transformation aligns individual point pairs. (That’s why it is used after the edge checker).

Run Registration With Recovery Method

o3d::pipelines::registration::RegistrationResult RunRegistrationWithRecovery(
    const o3d::geometry::PointCloud &source,
    const o3d::geometry::PointCloud &target,
    const Eigen::Matrix4d &initial_guess,
    double initial_thresh,
    double voxel_size,
    const o3d::pipelines::registration::ICPConvergenceCriteria& initial_criteria) {

    auto is_good_alignment = [&](const auto& res) {
        return res.fitness_ > 0.90 && res.inlier_rmse_ < voxel_size * 1.2;
    };

    // A smarter scoring function. We want high fitness AND low RMSE.
    // A simple ratio of fitness/rmse prioritizes this.
    auto get_score = [](const auto& res) {
        if (res.inlier_rmse_ < 1e-9) return std::numeric_limits<double>::max();
        return res.fitness_ / res.inlier_rmse_;
    };

    auto reg_result = o3d::pipelines::registration::RegistrationICP(
        source, target, initial_thresh, initial_guess,
        o3d::pipelines::registration::TransformationEstimationPointToPlane(),
        initial_criteria);

    if (is_good_alignment(reg_result)) {
        std::cout << "  Initial alignment successful." << std::endl;
        return reg_result;
    }

    std::cout << "  Initial alignment failed (fitness: " << reg_result.fitness_
              << ", RMSE: " << reg_result.inlier_rmse_ << "). Starting recovery..." << std::endl;

    auto best_result = reg_result;
    double best_score = get_score(best_result);

    std::vector<double> thresholds = { initial_thresh * 2.0, initial_thresh * 2.5, initial_thresh * 3.0 };
    for (double thresh : thresholds) {
        std::cout << "  [Recovery] Retrying with lenient threshold: " << thresh << std::endl;
        reg_result = o3d::pipelines::registration::RegistrationICP(
            source, target, thresh, initial_guess,
            o3d::pipelines::registration::TransformationEstimationPointToPlane(), initial_criteria);

        if (get_score(reg_result) > best_score) {
            best_result = reg_result;
            best_score = get_score(reg_result);
        }
        if (is_good_alignment(reg_result)) {
            std::cout << "  Recovery successful." << std::endl;
            return reg_result;
        }
    }

    std::cout << "  [Recovery] Retrying with new initial guess from global model..." << std::endl;
    try {
        Eigen::Matrix4d new_initial_guess = ComputeInitialAlignment(source, target, voxel_size);
        reg_result = o3d::pipelines::registration::RegistrationICP(
            source, target, initial_thresh, new_initial_guess,
            o3d::pipelines::registration::TransformationEstimationPointToPlane(), initial_criteria);

        if (get_score(reg_result) > best_score) {
            best_result = reg_result;
            best_score = get_score(reg_result);
        }
        if (is_good_alignment(reg_result)) {
            std::cout << "  Recovery successful." << std::endl;
            return reg_result;
        }
    } catch (...) {
        std::cout << "  [Recovery] Global FPFH failed." << std::endl;
    }

    // Final, aggressive "last resort" attempt.
    std::cout << "  [Recovery] Last resort: Aggressive Global FPFH with very lenient ICP..." << std::endl;
    try {
        Eigen::Matrix4d new_initial_guess = ComputeInitialAlignment(source, target, voxel_size, 5.0);
        reg_result = o3d::pipelines::registration::RegistrationICP(
            source, target, initial_thresh * 4.0, new_initial_guess,
            o3d::pipelines::registration::TransformationEstimationPointToPlane(), initial_criteria);

        if (get_score(reg_result) > best_score) {
            best_result = reg_result;
        }
        if(is_good_alignment(reg_result)){
             std::cout << "  Last resort recovery successful!" << std::endl;
             return reg_result;
        }
    } catch (...) {
        std::cout << "  [Recovery] Last resort FPFH failed." << std::endl;
    }


    std::cout << "  All recovery attempts failed. Using best low-error result found." << std::endl;
    return best_result;
}

This RunRegistrationWithRecovery function is designed with keeping robustness as the main goal.:
1. Try 1 (Standard): It first attempts a standard RegistrationICP (Point-to-Plane) with the provided guess and a tight threshold (icp_thresh). If fitness is high and rmse is low, it succeeds and returns.
2. Try 2 (Lenient): If it fails, it retries ICP several times with increasingly larger thresholds (e.g., icp_thresh * 2.0). This can help "catch" an alignment that is slightly too far away.
3. Try 3 (New Global Guess): If it still fails, it discards the local guess and runs ComputeInitialAlignment again, this time between the source and the entire target_global. It then retries ICP with this new guess.
4. Try 4 (Last Resort): As a final attempt, it runs an aggressive global alignment with a very lenient correspondence distance and retries ICP with a very large threshold.

This was it. As said, we are trading robustness for speed, but the effects are minimal for a small dataset. Also, the speed can be increased by increasing the voxel_size (since it affects the net quantity of points to process due to voxel-downsampling). My tests have shown that voxel_size 0.0015 to 0.0035 can work well. Pushing beyond distorts the result and may even lead to entirely incorrect output.

Below is the github link for the full code (which also contains poisson surface reconstruction on the same output) so you can tinker with other numbers and try on different datasets.

Codeberg: https://codeberg.org/SilverGrace26/surface-reconstruction