I am using OpenCV for an optical measurement system. I need to carry out a perspective transformation between two images, captured by a digital camera. In the field of view of the camera I placed a set of markers (which lie in a common plane), which I use as corresponding points in both images. Using the markers' positions I can calculate the homography matrix. The problem is, that the measured object, whose images I actually want to transform is positioned in a small distance from the markers and in parallel to the markers' plane. I can measure this distance.
My question is, how to take that distance into account when calculating the homography matrix, which is necessary to perform the perspective transformation.
In my solution it is a strong requirement not to use the measured object points for calculation of homography (and that is why I need other markers in the field of view).
Please let me know if the description is not precise.

Presented in the figure is the exemplary image.
The red rectangle is the measured object. It is physically placed in a small distance behind the circular markers.
I capture images of the object from different camera's positions. The measured object can deform between each acquisition. Using circular markers, I want to transform the object's image to the same coordinates. I can measure the distance between object and markers but I do not know, how should I modify the homography matrix in order to work on the measured object (instead of the markers).
This question is quite old, but it is interesting and it might be useful to someone.
First, here is how I understood the problem presented in the question:
You have two images I1 and I2 acquired by the same digital camera at two different positions. These images both show a set of markers which all lie in a common plane pm. There is also a measured object, whose visible surface lies in a plane po parallel to the marker's plane but with a small offset. You computed the homography Hm12 mapping the markers positions in I1 to the corresponding markers positions in I2 and you measured the offset dm-o between the planes po and pm. From that, you would like to calculate the homography Ho12 mapping points on the measured object in I1 to the corresponding points in I2.
A few remarks on this problem:
First, notice that an homography is a relation between image points, whereas the distance between the markers' plane and the object's plane is a distance in world coordinates. Using the latter to infer something about the former requires to have a metric estimation of the camera poses, i.e. you need to determine the euclidian and up-to-scale relative position & orientation of the camera for each of the two images. The euclidian requirement implies that the digital camera must be calibrated, which should not be a problem for an "optical measurement system". The up-to-scale requirement implies that the true 3D distance between two given 3D points must be known. For instance, you need to know the true distance l0 between two arbitrary markers.
Since we only need the relative pose of the camera for each image, we may choose to use a 3D coordinate system centered and aligned with the coordinate system of the camera for I1. Hence, we will denote the projection matrix for I1 by P1 = K1 * [ I | 0 ]. Then, we denote the projection matrix for I2 (in the same 3D coordinate system) by P2 = K2 * [ R2 | t2 ]. We will also denote by D1 and D2 the coefficients modeling lens distortion respectively for I1 and I2.
As a single digital camera acquired both I1 and I2, you may assume that K1 = K2 = K and D1 = D2 = D. However, if I1 and I2 were acquired with a long delay between the acquisitions (or with a different zoom, etc), it will be more accurate to consider that two different camera matrices and two sets of distortion coefficients are involved.
Here is how you could approach such a problem:
The steps in order to estimate P1 and P2 are as follows:
Estimate K1, K2 and D1, D2 via calibration of the digital camera
Use D1 and D2 to correct images I1 and I2 for lens distortion, then determine the marker positions in the corrected images
Compute the fundamental matrix F12 (mapping points in I1 to epilines in I2) from the corresponding markers positions and infer the essential matrix E12 = K2T * F12 * K1
Infer R2 and t2 from E12 and one point correspondence (see this answer to a related question). At this point, you have an affine estimation of the camera poses, but not an up-to-scale one since t2 has unit norm.
Use the measured distance l0 between two arbitrary markers to infer the correct norm for t2.
For the best accuracy, you may refine P1 and P2 using a bundle adjustment, with K1 and ||t2|| fixed, based on the corresponding marker positions in I1 and I2.
At this point, you have an accurate metric estimation of the camera poses P1 = K1 * [ I | 0 ] and P2 = K2 * [ R2 | t2 ]. Now, the steps to estimate Ho12 are as follows:
Use D1 and D2 to correct images I1 and I2 for lens distortion, then determine the marker positions in the corrected images (same as 2. above, no need to re-do that) and estimate Hm12 from these corresponding positions
Compute the 3x1 vector v describing the markers' plane pm by solving this linear equation: Z * Hm12 = K2 * ( R2 - t2 * vT ) * K1-1 (see HZ00 chapter 13, result 13.5 and equation 13.2 for a reference on that), where Z is a scaling factor. Infer the distance to origin dm = ||v|| and the normal n = v / ||v||, which describe the markers' plane pm in 3D.
Since the object plane po is parallel to pm, they have the same normal n. Hence, you can infer the distance to origin do for po from the distance to origin dm for pm and from the measured plane offset dm-o, as follows: do = dm ± dm-o (the sign depends of the relative position of the planes: positive if pm is closer to the camera for I1 than po, negative otherwise).
From n and do describing the object plane in 3D, infer the homography Ho12 = K2 * ( R2 - t2 * nT / do ) * K1-1 (see HZ00 chapter 13, equation 13.2)
The homography Ho12 maps points on the measured object in I1 to the corresponding points in I2, where both I1 and I2 are assumed to be corrected for lens distortion. If you need to map points from and to the original distorted image, don't forget to use the distortion coefficients D1 and D2 to transform the input and output points of Ho12.
The reference I used:
[HZ00] "Multiple view geometry for computer vision", by R.Hartley and A.Zisserman, 2000.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With