Modelling Camera Residual Terms Using Reprojection Error and Photometric Error. June 26, 2017. Pranav Ganti.


1 Modelling Camera Residual Terms Using Reprojection Error and Photometric Error. June 26, 2017. Pranav Ganti

2 Outline: Overview; Geometry; Cameras – Part 1; Multi-view Geometry; Reprojection Error; Cameras – Part 2; Photometric Error; Application (SVO)

3 Overview. A residual is the difference between an observed value and an estimated value. The cost (loss) function is the function to be minimized, and is generally a function of the residual. Camera residuals: the formulation depends on indirect vs. direct methods, but in either case it is a value to be minimized, from which the camera pose can be estimated.

4 Geometry | Euclidean Space. Euclidean geometry describes lines, circles, and angles. Issue: how do we represent points at infinity?

5 Geometry | Projective Space. Projective space = Euclidean space + ideal points, where ideal points are points at infinity. Now, two lines always meet in a point! Projective space is derived from Euclidean space by adding points at infinity, and these can be described using homogeneous coordinates.

6 Geometry | Homogeneous Coordinates. Homogeneous coordinates in R^n are written as an (n+1)-vector. R^2: (x, y, 1)^T; R^3: (x, y, z, 1)^T. Ideal points: (x, y, ..., 0)^T. What about scaled points? (kx, ky, k)^T is in the same equivalence class as (x, y, 1)^T (we'll revisit why later!). Euclidean space can be extended to projective space using homogeneous vectors.
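As a concrete illustration (not from the slides), here is a minimal NumPy sketch of converting between Euclidean and homogeneous coordinates, showing that a scaled homogeneous vector represents the same projective point:

```python
import numpy as np

def to_homogeneous(x):
    """Append a 1 to a Euclidean point, e.g. R^2 -> P^2."""
    return np.append(np.asarray(x, dtype=float), 1.0)

def from_homogeneous(xh):
    """Divide out the last coordinate to recover the Euclidean point."""
    xh = np.asarray(xh, dtype=float)
    if np.isclose(xh[-1], 0.0):
        raise ValueError("ideal point (point at infinity): no Euclidean equivalent")
    return xh[:-1] / xh[-1]

# (kx, ky, k)^T and (x, y, 1)^T are the same projective point:
p = to_homogeneous([2.0, 3.0])   # (2, 3, 1)^T
q = 5.0 * p                      # (10, 15, 5)^T
assert np.allclose(from_homogeneous(p), from_homogeneous(q))
```

Note that an ideal point (last coordinate 0) has no Euclidean counterpart, which is exactly why projective space can represent points at infinity.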

7 Geometry | Transformations. Euclidean transform: rotation + translation. Affine transform: rotation + translation + stretching (linear scaling). For both Euclidean and affine transforms, points at infinity remain at infinity. What about a projective transform?

8 Geometry | Projective Transformations. What properties of an object are preserved? Shape? Angles? Lengths? Distances? Straightness? A projective transformation is any mapping that preserves straight lines.

9 Geometry | Projective Transformations (cont.). A projective transformation is a mapping of the homogeneous coordinates. Ideal points are not preserved: points at infinity are mapped to arbitrary points. For computer vision, projective space is convenient: treat 3D space as P^3 instead of R^3, and images as P^2. This is useful for practical applications, even though we know points at infinity are our own construct.

10 Cameras, Part 1 | Pinhole Camera. Also known as the "camera obscura", the first type of camera: light passes through a small opening, and the image is projected, inverted, on the opposite side.

11 Cameras, Part 1 | Central Projection. Cameras are a map between the 3D world and a 2D image; the projection loses one dimension. The map is a central projection: a ray from a 3D point passes through the camera's centre of projection (COP) and intersects the image plane. If the 3D structure is planar, there is no drop in dimension.

12 Cameras, Part 1 | Central Projection

13 Cameras, Part 1 | Central Projection. For convenience, we can place the image plane in front of the COP, at a distance equal to the focal length f.

14 Cameras, Part 1 | Central Projection. In essence, central projection is just a mapping P^3 → P^2. The camera matrix P is a 3x4 matrix of rank 3: (x, y, w)^T = P (X, Y, Z, T)^T, where (x, y, w)^T are homogeneous coordinates in image space (P^2), and (X, Y, Z, T)^T are homogeneous coordinates of the 3D world (P^3).
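The P^3 → P^2 mapping can be sketched numerically. The focal length and point values below are illustrative; the camera matrix used is the simplest pinhole form derived on the following slides:

```python
import numpy as np

f = 2.0  # focal length (assumed value for illustration)
# P = diag(f, f, 1) [I | 0]: the simplest pinhole camera matrix
P = np.diag([f, f, 1.0]) @ np.hstack([np.eye(3), np.zeros((3, 1))])

X = np.array([1.0, 2.0, 4.0, 1.0])   # homogeneous world point (X, Y, Z, 1)^T
x = P @ X                            # homogeneous image point (fX, fY, Z)^T
u = x[:2] / x[2]                     # dehomogenize: (fX/Z, fY/Z)
# u matches the similar-triangles result (fX/Z, fY/Z)
```

Dividing by the last coordinate is the same equivalence-class normalization introduced with homogeneous coordinates earlier.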

15 Cameras, Part 1 | Rays and Points. A ray passing through the COP projects to a single point in the image; therefore, all points on the ray can be considered equal. Rays are image points, and we can represent rays as homogeneous coordinates. Calibration is needed to express the Euclidean geometry relating image and world: with a calibrated camera, we can back-project two image points and then determine the angle between the two rays.

16 Cameras, Part 1 | Matrix Derivation. Let's derive the camera matrix. Assumptions: the centre of projection is the origin of R^3, and we use the pinhole camera model. By similar triangles: (X, Y, Z)^T → (fX/Z, fY/Z, f)^T.

17 Cameras, Part 1 | Pinhole. Recall: the image plane is located at a distance equal to the focal length. The world point (X, Y, Z, 1)^T maps to (fX/Z, fY/Z, f)^T; the COP is at (0, 0, 0)^T.

18 Cameras, Part 1 | Pinhole. Mapping from P^3 to P^2 using similar triangles: (X, Y, Z, 1)^T → (fX/Z, fY/Z, f)^T, with the COP at (0, 0, 0)^T.

19 Cameras, Part 1 | Camera Matrix. With the Euclidean world frame centred at the COP, central projection becomes a linear map between homogeneous coordinates, which can be written as: (X, Y, Z, 1)^T → (fX, fY, Z)^T = diag(f, f, 1) [I | 0] (X, Y, Z, 1)^T.

20 Cameras, Part 1 | Camera Matrix. The previous equation assumes the image coordinate origin is at the principal point. A more generic mapping, with principal point (p_x, p_y), is: (X, Y, Z, 1)^T → (fX + Z p_x, fY + Z p_y, Z)^T.

21 Cameras, Part 1 | Camera Matrix. K is the camera calibration matrix, K = [f, 0, p_x; 0, f, p_y; 0, 0, 1]; we can also add a skew parameter s in the (1, 2) entry. We can then express x = K [I | 0] x_cam, where x_cam = (X, Y, Z, 1)^T is expressed in a coordinate frame at the COP.

22 Cameras, Part 1 | Camera Matrix. The world coordinate frame is not always centred at the COP; for example, a moving camera! The two coordinate frames are related through a rotation and a translation.

23 Cameras, Part 1 | Camera Matrix. The equation can now be expressed as x = K [R | t] X, i.e. the camera matrix is P = K [R | t].
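Putting the pieces together, here is a sketch of the full projection x = K [R | t] X; the intrinsics and pose values below are made up for illustration:

```python
import numpy as np

# Intrinsics K: focal length f, principal point (px, py), zero skew
f, px, py = 500.0, 320.0, 240.0
K = np.array([[f,   0.0, px],
              [0.0, f,   py],
              [0.0, 0.0, 1.0]])

# Extrinsics: world-to-camera rotation R and translation t
# (here: no rotation, camera offset by 1 unit along world x)
R = np.eye(3)
t = np.array([[-1.0], [0.0], [0.0]])

P = K @ np.hstack([R, t])            # 3x4 camera matrix P = K [R | t]

Xw = np.array([2.0, 0.0, 5.0, 1.0])  # homogeneous world point
x = P @ Xw                           # homogeneous image point
u = x[:2] / x[2]                     # pixel coordinates
```

With these values the point sits at depth 5 in the camera frame, one unit off the optical axis, so u lands 100 pixels right of the principal point.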

24 Cameras, Part 1 | Projections. Forward projection maps a point in 3D space to an image point: x = P X. Back projection: from a point x in an image, determine the set of 3D points that map to x. This set is a ray in space passing through the COP. How can we obtain the back projection?

25 Cameras, Part 1 | Back Projection. The null space of P is the camera centre C, i.e. P C = 0. We know two points on the ray: the COP C, and the point P+ x, where P+ = P^T (P P^T)^{-1} is the pseudo-inverse. Why is P+ x a second point on the ray? It projects to x: P (P+ x) = I x = x. The ray is then the line connecting these two points.
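The back-projection recipe can be checked numerically; the camera matrix below is illustrative:

```python
import numpy as np

# An illustrative camera matrix (intrinsics times [R | t])
K = np.diag([500.0, 500.0, 1.0])
P = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [1.0]])])

# Camera centre C: the (right) null space of P, since P C = 0
_, _, Vt = np.linalg.svd(P)
C = Vt[-1]

# Pseudo-inverse P+ = P^T (P P^T)^{-1}
P_pinv = P.T @ np.linalg.inv(P @ P.T)

x = np.array([100.0, 50.0, 1.0])     # homogeneous image point
X0 = P_pinv @ x                      # one 3D point on the back-projected ray

assert np.allclose(P @ C, 0.0, atol=1e-9)          # C spans the null space
x_back = P @ X0
assert np.allclose(x_back / x_back[2], x / x[2])   # P (P+ x) ~ x
# The full ray: X(lambda) = P+ x + lambda * C
```

Because P is 3x4 with rank 3, its null space is one-dimensional, which is why the SVD's last right-singular vector recovers the camera centre.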

26 Cameras, Part 1 | Lenses. The pinhole camera is an idealization, not a true representation of a real camera: we need to correct for lens distortions, so that images look as if they were taken with a pinhole camera. Distortion can be radial or tangential.

27 Cameras, Part 1 | Lens Distortion. Barrel distortion; pincushion distortion.

28 Cameras, Part 1 | Lens Correction. Lens distortion occurs during the initial projection onto the image plane. If (x, y) are the ideal coordinates and (x_d, y_d) the actual (distorted) ones, then (x_d, y_d) = L(r) (x, y), where r is the Euclidean distance from the centre of distortion and L(r) is the distortion factor, which can be solved for through calibration.
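A common concrete choice for L(r) is a polynomial in r^2; the sketch below assumes that form (the coefficients are made up, not from the slides):

```python
import numpy as np

def distort(x, y, k1, k2):
    """Apply a polynomial radial distortion model
    L(r) = 1 + k1*r^2 + k2*r^4, with (x, y) in normalized image
    coordinates centred on the principal point."""
    r2 = x * x + y * y
    L = 1.0 + k1 * r2 + k2 * r2 * r2
    return L * x, L * y

# k1 < 0 pulls points toward the centre (barrel distortion);
# k1 > 0 pushes them outward (pincushion). Illustrative coefficients;
# in practice they come from calibration.
xd, yd = distort(0.5, 0.0, k1=-0.2, k2=0.0)
```

Undistortion (recovering the ideal (x, y) from (x_d, y_d)) has no closed form for this model and is typically done by iterative inversion.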

29 Multi-view Geometry | Epipolar Geometry. Motivation: to search for corresponding points in stereo matching. Baseline: the line joining the camera centres. Epipole: the point of intersection between the baseline and the image plane. Epipolar line: the intersection of an epipolar plane with the image plane. Epipolar plane: a plane containing the baseline.

30 Multi-view Geometry | Epipolar constraints

31 Multi-view Geometry | Fundamental Matrix. The fundamental matrix F is the algebraic representation of epipolar geometry: it maps points in P^2 to their epipolar lines in the other image. Two steps: map the point x to some matching point x' on its epipolar line; obtain l' by joining x' to the epipole e'.

32 Multi-view Geometry | Fundamental Matrix. Properties: Correspondence: x'^T F x = 0. Transpose: if F is the fundamental matrix for the camera pair (P, P'), then F^T is the fundamental matrix for the pair (P', P). Epipolar lines: l' = F x, l = F^T x'. Epipoles: F e = 0, e'^T F = 0. Methods to solve: 7-point algorithm, 8-point algorithm, RANSAC...
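These properties can be verified on a toy rectified stereo pair: with identity intrinsics and a baseline t along x, the fundamental matrix reduces to the skew-symmetric matrix of t. This setup is an assumption chosen for illustration:

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x so that [v]_x u = v x u."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Rectified pair, identity intrinsics, baseline t along x: F = [t]_x
t = np.array([1.0, 0.0, 0.0])
F = skew(t)

x = np.array([3.0, 2.0, 1.0])        # point in the left image (homogeneous)
x_prime = np.array([2.5, 2.0, 1.0])  # its match: same row, shifted by disparity

# Correspondence condition x'^T F x = 0
assert np.isclose(x_prime @ F @ x, 0.0)

# Epipolar line in the right image: l' = F x — here the row y = 2
l_prime = F @ x
assert np.isclose(l_prime @ x_prime, 0.0)  # x' lies on l'
```

For rectified stereo the epipolar lines are horizontal scan lines, which is exactly why rectification makes the correspondence search one-dimensional.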

33 Multi-view Geometry | Stereo Cameras

34 Reprojection Error. The summed squared Euclidean distance d(x, P X)^2 + d(x', P' X)^2 between the projections of the 3D point X and the measured image points x, x' in the two images.

35 Reprojection Error | Applications. Fundamental matrix: the MLE of F (assuming Gaussian noise) minimizes the reprojection error summed over all correspondences, where the ideal points are obtained from the projections x_hat = P X_hat. Both P and X_hat can be modified to minimize this error. Recall: P = K [R | t], and R, t represent the camera pose in the world frame! Bundle adjustment is similar, except that the intrinsic parameters can also be modified.
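A minimal sketch of computing the reprojection error for one point seen in two cameras; the camera matrices and noise values are illustrative:

```python
import numpy as np

def reproject(P, X):
    """Project a homogeneous 3D point and dehomogenize."""
    x = P @ X
    return x[:2] / x[2]

def reprojection_error(P1, P2, X, x1_meas, x2_meas):
    """Summed squared Euclidean distance between the projections of X
    and the measured image points in both views."""
    e1 = reproject(P1, X) - x1_meas
    e2 = reproject(P2, X) - x2_meas
    return e1 @ e1 + e2 @ e2

# Two toy cameras (identity intrinsics, second shifted along x)
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X = np.array([1.0, 1.0, 4.0, 1.0])
x1 = reproject(P1, X) + np.array([0.01, -0.02])  # noisy measurements
x2 = reproject(P2, X) + np.array([-0.01, 0.0])

err = reprojection_error(P1, P2, X, x1, x2)
# Bundle adjustment minimizes this sum over X, R, t (and optionally K).
```

In a full pipeline this scalar is summed over all points and views, and a nonlinear least-squares solver adjusts the poses and structure jointly.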

36 Cameras, Part 2 | Photosites. Camera sensors consist of photosites, each of which quantifies the amount of light collected; the digitized information is a pixel. Common sensor types: CCD (charge-coupled device) and CMOS (complementary metal-oxide-semiconductor).

37 Cameras, Part 2 | Shutter. Rolling shutter; soft global shutter; hard global shutter.

38 Cameras, Part 2 | Intensity Image. The resulting information from the image capture is an intensity image, which allows use of the entire image, as opposed to just keypoints. This becomes dense, so some direct methods only use patches of interest. The intensity image is defined as a map I: Ω → R, where Ω ⊂ R^2 is the image domain. Recall: previously, we treated images via the projection R^3 → R^2.

39 Photometric Error | SVO Notation. Notation (from SVO): I_{k-1}, I_k: intensity images; T_{k,k-1}: frame transform; u: image coordinate; p: 3D point; d_u: depth at u; pi: R^3 → R^2, the camera projection model; pi^{-1}: its inverse; k: camera frame of reference, or timestep k; xi: twist coordinates, in se(3). Relationships: u = pi(p) and p = pi^{-1}(u, d_u).

40 Photometric Error | Principles. Photometric error: the intensity difference between pixels observing the same 3D point in two scenes.

41 Photometric Error | Principles. The intensity residual can be computed by back-projecting a 2D point from the previous image and reprojecting it into the current camera view: delta_I(xi, u) = I_k(pi(T(xi) pi^{-1}(u, d_u))) - I_{k-1}(u). We then look to minimize the negative log-likelihood of these residuals over the relative camera pose.
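The per-pixel residual can be sketched as follows. This is a simplified toy version with made-up values (nearest-pixel lookup instead of bilinear interpolation, no patches), not SVO's actual implementation:

```python
import numpy as np

def intensity_residual(I_prev, I_curr, u, d_u, K, K_inv, T):
    """Photometric residual for one pixel u: back-project u with its
    depth d_u, transform by the relative pose T (4x4 homogeneous),
    reproject, and compare intensities."""
    # Back-project: p = d_u * K^{-1} (u, v, 1)^T
    p = d_u * (K_inv @ np.array([u[0], u[1], 1.0]))
    # Transform into the current frame and reproject
    p_curr = (T @ np.append(p, 1.0))[:3]
    uv = K @ p_curr
    uc, vc = int(round(uv[0] / uv[2])), int(round(uv[1] / uv[2]))
    # Residual: I_k at the reprojected pixel minus I_{k-1} at u
    return float(I_curr[vc, uc]) - float(I_prev[u[1], u[0]])

# Tiny synthetic example: identity relative pose, so the residual is
# just the intensity difference at the same pixel.
K = np.eye(3); K_inv = np.linalg.inv(K)
I_prev = np.full((4, 4), 10.0)
I_curr = np.full((4, 4), 12.0)
r = intensity_residual(I_prev, I_curr, (1, 1), d_u=2.0,
                       K=K, K_inv=K_inv, T=np.eye(4))
```

Summing r^2 over many pixels (or patches) gives the objective that is minimized over the pose in the next slide.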

42 Photometric Error | Solving. The intensity residuals are assumed to be normally distributed, so minimizing the negative log-likelihood reduces to least squares. The equation is nonlinear in T_{k,k-1} and can be solved via the Gauss-Newton algorithm, with incremental update T(xi) T_hat_{k,k-1}, where T_hat_{k,k-1} is the current estimate of the relative transformation and xi ∈ se(3).
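Gauss-Newton itself can be illustrated on a toy one-parameter least-squares problem; the model and data here are made up, but the iterate-linearize-solve loop has the same structure used for the pose, where the Jacobian would instead be taken with respect to the twist xi:

```python
import numpy as np

# Fit theta in the model y = exp(theta * x) to data, by repeatedly
# linearizing the residuals r_i(theta) = exp(theta * x_i) - y_i
# at the current estimate and solving the resulting linear system.
x = np.array([0.0, 0.5, 1.0, 1.5])
theta_true = 0.7
y = np.exp(theta_true * x)           # noise-free synthetic data

theta = 0.0                          # initial guess
for _ in range(10):
    r = np.exp(theta * x) - y                       # residual vector
    J = (x * np.exp(theta * x))[:, None]            # Jacobian dr/dtheta
    delta = np.linalg.lstsq(J, -r, rcond=None)[0]   # linearized step
    theta += delta[0]                               # incremental update

assert abs(theta - theta_true) < 1e-6
```

The incremental update theta += delta mirrors the pose update T(xi) T_hat_{k,k-1}: each iteration solves a linear problem for a small correction, then composes it onto the current estimate.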

43 Camera Residual Terms. Reprojection error: a binary factor between a feature (landmark) and a camera pose. Photometric error: a unary factor (at least in SVO), since there are no feature locations to estimate.

44 Application | SVO. Applications: reprojection error → indirect VO/SLAM; photometric error → direct VO/SLAM. SVO (Semi-direct Visual Odometry) takes advantage of both: an initial pose estimate using the direct method, followed by further refinement using indirect methods on keyframes.

45 Application | SVO. Indirect methods extract features, match them, and then recover the camera pose (+ structure) using epipolar geometry and reprojection error. Pros: robust matches, even with high inter-image motion. Cons: extraction, matching, and correspondence can be quite costly. Direct methods estimate the camera pose (+ structure) directly from intensity values and image gradients. Pros: can use all the information in the image; more robust to motion blur and defocus; can outperform indirect methods. Cons: can also be costly, due to density.

46 Application | SVO. SVO steps: initial pose estimate through minimizing the photometric error; relaxation through feature alignment; further refinement through minimizing the reprojection error. In parallel (mapping thread): determine keyframes, extract features, and estimate depth through the projection model.

47 Application | SVO Results:

48 Application | SVO 2.0 SVO 2.0:

49 References
R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
C. Forster, M. Pizzoli, and D. Scaramuzza, "SVO: Fast semi-direct monocular visual odometry," in Robotics and Automation (ICRA), 2014 IEEE International Conference on, pp. 15-22, IEEE, 2014.
C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza, "SVO: Semidirect visual odometry for monocular and multicamera systems," IEEE Transactions on Robotics, 2016.

50 Image References
https://i.stack.imgur.com/SitTF.png
https://en.wikipedia.org/wiki/Errors_and_residuals
https://en.wikipedia.org/wiki/Euclidean_space
https://en.wikipedia.org/wiki/Distortion_(optics)
https://en.wikipedia.org/wiki/Camera_obscura