Robotics/Sensors/Computer Vision

A digitally acquired image comes in the form of a matrix of vectors. Each of these vectors is called a pixel and represents a specific color. If the matrix is displayed with each cell filled in with the corresponding color, it creates a picture of the scene from the point of view of the camera.
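As a minimal sketch of this representation, the snippet below stores a tiny image as a matrix of RGB vectors and collapses each vector to a single brightness value. The sample pixel values, the helper names `dimensions` and `to_gray`, and the plain channel average are illustrative assumptions, not part of any camera API.

```python
# A tiny 2x3 RGB image stored as a matrix (list of rows) of pixel vectors.
# Each pixel is an (R, G, B) tuple with channel values in 0..255 (assumed layout).
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]


def dimensions(img):
    """Return (rows, columns) of the pixel matrix."""
    return len(img), len(img[0])


def to_gray(img):
    """Collapse each colour vector to one brightness value (plain channel average)."""
    return [[sum(px) // len(px) for px in row] for row in img]


rows, cols = dimensions(image)
gray = to_gray(image)
```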

Camera Calibration

Cameras are devices that convert 2-d or 3-d reality into 2-d representations. While post-processing can re-create 3-d representations from the initial 2-d images, the camera output itself is a projection of reality onto a 2-d plane. Because light is the projection mechanism, anything that alters the path or character of the light rays between the object and the sensor plane will affect the fidelity of that initial 2-d representation.

Cameras are not perfect devices. Because they are built with real-world components having real-world characteristics, their ability to represent reality is limited by the physical properties of their constituent parts. Broadly speaking, there are three kinds of distortions that can occur:

Geometric – Spatial shifting of object representations on the sensor plane due to the design and characteristics of the optical system.
Radiometric – Errors in the reported “brightness values” recorded on each pixel due to variations in pixel sensitivity, occurrence of “hot” or dead pixels, and varying pixel full well capacity.
Spectral – Errors in the reported “brightness values” due to the varying response of the sensor to different wavelengths of light.

The focus of this chapter is on Geometric Distortion – its causes and remedies.

Exterior Orientation

Exterior Orientation is defined as the position (x,y,z) and angle (tip, tilt, yaw) of the camera relative to the object scene. Even with a perfect camera, distortions due to the exterior orientation will be introduced. These can easily be removed by knowing these six elements of exterior orientation and employing techniques of solid analytical geometry. Solving for unknown exterior orientation elements is covered under the topic of photogrammetry and is generally considered part of the calibration of the entire camera system (which includes its object scene environment). Camera calibration, per se, usually refers to correcting for a camera's interior orientation.
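The geometry involved can be sketched as a rigid-body transform: once the six exterior orientation elements are known, a world point can be re-expressed in the camera's own frame. The function names, the Z-Y-X rotation order, and the convention that the rotation maps camera axes into world axes are assumptions made for this illustration; photogrammetry texts use several different conventions.

```python
import math


def rotation_matrix(rx, ry, rz):
    """R = Rz * Ry * Rx, angles in radians. The rotation order is an assumed convention."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def world_to_camera(point, cam_pos, angles):
    """Express a world point in the camera frame, given camera position and angles."""
    R = rotation_matrix(*angles)
    d = [p - c for p, c in zip(point, cam_pos)]
    # R rotates camera axes into world axes (assumption), so its transpose
    # takes world-frame offsets back into the camera frame.
    return [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]
```

With all three angles at zero, the result is simply the offset of the point from the camera position.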

Interior Orientation

Interior Orientation is defined as the relative orientation and characteristics of components within the camera proper. These include:

Calibrated focal length – the distance between the rear nodal point of the lens and the image plane when an object at infinite distance is brought to precise focus on the image plane.
Optical properties of the lens system – the ability of the lens to alter normal “pin hole” geometry by clever optical designs. For example, telecentric lenses behave like telephoto lenses in their ability to “compress depth” and eliminate scale errors due to perspective, yet are designed for near-range applications.
Principal point – where the optical axis intersects the sensor plane (not always in the center).
Angle of the optical axis to the sensor plane. Normally, the optical axis is perpendicular to the sensor plane. If it is not, the resulting image will have a characteristic distortion. There are times, however, when this angle is deliberately altered to achieve some other objective. For example, some cameras are capable of employing the “Scheimpflug condition” - whereby the image plane, object plane, and lens plane all intersect upon a common line. Assuming the lens law is observed, this configuration ensures that all points on the object plane will be in focus, even though it is not perpendicular to the optical axis. Although this configuration preserves focus, it introduces additional geometric distortions.
Flatness of the sensor plane – more of an issue with analog film than modern digital sensors.
Lens Distortion – properties of a lens that alter the ideal “pin hole” model. Lens distortion and lens / sensor plane orientation are typically the largest contributors to geometric distortion within interior orientation.

Lens Distortion

Lens distortion causes image points to be displaced from their ideal “pin hole model” locations on the sensor plane. These displacements can be further described as:

Radial – displacements either towards or away from the center of the image. Technically, displacements are radial to the principal point location. Radial displacements can be modeled with a polynomial, where the first few terms describe most of the effect.
Tangential – displacements occur at right angles to the radial direction. Typically, tangential displacements are much smaller than radial displacements.
Asymmetric Radial or Tangential – the error functions vary (different radial functions for different radials, different tangential functions for different locations on the image plane)
Random – displacements that cannot be modeled by any mathematical process.
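To make the radial and tangential categories concrete, the sketch below splits a measured displacement into a component along the radial through the principal point and a component at right angles to it. The function name `split_displacement` and the sign conventions are assumptions chosen for the example.

```python
import math


def split_displacement(ideal, observed, principal_point=(0.0, 0.0)):
    """Return (radial, tangential) components of the shift from ideal to observed."""
    rx = ideal[0] - principal_point[0]
    ry = ideal[1] - principal_point[1]
    r = math.hypot(rx, ry)
    if r == 0.0:
        raise ValueError("radial direction is undefined at the principal point")
    ux, uy = rx / r, ry / r              # unit vector pointing away from the principal point
    dx, dy = observed[0] - ideal[0], observed[1] - ideal[1]
    radial = dx * ux + dy * uy           # positive means away from the principal point
    tangential = -dx * uy + dy * ux      # component at right angles (sign is an assumed convention)
    return radial, tangential
```

A point that moves straight outward along its radial yields a tangential component of zero, which matches the usual observation that radial effects dominate.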

Ideal Pin-hole Model

The pin hole model is used to represent the ideal lens. It simply enforces the idea that rays of light travel in straight lines from the object, through the pin hole, to the image (sensor) plane.

Expensive lenses approximate this pin-hole model behavior.

[Figure: pinhole geometry]
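The straight-line geometry of the pin hole model reduces to similar triangles. The following is a minimal sketch, assuming the point is already expressed in the camera frame with Z measured along the optical axis; the function name and parameters are illustrative.

```python
def project_pinhole(point_cam, focal_length, principal_point=(0.0, 0.0)):
    """Project a camera-frame point (X, Y, Z) onto the image plane of an ideal pinhole camera."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must be in front of the pinhole (Z > 0)")
    # Similar triangles: image offset / focal length = object offset / depth.
    x = focal_length * X / Z + principal_point[0]
    y = focal_length * Y / Z + principal_point[1]
    return x, y
```

Doubling the depth Z halves the image offsets, which is the perspective scale effect that the interior orientation discussion refers to.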

Goals of Camera Calibration

The goal of camera calibration is to correct the image displacements which occur due to elements of the camera's interior orientation. There are two general approaches used for camera calibration:

Model-based approaches – specific elements of the error are modeled and corrected for.
Mapping-based approaches – the focus is on generating a comprehensive reality-to-image (or image-to-reality) mapping function, without regard to understanding the underlying contributing causes.

Model-based Approaches

With a model-based approach, one attempts to identify a few predominant factors contributing to error, model them, measure them, and correct for them. For each contributing factor, a mathematical equation is proposed to model the error. For example, radial lens distortion can be modeled with a polynomial in odd powers of the radial distance r, of the form:

delta r = k1*r + k2*r^3 + k3*r^5 + k4*r^7 + k5*r^9 + ...

Usually, the first two or three terms are sufficient to describe most of the radial error. Note that this is a model of the error, not the actual error: a model can approximate the error, but never fully corrects for its effects.
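The polynomial can be evaluated directly for any number of terms. The sketch below assumes r is measured from the principal point and that the coefficients are supplied in order (k1, k2, k3, ...); the function names are illustrative.

```python
import math


def radial_displacement(r, coeffs):
    """delta r = k1*r + k2*r^3 + k3*r^5 + ... for coeffs = [k1, k2, k3, ...]."""
    return sum(k * r ** (2 * i + 1) for i, k in enumerate(coeffs))


def distort_point(x, y, coeffs, principal_point=(0.0, 0.0)):
    """Shift an ideal image point along its radial by the modelled delta r."""
    dx, dy = x - principal_point[0], y - principal_point[1]
    r = math.hypot(dx, dy)
    if r == 0.0:
        return x, y  # the principal point itself does not move
    scale = (r + radial_displacement(r, coeffs)) / r
    return principal_point[0] + dx * scale, principal_point[1] + dy * scale
```

Truncating `coeffs` to two or three entries gives the reduced models that are usually sufficient in practice.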

After determining the appropriate model (assume we pick the first three terms above to model radial lens distortion), the next step is to determine the values of the coefficients which best model the observed error. This can be accomplished in one of two ways:

Explicit Approach – Use of targets with known (x,y), (x,y,z), or angular positions. Precision calibration frames, field approaches, stellar observations, or collimator banks can be used to generate targets of known position.
- Pros: Precise
- Cons: Expensive and time consuming
Implicit Approach – Use of objects with known geometric properties, but no known positional or angular orientation. Examples include the checkerboard approach, where the property that all the control points lie in a plane is exploited. Similarly, the Plumb-line approach exploits the fact that a series of physical plumb lines hung from a straight bar not only define a plane, but also produce a series of straight parallel lines which could be imaged. If the lines in the resulting image are not straight, the error can be attributed to lens distortion.
- Pros:
  - Relatively simple, fast, cheap
  - Simple models can usually account for most of the error
- Cons:
  - Not as precise as explicit approach, but usually good enough
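Whichever approach supplies the measurements, determining the coefficients is a curve-fitting problem. The sketch below assumes each target yields a radial distance r and an observed radial displacement, and fits the first few coefficients by ordinary least squares through the normal equations. The function name and the choice of solver are assumptions for illustration; production calibration packages typically use more robust numerical methods.

```python
def fit_radial_coeffs(radii, displacements, n_terms=3):
    """Least-squares fit of k1..kn in delta r = k1*r + k2*r^3 + k3*r^5 + ..."""
    # Design matrix: one row [r, r^3, r^5, ...] per observation.
    A = [[r ** (2 * j + 1) for j in range(n_terms)] for r in radii]
    # Normal equations: (A^T A) k = A^T d
    N = [[sum(row[i] * row[j] for row in A) for j in range(n_terms)]
         for i in range(n_terms)]
    b = [sum(row[i] * d for row, d in zip(A, displacements))
         for i in range(n_terms)]
    # Gaussian elimination with partial pivoting.
    for col in range(n_terms):
        pivot = max(range(col, n_terms), key=lambda i: abs(N[i][col]))
        N[col], N[pivot] = N[pivot], N[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n_terms):
            f = N[row][col] / N[col][col]
            for c in range(col, n_terms):
                N[row][c] -= f * N[col][c]
            b[row] -= f * b[col]
    # Back substitution.
    k = [0.0] * n_terms
    for i in reversed(range(n_terms)):
        s = sum(N[i][j] * k[j] for j in range(i + 1, n_terms))
        k[i] = (b[i] - s) / N[i][i]
    return k
```

The residuals left after the fit indicate how much of the observed error the chosen model fails to explain.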

Summary of Model-based Approaches

Pros

Relatively simple, fast, cheap
Only a few factors influence geometry – mostly optical configuration and lens distortion
Simple models account for most of the error

Cons

Can only remove errors represented by terms in the model
Unknown causes of error are ignored, or simply result in higher residuals

Mapping-based Approaches

With a mapping-based approach, no attempt is made to understand the individual contributing causes of error. The entire focus is on generating a comprehensive reality-to-image (or image-to-reality) mapping function. Simple rubber-sheeting would be an example of such a transformation.

For example, imagine setting up a high precision x-y plotter in front of a camera, oriented so that its plane is perpendicular to the camera's optical axis. Next, a pin-hole light source is mounted on the plotter pen holder so that it can be moved to any location in the plotter's x-y plane. Further, imagine that for every possible position of the light source, we can capture the row and column of the single pixel illuminated.

For a VGA format image (640 columns by 480 rows), there are 307,200 pixels. If we were to drive the plotter to each of 307,200 positions, and record what real world (x,y) coordinate mapped to each and every pixel, we would have achieved the building of an explicit mapping function.

With this approach, all potential causes of errors come out in the wash – whether they're known or not. All that matters is having the explicit image to reality mapping preserved.

In reality, nobody bothers to separately illuminate 307,200 pixels. However, the process can be approximated by collecting similar measurements on several hundred patches (16 * 16 pixels in size, for example) and employing a piece-wise transformation for each patch. This process could be further automated by driving the plotter to the hundreds of control point locations.
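One possible piece-wise transformation is bilinear interpolation between the control points at the four corners of each patch. The sketch below assumes the control points lie on a regular pixel grid with a fixed spacing and that a real-world (x, y) has been measured for each one; the names `make_lookup`, `control`, and `step` are hypothetical.

```python
def make_lookup(control, step):
    """Build a pixel-to-world function from a grid of measured control points.

    control[j][i] holds the measured world (x, y) for pixel (col=i*step, row=j*step).
    """
    def pixel_to_world(col, row):
        # Locate the patch containing the pixel (clamped to the last patch).
        i = min(int(col // step), len(control[0]) - 2)
        j = min(int(row // step), len(control) - 2)
        u = (col - i * step) / step   # fractional position inside the patch
        v = (row - j * step) / step
        p00, p10 = control[j][i], control[j][i + 1]
        p01, p11 = control[j + 1][i], control[j + 1][i + 1]
        # Bilinear blend of the four corner measurements.
        return tuple(
            (1 - u) * (1 - v) * p00[k] + u * (1 - v) * p10[k]
            + (1 - u) * v * p01[k] + u * v * p11[k]
            for k in range(2)
        )
    return pixel_to_world
```

With 16-pixel patches on a VGA image, this needs about 41 x 31 control points rather than 307,200 individual measurements.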

Summary of Mapping-based Approaches

Pros

Can handle any and all distortions – even those unanticipated.
Very precise – sub-pixel accuracy
Only needs to be done once, provided nothing moves in the meantime
Lends itself to automated approaches

Cons

Lots of explicit control required. Control point generator must be a “trusted source” - plotter errors would be passed on.
Expensive and time consuming

Summary of Camera Calibration

By employing either a model-based or mapping-based approach to camera calibration, most of the image displacement errors caused by elements of interior camera orientation can be removed prior to further processing.

Image Segmentation

The purpose of image segmentation is to split a source image into multiple destination images or Regions of Interest based on certain criteria. For example, it may be beneficial to find a single part out of a bin. For navigation systems, it may prove useful to extract only floor lines from an image.

Algorithm: Region Growing

Start by finding a single set pixel, that is, a pixel that meets the segmentation criterion. Search all the pixels around it. For every neighboring pixel that is also set, search the pixels around it in turn, and so on. This algorithm is not very efficient computationally, but it does extract regions with no post-processing required.
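A minimal sketch of region growing follows, assuming the image has already been reduced to a binary matrix in which 1 marks a set pixel and that "around it" means the eight surrounding pixels. The function name `grow_region` is illustrative.

```python
def grow_region(image, seed):
    """Return the set of (row, col) set pixels connected to seed (8-connectivity)."""
    rows, cols = len(image), len(image[0])
    if not image[seed[0]][seed[1]]:
        return set()
    region, stack = {seed}, [seed]
    while stack:
        r, c = stack.pop()
        # Examine every pixel around the current one.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and image[nr][nc] and (nr, nc) not in region):
                    region.add((nr, nc))
                    stack.append((nr, nc))
    return region
```

An explicit stack is used instead of recursion so that large regions do not exhaust the call stack.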

Algorithm: Edge Detection

Begin by searching for disparities in the image. Once a disparity surpasses a certain minimum size and threshold in luminosity or other characteristic, it is an edge. After all the edges have been found, look for regions bounded by edges. An image with very well-defined edges is simple to process. However, many real world images have smoother gradients, making their edges harder to detect. Edge detection is also vulnerable to many types of noise, which can create false edges or break up real ones.
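The thresholding step can be sketched with a simple brightness gradient. The example below assumes a grayscale matrix and uses central differences as the disparity measure; it covers only the luminosity threshold, not the minimum-size test or the later search for bounded regions, and the function name is illustrative.

```python
def detect_edges(gray, threshold):
    """Mark interior pixels whose brightness gradient magnitude exceeds threshold."""
    rows, cols = len(gray), len(gray[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gray[r][c + 1] - gray[r][c - 1]   # horizontal brightness change
            gy = gray[r + 1][c] - gray[r - 1][c]   # vertical brightness change
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges
```

A smooth gradient spreads the same brightness change over many pixels, so each local difference may stay below the threshold, which is why such edges are harder to detect.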

Algorithm: MultiScaling

MultiScaling is a useful technique whenever multiple scales of an image can be obtained. Many camera image processors can emit a thumbnail along with the main image. This low-resolution thumbnail can be used as a plan for searching the main image. Specifically, any areas in the low-resolution image lacking in pixels likely represent empty or very sparse areas in the full image. If we are willing to sacrifice detection of small blobs for speed, MultiScaling is an efficient approach.
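The idea can be sketched on a binary image, with the thumbnail simulated by block averaging since no camera hardware is assumed here. The function names, the `factor` block size, and the `min_fill` cut-off are hypothetical parameters chosen for the illustration.

```python
def make_thumbnail(image, factor):
    """Average each factor x factor block of a binary image into one value in 0..1."""
    rows, cols = len(image), len(image[0])
    thumb = []
    for tr in range(0, rows, factor):
        row = []
        for tc in range(0, cols, factor):
            block = [image[r][c]
                     for r in range(tr, min(tr + factor, rows))
                     for c in range(tc, min(tc + factor, cols))]
            row.append(sum(block) / len(block))
        thumb.append(row)
    return thumb


def search_guided(image, thumb, factor, min_fill=0.25):
    """Collect set pixels, visiting only blocks whose thumbnail value reaches min_fill."""
    rows, cols = len(image), len(image[0])
    found = []
    for tj, trow in enumerate(thumb):
        for ti, fill in enumerate(trow):
            if fill < min_fill:
                continue  # sparse block: skipped, so small blobs here are missed
            for r in range(tj * factor, min((tj + 1) * factor, rows)):
                for c in range(ti * factor, min((ti + 1) * factor, cols)):
                    if image[r][c]:
                        found.append((r, c))
    return found
```

Raising `min_fill` skips more of the full image and so runs faster, at the cost of missing smaller blobs.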

Algorithm: Sequential Searches

The goal of a sequential search is to examine each and every pixel once and only once. When a new set pixel is found, it is compared against the previously found groups of pixels and inserted into the group it adjoins, or it starts a new group if it adjoins none. If a pixel is found to belong to two groups, the groups must be combined.
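One way to realize this is a single raster pass that keeps group membership in a union-find structure, so that combining two groups is cheap. The sketch below assumes a binary image and 8-connectivity; the function name and data layout are illustrative. Each image pixel is read once, and a final pass over the recorded labels (not the image) resolves merged groups.

```python
def label_components(image):
    """One raster pass over the image; returns {group id: [(row, col), ...]}."""
    rows, cols = len(image), len(image[0])
    parent = {}   # union-find forest over provisional group labels
    labels = {}   # (row, col) -> provisional label

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            # Neighbours already visited in raster order (8-connectivity).
            prior = [labels[p]
                     for p in ((r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1))
                     if p in labels]
            if not prior:
                parent[next_label] = next_label   # start a new group
                labels[(r, c)] = next_label
                next_label += 1
            else:
                root = find(prior[0])
                labels[(r, c)] = root
                for other in prior[1:]:
                    parent[find(other)] = root    # pixel touches two groups: combine them
    groups = {}
    for pixel, lab in labels.items():
        groups.setdefault(find(lab), []).append(pixel)
    return groups
```

A U-shaped blob shows why merging is needed: its two arms are first seen as separate groups and are only joined when the scan reaches the bottom row.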

External links

people.csail.mit.edu/bkph/articles/Tsai_Revisited.pdf
http://www.ast.cam.ac.uk/~wfcsur/technical/astrometry/
http://www.vision.caltech.edu/bouguetj/calib_doc/
http://www.kwon3d.com/theory/calib.html
www.umiacs.umd.edu/~ramani/cmsc828d/lecture9.pdf
http://www.cs.cmu.edu/afs/cs/usr/rgw/www/TsaiCode.html
http://research.microsoft.com/~zhang/Calib/
www.photogrammetry.ethz.ch/general/persons/fabio/Remondino_Fraser_ISPRSV_2006.pdf
http://research.graphicon.ru/calibration/gml-c-camera-calibration-toolbox-5.html
www.debevec.org/Thesis/debevec-phdthesis-1996_ch4_calib.pdf
www.youtube.com/watch?v=S-IPz71VxGo&feature=user

A field of study known as computer vision has formed around looking for patterns in matrices of this type that correspond to certain objects, such as faces.

For a comprehensive overview of computer vision see: