Pixels and Resolution
|
A picture is worth a thousand words, but requires
a larger file size. |
Image pixels:
In a colour display or a printed image, a pixel (picture element)
is a dot or small area of the picture which is large enough to
carry the three attributes hue, colour saturation,
and brightness but too small to carry any detail. An electronically
produced picture is made up of an array of pixels and is normally
viewed from a distance such that the underlying structure is
no longer visible. |
In practice, image pixels are usually composite: the pixels of
a TV picture being composed of areas of red, green, and blue
illumination; and the pixels in a printed image being built up
from groups of cyan, yellow, magenta, and black ink spots. Any
structure within a pixel will average to the required combination
of brightness and colour when viewed from a suitable distance. |
 |
Pixel data:
In the context of photography, the information used to create
a displayed image is usually extracted from a data file. The
data file therefore contains, or can be decompressed in some
way to provide, a set of numbers representing each pixel in the
image. In an uncompressed RGB image file, a pixel is represented
by three numbers, these being related to the brightness values
for red, green, and blue at a particular point in a video display.
In an uncompressed CMYK image file, a pixel is represented by
four numbers, these being related to the amounts of cyan, magenta,
yellow, and black ink to be applied to a particular point in
a printed picture. Other representations are also possible, and
we can, of course, convert between representations for such purposes
as viewing CMYK files on a monitor screen, or printing RGB files
onto paper.
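As a rough illustration of converting between representations, the sketch below uses the simplest textbook RGB-to-CMYK formula. It assumes channel values scaled to the range 0 to 1 and ignores colour profiles and ink behaviour entirely, so it should not be read as the method any real printer driver uses; the function names are hypothetical.

```python
def rgb_to_cmyk(r, g, b):
    """Naive conversion of RGB values (0..1) to CMYK values (0..1).

    Assumption: no colour management; real print workflows are more involved.
    """
    k = 1.0 - max(r, g, b)          # black replaces the common dark component
    if k >= 1.0:                    # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k


def cmyk_to_rgb(c, m, y, k):
    """Inverse of the naive conversion above."""
    return (1.0 - c) * (1.0 - k), (1.0 - m) * (1.0 - k), (1.0 - y) * (1.0 - k)


red_as_cmyk = rgb_to_cmyk(1.0, 0.0, 0.0)   # three numbers in, four numbers out
```

Pure red needs no cyan and no black, only full magenta and yellow, which is what the naive formula returns.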
The maximum amount of detail which can be recorded in an image
file is dependent on the number of pixels, but the number of
pixels should not be used as a measure of resolution. The relationship
between pixels and actual sensor resolution will be discussed
shortly, but the most obvious way in which pixel count can become
divorced from recorded detail lies in the fact that files can
be re-sized. If a file is re-sized to reduce the number of pixels
in it, detail may be lost, but if a file is re-sized to increase
the number of pixels, new detail cannot be created (although
subjective improvements can be obtained). |
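The effect of re-sizing on detail can be sketched with a single row of pixels. The example below assumes the crudest possible methods (averaging pairs to shrink, repeating pixels to enlarge); real re-sizing software uses better filters, but the conclusion is the same. The function names are illustrative only.

```python
def downsample_by_two(row):
    """Halve the pixel count by averaging neighbouring pairs."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def upsample_by_two(row):
    """Double the pixel count by repeating each pixel."""
    return [value for value in row for _ in range(2)]


fine_detail = [0, 255, 0, 255, 0, 255, 0, 255]   # finest possible line pattern
smaller = downsample_by_two(fine_detail)          # detail averages to mid-grey
restored = upsample_by_two(smaller)               # same pixel count as before
```

The restored row has the original number of pixels, but every one of them is mid-grey: the pixel count has been recovered, the detail has not.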
Camera pixels:
Digital cameras also have pixels. They must have because the
manufacturers say they do, and the pixel number is usually assumed
to be a measure of resolution. It is therefore tempting to think
that if the pixel size of a file produced by a camera is set
to be the same as the number of pixels in the camera sensor,
then the native resolution of the camera will be carried through
to the final image. Unfortunately, this will not be the case,
for two very important reasons:
The resolution of the camera sensor is not related to the resolution of the lens.
A camera does not necessarily have identifiable RGB pixels.
The pixels in the camera output are created by interpolation
from raw sensor data and so cannot be assumed to be directly
related to the individual sensor elements. Furthermore, cameras
do not have to have their light sensors arranged as RGB triads,
and so it does not always make sense to think of a single colour
sample as a third of a pixel. This paradox has gradually led
to a consensus within the camera industry, which is that the
pixels of cameras should be defined differently to the pixels
of files and images. Specifically:
The number of pixels stated for a digital camera is the total number of
light sensing elements (red + green + blue).
This number is, of course, considerably larger than the number
of RGB pixels corresponding to a notional 'native' resolution,
but if a single number is to be used as an indicator of camera
quality, it is the only measure which gives a reasonably fair
comparison between the various different sensor architectures. |
Filter-mosaic cameras
Most digital cameras use a two-dimensional array of light sensors
and colour filters arranged in a pattern known as a 'Bayer mosaic'
(US Pat. No. 3971065). The filter mosaic has twice as many green
filters as it has of either red or blue, e.g., an 8M pixel camera has
4M green, 2M blue, and 2M red sensors. |
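The 2:1:1 ratio can be checked by counting filters in a small mosaic. The sketch below assumes the common layout in which each 2 x 2 tile holds red and green on the top row and green and blue on the bottom row; other tile orderings give the same counts. The function names are hypothetical.

```python
from collections import Counter


def bayer_colour(row, col):
    """Filter colour at (row, col), assuming a red/green, green/blue tile."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def count_filters(rows, cols):
    """Count the filters of each colour in a rows x cols mosaic."""
    return Counter(bayer_colour(r, c) for r in range(rows) for c in range(cols))


counts = count_filters(4, 6)   # a tiny 24-element sensor
```

For the 24-element example this gives 12 green, 6 red, and 6 blue, the same proportions as the 4M, 2M, and 2M of an 8M pixel camera.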
 |
|
With this architecture it is not possible to identify physically
discrete RGB pixels, but the sensor data can be processed to
produce an image which has about twice as much luminance (brightness)
resolution as it has colour resolution, luminance resolution
being subjectively more important. The RGB pixel equivalent of
the luminance resolution is about equal to the number of green
sensors* if the raw image processing
is done well; so an output file size which is 'just large enough'
to capture all of the genuine detail that the camera can produce
will have up to about half as many RGB pixels as there are camera
(single colour) pixels. |
|
The normal maximum output file size for a Bayer mosaic camera,
however, usually has nearly as many RGB pixels as there are
camera pixels, RGB pixels being obtained (notionally, ignoring
processing) by going to every intersection between four camera
pixels and taking the adjacent light values (a red, a blue, and
the average of two green samples). Hence the output RGB pixels
are unique, but share samples with other pixels; and the maximum
file size should not be taken to be indicative of sensor resolution. |
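The notional 'intersection' scheme can be written out as a short sketch. It assumes a red/green, green/blue tile layout and deliberately ignores all of the processing a real camera applies, so it illustrates only how the output pixel count arises; the names are hypothetical.

```python
def bayer_colour(row, col):
    """Filter colour at (row, col), assuming a red/green, green/blue tile."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def intersection_rgb(raw):
    """Build one RGB pixel at every intersection between four camera pixels.

    raw is a list of rows of single-colour samples. Every 2 x 2 window of the
    mosaic holds one red, one blue, and two green samples.
    """
    rows, cols = len(raw), len(raw[0])
    output = []
    for r in range(rows - 1):
        output_row = []
        for c in range(cols - 1):
            samples = {"R": [], "G": [], "B": []}
            for dr in (0, 1):
                for dc in (0, 1):
                    colour = bayer_colour(r + dr, c + dc)
                    samples[colour].append(raw[r + dr][c + dc])
            green = sum(samples["G"]) / 2          # average of two greens
            output_row.append((samples["R"][0], green, samples["B"][0]))
        output.append(output_row)
    return output


# A uniform scene: every red sensor reads 10, green 20, blue 30.
raw = [[10, 20, 10, 20],
       [20, 30, 20, 30],
       [10, 20, 10, 20],
       [20, 30, 20, 30]]
rgb = intersection_rgb(raw)    # 3 x 3 RGB pixels from 4 x 4 camera pixels
```

A sensor of rows x cols camera pixels yields (rows - 1) x (cols - 1) RGB pixels, which for a large sensor is very nearly the camera pixel count even though each sample is reused by up to four output pixels.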
 |
|
* The recipe for simulating white light using the three additive primary colours
is 0.59G + 0.3R + 0.11B. Hence the green channel, being the main
contributor, carries more luminance information than the other
colour channels. |
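The recipe translates directly into a weighted sum. The sketch below assumes channel values scaled from 0 to 1 and uses a hypothetical function name.

```python
def luminance(r, g, b):
    """Luminance from the weights quoted above: 0.3R + 0.59G + 0.11B."""
    return 0.3 * r + 0.59 * g + 0.11 * b


green_only = luminance(0.0, 1.0, 0.0)
red_only = luminance(1.0, 0.0, 0.0)
blue_only = luminance(0.0, 0.0, 1.0)
```

A fully lit green channel contributes more luminance than red and blue combined, which is why the green sensors dominate the luminance resolution.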
3-Chip cameras:
In a 3-chip camera, light from the lens is split into three,
filtered, and sent to 3 monochrome imaging chips, one each for
red, green, and blue. The native luminance resolution of such
a camera is approximately equal to the number of sensing elements
in the green channel. The colour resolution will be slightly
less than the luminance resolution because there is a limit on
the accuracy with which the three colour images can be superimposed.
The 3-chip imaging method is preferred as a superior alternative
to the Bayer mosaic for high-quality video cameras (i.e., relatively
low resolution but fast readout cameras). When used in stills
cameras however, it is the optical splitter prism rather than
the pixel number which places limitations on the ultimate sensor
resolution obtainable. Consequently, in the high-quality stills
field, as pixel numbers have increased, single chip imaging methods
have come to be preferred. |
The Foveon X3® sensor: |
The Foveon X3 has its blue, green, and red sensor arrays in different
layers of the chip, with the sensing elements stacked in perfect
registration. Hence the sensor does have identifiable RGB pixels,
even though the number of 'camera' pixels will normally be reported
as the total of red + green + blue sensor elements. |

Illustration by kind courtesy of Foveon Inc.
All rights reserved. |
The X3 architecture eliminates many of the interpolation artifacts
associated with filter mosaic cameras, and gives the same resolution
in all 3 colour channels. The sensitivity, and hence signal to
noise ratio, is also greater than that of a comparable mosaic
sensor, because the entire area of the array is used in sampling
all three colours. The output file size which is just large enough
to capture all of the detail the sensor can produce has the same
number of pixels as there are light sensing elements in any of
the layers; but due to the equal luminance and colour resolutions,
and freedom from artifacts, the potential detail rendering capability
is at least as good as and generally better than that of a mosaic
sensor of the same size having the same number of camera (single
colour) pixels.
For more information, see: http://www.x3f.info/technotes/x3pixel/pixelpage.html
and: www.foveon.com/X3_tech.html
Line-scan cameras (scanners):
In a typical scanner, a portion of the image is focused onto
a strip-sensor, and the sensor is moved in small steps in order
to acquire the whole image. The sensor usually has 3 strips (one
each for red, green, and blue), each consisting of a linear array
of photosensors. An RGB image is built up line-by-line by stepping
the 3 different colour recording strips into position in turn.
Since each dot is sampled three times, generally with near-perfect
registration (in single-pass scanners at least), luminance and
colour resolutions are practically identical. If the scanner
software is set to give an output file at the scanner's native
(i.e., optical) resolution, each pixel in the output is directly
traceable to a point in the image (neglecting the various processing
operations which may be carried out between image acquisition
and file creation). |
Sensor comparisons:
In the real world, fine detail does not have to be in black and
white. Consequently, imaging devices which give equal resolution
in luminance and colour (X3 and scanners) have the potential
to give higher quality colour pictures for a given number of
'native' RGB pixels than imaging devices which have reduced colour
resolution (mosaic and 3-chip). Hence we cannot even use 'equivalent
native RGB pixels', or 'file size just big enough to capture
all of the information' as an objective measure of comparative
resolution, because the information captured varies in quality.
We can illustrate this point by comparing two hypothetical 10.2M
pixel cameras, one based on the Foveon X3 and one using a Bayer
mosaic: |
Sensor | Camera pixels | Luminance resolution (equivalent native RGB pixels) | Overall resolution (equivalent native RGB pixels) |
X3 | 10.2M | 3.4M | 3.4M |
Mosaic | 10.2M | ≤5.1M | ≤2.55M |
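The figures in the comparison follow from simple division of the camera pixel count. The sketch below assumes, as the table does, that the mosaic's luminance limit is the number of green sensors and its overall limit is the number of red (or blue) sensors; both mosaic figures are upper limits that depend on good raw processing. The function names are hypothetical.

```python
def x3_equivalents(camera_pixels):
    """X3: three stacked layers, so one RGB pixel per three sensing elements."""
    per_layer = camera_pixels / 3
    return per_layer, per_layer            # (luminance, overall)


def mosaic_equivalents(camera_pixels):
    """Bayer mosaic: upper limits only."""
    luminance_limit = camera_pixels / 2    # number of green sensors
    overall_limit = camera_pixels / 4      # number of red (or blue) sensors
    return luminance_limit, overall_limit


x3 = x3_equivalents(10.2e6)
mosaic = mosaic_equivalents(10.2e6)
```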
If we exclude subjective criteria, the X3 obviously gives the
best resolution, but if we accept that detail is less important
in areas of high colour saturation, the two image capturing methods
are about the same. Thus defining the number of camera pixels
as the total number of light sensing elements gives a simple
method of comparison, and has the virtue that the pixels so defined
are related to physical light sensors. The problem with
simple comparisons however, is that they are sometimes far too
simple. One major limitation of the mosaic system is that it
produces interpolation artifacts (false detail), and cameras
often incorporate a blur filter in order to reduce such artifacts
to an acceptable level. A blur filter reduces resolution in the
raw camera output, and although this loss can be restored by
a process known as deconvolution, deconvolution is often too
computationally intensive to be carried out in the camera. Also,
the X3 captures more of the incident light, and so for comparable
sensor technology, collecting area, and light intensity will
give a better signal to noise ratio. One of the many ways in
which camera pixels become divorced from output pixels is that
the raw sensor data is usually subjected to a noise reduction
process. Such processing also reduces resolution slightly and
may also generate artifacts, and so low sensor noise is a definite
advantage.
On the issue of whether overall (full colour) resolution is truly
less important than luminance resolution, we can at this point
allow some consideration of lenses to creep into the discussion.
All lenses exhibit at least some minor chromatic aberration,
this being due to their inability to provide exactly the same
amount of magnification at all wavelengths in the visible spectrum.
The problem manifests itself as colour fringing in off-centre
picture detail, and is of particular relevance to underwater
photographers because it is unavoidable when using an air-corrected
lens behind an air-water boundary. It can however be substantially
reduced using software, by a process known as radial correction
(see lens correction article);
but by a parallel argument, it is also true that maximum resolution
usually cannot be obtained unless radial correction is
applied. As we move away from the centre of an image projected
by a lens, we expect any detail with a strong luminance component
(e.g., a small white spot) to be turned into a detail with strong
colour components (i.e., red, green, and blue spots either separate
or partially overlapping). In the correction process, which involves
re-mapping the red and blue images to superimpose them properly
on the green image, fine colour detail is converted into fine
luminance detail. Consequently, if we fail to preserve the same
amount of detail in all colour channels of the image recording
device, we lose some of the ability to extract fine luminance
detail from the data. |
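The re-mapping step can be sketched for a single colour channel. The example below assumes that the aberration is a pure change of magnification about the image centre, uses nearest-neighbour sampling, and exaggerates the magnification error enormously so that the effect is visible on a 9 x 9 grid. It is an illustration of the principle rather than a description of any particular correction software, and the names are hypothetical.

```python
import math


def rescale_about_centre(channel, magnification):
    """Undo a radial magnification of one colour channel.

    Each output pixel is fetched from the position where the lens actually
    put that colour, i.e. further from the centre by the given factor.
    """
    rows, cols = len(channel), len(channel[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    corrected = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            source_y = math.floor(cy + (y - cy) * magnification + 0.5)
            source_x = math.floor(cx + (x - cx) * magnification + 0.5)
            if 0 <= source_y < rows and 0 <= source_x < cols:
                corrected[y][x] = channel[source_y][source_x]
    return corrected


size = 9
green = [[0] * size for _ in range(size)]
red = [[0] * size for _ in range(size)]
green[4][6] = 1      # a small spot, two pixels right of centre in green
red[4][7] = 1        # the same spot, imaged 1.5x further out in red
corrected_red = rescale_about_centre(red, 1.5)
```

After correction the red spot coincides with the green one, so what was recorded as a colour fringe becomes part of a single luminance detail.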

|
From a software point of view, radial correction for lenses working
in air requires only a knowledge of the lens focal length (zoom)
setting, and to a lesser extent, the focus setting. This information
is easily recorded at the time of picture taking, and so radial
correction will eventually become part of the built-in camera
firmware or the raw-file conversion software. At the time of
writing (Nov 2004), the author has been made aware informally
that HP intends to include radial correction in its forthcoming
products, and other manufacturers can be expected to do the same.
Underwater photographers however, will still need to apply their
own corrections. |
Sensor resolution:
Resolution (resolving power) is the ability to record or represent
detail. A physically meaningful measure of resolution can be
given in terms of the maximum number of parallel lines which
can be reproduced over a given distance; a grid of lines being
the worst-case test. In a monochrome camera sensor, or a colour
imaging system which has superimposed R G and B pixels, we can
relate resolution in lines to the number of sensing elements
in a strip perpendicular to those lines by observing that lines
cannot be resolved unless there are enough sensing elements to
register all of the alternations between light and dark. Therefore,
the limit of resolution is reached at the point where there are
just two sensor elements for every line. This situation is represented
in the diagram below, which shows what happens when a test pattern
of lines is projected onto a sensor at a magnification which
makes the spacing between the lines equal to the distance between
two sensor elements. |

|
If the system is operating at something close to the maximum
resolution of the lens, the lens will blur the detail and project
a pattern of smoothly alternating light intensity onto a row
of light sensors. Each sensor records the average intensity of
the light collected over its area, and so the output will be
a row of alternating dark grey and light grey pixels. Thus, although
there is some loss of contrast caused by the lens in this case,
the output is a faithful reproduction of the original detail.
Notice however, that in the diagram above the brightness peaks
were lined up so that they fell in the middle of their respective
sensors. When the pattern is moved sideways by one half of the
pixel spacing an entirely different result is obtained: |

Now the peaks and troughs of the projected image lie on the boundaries
between the sensor elements. Hence if one sensor element records
the average of the transition from lightest to darkest, the one
next to it will record the average from darkest to lightest,
and the result will be a mid-grey in either case. Hence if we
move the camera or the test card, the contrast in the detail
recorded at maximum resolution will vary between zero and some
maximum value. Thus we may deduce that two pixels for every line
of resolution is an absolute minimum, and represents the point
at which all contrast in the resulting image may just be extinguished.
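The dependence of contrast on alignment can be reproduced numerically. The sketch below assumes a sinusoidal intensity pattern whose period is exactly two sensor widths, and approximates each sensor's average by sampling many points across its width; the function names and the number of sub-samples are arbitrary choices.

```python
import math


def sensor_outputs(shift, n_sensors=8, sub_samples=1000):
    """Average intensity recorded by each sensor in a row.

    Positions are in units of one sensor width. With shift = 0 the peaks fall
    in the middle of the even-numbered sensors; shift = 0.5 moves them onto
    the sensor boundaries.
    """
    outputs = []
    for i in range(n_sensors):
        total = 0.0
        for s in range(sub_samples):
            x = i + (s + 0.5) / sub_samples
            total += 0.5 + 0.5 * math.cos(math.pi * (x - 0.5 - shift))
        outputs.append(total / sub_samples)
    return outputs


def contrast(outputs):
    """Difference between the brightest and darkest recorded values."""
    return max(outputs) - min(outputs)


aligned = contrast(sensor_outputs(0.0))       # peaks centred on sensors
misaligned = contrast(sensor_outputs(0.5))    # peaks on sensor boundaries
```

With the peaks centred the recorded contrast is about 2/π of the projected contrast; with the pattern shifted by half a sensor width it falls to zero.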
Also, we must note that there is no rule to say that the lines
in the resolution test pattern must be horizontal or vertical,
and if we want to state the resolution for lines of arbitrary
orientation, we must take the worst case, which is the number
of pixels per unit length along a line at 45° to the horizontal
or vertical. The diagonal of a 1 × 1 square is √2. Therefore, if
a monochrome sensor (or a colour sensor having superimposed R
G and B pixels) has p sensor elements per mm in the horizontal
and vertical directions, the absolute maximum resolution in lines/mm
will be p/(2√2).
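As a worked instance of the limit p/(2√2), the short sketch below evaluates it for an assumed sensor pitch; the figure of 200 elements per mm is chosen purely for illustration.

```python
import math


def max_lines_per_mm(p):
    """Upper limit of resolution for lines of arbitrary orientation.

    p is the number of sensor elements per mm, horizontally and vertically.
    """
    return p / (2 * math.sqrt(2))


limit = max_lines_per_mm(200)   # hypothetical sensor with 200 elements per mm
```

For 200 elements per mm the limit is about 70.7 lines/mm, compared with 100 lines/mm for lines that happen to be horizontal or vertical.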
Unfortunately, we cannot use this simple notion of image capture
to deduce the resolving power of a Bayer mosaic sensor, except
to note that its luminance resolution will be, at best,
about the same as that of a monochrome camera with half as many
pixels. Moreover, we cannot assume that best performance will
always be obtained; it depends on the sophistication (and hence
the cost) of the optical system and the software which converts
the raw data into RGB pixels. In particular, a blur filter is
required in order to ensure that no detail in the image projected
onto the sensor is smaller than the area of 4 camera pixels (if
a detail is too small it will fall selectively on a single colour
pixel and be recorded incorrectly as colour information), and
intensive calculation is required in order to reconstruct the
full resolution image from the raw recorded data. |
Format size:
The sensor resolution, of course, does not tell us what the overall
resolution of a camera will be. The final limiting factor is
lens resolution, which is beyond the scope of this article but
well covered elsewhere. See for example, the excellent website
of Norman Koren
www.normankoren.com
and particularly the tutorials on image sharpness and MTF
www.normankoren.com/Tutorials/MTF.html
We can however offer a general rule based on format size:
The resolving power of lenses is not infinitely scalable. Therefore
when a lens is used to project an image onto a surface, the maximum
amount of detail which can be preserved is a function of the
size of the image. Consequently, if a large format camera and
a small format camera are both used to produce a picture of the
same size, the large format camera will give the greatest resolution
in the final picture (assuming a sufficient number of pixels
and well designed lenses in both cases).
The reason is that a larger format permits a longer focal length
of the lens for a given field of view; and the aperture size
of the lens, for a given illuminating power, is a function of
the focal length. When correctly stated, an aperture value should
be written as " f/n " (focal length divided by a number).
Now, if we consider a 50mm lens operated at say f/16, the diameter
of the aperture is 50/16 = 3.125mm. If we use a small format
digital camera, we might need to use a lens with a focal length
of about 10mm to get the same view as would be obtained with
a 50mm (normal perspective) lens on the 35mm (36 x 24mm) format.
The 10mm lens operated at f/16 however will have an aperture
diameter of 10/16 = 0.625mm. This places it at a disadvantage
compared to the larger format camera, because light undergoes
a process known as diffraction when it interacts with
structures of comparable size to its wavelength. Diffraction
causes the light to spread out when it encounters obstructions,
and the result is a loss of resolution in photographic images.
Diffraction is the reason why lenses do not give progressively
more resolution as the aperture size is reduced; instead, they give
a peak in resolution at intermediate apertures, and lose resolution
progressively as the aperture becomes smaller. The point is that,
for a given field of view, a small format camera with its short
focal-length lens will start to suffer from loss of resolution
due to diffraction at a wider relative aperture than a large
format camera with its long focal-length lens, i.e., while both
cameras might be operating at the same relative aperture,
say f/16, the small format camera will have a physical aperture
of (say) 0.6mm, while the large format camera will have a physical
aperture of 3.1mm. In fact, the diffraction problem is sufficiently
pronounced with small-format compact cameras that most are not
provided with the facility to use apertures smaller than f/8,
whereas 35mm format, and near-35mm format digital cameras can
be used at apertures as small as f/22 or even f/32 with suitable
lenses. |
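The aperture arithmetic above can be checked with a few lines of code. The second function goes slightly beyond the text: it uses the standard optics approximation that the diffraction blur spot (Airy disc) has a diameter of about 2.44 × wavelength × n for an aperture of f/n, with an assumed wavelength of 550 nm (green light). That formula and the helper names are assumptions added for illustration, not taken from the article.

```python
def aperture_diameter_mm(focal_length_mm, f_number):
    """Physical aperture diameter for an aperture written as f/n."""
    return focal_length_mm / f_number


def airy_disc_diameter_mm(f_number, wavelength_mm=0.00055):
    """Approximate diffraction blur spot diameter in the image plane.

    Assumption: 2.44 * wavelength * n, for green light of 550 nm.
    """
    return 2.44 * wavelength_mm * f_number


def blur_fraction_of_diagonal(f_number, format_diagonal_mm):
    """Blur spot size relative to the size of the whole image."""
    return airy_disc_diameter_mm(f_number) / format_diagonal_mm


large_format_aperture = aperture_diameter_mm(50, 16)   # 3.125 mm
small_format_aperture = aperture_diameter_mm(10, 16)   # 0.625 mm
```

Under this approximation the blur spot is the same physical size for both cameras at f/16, but it occupies a much larger fraction of the small format's image, which is another way of stating why the small format suffers first.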
Format | Dimensions / mm | Format diagonal (image circle)* | Normal perspective focal length = width of format × √2 |
35mm film full frame | 36 × 24 | 43.27mm | 50.9mm (50mm) |
APS | 25.1 × 16.7 | 30.15mm | 35.5mm (35mm) |
35mm film, cine | 24 × 18 | 30mm | 33.9mm (35mm) |
Canon EOS D30, D60, 10D | 22.7 × 15.1 | 27.26mm | 32.1mm |
Four Thirds (4/3) | 18 × 13.5 | 22.5mm | 25.5mm |
1" | 12.8 × 9.6 | 16mm | 18mm |
2/3" | 8.8 × 6.6 | 11mm | 12.4mm |
1/1.8" | 7.18 × 5.32 | 8.94mm | 10.2mm (10mm) |
1/2" | 6.4 × 4.8 | 8mm | 9.1mm |
1/2.7" | 5.3 × 4.0 | 6.64mm | 7.5mm |
1/3" | 4.8 × 3.6 | 6mm | 6.8mm |
1/3.2" | 4.5 × 3.4 | 5.7mm | 6.4mm |
1/3.6" | 4 × 3 | 5mm | 5.7mm |
1/4" | 3.2 × 2.4 | 4mm | 4.5mm |
* focal length multiplier = 43.266615 / (format diagonal) |
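The derived columns of the table can be reproduced from the format dimensions alone. The sketch below uses the two relationships stated in the table (normal focal length = width × √2, multiplier = 35mm full-frame diagonal / format diagonal); the function and constant names are hypothetical.

```python
import math

# Diagonal of the 36 x 24 mm full frame, about 43.27 mm.
FULL_FRAME_DIAGONAL_MM = math.hypot(36, 24)


def format_summary(width_mm, height_mm):
    """Return (diagonal, normal focal length, focal length multiplier)."""
    diagonal = math.hypot(width_mm, height_mm)
    normal_focal_length = width_mm * math.sqrt(2)
    multiplier = FULL_FRAME_DIAGONAL_MM / diagonal
    return diagonal, normal_focal_length, multiplier


diagonal, normal_focal_length, multiplier = format_summary(7.18, 5.32)
```

For the 1/1.8" format this gives a diagonal of about 8.94 mm, a normal focal length of roughly 10 mm, and a multiplier of about 4.84.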
 |
1/1.8" Bayer mosaic CCD sensor.
Widely used format for digital compact cameras.
Illustration shows a 4M pixel device as used in the Olympus C-4040.
Format (imaging area) dimensions: 7.18 × 5.32 mm.
Focal length multiplier (for 35mm format equivalent lens) is
4.84 |
DWK, Nov 2004.
(Thanks to Dick Lyon of Foveon Inc. for helpful advice and suggestions
in the writing of this article).
© Cameras Underwater 2004
© 1998-2004 Foveon, Inc. Foveon X3 and the X3 logo are registered
trademarks of Foveon Inc. |
|