【图像与视频处理】笔记

Image Restoration

The task of image restoration is to recover a clean image from its corrupted observation.

Low-Light Image Enhancement

Gamma correction use a power law formula to images for pixel-wise enhancement with $I_{out} = A×I_{in}^γ$

Introduction

Image Types

Reflection Images
sense radiation that has been reflected from the surfaces of objects. The information extracted is primarily an object’s shape, texture, color, reflectivity,
- -most visible optical images，radar images, sonar images, electron microscope images.
Emission Images
the objects being imaged are self-luminous. The information may reveal the internal structure of an object.
thermal or infrared images, MRI images
Absorption Images
yield information about the internal structure of objects. The radiation passes through objects and is absorbed or partially absorbed.
X-ray images, certain types of sonic images.

Sampling Image

Sampling is the process of converting a continuous-space (or continuous-space/time) signal into a discrete-space (or discrete-space/time) signal.

The number of rows and columns in a sampled image is also often selected to be a power of 2.

to simplify computer addressing of the samples
to make algorithms, such as discrete Fourier transforms, efficient

Images are nearly always rectangular.

Quantization Image

Quantization is the process of converting a continuous-valued image, which has a continuous range (set of values that it can take), into a discrete-valued image, which has a discrete range.

Gray Scale

The gray level of a quantized image pixel is one of a finite set of numbers, which is the gray level range $(0, 2^B-1)$

B=1(binary images);
B=8, Each gray level occupies a byte, 8-bit depths
color images (Multivalued images) require 24 bits per pixel

Color Image

RGB(Red, Green, Blue), color cameras, display systems
YIQ(luminance, in-phase chromatic, quadratic chromatic), broadcast television

Storage

The storage required for a single monochromatic digital still image that has (row x column) dimensions N * M and B bits of gray-level resolution is N * M * B bits.

Video

Video quantization is essentially the same as image quantization. However, video sampling involves taking samples along a new and different (time) dimension.

The human eye asks the refresh rate more than 50 frames/s

Analog video systems, such as television and monitors, represent video as a one-dimensional electrical signal and Progressively scan line by line from top to bottom.

For High-resolution computer monitors , the scan rate is 1/72 s/frame, and the refresh rate 72

Digital video is obtained either by sampling an analog video signal V(t), or by directly sampling the 3D space-time intensity distribution that is incident on a sensor.

2D spatial intensity array
3D space-time array

The data volume of digital video is usually described in terms of bandwidth or bit rate (Kilo-/Mega-/Giga- bits/s, bps). 100Mbps, Cable:1Gbps

Digital video can be compressed very effectively because of the redundancy inherent in the data, and because of an increased understanding of what components in the video stream are actually visible

Basic Image Processing

Notion

Only monochromatic images are considered.

Image f(n), n=(n1, n2), N * M (rows, columns), n1=0~~N-1, n2=0~~M-1.

The image f(n) is assumed to be quantized to K levels {0, . . . , K - 1).

Basic Gray-Level Image Processing

Operations Type

Point Operation

Point operations are defined as functions of pixel intensity only, not considering spatial information, such as a pixel’s location and the values of its neighbors

Arithmetic operation

Arithmetic operations between images of the same spatial dimensions, not considering spatial information, for noise reduction and change or motion detection.

Geometric operation

Geometric operations are functions of spatial position only, such as image translation, rotation, distortion, bend or video morph.

Image Histogram

The histogram $H_f$ of the digital image $f$ is a plot or graph of the frequency of occurrence of each gray level in $f$

The histogram $H_f$ contains no spatial information.
The histogram supplies a method of determining an image’s gray-level distribution.

AOD

AOD (average optical density) is the basic measure of an image’s overall average brightness or gray level.

AOD is a meter for estimating the center of an image’s gray-level distribution. $$ AOD(f)=\frac{1}{MN}\sum_{n_1=0}^{N-1}\sum_{n_2=0}^{M-1}f(n_1,n_2) \newline=\frac{1}{MN}\sum_{k=0}^{K-1}kH_f(k) $$

# linear Point Operation

Additive Image Offset

$$ g(n)=f(n)+L \newline h_g(k) = h_f(k-L) $$

Calibrate images to a given average brightness level. $$ g(n)=f(n)-L+\frac{K}{2} $$

Multiplicative Image Scaling

$$ g(n)=\lfloor Pf(n)+0.5\rfloor $$

multiply and rounding

Image Negative

$$ g(n)=K-1 - f(n) $$

Full-scale Histogram Stretch

full-scale histogram stretch, or contrast stretch, expands the image histogram to fill the entire available gray-scale range.

Nonlinear Point Operations

Logarithmic Point Operation

$$ g(n)=FSHS(\lfloor log(1 + f(n)) \rfloor) $$

Larger (brighter) gray levels are compressed much more severely than smaller gray levels.

dim objects in the original are now allocated a much larger percentage of the grayscale range, hence improving their visibility.

Histogram Equalization

Histogram equalization, or histogram flattening, to make an image fill the available gray-scale range, and be uniformly distributed over that range.

The idealized goal is a flat histogram. An image with a perfectly flat histogram contains the largest possible amount of information or complexity.

Steps:

get histogram of image
$H_f(k), k \in [0, K-1]$
get relative frequency (normalized histogram)
$p_f(k) = \frac{H_f(k)}{MN}, k \in [0, K-1]$
get absolute frequency (cumulative histogram)
$P_f(k) = \sum_{r=0}^kp_f(r), k \in [0, K-1]$
replace k with k'
$k^{’}=FSHS[P_f(k)], k \in [0, K-1]$

eg.

Arithmetic Operations between Images

Image sum
image difference
Pointwise image product
Pointwise image quotient

Geometric Image Operations

Image Translation
Image Rotation
Image Zoom
- nearest neighbor interpolation
- bilinear interpolation

Basic Binary Image Processing

Image Thresholding

Thresholding is most commonly and effectively applied to images that can be characterized as having bimodal histograms.

Region Labeling

A simple but powerful tool for identifying and labeling the various objects in a binary image is a process called region labeling, blob coloring, or connected component identification.

It is useful since once they are individually labeled, the objects can be separately manipulated, displayed or modified

Region Counting

A simple application of region labeling is the measurement of object area.

This can be accomplished by defining a vector c with elements c(k) that are the pixel area (pixel count) of region k

Minor Region Removal

Logical Operations

NOT
AND
OR
XOR
MAJ
returns value “1” if and only if a majority of (xl , . . . , xn ) equal “1”

dilation filter

$$ g(n)=OR[Bf(n)] $$

expands the foreground, removing bays of too-narrow width, and removing small holes

erosion filter

shrinks the foreground, removes fingers of too-narrow width, removes “l”-valued small objects. $$ g(n)=AND[Bf(n)] $$

relationship

$$ dilation(f,B)=NOT(erosion[NOT(f),B]) \newline erosion(f,B)=NOT(dilation[NOT(f),B]) $$

Erode and dilate filters have the effect of changing the sizes of objects, as well as smoothing them.

Erode and dilate shrink and expand the sizes of “l”- valued objects in a binary image. However, they are not inverse operations of one another.

They are approximate inverses in the sense that if they are performed in sequence on the same image with the same window B, and the object and holes that are not eliminated will be returned to their approximate sizes.

open filter and close filter

size-preserving smoothing morphologic operators $$ open(f,B)=erosion[dilation(f,B),B] \newline close(f,B)=dilation[erosion(f,B),B] $$ The open and close filters are biased filters in the sense that they remove one type of “noise” (either extraneous WHITE or BLACK features), but not both.

It is worth noting that the close and open filters are again in fact, the same filters, in the dual sense. $$ open(f,B)=NOT(close[NOT(f),B]) \newline close(f,B)=NOT(open[NOT(f),B]) $$

close-open filter and open-close filter

unbiased smoothing morphologic operators $$ close-open(f,B)=close[open(f,B),B] \newline open-close(f,B)=open[close(f,B),B] $$ If the filters are properly alternated as in the construction of the close-open and open-close filters, then the dual filters become increasingly similar. However, the smoothing power can most easily be increased by simply taking the window size to be larger.

Once again, the close-open and open-close filters are dual filters under complementation.

majority filter

binary median filter, filter has similar attributes as the close-open and open-close filters:

it removes too-small objects, holes, gaps, bays and peninsulas (both “1”-valued and “0”-valued small features)
it also does not generally change the size of objects or of background.

The majority filter is less biased than any of the other morphologic filters, since it does not have an initial erode or dilate operation to set the bias.

The majority filter is a power, unbiased shape smoother. However, for a given filter size, it does not have the same degree of smoothing power as close-open or open-close.

Morphologic Boundary Detection

$$ boundary(f,B)=XOR[f,dilation(f,B)] $$

Representation & Compression

run-length coding seeks to exploit the redundancy of long run lengths or runs of constant value “1” or “0” in the binary data. – for the coding/compression of binary images containing large areas of constant value “1” and “0”.
chain coding, is appropriate for binary images containing binary contours. – The chain code is also an information-rich, highly manipulable representation for shape analysis

Linear Image Filter

Linear system theory and linear filtering play a central role in digital image and video processing.

modifying, improving, or representing digital visual data are expressed in terms of linear systems concepts.
Linear filters are used for image/video contrast improvement, denoising, and sharpening, target matching and feature enhancement

Definitions

Linear System

with the properties of superposition and homogeneity. $$ x_1(t)+x_2(t)\to y_1(t)+y_2(t) \space for \space any \space x_1(t), x_2(t) \newline ax_1(t)\to ay_1(t) \space for \space any \space a $$

Linear time invariance (LTI) system

$$ x(t)\to y(t) \newline x(t-T)\to y(t-T) $$

Two dimensional System

A two dimensional System is a process of image transformation.

Two-dimensional system is linear and shift invariance ( LSI )

The system L is linear if

for any $g_1(m,n)=L[f_1(m,n)]$ and $g_2(m,n)=L[f_2(m,n)]$, $ag_1(m,n)+bg_2(m,n) = L[af_1(m,n)+bf_2(m,n)]$ for any a and b

The system L is shift invariance if

for any( p , q) $g(m-p,n-q)=L[f(m-p,n-q)]$

Filtering System

A filtering system is a system that removes redundant or unwanted information from an information stream.

Linear filtering system, which means the filtering process between the input and output is linear operation.

In image processing, the filtering process represents the process of image enhancement, including the image/video contrast improvement, denoising, and sharpening, target matching and feature enhancement.

linear filtering in spatial domain
linear filtering in frequency domain

linear image enhancement – means a process of smoothing irregularities or noise that has somehow corrupted the image, while modifying the original image information as little as possible. – Sharping the image to highlight the details

linear image enhancement

means a process of smoothing irregularities or noise that has somehow corrupted the image, while modifying the original image information as little as possible.
Sharping the image to highlight the details

Type

Spatial domain, operating directly on the pixels of an image
Frequency domain, operating on the Fourier transform of an image, rather than on the image itself.

linear spatial filter

Filtering operations that are performed directly on the pixels of an image, and the computations performed on the pixels of the neighborhoods are linear.

The linear operations consist of multiplying each pixel in the neighborhood by a corresponding coefficient and summing the results to obtain the response at each point.

If the neighborhood is of size m×n, m×n coefficients are required, they are arranged as a matrix, called a filter, mask, filter mask, kernel, template, or window.

Moving Average Filter

Its output at a given position is the average of all pixels covered by the filter, thus it is used to blur the image or to reduce the noise.

Noise Reduction

The noise is usually modeled as an additive noise or as a multiplicative noise. We will assume a zero-mean additive white noise model.

We model the observed noisy image f as a sum of an original image o and a noise image q, $f=o+q$

The goal of enhancement is to recover an image g that resembles o as closely as possible by reducing q

Given an image f to be filtered and a window (filter mask) B, then the moving average-filtered image g is given by $$ g(n)=AVG[Bf(n)] $$ Since the average is a linear operation, it is also true that $$ g(n)=AVG[Bo(n)]+AVG[Bq(n)] $$ Because the noise process q is assumed to be zero mean, then the last term will tend to zero as the filter window is increased.

Thus, the moving average filter has the desirable effect of reducing zero-mean image noise toward zero.

However, the filter also affects the original image information. The moving average filter will blur the image, especially as the window span is increased.

Balancing this tradeoff is often a difficult task.

Sharping spatial filter

The principal objective of sharpening is to highlight transitions in intensity. Highlight the details, enhance the blurred image

Image blurring can be accomplished in the spatial domain by pixel averaging in a neighborhood. Because averaging is analogous to integration, it is logical to conclude that sharpening can be accomplished by spatial differentiation.

Fundamentally, the strength of response of a derivative operator is proportional to the degree of intensity discontinuity of the image at the point at which the operator is applied.

Thus, image differentiation enhances edges and other discontinuities (such as noise) and deemphasizes areas with slowly varying intensities.

Second-order derivative enhances fine detail much better than the first-order derivates.

The Laplacian

Isotropic filters are rotation invariant, in this sense that rotating the image and then applying the filter gives the same result as applying the filter to the image first and then rotating the result.

The simplest isotropic derivative operator is the Laplacian.

If we need to sharping an image while preserving the background features, we can simply add the Laplacian image to the original.

Laplacian contains both positive and negative values, and all the negative values are clipped at 0 by the display.

Thus it need to be scaled, a typical way is to add to it its minimum value to bring the new minimum to zero and then scale the result to the full[0, L-1] intensity range.

Linear frequency filter

In frequency domain, the operations is performed on the Fourier transform of an image.

Despite the computational efficiency of the spatial domain techniques, some image processing tasks are more easier or more meaningful to implement in the frequency domain.

Definitions

Impulse Response

the output of a system when its input is unit impulse function.

For a discrete-time systems, impulse response is generally expressed in sequence h[n]. The corresponding discrete input signal, i.e. the unit impulse function satisfies Kronecker delta function.

Two-dimensional impulse function

The impulse response of a two-dimensional input-output system L is

the response of system L, at spatial position (m, n), to an impulse located at spatial position ( p , q)
if the system L is space invariant, then $h(m-p,n-q) = L[\delta(m-p,n-q)]$

Discrete-space image

Any discrete-space image f may be expressed in terms of the impulse function. $$ f(m,n) = \sum_{p=-\infty}^\infty \sum_{q=-\infty}^\infty f(m-p,n-q)\delta(p,q) \newline= \sum_{p=-\infty}^\infty \sum_{q=-\infty}^\infty f(p,q)\delta(m-p,n-q) $$

Frequency Response

The discrete-space Fourier transform (DSFT) of the system impulse response.

According to the Fourier transform, the convolution in the space domain equals the product in the frequency domain.

$$ g(m,n) = f(m,n)*h(m,n) $$ The output of the system L can be expressed in terms of the frequency response by $$ G(u,v) = F(u,v)H(u,v) $$

Principal

Images we see are all in spatial domain, we can’t recognize the images in frequency domain, so if we need to process the image in frequency domain, we need

Transform the image to the frequency domain using Fourier transform
Filter in the frequency domain
Transform the image back to the spatial domain using inverse Fourier transform

Type

low-pass
bandpass
high-pass
oriented

For a given filter type, different degrees of smoothing (sharping) can be obtained by adjusting the filter bandwidth.

A narrower bandwidth low-pass filter will reject more of the high-frequency noise – but it may also degrade the image content by attenuating important high-frequency image details. This is a tradeoff that is difficult to balance.

Smoothing

Smoothing(blurring) is achieved in the frequency domain by high-frequency attenuation(by lowpass filtering)

Ideal

ideal low-pass filter (ideal LPF) was designed explicitly with no sidelobes in frequency domain by forcing the frequency response to be zero outside of a given radial cutoff frequency.

From the figure and the equation, we know that all frequencies on or inside a circle are passed without attenuation, whereas all frequencies outside the circle are completely attenuated(filtered out).

The point of transition between H(u,v) = 1 and H(u,v) = 0 is called cutoff frequency.

drawbacks

truncating in the frequency domain causes ringing in the space domain, which creates more of a problem because of the edge response of the ideal LPF.

Butterworth

The transfer function of a Butterworth lowpass filter(BLPF) of order n, and with cutoff frequency at a distance D0 from the origin.

The cutoff frequency defines as the point for which H(u,v) is down to 50% from its maximum value of 1.0.

Unlike the ILPF, the BLPF transfer function doesn’t have a sharp discontinuity that gives a clear cutoff between passed and filtered frequencies.

A BLPF of order 1 has no ringing in the spatial domain.

Ringing increases as a function of filter order.

BLPF of order 2 are a good compromise between effective lowpass filtering and acceptable ringing.

Gaussian

Filter sidelobes in either the space or frequency domain contribute a negative effect to the responses of noise-smoothing linear image enhancement filters.

Frequency domain sidelobes lead to noise leakage.
Space domain sidelobes lead to ringing artifacts.

Gaussian filter is a filter with sidelobes in neither domain.

impulse response

frequency response

Sharpening

Edges and other abrupt changes in intensities are associated with high- frequency components.

Image sharpening can be achieved in the frequency domain by high-pass filtering – which attenuates the low frequency components without disturbing high-frequency information in the Fourier transform.

A high-pass filter is obtained from a given lowpass filter using equation $$ H_{HP}(u,v)=1-H_{LP}(u,v) $$ That is, when low-pass filter attenuates frequencies, the high-pass filter pass them, and vice versa.

If we get the result of high-pass filter, then we can enhance an image by

Ideal

The IHPF sets to zero all frequencies inside the circle, and pass all frequencies outside the circle.

As ILPF, IHPF has the same ringing properties(for the truncating function in frequency domain)

Butterworth

Butterworth high-pass filters smoother than IHPFs.

As BLPF, the lower the order is, the less the effect of ringing of BHPF

Selective Filtering

Filters that operate over the entire frequency rectangle are called bandreject or bandpass filters
Filters that process specific bands of frequencies or small region are called notch filters

Bandreject filters

ideal, Butterworth, Gaussian

The bandreject filter could be used to reduce the cyclicity noise.

Bandpass filters

A bandpass filter is obtained from a bandreject in the same manner that we obtained a highpass filter from a lowpass filter.

Notch filters

Notch filters are the most useful of the selective filters.

A notch filter rejects(or passes) frequencies in a predefined neighborhood about the center of the frequency rectangle.

Notch reject filters are constructed as products of highpass filters whose centers have been translated to the centers of the notches.

Notch filters also used to reduce the cyclicity noise.

Although the bandreject filter also used to reduce the cyclicity noise, but it also attenuate the other part except the noise.

The notch filters only affect the noise.

Limitation

The removal of broadband noise from most images by means of linear filtering is impossible without some degradation (blurring) of the image information content.

Due to the fact that complete frequency separation between signal and broadband noise is rarely practicable.

Nonlinear Filter

Nonlinear methods effectively preserve edges and details of images, whereas methods using linear operators tend to blur and distort them.
Additionally, nonlinear image enhancement tools are less susceptible to noise.

Noise Model

The principal sources of noise in digital images arise during image acquisition and/or transmission

White Gaussian noise

The probability density function is Gaussian, and the frequency spectrum of noise is uniform.

Because of its mathematical tractability in both the spatial and frequency domain, Gaussian noise models are used frequently in practice.

Salt & pepper noise

Salt & pepper noise also called as impulse noise, the probability of impulse noise is given by

If b>a, then intensity b will appear as a light dot in image, this light dot called salt noise, intensity a will appear as a dark dot, called pepper noise.

Order-Statistic Filter

Order-statistic filters are nonlinear spatial filters whose response is based on ordering(ranking) the pixels contained in the image area encompassed by the filter, and then replacing the value of the center pixel with the value determined by the ranking result.

Max Filter

This filter is useful for reducing pepper noise (dark dot). The value of the center is replaced by the max.

May also remove some dark pixels from the borders of the dark objects.

Min Filter

This filter is useful for reducing salt noise (bright dot). The value of the center is replaced by the min.

May also remove some white points around the border of light objects.

Median Smoother

Recursive

Running medians can be extended to a recursive mode by replacing the “causal” input samples in the median smoother by previously derived output samples. The output of the recursive median smoother is given by

With the same amount of operations, recursive median smoothers have better noise attenuation capabilities than their non recursive counterparts

Given N samples x1 , . . , xN, the sample mean and sample median minimize the expression for p = 2 and p = 1, respectively.

The sample mean is given by the sample whose sum of square distance to all samples in the set is the smallest.

The median of an odd number of samples emerges as the sample whose sum of absolute distances to all other samples in the set is the smallest.

The analogy between the sample mean and median extends into the statistical domain of parameter estimation,

the sample mean is the maximum likelihood (ML) estimator of location of a constant parameter in Gaussian noise.
the sample median is the maximum likelihood (ML) estimator of location of a constant parameter in salt & pepper noise.

Weighted Median Smoother

Although the median is a robust estimator that possesses many optimality properties, the performance of running medians is limited by the fact that it is temporally blind. That is, all observation samples are treated equally regardless of their location within the observation window.

positive real-valued weights

Center Weighted Median Smoothers

The CWM smoother is realized by allowing only the center observation sample to be weighted. Thus, the output of the CWM smoother is given by

Weighted Median Smoother with negative weight

Positive weights WM Smoother has low-pass type filtering characteristics.

A large number of engineering applications require bandpass or high-pass frequency filtering characteristics.

there is a logical way to generalize the median to an equivalently rich class of weighted median filters that admit both positive and negative weights.

steps and example

Vector Weighted Median Filters

The weighted median filtering operation of a color image can be achieved in a number of ways, two of which we summarize below.

Marginal WM filter
Vector WM filter

The simplest approach to WM filtering a color image is to process each component independently by a scalar WM filter.

A drawback associated with this method is that different components can be strongly correlated and, if each component is processed separately, this correlation is not exploited.
In addition, since each component is filtered independently, the filter outputs can combine to produce colors not present in the original image.
The advantage of marginal processing is the computational simplicity

Application

Image Noise Cleaning
Image Zooming
Image Sharpening
Edge Detection

Image Compression

Lossless Coding

represent an image signal with the smallest possible number of bits without loss of any information
speed up transmission and minimizing storage requirements

foundation

Redundancy - correlation among the image:
Spatial correlation among neighbor pixels
Temporal correlation among video frames
Spectral correlation between image samples

Standards for lossless compression:

Lossless JPEG standard
Facsimile compression standards
JBIG compression standard

$$ template $$

$template$