In this paper, we describe some central mathematical problems in medical imaging. The subject has been undergoing rapid changes driven by better hardware and software. Much of the software is based on novel methods utilizing geometric partial differential equations in conjunction with standard signal/image processing techniques as well as computer graphics facilitating man/machine interactions. As part of this enterprise, researchers have been trying to base biomedical engineering principles on rigorous mathematical foundations for the development of software methods to be integrated into complete therapy delivery systems. These systems support the more effective delivery of many image-guided procedures such as radiation therapy, biopsy, and minimally invasive surgery. We will show how mathematics may impact some of the main problems in this area, including image enhancement, registration, and segmentation.
Keywords: Medical imaging, artificial vision, smoothing, registration, segmentationMedical imaging has been undergoing a revolution in the past decade with the advent of faster, more accurate, and less invasive devices. This has driven the need for corresponding software development, which in turn has provided a major impetus for new algorithms in signal and image processing. Many of these algorithms are based on partial differential equations and curvature driven flows, which will be the main topics of this survey paper.
Mathematical models are the foundation of biomedical computing. Basing those models on data extracted from images continues to be a fundamental technique for achieving scientific progress in experimental, clinical, biomedical, and behavioral research. Today, medical images are acquired by a range of techniques across all biological scales which go far beyond the visible light photographs and microscope images of the early 20th century. Modern medical images may be considered to be geometrically arranged arrays of data samples which quantify such diverse physical phenomena as the time variation of hemoglobin deoxygenation during neuronal metabolism or the diffusion of water molecules through and within tissue. The broadening scope of imaging as a way to organize our observations of the biophysical world has led to a dramatic increase in our ability to apply new processing techniques and to combine multiple channels of data into sophisticated and complex mathematical models of physiological function and dysfunction.
A key research area is the formulation of biomedical engineering principles based on rigorous mathematical foundations in order to develop general-purpose software methods that can be integrated into complete therapy delivery systems. Such systems support the more effective delivery of many image-guided procedures such as biopsy, minimally invasive surgery, and radiation therapy.
In order to understand the extensive role of imaging in the therapeutic process and to appreciate the current usage of images before, during, and after treatment, we focus our analysis on four main components of image-guided therapy (IGT) and image-guided surgery (IGS): localization, targeting, monitoring, and control. Specifically, in medical imaging we have four key problems:
Segmentation - automated methods that create patient-specific models of relevant anatomy from images;
Registration - automated methods that align multiple data sets with each other; Visualization - the technological environment in which image-guided procedures can be displayed;Simulation - softwares that can be used to rehearse and plan procedures, evaluate access strategies, and simulate planned treatments.
In this paper, we will consider only the first two problem areas. However, it is essential to note that in modern medical imaging, we need to integrate these technologies into complete and coherent image-guided therapy delivery systems and validate these integrated systems using performance measures established in particular application areas.
We should note that in this survey we touch upon only those aspects of the mathematics of medical imaging reflecting the personal tastes (and prejudices) of the authors. Indeed, we do not discuss a number of very important techniques, such as wavelets, which have had a significant impact on imaging and signal processing; see [59] and the references therein. Several articles and books are available which describe various mathematical aspects of imaging processing, such as [66] (segmentation), [83] (curve evolution), and [70], [87] (level set methods).
Finally, it is extremely important to note that all the mathematical algorithms which we sketch lead to interactive procedures. This means that in each case there is a human user in the loop (typically a clinical radiologist) who is the ultimate judge of the utility of the procedure and who tunes the parameters either on- or off-line. Nevertheless, there is a major need for further mathematical techniques which lead to more automatic and easier-to-use medical procedures. We hope that this paper may facilitate a dialogue between the mathematical and medical imaging communities.
We briefly outline the subsequent sections of this paper.
Section 3 reviews some of the key imaging modalities. Each one has certain advantages and disadvantages, and algorithms tuned to one device may not work as well on another device. Understanding the basic physics of the given modality is often very useful in forging the best algorithm.
In Section 4, we describe some of the relevant issues in computer vision and image processing for the medical field as well as sketch some of the partial differential equation (PDE) methods that researchers have proposed to deal with these issues.
Section 5 is the heart of this survey paper. Here we describe some of the main mathematical and engineering problems connected with image processing in general and medical imaging in particular. These include image smoothing, registration, and segmentation (see Sections 5.1, 5.2, and 5.3). We show how geometric partial differential equations and variational methods may be used to address some of these problems as well as illustrate some of the various methodologies.
Finally, in Section 6, we summarize the survey as well as point out some possible future research directions.
In 1895, Roentgen discovered X-rays and pioneered medical imaging. His initial publication [82] contained a radiograph (i.e., an X-ray generated photograph) of Mrs. Roentgen’s hand; see Figure 3.1(b) . For the first time, it was possible to visualize non-invasively (i.e., not through surgery) the interior of the human body. The discovery was widely publicized in the popular press, and an “X-ray mania” immediately seized Europe and the United States [30], [80]. Within only a few months, public demonstrations were organized, commercial ventures created and innumerable medical applications investigated; see Figure 3.1(a) .The field of radiography was born with a bang! 1
X-ray radiography at the end of the 19th century.
Today, medical imaging is a routine and essential part of medicine. Pathologies can be observed directly rather than inferred from symptoms. For example, a physician can non-invasively monitor the healing of damaged tissue or the growth of a brain tumor and determine an appropriate medical response. Medical imaging techniques can also be used when planning or even while performing surgery. For example, a neurosurgeon can determine the “best” path in which to insert a needle and then verify in real time its position as it is being inserted.
Imaging technology has improved considerably since the end of the 19th century. Many different imaging techniques have been developed and are in clinical use. Because they are based on different physical principles [41], these techniques can be more or less suited to a particular organ or pathology. In practice they are complementary in that they offer different insights into the same underlying reality. In medical imaging, these different imaging techniques are called modalities.
Anatomical modalities provide insight into the anatomical morphology. They include radiography, ultrasonography or ultrasound (US, Section 3.2.1), computed tomography (CT, Section 3.2.2), and magnetic resonance imagery (MRI, Section 3.2.3). There are several derived modalities that sometimes appear under a different name, such as magnetic resonance angiography (MRA, from MRI), digital subtraction angiography (DSA, from X-rays), computed tomography angiography (CTA, from CT), etc.
Functional modalities, on the other hand, depict the metabolism of the underlying tissues or organs. They include the three nuclear medicine modalities, namely, scintigraphy, single photon emission computed tomography (SPECT) and positron emission tomography (PET, Section 3.2.4), as well as functional magnetic resonance imagery (fMRI). This list is not exhaustive, as new techniques are being added every few years as well [13]. Almost all images are now acquired digitally and integrated in a computerized picture archiving and communication system (PACS).
In our discussion below, we will give only very brief descriptions of some of the most popular modalities. For more details, a very readable treatment (together with the underlying physics) may be found in the book [43].
In this modality a transmitter sends high-frequency sound waves into the body, where they bounce off the different tissues and organs to produce distinctive patterns of echoes. These echoes are acquired by a receiver and forwarded to a computer that translates them into an image on a screen. Because ultrasound can distinguish subtle variations among soft, fluid-filled tissues, it is particularly useful in imaging the abdomen. In contrast to X-rays, ultrasound does not damage tissues with ionizing radiation. The great disadvantage of ultrasonography is that it produces very noisy images. It may therefore be hard to distinguish smaller features (such as cysts in breast imagery). Typically quite a bit of image preprocessing is required. See Figure 3.2(a) .
Examples of different image modalities.
In computed tomography (CT), a number of 2D radiographs are acquired by rotating the X-ray tube around the body of the patient. (There are several different geometries for this.) The full 3D image can then be reconstructed by computer from the 2D projections using the Radon transform [40]. Thus CT is essentially a 3D version of X-ray radiography and therefore offers high contrast between bone and soft tissue and low contrast between different soft tissues. See Figure 3.2(b) . A contrast agent (some chemical solution opaque to the X-rays) can be injected into the patient in order to artificially increase the contrast among the tissues of interest and so enhance image quality. Because CT is based on multiple radiographs, the deleterious effects of ionizing radiation should be taken into account (even though it is claimed that the dose is sufficiently low in modern devices so that this is probably not a major health risk issue). A CT image can be obtained within one breath hold, which makes CT the modality of choice for imaging the thoracic cage.
This technique relies on the relaxation properties of magnetically excited hydrogen nuclei of water molecules in the body. The patient under study is briefly exposed to a burst of radio-frequency energy, which, in the presence of a magnetic field, puts the nuclei in an elevated energy state. As the molecules undergo their normal, microscopic tumbling, they shed this energy into their surroundings in a process referred to as relaxation. Images are created from the difference in relaxation rates in different tissues. This technique was initially known as nuclear magnetic resonance (NMR), but the term “nuclear” was removed to avoid any association with nuclear radiation. 2 MRI utilizes strong magnetic fields and non-ionizing radiation in the radio frequency range and according to current medical knowledge, is harmless to patients. Another advantage of MRI is that soft tissue contrast is much better than with X-rays, leading to higher-quality images, especially in brain and spinal cord scans. See Figure 3.2(c) . Refinements have been developed, such as functional MRI (fMRI), which measures temporal variations (e.g., for detection of neural activity), and diffusion MRI, which measures the diffusion of water molecules in anisotropic tissues such as white matter in the brain.
The patient is injected with radioactive isotopes that emit particles called positrons (anti-electrons). When a positron meets an electron, the collision produces a pair of gamma ray photons having the same energy but moving in opposite directions. From the position and delay between the photon pair on a receptor, the origin of the photons can be determined. While MRI and CT can only detect anatomical changes, PET is a functional modality that can be used to visualize pathologies at the much finer molecular level. This is achieved by employing radioisotopes that have different rates of intake for different tissues. For example, the change of regional blood flow in various anatomical structures (as a measure of the injected positron emitter) can be visualized and relatively quantified. Since the patient has to be injected with radioactive material, PET is relatively invasive. The radiation dose however is similar to a CT scan. Image resolution may be poor and major preprocessing may be necessary. See Figure 3.2(d) .
Medical imaging needs highly trained technicians and clinicians to determine the details of image acquisition (e.g., choice of modality, of patient position, of an optional contrast agent, etc.), as well as to analyze the results. The dramatic increase in availability, diversity, and resolution of medical imaging devices over the last 50 years threatens to overwhelm these human experts.
For image analysis, modern image processing techniques have therefore become indispensable. Artificial systems must be designed to analyze medical datasets either in a partially or even a fully automatic manner. This is a challenging application of the field known as artificial vision (see Section 4.1). Such algorithms are based on mathematical models (see Section 4.2). In medical image analysis, as in many practical mathematical applications, numerical simulations should be regarded as the end product. The purpose of the mathematical analysis is to guarantee that the constructed algorithms will behave as desired.
Artificial Intelligence (AI) was initiated as a field in the 1950’s with the ambitious (and so-far unrealized) goal of creating artificial systems with human-like intelligence. 3 Whereas classical AI had been mostly concerned with symbolic representation and reasoning, new subfields were created as researchers embraced the complexity of the goal and realized the importance of sub-symbolic information and perception. In particular, artificial vision [32, 44, 39, 91] emerged in the 1970s with the more limited goal to mimic human vision with man-made systems (in practice, computers).
Vision is such a basic aspect of human cognition that it may superficially appear somewhat trivial, but after decades of research the scientific understanding of biological vision remains extremely fragmentary. To date, artificial vision has produced important applications in medical imaging [18] as well as in other fields, such as Earth observation, industrial automation, and robotics [91].
The human eye-brain system evolved over tens of millions of years, and at this point no artificial system is as versatile and powerful for everyday tasks. In the same way that a chess-playing program is not directly modelled after a human player, many mathematical techniques are employed in artificial vision that do not pretend to simulate biological vision. Artificial vision systems will therefore not be set within the natural limits of human perception. For example, human vision is inherently two dimensional. 4 To accommodate this limitation, radiologists must resort to visualizing only 2D planar slices of 3D medical images. An artificial system is free of that limitation and can “see” the image in its entirety. Other advantages are that artificial systems can work on very large image datasets, are fast, do not suffer from fatigue, and produce repeatable results. Because artificial vision system designers have so far been unsuccessful in incorporating high-level understanding of real-life applications, artificial systems typically complement rather than replace human experts.
Many mathematical approaches have been investigated for applications in artificial vision (e.g., fractals and self-similarity, wavelets, pattern theory, stochastic point process, random graph theory; see [42]). In particular, methods based on partial differential equations (PDEs) have been extremely popular in the past few years [20, 35]. Here we briefly outline the major concepts involved in using PDEs for image processing.
As explained in detail in [17], one can think of an image as a map I:𝔇 → ℭ; i.e., to any point x in the domain 𝔇, I associates a “color” I(x) in a color space ℭ. For ease of presentation we will mainly restrict ourselves to the case of a two-dimensional grayscale image which we can think of as a function from a domain 𝔇 = [0, 1] × [0, 1] ⊂ ℝ 2 to the unit interval ℭ = [0, 1].
The algorithms all involve solving the initial value problem for some PDE for a given amount of time. The solution to this PDE can be either the image itself at different stages of modification or some other object (such as a closed curve delineating object boundaries) whose evolution is driven by the image.
For example, introducing an artificial time t, the image can be deformed according to
∂ I ∂ t = F [ I ] ,where I(x, t):𝔇 × [0, T) → ℭ is the evolving image, ℱ is an operator which characterizes the given algorithm, and the initial condition is the input image I0. The processed image is the solution I(x, t) of the differential equation at time t. The operator ℱ usually is a differential operator, although its dependence on I may also be nonlocal.
Similarly, one can evolve a closed curve Γ ⊂ 𝔇 representing the boundaries of some planar shape (Γ need not be connected and could have several components). In this case, the operator ℱ specifies the normal velocity of the curve that it deforms. In many cases this normal velocity is a function of the curvature κ of Γ and of the image I evaluated on Γ. A flow of the form
∂ Γ ∂ t = F ( I , κ ) Nis obtained, where N is the unit normal to the curve Γ.
Very often, the deformation is obtained as the steepest descent for some energy functional. For example, the energy
ℰ(I) = ½∫∥∇I∥ 2 dxdyand its associated steepest descent, the heat equation,
correspond to the classical Gaussian smoothing (see Section 5.1.1).
The use of PDEs allows for the modelling of the crucial but poorly understood interactions between top-down and bottom-up vision. 5 In a variational framework, for example, an energy ℰ is defined globally, while the corresponding operator F will influence the image locally. Algorithms defined in terms of PDEs treat images as continuous rather than discrete objects. This simplifies the formalism, which becomes grid independent. On the other hand, models based on nonlinear PDEs may be much harder to analyze and implement rigorously.
Medical images typically suffer from one or more of the following imperfections:
low resolution (in the spatial and spectral domains), high level of noise, low contrast, geometric deformations, presence of imaging artifacts.These imperfections can be inherent to the imaging modality (e.g., X-rays offer low contrast for soft tissues, ultrasound produces very noisy images, and metallic implants will cause imaging artifacts in MRI) or the result of a deliberate trade-off during acquisition. For example, finer spatial sampling may be obtained through a longer acquisition time. However, that would also increase the probability of patient movement and thus blurring. In this paper, we will be interested only in the processing and analysis of images, and we will not be concerned with the challenging problem of designing optimal procedures for their acquisition.
Several tasks can be performed (semi)-automatically to support the eye-brain system of medical practitioners. Smoothing (Section 5.1) is the problem of simplifying the image while retaining important information. Registration (Section 5.2) is the problem of fusing images of the same region acquired from different modalities or putting in correspondence images of one patient at different times or of different patients. Finally, segmentation (Section 5.3) is the problem of isolating anatomical structures for quantitative shape analysis or visualization. The ideal clinical application should be fast, robust with regards to image imperfections, simple to use, and as automatic as possible. The ultimate goal of artificial vision is to imitate human vision, which is intrinsically subjective.
Note that for ease of presentation, the techniques we present below are applied to two-dimensional grayscale images. The majority of them, however, can be extended to higher dimensions (e.g., vector-valued volumetric images).
Smoothing is the action of simplifying an image while preserving important information. The goal is to reduce noise or useless details without introducing too much distortion so as to simplify subsequent analysis.
It was realized that the process of smoothing is closely related to that of pyramiding, which led to the notion of scale space. This was introduced by Witkin [97] and formalized in such works as [52, 46]. Basically, a scale space is a family of images S t | t ≥ 0> in which S 0 = I is the original image and S t , t > 0, represent the different levels of smoothing for some parameter t. Larger values of t correspond to more smoothing.
In Alvarez et al. [2], an axiomatic description of scale space was proposed. These axioms, which describe many of the methods in use, require that the process T t which computes the image S t = T t [I] from I should have the following properties:
Causality/Semigroup: T 0 [I] ≡ I and for all t, s ≥ 0, T t [T s [I]] = T t+s [I]. (In particular, if the image S t has been computed, all further smoothed images S s | s ≥ t> can be computed from S t , and the original image is no longer needed.)
Generator: The family S t = T t [I] is the solution of an initial value problem ∂tS t = A[S t ], in which A is a non-linear elliptic differential operator of second order.
Comparison Principle: If S 1 0 ( x ) ≤ S 2 0 ( x ) for all x ∈ 𝔇, then T t [ S 1 0 ] ≤ T t [ S 2 0 ] pointwise on 𝔇. This property is closely related to the Maximum Principle for parabolic PDEs.
Euclidean invariance: The generator A and the maps T t commute with Euclidean transformations 6 acting on the image S 0 .
The requirement that the generator A of the semigroup be an elliptic differential operator may seem strong and even arbitrary at first, but it is argued in [2] that the semigroup property, the Comparison Principle, and the requirement that A act locally make this axiom quite natural. One should note that already in [52], it is shown that in the linear case a scale space must be defined by the linear heat equation. (See our discussion below.)
If I:𝔇 → ℭ is a given image which contains a certain amount of noise, then the most straightforward way of removing this noise is to approximate I by a mollified function S; i.e., one replaces the image function I by the convolution S σ = Gσ * I, where
Gσ(x) = (2πσ 2 ) n∕2 e −∥x∥ 2 ∕2σ 2is a Gaussian kernel with covariance the diagonal matrix σ 2 Id. This mollification will smear out fluctuations in the image on scales of order σ and smaller. This technique had been in use for quite a while before it was realized 7 by Koenderink [52] that the function S 2σ = G2σ * I satisfies the linear diffusion equation
∂ S t ∂ t = Δ S t , S 0 = I .Thus, to smooth the image I the diffusion equation (5.2) is solved with initial data S 0 = I. The solution S t at time t is then the smoothed image.
The method of smoothing images by solving the heat equation has the advantage of simplicity. There are several effective ways of computing the solution S t from a given initial image S 0 = I, e.g., using the fast Fourier transform. Linear Gaussian smoothing is Euclidean invariant and satisfies the Comparison Principle. However, in practice one finds that Gaussian smoothing blurs edges. For example, if the initial image S 0 = I is the characteristic function of some smoothly bounded set Ω ⊂ 𝔇 so that it represents a black and white image with no gray regions, then for all but very small t > 0 the image S t will resemble the original image in which the sharp boundary ∂Ω has been replaced with a fuzzy region of varying shades of gray. (See Section 5.3.1 for a discussion on edges in computer vision.)
Figure 5.1(a) is a typical MRI brain image. Specular noise is usually present in such images, and so edge-preserving noise removal is essential. The result of Gaussian smoothing implemented via the linear heat equation is shown in Figure 5.1(b) . The edges are visibly smeared. Note that even though 2D slices of the 3D image are shown to accommodate human perception, the processing was actually performed in 3D and not independently on each 2D slice.
Linear smoothing smears the edges.
We now discuss several methods which have been proposed to avoid this edge blurring effect while smoothing images.
Perona and Malik [74] replaced the linear heat equation with the non-linear diffusion equation
∂ S ∂ t = div < g ( ∣ ∇ S ∣ ) ∇ S >= ∑ i , j a i j ( ∇ S ) ∇ i j 2 S a i j ( ∇ S ) = g ( ∣ ∇ S ∣ ) δ i j + g ′ ( ∣ ∇ S ∣ ) ∣ ∇ S ∣ ∇ i S ∇ j S .Here g is a non-negative function for which limp→∞g(p) = 0. The idea is to slow down the diffusion near edges, where the gradient |∇S| is large. (See Sections 5.3.1 and 5.3.2 for a description of edge detection techniques.)
The matrix aij of diffusion coefficients has two eigenvalues: one, λ ∥ , for the eigenvector ∇S, and one, λ ⊥ , for all directions perpendicular to ∇S. They are
λ ∥ = g(∣∇S∣) + g ′ (∣∇S∣)∣∇S∣, and λ ⊥ = g(∣∇S∣).While λ ⊥ is always non-negative, λ ∥ can change sign. Thus the initial value problem is ill-posed if sg′(s) + g(s) < 0, i.e., if sg(s) is decreasing, and one actually wants g(s) to vanish very quickly as s → ∞ (e.g., g(s) = e −s ). Even if solutions S t could be constructed in the ill-posed regime, they would vary strongly and unpredictably under tiny perturbations in the initial image S 0 = I, while it is not clear if the Comparison Principle would be satisfied.
In spite of these objections, numerical experiments [74] have indicated that this method actually does remove a significant amount of noise before edges are smeared out very much.
Alvarez, Lions and Morel [3] proposed a class of modifications of the Perona-Malik scheme which result in well-posed initial value problems. They replaced (5.3) with
∂ S ∂ t = h ( ∣ G σ ∗ ∇ S ∣ ) ∣ ∇ S ∣ div ∇ S ∣ ∇ S ∣ ,which can be written as
∂ S ∂ t = h ( ∣ G σ ∗ ∇ S ∣ ) ∣ ∇ S ∣ ∑ i , j b i j ( ∇ S ) ∇ i j 2 S , b i j ( ∇ S ) = ∣ ∇ S ∣ 2 δ i j − ∇ i S ∇ j S ∣ ∇ S ∣ 3 .Thus the stopping function g in (5.3) has been set equal to g(s) = 1/s, and a new stopping function h is introduced. In addition, a smoothing kernel Gσ which averages ∇S in a region of order σ is introduced. One could let Gσ be the standard Gaussian (5.1), but other choices are possible. In the limiting case σ ↘ 0, in which Gσ * ∇S simply becomes ∇S, a PDE is obtained.
Level set methods for the implementation of curvature driven flows were introduced by Osher and Sethian [71] and have proved to be a powerful numerical technique with numerous applications; see [70, 87] and the references therein.
Equation (5.4) can be rewritten in terms of the level sets of the image S. If S is smooth, and if c is a regular value of S t :𝔇 → ℭ (in the sense of Sard’s Theorem), then Γ t (c) = x ∈ 𝔇∣S t (x) = c> is a smooth curve in 𝔇. It is the boundary of the region with gray level c or less. As time goes by, the curve Γ t (c) will move about, and as long as it is a smooth curve one can define its normal velocity V by choosing any local parametrization Γ:[0, 1] × (t0, t1) → 𝔇 and declaring V to be the normal component of ∂tΓ.
If the normal component is chosen to be in the direction of ∇S (rather than −∇S), then
V = − ∂ t S ∣ ∇ S ∣ .The curvature of the curve Γ t (c) (also in the direction of ∇S) is
κ = − div ∇ S ∣ ∇ S ∣ = − S y 2 S x x − 2 S x S y S x y + S x 2 S y y ( S x 2 + S y 2 ) 3 ∕ 2 .Thus (5.4) can be rewritten as
V = h(∣Gσ ∗ ∇S∣)κ,which in the special case h ≡ 1 reduces to the curve shortening equation 8
So if h ≡ 1 and if S:𝔇 × [0, T) → ℭ is a family of images which evolve by (5.4), then the level sets Γ t (c) evolve independently of each other.
This leads to the following simple recipe for noise removal: given an initial image S 0 = I, let it evolve so that its level curves (S t ) −1 (c) satisfy the curve shortening equation (5.7). For this to occur, the function S should satisfy the Alvarez-Lions-Morel equation (5.4) with h ≡ 1, i.e.,
∂ S ∂ t = ∣ ∇ S ∣ div ∇ S ∣ ∇ S ∣ = S y 2 S x x − 2 S x S y S x y + S x 2 S y y S x 2 + S y 2 .It was shown by Evans and Spruck [25] and Chen, Giga and Goto [21] that, even though this is a highly degenerate parabolic equation, a theory of viscosity solutions can still be constructed.
The fact that level sets of a solution to (5.8) evolve independently of each other turns out to be desirable in noise reduction, since it eliminates the edge blurring effect of the linear smoothing method. E.g., if I is a characteristic function, then S t will also be a characteristic function for all t > 0.
The independent evolution of level sets also implies that besides obeying the axioms of Alvarez, Lions and Morel [2] mentioned above, this method also satisfies their axiom on:
Gray scale invariance: For any initial image S 0 = I and any monotone function ϕ:ℭ → ℭ, one has T t [ϕ ∘ I] = ϕ ∘ T t [I].
One can easily verify that (5.8) formally satisfies this axiom, and it can in fact be rigorously proven to be true in the context of viscosity solutions. See [21, 25].
There are several variations on curve shortening which will yield comparable results. Given an initial image I:𝔇 → ℭ, one can smooth it by letting its level sets move with normal velocity given by → some function of their curvature,
instead of directly setting V = κ as in curve shortening. Using (5.6), one can turn the equation V = Φ(κ) into a PDE for S. If Φ:ℭ → ℭ is monotone, then the resulting PDE for S will be degenerate parabolic, and existence and uniqueness of viscosity solutions has been shown [21, 2].
A particularly interesting choice of Φ leads to affine curve shortening [1, 2, 86, 84, 85, 9],
(where we agree to define (−κ) 1/3 = −(κ) 1/3 ).
While the evolution equation (5.9) is invariant under Euclidean transformations, the affine curve shortening equation (5.10) has the additional property that it is invariant under the action of the Special Affine group SL(2, ℝ). If Γ t is a family of curves evolving by (5.10) and A ∈ SL(2, ℝ), b ∈ ℝ 2 , then Γ ~ t = A ⋅ Γ t + b also evolves by (5.10).
Non-linear smoothing filters based on mean curvature flows respect edges much better than Gaussian smoothing; see Figure 5.1(c) for an example using the affine filter. The affine smoothing filter was implemented based on a level set formulation using the numerics suggested in [4].
Rudin et al. presented an algorithm for noise removal based on the minimization of the total first variation of S, given by
(See [54] for the details and the references therein for related work on the total variation method.) The minimization is performed under certain constraints and boundary conditions (zero flow on the boundary). The constraints they employed are zero mean value and given variance σ 2 of the noise, but other constraints can be considered as well. More precisely, if the noise is additive, the constraints are given by
∫𝔇Sdx = ∫𝔇Idx, ∫𝔇(S−I) 2 dx = 2σ 2 .Noise removal according to this method proceeds by first choosing a value for the parameter σ and then minimizing ∫∣∇S∣ subject to the constraints (5.12). For each σ > 0 there exists a unique minimizer S σ ∈ BV(𝔇) satisfying the constraints. 9 The family of images S σ | σ > 0> does not form a scale space and axioms does not satisfy the set forth by Alvarez et al. [2]. Furthermore, the smoothing parameter σ does not measure the “length scale of smoothing” in the way the parameter t in scale spaces does. Instead, the assumptions underlying this method of smoothing are more statistical. One assumes that the given image I is actually an ideal image to which a “noise term” N(x) has been added. The values N(x) at each x ∈ 𝔇 are assumed to be independently distributed random variables with average variance σ 2 .
The Euler-Lagrange equation for this problem is
div ( ∇ S ∣ ∇ S ∣ ) = λ ( S − I ) ,where λ is a Lagrange multiplier. In view of (5.6) we can write this as κ = −λ(S-I); i.e., we can interpret (5.13) as a prescribed curvature problem for the level sets of S. To find the minimizer of (5.11) with the constraints given by (5.12), start with the initial image S 0 = I and modify it according to the L 2 steepest descent flow for (5.11) with the constraint (5.12), namely
∂ S ∂ t = div ( ∇ S ∣ ∇ S ∣ ) − λ ( t ) ( S − I ) ,where λ(t) is chosen so as to preserve the second constraint in (5.12). The solution to the variational problem is given when S achieves steady state. This computation must be repeated ab initio for each new value of σ.
Image registration is the process of bringing two or more images into spatial correspondence (aligning them). In the context of medical imaging, image registration allows for the concurrent use of images taken with different modalities (e.g., MRI and CT) at different times or with different patient positions. In surgery, for example, images are acquired before (pre-operative), as well as during (intra-operative) surgery. Because of time constraints, the real-time intra-operative images have a lower resolution than the pre-operative images obtained. Moreover, deformations which occur naturally during surgery make it difficult to relate the high-resolution pre-operative image to the lower-resolution intra-operative anatomy of the patient. Image registration attempts to help the surgeon relate the two sets of images.
Registration has an extensive literature. Numerous approaches have been explored, ranging from statistics to computational fluid dynamics and various types of warping methodologies. See [58, 93] for a detailed description of the field as well as an extensive set of references, and [37, 76] for recent papers on the subject.
Registration typically proceeds in several steps. First, one decides how to measure similarity between images. One may include the similarity among pixel intensity values, as well as the proximity of predefined image features such as implanted fiducials or anatomical landmarks. 10 Next, one looks for a transformation which maximizes similarity when applied to one of the images. Often this transformation is given as the solution of an optimization problem where the transformations to be considered are constrained to be of a predetermined class C. In the case of rigid registration (Section 5.2.1), C is the set of Euclidean transformations. Soft tissues in the human body typically do not deform rigidly. For example, physical deformation of the brain occurs during neurosurgery as a result of swelling, cerebrospinal fluid loss, hemorrhage and the intervention itself. Therefore a more realistic and more challenging problem is elastic registration (Section 5.2.2), where C is the set of smooth diffeomorphisms. Due to anatomical variability, non-rigid deformation maps are also useful when comparing images from different patients.
Given some similarity measure S on images and two images I and J, the problem of rigid registration is to find a Euclidean transformation T*x = Rx + a (with R ∈ SO(3, ℝ) and a ∈ ℝ 3 ) which maximizes the similarity between I and T*(J), i.e.,
T ∗ = maximizer of S(I, T(J)) over all Euclidean transformations T.A number of standard multidimensional optimization techniques are available to solve (5.15).
Many similarity measures have been investigated [73], e.g., the normalized cross correlation
S ( I , J ) = ( I , J ) L 2 ∥ I ∥ L 2 ∥ J ∥ L 2 .Information-theoretic similarity measures are also popular. By selecting a pixel x at random with uniform probability from the domain 𝔇 and computing the corresponding grey value I(x), one can turn any image I into a random variable. If we write pI for the probability density function of the random variable I and pIJ for the joint probability density of I and J, then one can define the entropy of the difference and the mutual information of two images I and J:
S d ( I , J ) = inf < ∑ c p K ( c ) log p K ( c ) ∣ K = I − s J , s ∈ R > S mi ( I , J ) = ∑ a , b p I J ( a , b ) log p I J ( a , b ) p I ( a ) p J ( b ) .The normalized cross correlation (5.16) and the entropy of the difference (5.17) are maximized when the intensities of the two images are linearly related. In contrast, the mutual information measure (5.18) assumes only that the co-occurrence of the most probable values in the two images is maximized at registration. This relaxed assumption explains the success of mutual information in registration [57, 77].
In this section, we describe the use of optimal mass transport for elastic registration. Optimal mass transport ideas have already been used in computer vision to compare shapes [27] and have appeared in econometrics, fluid dynamics, automatic control, transportation, statistical physics, shape optimization, expert systems, and meteorology [79]. We outline below a method for constructing an optimal elastic warp as introduced in [27, 62, 10]. We describe in particular a numerical scheme for implementing such a warp for the purpose of elastic registration following [38]. The idea is that the similarity between two images is measured by their L 2 Kantorovich-Wasserstein distance, and hence finding “the best map” from one image to another then leads to an optimal transport problem.
In the approach of [38] one thinks of a grayscale image as a Borel measure 11 μ on some open domain 𝔇 ⊂ ℝ d , where, for any E ⊂ 𝔇, the “amount of white” in the image contained in E is given by μ(E). Given two images (𝔇0, μ0) and (𝔇1, μ1) one considers all maps u:𝔇0 → 𝔇1 which preserve the measures, i.e., which satisfy 12
and one is required to find the map (if it exists) which minimizes the Monge-Kantorovich transport functional (see the exact definition below). This optimal transport problem was introduced by Gaspard Monge in 1781 when he posed the question of moving a pile of soil from one site to another in a manner which minimizes transportation cost. The problem was given a modern formulation by Kantorovich [49], and so now is known as the Monge-Kantorovich problem.
More precisely, assuming that the cost of moving a mass m from x ∈ ℝ d to y ∈ ℝ d is m. Φ(x, y), where Φ:ℝ d × ℝ d → ℝ+ is called the cost function, Kantorovich defined the cost of moving the measure μ0 to the measure μ1 by the map u:𝔇0 → 𝔇1 to be
M(u) = ∫𝔇0Φ(x, u(x))dμ0(x).The infimum of M(u) taken over all measure preserving u:(𝔇0, μ0) → (𝔇1, μ1) is called the Kantorovich-Wasserstein distance between the measures μ0 and μ1. The minimization of M(u) constitutes the mathematical formulation of the Monge-Kantorovich optimal transport problem.
If the measures μi and Lebesgue measure dℒ are absolutely continuous with respect to each other so that we can write dμi = mi(x)dℒ for certain strictly positive densities mi ∈ L 1 (𝔇i, dℒ), and if the map u is a diffeomorphism from 𝔇0 to 𝔇1, then the mass preservation property (5.19) is equivalent with
m0(x) = det(Du(x)) ⋅ m1(u(x)) (for almost all x ∈ 𝔇0).The Jacobian equation (5.21) implies that if a small region in 𝔇0 is mapped to a larger region in 𝔇1, there must be a corresponding decrease in density in order to comply with mass preservation. In other words, expanding an image darkens it.
The L 2 Monge-Kantorovich problem corresponds to the cost function Φ(x, y) = ½∣x−y∣ 2 . A fundamental theoretical result for the L 2 case [12, 29, 51] is that there is a unique optimal mass-preserving u and that this u is characterized as the gradient of a convex function p, i.e., u = ∇p. General results about existence and uniqueness may be found in [6] and the references ∇ therein.
To reduce the Monge-Kantorovich cost M(u) of a map u 0 :𝔇0 → 𝔇1, in [38] the authors consider a rearrangement of the points in the domain of the map in the following sense: the initial map u 0 is replaced by a family of maps u t for which one has u t ∘ s t = u 0 for some family of diffeomorphisms s t :𝔇0 → 𝔇0. If the initial map u 0 sends the measure μ0 to μ1, and if the diffeomorphisms s t preserve the measure μ0, then the maps u t = u 0 ∘ (s t ) −1 will also send μ0 to μ1.
Any sufficiently smooth family of diffeomorphisms s t :𝔇0 → 𝔇0 is determined by its velocity field, defined by ∂ts t = v t ∘ s t .
If u t is any family of maps, then, assuming u t #μ0 = μ1 for all t, one has
d d t M ( u t ) = ∫ D 0 〈 Φ x ( x , u t ( x ) ) , v t ( x ) 〉 d μ 0 ( x ) .The steepest descent is achieved by a family s t ∈ Diff μ 0 1 ( D 0 ) , whose velocity is given by
v t = − 1 m 0 ( x ) P ( Φ x ( x , u t ( x ) ) ) .Here 𝒫 is the Helmholtz projection, which extracts the divergence-free part of vector fields on 𝔇0; i.e., for any vector field w on 𝔇 one has w = 𝒫[w] + ∇p.
From u 0 = u t ∘ s t one gets the transport equation
∂ u t ∂ t + v t ⋅ ∇ u t = 0where the velocity field is given by (5.23). In [8] it is shown that the initial value problem (5.23), (5.24) has short time existence for C 1,α initial data u 0 , and a theory of global weak solutions in the style of Kantorovich is developed.
For image registration, it is natural to take Φ(x, y) = ½∣x−y∣ 2 and 𝔇0 = 𝔇1 to be a rectangle. Extensive numerical computations show that the solution to the unregularized flow converges to a limiting map for a large choice of measures and initial maps. Indeed, in this case, we can write the minimizing flow in the following “nonlocal” form:
∂ u t ∂ t = − 1 m 0 < u t + ∇ ( − Δ ) − 1 div ( u t ) >⋅ ∇ u t .The technique has been mathematically justified in [8] to which we refer the reader for all of the relevant details.
Typically in elastic registration, one wants to see an explicit warping which smoothly deforms one image into the other. This can easily be done using the solution of the Monge-Kantorovich problem. Thus, we assume now that we have applied our gradient descent process as described above and that it has converged to the Monge-Kantorovich mapping u ~ M K .
Following the work of [10, 62] (see also [29]), we consider the following related problem:
inf ∫ ∫ 0 1 m ( t , x ) ∥ v ( t , x ) ∥ 2 d t d xover all time varying densities m and velocity fields v satisfying
∂ m ∂ t + div ( m v ) = 0 , m(0, ⋅ ) = m0, m(1, ⋅ ) = m1.It is shown in [10, 62] that this infimum is attained for some mmin and vmin and that it is equal to the L 2 Kantorovich-Wasserstein distance between μ0 = m0dℒ and μ1 = m1dℒ. Further, the flow X = X(x, t) corresponding to the minimizing velocity field vmin via
X(x, 0) = x, Xt = vmin ∘ Xis given simply as
X ( x , t ) = x + t ( t ~ M K ( x ) − x ) .Note that when t = 0, X is the identity map and when t = 1, it is the solution u ~ M K to the Monge-Kantorovich problem. This analysis provides appropriate justification for using (5.28) to define our continuous warping map X between the densities m0 and m1.
This warping is illustrated on MR heart images, Figure 5.2 . Since the images have some black areas where the intensity is zero, and since the intensities define the densities in the Monge-Kantorovich registration procedure, we typically replace m0 by some small perturbation m0 + ε for 0 < ε ≪ 1 to ensure that we have strictly positive densities. Full details may be found in [98]. We should note that the type of mixed L 2 /Wasserstein distance functionals which appear in [98] were introduced in [11].
Optimal Warping of Myocardium from Diastolic to Systolic in Cardiac Cycle. These static images become much more vivid when viewed as a short animation. (Available at http://www.bme.gatech.edu/groups/minerva/publications/papers/medicalBAMS2005.html).
Specifically, two MR images of a heart are given at different phases of the cardiac cycle. In each image the white part is the blood pool of the left ventricle. The circular gray part is the myocardium. Since the deformation of this muscular structure is of great interest, we mask the blood pool and consider only the optimal warp of the myocardium in the sense described above. Figure 5.2(a) is a diastolic image, and Figure 5.2(f) is a systolic image. 13 These are the only data given. Figures 5.2(b) to Figure 5.2(e) show the warping using the Monge-Kantorovich technique between these two images. These images offer a very plausible deformation of the heart as it undergoes its diastole/systole cyle. This is remarkable considering the fact that they were all artificially created by the Monge-Kantorovich technique. The plausibility of the deformation demonstrates the quality of the resulting warping.
When looking at an image, a human observer cannot help seeing structures which often may be identified with objects. However, digital images as the raw retinal input of local intensities are not structured. Segmentation is the process of creating a structured visual representation from an unstructured one. The problem was first studied in the 1920s by psychologists of the Gestalt school (see Kohler [53] and the references therein) and later by psychophysicists [48, 95]. 14 In its modern formulation, image segmentation is the problem of partitioning an image into homogeneous regions that are semantically meaningful, i.e., that correspond to objects we can identify. Segmentation is not concerned with actually determining what the partitions are. In that sense, it is a lower level problem than object recognition.
In the context of medical imaging, these regions have to be anatomically meaningful. A typical example is partitioning an MRI image of the brain into the white and gray matter. Since it replaces continuous intensities with discrete labels, segmentation can be seen as an extreme form of smoothing/information reduction. Segmentation is also related to registration in the sense that if an atlas 15 can be perfectly registered to a dataset at hand, then the registered atlas labels are the segmentation. Segmentation is useful for visualization, 16 allows for quantitative shape analysis, and provides an indispensable anatomical framework for virtually any subsequent automatic analysis.
Indeed, segmentation is perhaps the central problem of artificial vision, and accordingly many approaches have been proposed (for a nice survey of modern segmentation methods, see the monograph [66]). There are basically two dual approaches. In the first, one can start by considering the whole image to be the object of interest and then refining this initial guess. These “split and merge” techniques can be thought of as somewhat analogous to the top-down processes of human vision. In the other approach, one starts from one point assumed to be inside the object and adds other points until the region encompasses the object. Those are the “region growing” techniques and bear some resemblance to the bottom-up processes of biological vision.
The dual problem to segmentation is that of determining the boundaries of the segmented homogeneous regions. This approach has been popular for some time, since it allows one to build upon the well-investigated problem of edge detection (Section 5.3.1). Difficulties arise with this approach because noise can be responsible for spurious edges. Another major difficulty is that local edges need to be connected into topologically correct region boundaries. To address these issues, it was proposed to set the topology of the boundary to that of a sphere and then deform the geometry in a variational framework to match the edges. In 2D, the boundary is a closed curve, and this approach was named snakes (Section 5.3.2). Improvements of the technique include geometric active contours (Section 5.3.3) and conformal active contours (Section 5.3.4). All these techniques are generically referred to as active contours. Finally, as described in [66], most segmentation methods can be set in the elegant mathematical framework proposed by Mumford and Shah [68] (Section 5.3.7).
Consider the ideal case of a bright object 𝒪 on a dark background. The physical object is represented by its projections on the image I. The characteristic function 1𝒪 of the object is the ideal segmentation, and since the object is contrasted on the background, the variations of the intensity I are large on the boundary ∂ 𝒪. It is therefore natural to characterize the boundary ∂ 𝒪 as the locus of points where the norm of the gradient |∇I| is large. In fact, if ∂ 𝒪 is piecewise smooth, then |∇I| is a singular measure whose support is exactly ∂ 𝒪.
This is the approach taken in the 60’s and 70’s by Roberts [81] and Sobel [90], who proposed slightly different discrete convolution masks to approximate the gradient of digital images. Disadvantages with these approaches are that edges are not precisely localized and may be corrupted by noise. See Figure 5.3(b) is the result of a Sobel edge detector on a medical image. Note the thickness of the boundary of the heart ventricle as well as the presence of “spurious edges” due to noise. Canny [14] proposed to add a smoothing pre-processing step (to reduce the influence of the noise) as well as a thinning post-processing phase (to ensure that the edges are uniquely localized). See [26] for a survey and evaluation of edge detectors using gradient techniques.