Dual Spaces, the Schwartz Space and Distribution Theory, and the Dirac Delta Function
Dual spaces of normed linear spaces, weak and weak* convergence. Distribution theory, test functions, the Schwartz space, and the Dirac delta as a distribution.
In this chapter, we study dual spaces, distribution theory, and the Dirac delta function. The key motivation is that important objects like the impulse function do not live in the usual spaces of real-valued functions, yet they are indispensable in signal processing, circuit analysis, control, and communications. By working with test functions and defining the impulse as a functional (an element of a dual space), we place these objects on rigorous footing. Along the way, we develop the Schwartz space, weak convergence, and approximate identity sequences -- all essential tools for Fourier analysis later in the course.
Introduction and Motivation
To gain some insight and intuition on what this chapter entails, consider the function where . This function does not have a pointwise limit (in ) as . However, for an arbitrary continuous function , the integral has a well-defined limit, namely zero (see the Riemann-Lebesgue Lemma, presented as Theorem). In this sense, can be viewed as admitting a limit which is equivalent to the constant function with value . That is, in the sense that when we take the integral , for in a set of test functions. This motivates the notion of weak convergence (and weak* convergence).
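The vanishing of such oscillatory integrals can be checked numerically. In the sketch below, the oscillating sequence sin(n t) and the Gaussian test function are illustrative choices of ours, paired through a simple Riemann sum:

```python
import numpy as np

# Numerically illustrate the Riemann-Lebesgue phenomenon: for a fixed
# continuous test function phi, the pairing of sin(n t) with phi tends to
# zero as n grows, even though sin(n t) has no pointwise limit.
t = np.linspace(-np.pi, np.pi, 200001)
dt = t[1] - t[0]
phi = np.exp(-(t - 0.5)**2)        # a fixed (non-symmetric) test function

def oscillatory_integral(n):
    # Riemann-sum approximation of the integral of sin(n t) * phi(t)
    return np.sum(np.sin(n * t) * phi) * dt

vals = [abs(oscillatory_integral(n)) for n in (1, 10, 100, 1000)]
assert vals[0] > 0.1 and vals[-1] < 1e-3   # the pairing tends to zero
```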
In our course, one important application which arises while studying linear systems as well as Laplace and Fourier transforms is the use of the impulse (or Dirac delta) function. Such functions do not live in the set of -valued functions, and hence many operations, such as integration, become ill-defined. However, the Dirac delta function is such an important object that one must know how to work with it even in the most elementary applications in signal processing, circuit analysis, control, and communications, in addition to many other areas of engineering and applied mathematics. We will see that the appropriate way to study the impulse function is to always work under an integral with test functions, not unlike what we discussed above.
A complete understanding of Fourier transforms is possible through an investigation building on distribution theory: A distribution is a continuous and linear function on a sufficiently large space of test functions. The space of test functions we will consider, the Schwartz space, will prove to be very useful in arriving at several additional technical results with significant implications.
The above will motivate us to introduce dual spaces, weak convergence concepts, and distribution theory. There will be some additional useful properties: Every distribution is differentiable, and the differentiation is continuous. Most importantly, a function whose Fourier transform is not defined as a function might have a transform in a distributional sense.
It may not be immediately evident that the study of such a theory is needed in engineering practice. However, the patient student will come to appreciate the importance of this topic and the versatility it introduces, both in the context of Fourier transformation theory and while studying optimization, control, ordinary and partial differential equations and their applications in continuum mechanics, probability, and beyond.
Dual Space of a Normed Linear Space
Let be a linear functional on a normed linear space (thus mapping to ). We say is bounded (in the operator norm) if there is a constant such that for all . The smallest such is called the norm of and is denoted by , also given by:
Let us define the dual space of as the set of linear and bounded functions on to or , and let us denote this space by . The space is called the (topological) dual space of . This is equivalent to the space of all continuous and linear functions, as continuity and boundedness imply each other:
A linear functional on a normed linear space is bounded if and only if it is continuous.
Intuition: For linear functionals, "bounded" and "continuous" are the same thing. If a linear map does not blow up relative to the size of its input, it is automatically continuous, and vice versa. This means we can freely switch between these two characterizations when working with dual spaces.
The space is a linear space, under pointwise addition and scalar multiplication of functions in it. Furthermore, is itself a normed space with the norm given above in .
Exercise
Show that is a Banach space.
The dual space is a Banach space, even if itself is not Banach.
A key result for identifying the dual spaces of or spaces is Hölder's inequality (see Theorem): Let or possibly . Then,
where .
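As a quick numerical sanity check, the finite-dimensional version of Hölder's inequality on R^n (viewed as a finite ell^p space) can be verified directly; the random vectors and the exponents below are arbitrary illustrative choices:

```python
import numpy as np

# Check |sum x_k y_k| <= ||x||_p * ||y||_q with 1/p + 1/q = 1 (Holder).
rng = np.random.default_rng(0)
x = rng.standard_normal(50)
y = rng.standard_normal(50)

checks = []
for p in (1.5, 2.0, 3.0):
    q = p / (p - 1.0)                       # conjugate exponent: 1/p + 1/q = 1
    lhs = abs(np.sum(x * y))                # the duality pairing |<x, y>|
    rhs = (np.sum(np.abs(x)**p)**(1/p)) * (np.sum(np.abs(y)**q)**(1/q))
    checks.append((lhs, rhs))

assert all(lhs <= rhs for lhs, rhs in checks)   # Holder's inequality holds
```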
(i) Every linear bounded function , , is representable uniquely in the form, with :
where is in with .
(ii) Furthermore, every vector in defines such a vector (as above) in with
(iii) This also applies to spaces.
Intuition: The Riesz Representation Theorem tells us that every continuous linear functional on (or ) can be written as a "dot product" with some fixed element from the conjugate space (or ). In other words, the dual of is . This is the infinite-dimensional generalization of the fact that every linear functional on is an inner product with some fixed vector.
The Riesz Representation Theorem tells us that while studying spaces such as or , we can use an inner-product-like expression (though not a true inner product in the sense we defined for Hilbert spaces) to represent the set of all linear functions on by:
where is a continuous linear function on , but this is equivalent to the function having an inner-product like expression with . Thus, every vector in is identified with some vector .
Likewise, for a discrete-time signal:
is a linear function on .
Thus, if for , we can show that the dual space of is representable by elements in where .
In the special case of we have the space , which has its dual space as itself.
The following is a general result for Hilbert spaces.
Every linear bounded function on a Hilbert space admits a representation of the form:
for some .
Intuition: In a Hilbert space, the dual space is the space itself. Every continuous linear functional is secretly just an inner product with some fixed element. This is why is so special in signal processing: signals and the functionals that measure them live in the same space.
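A minimal finite-dimensional sketch of this: in the Hilbert space R^n, every bounded linear functional has the form f(x) = <x, z>, its operator norm equals the norm of the representing vector z, and the norm is attained in the aligned direction z/||z||. The dimension and random vectors below are illustrative choices:

```python
import numpy as np

# Riesz representation in R^n: f(x) = <x, z> has operator norm ||z||_2,
# attained at the aligned unit vector z / ||z||_2 (Cauchy-Schwarz).
rng = np.random.default_rng(1)
z = rng.standard_normal(8)                 # the representing vector
f = lambda x: np.dot(x, z)                 # the functional f(x) = <x, z>

norm_z = np.linalg.norm(z)
# |f(x)| <= ||z||_2 on the unit sphere ...
samples = rng.standard_normal((1000, 8))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
assert np.all(np.abs(samples @ z) <= norm_z + 1e-12)
# ... with equality at the aligned direction x = z / ||z||_2.
assert abs(f(z / norm_z) - norm_z) < 1e-9
```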
We say that and are aligned if
Some observations beyond the scope of our course follow.
(i) The dual space of or is more complicated (since such functions need not converge to zero as the index grows without bound), and will not be considered in this course. On the other hand, let be the set of functions which decay to zero. The dual of this space is (associated with, in the sense of the representation result presented earlier) .
(ii) The dual of can be associated with the space of signed measures with bounded total variation. Likewise, let denote the space of continuous functions which satisfy . The dual of this space is (associated with) the space of finite signed measures with bounded total variation.
(iii) Those of you who will take further courses on probability will study the concept of weak convergence of probability measures. A sequence of probability measures converges to some probability measure weakly if for every in (that is the set of continuous and bounded functions on ):
If we had replaced with here, note that this would coincide with the weak* convergence of (to be studied in the following). Nonetheless, in probability theory the convergence stated above is so important that it is simply called weak convergence.
Strong, Weak, and Weak* Convergence
Earlier, we discussed that in a normed space , a sequence of vectors converges to a vector if
A sequence in a normed space converges strongly (or converges in norm) to if
Intuition: Strong convergence is the most natural notion of convergence in a normed space -- it says that the "distance" between and shrinks to zero. It is called "strong" to distinguish it from the weaker convergence notions (weak and weak*) that follow. Strong convergence implies weak convergence, but not vice versa.
A sequence in is said to converge weakly to if
for all .
Intuition: Weak convergence says that even though the sequence might not get close to in norm, every continuous linear measurement you can make on eventually agrees with the measurement on . It is a much less demanding notion of convergence -- two signals can be "weakly close" even if they look quite different pointwise.
Exercise
Let . Show that if
then
We note, however, that weak convergence does not imply strong convergence.
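The standard illustration of this gap is the sequence of unit coordinate vectors e_n in ell^2: the pairing <e_n, y> = y_n vanishes for every fixed y in ell^2, yet ||e_n|| = 1 for all n, so e_n converges weakly but not strongly to the zero element. A finite truncation (our choice, purely for numerical purposes) shows the effect:

```python
import numpy as np

# e_n in ell^2 converges weakly to 0 (pairings vanish) but not strongly
# (every e_n has norm exactly 1).  Finite truncation for illustration only.
N = 10000
y = 1.0 / np.arange(1, N + 1)              # a fixed square-summable sequence

def e(n):
    v = np.zeros(N)
    v[n] = 1.0
    return v

indices = (10, 100, 1000, 9999)
pairings = [abs(np.dot(e(n), y)) for n in indices]
assert pairings == sorted(pairings, reverse=True)        # <e_n, y> shrinks
assert all(np.linalg.norm(e(n)) == 1.0 for n in indices)  # norms stay at 1
```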
A related convergence notion, one that we will adopt while studying distributions, is that of weak* convergence, defined next.
A sequence in is said to converge in the weak* sense to if
for all .
Intuition: Weak* convergence is about a sequence of functionals (dual elements) converging: converges to in the weak* sense if, for every fixed input , the numbers converge to . This is exactly the notion of convergence used for distributions: a sequence of distributions converges if it converges when applied to every test function.
We note that such a convergence notion is very useful in the study of solutions of differential equations (ordinary and partial), optimal control theory, and probability theory as well, even though we will not be able to discuss these in our course.
Distribution Theory
A distribution is a linear and continuous -valued function (that is, a functional) on a space of test functions. Thus, a distribution can be viewed to be an element of the dual space of a linear space of test functions (even though we will see that the linear space of test functions does not need to form a normed linear space).
Studying distributions and sets of test functions presents many benefits for our course. For example, the delta function has a natural representation as a distribution. Furthermore, Fourier analysis will be seen to be a bijective mapping from a space of test functions onto itself, and this space of test functions is rich enough to approximate, sufficiently well, many functions that we encounter in applications. Finally, we will define the Fourier transform first on a space of test functions and extend it from this space to larger spaces, such as .
The Spaces and of Test Functions
Let denote a set of test functions from to , which are smooth (infinitely differentiable) and which have bounded support sets. Such functions exist, for example
is one such function.
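One commonly used concrete example of a smooth, compactly supported function is the bump exp(-1/(1 - t^2)) on (-1, 1), extended by zero; we sketch it here for concreteness (a standard example, not necessarily the one intended above):

```python
import numpy as np

# The classic bump function: infinitely differentiable everywhere,
# supported on [-1, 1], with all derivatives vanishing at t = +/-1.
def bump(t):
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.zeros_like(t)
    inside = np.abs(t) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - t[inside]**2))
    return out if out.size > 1 else float(out[0])

assert bump(0.0) == np.exp(-1.0)                 # peak value at the origin
assert bump(1.0) == 0.0 and bump(-2.0) == 0.0    # vanishes outside (-1, 1)
```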
We say a sequence of functions in converges to the null element if:
a) there exists a compact, continuous-time domain such that the support set of every is contained in (we define the support of a function to be the closure of the set of points ); and
b) for every and , there exists an such that for all , , where (that is, all the derivatives of converge to zero uniformly on ).
In applications we usually encounter functions with unbounded support. Hence, a theory based on the above test functions might not be satisfactory. Furthermore, the Fourier transform of a function in is not in the same space (a topic to be discussed further). As such, we will find it convenient to slightly extend the space of test functions.
An infinitely differentiable function is in the Schwartz function space, denoted with , if for each and for each
where .
Intuition: A Schwartz function is one that is infinitely smooth and decays to zero faster than any polynomial (along with all its derivatives). The space is larger than (compact-support functions) but still small enough to be well-behaved. It is the ideal space of test functions for Fourier analysis because, as we will see, the Fourier transform maps bijectively onto itself.
For example the function is a Schwartz function.
One can equip with a notion of convergence generated by a countable number of semi-norms:
for . That is, we say, a sequence of functions in converges to another one if
With the above, we could define a metric by working with the above seminorms: for , let us define a metric between the two vectors as:
where is a countable enumeration of the pairs .
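A numerical sketch of these seminorms and of a metric built from them, with the Gaussian test function, the enumeration of index pairs, and the weighting 2^{-n} p_n/(1 + p_n) all being our assumed (standard) choices:

```python
import numpy as np

# Estimate Schwartz seminorms p_{k,m}(phi) = sup_t |t^k phi^(m)(t)| for the
# Gaussian phi(t) = exp(-t^2), with derivatives taken by finite differences.
t = np.linspace(-10, 10, 200001)
phi = np.exp(-t**2)

def seminorm(k, m):
    d = phi.copy()
    for _ in range(m):              # m-th derivative via finite differences
        d = np.gradient(d, t)
    return np.max(np.abs(t**k * d))

values = [seminorm(k, m) for k in range(3) for m in range(3)]
assert all(np.isfinite(v) for v in values)       # every seminorm is finite

# Metric-style combination: sum_n 2^{-(n+1)} p_n/(1+p_n), always below 1.
d = sum(2.0**(-(n + 1)) * v / (1 + v) for n, v in enumerate(values))
assert d < 1.0
```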
The Schwartz space equipped with such a metric is a complete space. Furthermore, the differentiation operator becomes a continuous operation on under this metric; a topic which we will discuss further.
As had been discussed before (slightly generalizing Theorem), a functional on is continuous if and only if for every convergent sequence in , , we have . We note that checking sequential continuity is typically easier than verifying continuity directly, since the rather involved construction of the metric on makes it inconvenient to compute the distance between two vectors.
A distribution is a linear, continuous functional on the space of test functions .
Intuition: A distribution is a generalized function. Instead of assigning a value at each point, it assigns a number to each test function. Regular functions give rise to distributions through integration, but distributions also include "singular" objects like the Dirac delta that cannot be represented as ordinary functions. Think of a distribution as something that is only meaningful "under an integral sign" against test functions.
Thus, a distribution is an element of the dual space of (that is, ), even though is not defined as a normed space, but as a metric space which is nonetheless a linear space.
Regular Distributions and Singular Distributions
Distributions can be regular or singular. Regular distributions can be expressed as an integral of a test function against a locally integrable function (that is, a function which has a finite absolute integral on any compact domain on which it is defined). For example, if is a real-valued integrable function on , then the distribution given by
is a regular distribution on , represented by the function .
A tempered function is one of at most polynomial growth; that is, for some , :
Intuition: A tempered function is one that does not grow faster than some polynomial. Any tempered function can represent a regular distribution via integration against test functions, because the rapid decay of Schwartz functions compensates for the polynomial growth.
Any tempered function can represent a regular distribution.
Singular distributions do not admit such a representation. For example the Dirac delta distribution , defined for all :
does not admit a representation in the form . Even though there is no function which can be used to represent a singular distribution, one occasionally writes a singular distribution as if such a function exists, and calls the representing object a singular or generalized function. The informal expression is a common example of this, where is the generalized impulse function which takes the value at , and zero elsewhere.
The map defines a distribution on . This distribution is called the Dirac delta distribution.
Intuition: The Dirac delta is a perfectly well-defined object when viewed as a distribution: it simply evaluates a test function at the origin. It is linear (evaluating a linear combination at zero gives the linear combination of evaluations) and continuous (if test functions converge, their values at zero converge). The Dirac delta only becomes problematic if you try to treat it as an ordinary function.
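Since the delta distribution is nothing but evaluation at the origin, its linearity can be displayed in a couple of lines; the particular test functions and coefficients below are arbitrary choices:

```python
import math

# The Dirac delta as a distribution: delta(phi) = phi(0).
delta = lambda phi: phi(0.0)

phi1 = lambda t: math.exp(-t**2)         # phi1(0) = 1
phi2 = lambda t: t * math.exp(-t**2)     # phi2(0) = 0

a, b = 2.0, -3.0
combo = lambda t: a * phi1(t) + b * phi2(t)
# Linearity: evaluating a linear combination at 0 gives the combination
# of the evaluations.
assert delta(combo) == a * delta(phi1) + b * delta(phi2)
assert delta(phi1) == 1.0
```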
Exercise
Show that the relation
defines a distribution .
Equivalence and Convergence of Distributions
Two distributions and are equal if
A sequence of distributions converges to a distribution if
Intuition: Convergence of distributions is exactly weak* convergence in the dual space . A sequence of distributions converges if and only if it converges pointwise on every test function. This is a very permissive notion of convergence, which is exactly why objects like the Dirac delta can be obtained as limits of ordinary functions.
Observe that the above notion is identical to the weak* convergence notion discussed earlier in Definition.
Let for ,
a) For any real-valued function , define
Then is a distribution on for every .
b) We have that
Conclude that the sequence of regular distributions , represented by a real-valued, integrable function , converges to the Dirac delta distribution on the space of test functions in .
In fact, we can find many other functions which can define regular distributions whose limit is the delta distribution. This motivates the following section.
Approximate Identity Sequences
Let be a sequence such that
- , .
- , .
- , .
Such sequences are called approximate identity sequences.
Intuition: An approximate identity sequence is a family of functions that become more and more concentrated around the origin while maintaining unit area. As , they "squeeze" all their mass into an infinitesimally small region near . In the limit, they behave like the Dirac delta: integrating them against any test function yields . They provide a concrete, constructive way to approach the delta distribution through ordinary functions.
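A concrete numerical sketch with the rectangle family f_n = (n/2) on [-1/n, 1/n] (one standard example of such a sequence; the test function is our illustrative choice): the normalized pairing with a smooth test function approaches the value of the function at zero.

```python
import numpy as np

# Rectangles f_n = (n/2) * 1[-1/n, 1/n]: non-negative, unit area,
# concentrating at the origin.  Pairing against phi approaches phi(0) = 1.
t = np.linspace(-2.0, 2.0, 400001)
phi = np.cos(t) * np.exp(-t**2)            # a smooth test function, phi(0) = 1

def pairing(n):
    f_n = np.where(np.abs(t) <= 1.0 / n, n / 2.0, 0.0)
    return np.sum(f_n * phi) / np.sum(f_n)  # normalized Riemann sum

errors = [abs(pairing(n) - 1.0) for n in (2, 8, 32, 128)]
assert errors == sorted(errors, reverse=True)   # error shrinks as n grows
assert errors[-1] < 1e-3
```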
We have seen one example in . The result discussed generalizes to any approximate identity sequence:
Distributions represented by approximate identity sequences converge to the Dirac delta distribution as .
Intuition: No matter how you construct your approximate identity sequence (rectangles, Gaussians, sinc-like functions, etc.), the associated distributions always converge to the Dirac delta. This universality is what makes the delta distribution a robust and natural object -- it is the unique limit of any sequence of non-negative unit-area functions concentrating at the origin.
Examples of Approximate Identity Sequences
Let for ,
Such a sequence is an example of an approximate identity sequence.
Observe that for , if we define
it follows that is a distribution on and we can show that
We thus conclude that the sequence of regular distributions , represented by the real-valued, integrable function , converges to the Dirac delta distribution .
Another very important example for such approximate identity sequences is the following Gaussian sequence of functions given by
Observe that each element of this sequence lives in , which will be very consequential.
Consider the sequence
where is so that . We have that
Intuition: The sequence concentrates more and more sharply around as grows, because achieves its maximum value of at and is strictly less than everywhere else on . After normalizing to have unit integral, this becomes an approximate identity sequence.
One further very useful sequence, which does not satisfy the non-negativity property above, but nonetheless satisfies the convergence property (to ) is the following sequence:
For any , with as in
Intuition: Even though the sinc function takes negative values and thus is not a true approximate identity sequence, it still converges to the Dirac delta when integrated against Schwartz functions. This particular sequence is intimately connected to the Fourier transform and plays a central role in sampling theory.
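A numerical sketch of this phenomenon, with the kernel sin(n t)/(pi t) and a Gaussian test function as illustrative choices of ours:

```python
import numpy as np

# The sign-changing kernel D_n(t) = sin(n t)/(pi t) still recovers phi(0)
# when paired with a Schwartz function; here phi(t) = exp(-t^2), phi(0) = 1.
t = np.linspace(-30.0, 30.0, 600001)
dt = t[1] - t[0]
phi = np.exp(-t**2)

def pairing(n):
    # np.sinc(x) = sin(pi x)/(pi x), so this equals sin(n t)/(pi t),
    # with the t = 0 value handled safely.
    kernel = (n / np.pi) * np.sinc(n * t / np.pi)
    return np.sum(kernel * phi) * dt

errors = [abs(pairing(n) - 1.0) for n in (2, 4, 8)]
assert errors == sorted(errors, reverse=True)   # convergence to phi(0) = 1
assert errors[-1] < 1e-3
```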
Convolution and its use in approximations
The convolution of two functions (whenever this integration is well-defined) is defined as:
The convolution can be defined for any pair of functions which are in . The convolution of two functions in is also in .
A very useful result is the following.
If is an approximate identity sequence, then,
for every continuous and bounded function , uniformly on compact sets .
Intuition: Convolving a function with an approximate identity sequence "smooths" it, and as the smoothed version converges back to the original function. This is the precise sense in which the Dirac delta is the identity element for convolution: convolving with leaves a function unchanged, and approximate identities approximate this behavior.
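A numerical sketch of this smoothing-and-recovery behavior; the Gaussian approximate identity and the continuous (but non-smooth) function |t| are our illustrative choices:

```python
import numpy as np

# Convolving phi(t) = |t| with narrowing unit-area Gaussians
# g_n(t) = (n/sqrt(pi)) exp(-(n t)^2) drives the supremum error on a
# compact window toward zero.
t = np.linspace(-4.0, 4.0, 8001)
dt = t[1] - t[0]
phi = np.abs(t)                    # continuous, not differentiable at 0

def smoothed(n):
    g = (n / np.sqrt(np.pi)) * np.exp(-(n * t)**2)   # unit-area Gaussian
    return np.convolve(phi, g, mode="same") * dt

mid = slice(2000, 6001)            # the compact window [-2, 2]
sup_errors = [np.max(np.abs(smoothed(n)[mid] - phi[mid])) for n in (1, 4, 16)]
assert sup_errors == sorted(sup_errors, reverse=True)  # uniform error shrinks
```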
Note that with defined as in , is always infinitely differentiable, and one may conclude the following:
The space of smooth functions is dense in the space of continuous functions with compact support, under the supremum norm.
Intuition: Any continuous function with compact support can be approximated arbitrarily well (in supremum norm) by a smooth function. This is achieved by convolving with a Gaussian approximate identity, which always produces an infinitely differentiable result.
Completeness of complex exponentials in
Using Theorem, with
(which is an approximate identity sequence as shown in Proposition, when is picked so that ), we can prove the following:
The family of complex exponentials in :
forms an orthonormal sequence which is complete.
Intuition: This theorem justifies the Fourier series: the complex exponentials form a complete orthonormal basis for , meaning every square-integrable periodic function can be represented as a (possibly infinite) sum of these exponentials with no information lost. There is no "missing direction" in the space that the exponentials cannot reach.
This sequence is used for the Fourier expansion of functions in ; see the relevant section.
Some Operations on Distributions [Optional]
While studying properties of distributions, one typically starts with a regular distribution and then extends the properties to singular distributions.
One important property of distributions is that every distribution has a derivative. We will also be taking the Fourier transform of distributions. In both cases, the result has meaning as a distribution; that is, it has meaning only when applied to a class of test functions.
The derivative of a distribution is defined as:
Intuition: The derivative of a distribution is defined by "shifting" the derivative onto the test function (with a sign change), which is just integration by parts without boundary terms (since Schwartz functions vanish at infinity). This means every distribution -- even singular ones like the Dirac delta -- is differentiable, a stark contrast to ordinary calculus where most functions are not differentiable everywhere.
We can check if this definition is consistent with a distribution represented by a regular function. Consider and note that through integration by parts
Given the definition of a distributional derivative, we show that the distributional derivative of the distribution represented by the unit step function is the Dirac delta distribution: Let denote the step function: that is ( being the indicator function). Define for ,
We can verify that the Dirac delta distribution is the derivative of the step distribution above:
for all (in the above equation, allows us to use that ).
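This computation can be mimicked numerically: for a rapidly decaying test function, minus the integral of its derivative over the half line returns its value at zero, in agreement with the delta pairing. The particular test function below is an arbitrary illustrative choice:

```python
import numpy as np

# <H', phi> = -<H, phi'> = -integral_0^inf phi'(t) dt = phi(0) = <delta, phi>,
# checked numerically via finite differences and a Riemann sum.
t = np.linspace(0.0, 20.0, 200001)
dt = t[1] - t[0]
phi = (1.0 + t) * np.exp(-t**2)            # a rapidly decaying phi, phi(0) = 1
dphi = np.gradient(phi, t)                 # numerical derivative phi'

pairing = -np.sum(dphi) * dt               # -integral of phi' over [0, inf)
assert abs(pairing - phi[0]) < 1e-2        # agrees with <delta, phi> = phi(0)
```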
This is an important relationship in engineering applications; for example, the step function often models a turn-on event for a switch in circuit theory and its derivative is often approximated to be the Dirac delta function (to be cautiously interpreted).
Convolution of Distributions
Let be a distribution given by .
The convolution of with would be:
We can interpret this as a distribution in the following sense. Let be the shifting operator and be the inversing (time-reversal) operator. Then,
This then motivates the following: The convolution of a function in and a distribution is defined by:
where, as before, is the inversing (time-reversal) operator and is the shifting operator.
For any distribution and in , is an infinitely differentiable function and can be used to represent a regular distribution.
Intuition: Convolving any distribution (even a singular one) with a Schwartz function always produces a smooth, well-behaved function. This is a powerful regularization property: the smoothness of the test function "wins out" and tames any singularity.
Let be two regular distributions represented by , respectively. The convolution of is given by the relation, whenever this is well-defined:
with
It should be observed that, with the above definition:
that is, the delta distribution is the identity element for distributions under the operation of convolution.
Let be the approximate identity sequence given by , so that . Then, it can be shown that for any singular , is a smooth function, and can be used to represent a regular distribution such that for any (by linearity and by continuity of ). Accordingly, for any singular distribution, there exists a sequence of regular distributions which converges to the singular distribution.
Fourier Transform of Schwartz Functions
We will continue the discussion of Schwartz functions in the context of Fourier transforms. One appealing aspect of Schwartz functions is that the Fourier transform of a Schwartz function is again a Schwartz function. In fact, the Fourier transform on the space of Schwartz functions is both onto and one-to-one (hence a bijection). This will be proven later. Since the space of continuous functions is dense in the space of square integrable functions, and is dense in the space of continuous functions under the supremum norm by Theorem, we will use the bijection property of the Fourier transform on to define the Fourier transform of square integrable functions.
Appendix
Optional: Application to Optimization Problems and the Generalization of the Projection Theorem
The duality results and Hölder's inequality are important in applications to optimization problems. The geometric ideas we reviewed in the context of the projection theorem apply very similarly to such spaces, where the inner-product is replaced by the duality pairings. Let us make this more explicit: Let for a subspace ,
(i) Let be an element in a real normed space and let denote its distance from a subspace . Then,
If the infimum is achieved, then the maximum on the right is achieved for some such that is aligned with .
(ii) In particular, if satisfies
there must be a non-zero vector such that for all and is aligned with .
Intuition: This theorem generalizes the projection theorem from Hilbert spaces to general normed spaces using duality. The distance from a point to a subspace can be computed by maximizing a dual functional over unit-norm elements in the annihilator . This is the foundation of duality in optimization: the "primal" problem (minimizing distance) equals the "dual" problem (maximizing a linear functional).
Let be a subspace in a real normed space . Let be at a distance from . Then, (i)
where the minimum on the left is achieved for . (ii) If the supremum on the right is achieved for some , then is aligned with .
Intuition: This is the "dual version" of the previous theorem: now we are measuring the distance of a dual element from the annihilator , and this equals the supremum of over unit-norm elements in . Together with Theorem, these results establish the symmetric duality between primal and dual optimization problems.
An Application: Constrained Dual Optimization Problems
Consider the following constrained optimization problem:
Observe that if is any vector satisfying the constraints, then
where denotes the space spanned by and is some vector satisfying the constraints.
From Theorem, we have that
Now, any vector in is of the form where is a matrix and is a column vector. Thus,
where the last equality follows because satisfies the constraints and
Thus, the optimal solution to the constrained problem can be written as
where the optimal is aligned with the optimal .
In the following, we present another approach to arrive at the above.
where the optimal is aligned with . In the above, follows from Sion's minimax theorem, and serves as a Lagrange multiplier.
Exercises
Exercise
Does there exist a sequence of functions in such that the sequence of distributions represented by on the set of Schwartz functions converges to zero in a distributional sense, but does not converge to zero in the norm? That is, does there exist a sequence of functions in such that
is not zero, but
If there exists one, give an example. If there does not exist one, explain why.
Exercise
(a) Let be a mapping from to (extended to possibly include ) given by:
Let be given by:
Is continuous on at ?
(b) Let be the space of Schwartz functions. Let be a mapping given by:
where
Is a distribution on ? That is, is continuous and linear on ?
Exercise
Let be a mapping defined by:
Is continuous on ? Prove your argument.
Hint: The function is in , for any .
Exercise
Let for ,
For , define
Show that is a distribution on . Show that
Conclude that the sequence of regular distributions , represented by a real-valued, integrable function , converges to the delta distribution on the space of test functions .
Exercise
Let be the space of Schwartz functions. Let be a mapping given by:
where
Is a distribution on ? That is, is continuous and linear on ?