
Dual Spaces, the Schwartz Space and Distribution Theory, and the Dirac Delta Function

Dual spaces of normed linear spaces, weak and weak* convergence. Distribution theory, test functions, the Schwartz space, and the Dirac delta as a distribution.

In this chapter, we study dual spaces, distribution theory, and the Dirac delta function. The key motivation is that important objects like the impulse function do not live in the usual spaces of real-valued functions, yet they are indispensable in signal processing, circuit analysis, control, and communications. By working with test functions and defining the impulse as a functional (an element of a dual space), we place these objects on rigorous footing. Along the way, we develop the Schwartz space, weak convergence, and approximate identity sequences -- all essential tools for Fourier analysis later in the course.


Introduction and Motivation

To gain some insight into what this chapter entails, consider the functions $\sin(nt)$ for $n \in \mathbb{N}$. This sequence does not have a pointwise limit (in $t$) as $n \to \infty$. However, for an arbitrary continuous function $f$, the integral $\int \sin(nt)f(t)\,dt$ has a well-defined limit, namely zero (see the Riemann--Lebesgue Lemma). In this sense, $\sin(nt)$ can be viewed as admitting a limit equal to the constant function with value $0$. That is, $\sin(nt) \to \underline{0}$ in the sense that

$$\int \sin(nt)f(t)\,dt \to \int \underline{0}(t)f(t)\,dt = 0,$$

for $f$ in a set of test functions. This motivates the notion of weak convergence (and weak* convergence).
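This weak limit can be checked numerically. The sketch below uses the illustrative, non-symmetric test function $f(t) = e^{-(t-1)^2}$ (an arbitrary choice, not from the text; a symmetric $f$ would make the integral vanish trivially by oddness) and approximates $\int \sin(nt) f(t)\,dt$:

```python
import numpy as np

def trapezoid(y, x):
    # Trapezoidal rule, written out explicitly for self-containedness.
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# Approximate the pairing of sin(nt) with a fixed smooth test function f.
def pairing(n, num_points=400001):
    t = np.linspace(-5.0, 5.0, num_points)
    f = np.exp(-(t - 1.0) ** 2)       # illustrative test function
    return float(trapezoid(np.sin(n * t) * f, t))

values = [abs(pairing(n)) for n in (1, 5, 10, 20)]
print(values)  # shrinks rapidly toward 0 as n grows
```

The decay here is in fact much faster than the Riemann--Lebesgue Lemma guarantees, because the chosen $f$ is smooth.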

In our course, one important application, arising in the study of linear systems as well as Laplace and Fourier transforms, is the use of the impulse (or Dirac delta) function. Such an object does not live in the set of $\mathbb{R}$-valued functions, and hence many operations, such as integration, become ill-defined. However, the Dirac delta function is such an important and crucial object that one has to know how to work with it even in the most elementary applications in signal processing, circuit analysis, control, and communications, in addition to many other areas of engineering and applied mathematics. We will see that the appropriate way to study the impulse function is to always work under an integral against test functions, not unlike what we discussed above.

A complete understanding of Fourier transforms is possible through an investigation building on distribution theory: a distribution is a continuous linear functional on a sufficiently rich space of test functions. The space of test functions we will consider, the Schwartz space, will prove very useful in arriving at several additional technical results with significant implications.

The above will motivate us to introduce dual spaces, weak convergence concepts, and distribution theory. Distributions enjoy some additional useful properties: every distribution is differentiable, and differentiation is a continuous operation. Most importantly, a function whose Fourier transform is not defined as a function may still have a transform in the distributional sense.

It may not be immediately evident that the study of such a theory is needed in engineering practice. However, the patient student will come to appreciate the importance and versatility of the topics it introduces, both in the context of Fourier transformation theory and in the study of optimization, control, ordinary and partial differential equations and their applications in continuum mechanics, and probability and beyond.


Dual Space of a Normed Linear Space

Let $f$ be a linear functional on a normed linear space $X$ (thus mapping $X$ to $\mathbb{R}$). We say $f$ is bounded (in the operator norm) if there is a constant $M$ such that $|f(x)| \leq M\|x\|$ for all $x \in X$. The smallest such $M$ is called the norm of $f$ and is denoted by $\|f\|$, also given by:

$$\|f\| := \sup_{x : \|x\| \neq 0} \frac{|f(x)|}{\|x\|}.$$

Let us define the dual space of $X$ as the set of bounded linear functionals from $X$ to $\mathbb{R}$ (or $\mathbb{C}$), and let us denote this space by $X^*$. The space $X^*$ is called the (topological) dual space of $X$. This is equivalently the space of all continuous linear functionals, since continuity and boundedness imply each other:

Theorem (Bounded iff Continuous)

A linear functional on a normed linear space is bounded if and only if it is continuous.

Remark.

Intuition: For linear functionals, "bounded" and "continuous" are the same thing. If a linear map does not blow up relative to the size of its input, it is automatically continuous, and vice versa. This means we can freely switch between these two characterizations when working with dual spaces.

The space $X^*$ is a linear space under pointwise addition and scalar multiplication of the functionals in it. Furthermore, $X^*$ is itself a normed space with the norm given above.

Exercise

Show that $(X^*, \|\cdot\|)$ is a Banach space.

Remark.

The dual space $(X^*, \|\cdot\|)$ is a Banach space, even if $X$ itself is not.

A key result for identifying the dual spaces of the $l_p(\mathbb{Z}_+;\mathbb{R})$ or $L_p(\mathbb{R}_+;\mathbb{R})$ spaces is Hölder's inequality: let $1 \leq p, q \leq \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$. Then,

$$\sum_{i \in \mathbb{Z}_+} |x_i y_i| \leq \|x\|_p \|y\|_q.$$
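As a quick sanity check, Hölder's inequality can be verified numerically on random finite-dimensional vectors (a sketch; the exponents and data below are arbitrary illustrative choices):

```python
import numpy as np

# Verify |sum x_i y_i| <= ||x||_p ||y||_q for conjugate exponents 1/p + 1/q = 1.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)

for p in (1.5, 2.0, 3.0):
    q = p / (p - 1.0)  # conjugate exponent
    lhs = abs(np.sum(x * y))
    rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)
    assert lhs <= rhs
    print(p, lhs, rhs)
```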

Theorem (Riesz Representation Theorem)

(i) Every bounded linear functional $F : l_p(\mathbb{Z}_+;\mathbb{R}) \to \mathbb{R}$, $1 \leq p < \infty$, is uniquely representable in the form, for $x = \{x_i, i \in \mathbb{Z}_+\} \in l_p(\mathbb{Z}_+;\mathbb{R})$:

$$F(x) = \sum_{i=0}^{\infty} \eta_i x_i,$$

where $\eta = \{\eta_i\}$ is in $l_q(\mathbb{Z}_+;\mathbb{R})$ with $\frac{1}{p} + \frac{1}{q} = 1$.

(ii) Conversely, every vector $\eta$ in $l_q(\mathbb{Z}_+;\mathbb{R})$ defines such a functional $F$ (as above) in $l_p(\mathbb{Z}_+;\mathbb{R})^*$ with

$$\|F\| = \|\eta\|_q.$$

(iii) The analogous statements hold for the $L_p(\mathbb{R}_+;\mathbb{R})$ spaces.

Remark.

Intuition: The Riesz Representation Theorem tells us that every continuous linear functional on $l_p$ (or $L_p$) can be written as a "dot product" with some fixed element from the conjugate space $l_q$ (or $L_q$). In other words, the dual of $l_p$ is $l_q$. This is the infinite-dimensional generalization of the fact that every linear functional on $\mathbb{R}^n$ is an inner product with some fixed vector.

The Riesz Representation Theorem tells us that while studying spaces such as $L_p(\mathbb{R}_+;\mathbb{R})$ or $l_p(\mathbb{N};\mathbb{R})$, we can use an inner-product-like expression (though not a true inner product in the sense used for Hilbert spaces) to represent the set of all continuous linear functionals on $X$:

$$\langle \cdot, y \rangle : X \ni x \mapsto \langle x, y \rangle = \int_{\mathbb{R}} x(t)y(t)\,dt \in \mathbb{R},$$

where $\langle \cdot, y \rangle$ is a continuous linear functional on $X$, represented by the function $y \in L_q(\mathbb{R}_+;\mathbb{R})$ through this inner-product-like pairing with $x \in X$. Thus, every element of $L_p(\mathbb{R}_+;\mathbb{R})^*$ is identified with some function $y \in L_q(\mathbb{R}_+;\mathbb{R})$.

Likewise, for a discrete-time signal:

$$\langle \cdot, y \rangle : X \ni x \mapsto \langle x, y \rangle = \sum_{i=1}^{\infty} x(i)y(i) \in \mathbb{R}$$

is a linear functional on $X$.

Thus, if $X = L_p(\mathbb{R}_+;\mathbb{R})$ for $1 \leq p < \infty$, we can show that the dual space of $X$ is representable by elements of $L_q(\mathbb{R}_+;\mathbb{R})$, where $\frac{1}{p} + \frac{1}{q} = 1$.

In the special case $p = 2$ we obtain the space $L_2(\mathbb{R}_+;\mathbb{R})$, whose dual space is itself.

The following is a general result for Hilbert spaces.

Theorem (Riesz Representation Theorem for Hilbert Spaces)

Every bounded linear functional $f$ on a Hilbert space $H$ admits a representation of the form:

$$f(x) = \langle x, y \rangle$$

for some $y \in H$.

Remark.

Intuition: In a Hilbert space, the dual space is the space itself. Every continuous linear functional is secretly just an inner product with some fixed element. This is why $L_2$ is so special in signal processing: signals and the functionals that measure them live in the same space.

We say that $x \in X$ and $x^* \in X^*$ are aligned if

$$\langle x, x^* \rangle = \|x\|\|x^*\|.$$
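The alignment relation connects nicely with the Riesz representation for $l_p$ spaces: given $\eta \in l_q$, the choice $x_i = \operatorname{sign}(\eta_i)|\eta_i|^{q-1}$ is aligned with the functional $F(x) = \sum_i \eta_i x_i$, exhibiting $\|F\| = \|\eta\|_q$. A finite-dimensional numerical sketch (with arbitrary illustrative data):

```python
import numpy as np

# For conjugate exponents p, q, the vector x_i = sign(eta_i)|eta_i|^(q-1)
# attains F(x) = ||x||_p ||eta||_q, i.e. x and the functional F are aligned.
p, q = 3.0, 1.5  # conjugate pair: 1/3 + 1/1.5 = 1
rng = np.random.default_rng(1)
eta = rng.standard_normal(8)

x = np.sign(eta) * np.abs(eta) ** (q - 1.0)
F_x = float(np.sum(eta * x))                              # F applied to the aligned x
product = float(np.linalg.norm(x, p) * np.linalg.norm(eta, q))
print(F_x, product)  # equal up to floating-point rounding
```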

Remark.

Some observations beyond the scope of our course follow.

(i) The dual space of $l_\infty(\mathbb{Z}_+;\mathbb{R})$ or $L_\infty(\mathbb{R}_+;\mathbb{R})$ is more complicated (owing to the fact that such functions need not converge to zero as the index grows unbounded), and will not be considered in this course. On the other hand, let $c_0 \subset l_\infty(\mathbb{Z}_+;\mathbb{R})$ be the set of sequences which decay to zero. The dual of this space is (associated with, in the sense of the representation result presented earlier) $l_1(\mathbb{Z}_+;\mathbb{R})$.

(ii) The dual of $C([a,b];\mathbb{R})$ can be associated with the space of signed measures with bounded total variation. Likewise, let $C_0(\mathbb{R};\mathbb{R})$ denote the space of continuous functions $f$ which satisfy $\lim_{|x| \to \infty} f(x) = 0$. The dual of this space is (associated with) the space of finite signed measures with bounded total variation.

(iii) Those of you who will take further courses on probability will study the concept of weak convergence of probability measures. A sequence of probability measures $\mu_n$ converges weakly to a probability measure $\mu$ if for every $f \in C_b(\mathbb{R};\mathbb{R})$ (that is, the set of continuous and bounded functions on $\mathbb{R}$):

$$\int \mu_n(dx)f(x) \to \int \mu(dx)f(x).$$

If we had replaced $C_b(\mathbb{R};\mathbb{R})$ with $C_0(\mathbb{R};\mathbb{R})$ here, note that this would coincide with the weak* convergence of $\mu_n \to \mu$ (to be studied in the following). Nonetheless, in probability theory the convergence stated above is so important that it is simply called weak convergence.

Strong, Weak, and Weak* Convergence

Earlier, we discussed that in a normed space $X$, a sequence of vectors $\{x_n\}$ converges to a vector $x$ if

$$\|x_n - x\| \to 0.$$

Definition (Strong Convergence)

A sequence $\{x_n\}$ in a normed space $X$ converges strongly (or converges in norm) to $x \in X$ if

$$\|x_n - x\| \to 0.$$

Remark.

Intuition: Strong convergence is the most natural notion of convergence in a normed space -- it says that the "distance" between $x_n$ and $x$ shrinks to zero. It is called "strong" to distinguish it from the weaker convergence notions (weak and weak*) that follow. Strong convergence implies weak convergence, but not vice versa.

Definition (Weak Convergence)

A sequence $\{x_n\}$ in $X$ is said to converge weakly to $x$ if

$$f(x_n) \to f(x)$$

for all $f \in X^*$.

Remark.

Intuition: Weak convergence says that even though the sequence $x_n$ might not get close to $x$ in norm, every continuous linear measurement you can make on $x_n$ eventually agrees with the measurement on $x$. It is a much less demanding notion of convergence -- two signals can be "weakly close" even if they look quite different pointwise.

Exercise

Let $\{x_n\}$ be a sequence in $l_2(\mathbb{N};\mathbb{R})$. Show that if

$$x_n \to x^* \text{ in norm},$$

then

$$\langle x_n, f \rangle \to \langle x^*, f \rangle \qquad \forall f \in l_2(\mathbb{N};\mathbb{R}).$$

We note, however, that weak convergence does not imply strong convergence.
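The standard counterexample is the sequence of unit basis vectors $e_n$ in $l_2$: every pairing $\langle e_n, f\rangle = f_n$ tends to zero because $f$ is square-summable, yet $\|e_n - 0\| = 1$ for all $n$. A truncated numerical sketch (the specific $f$ below is an illustrative choice):

```python
import numpy as np

# Basis vectors e_n converge weakly to 0 in l_2 but not strongly:
# <e_n, f> = f_n -> 0 for every f in l_2, while ||e_n - 0|| = 1 always.
N = 10000
f = 1.0 / np.arange(1, N + 1)  # f_n = 1/n, a square-summable sequence

def basis_vector(n, size=N):
    e = np.zeros(size)
    e[n] = 1.0
    return e

pairings = [float(np.dot(basis_vector(n), f)) for n in (10, 100, 1000)]
norms = [float(np.linalg.norm(basis_vector(n))) for n in (10, 100, 1000)]
print(pairings)  # tends to 0
print(norms)     # identically 1
```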

A related convergence notion, one that we will adopt while studying distributions, is that of weak* convergence, defined next.

Definition (Weak* Convergence)

A sequence $\{f_n\}$ in $X^*$ is said to converge in the weak* sense to $f$ if

$$f_n(x) \to f(x)$$

for all $x \in X$.

Remark.

Intuition: Weak* convergence is about a sequence of functionals (dual elements) converging: $f_n$ converges to $f$ in the weak* sense if, for every fixed input $x$, the numbers $f_n(x)$ converge to $f(x)$. This is exactly the notion of convergence used for distributions: a sequence of distributions converges if it converges when applied to every test function.

We note that such a convergence notion is very useful in the study of solutions of differential equations (ordinary and partial), optimal control theory, and probability theory as well, even though we will not be able to discuss these in our course.


Distribution Theory

A distribution is a linear and continuous $\mathbb{R}$-valued function (that is, a functional) on a space of test functions. Thus, a distribution can be viewed as an element of the dual space of a linear space of test functions (even though, as we will see, the linear space of test functions need not form a normed linear space).

Studying distributions and sets of test functions presents many benefits for our course. For example, the delta function has a natural representation as a distribution. Furthermore, the Fourier transform will be observed to be a bijective mapping from a space of test functions onto itself, and this space of test functions is rich enough to approximate sufficiently well many functions that we encounter in applications. Finally, we will define the Fourier transform first on a space of test functions and extend it from this space to larger spaces, such as $L_2(\mathbb{R};\mathbb{C})$.

Spaces $\mathcal{D}$ and $\mathcal{S}$ of Test Functions

Let $\mathcal{D}$ denote the set of test functions from $\mathbb{R}$ to $\mathbb{R}$ which are smooth (infinitely differentiable) and which have bounded support. Such functions exist; for example,

$$f(t) = 1_{\{|t| < 1\}}\, e^{\frac{1}{t^2 - 1}}$$

is one such function.

We say a sequence of functions $\{x_i\}$ in $\mathcal{D}$ converges to the null element $\underline{0}$ if:

a) There exists a compact set $T \subset \mathbb{R}$ containing the support of every $x_i$ (we define the support of a function $f$ to be the closure of the set of points $\{t : f(t) \neq 0\}$).

b) For every $\epsilon > 0$ and every $k$, there exists an $N_k \in \mathbb{Z}_+$ such that for all $n \geq N_k$, $p_k(x_n) \leq \epsilon$, where $p_k(x) = \sup_{t \in \mathbb{R}} \left|\frac{d^k}{dt^k} x(t)\right|$ (that is, all the derivatives of the $x_n$ converge to zero uniformly on $\mathbb{R}$).

In applications we usually encounter functions with unbounded support. Hence, a theory based on the above test functions might not be satisfactory. Furthermore, the Fourier transform of a function in $\mathcal{D}$ is not in the same space (a topic to be discussed further). As such, we will find it convenient to slightly enlarge the space of test functions.

Definition (Schwartz Function Space $\mathcal{S}$)

An infinitely differentiable function $\phi : \mathbb{R} \to \mathbb{R}$ is in the Schwartz function space, denoted by $\mathcal{S}$, if for each $k \in \mathbb{Z}_+$ and each $l \in \mathbb{Z}_+$

$$\sup_{t \in \mathbb{R}} |t^l \phi^{(k)}(t)| < \infty,$$

where $\phi^{(k)}(t) = \frac{d^k}{dt^k}\phi(t)$.

Remark.

Intuition: A Schwartz function is one that is infinitely smooth and decays to zero faster than any polynomial (along with all its derivatives). The space $\mathcal{S}$ is larger than $\mathcal{D}$ (compact-support functions) but still small enough to be well behaved. It is the ideal space of test functions for Fourier analysis because, as we will see, the Fourier transform maps $\mathcal{S}$ bijectively onto itself.

For example, the function $\phi(t) = e^{-t^2}$ is a Schwartz function.
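We can spot-check the Schwartz property of this Gaussian numerically: a few of the quantities $\sup_t |t^l \phi^{(k)}(t)|$, evaluated on a wide grid with analytically computed derivatives, are all finite and small (a sketch, of course, not a proof):

```python
import numpy as np

# Evaluate sup_t |t^l phi^(k)(t)| for phi(t) = exp(-t^2), k = 0, 1, 2 and
# l = 0, ..., 3, using the analytic derivatives of the Gaussian.
t = np.linspace(-50.0, 50.0, 400001)
phi = np.exp(-t**2)
derivs = [phi, -2.0 * t * phi, (4.0 * t**2 - 2.0) * phi]  # phi, phi', phi''

seminorms = {(l, k): float(np.max(np.abs(t**l * derivs[k])))
             for l in range(4) for k in range(3)}
print(seminorms)  # every value is finite (here, below 10)
```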

One can equip $\mathcal{S}$ with a notion of convergence generated by a countable family of seminorms:

$$p_{\alpha,\beta}(\phi) := \sup_t \left|t^\alpha \frac{d^\beta}{dt^\beta}\phi(t)\right|,$$

for $\alpha, \beta \in \mathbb{Z}_+$. That is, we say a sequence of functions $\phi_n$ in $\mathcal{S}$ converges to another one $\phi$ if

$$\lim_{n \to \infty} p_{\alpha,\beta}(\phi_n - \phi) = 0, \quad \forall (\alpha, \beta) \in \mathbb{Z}_+ \times \mathbb{Z}_+.$$

With the above, we can define a metric by working with these seminorms: for $x, y \in \mathcal{S}$, let us define the distance between the two vectors as

$$d(x,y) = \sum_n \frac{1}{2^n} \frac{p_n(x-y)}{1 + p_n(x-y)},$$

where $n$ runs over a countable enumeration of the pairs $(\alpha, \beta) \in \mathbb{Z}_+ \times \mathbb{Z}_+$.

The Schwartz space equipped with this metric is a complete space. Furthermore, the differentiation operator becomes a continuous operation on $\mathcal{S}$ under this metric, a topic which we will discuss further.

As had been discussed before (slightly generalizing the earlier continuity characterization), a functional $T : \mathcal{S} \to \mathbb{R}$ is continuous if and only if for every convergent sequence $\phi_n \to \phi$ in $\mathcal{S}$ we have $T(\phi_n) \to T(\phi)$. We note that checking sequential continuity is typically easier than checking continuity directly, since in the space $\mathcal{S}$ it is not convenient to compute the distance between two elements, given the rather involved construction of the metric above.

Definition (Distribution)

A distribution is a linear, continuous functional on the space of test functions $\mathcal{S}$.

Remark.

Intuition: A distribution is a generalized function. Instead of assigning a value at each point, it assigns a number to each test function. Regular functions give rise to distributions through integration, but distributions also include "singular" objects like the Dirac delta that cannot be represented as ordinary functions. Think of a distribution as something that is only meaningful "under an integral sign" against test functions.

Thus, a distribution is an element of the dual space of $\mathcal{S}$ (that is, of $\mathcal{S}^*$), even though $\mathcal{S}$ is defined not as a normed space but as a metric space which is nonetheless a linear space.

General Distributions and Singular Distributions

Distributions can be regular or singular. Regular distributions can be expressed as an integral of a test function against a locally integrable function (that is, a function which has a finite absolute integral on every compact domain on which it is defined). For example, if $\gamma$ is a real-valued integrable function on $\mathbb{R}$, and $\phi \in \mathcal{S}$, the distribution given by

$$\bar{\gamma}(\phi) := \int_{\mathbb{R}} \gamma(t)\phi(t)\,dt$$

is a regular distribution on $\mathcal{S}$, represented by the function $\gamma(t)$.

Definition (Tempered Function)

A tempered function $x(t)$ is one which grows at most polynomially; that is, for some $\beta, \gamma \in \mathbb{R}$ and $N \in \mathbb{Z}_+$:

$$|x(t)| \leq \beta|t|^N + \gamma, \quad \forall t \in \mathbb{R}.$$

Remark.

Intuition: A tempered function is one that does not grow faster than some polynomial. Any tempered function can represent a regular distribution via integration against test functions, because the rapid decay of Schwartz functions compensates for the polynomial growth.

Any tempered function can represent a regular distribution.

Singular distributions do not admit such a representation. For example, the Dirac delta distribution $\bar{\delta}$, defined for all $\phi \in \mathcal{S}$ by

$$\bar{\delta}(\phi) = \phi(0),$$

does not admit a representation of the form $\int g(t)\phi(t)\,dt = \phi(0)$. Even though there is no function which can be used to represent a singular distribution, one occasionally writes a singular distribution as if such a function existed, and calls the representing object a singular or generalized function. The informal expression $\int \delta(t)\phi(t)\,dt = \phi(0)$ is a common example, where $\delta$ is the generalized impulse function which takes the value $\infty$ at $0$ and zero elsewhere.

Theorem (Dirac Delta is a Distribution)

The map $\bar{\delta}(\phi) = \phi(0)$ defines a distribution on $\mathcal{S}$. This distribution is called the Dirac delta distribution.

Remark.

Intuition: The Dirac delta is a perfectly well-defined object when viewed as a distribution: it simply evaluates a test function at the origin. It is linear (evaluating a linear combination at zero gives the linear combination of evaluations) and continuous (if test functions converge, their values at zero converge). The Dirac delta only becomes problematic if you try to treat it as an ordinary function.

Exercise

Show that the relation

$$\bar{f}(\phi) = \int_0^{\infty} t^2 \phi(t)\,dt, \qquad \phi \in \mathcal{S}$$

defines a distribution $\bar{f} \in \mathcal{S}^*$.

Equivalence and Convergence of Distributions

Two distributions $\bar{\gamma}$ and $\bar{\zeta}$ are equal if

$$\bar{\gamma}(f) = \bar{\zeta}(f), \quad \forall f \in \mathcal{S}.$$

Definition (Convergence of Distributions)

A sequence of distributions $\{\bar{\gamma}_n\}$ converges to a distribution $\bar{\gamma}$ if

$$\bar{\gamma}_n(f) \to \bar{\gamma}(f), \quad \forall f \in \mathcal{S}.$$

Remark.

Intuition: Convergence of distributions is exactly weak* convergence in the dual space $\mathcal{S}^*$. A sequence of distributions converges if and only if it converges pointwise on every test function. This is a very permissive notion of convergence, which is exactly why objects like the Dirac delta can be obtained as limits of ordinary functions.

Observe that the above notion is identical to the weak* convergence notion discussed earlier.

Example (Regular Distributions Converging to the Dirac Delta)

For $j \in \mathbb{Z}_+$, $j > 0$, let

$$f_j(t) = \begin{cases} j, & \text{if } 0 \leq t \leq \frac{1}{j} \\ 0, & \text{else.} \end{cases}$$

a) For any real-valued function $g \in \mathcal{S}$, define

$$\bar{f}_j(g) := \int_0^{\infty} f_j(t)g(t)\,dt.$$

Then $\bar{f}_j$ is a distribution on $\mathcal{S}$ for every $j \in \mathbb{N}$.

b) We have that

$$\lim_{j \to \infty} \int_0^{\infty} f_j(t)g(t)\,dt = \bar{\delta}(g) = g(0).$$

Conclude that the sequence of regular distributions $\bar{f}_j$, each represented by the real-valued, integrable function $f_j$, converges to the Dirac delta distribution $\bar{\delta}$ on the space of test functions $\mathcal{S}$.
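The convergence in this example is easy to see numerically. Below, the test function $g(t) = e^{-t^2}\cos(t)$ is an illustrative choice with $g(0) = 1$; the pairings $\bar f_j(g)$ approach $g(0)$ as $j$ grows:

```python
import numpy as np

def trapezoid(y, x):
    # Trapezoidal rule, written out explicitly for self-containedness.
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# Pair the rectangle f_j (height j on [0, 1/j]) with a test function g.
# As j grows, the integral concentrates near t = 0 and tends to g(0) = 1.
def pairing(j, num_points=100001):
    t = np.linspace(0.0, 1.0 / j, num_points)   # f_j vanishes outside [0, 1/j]
    g = np.exp(-t**2) * np.cos(t)
    return float(trapezoid(j * g, t))

values = [pairing(j) for j in (1, 10, 100, 1000)]
print(values)  # approaches g(0) = 1
```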

In fact, we can find many other functions which can define regular distributions whose limit is the delta distribution. This motivates the following section.


Approximate Identity Sequences

Definition (Approximate Identity Sequence)

Let $\psi_n : \mathbb{R} \to \mathbb{R}$ be a sequence of functions such that

  • $\psi_n(t) \geq 0$, $\quad t \in \mathbb{R},\ n \in \mathbb{N}$;
  • $\int \psi_n(t)\,dt = 1$, $\quad n \in \mathbb{N}$;
  • $\lim_{n \to \infty} \int_{|t| \geq \delta} \psi_n(t)\,dt = 0$, $\quad \forall \delta > 0$.

Such sequences $\psi_n$ are called approximate identity sequences.

Remark.

Intuition: An approximate identity sequence is a family of functions that become more and more concentrated around the origin while maintaining unit area. As $n \to \infty$, they "squeeze" all their mass into an infinitesimally small region near $t = 0$. In the limit, they behave like the Dirac delta: integrating them against any test function $\phi$ yields $\phi(0)$. They provide a concrete, constructive way to approach the delta distribution through ordinary functions.

We have seen one example above. The result discussed generalizes to any approximate identity sequence:

Theorem (Approximate Identity Sequences Converge to the Dirac Delta)

Distributions represented by approximate identity sequences converge to the Dirac delta distribution as $n \to \infty$.

Remark.

Intuition: No matter how you construct your approximate identity sequence (rectangles, Gaussians, etc.), the associated distributions always converge to the Dirac delta. This universality is what makes the delta distribution a robust and natural object -- it is the unique limit of any sequence of non-negative unit-area functions concentrating at the origin.

Examples of Approximate Identity Sequences

Example (Rectangle Approximate Identity)

For $n \in \mathbb{N}$, let

$$f_n(t) = \begin{cases} n, & \text{if } 0 \leq t \leq \frac{1}{n} \\ 0, & \text{else.} \end{cases}$$

Such a sequence is an example of an approximate identity sequence.

Observe that for $\phi \in \mathcal{S}$, if we define

$$\bar{f}_n(\phi) := \int_0^{\infty} f_n(t)\phi(t)\,dt,$$

it follows that $\bar{f}_n$ is a distribution on $\mathcal{S}$, and we can show that

$$\lim_{n \to \infty} \int_0^{\infty} f_n(t)\phi(t)\,dt = \bar{\delta}(\phi) = \phi(0).$$

We thus conclude that the sequence of regular distributions $\bar{f}_n$, represented by the real-valued, integrable functions $f_n$, converges to the Dirac delta distribution $\bar{\delta}$.

Example (Gaussian Approximate Identity)

Another very important example of an approximate identity sequence is the following Gaussian sequence of functions:

$$f_n(t) = \frac{1}{\sqrt{2\pi/n}}\, e^{-\frac{n t^2}{2}}.$$

Observe that each element $f_n$ of this sequence lives in $\mathcal{S}$, which will be very consequential.

Proposition 3.4.1 (Cosine Approximate Identity)

Consider the sequence

$$\psi_n(x) = c_n(1 + \cos(x))^n\, 1_{\{|x| \leq \pi\}},$$

where $c_n$ is chosen so that $\int \psi_n(x)\,dx = 1$. We have that

$$\lim_{n \to \infty} \int_{|x| \geq \delta} \psi_n(x)\,dx = 0 \qquad \forall \delta > 0.$$

Remark.

Intuition: The sequence $(1 + \cos(x))^n$ concentrates more and more sharply around $x = 0$ as $n$ grows, because $1 + \cos(x)$ achieves its maximum value of $2$ at $x = 0$ and is strictly less than $2$ everywhere else on $[-\pi, \pi]$. After normalizing to have unit integral, this becomes an approximate identity sequence.
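Both the normalization constant $c_n$ and the concentration property can be computed numerically (a sketch; $\delta = 0.5$ is an arbitrary illustrative threshold):

```python
import numpy as np

def trapezoid(y, x):
    # Trapezoidal rule, written out explicitly for self-containedness.
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# Normalize psi_n(x) = c_n (1 + cos x)^n on [-pi, pi] and measure the mass
# remaining outside |x| >= delta; it vanishes as n grows.
x = np.linspace(-np.pi, np.pi, 200001)

def mass_outside(n, delta=0.5):
    kernel = (1.0 + np.cos(x)) ** n
    c_n = 1.0 / trapezoid(kernel, x)          # enforce unit total integral
    outside = np.where(np.abs(x) >= delta, kernel, 0.0)
    return float(c_n * trapezoid(outside, x))

masses = [mass_outside(n) for n in (1, 10, 100, 1000)]
print(masses)  # decays toward 0
```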

Example (Sinc Approximate Identity)

One further very useful sequence, which does not satisfy the non-negativity property above but nonetheless satisfies the convergence property (to $\bar{\delta}$), is the following:

$$\psi_n(x) = \frac{\sin(nx)}{\pi x}.$$

Theorem (Sinc Sequence Converges to the Dirac Delta)

For any $\phi \in \mathcal{S}$, with $\psi_n$ as above,

$$\lim_{n \to \infty} \int \psi_n(x)\phi(x)\,dx = \phi(0).$$

Remark.

Intuition: Even though the sinc kernel $\frac{\sin(nx)}{\pi x}$ takes negative values and thus is not a true approximate identity sequence, it still converges to the Dirac delta when integrated against Schwartz functions. This particular sequence is intimately connected to the Fourier transform and plays a central role in sampling theory.
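A numerical check of the sinc convergence, with $\phi(t) = e^{-t^2}$ as an illustrative Schwartz function ($\phi(0) = 1$). Note that `np.sinc(u)` computes $\sin(\pi u)/(\pi u)$, so the kernel below equals $\sin(nx)/(\pi x)$, including the finite value $n/\pi$ at $x = 0$:

```python
import numpy as np

def trapezoid(y, x):
    # Trapezoidal rule, written out explicitly for self-containedness.
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# Pair the sinc kernel sin(nx)/(pi x) with a Gaussian test function; the
# integral tends to phi(0) = 1 even though the kernel takes negative values.
def sinc_pairing(n, half_width=30.0, num_points=600001):
    x = np.linspace(-half_width, half_width, num_points)
    kernel = (n / np.pi) * np.sinc(n * x / np.pi)   # = sin(nx)/(pi x)
    phi = np.exp(-x**2)
    return float(trapezoid(kernel * phi, x))

values = [sinc_pairing(n) for n in (1, 5, 20)]
print(values)  # approaches 1
```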

Convolution and its use in approximations

The convolution of two functions (whenever this integral is well defined) is given by:

$$(\psi * \phi)(t) = \int \psi(\tau)\phi(t - \tau)\,d\tau = \int \phi(\tau)\psi(t - \tau)\,d\tau.$$

The convolution can be defined for any pair of functions which are in $L_2(\mathbb{R};\mathbb{R})$. The convolution of two functions in $\mathcal{S}$ is also in $\mathcal{S}$.

A very useful result is the following.

Theorem (Approximate Identity Convolution)

If $\psi_n$ is an approximate identity sequence, then

$$(\psi_n * f)(t) \to f(t)$$

for every continuous and bounded function $f : \mathbb{R} \to \mathbb{R}$, uniformly on compact sets $[a,b] \subset \mathbb{R}$.

Remark.

Intuition: Convolving a function with an approximate identity sequence "smooths" it, and as $n \to \infty$ the smoothed version converges back to the original function. This is the precise sense in which the Dirac delta is the identity element for convolution: convolving with $\delta$ leaves a function unchanged, and approximate identities approximate this behavior.
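The smoothing effect can be illustrated on a grid. Here $f(t) = |\sin t|$ is an arbitrary continuous bounded function (chosen with kinks, so the smoothing is visible) and $\psi_n$ is the Gaussian approximate identity from above; the sup-distance on an interior compact set shrinks as $n$ grows:

```python
import numpy as np

# Discrete convolution of f with narrowing Gaussians psi_n; the maximum
# deviation from f on an interior compact set decreases with n.
dt = 0.002
t = np.arange(-8.0, 8.0, dt)
f = np.abs(np.sin(t))

def smoothed(n):
    psi = np.sqrt(n / (2.0 * np.pi)) * np.exp(-n * t**2 / 2.0)  # variance 1/n
    return np.convolve(f, psi, mode="same") * dt                # Riemann-sum convolution

errors = [float(np.max(np.abs(smoothed(n) - f)[1000:-1000])) for n in (10, 100, 1000)]
print(errors)  # decreasing toward 0 on the interior set
```

The slice `[1000:-1000]` restricts the comparison to a compact interior set, away from the boundary artifacts of the finite-grid convolution.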

Note that with $\psi_n$ taken to be the Gaussian sequence above, $(\psi_n * \phi)$ is always infinitely differentiable, and one may conclude the following:

Corollary (Density of Smooth Functions)

The space of smooth functions is dense in the space of continuous functions with compact support, under the supremum norm.

Remark.

Intuition: Any continuous function with compact support can be approximated arbitrarily well (in supremum norm) by a smooth function. This is achieved by convolving with a Gaussian approximate identity, which always produces an infinitely differentiable result.

Completeness of complex exponentials in $L_2([-\pi,\pi];\mathbb{C})$

Using the approximate identity convolution theorem, with

$$\psi_n(t) = c_n(1 + \cos(t))^n$$

(which is an approximate identity sequence, as shown in the proposition above, when $c_n$ is picked so that $\int \psi_n(t)\,dt = 1$), we can prove the following:

Theorem (Completeness of Complex Exponentials)

The family of complex exponentials in $L_2([-\pi,\pi];\mathbb{C})$:

$$\{e_n(t)\} = \left\{\frac{1}{\sqrt{2\pi}}e^{int}, \quad n \in \mathbb{Z}\right\}$$

forms a complete orthonormal sequence.

Remark.

Intuition: This theorem justifies the Fourier series: the complex exponentials $\{e^{int}\}$ form a complete orthonormal basis for $L_2([-\pi,\pi];\mathbb{C})$, meaning every square-integrable periodic function can be represented as a (possibly infinite) sum of these exponentials with no information lost. There is no "missing direction" in the space that the exponentials cannot reach.

This sequence is used for the Fourier expansion of functions in $L_2([0,2\pi];\mathbb{C})$; see the relevant section.
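Completeness can be seen in action by truncating the Fourier series. For the illustrative function $f(t) = t$ on $[-\pi,\pi]$, the $L_2$ error of the approximation using $\{e_n\}_{|n| \leq N}$ decreases as $N$ grows:

```python
import numpy as np

def trapezoid(y, x):
    # Trapezoidal rule, written out explicitly for self-containedness.
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# L_2([-pi, pi]) error of the truncated Fourier series of f(t) = t using the
# orthonormal exponentials e_n(t) = exp(i n t)/sqrt(2 pi), |n| <= N.
t = np.linspace(-np.pi, np.pi, 20001)
f = t.astype(complex)

def l2_error(N):
    approx = np.zeros_like(f)
    for n in range(-N, N + 1):
        e_n = np.exp(1j * n * t) / np.sqrt(2.0 * np.pi)
        c_n = trapezoid(f * np.conj(e_n), t)     # inner product <f, e_n>
        approx = approx + c_n * e_n
    return float(np.sqrt(trapezoid(np.abs(f - approx) ** 2, t).real))

errors = [l2_error(N) for N in (1, 5, 25, 125)]
print(errors)  # decreasing toward 0
```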


Some Operations on Distributions [Optional]

While studying several properties of distributions, one typically first establishes the property for regular distributions and then tries to extend it to singular distributions.

One important property of distributions is that every distribution has a derivative. We will also later take Fourier transforms of distributions. The derivative, once again, will have a meaning as a distribution; that is, it will only have a meaning when it is applied to a class of test functions.

Definition (Derivative of a Distribution)

The derivative of a distribution $\bar{\gamma} \in \mathcal{S}^*$ is defined as:

$$(D\bar{\gamma})(\phi) = -\bar{\gamma}\left(\frac{d\phi}{dt}\right), \qquad \phi \in \mathcal{S}.$$

Remark.

Intuition: The derivative of a distribution is defined by "shifting" the derivative onto the test function (with a sign change), which is just integration by parts without boundary terms (since Schwartz functions vanish at infinity). This means every distribution -- even singular ones like the Dirac delta -- is differentiable, a stark contrast to ordinary calculus where most functions are not differentiable everywhere.

We can check that this definition is consistent for a distribution represented by a regular (differentiable) function $\gamma$: through integration by parts,

$$\int \frac{d\gamma}{dt}(t)\,\phi(t)\,dt = -\int \gamma(t)\frac{d\phi}{dt}(t)\,dt = -\bar{\gamma}\left(\frac{d\phi}{dt}\right).$$

Example (Derivative of the Step Function Distribution)

Given the definition of the distributional derivative, we show that the distributional derivative of the distribution $\bar{u}$ represented by the unit step function is the Dirac delta distribution. Let $u(t)$ denote the step function, $u(t) = 1_{\{t \geq 0\}}$ (with $1_{\{\cdot\}}$ the indicator function). Define for $\phi \in \mathcal{S}$,

$$\bar{u}(\phi) = \int_{\mathbb{R}} u(t)\phi(t)\,dt = \int_0^{\infty} \phi(t)\,dt.$$

We can verify that the Dirac delta distribution is the derivative of the step distribution above:

$$D\bar{u}(\phi) = -\bar{u}(D\phi) = -\int_0^{\infty} (D\phi)(t)\,dt = -\lim_{t \to \infty} \phi(t) + \phi(0) = \bar{\delta}(\phi),$$

for all $\phi \in \mathcal{S}$ (in the above, $\phi \in \mathcal{S}$ allows us to use $\lim_{t \to \infty} \phi(t) = 0$).

This is an important relationship in engineering applications; for example, the step function often models a turn-on event for a switch in circuit theory, and its derivative is then approximated by the Dirac delta function (to be interpreted with caution).
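The step-function computation above can be checked numerically. The sketch below (with an assumed test function $\phi(t) = e^{-(t-1)^2}$, chosen only for illustration) verifies that $D\bar{u}(\phi) = -\int_0^\infty \phi'(t)\,dt$ evaluates to $\phi(0)$:

```python
import math

def trapezoid(f, a, b, n=200000):
    # composite trapezoid rule on [a, b]
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        s += f(a + k * h)
    return s * h

phi  = lambda t: math.exp(-(t - 1.0) ** 2)                     # Schwartz-class test function
dphi = lambda t: -2.0 * (t - 1.0) * math.exp(-(t - 1.0) ** 2)  # its derivative

# D u_bar(phi) = -\int_0^infty phi'(t) dt; phi vanishes at infinity,
# so the upper limit can be truncated at t = 12
val = -trapezoid(dphi, 0.0, 12.0)
print(abs(val - phi(0.0)) < 1e-6)  # equals phi(0), i.e. delta_bar(phi)
```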

Convolution of Distributions

Let $\bar{F}$ be a regular distribution given by $\bar{F}(\phi) = \int F(t)\phi(t)\,dt$.

The convolution of $F$ with $\phi \in \mathcal{S}$ would be:

$$\int F(\tau)\,\phi(t - \tau)\,d\tau.$$

We can interpret this as a distribution in the following sense. Let $T_t(\phi)(\tau) = \phi(\tau - t)$ be the shifting operator and $Rg(x) = g(-x)$ the reflection operator, so that $(T_t R\phi)(\tau) = (R\phi)(\tau - t) = \phi(t - \tau)$. Then,

$$\int F(\tau)\,\phi(t - \tau)\,d\tau = \int F(\tau)\,(T_t R\phi)(\tau)\,d\tau = \bar{F}(T_t R\phi).$$

This motivates the following definition: the convolution of a function $\phi \in \mathcal{S}$ and a distribution $\bar{f}$ is defined by

$$(\phi * \bar{f})(t) = \int f(\tau)\,\phi(t - \tau)\,d\tau = \bar{f}(T_t R\phi),$$

where, as before, $Rg(x) = g(-x)$ is the reflection operator and $T_t(\phi)(\tau) = \phi(\tau - t)$ is the shifting operator.

Theorem (Convolution with a Distribution)

For any distribution $\bar{f}$ and $\phi \in \mathcal{S}$, $\phi * \bar{f}$ is an infinitely differentiable function and can be used to represent a regular distribution.

Remark.

Intuition: Convolving any distribution (even a singular one) with a Schwartz function always produces a smooth, well-behaved function. This is a powerful regularization property: the smoothness of the test function "wins out" and tames any singularity.
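This regularization effect can be seen numerically even for a discontinuous function: convolving the unit step $u$ with a Gaussian test function $\phi(s) = e^{-s^2}$ (an assumed choice for this sketch) gives $(\phi * \bar{u})(t) = \int_0^\infty \phi(t - \tau)\,d\tau = \int_{-\infty}^{t}\phi(s)\,ds$, a smooth, increasing function:

```python
import math

def trapezoid(f, a, b, n=100000):
    # composite trapezoid rule on [a, b]
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        s += f(a + k * h)
    return s * h

phi = lambda s: math.exp(-s * s)   # smooth Schwartz test function

def conv_with_step(t):
    # (phi * u_bar)(t) = \int u(tau) phi(t - tau) dtau = \int_0^infty phi(t - tau) dtau
    # (upper limit truncated: the Gaussian is negligible beyond tau = 15)
    return trapezoid(lambda tau: phi(t - tau), 0.0, 15.0)

# at t = 0 the exact value is \int_0^infty e^{-s^2} ds = sqrt(pi)/2
print(abs(conv_with_step(0.0) - math.sqrt(math.pi) / 2) < 1e-4)
```

Evaluating `conv_with_step` on a grid shows a smooth sigmoid-like profile: the step's discontinuity has been completely smoothed out by the test function.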

Let $\bar{f}, \bar{g}$ be two regular distributions represented by $f, g$, respectively. The convolution $\bar{f} \star \bar{g}$ is given by the relation, whenever this is well-defined:

$$\left(\bar{f} \star \bar{g}\right)(\phi) = \bar{f}(h_g(\phi)), \qquad \forall \phi \in \mathcal{S},$$

with

$$h_g(\phi)(t) = \int_{-\infty}^{\infty} g(\tau - t)\,\phi(\tau)\,d\tau.$$

It should be observed that, with the above definition:

$$\left(\bar{f} \star \bar{\delta}\right)(\phi) = \bar{f}(\phi), \qquad \forall \phi \in \mathcal{S},$$

that is, the delta distribution is the identity element in distributions under the operation of convolution.

Let $\psi_n$ be an approximate identity sequence with $\psi_n \in \mathcal{S}$. Then it can be shown that for any singular $\bar{g}$, $\psi_n * \bar{g}$ is a smooth function and can be used to represent a regular distribution such that $(\psi_n * \bar{g})(\phi) = \bar{g}\left(\int \psi_n(t - \tau)\phi(t)\,dt\right) \to \bar{g}(\phi)$ for any $\phi \in \mathcal{S}$ (by linearity and by continuity of $\bar{g}$). Accordingly, for any singular distribution, there exists a sequence of regular distributions which converges to it.
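The convergence of regular distributions to the delta can be observed numerically. The sketch below uses an assumed Gaussian approximate identity $\psi_n(t) = n\,e^{-\pi n^2 t^2}$ (unit integral, concentrating at the origin as $n$ grows) and checks that $\int \psi_n(t)\phi(t)\,dt \to \phi(0)$ for an assumed test function:

```python
import math

def trapezoid(f, a, b, n=200000):
    # composite trapezoid rule on [a, b]
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        s += f(a + k * h)
    return s * h

phi = lambda t: math.cos(t) * math.exp(-t * t)   # test function with phi(0) = 1

def psi(n):
    # Gaussian approximate identity: unit integral, width ~ 1/n
    return lambda t: n * math.exp(-math.pi * n * n * t * t)

errors = []
for n in (5, 20, 80):
    pn = psi(n)
    approx = trapezoid(lambda t: pn(t) * phi(t), -1.0, 1.0)
    errors.append(abs(approx - phi(0.0)))

print(errors[0] > errors[1] > errors[2])  # error shrinks as n grows
```

The mass of $\psi_n$ outside $[-1, 1]$ is negligible for these $n$, so truncating the integral does not affect the comparison.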


Fourier Transform of Schwartz Functions

We will continue the discussion of Schwartz functions in the context of Fourier transforms. One appealing aspect of Schwartz functions is that the Fourier transform of a Schwartz function again lives in the space of Schwartz functions. In fact, the Fourier transform on the space of Schwartz functions is both onto and one-to-one (hence a bijection); this will be proven later. Since the space of continuous functions is dense in the space of square-integrable functions, and $\mathcal{S}$ is dense in the space of continuous functions under the supremum norm by an earlier theorem, we will use the bijection property of the Fourier transform on $\mathcal{S}$ to define the Fourier transform of square-integrable functions.
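A classical instance of this closure property is that a Gaussian transforms into another Gaussian. The sketch below assumes the convention $\hat{\phi}(\omega) = \int \phi(t)e^{-i\omega t}\,dt$ (the chapter has not yet fixed a convention), under which $\phi(t) = e^{-t^2}$ has $\hat{\phi}(\omega) = \sqrt{\pi}\,e^{-\omega^2/4}$, itself a Schwartz function:

```python
import math

def trapezoid(f, a, b, n=200000):
    # composite trapezoid rule on [a, b]
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        s += f(a + k * h)
    return s * h

phi = lambda t: math.exp(-t * t)  # Gaussian Schwartz function

def fourier(w):
    # phi is even, so the transform reduces to a real cosine integral
    return trapezoid(lambda t: phi(t) * math.cos(w * t), -10.0, 10.0)

closed_form = lambda w: math.sqrt(math.pi) * math.exp(-w * w / 4.0)
max_err = max(abs(fourier(w) - closed_form(w)) for w in (0.0, 1.0, 2.0))
print(max_err < 1e-6)  # numeric transform matches the Gaussian closed form
```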


Appendix

Optional: Application to Optimization Problems and the Generalization of the Projection Theorem

The duality results and Hölder's inequality are important in applications to optimization problems. The geometric ideas we reviewed in the context of the projection theorem apply very similarly to such spaces, where the inner product is replaced by duality pairings. Let us make this more explicit. For a subspace $M$, let

$$M^\perp := \{x^* \in X^* : \langle m, x^* \rangle = 0, \ \forall m \in M\}.$$

Theorem (Distance and Dual Characterization; Projection Generalization)

(i) Let $x$ be an element of a real normed space $X$ and let $d$ denote its distance from a subspace $M$. Then,

$$d = \inf_{m \in M} \|x - m\| = \max_{\|x^*\| \leq 1,\ x^* \in M^\perp} \langle x, x^* \rangle.$$

The maximum on the right is achieved by some $x_0^*$. If the infimum on the left is achieved by some $m_0 \in M$, then $x_0^*$ is aligned with $x - m_0$.

(ii) In particular, if $m_0 \in M$ satisfies

$$\|x - m_0\| \leq \|x - m\|, \qquad \forall m \in M,$$

then there must be a non-zero vector $x_0^* \in X^*$ such that $\langle m, x_0^* \rangle = 0$ for all $m \in M$ (that is, $x_0^* \in M^\perp$) and $x_0^*$ is aligned with $x - m_0$.

Remark.

Intuition: This theorem generalizes the projection theorem from Hilbert spaces to general normed spaces using duality. The distance from a point to a subspace can be computed by maximizing a dual functional over unit-norm elements of the annihilator $M^\perp$. This is the foundation of duality in optimization: the "primal" problem (minimizing distance) equals the "dual" problem (maximizing a linear functional).
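A finite-dimensional illustration (a sketch; the subspace and point are chosen for this example only): take $X = \mathbb{R}^2$ with the Euclidean norm, $M = \mathrm{span}\{(1,1)\}$, and $x = (3,1)$. The primal distance $\inf_{m \in M}\|x - m\|$ and the dual value $\max_{\|x^*\| \leq 1,\, x^* \in M^\perp} \langle x, x^* \rangle$ should both equal $\sqrt{2}$:

```python
import math

x = (3.0, 1.0)

# primal: distance from x to M = span{(1,1)}, minimizing over a fine parameter grid
primal = min(math.hypot(x[0] - t, x[1] - t)
             for t in (i / 1000.0 for i in range(-5000, 5001)))

# dual: M^perp = span{(1,-1)/sqrt(2)}; maximize <x, s*(1,-1)/sqrt(2)> over |s| <= 1
dual = max(s * (x[0] - x[1]) / math.sqrt(2.0) for s in (-1.0, 1.0))

print(abs(primal - math.sqrt(2.0)) < 1e-3)   # primal distance = sqrt(2)
print(abs(dual - math.sqrt(2.0)) < 1e-9)     # dual value = sqrt(2)
```

In the Euclidean case the aligned dual element is simply the unit vector along $x - m_0$, recovering the familiar orthogonal projection picture.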

Theorem (Dual Distance Characterization)

Let $M$ be a subspace of a real normed space $X$. Let $x^* \in X^*$ be at distance $d$ from $M^\perp$. Then, (i)

$$d = \min_{m^* \in M^\perp} \|x^* - m^*\| = \sup_{x \in M,\ \|x\| \leq 1} \langle x, x^* \rangle,$$

where the minimum is achieved for some $m_0^* \in M^\perp$. (ii) If the supremum on the right is achieved for some $x_0 \in M$, then $x^* - m_0^*$ is aligned with $x_0$.

Remark.

Intuition: This is the "dual version" of the previous theorem: now we measure the distance of a dual element $x^*$ from the annihilator $M^\perp$, and this equals the supremum of $\langle x, x^* \rangle$ over unit-norm elements of $M$. Together with the previous theorem, these results establish the symmetric duality between primal and dual optimization problems.

An Application: Constrained Dual Optimization Problems

Consider the following constrained optimization problem:

$$d = \min_{x^*:\ \langle y_i, x^* \rangle = c_i,\ 1 \leq i \leq n} \|x^*\|.$$

Observe that if $\bar{x}^*$ is any vector satisfying the constraints, then

$$d = \min_{x^*:\ \langle y_i, x^* \rangle = c_i,\ 1 \leq i \leq n} \|x^*\| = \min_{m^* \in M^\perp} \|\bar{x}^* - m^*\|,$$

where $M$ denotes the space spanned by $\{y_1, y_2, \ldots, y_n\}$.

From the dual distance characterization theorem, we have that

$$d = \min_{m^* \in M^\perp} \|\bar{x}^* - m^*\| = \sup_{x \in M,\ \|x\| \leq 1} \langle x, \bar{x}^* \rangle.$$

Now, any vector in $M$ is of the form $m = Ya$, where $Y = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}$ is a matrix and $a$ is a column vector. Thus,

$$d = \min_{x^*:\ \langle y_i, x^* \rangle = c_i,\ 1 \leq i \leq n} \|x^*\| = \sup_{\|Ya\| \leq 1} \langle Ya, \bar{x}^* \rangle = \sup_{\|Ya\| \leq 1} c^T a,$$

where the last equality follows because $\bar{x}^*$ satisfies the constraints, so that

$$\langle Ya, \bar{x}^* \rangle = \langle a, Y^T \bar{x}^* \rangle = c^T a.$$

Thus, the optimal solution to the constrained problem can be written as

$$\sup_{\|Ya\| \leq 1} c^T a,$$

where the optimal $x^*$ is aligned with the optimal $Ya$.
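A minimal concrete instance (with values chosen here only for illustration): in $\mathbb{R}^2$ with the Euclidean norm, minimize $\|x^*\|$ subject to the single constraint $\langle y, x^* \rangle = c$ with $y = (1,1)$ and $c = 3$. In the Euclidean case the minimum-norm solution is a scalar multiple of $y$, and its norm matches the dual value $\sup_{\|Ya\| \leq 1} c^T a = c/\|y\|$:

```python
import math

y = (1.0, 1.0)
c = 3.0

norm_y = math.hypot(y[0], y[1])

# primal: the minimum-norm x* with <y, x*> = c is x* = y * c / ||y||^2
# (any other solution differs by a component orthogonal to y, which only adds norm)
x_star = (y[0] * c / norm_y**2, y[1] * c / norm_y**2)
primal = math.hypot(x_star[0], x_star[1])   # = c / ||y||

# dual: sup_{|a| * ||y|| <= 1} c * a = c / ||y||, attained at a = 1 / ||y||
dual = c / norm_y

print(abs(primal - dual) < 1e-12)    # primal value equals dual value
print(abs(x_star[0] - 1.5) < 1e-12)  # optimal x* is (approximately) (1.5, 1.5)
```

Note that the optimal $x^*$ is indeed aligned with $Ya = a\,y$, as the theory predicts.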

In the following, we present another approach to arrive at the above.

$$\min_{x^*:\ \langle y_i, x^* \rangle = c_i,\ 1 \leq i \leq n} \|x^*\|$$

$$= \min_{x^*} \max_{\lambda} \|x^*\| + \lambda^T(c - Y^T x^*)$$

$$= \max_{\lambda} \min_{x^*} \|x^*\| + \langle \lambda, c \rangle - \lambda^T(Y^T x^*)$$

$$= \max_{\lambda} \min_{x^*} \|x^*\| + \langle \lambda, c \rangle - (Y\lambda)^T x^*$$

$$= \max_{\lambda} \min_{x^*} \|x^*\| + \langle \lambda, c \rangle - \|Y\lambda\|\|x^*\|$$

$$= \max_{\lambda} \min_{x^*} \|x^*\|(1 - \|Y\lambda\|) + \langle \lambda, c \rangle$$

$$= \max_{\lambda:\ \|Y\lambda\| \leq 1} \min_{x^*} \|x^*\|(1 - \|Y\lambda\|) + \langle \lambda, c \rangle$$

$$= \max_{\lambda:\ \|Y\lambda\| \leq 1} \langle \lambda, c \rangle,$$

where the optimal $x^*$ is aligned with $Y\lambda$. The interchange of the minimum and maximum in the second equality follows from Sion's minimax theorem; the fourth equality holds because the minimizing $x^*$ may be taken aligned with $Y\lambda$; and the restriction $\|Y\lambda\| \leq 1$ may be imposed since otherwise the inner minimum is $-\infty$. Here, $\lambda$ serves as a Lagrange multiplier.


Exercises

Exercise

Does there exist a sequence of functions $\{f_j\}$ in $L_2(\mathbb{R}_+; \mathbb{R})$ such that the sequence of distributions $\bar{f}_j$ represented by $f_j$ on the set of Schwartz functions $\mathcal{S}$ converges to zero in a distributional sense, but $f_j$ does not converge to zero in the $L_2$ norm? That is, does there exist a sequence of functions $\{f_j\}$ in $L_2(\mathbb{R}_+; \mathbb{R})$ such that

$$\lim_{j \to \infty} \left(\int_0^{\infty} |f_j(t)|^2\, dt\right)$$

is not zero, but

$$\lim_{j \to \infty} \left(\int_0^{\infty} f_j(t)\phi(t)\,dt\right) = 0, \qquad \forall \phi \in \mathcal{S}.$$

If there exists one, give an example. If there does not exist one, explain why.

Exercise

(a) Let $T$ be a mapping from $L_2(\mathbb{R}_+; \mathbb{R})$ to $\mathbb{R}$ (extended to possibly include $-\infty, \infty$) given by:

$$T(f) = \int_{\mathbb{R}_+} f(t)\, \frac{t}{1 + t^2}\,dt.$$

Let $f_0 \in L_2(\mathbb{R}_+; \mathbb{R})$ be given by:

$$f_0(t) = \frac{1}{t^2 + 1}, \qquad \forall t \in \mathbb{R}_+.$$

Is $T$ continuous on $L_2(\mathbb{R}_+; \mathbb{R})$ at $f_0$?

(b) Let $\mathcal{S}$ be the space of Schwartz functions. Let $T : \mathcal{S} \to \mathbb{R}$ be a mapping given by:

$$T(\phi) = \phi'(0), \qquad \phi \in \mathcal{S},$$

where

$$\phi'(t) = \frac{d}{dt}\phi(t) \quad \forall t.$$

Is $T$ a distribution on $\mathcal{S}$? That is, is $T$ continuous and linear on $\mathcal{S}$?

Exercise

Let $T : \mathcal{S} \to [-\infty, \infty]$ be a mapping defined by:

$$T(\phi) = \limsup_{A \to \infty} \int_{-A}^{A} \phi(t)\, e^{t^2}\,dt.$$

Is $T$ continuous on $\mathcal{S}$? Prove your argument.

Hint: The function $g(t) = e^{-at^2}$ is in $\mathcal{S}$ for any $a > 0$.

Exercise

Let, for $j \in \mathbb{N}$,

$$f_j(t) = \begin{cases} j, & \text{if } 0 \leq t \leq \frac{1}{j}, \\ 0, & \text{else.} \end{cases}$$

For $g \in \mathcal{S}$, define

$$\bar{f}_j(g) := \int_0^{\infty} f_j(t)\, g(t)\,dt.$$

Show that $\bar{f}_j(\cdot)$ is a distribution on $\mathcal{S}$. Show that

$$\lim_{j \to \infty} \int_0^{\infty} f_j(t)\, g(t)\,dt = \bar{\delta}(g) = g(0).$$

Conclude that the sequence of regular distributions $\bar{f}_j(\cdot)$, each represented by a real-valued, integrable function $f_j(t)$, converges to the delta distribution $\bar{\delta}(\cdot)$ on the space of test functions $\mathcal{S}$.
