
Signal Spaces: Linear, Banach and Hilbert Spaces, and Basis Expansions

Normed linear spaces, metric spaces, Banach spaces, inner product spaces, Hilbert spaces. Orthogonality, separability, and signal expansions using Fourier, Haar, and polynomial bases.

In this chapter, we present a general review of signal spaces. These spaces form the mathematical foundation for everything that follows in the course: systems, transforms, stability, and control all rely on understanding the structure of the spaces in which signals live.


Normed Linear (Vector) Spaces and Metric Spaces

Linear Spaces

Definition: Linear (Vector) Space

A linear (vector) space $\mathbb{X}$ is a set which is closed under an addition operation $+$ and a scalar multiplication operation $\cdot$, such that

$$+ : \mathbb{X} \times \mathbb{X} \to \mathbb{X}$$

$$\cdot : \mathbb{C} \times \mathbb{X} \to \mathbb{X}$$

with the following properties (we note that we may take the scalars to be either real or complex numbers). The following are satisfied for all $x, y, z \in \mathbb{X}$ and all scalars $\alpha, \beta$:

(i) $x + y = y + x \in \mathbb{X}$

(ii) $(x + y) + z = x + (y + z)$

(iii) $\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y \in \mathbb{X}$

(iv) $(\alpha + \beta) \cdot x = \alpha \cdot x + \beta \cdot x$

(v) There is a null vector $\underline{0}$ such that $x + \underline{0} = x$

(vi) $\alpha \cdot (\beta \cdot x) = (\alpha\beta) \cdot x$

(vii) $1 \cdot x = x$

(viii) For every $x \in \mathbb{X}$, there exists an element, called the (additive) inverse of $x$ and denoted by $-x$, with the property $x + (-x) = \underline{0}$.

Remark.

Intuition: A linear space is the most basic algebraic structure for working with signals: it guarantees you can add signals together, scale them, and these operations behave predictably. Think of it as the minimal "sandbox" in which superposition -- the cornerstone of linear systems theory -- makes sense.


Example: Examples of Linear Spaces

(i) The space $\mathbb{R}^n$ with pointwise addition and scalar multiplication is a linear space. The null vector is $\underline{0} = \begin{bmatrix} 0 & 0 & \cdots & 0 \end{bmatrix} \in \mathbb{R}^n$.

(ii) Consider the interval $[a, b]$. The collection of real-valued continuous functions on $[a, b]$, with pointwise addition and scalar multiplication, is a linear space. The null element $\underline{0}$ is the function which is identically $0$. This space is called the space of real-valued continuous functions on $[a, b]$.

(iii) With pointwise addition and scalar multiplication, the set of all infinite sequences of real numbers mapping $\mathbb{Z}_+$ to $\mathbb{R}$, each having only finitely many nonzero elements, is a vector space; for example, if one adds two such sequences, the sum also belongs to this space. This space is called the space of sequences with finitely many nonzero terms.

(iv) The collection of all polynomial functions defined on an interval $[a, b]$ with complex coefficients,

$$\left\{f : [a, b] \to \mathbb{C},\, f(x) = \sum_{i=0}^{n} \alpha_i x^i;\;\; n \in \mathbb{N};\, x \in [a, b],\, \alpha_0, \alpha_1, \cdots, \alpha_n \in \mathbb{C}\right\},$$

forms a complex linear space. Note that the sum of two polynomials is another polynomial.

(v) The $n$-dimensional integer lattice $\mathbb{Z}^n = \{v : v = (v_1, \cdots, v_n) \in \mathbb{R}^n,\, v_k \in \mathbb{Z},\, k = 1, \cdots, n\}$ with pointwise addition and scalar multiplication is not a linear space: it is not closed under scalar multiplication, since, e.g., $\frac{1}{2} \cdot v \notin \mathbb{Z}^n$ for any $v$ with an odd entry.

Definition: Subspace

A non-empty subset $M$ of a (real) linear vector space $\mathbb{X}$ is called a subspace of $\mathbb{X}$ if

$$\alpha x + \beta y \in M, \qquad \forall x, y \in M \quad \text{and} \quad \forall \alpha, \beta \in \mathbb{R}.$$

Remark.

Intuition: A subspace is a "self-contained" portion of a vector space -- any linear combination of elements in the subspace stays in the subspace. In signal processing, subspaces often represent restricted signal classes (e.g., band-limited signals or signals spanned by a finite set of basis functions) within a larger signal space.

In particular, the null element $\underline{0}$ is an element of every subspace. For $M$, $N$ two subspaces of a vector space $\mathbb{X}$, $M \cap N$ is also a subspace of $\mathbb{X}$.

Normed Linear Spaces

Definition: Normed Linear Space

A normed linear space $\mathbb{X}$ is a linear vector space on which a map from $\mathbb{X}$ to $\mathbb{R}_+$, called its norm, is defined such that:

  • $\|x\| \geq 0 \quad \forall x \in \mathbb{X}$, and $\|x\| = 0$ if and only if $x$ is the null element of $\mathbb{X}$.
  • $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality)
  • $\|\alpha x\| = |\alpha|\,\|x\|, \quad \forall \alpha \in \mathbb{R}, \ \forall x \in \mathbb{X}$.
Remark.

Intuition: A norm gives us a way to measure "size" or "distance" in a vector space -- just like absolute value does for real numbers, but generalized to arbitrary signal spaces. Without a norm, we can add and scale vectors but have no notion of how large a signal is or how far apart two signals are. The choice of norm (e.g., peak value vs. total energy) determines what "closeness" means for your application.


For two linear spaces $\mathbb{X}$, $\mathbb{Y}$, let $\Gamma(\mathbb{X}; \mathbb{Y})$ denote the set of all maps $\gamma : \mathbb{X} \to \mathbb{Y}$ such that for all $x \in \mathbb{X}$, $\gamma(x) \in \mathbb{Y}$.

Example: Examples of Normed Linear Spaces

a) The space $C([a, b]; \mathbb{R})$ of continuous functions from $[a, b]$ to $\mathbb{R}$ with the norm $\|x\| = \max_{a \leq t \leq b} |x(t)|$ is a normed linear space.

b) For $1 \leq p < \infty$:

$$l_p(\mathbb{Z}_+; \mathbb{R}) := \left\{x \in \Gamma(\mathbb{Z}_+; \mathbb{R}) : \|x\|_p = \left(\sum_{i \in \mathbb{Z}_+} |x(i)|^p\right)^{1/p} < \infty\right\}$$

is a normed linear space.

c) Recall that if $S$ is a set of real numbers bounded from above, then there is a smallest real number $y$ such that $x \leq y$ for all $x \in S$. The number $y$ is called the least upper bound, or supremum, of $S$. If $S$ is not bounded from above, then the supremum is $\infty$. In view of this, for $p = \infty$, we define

$$l_\infty(\mathbb{Z}_+; \mathbb{R}) := \left\{x \in \Gamma(\mathbb{Z}_+; \mathbb{R}) : \|x\|_\infty = \sup_{i \in \mathbb{Z}_+} |x(i)| < \infty\right\}.$$

d) $L_p([a, b]; \mathbb{R}) = \left\{x \in \Gamma([a, b]; \mathbb{R}) : \|x\|_p = \left(\int_a^b |x(t)|^p\,dt\right)^{1/p} < \infty\right\}$ is a normed linear space. For $p = \infty$, we typically write $L_\infty([a, b]; \mathbb{R}) := \{x \in \Gamma([a, b]; \mathbb{R}) : \|x\|_\infty = \sup_{t \in [a,b]} |x(t)| < \infty\}$. However, for $1 \leq p < \infty$, for the condition "$\|x\|_p = 0$ implies $x = \underline{0}$" to hold, we need to regard functions which are equal almost everywhere as equivalent; for $p = \infty$ the definition is often revised with the essential supremum in place of the supremum, so that

$$\|x\|_\infty = \inf_{y:\, y(t) = x(t)\,\text{a.e.}} \ \sup_{t \in [a, b]} |y(t)|.$$

This subtle difference needs to be made explicit in some applications.
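As a small numerical illustration of how different norms measure "size" differently, the following sketch (using NumPy; the sample sequence is an arbitrary illustrative choice) computes the $1$-, $2$-, and $\infty$-norms of a finite sequence:

```python
import numpy as np

def lp_norm(x, p):
    """p-norm of a finite sequence; p = np.inf gives the sup norm."""
    x = np.asarray(x, dtype=float)
    if p == np.inf:
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = [3.0, -4.0, 0.0]
print(lp_norm(x, 1))       # 7.0  (sum of absolute values)
print(lp_norm(x, 2))       # 5.0  (Euclidean "energy" norm)
print(lp_norm(x, np.inf))  # 4.0  (peak value)
```

The same sequence has three different "sizes" depending on the chosen norm, which is exactly the point made above.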

To show that $l_p$ defined above is a normed linear space, we need to show that $\|x + y\|_p \leq \|x\|_p + \|y\|_p$.

Theorem: Minkowski's Inequality

For $x, y \in l_p(\mathbb{Z}_+; \mathbb{R})$ with $1 \leq p \leq \infty$,

$$\|x + y\|_p \leq \|x\|_p + \|y\|_p.$$

Remark.

Intuition: Minkowski's inequality is the triangle inequality for $l_p$ spaces. It says the "length" of the sum of two sequences is at most the sum of their "lengths." This is the key property that confirms $l_p$ norms are genuine norms, and it generalizes the familiar fact that the shortest path between two points is a straight line.

See the Exercises, which also study a proof of Hölder's Inequality, which is used critically in the proof of Minkowski's inequality.
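A quick numerical sanity check (not a proof) of Minkowski's inequality on random finite sequences, for several values of $p$; the dimensions and trial counts are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_norm(x, p):
    # finite-dimensional p-norm, 1 <= p < infinity
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

for p in (1.0, 1.5, 2.0, 3.0):
    for _ in range(1000):
        x = rng.standard_normal(10)
        y = rng.standard_normal(10)
        # triangle inequality for the p-norm (small slack for rounding)
        assert p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
print("Minkowski's inequality held on all random trials")
```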

Theorem: Hölder's Inequality

Let $1 \leq p, q \leq \infty$ with $1/p + 1/q = 1$. Then, for $x \in l_p(\mathbb{Z}_+; \mathbb{R})$ and $y \in l_q(\mathbb{Z}_+; \mathbb{R})$,

$$\sum_k |x(k)y(k)| \leq \|x\|_p \|y\|_q.$$

Remark.

Intuition: Hölder's inequality bounds the "interaction" between two sequences living in dual $l_p$ spaces. When $p = q = 2$, this reduces to Cauchy-Schwarz. In practice, it is the workhorse behind proving that convolutions, inner products, and many signal processing operations produce finite results.
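The same kind of numerical sanity check can be run for Hölder's inequality, for a few conjugate pairs $(p, q)$ with $1/p + 1/q = 1$ (again, dimensions and trial counts are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# conjugate pairs (p, q) with 1/p + 1/q = 1, including (1, infinity)
for p, q in ((2.0, 2.0), (3.0, 1.5), (1.0, np.inf)):
    for _ in range(1000):
        x = rng.standard_normal(8)
        y = rng.standard_normal(8)
        lhs = np.sum(np.abs(x * y))
        xp = np.max(np.abs(x)) if p == np.inf else np.sum(np.abs(x) ** p) ** (1 / p)
        yq = np.max(np.abs(y)) if q == np.inf else np.sum(np.abs(y) ** q) ** (1 / q)
        # Hölder: sum |x(k) y(k)| <= ||x||_p ||y||_q (small slack for rounding)
        assert lhs <= xp * yq + 1e-12
print("Hölder's inequality held on all random trials")
```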

Transformations and Continuity

Definition: Transformation

Let $X$ and $Y$ be two normed linear spaces, and let $B \subset X$ be a subset of $X$. A rule (law, relation) $T$ which assigns to every element of $B$ an element in $Y$ is called a transformation from $X$ to $Y$ with domain $B$. The relation is often expressed as $x \mapsto y = T(x)$.

If for every $y \in Y$ there is an $x$ such that $y = T(x)$, the transformation is said to be onto (or surjective). If for every element of $Y$ there is at most one $x$ such that $y = T(x)$, the transformation is said to be one-to-one (or injective). If these two properties hold simultaneously, the transformation is said to be bijective.

Remark.

Intuition: A transformation is simply a rule that maps inputs to outputs -- this is the abstract notion of a "system." Surjectivity means every possible output is reachable, injectivity means distinct inputs always produce distinct outputs, and bijectivity means the mapping is perfectly reversible. These properties are essential when asking whether a system can be inverted or whether information is lost.

Definition: Linear Transformation

A transformation $T : X \to Y$ (or $T \in \Gamma(X; Y)$) is linear if for every $x_1, x_2 \in X$ and $\alpha_1, \alpha_2 \in \mathbb{R}$, we have $T(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 T(x_1) + \alpha_2 T(x_2)$.

Remark.

Intuition: Linearity means the system obeys superposition: the response to a sum of inputs equals the sum of the individual responses, and scaling the input scales the output by the same factor. This is the defining property of the systems we study in this course, and it is what makes Fourier analysis so powerful -- we can decompose any input into simple components, analyze each one separately, and add the results.

Definition: Continuity

A transformation $T : X \to Y$ for normed linear spaces $X, Y$ is continuous at $x_0 \in X$ if for every $\epsilon > 0$, there exists $\delta > 0$ such that $\|x - x_0\| \leq \delta$ implies $\|T(x) - T(x_0)\| \leq \epsilon$ (here the norms are those of the respective spaces $X$ and $Y$). $T$ is said to be continuous if it is continuous at every $x_0 \in X$.

Remark.

Intuition: Continuity means that small perturbations in the input produce small perturbations in the output -- the system does not "blow up" from tiny changes. This is the mathematical formalization of robustness: a continuous system behaves predictably under small measurement errors or noise.

Definition: Sequential Continuity

A transformation $T : X \to Y$ is sequentially continuous at $x_0 \in X$ if $x_n \to x_0$ implies that $T(x_n) \to T(x_0)$.

Remark.

Intuition: Sequential continuity says the same thing as continuity but in the language of sequences: if a sequence of inputs converges to some input, then the corresponding outputs converge to the corresponding output. This formulation is often easier to verify in practice, since you can work with concrete sequences rather than abstract $\epsilon$-$\delta$ arguments.

Theorem: Equivalence of Sequential Continuity and Continuity

Sequential continuity and continuity are equivalent for normed linear spaces.

Remark.

Intuition: This theorem tells us that in normed linear spaces, the $\epsilon$-$\delta$ definition of continuity and the sequence-based definition are interchangeable. You can use whichever is more convenient for the problem at hand -- they will always give the same answer.

Theorem: Continuity of Linear Transformations

If the transformation $T$ is linear, then continuity is equivalent to continuity at the null element.

Remark.

Intuition: For a linear transformation, you only need to check continuity at the origin -- continuity there automatically guarantees continuity everywhere. This is because linearity lets you "translate" any neighborhood to the origin. In practice, this greatly simplifies verifying that a linear system is well-behaved.

For some applications, sequential continuity may be more convenient to work with, as one need not quantify $(\epsilon, \delta)$ pairs to verify continuity.

Metric Spaces

Definition: Metric Space

A metric defined on a set $X$ is a function $d : X \times X \to \mathbb{R}$ such that:

  • $d(x, y) \geq 0 \quad \forall x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$.
  • $d(x, y) = d(y, x) \quad \forall x, y \in X$.
  • $d(x, y) \leq d(x, z) + d(z, y) \quad \forall x, y, z \in X$.

A metric space $(X, d)$ is the set $X$ equipped with the metric $d$.

Remark.

Intuition: A metric generalizes the concept of "distance" to sets that may not be vector spaces. While a norm requires linear structure (addition and scaling), a metric only requires a sensible notion of distance between pairs of points. This makes metric spaces useful for analyzing convergence and continuity in settings where addition of elements may not be defined.

A normed linear space is also a metric space, with metric $d(x, y) = \|x - y\|$. Many spaces of functions that we encounter are not linear spaces. For such spaces the concept of a metric, defined above, provides an appropriate generalization of the notion of norm studied earlier.

Banach Spaces

Definition: Cauchy Sequence

A sequence $\{x_n\}$ in a normed space $X$ is Cauchy if for every $\epsilon > 0$, there exists an $N$ such that $\|x_n - x_m\| \leq \epsilon$ for all $n, m \geq N$.

Remark.

Intuition: A Cauchy sequence is one whose elements get arbitrarily close to each other as you go further along -- the sequence "wants" to converge, even if we do not yet know the limit. Think of it as a sequence that is "trying to settle down." The key question is whether the space contains the point it is settling toward.

An important observation on Cauchy sequences is that every convergent sequence is Cauchy; however, not all Cauchy sequences are convergent: the limit might not live in the original space from which the sequence elements take values. This motivates the property of completeness.

Definition: Banach Space

A normed linear space $X$ is complete if every Cauchy sequence in $X$ has a limit in $X$. A complete normed linear space is called a Banach space.

Remark.

Intuition: Completeness means there are "no holes" in the space -- every sequence that looks like it should converge actually does converge to something inside the space. A Banach space is the right setting for iterative algorithms and approximations, because you are guaranteed that limits of approximating sequences exist. Without completeness, a perfectly reasonable iterative procedure might converge to something that is not even in your space.

Banach spaces are important for many applications in engineering and applied mathematics. In many mathematical applications (such as existence of, and numerical methods for, solutions to differential equations), machine learning problems (such as iterative updates of data-driven dynamics), and stochastic analysis or optimization problems (for which a sequence of approximating solutions may be obtained, for example, via dynamic programming methods), we would like to know whether a given sequence converges without necessarily knowing what its limit may be. Banach spaces allow us to use Cauchy sequence arguments to ensure the existence of limits and establish some of their properties.

An example is the following: consider the solutions to $Ax = b$ for $A$ a square matrix and $b$ a vector. One can identify conditions under which an iteration of the form $x_{k+1} = (I - A)x_k + b$ forms a Cauchy sequence and converges to a solution through the contraction principle. As noted above, existence of solutions to ordinary differential equations also follows from Cauchy sequence arguments.
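The iteration above can be sketched numerically. The following is a minimal sketch (the matrix $A$ and vector $b$ are made-up illustrative data) in which $\|I - A\| < 1$, so the map $x \mapsto (I - A)x + b$ is a contraction, the iterates form a Cauchy sequence in the complete space $\mathbb{R}^2$, and the limit solves $Ax = b$:

```python
import numpy as np

# Solve A x = b by iterating x_{k+1} = (I - A) x_k + b.
A = np.array([[1.2, 0.1],
              [0.1, 0.9]])
b = np.array([1.0, 2.0])
I = np.eye(2)

# Contraction condition: spectral norm of I - A must be < 1,
# which guarantees the iterates form a Cauchy sequence.
assert np.linalg.norm(I - A, 2) < 1

x = np.zeros(2)
for _ in range(200):
    x = (I - A) @ x + b

# The limit of the Cauchy sequence is the solution of A x = b.
print(np.allclose(x, np.linalg.solve(A, b)))  # True
```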

A subset of a Banach space $X$ is complete if and only if it is closed. If the set is closed, every Cauchy sequence in this set has a limit in $X$, and this limit must be a member of the set; hence the set is complete. If it is not closed, one can provide a counterexample: a Cauchy sequence in the subset whose limit is not in the subset.

The real space $\mathbb{R}$ with the usual distance $d(x, y) = |x - y|$ is a complete space.

Theorem: Completeness of Function Spaces

a) Let $C([0, 1]; \mathbb{R})$ be the space of continuous functions in $\Gamma([0, 1]; \mathbb{R})$ with the supremum norm

$$\|f\| = \sup_{t \in [0, 1]} |f(t)|.$$

This space is a Banach space.

b) Let $C([0, 1]; \mathbb{R})$ be the space of continuous functions in $\Gamma([0, 1]; \mathbb{R})$ with the norm

$$\|f\| = \int_0^1 |f(t)|\,dt.$$

This space is not a Banach space.

Remark.

Intuition: This theorem shows that completeness depends critically on the choice of norm. Under the supremum norm, the limit of continuous functions must be continuous (uniform convergence preserves continuity). But under the $L_1$ norm, a sequence of continuous functions can converge to a discontinuous function that is not in the original space. The lesson: the same set of functions can be "complete" or "incomplete" depending on how you measure distance.

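A numerical sketch of the standard counterexample for part b) (the specific ramp sequence is an illustrative choice): continuous ramps $f_n$ approximating a unit step at $t = 1/2$ form a Cauchy sequence in the $L_1$ norm, yet their $L_1$ limit is discontinuous and so lies outside $C([0, 1]; \mathbb{R})$:

```python
import numpy as np

# f_n: continuous ramp, 0 for t <= 1/2, slope n, saturating at 1.
def f(n, t):
    return np.clip(n * (t - 0.5), 0.0, 1.0)

t = np.linspace(0.0, 1.0, 200001)
dt = t[1] - t[0]

def l1_dist(g, h):
    # Riemann-sum approximation of the L1 distance on [0, 1]
    return np.sum(np.abs(g - h)) * dt

# The sequence is Cauchy in the L1 norm: distances keep shrinking ...
d1 = l1_dist(f(10, t), f(20, t))    # true integral equals 1/40
d2 = l1_dist(f(100, t), f(200, t))  # true integral equals 1/400
# ... but the L1 limit is the discontinuous step 1{t > 1/2},
# which is not in C([0,1]); hence (C([0,1]), ||.||_1) is incomplete.
step = (t > 0.5).astype(float)
d3 = l1_dist(f(1000, t), step)      # true integral equals 1/2000
print(d1, d2, d3)
```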

Theorem: Completeness of $l_p$ Spaces

The space $l_p(\mathbb{Z}_+; \mathbb{R}) := \left\{x \in \Gamma(\mathbb{Z}_+; \mathbb{R}) : \|x\|_p = \left(\sum_{i \in \mathbb{Z}_+} |x(i)|^p\right)^{1/p} < \infty\right\}$ is a Banach space for all $1 \leq p < \infty$. The same holds for $p = \infty$.

Remark.

Intuition: The $l_p$ sequence spaces are complete, meaning they are safe to work in: any "convergent-looking" sequence of signals in $l_p$ will converge to something that is still in $l_p$. This is essential because many algorithms in signal processing and control construct solutions as limits of iterative approximations, and completeness guarantees those limits are well-defined.

Remark.

A brief remark on some typical notations: when the range space is $\mathbb{R}$, the notation $l_p(\Omega)$ denotes $l_p(\Omega; \mathbb{R})$ for a discrete-time index set $\Omega$, and likewise, for a continuous-time index set $\Omega$, $L_p(\Omega)$ denotes $L_p(\Omega; \mathbb{R})$.

Definition: Convergence

In a normed linear space $\mathbb{X}$, an infinite sequence of elements $\{x_n\}$ converges to an element $x$ if the sequence $\{\|x_n - x\|\}$ converges to zero.

Remark.

Intuition: Convergence in a normed space means the distance between the sequence elements and the limit shrinks to zero. Note that what "distance" means depends on the norm -- convergence in the supremum norm (uniform convergence) is a much stronger statement than convergence in the $L_2$ norm (energy convergence). Always be aware of which norm you are working with.


Hilbert Spaces

Definition: Pre-Hilbert Space

A pre-Hilbert space $X$ is a linear vector space on which an inner product $\langle \cdot, \cdot \rangle : X \times X \to \mathbb{C}$ is defined, where the following are satisfied:

  1. $\langle x, y \rangle = \langle y, x \rangle^*$ (the superscript denotes the complex conjugate; we will also use $\overline{\langle y, x \rangle}$ to denote the complex conjugate)
  2. $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$
  3. $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$
  4. $\langle x, x \rangle \geq 0$, with equality if and only if $x$ is the null element.
Remark.

Intuition: An inner product enriches a vector space with geometric structure: it gives us notions of "angle" and "projection" between signals, not just "size." This is what allows us to decompose a signal into orthogonal components -- the mathematical foundation for Fourier series, least-squares fitting, and matched filtering.

Thus, corresponding to each pair $(x, y) \in X \times X$, the inner product $\langle x, y \rangle$ is a scalar.

Theorem: Cauchy-Schwarz Inequality

For $x, y \in X$,

$$|\langle x, y \rangle| \leq \sqrt{\langle x, x \rangle}\,\sqrt{\langle y, y \rangle},$$

where equality occurs if and only if $x$ and $y$ are linearly dependent (e.g., $x = \alpha y$ for some scalar $\alpha$).

Remark.

Intuition: Cauchy-Schwarz is the abstract generalization of "the dot product of two vectors cannot exceed the product of their lengths." It bounds how much two signals can be "aligned" -- and equality holds precisely when one signal is a scaled copy of the other. This inequality underpins nearly every bound and estimate in signal processing and statistics.


In a pre-Hilbert space, the inner product defines a norm: $\|x\| = \sqrt{\langle x, x \rangle}$.

The proof of this result requires one to show that $\sqrt{\langle x, x \rangle}$ satisfies the triangle inequality, that is,

$$\|x + y\| \leq \|x\| + \|y\|,$$

which can be proven by an application of the Cauchy-Schwarz inequality.
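A small numerical check of the Cauchy-Schwarz inequality, of the triangle inequality it implies for the induced norm, and of the equality case (the vectors are arbitrary illustrative data):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

ip = np.dot(x, y)            # <x, y> in R^5
nx = np.sqrt(np.dot(x, x))   # ||x|| induced by the inner product
ny = np.sqrt(np.dot(y, y))
assert abs(ip) <= nx * ny                         # Cauchy-Schwarz
assert np.linalg.norm(x + y) <= nx + ny + 1e-12   # triangle inequality

# Equality case: y is a scaled copy of x (linear dependence)
y = 3.0 * x
assert np.isclose(abs(np.dot(x, y)), np.linalg.norm(x) * np.linalg.norm(y))
print("Cauchy-Schwarz and triangle inequality verified")
```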

In a given problem or application, the inner product is to be defined. In the special case of $\mathbb{R}^N$, the inner product is typically the usual dot product; hence $\mathbb{R}^N$ with the usual inner product is a pre-Hilbert space.

Definition: Hilbert Space

A complete pre-Hilbert space is called a Hilbert space.

Remark.

Intuition: A Hilbert space is the "best of all worlds" for signal analysis: it has addition and scaling (linear space), a notion of distance (norm), a notion of angle and projection (inner product), and no missing limits (completeness). The spaces $L_2$ and $l_2$ are the prototypical Hilbert spaces in signal processing -- they capture signals with finite energy.

Hence, a Hilbert space is a Banach space endowed with an inner product which induces its norm. For example, if we define $C([0, 1]; \mathbb{R})$ as a pre-Hilbert space with the inner product $\langle x, y \rangle = \int_0^1 x(t)y(t)\,dt$, this would not be a Hilbert space, since this space is not complete.

The space $l_2(\mathbb{Z}_+; \mathbb{R})$ is a Hilbert space with $\langle x, y \rangle = \sum_{n \in \mathbb{Z}_+} x(n)y(n)$ for $x, y \in l_2(\mathbb{Z}_+; \mathbb{R})$. Furthermore, $\|x\| = \sqrt{\langle x, x \rangle}$ defines a norm on $l_2(\mathbb{Z}_+; \mathbb{R})$.

Likewise, the space $L_2(\mathbb{R}_+; \mathbb{R})$ is a Hilbert space with $\langle x, y \rangle = \int_{t \in \mathbb{R}_+} x(t)y(t)\,dt$ for $x, y \in L_2(\mathbb{R}_+; \mathbb{R})$, and its norm is $\|x\| = \sqrt{\langle x, x \rangle}$.

Theorem: Continuity of the Inner Product

The inner product is continuous: if $x_n \to x$ and $y_n \to y$, then $\langle x_n, y_n \rangle \to \langle x, y \rangle$ for $x_n, y_n$ in a Hilbert space.

Remark.

Intuition: Continuity of the inner product means that if two sequences of signals converge, then their "correlation" (inner product) converges to the correlation of the limits. This is practically important because it guarantees that Fourier coefficients, projections, and energy computations are stable under approximation.

Why are we interested in Hilbert Spaces?

Hilbert spaces have several very useful properties:

  1. Hilbert spaces allow us to gain geometric insight into a set of signals via the inner product, orthogonality, and basis expansions.
  2. If a Hilbert space is separable (to be defined shortly), there exists a countable (or sometimes only finite) sequence of orthonormal vectors which can be used as a basis to represent all the members of this space. This is used in many areas of applied mathematics and engineering, such as Fourier expansions, among many others.
  3. A Hilbert space formulation allows us to develop approximations of signals using a finite number of basis signals.
  4. Also related to the item above, we can state a Projection Theorem which certifies optimality in a wide variety of applications. This geometric insight carries over to more general optimization problems in spaces which are not Hilbert (via duality analysis, to be discussed later).

Orthogonality and the Projection Theorem

Definition: Orthogonality

In a Hilbert space $X$, two vectors $x, y \in X$ are orthogonal if $\langle x, y \rangle = 0$. A vector $x$ is orthogonal to a set $S \subset X$ if $\langle x, y \rangle = 0 \ \forall y \in S$.

Remark.

Intuition: Orthogonality is the generalization of perpendicularity to abstract signal spaces. Two orthogonal signals have zero "overlap" or correlation -- they carry completely independent information. This concept is at the heart of Fourier analysis, where signals are decomposed into mutually orthogonal frequency components.

By the Pythagorean theorem, for $x, y$ orthogonal, we have that $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

A set $A \subset X$ is closed if $A$ contains the limit of every convergent sequence taking values in $A$.

Theorem: Projection Theorem

Let $H$ be a Hilbert space and $B$ a subspace of $H$. Consider the problem

$$\inf_{m \in B} \|x - m\|.$$

(i) A necessary and sufficient condition for $m^* \in B$ to be the minimizing element in $B$, so that

$$\inf_{m \in B} \|x - m\| = \|x - m^*\|,$$

or equivalently

$$\|x - m^*\| \leq \|x - y\|, \qquad \forall y \in B,$$

is that $x - m^*$ is orthogonal to $B$.

If it exists, such an $m^*$ is unique.

(ii) Let $H$ be a Hilbert space and $B$ be a closed subspace of $H$. For any vector $x \in H$, there is a unique vector $m^* \in B$ attaining the infimum above.

Remark.

Intuition: The Projection Theorem says that the best approximation to a signal from a subspace is obtained by "dropping a perpendicular" onto that subspace -- the approximation error is orthogonal to the subspace. This is exactly what happens in least-squares estimation, optimal filtering (Wiener filter), and signal compression: the optimal reconstruction is always the orthogonal projection.


Example: Projection Theorem Application

Consider the minimization of

$$\int_{-1}^{1} (t^n - m(t))^2\,dt,$$

for some $n \in \mathbb{Z}_+$, over all $m$ such that

$$m \in M := \{f : f \in L_2([-1, 1]; \mathbb{R}),\, f(t) = \alpha + \beta t + \gamma t^2;\; \alpha, \beta, \gamma \in \mathbb{R}\}.$$

(a) State the problem as a projection problem by identifying the Hilbert space and the projected subspace.

(b) Compute the solution, that is, find the minimizing $m$.

Remark.

Intuition: This exercise shows that finding the best polynomial approximation of degree $\leq 2$ to $t^n$ on $[-1, 1]$ is exactly a projection problem in $L_2$. The Hilbert space is $L_2([-1, 1]; \mathbb{R})$, the subspace $M$ is the set of polynomials of degree at most 2, and the minimizer is the orthogonal projection of $t^n$ onto $M$.
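For part (b), the orthogonality condition $\langle t^n - m^*, t^j \rangle = 0$ for $j = 0, 1, 2$ yields a $3 \times 3$ linear system (the normal equations) with Gram matrix $G_{jk} = \int_{-1}^{1} t^{j+k}\,dt$. The following sketch solves it numerically (the choice of solving via `np.linalg.solve` is an implementation convenience, not part of the exercise):

```python
import numpy as np

def moment(k):
    # \int_{-1}^{1} t^k dt: zero for odd k, 2/(k+1) for even k
    return 0.0 if k % 2 else 2.0 / (k + 1)

def project(n):
    """Coefficients (alpha, beta, gamma) of the L2([-1,1]) projection
    of t^n onto span{1, t, t^2}, via the normal equations."""
    G = np.array([[moment(j + k) for k in range(3)] for j in range(3)])
    rhs = np.array([moment(n + j) for j in range(3)])
    return np.linalg.solve(G, rhs)

# n = 3: by odd symmetry only the t term survives, m*(t) = (3/5) t
print(np.round(project(3), 6))
# n = 4: by even symmetry m*(t) = -3/35 + (6/7) t^2
print(np.round(project(4), 6))
```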


Approximations and Signal Expansions

Orthogonality

Definition: Orthogonal and Orthonormal Sets

A set $S$ of vectors in a Hilbert space is orthogonal if all elements of this set are orthogonal to each other. The set is orthonormal if, in addition, each vector in the set has norm equal to one.

Remark.

Intuition: An orthonormal set is a collection of "perfectly independent, unit-sized" building blocks for signals. Orthogonality ensures no redundancy (each basis vector carries unique information), and normalization to unit length simplifies coefficient computation to simple inner products. The complex exponentials in Fourier analysis form the prototypical orthonormal set.

The Gram-Schmidt orthogonalization procedure can be invoked to generate such orthonormal sequences. This procedure states that given a sequence $\{x_i\}$ of linearly independent vectors, there exists an orthonormal sequence of vectors $\{e_i\}$ such that for every $n$ and every choice of scalars $\alpha_k$, $1 \leq k \leq n$, there exist $\beta_k$, $1 \leq k \leq n$, with

$$\sum_{k=1}^{n} \alpha_k x_k = \sum_{k=1}^{n} \beta_k e_k,$$

that is, the linear span of $\{x_k, 1 \leq k \leq n\}$ is equal to the linear span of $\{e_k, 1 \leq k \leq n\}$ for every $n \in \mathbb{N}$.

Recall that a set of vectors $\{e_i\}$ is linearly dependent if there exists a complex-valued vector $\mathbf{c} = \{c_1, c_2, \ldots, c_N\}$ such that $\sum_{i=1}^{N} c_i e_i = 0$ with at least one coefficient $c_i \neq 0$.
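The Gram-Schmidt procedure described above can be sketched in a few lines (using the modified variant, which subtracts each projection from the running residual for better numerical behavior; the input vectors are arbitrary illustrative data):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for x in vectors:
        v = x.astype(float).copy()
        for e in basis:
            v -= np.dot(e, v) * e   # subtract projection onto e
        basis.append(v / np.linalg.norm(v))  # normalize to unit length
    return basis

X = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
E = gram_schmidt(X)

# Gram matrix of the output: identity iff the e_i are orthonormal
G = np.array([[np.dot(e, f) for f in E] for e in E])
print(np.allclose(G, np.eye(3)))  # True
```

Each $e_k$ lies in the span of $x_1, \ldots, x_k$, so the linear spans agree for every $n$, as stated above.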

Proposition: Linear Independence of Orthonormal Vectors

A sequence of orthonormal vectors is a linearly independent collection.

Remark.

Intuition: Because orthonormal vectors point in completely "different directions," no vector in the set can be built from the others. This guarantees there is no redundancy in an orthonormal basis -- each basis function contributes something genuinely new to the representation.

We say that a series $\sum_{i=1}^{\infty} \epsilon_i e_i$ converges to $x$ if for every $\delta > 0$ there exists $N \in \mathbb{N}$ such that $\|x - \sum_{i=1}^{n} \epsilon_i e_i\| < \delta$ for all $n \geq N$.

Theorem: Convergence of Orthonormal Expansions

Let $\{e_i\}$ be a sequence of orthonormal vectors in a Hilbert space $H$. Let $\{x_n = \sum_{i=1}^{n} \epsilon_i e_i\}$ be a sequence of vectors in $H$. The sequence converges to some vector $\bar{x}$ if and only if

$$\sum_{i=1}^{\infty} |\epsilon_i|^2 < \infty.$$

In this case, we have that $\langle \bar{x}, e_i \rangle = \epsilon_i$.

Remark.

Intuition: This theorem provides the bridge between abstract Hilbert space theory and practical signal representation: a signal can be expanded in an orthonormal basis if and only if the sum of squared coefficients is finite -- which is exactly the condition that the signal has finite energy. Moreover, the coefficients are simply inner products with the basis vectors, making them easy to compute.


Separable Hilbert Spaces and Countable Expansions

DefinitionDense Subset

Given a normed linear space XX, a subset DXD \subset X is dense in XX, if for every xXx \in X, and each ϵ>0\epsilon > 0, there exists a member dDd \in D such that xdϵ\|x - d\| \leq \epsilon.

Remark.

Intuition: A dense subset can approximate any element in the space to arbitrary precision. Think of how rational numbers are dense in the reals -- every real number can be approximated as closely as you like by a rational. Similarly, saying that polynomials are dense in L2L_2 means any square-integrable signal can be approximated arbitrarily well by a polynomial.

DefinitionCountable Set

A set is countable if either it has finitely many elements or there is a one-to-one correspondence between the set and the set N\mathbb{N} of natural numbers (which would then be an enumeration / counting map).

Remark.

Intuition: Countability means you can list all elements in a sequence, even if the list is infinite. This matters because computers and algorithms work with sequences of numbers. A countable basis for a signal space means you can, in principle, represent any signal by a list of coefficients -- which is exactly what digital signal processing does.

Examples of countable sets are finite sets, the set \mathbb{Z} of integers, and the set \mathbb{Q} of rational numbers. An example of an uncountable set is the set \mathbb{R} of real numbers.

TheoremCountability Properties

(a) A countable union of countable sets is countable.

(b) Cartesian product of finitely many countable sets is countable.

(c) The Cartesian product of countably infinitely many countable sets may not be countable; this holds even when each of the sets is finite (for instance, \{0,1\}^{\mathbb{N}} is uncountable).

(d) [0,1][0, 1] is not countable.

Remark.

Intuition: These results establish the fundamental size distinctions between sets. Part (d) -- that [0,1][0,1] is uncountable -- is especially important: it means continuous-time signals live in an inherently "larger" space than any list can capture. This is precisely why we need the machinery of dense subsets and basis expansions to represent continuous signals with countable data.

Cantor's diagonal argument and the triangular enumeration are important steps in proving the theorem above.
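The triangular enumeration mentioned above can be written out explicitly. A small Python sketch of the Cantor pairing map (the function name is ours), which walks \mathbb{N} \times \mathbb{N} along anti-diagonals and underlies the proof that a countable union of countable sets is countable:

```python
# Cantor's triangular enumeration: a bijection N x N -> N.
def cantor_pair(i, j):
    """Map the pair (i, j), i, j >= 0, to its position along the anti-diagonals."""
    d = i + j
    return d * (d + 1) // 2 + j

# Enumerating pairs on the first 50 complete anti-diagonals hits every
# position 0 .. 50*51/2 - 1 with no gaps or repeats.
seen = {cantor_pair(i, j) for i in range(50) for j in range(50)}
full = set(range(50 * 51 // 2))
assert full <= seen
```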

DefinitionSeparable Space

A space XX is separable, if it contains a countable dense set.

Remark.

Intuition: Separability means that even though the space may contain uncountably many elements, a countable subset is "enough" to approximate everything. This is the mathematical guarantee that digital signal processing is possible: separable signal spaces can be faithfully represented using countably many basis functions (like Fourier harmonics) and their coefficients.

Separability states that, for computational purposes, it suffices to work with a countable subset even when the space itself is uncountable. Examples of separable spaces are \mathbb{R}, and the set of continuous and bounded functions on a compact set metrized with the maximum distance between the functions.

TheoremCountability of Orthonormal Systems in Separable Spaces

Let HH be a separable Hilbert space. Then, every orthonormal system of vectors in HH has a finite or countably infinite number of elements.

Remark.

Intuition: In a separable Hilbert space, you can never have an uncountable collection of mutually orthogonal directions. This limits the "dimensionality" of the space in a way that makes countable basis expansions (like Fourier series) sufficient to represent every signal. It is the reason why a countable set of frequencies captures all the information in a square-integrable signal.

DefinitionComplete Orthonormal Sequence

An orthonormal sequence in a Hilbert space HH is complete if the only vector in HH which is orthogonal to each of the vectors in the sequence is the null vector.

Remark.

Intuition: Completeness of an orthonormal sequence means there is "no signal left over" that the basis misses. If a signal is orthogonal to every basis vector, it must be zero. This is the condition that distinguishes a true basis (which can represent everything) from a partial set (which can only represent signals in a subspace).

TheoremSeparability and Complete Orthonormal Sequences

A Hilbert space HH contains a complete orthonormal sequence (that is, a countable collection of such vectors) if and only if it is separable.

Remark.

Intuition: This is the central structural result: a Hilbert space admits a countable basis (enabling Fourier-type expansions) if and only if it is separable. Since the signal spaces we care about (L2L_2, l2l_2) are separable, this theorem guarantees that every finite-energy signal can be expressed as a countable sum of orthonormal basis functions.

The proof above also showed the following result:

TheoremBasis Expansion in Hilbert Spaces

In a Hilbert space HH, a complete orthonormal sequence {en,nN}\{e_n, n \in \mathbb{N}\} defines a basis in the sense that for any xHx \in H, we have

x=limNk=1Nx,ekekx = \lim_{N \to \infty} \sum_{k=1}^{N} \langle x, e_k \rangle e_k

Remark.

Intuition: This is the master representation theorem: any signal in a separable Hilbert space can be perfectly reconstructed from its inner products with basis vectors. The coefficients x,ek\langle x, e_k \rangle are the "coordinates" of the signal, and the reconstruction converges in the energy (L2L_2) sense. Fourier series, wavelet decompositions, and all other orthonormal expansions are special cases of this result.


Separability of l2l_2 and L2L_2 spaces

In view of the preceding theorem, the following result builds on the fact that the sequence of orthonormal vectors

{en,nN:en:Z+R,en(m)=1{m=n},mZ+}\left\{e_n, n \in \mathbb{N} : \quad e_n : \mathbb{Z}_+ \to \mathbb{R}, \quad e_n(m) = 1_{\{m = n\}}, \quad m \in \mathbb{Z}_+\right\}

is a countable complete orthonormal set in l2(Z+;R)l_2(\mathbb{Z}_+; \mathbb{R}): Note that for any h={h(1),h(2),}l2(Z+;R)h = \{h(1), h(2), \cdots\} \in l_2(\mathbb{Z}_+; \mathbb{R}), h,en=h(n)\langle h, e_n \rangle = h(n) and hence for any vector vl2(Z+;R)v \in l_2(\mathbb{Z}_+; \mathbb{R})

v,en=0nZ+    v=0.\langle v, e_n \rangle = 0 \quad \forall n \in \mathbb{Z}_+ \implies \|v\| = 0.

TheoremSeparability of l2

The Hilbert space l2(Z+;R)l_2(\mathbb{Z}_+; \mathbb{R}) with inner product

h1,h2=nZ+h1(n)h2(n),\langle h_1, h_2 \rangle = \sum_{n \in \mathbb{Z}_+} h_1(n) h_2(n),

is separable.

Remark.

Intuition: Separability of l2l_2 means that the space of square-summable sequences -- the natural home for discrete-time finite-energy signals -- can be spanned by a countable orthonormal basis. The standard basis vectors ene_n (which are 1 at position nn and 0 elsewhere) serve as this basis, and every sequence is simply a list of its values at each time index.
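To make the countable dense set concrete: truncations to finitely many entries (with, say, rational values) approximate any l_2 sequence. A NumPy sketch with the example sequence h(n) = 2^{-n} (our choice):

```python
import numpy as np

# Finitely supported sequences are dense in l2(Z+); together with the
# standard basis e_n this underlies separability. Truncation is the
# approximating map.
h = 2.0 ** (-np.arange(1, 2001))          # h(n) = 2^{-n}, square-summable

def truncate(h, n):
    out = np.zeros_like(h)
    out[:n] = h[:n]
    return out

# ||h - truncate(h, n)||_2^2 = sum_{m > n} 4^{-m} -> 0 geometrically
err = np.linalg.norm(h - truncate(h, 10))
assert err < 2.0 ** (-10)
```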

Next, we will show that L2([a,b];R)L_2([a, b]; \mathbb{R}) is separable for a,bRa, b \in \mathbb{R}. To establish this result, we will review some useful facts.

TheoremBernstein-Weierstrass Theorem

Any function in C([0,1];R)C([0, 1]; \mathbb{R}) can be approximated arbitrarily well by a polynomial under the supremum norm.

Remark.

Intuition: The Weierstrass theorem says that polynomials are "universal approximators" for continuous functions on a closed interval. No matter how complicated a continuous signal is, a polynomial of sufficiently high degree can match it to any desired accuracy. This foundational result is what lets us build toward proving that L2L_2 spaces are separable.
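The Bernstein polynomial construction gives a concrete approximating sequence for this theorem. A NumPy sketch (the degree, the test function, and the error tolerance are our choices; convergence in the sup norm is slow, roughly O(n^{-1/2}) for Lipschitz functions):

```python
import numpy as np
from math import comb

def bernstein(f, n, t):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at t."""
    ks = np.arange(n + 1)
    return sum(f(k / n) * comb(n, int(k)) * t**k * (1 - t)**(n - k)
               for k in ks)

f = lambda t: abs(t - 0.5)            # continuous but not smooth
ts = np.linspace(0.0, 1.0, 201)
sup_err = np.max(np.abs(f(ts) - bernstein(f, 400, ts)))
assert sup_err < 0.05                 # sup-norm error shrinks as n grows
```

The probabilistic proof of this approximation property is outlined as an exercise at the end of the chapter.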

TheoremDensity of Continuous Functions in L2

The set C([0,1];R)C([0, 1]; \mathbb{R}), of continuous functions, is dense in L2([0,1];R)L_2([0, 1]; \mathbb{R}).

Remark.

Intuition: This theorem says that any square-integrable function -- even one with discontinuities -- can be approximated arbitrarily well (in the energy sense) by a continuous function. This is reassuring for engineers: continuous, "well-behaved" signals are dense enough to represent all finite-energy signals, so working with smooth approximations does not lose generality.

TheoremSeparability of L2([0,1]; R)

The space L2([0,1];R)L_2([0, 1]; \mathbb{R}) is separable.

Remark.

Intuition: Separability of L2([0,1])L_2([0,1]) is the key result that justifies Fourier series: it guarantees that every finite-energy signal on [0,1][0,1] can be represented by a countable orthonormal basis. The proof chains together polynomial approximation (Weierstrass) and density of continuous functions to construct such a basis -- and the complex exponentials of Fourier series are one natural choice.

TheoremSeparability of L2 on the Positive Reals

L2(R+;R)L_2(\mathbb{R}_+; \mathbb{R}) is separable.

Remark.

Intuition: Extending separability from a bounded interval to all of R+\mathbb{R}_+ means Fourier-type expansions work for signals defined on the entire positive real line, not just finite intervals. The key idea is that any finite-energy signal has most of its energy concentrated in some finite window, so the bounded-interval result can be "stitched together" to cover the whole line.

Two further results:

TheoremDensity of L2 in L1

The set L2([1,);R)L_2([1, \infty); \mathbb{R}) is dense in L1([1,);R)L_1([1, \infty); \mathbb{R}).

Remark.

Intuition: This says that any absolutely integrable signal on [1,)[1,\infty) can be approximated by a square-integrable signal. In practice, this means results proven for L2L_2 signals (where we have Hilbert space tools) can often be extended to L1L_1 signals through approximation arguments.

TheoremDensity of Continuous Functions with Compact Support

Let CcC_c denote the space of continuous functions with compact support. CcC_c is dense in L1(R;R)L_1(\mathbb{R}; \mathbb{R}).

Remark.

Intuition: This result says that any integrable signal can be approximated by a continuous function that is zero outside some finite interval. In engineering terms, every L1L_1 signal can be well-approximated by a "nice" signal that is both smooth and compactly supported -- a key step in proving results about Fourier transforms, since such signals are easy to integrate and transform.

Signal expansions in L2([a,b];R)L_2([a, b]; \mathbb{R}) or L2([a,b];C)L_2([a, b]; \mathbb{C}): Fourier, Haar and Polynomial Bases

Fourier Signals as Basis Vectors and Fourier Series

Fourier series rest on a very important class of orthonormal sequences, used to represent both discrete-time and continuous-time signals; these will be studied in much detail later on. In particular, we will soon see that in L_2([0, P]; \mathbb{C}) the family of complex exponentials

{ek:ek(t)=1Pei2πkPt,kZ},\left\{e_k : e_k(t) = \frac{1}{\sqrt{P}} e^{i2\pi \frac{k}{P} t}, k \in \mathbb{Z}\right\},

provides a complete orthonormal sequence.

Accordingly, for any xL2([0,P];C)x \in L_2([0, P]; \mathbb{C}), we can write

x=kZx,ekekx = \sum_{k \in \mathbb{Z}} \langle x, e_k \rangle e_k

or by expanding the inner-product, we have

x(t) = \sum_{k \in \mathbb{Z}} \left(\int_{0}^{P} x(s) \frac{1}{\sqrt{P}} e^{-i2\pi \frac{k}{P} s}\,ds\right) \frac{1}{\sqrt{P}} e^{i2\pi \frac{k}{P} t}

where the convergence of the infinite sum is in the L2L_2 sense.

This expansion is precisely the Fourier series expansion of the function x in L_2([0, P]; \mathbb{C}). The inner products \langle x, e_k \rangle are the Fourier series coefficients of x.
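As a sanity check, the expansion can be computed numerically. A NumPy sketch (the grid size, the signal x(t) = t, and the truncation order are our choices; as stated above, convergence is in the L_2 sense):

```python
import numpy as np

# Expand x(t) = t on [0, P] in the orthonormal complex exponentials
# e_k(t) = P^{-1/2} exp(i 2 pi k t / P) and reconstruct from the coefficients.
P = 1.0
t = np.linspace(0.0, P, 4096, endpoint=False)
dt = t[1] - t[0]
x = t.copy()                                         # the signal x(t) = t

def e_k(k):
    return np.exp(2j * np.pi * k * t / P) / np.sqrt(P)

K = 200                                              # truncation order
coeffs = {k: np.sum(x * np.conj(e_k(k))) * dt for k in range(-K, K + 1)}
x_hat = sum(c * e_k(k) for k, c in coeffs.items())

# The energy of the truncation error is small and shrinks as K grows
l2_err = np.sqrt(np.sum(np.abs(x - x_hat) ** 2) * dt)
assert l2_err < 0.05
```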

Legendre Polynomials as Basis Vectors

We have seen, in the context of Theorems 2.3.7 and 2.3.8 (see the proof of 2.3.9), that the functions \{t^k, k \in \mathbb{Z}_+\} can be used to construct an orthonormal collection of signals which is complete in L_2([a, b]; \mathbb{R}). These complete orthonormal polynomials are called Legendre polynomials.
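Applying Gram-Schmidt to the monomials over [-1, 1] recovers the normalized Legendre polynomials. A NumPy sketch using a midpoint-rule inner product (the grid size and helper names are ours):

```python
import numpy as np

# Midpoint grid on [-1, 1] for approximating L2 inner products
N = 20000
dt = 2.0 / N
t = -1.0 + (np.arange(N) + 0.5) * dt

def inner(f, g):
    return np.sum(f * g) * dt

# Gram-Schmidt on {1, t, t^2}
monomials = [np.ones_like(t), t, t ** 2]
basis = []
for m in monomials:
    v = m - sum(inner(m, e) * e for e in basis)
    basis.append(v / np.sqrt(inner(v, v)))

# Normalized Legendre polynomials: sqrt(1/2), sqrt(3/2) t, sqrt(5/8)(3t^2 - 1)
assert np.allclose(basis[0], np.sqrt(0.5) * np.ones_like(t), atol=1e-6)
assert np.allclose(basis[1], np.sqrt(1.5) * t, atol=1e-6)
assert np.allclose(basis[2], np.sqrt(5.0 / 8.0) * (3 * t**2 - 1), atol=1e-4)
```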

Haar Functions as Basis Vectors

One further, practically very important basis is the class of Haar functions (the earliest example of wavelets). Define

\Psi_{0,0}(x) = \begin{cases} 1, & \text{if } 0 \leq x \leq 1 \\ 0 & \text{else} \end{cases}

and for nZ+n \in \mathbb{Z}_+, k{0,1,2,,2n1}k \in \{0, 1, 2, \ldots, 2^n - 1\},

\Phi_{n,k}(x) = \begin{cases} 2^{n/2}, & \text{if } k2^{-n} \leq x < (k + 1/2)2^{-n} \\ -2^{n/2}, & \text{if } (k + 1/2)2^{-n} \leq x < (k+1)2^{-n} \\ 0 & \text{else} \end{cases}

TheoremCompleteness of the Haar System

The (Haar) set of vectors

{Ψ0,0,Φn,k,nZ+,k{0,1,2,,2n1}}\left\{\Psi_{0,0}, \Phi_{n,k}, n \in \mathbb{Z}_+, k \in \{0, 1, 2, \ldots, 2^n - 1\}\right\}

is a complete orthonormal sequence in L2([0,1];R)L_2([0, 1]; \mathbb{R}).

Remark.

Intuition: The Haar system provides an alternative to Fourier series for representing L2L_2 signals. While Fourier basis functions are smooth sinusoids spread over the entire interval, Haar functions are localized step functions that capture information at specific locations and scales. This makes wavelets particularly effective for signals with sharp transitions or edges, which is why they are widely used in image compression (e.g., JPEG 2000).

The important observation here is that different expansions may be suited to different engineering applications: for instance, Haar series are occasionally used in image processing for signals with certain edge behaviours, whereas the Fourier expansion is used extensively in speech processing and communication-theoretic applications.
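The Haar definitions above are easy to check numerically. A NumPy sketch verifying orthonormality of the first few Haar functions on a dyadic grid (the resolution and names are ours; because the grid is dyadic, the piecewise-constant inner products are computed exactly):

```python
import numpy as np

M = 2 ** 10                              # dyadic grid resolution
x = (np.arange(M) + 0.5) / M             # midpoint samples in [0, 1]
dx = 1.0 / M

def psi00():
    return np.ones(M)

def phi(n, k):
    # Haar function Phi_{n,k} from the definition above
    lo, mid, hi = k * 2.0**-n, (k + 0.5) * 2.0**-n, (k + 1) * 2.0**-n
    out = np.zeros(M)
    out[(x >= lo) & (x < mid)] = 2.0 ** (n / 2)
    out[(x >= mid) & (x < hi)] = -(2.0 ** (n / 2))
    return out

haar = [psi00()] + [phi(n, k) for n in range(4) for k in range(2 ** n)]
G = np.array([[np.sum(f * g) * dx for g in haar] for f in haar])
assert np.allclose(G, np.eye(len(haar)))   # orthonormal, as the theorem states
```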

Approximations

Approximations allow us to represent data using finitely many vectors. The basis expansions studied above can be used to obtain the best approximation of a signal by finitely many terms. This can be posed as a projection problem, and we have seen that the best approximation is the one in which the approximation error is orthogonal to all of the vectors used in the approximation (which span the approximation subspace).
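The orthogonality principle just described can be illustrated directly. A NumPy sketch with an arbitrary vector and two standard orthonormal directions (all choices are ours):

```python
import numpy as np

# Best approximation of x from span{e1, e2} keeps the coefficients <x, e_i>;
# the resulting error is orthogonal to the approximation subspace.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)

e1 = np.zeros(8); e1[0] = 1.0            # two orthonormal vectors in R^8
e2 = np.zeros(8); e2[3] = 1.0

x_star = np.dot(x, e1) * e1 + np.dot(x, e2) * e2   # projection of x
err = x - x_star

# Orthogonality principle: the error is orthogonal to every vector used
assert np.isclose(np.dot(err, e1), 0.0)
assert np.isclose(np.dot(err, e2), 0.0)

# Optimality: perturbing a coefficient does strictly worse
worse = (np.dot(x, e1) + 0.1) * e1 + np.dot(x, e2) * e2
assert np.linalg.norm(x - x_star) < np.linalg.norm(x - worse)
```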


Exercises

Exercise

(a) The set C(R)C^\infty(\mathbb{R}), which is the set of all functions from R\mathbb{R} to R\mathbb{R} that are infinitely differentiable, together with the operations of addition and scalar multiplication defined as follows, is a vector space: For any f1,f2C(R)f_1, f_2 \in C^\infty(\mathbb{R})

(f1+f2)(x)=f1(x)+f2(x),xR(f_1 + f_2)(x) = f_1(x) + f_2(x), \qquad x \in \mathbb{R}

and for any αR\alpha \in \mathbb{R} and fC(R)f \in C^\infty(\mathbb{R})

(αf)(x)=αf(x),xR(\alpha \cdot f)(x) = \alpha f(x), \qquad x \in \mathbb{R}

(i) Now, consider P(R)\mathcal{P}(\mathbb{R}) to be the set of all (polynomial) functions that maps R\mathbb{R} to R\mathbb{R} such that any fP(R)f \in \mathcal{P}(\mathbb{R}) can be written as f(x)=i=0naixif(x) = \sum_{i=0}^{n} a_i x^i for some nNn \in \mathbb{N} with a0,a1,,anRa_0, a_1, \cdots, a_n \in \mathbb{R}. Suppose that we define the same addition and scalar multiplication operations as defined above. Is P(R)\mathcal{P}(\mathbb{R}) a subspace in C(R)C^\infty(\mathbb{R})?

(ii) Show that the space of all functions in C(R)C^\infty(\mathbb{R}) which map R\mathbb{R} to R\mathbb{R} which satisfy f(10)=0f(10) = 0 is a vector space with addition and multiplication defined as above.

(b) Consider the set Rn\mathbb{R}^n. On Rn\mathbb{R}^n, define an addition operation and a scalar multiplication operation as follows:

(x1,x2,,xn)+(y1,y2,,yn)=(x1+y1,x2+y2,,xn+yn)(x_1, x_2, \cdots, x_n) + (y_1, y_2, \cdots, y_n) = (x_1 + y_1, x_2 + y_2, \cdots, x_n + y_n)

α(x1,x2,,xn)=(αx1,αx2,,αxn)\alpha \cdot (x_1, x_2, \cdots, x_n) = (\alpha x_1, \alpha x_2, \cdots, \alpha x_n)

Show that, with these operations, Rn\mathbb{R}^n is a vector space.

(c) Consider the set

\mathbb{W} = \{(x, y) : x \in \mathbb{R}, y \in \mathbb{R}, x > 0, y > 0\}

On this set, define an addition operation and a scalar multiplication operation as follows:

(x1,y1)+(x2,y2)=(x1x2,y1y2)(x_1, y_1) + (x_2, y_2) = (x_1 x_2, y_1 y_2)

α(x,y)=(xα,yα)\alpha \cdot (x, y) = (x^\alpha, y^\alpha)

Show that, with these operations, W\mathbb{W} is a vector space. Hint: Consider a bijection between W\mathbb{W} and the space R2\mathbb{R}^2 with W(x,y)(log(x),log(y))R2\mathbb{W} \ni (x, y) \mapsto (\log(x), \log(y)) \in \mathbb{R}^2.

Exercise (Hölder's and Minkowski's Inequalities)

Let 1p,q1 \leq p, q \leq \infty with 1/p+1/q=11/p + 1/q = 1. Let xlp(Z+)x \in l_p(\mathbb{Z}_+) and ylq(Z+)y \in l_q(\mathbb{Z}_+). Then,

i=0xiyixpyq\sum_{i=0}^{\infty} |x_i y_i| \leq \|x\|_p \|y\|_q

This is known as Hölder's inequality. Equality holds if and only if

\left(\frac{|x_i|}{\|x\|_p}\right)^{p} = \left(\frac{|y_i|}{\|y\|_q}\right)^{q},

for each i \in \mathbb{Z}_+.

To prove this, perform the following:

(a) Show that for a0,b0,c(0,1)a \geq 0, b \geq 0, c \in (0, 1): acb1cca+(1c)ba^c b^{1-c} \leq ca + (1 - c)b with equality if and only if a=ba = b. To show this, you may consider the function f(t)=tcct+c1f(t) = t^c - ct + c - 1 and see how it behaves for t0t \geq 0 and let t=a/bt = a/b.

(b) Apply the inequality acb1cca+(1c)ba^c b^{1-c} \leq ca + (1 - c)b to the numbers:

a=(xixp)p,b=(yiyq)q,c=1/pa = \left(\frac{|x_i|}{\|x\|_p}\right)^p, \quad b = \left(\frac{|y_i|}{\|y\|_q}\right)^q, \quad c = 1/p

Hölder's inequality is useful for proving Minkowski's inequality, which states that for 1 < p < \infty,

x+ypxp+yp\|x + y\|_p \leq \|x\|_p + \|y\|_p

This proceeds as follows:

\sum_{i=1}^{n} |x_i + y_i|^p = \sum_{i=1}^{n} |x_i + y_i|^{p-1}|x_i + y_i| \leq \sum_{i=1}^{n} |x_i + y_i|^{p-1}|x_i| + \sum_{i=1}^{n} |x_i + y_i|^{p-1}|y_i|

\leq \left(\sum_{i=1}^{n} |x_i + y_i|^{(p-1)q}\right)^{1/q}\left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |x_i + y_i|^{(p-1)q}\right)^{1/q}\left(\sum_{i=1}^{n} |y_i|^p\right)^{1/p}

= \left(\sum_{i=1}^{n} |x_i + y_i|^p\right)^{1/q}\left(\left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |y_i|^p\right)^{1/p}\right)

Thus, dividing both sides by \left(\sum_{i=1}^{n} |x_i + y_i|^p\right)^{1/q} and using that 1 - 1/q = 1/p,

(i=1nxi+yip)1/p(i=1nxip)1/p+(i=1nyip)1/p,\left(\sum_{i=1}^{n} |x_i + y_i|^p\right)^{1/p} \leq \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |y_i|^p\right)^{1/p},

Now, the above holds for every nn. Taking the limit nn \to \infty (first on the right and then on the left), it follows that

x+ypxp+yp\|x + y\|_p \leq \|x\|_p + \|y\|_p

which is the desired inequality.
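A numerical sanity check of both inequalities (the random data and the choice p = 3 are ours; this verifies, but of course does not prove, the claims):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = rng.standard_normal(100)
p, q = 3.0, 1.5                          # conjugate exponents: 1/p + 1/q = 1

norm = lambda v, r: np.sum(np.abs(v) ** r) ** (1.0 / r)

# Hoelder: sum |x_i y_i| <= ||x||_p ||y||_q
assert np.sum(np.abs(x * y)) <= norm(x, p) * norm(y, q) + 1e-12

# Minkowski: ||x + y||_p <= ||x||_p + ||y||_p
assert norm(x + y, p) <= norm(x, p) + norm(y, p) + 1e-12
```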

Exercise

Given a normed linear space (X,)(X, \|\cdot\|), introduce a map n:X×XRn : X \times X \to \mathbb{R}:

n(x,y)=xy1+xyn(x, y) = \frac{\|x - y\|}{1 + \|x - y\|}

Show that n(x,y)n(x, y) is a metric: That is, it satisfies the triangle inequality:

n(x,y)n(x,z)+n(z,y),x,y,zX,n(x, y) \leq n(x, z) + n(z, y), \qquad \forall x, y, z \in X,

and that n(x,y)=0n(x, y) = 0 iff x=yx = y, and finally n(x,y)=n(y,x)n(x, y) = n(y, x).
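A numerical spot-check of the claimed metric properties (random points in \mathbb{R}^5; a check, not the required proof):

```python
import numpy as np

# n(x, y) = ||x - y|| / (1 + ||x - y||), built from the Euclidean norm
n = lambda a, b: np.linalg.norm(a - b) / (1.0 + np.linalg.norm(a - b))

rng = np.random.default_rng(2)
for _ in range(1000):
    xv, yv, zv = rng.standard_normal((3, 5))
    # triangle inequality and symmetry hold on every sampled triple
    assert n(xv, yv) <= n(xv, zv) + n(zv, yv) + 1e-12
    assert np.isclose(n(xv, yv), n(yv, xv))
```

Note that n is bounded by 1, so it metrizes the same topology as the norm while remaining bounded.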

Exercise

Let {en,nN}\{e_n, n \in \mathbb{N}\} be a complete orthonormal sequence in a real Hilbert space HH. Let M\mathcal{M} be a subspace of HH, spanned by {ek,kS}\{e_k, k \in S\}, for some finite set SNS \subset \mathbb{N}. That is,

M={vH:αkR,kS,v=kSαkek}\mathcal{M} = \{v \in H : \exists \alpha_k \in \mathbb{R}, k \in S,\quad v = \sum_{k \in S} \alpha_k e_k\}

Let xHx \in H be given. Find xMx^* \in \mathcal{M} which is the solution to the following:

minx0Mxx0,\min_{x_0 \in \mathcal{M}} \|x - x_0\|,

in terms of xx, and {en,nN}\{e_n, n \in \mathbb{N}\}.

Hint: Any vector in HH can be written as x=nNx,enenx = \sum_{n \in \mathbb{N}} \langle x, e_n \rangle e_n.

Exercise

Let T:L2(R+;R)RT : L_2(\mathbb{R}_+; \mathbb{R}) \to \mathbb{R} be a mapping given by:

T(f)=1f(t)1+t2t4dtT(f) = \int_1^{\infty} f(t) \frac{1 + t^2}{t^4}\,dt

Is TT continuous at any given f0L2(R+;R)f_0 \in L_2(\mathbb{R}_+; \mathbb{R})?

Exercise

Consider an inner-product defined by:

\langle x, y \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} x(t) y(t)\,dt

Is the resulting inner-product (pre-Hilbert) space separable?

Exercise

Let x,yf(Z;R)x, y \in f(\mathbb{Z}; \mathbb{R}); that is, x,yx, y map Z\mathbb{Z} to R\mathbb{R}, such that x={,x2,x1,x0,x1,x2,}x = \{\ldots, x_{-2}, x_{-1}, x_0, x_1, x_2, \ldots\} and y={,y2,y1,y0,y1,y2,}y = \{\ldots, y_{-2}, y_{-1}, y_0, y_1, y_2, \ldots\} and xk,ykRx_k, y_k \in \mathbb{R}, for all kZk \in \mathbb{Z}.

For (a)-(c) below, state if the following are true or false with justifications in a few sentences:

(a) x,y=iZi2xiyi\langle x, y \rangle = \sum_{i \in \mathbb{Z}} i^2 x_i y_i is an inner-product.

(b) x,y=iZxiyi\langle x, y \rangle = \sum_{i \in \mathbb{Z}} x_i y_i is an inner-product.

(c) {x:x22<}\{x : \|x\|_2^2 < \infty\} contains a complete orthonormal sequence, where x2=iZx(i)2\|x\|_2 = \sqrt{\sum_{i \in \mathbb{Z}} |x(i)|^2}.

Exercise

Let X\mathbb{X} be a Hilbert space and x,yXx, y \in \mathbb{X}. Prove the following:

(a) (Parallelogram Law)

x+y2+xy2=2x2+2y2.\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.

(b)

x,y(x)(y).|\langle x, y \rangle| \leq (\|x\|)(\|y\|).

(c)

2x,yx2+y2.2|\langle x, y \rangle| \leq \|x\|^2 + \|y\|^2.

Exercise

Let HH be a finite dimensional Hilbert space and {v1,v2}\{v_1, v_2\} be two linearly independent vectors in HH.

Let b1,b2Rb_1, b_2 \in \mathbb{R}. Show that, among all vectors xHx \in H, which satisfies

x,v1=b1,\langle x, v_1 \rangle = b_1,

x,v2=b2,\langle x, v_2 \rangle = b_2,

the vector xHx^* \in H has the minimum norm if xx^* satisfies:

x=α1v1+α2v2,x^* = \alpha_1 v_1 + \alpha_2 v_2,

with

v1,v1α1+v2,v1α2=b1,\langle v_1, v_1 \rangle \alpha_1 + \langle v_2, v_1 \rangle \alpha_2 = b_1,

v1,v2α1+v2,v2α2=b2.\langle v_1, v_2 \rangle \alpha_1 + \langle v_2, v_2 \rangle \alpha_2 = b_2.

Exercise

Let H be a Hilbert space and C \subset H be a dense subset of H. Suppose that every element h_C in C is such that for every \epsilon > 0, there exist n \in \mathbb{N} and \beta_0, \beta_1, \ldots, \beta_n \in \mathbb{R} so that

i=0nβieihCϵ\left\|\sum_{i=0}^{n} \beta_i e_i - h_C\right\| \leq \epsilon

where \{e_i, i \in \mathbb{N}\} is a countable sequence of orthonormal vectors in H.

Is it the case that HH is separable?

Exercise

Let xx be in the real Hilbert space L2([0,1];R)L_2([0, 1]; \mathbb{R}) with the inner product

x,y=01x(t)y(t)dt.\langle x, y \rangle = \int_0^1 x(t) y(t)\,dt.

We would like to express xx in terms of the following two signals (which belong to the Haar signal space)

u1(t)=1{t[0,1/2)}1{t[1/2,1]},t[0,1]u_1(t) = 1_{\{t \in [0, 1/2)\}} - 1_{\{t \in [1/2, 1]\}}, \qquad t \in [0, 1]

u2(t)=1{t[0,1]},t[0,1]u_2(t) = 1_{\{t \in [0, 1]\}}, \qquad t \in [0, 1]

such that

01x(t)i=12αiui(t)2dt\int_0^1 \left|x(t) - \sum_{i=1}^{2} \alpha_i u_i(t)\right|^2 dt

is minimized, for {α1,α2R}\{\alpha_1, \alpha_2 \in \mathbb{R}\}.

(a) Using the Gram-Schmidt procedure, obtain two orthonormal vectors {e1(t),e2(t)}\{e_1(t), e_2(t)\} such that these vectors linearly span the same space spanned by {u1(t),u2(t)}\{u_1(t), u_2(t)\}.

(b) State the problem as a projection theorem problem by clearly identifying the Hilbert space and the projected subspace.

(c) Let x(t)=1{t[1/2,1]}x(t) = 1_{\{t \in [1/2, 1]\}}. Find the minimizing α1,α2\alpha_1, \alpha_2 values.

Exercise

Let C([0, 1]; \mathbb{R}) denote the normed linear space of continuous functions from [0, 1] to \mathbb{R} under the supremum norm. We observed earlier that polynomials can be used to approximate any function in this space with arbitrary precision, under the supremum norm (Weierstrass Theorem).

Given this, repeating the arguments we made in class, argue that the family of polynomials {1,t,t2,}\{1, t, t^2, \cdots\} can be used to form a complete orthonormal sequence in L2([0,1];R)L_2([0, 1]; \mathbb{R}). This also establishes that L2([0,1];R)L_2([0, 1]; \mathbb{R}) is separable.

Exercise

Alice and Bob are approached by a generous company and asked to solve the following problem: The company wishes to store any signal ff in L2(R+;R)L_2(\mathbb{R}_+; \mathbb{R}) in a computer with a given error of ϵ>0\epsilon > 0, that is for every fL2(R+)f \in L_2(\mathbb{R}_+), there exists some signal hHh \in H such that fh2ϵ\|f - h\|_2 \leq \epsilon (thus the error is uniform over all possible signals), where HH is the stored family of signals (in the computer's memory).

To achieve this, they encourage Alice or Bob to use a finite or a countable expansion to represent the signal and later store this signal in an arbitrarily large memory. Hence, they allow Alice or Bob to purchase as much memory as they would like for a given error value of ϵ\epsilon.

Alice turns down the offer and says it is impossible to do that for any ϵ\epsilon with a finite memory and argues then she needs infinite memory, which is impossible.

Bob accepts the offer and says he may need a very large, but finite, memory for any given ϵ>0\epsilon > 0; thus, the task is possible.

Which one is the accurate assessment?

(a) If you think Alice is right, which further conditions can she impose to make this possible? Why is she right?

(b) If you think Bob is right, can you suggest a method? Why is he right?

Exercise

Prove the Bernstein-Weierstrass Theorem using a probability-theoretic method. Proceed as follows: The Bernstein polynomial

B_{n,f}(t) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) \binom{n}{k} t^k (1-t)^{n-k}

can be expressed as the expectation E_t\left[f\left(\frac{S_n}{n}\right)\right], where S_n = X_1 + X_2 + \cdots + X_n and \{X_i\} is an i.i.d. collection of Bernoulli random variables with X_i = 1 with probability t and X_i = 0 with probability 1 - t. Here, observe that the sum S_n has a binomial distribution. Thus,

supt[0,1]f(t)Bn,f(t)=supt[0,1]Et[f(Snn)]f(t),\sup_{t \in [0,1]} |f(t) - B_{n,f}(t)| = \sup_{t \in [0,1]} |E_t[f(\frac{S_n}{n})] - f(t)|,

where EtE_t, for each tt, denotes the expectation with respect to the i.i.d. Bernoulli random variables XiX_i each with P(Xi=1)=tP(X_i = 1) = t. Let PtP_t denote the probability measure induced by these tt-parametrized i.i.d. sequence of Bernoulli random variables.

Since ff is continuous and [0,1][0,1] is compact, ff is uniformly continuous. Thus, for every ϵ>0\epsilon > 0, there exists δ>0\delta > 0 such that xy<δ|x - y| < \delta implies that f(x)f(y)ϵ|f(x) - f(y)| \leq \epsilon. Thus,

Et[f(Snn)]f(t)|E_t[f(\frac{S_n}{n})] - f(t)|

= \int_{\omega: |\frac{S_n}{n} - t| \leq \delta} |f(\tfrac{S_n}{n}) - f(t)|\, P_t(d\omega) + \int_{\omega: |\frac{S_n}{n} - t| > \delta} |f(\tfrac{S_n}{n}) - f(t)|\, P_t(d\omega)

\leq \epsilon + 2 \sup_{y \in [0,1]} |f(y)|\, P_t(|\tfrac{S_n}{n} - t| > \delta)

The second term converges to 0 as n \to \infty by the law of large numbers, so the bound tends to \epsilon. The above holds for every \epsilon > 0.

Now, one needs to show that this convergence is uniform in tt: For this show that for all t[0,1]t \in [0, 1], via Markov's inequality and the independence of XiX_i

Pt(Snnt>δ)=Pt(Snnt2>δ2)14nδ2,P_t(|\frac{S_n}{n} - t| > \delta) = P_t(|\frac{S_n}{n} - t|^2 > \delta^2) \leq \frac{1}{4n\delta^2},

establishing uniform convergence (over t[0,1]t \in [0, 1]), and thus complete the proof.

Exercise (A useful result on countability properties)

Let F:RRF : \mathbb{R} \to \mathbb{R} be a monotonically increasing function (that is, x1x2x_1 \leq x_2 implies that F(x1)F(x2)F(x_1) \leq F(x_2)). Show that FF can have at most countably many points of discontinuity.

Hint: If F is discontinuous at a point x, then there exists n \in \mathbb{N} with F(x^+) - F(x^-) > \frac{1}{n}, where F(x^+) := \lim_{x_k \downarrow x} F(x_k) and F(x^-) := \lim_{x_k \uparrow x} F(x_k) (these one-sided limits exist by monotonicity). Express \mathbb{R} = \cup_{m \in \mathbb{Z}} (m, m+1]. Let B_n^m := \{x \in (m, m+1] : F(x^+) - F(x^-) > \frac{1}{n}\}. The set B_n^m must be finite, for otherwise the sum of the jumps in (m, m+1] would be unbounded, contradicting monotonicity of F. Then the countable union \cup_n B_n^m is countable, and finally \cup_m \cup_n B_n^m is also countable.

Exercise

Prove Theorem (that the Haar system is a complete orthonormal sequence in L2([0,1];R)L_2([0,1]; \mathbb{R})).