
Signal Spaces: Linear, Banach and Hilbert Spaces, and Basis Expansions

Normed linear spaces, metric spaces, Banach spaces, inner product spaces, Hilbert spaces. Orthogonality, separability, and signal expansions using Fourier, Haar, and polynomial bases.

In this chapter, we present a general review of signal spaces. These spaces form the mathematical foundation for everything that follows in the course: systems, transforms, stability, and control all rely on understanding the structure of the spaces in which signals live.


Normed Linear (Vector) Spaces and Metric Spaces

Linear Spaces

Definition: Linear (Vector) Space

A linear (vector) space $\mathbb{X}$ is a set which is closed under an addition operation $+$ and a scalar multiplication operation $\cdot$, such that

$$+ : \mathbb{X} \times \mathbb{X} \to \mathbb{X}$$

$$\cdot : \mathbb{C} \times \mathbb{X} \to \mathbb{X}$$

with the following properties (we note that we may take the scalars to be either real or complex numbers). The following are satisfied for all $x, y, z \in \mathbb{X}$ and all scalars $\alpha, \beta$:

(i) $x + y = y + x \in \mathbb{X}$

(ii) $(x + y) + z = x + (y + z)$

(iii) $\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y \in \mathbb{X}$

(iv) $(\alpha + \beta) \cdot x = \alpha \cdot x + \beta \cdot x$

(v) There is a null vector $\underline{0}$ such that $x + \underline{0} = x$

(vi) $\alpha \cdot (\beta \cdot x) = (\alpha\beta) \cdot x$

(vii) $1 \cdot x = x$

(viii) For every $x \in \mathbb{X}$, there exists an element, called the (additive) inverse of $x$ and denoted by $-x$, with the property $x + (-x) = \underline{0}$.

Remark.

Intuition: A linear space is the most basic algebraic structure for working with signals: it guarantees you can add signals together, scale them, and these operations behave predictably. Think of it as the minimal "sandbox" in which superposition -- the cornerstone of linear systems theory -- makes sense.


Example: Examples of Linear Spaces

(i) The space $\mathbb{R}^n$ with pointwise addition and scalar multiplication is a linear space. The null vector is $\underline{0} = \begin{bmatrix} 0 & 0 & \cdots & 0 \end{bmatrix} \in \mathbb{R}^n$.

(ii) Consider the interval $[a, b]$. The collection of real-valued continuous functions on $[a, b]$, with pointwise addition and scalar multiplication, is a linear space. The null element $\underline{0}$ is the function which is identically $0$. This space is called the space of real-valued continuous functions on $[a, b]$.

(iii) With pointwise addition and scalar multiplication, the set of all infinite sequences of real numbers mapping $\mathbb{Z}_+$ to $\mathbb{R}$, each having only finitely many nonzero elements, is a vector space; for example, if one adds two such sequences, the sum also belongs to this space. This space is called the space of sequences with finitely many nonzero terms.

(iv) The collection of all polynomial functions defined on an interval $[a, b]$ with complex coefficients,

$$\left\{f : [a, b] \to \mathbb{C},\, f(x) = \sum_{i=0}^{n} \alpha_i x^i;\;\; n \in \mathbb{N};\, x \in [a, b],\, \alpha_0, \alpha_1, \cdots, \alpha_n \in \mathbb{C}\right\},$$

forms a complex linear space. Note that the sum of two polynomials is another polynomial.

(v) The $n$-dimensional integer lattice $\mathbb{Z}^n = \{v : v = (v_1, \cdots, v_n) \in \mathbb{R}^n,\, v_k \in \mathbb{Z},\, k = 1, \cdots, n\}$ with pointwise addition and scalar multiplication is not a linear space: it is not closed under scalar multiplication, since, e.g., $\frac{1}{2} \cdot v \notin \mathbb{Z}^n$ for any $v$ with an odd entry.

Definition: Subspace

A non-empty subset $M$ of a (real) linear vector space $\mathbb{X}$ is called a subspace of $\mathbb{X}$ if

$$\alpha x + \beta y \in M, \qquad \forall x, y \in M \quad \text{and} \quad \forall \alpha, \beta \in \mathbb{R}.$$

Remark.

Intuition: A subspace is a "self-contained" portion of a vector space -- any linear combination of elements in the subspace stays in the subspace. In signal processing, subspaces often represent restricted signal classes (e.g., band-limited signals or signals spanned by a finite set of basis functions) within a larger signal space.

In particular, the null element $\underline{0}$ is an element of every subspace. For $M$, $N$ two subspaces of a vector space $\mathbb{X}$, $M \cap N$ is also a subspace of $\mathbb{X}$.

Normed Linear Spaces

Definition: Normed Linear Space

A normed linear space $\mathbb{X}$ is a linear vector space on which a map from $\mathbb{X}$ to $\mathbb{R}_+$, called its norm, is defined such that:

  • $\|x\| \geq 0 \quad \forall x \in \mathbb{X}$, and $\|x\| = 0$ if and only if $x$ is the null element of $\mathbb{X}$.
  • $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality)
  • $\|\alpha x\| = |\alpha|\,\|x\|, \quad \forall \alpha \in \mathbb{R}, \ \forall x \in \mathbb{X}$.
Remark.

Intuition: A norm gives us a way to measure "size" or "distance" in a vector space -- just like absolute value does for real numbers, but generalized to arbitrary signal spaces. Without a norm, we can add and scale vectors but have no notion of how large a signal is or how far apart two signals are. The choice of norm (e.g., peak value vs. total energy) determines what "closeness" means for your application.


For two linear spaces $\mathbb{X}$, $\mathbb{Y}$, let $\Gamma(\mathbb{X}; \mathbb{Y})$ denote the set of all maps $\gamma : \mathbb{X} \to \mathbb{Y}$ such that for all $x \in \mathbb{X}$, $\gamma(x) \in \mathbb{Y}$.

Example: Examples of Normed Linear Spaces

a) The space $C([a, b]; \mathbb{R})$ of continuous functions from $[a, b]$ to $\mathbb{R}$ with the norm $\|x\| = \max_{a \leq t \leq b} |x(t)|$ is a normed linear space.

b) For $1 \leq p < \infty$:

$$l_p(\mathbb{Z}_+; \mathbb{R}) := \left\{x \in \Gamma(\mathbb{Z}_+; \mathbb{R}) : \|x\|_p = \left(\sum_{i \in \mathbb{Z}_+} |x(i)|^p\right)^{1/p} < \infty\right\}$$

is a normed linear space.

c) Recall that if $S$ is a set of real numbers bounded from above, then there is a smallest real number $y$ such that $x \leq y$ for all $x \in S$. The number $y$ is called the least upper bound, or supremum, of $S$. If $S$ is not bounded from above, then the supremum is $\infty$. In view of this, for $p = \infty$, we define

$$l_\infty(\mathbb{Z}_+; \mathbb{R}) := \left\{x \in \Gamma(\mathbb{Z}_+; \mathbb{R}) : \|x\|_\infty = \sup_{i \in \mathbb{Z}_+} |x(i)| < \infty\right\}.$$

d) $L_p([a, b]; \mathbb{R}) = \left\{x \in \Gamma([a, b]; \mathbb{R}) : \|x\|_p = \left(\int_a^b |x(t)|^p\,dt\right)^{1/p} < \infty\right\}$ is a normed linear space. For $p = \infty$, we typically write $L_\infty([a, b]; \mathbb{R}) := \{x \in \Gamma([a, b]; \mathbb{R}) : \|x\|_\infty = \sup_{t \in [a,b]} |x(t)| < \infty\}$. However, for $1 \leq p < \infty$, for the condition "$\|x\|_p = 0$ implies $x = \underline{0}$" to hold, we need to regard functions which are equal almost everywhere as equivalent; for $p = \infty$ the definition is often revised with the essential supremum in place of the supremum, so that

$$\|x\|_\infty = \inf_{y:\, y(t) = x(t)\,\text{a.e.}} \ \sup_{t \in [a, b]} |y(t)|.$$

This subtle difference needs to be made explicit in some applications.
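As a small numerical illustration of how different norms measure "size" differently, the following sketch (using NumPy; the sample sequence is an arbitrary illustrative choice) computes the $1$-, $2$-, and $\infty$-norms of a finite sequence:

```python
import numpy as np

def lp_norm(x, p):
    """p-norm of a finite sequence; p = np.inf gives the sup norm."""
    x = np.asarray(x, dtype=float)
    if p == np.inf:
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = [3.0, -4.0, 0.0]
print(lp_norm(x, 1))       # 7.0  (sum of absolute values)
print(lp_norm(x, 2))       # 5.0  (Euclidean "energy" norm)
print(lp_norm(x, np.inf))  # 4.0  (peak value)
```

The same sequence has three different "sizes" depending on the chosen norm, which is exactly the point made above.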

To show that $l_p$ defined above is a normed linear space, we need to show that $\|x + y\|_p \leq \|x\|_p + \|y\|_p$.

Theorem: Minkowski's Inequality

For $x, y \in l_p(\mathbb{Z}_+; \mathbb{R})$ with $1 \leq p \leq \infty$,

$$\|x + y\|_p \leq \|x\|_p + \|y\|_p.$$

Remark.

Intuition: Minkowski's inequality is the triangle inequality for $l_p$ spaces. It says the "length" of the sum of two sequences is at most the sum of their "lengths." This is the key property that confirms $l_p$ norms are genuine norms, and it generalizes the familiar fact that the shortest path between two points is a straight line.

See the Exercises, which also study a proof of Hölder's Inequality, which is used critically in the proof of Minkowski's inequality.
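A quick numerical sanity check (not a proof) of Minkowski's inequality on random finite sequences, for several values of $p$; the dimensions and trial counts are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_norm(x, p):
    # finite-dimensional p-norm, 1 <= p < infinity
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

for p in (1.0, 1.5, 2.0, 3.0):
    for _ in range(1000):
        x = rng.standard_normal(10)
        y = rng.standard_normal(10)
        # triangle inequality for the p-norm (small slack for rounding)
        assert p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
print("Minkowski's inequality held on all random trials")
```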

Theorem: Hölder's Inequality

Let $1 \leq p, q \leq \infty$ with $1/p + 1/q = 1$. Then, for $x \in l_p(\mathbb{Z}_+; \mathbb{R})$ and $y \in l_q(\mathbb{Z}_+; \mathbb{R})$,

$$\sum_k |x(k)y(k)| \leq \|x\|_p \|y\|_q.$$

Remark.

Intuition: Hölder's inequality bounds the "interaction" between two sequences living in dual $l_p$ spaces. When $p = q = 2$, this reduces to Cauchy-Schwarz. In practice, it is the workhorse behind proving that convolutions, inner products, and many signal processing operations produce finite results.
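The same kind of numerical sanity check can be run for Hölder's inequality, for a few conjugate pairs $(p, q)$ with $1/p + 1/q = 1$ (again, dimensions and trial counts are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# conjugate pairs (p, q) with 1/p + 1/q = 1, including (1, infinity)
for p, q in ((2.0, 2.0), (3.0, 1.5), (1.0, np.inf)):
    for _ in range(1000):
        x = rng.standard_normal(8)
        y = rng.standard_normal(8)
        lhs = np.sum(np.abs(x * y))
        xp = np.max(np.abs(x)) if p == np.inf else np.sum(np.abs(x) ** p) ** (1 / p)
        yq = np.max(np.abs(y)) if q == np.inf else np.sum(np.abs(y) ** q) ** (1 / q)
        # Hölder: sum |x(k) y(k)| <= ||x||_p ||y||_q (small slack for rounding)
        assert lhs <= xp * yq + 1e-12
print("Hölder's inequality held on all random trials")
```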

Transformations and Continuity

Definition: Transformation

Let $X$ and $Y$ be two normed linear spaces, and let $B \subset X$ be a subset of $X$. A rule (law, relation) $T$ which assigns to every element of $B$ an element in $Y$ is called a transformation from $X$ to $Y$ with domain $B$. The relation is often expressed as $x \mapsto y = T(x)$.

If for every $y \in Y$ there is an $x$ such that $y = T(x)$, the transformation is said to be onto (or surjective). If for every element of $Y$ there is at most one $x$ such that $y = T(x)$, the transformation is said to be one-to-one (or injective). If these two properties hold simultaneously, the transformation is said to be bijective.

Remark.

Intuition: A transformation is simply a rule that maps inputs to outputs -- this is the abstract notion of a "system." Surjectivity means every possible output is reachable, injectivity means distinct inputs always produce distinct outputs, and bijectivity means the mapping is perfectly reversible. These properties are essential when asking whether a system can be inverted or whether information is lost.

Definition: Linear Transformation

A transformation $T : X \to Y$ (or $T \in \Gamma(X; Y)$) is linear if for every $x_1, x_2 \in X$ and $\alpha_1, \alpha_2 \in \mathbb{R}$, we have $T(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 T(x_1) + \alpha_2 T(x_2)$.

Remark.

Intuition: Linearity means the system obeys superposition: the response to a sum of inputs equals the sum of the individual responses, and scaling the input scales the output by the same factor. This is the defining property of the systems we study in this course, and it is what makes Fourier analysis so powerful -- we can decompose any input into simple components, analyze each one separately, and add the results.

Definition: Continuity

A transformation $T : X \to Y$ for normed linear spaces $X, Y$ is continuous at $x_0 \in X$ if for every $\epsilon > 0$, there exists $\delta > 0$ such that $\|x - x_0\| \leq \delta$ implies $\|T(x) - T(x_0)\| \leq \epsilon$ (here the norms are those of the respective spaces $X$ and $Y$). $T$ is said to be continuous if it is continuous at every $x_0 \in X$.

Remark.

Intuition: Continuity means that small perturbations in the input produce small perturbations in the output -- the system does not "blow up" from tiny changes. This is the mathematical formalization of robustness: a continuous system behaves predictably under small measurement errors or noise.

Definition: Sequential Continuity

A transformation $T : X \to Y$ is sequentially continuous at $x_0 \in X$ if $x_n \to x_0$ implies that $T(x_n) \to T(x_0)$.

Remark.

Intuition: Sequential continuity says the same thing as continuity but in the language of sequences: if a sequence of inputs converges to some input, then the corresponding outputs converge to the corresponding output. This formulation is often easier to verify in practice, since you can work with concrete sequences rather than abstract $\epsilon$-$\delta$ arguments.

Theorem: Equivalence of Sequential Continuity and Continuity

Sequential continuity and continuity are equivalent for normed linear spaces.

Remark.

Intuition: This theorem tells us that in normed linear spaces, the $\epsilon$-$\delta$ definition of continuity and the sequence-based definition are interchangeable. You can use whichever is more convenient for the problem at hand -- they will always give the same answer.

Theorem: Continuity of Linear Transformations

If the transformation $T$ is linear, then continuity is equivalent to continuity at the null element.

Remark.

Intuition: For a linear transformation, you only need to check continuity at the origin -- continuity there automatically guarantees continuity everywhere. This is because linearity lets you "translate" any neighborhood to the origin. In practice, this greatly simplifies verifying that a linear system is well-behaved.

For some applications, sequential continuity may be more convenient to work with, as one need not quantify $(\epsilon, \delta)$ pairs to verify continuity.

Metric Spaces

Definition: Metric Space

A metric defined on a set $X$ is a function $d : X \times X \to \mathbb{R}$ such that:

  • $d(x, y) \geq 0 \quad \forall x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$.
  • $d(x, y) = d(y, x) \quad \forall x, y \in X$.
  • $d(x, y) \leq d(x, z) + d(z, y) \quad \forall x, y, z \in X$.

A metric space $(X, d)$ is the set $X$ equipped with the metric $d$.

Remark.

Intuition: A metric generalizes the concept of "distance" to sets that may not be vector spaces. While a norm requires linear structure (addition and scaling), a metric only requires a sensible notion of distance between pairs of points. This makes metric spaces useful for analyzing convergence and continuity in settings where addition of elements may not be defined.

A normed linear space is also a metric space, with metric $d(x, y) = \|x - y\|$. Many spaces of functions that we encounter are not linear spaces. For such spaces the concept of a metric, defined above, provides an appropriate generalization of the notion of norm studied earlier.

Banach Spaces

Definition: Cauchy Sequence

A sequence $\{x_n\}$ in a normed space $X$ is Cauchy if for every $\epsilon > 0$, there exists an $N$ such that $\|x_n - x_m\| \leq \epsilon$ for all $n, m \geq N$.

Remark.

Intuition: A Cauchy sequence is one whose elements get arbitrarily close to each other as you go further along -- the sequence "wants" to converge, even if we do not yet know the limit. Think of it as a sequence that is "trying to settle down." The key question is whether the space contains the point it is settling toward.

An important observation on Cauchy sequences is that every convergent sequence is Cauchy; however, not all Cauchy sequences are convergent: the limit might not live in the original space from which the sequence elements take values. This motivates the property of completeness.

Definition: Banach Space

A normed linear space $X$ is complete if every Cauchy sequence in $X$ has a limit in $X$. A complete normed linear space is called a Banach space.

Remark.

Intuition: Completeness means there are "no holes" in the space -- every sequence that looks like it should converge actually does converge to something inside the space. A Banach space is the right setting for iterative algorithms and approximations, because you are guaranteed that limits of approximating sequences exist. Without completeness, a perfectly reasonable iterative procedure might converge to something that is not even in your space.

Banach spaces are important for many applications in engineering and applied mathematics. In many mathematical applications (such as existence of, and numerical methods for, solutions to differential equations), machine learning problems (such as iterative updates of data-driven dynamics), and stochastic analysis or optimization problems (for which a sequence of approximating solutions may be obtained, for example, via dynamic programming methods), we would like to know whether a given sequence converges without necessarily knowing what its limit may be. Banach spaces allow us to use Cauchy sequence arguments to ensure the existence of limits and establish some of their properties.

An example is the following: consider the solutions to $Ax = b$ for $A$ a square matrix and $b$ a vector. One can identify conditions under which an iteration of the form $x_{k+1} = (I - A)x_k + b$ forms a Cauchy sequence and converges to a solution through the contraction principle. As noted above, existence of solutions to ordinary differential equations also follows from Cauchy sequence arguments.
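The iteration above can be sketched numerically. The following is a minimal sketch (the matrix $A$ and vector $b$ are made-up illustrative data) in which $\|I - A\| < 1$, so the map $x \mapsto (I - A)x + b$ is a contraction, the iterates form a Cauchy sequence in the complete space $\mathbb{R}^2$, and the limit solves $Ax = b$:

```python
import numpy as np

# Solve A x = b by iterating x_{k+1} = (I - A) x_k + b.
A = np.array([[1.2, 0.1],
              [0.1, 0.9]])
b = np.array([1.0, 2.0])
I = np.eye(2)

# Contraction condition: spectral norm of I - A must be < 1,
# which guarantees the iterates form a Cauchy sequence.
assert np.linalg.norm(I - A, 2) < 1

x = np.zeros(2)
for _ in range(200):
    x = (I - A) @ x + b

# The limit of the Cauchy sequence is the solution of A x = b.
print(np.allclose(x, np.linalg.solve(A, b)))  # True
```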

A subset of a Banach space $X$ is complete if and only if it is closed. If the set is closed, every Cauchy sequence in this set has a limit in $X$, and this limit must be a member of the set; hence the set is complete. If it is not closed, one can provide a counterexample: a Cauchy sequence in the subset whose limit is not in the subset.

The real space $\mathbb{R}$ with the usual distance $d(x, y) = |x - y|$ is a complete space.

Theorem: Completeness of Function Spaces

a) Let $C([0, 1]; \mathbb{R})$ be the space of continuous functions in $\Gamma([0, 1]; \mathbb{R})$ with the supremum norm

$$\|f\| = \sup_{t \in [0, 1]} |f(t)|.$$

This space is a Banach space.

b) Let $C([0, 1]; \mathbb{R})$ be the space of continuous functions in $\Gamma([0, 1]; \mathbb{R})$ with the norm

$$\|f\| = \int_0^1 |f(t)|\,dt.$$

This space is not a Banach space.

Remark.

Intuition: This theorem shows that completeness depends critically on the choice of norm. Under the supremum norm, the limit of continuous functions must be continuous (uniform convergence preserves continuity). But under the $L_1$ norm, a sequence of continuous functions can converge to a discontinuous function that is not in the original space. The lesson: the same set of functions can be "complete" or "incomplete" depending on how you measure distance.

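A numerical sketch of the standard counterexample for part b) (the specific ramp sequence is an illustrative choice): continuous ramps $f_n$ approximating a unit step at $t = 1/2$ form a Cauchy sequence in the $L_1$ norm, yet their $L_1$ limit is discontinuous and so lies outside $C([0, 1]; \mathbb{R})$:

```python
import numpy as np

# f_n: continuous ramp, 0 for t <= 1/2, slope n, saturating at 1.
def f(n, t):
    return np.clip(n * (t - 0.5), 0.0, 1.0)

t = np.linspace(0.0, 1.0, 200001)
dt = t[1] - t[0]

def l1_dist(g, h):
    # Riemann-sum approximation of the L1 distance on [0, 1]
    return np.sum(np.abs(g - h)) * dt

# The sequence is Cauchy in the L1 norm: distances keep shrinking ...
d1 = l1_dist(f(10, t), f(20, t))    # true integral equals 1/40
d2 = l1_dist(f(100, t), f(200, t))  # true integral equals 1/400
# ... but the L1 limit is the discontinuous step 1{t > 1/2},
# which is not in C([0,1]); hence (C([0,1]), ||.||_1) is incomplete.
step = (t > 0.5).astype(float)
d3 = l1_dist(f(1000, t), step)      # true integral equals 1/2000
print(d1, d2, d3)
```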

Theorem: Completeness of $l_p$ Spaces

The space $l_p(\mathbb{Z}_+; \mathbb{R}) := \left\{x \in \Gamma(\mathbb{Z}_+; \mathbb{R}) : \|x\|_p = \left(\sum_{i \in \mathbb{Z}_+} |x(i)|^p\right)^{1/p} < \infty\right\}$ is a Banach space for all $1 \leq p < \infty$. The same holds for $p = \infty$.

Remark.

Intuition: The $l_p$ sequence spaces are complete, meaning they are safe to work in: any "convergent-looking" sequence of signals in $l_p$ will converge to something that is still in $l_p$. This is essential because many algorithms in signal processing and control construct solutions as limits of iterative approximations, and completeness guarantees those limits are well-defined.

Remark.

A brief remark on some typical notations: when the range space is $\mathbb{R}$, the notation $l_p(\Omega)$ denotes $l_p(\Omega; \mathbb{R})$ for a discrete-time index set $\Omega$, and likewise, for a continuous-time index set $\Omega$, $L_p(\Omega)$ denotes $L_p(\Omega; \mathbb{R})$.

Definition: Convergence

In a normed linear space $\mathbb{X}$, an infinite sequence of elements $\{x_n\}$ converges to an element $x$ if the sequence $\{\|x_n - x\|\}$ converges to zero.

Remark.

Intuition: Convergence in a normed space means the distance between the sequence elements and the limit shrinks to zero. Note that what "distance" means depends on the norm -- convergence in the supremum norm (uniform convergence) is a much stronger statement than convergence in the $L_2$ norm (energy convergence). Always be aware of which norm you are working with.


Hilbert Spaces

Definition: Pre-Hilbert Space

A pre-Hilbert space $X$ is a linear vector space on which an inner product $\langle \cdot, \cdot \rangle : X \times X \to \mathbb{C}$ is defined, where the following are satisfied:

  1. $\langle x, y \rangle = \langle y, x \rangle^*$ (the superscript denotes the complex conjugate; we will also use $\overline{\langle y, x \rangle}$ to denote the complex conjugate)
  2. $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$
  3. $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$
  4. $\langle x, x \rangle \geq 0$, with equality if and only if $x$ is the null element.
Remark.

Intuition: An inner product enriches a vector space with geometric structure: it gives us notions of "angle" and "projection" between signals, not just "size." This is what allows us to decompose a signal into orthogonal components -- the mathematical foundation for Fourier series, least-squares fitting, and matched filtering.

Thus, corresponding to each pair $(x, y) \in X \times X$, the inner product $\langle x, y \rangle$ is a scalar.

Theorem: Cauchy-Schwarz Inequality

For $x, y \in X$,

$$|\langle x, y \rangle| \leq \sqrt{\langle x, x \rangle}\,\sqrt{\langle y, y \rangle},$$

where equality occurs if and only if $x$ and $y$ are linearly dependent (e.g., $x = \alpha y$ for some scalar $\alpha$).

Remark.

Intuition: Cauchy-Schwarz is the abstract generalization of "the dot product of two vectors cannot exceed the product of their lengths." It bounds how much two signals can be "aligned" -- and equality holds precisely when one signal is a scaled copy of the other. This inequality underpins nearly every bound and estimate in signal processing and statistics.


In a pre-Hilbert space, the inner product defines a norm: $\|x\| = \sqrt{\langle x, x \rangle}$.

The proof of this result requires one to show that $\sqrt{\langle x, x \rangle}$ satisfies the triangle inequality, that is,

$$\|x + y\| \leq \|x\| + \|y\|,$$

which can be proven by an application of the Cauchy-Schwarz inequality.
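A small numerical check of the Cauchy-Schwarz inequality, of the triangle inequality it implies for the induced norm, and of the equality case (the vectors are arbitrary illustrative data):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

ip = np.dot(x, y)            # <x, y> in R^5
nx = np.sqrt(np.dot(x, x))   # ||x|| induced by the inner product
ny = np.sqrt(np.dot(y, y))
assert abs(ip) <= nx * ny                         # Cauchy-Schwarz
assert np.linalg.norm(x + y) <= nx + ny + 1e-12   # triangle inequality

# Equality case: y is a scaled copy of x (linear dependence)
y = 3.0 * x
assert np.isclose(abs(np.dot(x, y)), np.linalg.norm(x) * np.linalg.norm(y))
print("Cauchy-Schwarz and triangle inequality verified")
```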

In a given problem or application, the inner product is to be defined. In the special case of $\mathbb{R}^N$, the inner product is typically the usual dot product; hence $\mathbb{R}^N$ with the usual inner product is a pre-Hilbert space.

Definition: Hilbert Space

A complete pre-Hilbert space is called a Hilbert space.

Remark.

Intuition: A Hilbert space is the "best of all worlds" for signal analysis: it has addition and scaling (linear space), a notion of distance (norm), a notion of angle and projection (inner product), and no missing limits (completeness). The spaces $L_2$ and $l_2$ are the prototypical Hilbert spaces in signal processing -- they capture signals with finite energy.

Hence, a Hilbert space is a Banach space endowed with an inner product which induces its norm. For example, if we define $C([0, 1]; \mathbb{R})$ as a pre-Hilbert space with the inner product $\langle x, y \rangle = \int_0^1 x(t)y(t)\,dt$, this would not be a Hilbert space, since this space is not complete.

The space $l_2(\mathbb{Z}_+; \mathbb{R})$ is a Hilbert space with $\langle x, y \rangle = \sum_{n \in \mathbb{Z}_+} x(n)y(n)$ for $x, y \in l_2(\mathbb{Z}_+; \mathbb{R})$. Furthermore, $\|x\| = \sqrt{\langle x, x \rangle}$ defines a norm on $l_2(\mathbb{Z}_+; \mathbb{R})$.

Likewise, the space $L_2(\mathbb{R}_+; \mathbb{R})$ is a Hilbert space with $\langle x, y \rangle = \int_{t \in \mathbb{R}_+} x(t)y(t)\,dt$ for $x, y \in L_2(\mathbb{R}_+; \mathbb{R})$, and its norm is $\|x\| = \sqrt{\langle x, x \rangle}$.

Theorem: Continuity of the Inner Product

The inner product is continuous: if $x_n \to x$ and $y_n \to y$, then $\langle x_n, y_n \rangle \to \langle x, y \rangle$ for $x_n, y_n$ in a Hilbert space.

Remark.

Intuition: Continuity of the inner product means that if two sequences of signals converge, then their "correlation" (inner product) converges to the correlation of the limits. This is practically important because it guarantees that Fourier coefficients, projections, and energy computations are stable under approximation.

Why are we interested in Hilbert Spaces?

Hilbert spaces have several very useful properties:

  1. Hilbert spaces allow us to gain geometric insight into a set of signals via the inner product, orthogonality, and basis expansions.
  2. If a Hilbert space is separable (to be defined shortly), there exists a countable (or sometimes only finite) sequence of orthonormal vectors which can be used as a basis to represent all the members of this space. This is used in many areas of applied mathematics and engineering, such as Fourier expansions, among many others.
  3. A Hilbert space formulation allows us to develop approximations of signals using a finite number of basis signals.
  4. Also related to the item above, we can state a Projection Theorem which certifies optimality in a wide variety of applications. This geometric insight carries over to more general optimization problems in spaces which are not Hilbert (via duality analysis, to be discussed later).

Orthogonality and the Projection Theorem

Definition: Orthogonality

In a Hilbert space $X$, two vectors $x, y \in X$ are orthogonal if $\langle x, y \rangle = 0$. A vector $x$ is orthogonal to a set $S \subset X$ if $\langle x, y \rangle = 0 \ \forall y \in S$.

Remark.

Intuition: Orthogonality is the generalization of perpendicularity to abstract signal spaces. Two orthogonal signals have zero "overlap" or correlation -- they carry completely independent information. This concept is at the heart of Fourier analysis, where signals are decomposed into mutually orthogonal frequency components.

By the Pythagorean theorem, for $x, y$ orthogonal, we have that $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

A set $A \subset X$ is closed if $A$ contains the limit of every convergent sequence taking values in $A$.

Theorem: Projection Theorem

Let $H$ be a Hilbert space and $B$ a subspace of $H$. Consider the problem

$$\inf_{m \in B} \|x - m\|.$$

(i) A necessary and sufficient condition for $m^* \in B$ to be the minimizing element in $B$, so that

$$\inf_{m \in B} \|x - m\| = \|x - m^*\|,$$

or equivalently

$$\|x - m^*\| \leq \|x - y\|, \qquad \forall y \in B,$$

is that $x - m^*$ is orthogonal to $B$.

If it exists, such an $m^*$ is unique.

(ii) Let $H$ be a Hilbert space and $B$ be a closed subspace of $H$. For any vector $x \in H$, there is a unique vector $m^* \in B$ attaining the infimum above.

Remark.

Intuition: The Projection Theorem says that the best approximation to a signal from a subspace is obtained by "dropping a perpendicular" onto that subspace -- the approximation error is orthogonal to the subspace. This is exactly what happens in least-squares estimation, optimal filtering (Wiener filter), and signal compression: the optimal reconstruction is always the orthogonal projection.


Example: Projection Theorem Application

Consider the minimization of

$$\int_{-1}^{1} (t^n - m(t))^2\,dt,$$

for some $n \in \mathbb{Z}_+$, over all $m$ such that

$$m \in M := \{f : f \in L_2([-1, 1]; \mathbb{R}),\, f(t) = \alpha + \beta t + \gamma t^2;\; \alpha, \beta, \gamma \in \mathbb{R}\}.$$

(a) State the problem as a projection problem by identifying the Hilbert space and the projected subspace.

(b) Compute the solution, that is, find the minimizing $m$.

Remark.

Intuition: This exercise shows that finding the best polynomial approximation of degree $\leq 2$ to $t^n$ on $[-1, 1]$ is exactly a projection problem in $L_2$. The Hilbert space is $L_2([-1, 1]; \mathbb{R})$, the subspace $M$ is the set of polynomials of degree at most 2, and the minimizer is the orthogonal projection of $t^n$ onto $M$.
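For part (b), the orthogonality condition $\langle t^n - m^*, t^j \rangle = 0$ for $j = 0, 1, 2$ yields a $3 \times 3$ linear system (the normal equations) with Gram matrix $G_{jk} = \int_{-1}^{1} t^{j+k}\,dt$. The following sketch solves it numerically (the choice of solving via `np.linalg.solve` is an implementation convenience, not part of the exercise):

```python
import numpy as np

def moment(k):
    # \int_{-1}^{1} t^k dt: zero for odd k, 2/(k+1) for even k
    return 0.0 if k % 2 else 2.0 / (k + 1)

def project(n):
    """Coefficients (alpha, beta, gamma) of the L2([-1,1]) projection
    of t^n onto span{1, t, t^2}, via the normal equations."""
    G = np.array([[moment(j + k) for k in range(3)] for j in range(3)])
    rhs = np.array([moment(n + j) for j in range(3)])
    return np.linalg.solve(G, rhs)

# n = 3: by odd symmetry only the t term survives, m*(t) = (3/5) t
print(np.round(project(3), 6))
# n = 4: by even symmetry m*(t) = -3/35 + (6/7) t^2
print(np.round(project(4), 6))
```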


Approximations and Signal Expansions

Orthogonality

Definition: Orthogonal and Orthonormal Sets

A set $S$ of vectors in a Hilbert space is orthogonal if all elements of this set are orthogonal to each other. The set is orthonormal if, in addition, each vector in the set has norm equal to one.

Remark.

Intuition: An orthonormal set is a collection of "perfectly independent, unit-sized" building blocks for signals. Orthogonality ensures no redundancy (each basis vector carries unique information), and normalization to unit length simplifies coefficient computation to simple inner products. The complex exponentials in Fourier analysis form the prototypical orthonormal set.

The Gram-Schmidt orthogonalization procedure can be invoked to generate such orthonormal sequences. This procedure states that given a sequence $\{x_i\}$ of linearly independent vectors, there exists an orthonormal sequence of vectors $\{e_i\}$ such that for every $n$ and every choice of scalars $\alpha_k$, $1 \leq k \leq n$, there exist $\beta_k$, $1 \leq k \leq n$, with

$$\sum_{k=1}^{n} \alpha_k x_k = \sum_{k=1}^{n} \beta_k e_k,$$

that is, the linear span of $\{x_k, 1 \leq k \leq n\}$ is equal to the linear span of $\{e_k, 1 \leq k \leq n\}$ for every $n \in \mathbb{N}$.

Recall that a set of vectors $\{e_i\}$ is linearly dependent if there exists a complex-valued vector $\mathbf{c} = \{c_1, c_2, \ldots, c_N\}$ such that $\sum_{i=1}^{N} c_i e_i = 0$ with at least one coefficient $c_i \neq 0$.
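The Gram-Schmidt procedure described above can be sketched in a few lines (using the modified variant, which subtracts each projection from the running residual for better numerical behavior; the input vectors are arbitrary illustrative data):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for x in vectors:
        v = x.astype(float).copy()
        for e in basis:
            v -= np.dot(e, v) * e   # subtract projection onto e
        basis.append(v / np.linalg.norm(v))  # normalize to unit length
    return basis

X = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
E = gram_schmidt(X)

# Gram matrix of the output: identity iff the e_i are orthonormal
G = np.array([[np.dot(e, f) for f in E] for e in E])
print(np.allclose(G, np.eye(3)))  # True
```

Each $e_k$ lies in the span of $x_1, \ldots, x_k$, so the linear spans agree for every $n$, as stated above.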

Proposition: Linear Independence of Orthonormal Vectors

A sequence of orthonormal vectors is a linearly independent collection.

Remark.

Intuition: Because orthonormal vectors point in completely "different directions," no vector in the set can be built from the others. This guarantees there is no redundancy in an orthonormal basis -- each basis function contributes something genuinely new to the representation.

We say that a series $\sum_{i=1}^{\infty} \epsilon_i e_i$ converges to $x$ if for every $\delta > 0$ there exists $N \in \mathbb{N}$ such that $\|x - \sum_{i=1}^{n} \epsilon_i e_i\| < \delta$ for all $n \geq N$.

Theorem: Convergence of Orthonormal Expansions

Let $\{e_i\}$ be a sequence of orthonormal vectors in a Hilbert space $H$. Let $\{x_n = \sum_{i=1}^{n} \epsilon_i e_i\}$ be a sequence of vectors in $H$. The sequence converges to some vector $\bar{x}$ if and only if

$$\sum_{i=1}^{\infty} |\epsilon_i|^2 < \infty.$$

In this case, we have that $\langle \bar{x}, e_i \rangle = \epsilon_i$.

Remark.

Intuition: This theorem provides the bridge between abstract Hilbert space theory and practical signal representation: a signal can be expanded in an orthonormal basis if and only if the sum of squared coefficients is finite -- which is exactly the condition that the signal has finite energy. Moreover, the coefficients are simply inner products with the basis vectors, making them easy to compute.


Separable Hilbert Spaces and Countable Expansions

DefinitionDense Subset

Given a normed linear space XX, a subset DXD \subset X is dense in XX, if for every xXx \in X, and each ϵ>0\epsilon > 0, there exists a member dDd \in D such that xdϵ\|x - d\| \leq \epsilon.

Remark.

Intuition: A dense subset can approximate any element in the space to arbitrary precision. Think of how rational numbers are dense in the reals -- every real number can be approximated as closely as you like by a rational. Similarly, saying that polynomials are dense in L2L_2 means any square-integrable signal can be approximated arbitrarily well by a polynomial.

DefinitionCountable Set

A set is countable if either it has finitely many elements or there is a one-to-one correspondence between the set and the set N\mathbb{N} of natural numbers (which would then be an enumeration / counting map).

Remark.

Intuition: Countability means you can list all elements in a sequence, even if the list is infinite. This matters because computers and algorithms work with sequences of numbers. A countable basis for a signal space means you can, in principle, represent any signal by a list of coefficients -- which is exactly what digital signal processing does.

Examples of countable sets are finite sets, the set \mathbb{Z} of integers, and the set \mathbb{Q} of rational numbers. An example of an uncountable set is the set \mathbb{R} of real numbers.

TheoremCountability Properties

(a) A countable union of countable sets is countable.

(b) Cartesian product of finitely many countable sets is countable.

(c) The Cartesian product of countably infinitely many countable sets may not be countable; this holds even when each of the sets is finite (for instance, \{0,1\}^{\mathbb{N}} is uncountable).

(d) [0,1][0, 1] is not countable.

Remark.

Intuition: These results establish the fundamental size distinctions between sets. Part (d) -- that [0,1][0,1] is uncountable -- is especially important: it means continuous-time signals live in an inherently "larger" space than any list can capture. This is precisely why we need the machinery of dense subsets and basis expansions to represent continuous signals with countable data.

Cantor's diagonal argument and the triangular enumeration are important steps in proving the theorem above.
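The triangular enumeration mentioned above can be written out explicitly. A small Python sketch of the Cantor pairing map (the function name is ours), which walks \mathbb{N} \times \mathbb{N} along anti-diagonals and underlies the proof that a countable union of countable sets is countable:

```python
# Cantor's triangular enumeration: a bijection N x N -> N.
def cantor_pair(i, j):
    """Map the pair (i, j), i, j >= 0, to its position along the anti-diagonals."""
    d = i + j
    return d * (d + 1) // 2 + j

# Enumerating pairs on the first 50 complete anti-diagonals hits every
# position 0 .. 50*51/2 - 1 with no gaps or repeats.
seen = {cantor_pair(i, j) for i in range(50) for j in range(50)}
full = set(range(50 * 51 // 2))
assert full <= seen
```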

DefinitionSeparable Space

A space XX is separable, if it contains a countable dense set.

Remark.

Intuition: Separability means that even though the space may contain uncountably many elements, a countable subset is "enough" to approximate everything. This is the mathematical guarantee that digital signal processing is possible: separable signal spaces can be faithfully represented using countably many basis functions (like Fourier harmonics) and their coefficients.

Separability states that, for computational purposes, it suffices to work with a countable subset even when the space itself is uncountable. Examples of separable spaces are \mathbb{R}, and the set of continuous and bounded functions on a compact set metrized with the maximum distance between the functions.

TheoremCountability of Orthonormal Systems in Separable Spaces

Let HH be a separable Hilbert space. Then, every orthonormal system of vectors in HH has a finite or countably infinite number of elements.

Remark.

Intuition: In a separable Hilbert space, you can never have an uncountable collection of mutually orthogonal directions. This limits the "dimensionality" of the space in a way that makes countable basis expansions (like Fourier series) sufficient to represent every signal. It is the reason why a countable set of frequencies captures all the information in a square-integrable signal.

DefinitionComplete Orthonormal Sequence

An orthonormal sequence in a Hilbert space HH is complete if the only vector in HH which is orthogonal to each of the vectors in the sequence is the null vector.

Remark.

Intuition: Completeness of an orthonormal sequence means there is "no signal left over" that the basis misses. If a signal is orthogonal to every basis vector, it must be zero. This is the condition that distinguishes a true basis (which can represent everything) from a partial set (which can only represent signals in a subspace).

TheoremSeparability and Complete Orthonormal Sequences

A Hilbert space HH contains a complete orthonormal sequence (that is, a countable collection of such vectors) if and only if it is separable.

Remark.

Intuition: This is the central structural result: a Hilbert space admits a countable basis (enabling Fourier-type expansions) if and only if it is separable. Since the signal spaces we care about (L2L_2, l2l_2) are separable, this theorem guarantees that every finite-energy signal can be expressed as a countable sum of orthonormal basis functions.

The proof above also showed the following result:

TheoremBasis Expansion in Hilbert Spaces

In a Hilbert space HH, a complete orthonormal sequence {en,nN}\{e_n, n \in \mathbb{N}\} defines a basis in the sense that for any xHx \in H, we have

x=limNk=1Nx,ekekx = \lim_{N \to \infty} \sum_{k=1}^{N} \langle x, e_k \rangle e_k

Remark.

Intuition: This is the master representation theorem: any signal in a separable Hilbert space can be perfectly reconstructed from its inner products with basis vectors. The coefficients x,ek\langle x, e_k \rangle are the "coordinates" of the signal, and the reconstruction converges in the energy (L2L_2) sense. Fourier series, wavelet decompositions, and all other orthonormal expansions are special cases of this result.


Separability of l2l_2 and L2L_2 spaces

In view of the preceding theorem, the following result builds on the fact that the sequence of orthonormal vectors

{en,nN:en:Z+R,en(m)=1{m=n},mZ+}\left\{e_n, n \in \mathbb{N} : \quad e_n : \mathbb{Z}_+ \to \mathbb{R}, \quad e_n(m) = 1_{\{m = n\}}, \quad m \in \mathbb{Z}_+\right\}

is a countable complete orthonormal set in l2(Z+;R)l_2(\mathbb{Z}_+; \mathbb{R}): Note that for any h={h(1),h(2),}l2(Z+;R)h = \{h(1), h(2), \cdots\} \in l_2(\mathbb{Z}_+; \mathbb{R}), h,en=h(n)\langle h, e_n \rangle = h(n) and hence for any vector vl2(Z+;R)v \in l_2(\mathbb{Z}_+; \mathbb{R})

v,en=0nZ+    v=0.\langle v, e_n \rangle = 0 \quad \forall n \in \mathbb{Z}_+ \implies \|v\| = 0.

TheoremSeparability of l2

The Hilbert space l2(Z+;R)l_2(\mathbb{Z}_+; \mathbb{R}) with inner product

h1,h2=nZ+h1(n)h2(n),\langle h_1, h_2 \rangle = \sum_{n \in \mathbb{Z}_+} h_1(n) h_2(n),

is separable.

Remark.

Intuition: Separability of l2l_2 means that the space of square-summable sequences -- the natural home for discrete-time finite-energy signals -- can be spanned by a countable orthonormal basis. The standard basis vectors ene_n (which are 1 at position nn and 0 elsewhere) serve as this basis, and every sequence is simply a list of its values at each time index.
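To make the countable dense set concrete: truncations to finitely many entries (with, say, rational values) approximate any l_2 sequence. A NumPy sketch with the example sequence h(n) = 2^{-n} (our choice):

```python
import numpy as np

# Finitely supported sequences are dense in l2(Z+); together with the
# standard basis e_n this underlies separability. Truncation is the
# approximating map.
h = 2.0 ** (-np.arange(1, 2001))          # h(n) = 2^{-n}, square-summable

def truncate(h, n):
    out = np.zeros_like(h)
    out[:n] = h[:n]
    return out

# ||h - truncate(h, n)||_2^2 = sum_{m > n} 4^{-m} -> 0 geometrically
err = np.linalg.norm(h - truncate(h, 10))
assert err < 2.0 ** (-10)
```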

Next, we will show that L2([a,b];R)L_2([a, b]; \mathbb{R}) is separable for a,bRa, b \in \mathbb{R}. To establish this result, we will review some useful facts.

TheoremBernstein-Weierstrass Theorem

Any function in C([0,1];R)C([0, 1]; \mathbb{R}) can be approximated arbitrarily well by a polynomial under the supremum norm.

Remark.

Intuition: The Weierstrass theorem says that polynomials are "universal approximators" for continuous functions on a closed interval. No matter how complicated a continuous signal is, a polynomial of sufficiently high degree can match it to any desired accuracy. This foundational result is what lets us build toward proving that L2L_2 spaces are separable.
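The Bernstein polynomial construction gives a concrete approximating sequence for this theorem. A NumPy sketch (the degree, the test function, and the error tolerance are our choices; convergence in the sup norm is slow, roughly O(n^{-1/2}) for Lipschitz functions):

```python
import numpy as np
from math import comb

def bernstein(f, n, t):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at t."""
    ks = np.arange(n + 1)
    return sum(f(k / n) * comb(n, int(k)) * t**k * (1 - t)**(n - k)
               for k in ks)

f = lambda t: abs(t - 0.5)            # continuous but not smooth
ts = np.linspace(0.0, 1.0, 201)
sup_err = np.max(np.abs(f(ts) - bernstein(f, 400, ts)))
assert sup_err < 0.05                 # sup-norm error shrinks as n grows
```

The probabilistic proof of this approximation property is outlined as an exercise at the end of the chapter.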

TheoremDensity of Continuous Functions in L2

The set C([0,1];R)C([0, 1]; \mathbb{R}), of continuous functions, is dense in L2([0,1];R)L_2([0, 1]; \mathbb{R}).

Remark.

Intuition: This theorem says that any square-integrable function -- even one with discontinuities -- can be approximated arbitrarily well (in the energy sense) by a continuous function. This is reassuring for engineers: continuous, "well-behaved" signals are dense enough to represent all finite-energy signals, so working with smooth approximations does not lose generality.

TheoremSeparability of L2([0,1]; R)

The space L2([0,1];R)L_2([0, 1]; \mathbb{R}) is separable.

Remark.

Intuition: Separability of L2([0,1])L_2([0,1]) is the key result that justifies Fourier series: it guarantees that every finite-energy signal on [0,1][0,1] can be represented by a countable orthonormal basis. The proof chains together polynomial approximation (Weierstrass) and density of continuous functions to construct such a basis -- and the complex exponentials of Fourier series are one natural choice.

TheoremSeparability of L2 on the Positive Reals

L2(R+;R)L_2(\mathbb{R}_+; \mathbb{R}) is separable.

Remark.

Intuition: Extending separability from a bounded interval to all of R+\mathbb{R}_+ means Fourier-type expansions work for signals defined on the entire positive real line, not just finite intervals. The key idea is that any finite-energy signal has most of its energy concentrated in some finite window, so the bounded-interval result can be "stitched together" to cover the whole line.

Two further results:

TheoremDensity of L2 in L1

The set L2([1,);R)L_2([1, \infty); \mathbb{R}) is dense in L1([1,);R)L_1([1, \infty); \mathbb{R}).

Remark.

Intuition: This says that any absolutely integrable signal on [1,)[1,\infty) can be approximated by a square-integrable signal. In practice, this means results proven for L2L_2 signals (where we have Hilbert space tools) can often be extended to L1L_1 signals through approximation arguments.

TheoremDensity of Continuous Functions with Compact Support

Let CcC_c denote the space of continuous functions with compact support. CcC_c is dense in L1(R;R)L_1(\mathbb{R}; \mathbb{R}).

Remark.

Intuition: This result says that any integrable signal can be approximated by a continuous function that is zero outside some finite interval. In engineering terms, every L1L_1 signal can be well-approximated by a "nice" signal that is both smooth and compactly supported -- a key step in proving results about Fourier transforms, since such signals are easy to integrate and transform.

Signal expansions in L2([a,b];R)L_2([a, b]; \mathbb{R}) or L2([a,b];C)L_2([a, b]; \mathbb{C}): Fourier, Haar and Polynomial Bases

Fourier Signals as Basis Vectors and Fourier Series

Fourier series rest on a very important class of orthonormal sequences, used to represent both discrete-time and continuous-time signals; these will be studied in much detail later on. In particular, we will soon see that in L_2([0, P]; \mathbb{C}) the family of complex exponentials

{ek:ek(t)=1Pei2πkPt,kZ},\left\{e_k : e_k(t) = \frac{1}{\sqrt{P}} e^{i2\pi \frac{k}{P} t}, k \in \mathbb{Z}\right\},

provides a complete orthonormal sequence.

Accordingly, for any xL2([0,P];C)x \in L_2([0, P]; \mathbb{C}), we can write

x=kZx,ekekx = \sum_{k \in \mathbb{Z}} \langle x, e_k \rangle e_k

or by expanding the inner-product, we have

x(t) = \sum_{k \in \mathbb{Z}} \left(\int_{0}^{P} x(s) \frac{1}{\sqrt{P}} e^{-i2\pi \frac{k}{P} s}\,ds\right) \frac{1}{\sqrt{P}} e^{i2\pi \frac{k}{P} t}

where the convergence of the infinite sum is in the L2L_2 sense.

This expansion is precisely the Fourier series expansion of the function x in L_2([0, P]; \mathbb{C}). The inner products \langle x, e_k \rangle are the Fourier series coefficients of x.
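As a sanity check, the expansion can be computed numerically. A NumPy sketch (the grid size, the signal x(t) = t, and the truncation order are our choices; as stated above, convergence is in the L_2 sense):

```python
import numpy as np

# Expand x(t) = t on [0, P] in the orthonormal complex exponentials
# e_k(t) = P^{-1/2} exp(i 2 pi k t / P) and reconstruct from the coefficients.
P = 1.0
t = np.linspace(0.0, P, 4096, endpoint=False)
dt = t[1] - t[0]
x = t.copy()                                         # the signal x(t) = t

def e_k(k):
    return np.exp(2j * np.pi * k * t / P) / np.sqrt(P)

K = 200                                              # truncation order
coeffs = {k: np.sum(x * np.conj(e_k(k))) * dt for k in range(-K, K + 1)}
x_hat = sum(c * e_k(k) for k, c in coeffs.items())

# The energy of the truncation error is small and shrinks as K grows
l2_err = np.sqrt(np.sum(np.abs(x - x_hat) ** 2) * dt)
assert l2_err < 0.05
```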

Legendre Polynomials as Basis Vectors

We have seen, in the context of Theorems 2.3.7 and 2.3.8 (see the proof of 2.3.9), that the functions \{t^k, k \in \mathbb{Z}_+\} can be used to construct an orthonormal collection of signals which is complete in L_2([a, b]; \mathbb{R}). These complete orthonormal polynomials are called Legendre polynomials.
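Applying Gram-Schmidt to the monomials over [-1, 1] recovers the normalized Legendre polynomials. A NumPy sketch using a midpoint-rule inner product (the grid size and helper names are ours):

```python
import numpy as np

# Midpoint grid on [-1, 1] for approximating L2 inner products
N = 20000
dt = 2.0 / N
t = -1.0 + (np.arange(N) + 0.5) * dt

def inner(f, g):
    return np.sum(f * g) * dt

# Gram-Schmidt on {1, t, t^2}
monomials = [np.ones_like(t), t, t ** 2]
basis = []
for m in monomials:
    v = m - sum(inner(m, e) * e for e in basis)
    basis.append(v / np.sqrt(inner(v, v)))

# Normalized Legendre polynomials: sqrt(1/2), sqrt(3/2) t, sqrt(5/8)(3t^2 - 1)
assert np.allclose(basis[0], np.sqrt(0.5) * np.ones_like(t), atol=1e-6)
assert np.allclose(basis[1], np.sqrt(1.5) * t, atol=1e-6)
assert np.allclose(basis[2], np.sqrt(5.0 / 8.0) * (3 * t**2 - 1), atol=1e-4)
```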

Haar Functions as Basis Vectors

One further, practically very important basis is the class of Haar functions (the earliest example of wavelets). Define

\Psi_{0,0}(x) = \begin{cases} 1, & \text{if } 0 \leq x \leq 1 \\ 0 & \text{else} \end{cases}

and for nZ+n \in \mathbb{Z}_+, k{0,1,2,,2n1}k \in \{0, 1, 2, \ldots, 2^n - 1\},

\Phi_{n,k}(x) = \begin{cases} 2^{n/2}, & \text{if } k2^{-n} \leq x < (k + 1/2)2^{-n} \\ -2^{n/2}, & \text{if } (k + 1/2)2^{-n} \leq x < (k+1)2^{-n} \\ 0 & \text{else} \end{cases}

TheoremCompleteness of the Haar System

The (Haar) set of vectors

{Ψ0,0,Φn,k,nZ+,k{0,1,2,,2n1}}\left\{\Psi_{0,0}, \Phi_{n,k}, n \in \mathbb{Z}_+, k \in \{0, 1, 2, \ldots, 2^n - 1\}\right\}

is a complete orthonormal sequence in L2([0,1];R)L_2([0, 1]; \mathbb{R}).

Remark.

Intuition: The Haar system provides an alternative to Fourier series for representing L2L_2 signals. While Fourier basis functions are smooth sinusoids spread over the entire interval, Haar functions are localized step functions that capture information at specific locations and scales. This makes wavelets particularly effective for signals with sharp transitions or edges, which is why they are widely used in image compression (e.g., JPEG 2000).

The important observation here is that different expansions may be suited to different engineering applications: for instance, Haar series are occasionally used in image processing for signals with certain edge behaviours, whereas the Fourier expansion is used extensively in speech processing and communication-theoretic applications.
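The Haar definitions above are easy to check numerically. A NumPy sketch verifying orthonormality of the first few Haar functions on a dyadic grid (the resolution and names are ours; because the grid is dyadic, the piecewise-constant inner products are computed exactly):

```python
import numpy as np

M = 2 ** 10                              # dyadic grid resolution
x = (np.arange(M) + 0.5) / M             # midpoint samples in [0, 1]
dx = 1.0 / M

def psi00():
    return np.ones(M)

def phi(n, k):
    # Haar function Phi_{n,k} from the definition above
    lo, mid, hi = k * 2.0**-n, (k + 0.5) * 2.0**-n, (k + 1) * 2.0**-n
    out = np.zeros(M)
    out[(x >= lo) & (x < mid)] = 2.0 ** (n / 2)
    out[(x >= mid) & (x < hi)] = -(2.0 ** (n / 2))
    return out

haar = [psi00()] + [phi(n, k) for n in range(4) for k in range(2 ** n)]
G = np.array([[np.sum(f * g) * dx for g in haar] for f in haar])
assert np.allclose(G, np.eye(len(haar)))   # orthonormal, as the theorem states
```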

Approximations

Approximations allow us to represent data using finitely many vectors. The basis expansions studied above can be used to obtain the best approximation of a signal by finitely many terms. This can be posed as a projection problem, and we have seen that the best approximation is the one in which the approximation error is orthogonal to all of the vectors used in the approximation (which span the approximation subspace).
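The orthogonality principle just described can be illustrated directly. A NumPy sketch with an arbitrary vector and two standard orthonormal directions (all choices are ours):

```python
import numpy as np

# Best approximation of x from span{e1, e2} keeps the coefficients <x, e_i>;
# the resulting error is orthogonal to the approximation subspace.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)

e1 = np.zeros(8); e1[0] = 1.0            # two orthonormal vectors in R^8
e2 = np.zeros(8); e2[3] = 1.0

x_star = np.dot(x, e1) * e1 + np.dot(x, e2) * e2   # projection of x
err = x - x_star

# Orthogonality principle: the error is orthogonal to every vector used
assert np.isclose(np.dot(err, e1), 0.0)
assert np.isclose(np.dot(err, e2), 0.0)

# Optimality: perturbing a coefficient does strictly worse
worse = (np.dot(x, e1) + 0.1) * e1 + np.dot(x, e2) * e2
assert np.linalg.norm(x - x_star) < np.linalg.norm(x - worse)
```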


Exercises

Exercise

(a) The set C(R)C^\infty(\mathbb{R}), which is the set of all functions from R\mathbb{R} to R\mathbb{R} that are infinitely differentiable, together with the operations of addition and scalar multiplication defined as follows, is a vector space: For any f1,f2C(R)f_1, f_2 \in C^\infty(\mathbb{R})

(f1+f2)(x)=f1(x)+f2(x),xR(f_1 + f_2)(x) = f_1(x) + f_2(x), \qquad x \in \mathbb{R}

and for any αR\alpha \in \mathbb{R} and fC(R)f \in C^\infty(\mathbb{R})

(αf)(x)=αf(x),xR(\alpha \cdot f)(x) = \alpha f(x), \qquad x \in \mathbb{R}

(i) Now, consider P(R)\mathcal{P}(\mathbb{R}) to be the set of all (polynomial) functions that maps R\mathbb{R} to R\mathbb{R} such that any fP(R)f \in \mathcal{P}(\mathbb{R}) can be written as f(x)=i=0naixif(x) = \sum_{i=0}^{n} a_i x^i for some nNn \in \mathbb{N} with a0,a1,,anRa_0, a_1, \cdots, a_n \in \mathbb{R}. Suppose that we define the same addition and scalar multiplication operations as defined above. Is P(R)\mathcal{P}(\mathbb{R}) a subspace in C(R)C^\infty(\mathbb{R})?

(ii) Show that the space of all functions in C(R)C^\infty(\mathbb{R}) which map R\mathbb{R} to R\mathbb{R} which satisfy f(10)=0f(10) = 0 is a vector space with addition and multiplication defined as above.

(b) Consider the set Rn\mathbb{R}^n. On Rn\mathbb{R}^n, define an addition operation and a scalar multiplication operation as follows:

(x1,x2,,xn)+(y1,y2,,yn)=(x1+y1,x2+y2,,xn+yn)(x_1, x_2, \cdots, x_n) + (y_1, y_2, \cdots, y_n) = (x_1 + y_1, x_2 + y_2, \cdots, x_n + y_n)

α(x1,x2,,xn)=(αx1,αx2,,αxn)\alpha \cdot (x_1, x_2, \cdots, x_n) = (\alpha x_1, \alpha x_2, \cdots, \alpha x_n)

Show that, with these operations, Rn\mathbb{R}^n is a vector space.

(c) Consider the set

\mathbb{W} = \{(x, y) : x \in \mathbb{R}, y \in \mathbb{R}, x > 0, y > 0\}

On this set, define an addition operation and a scalar multiplication operation as follows:

(x1,y1)+(x2,y2)=(x1x2,y1y2)(x_1, y_1) + (x_2, y_2) = (x_1 x_2, y_1 y_2)

α(x,y)=(xα,yα)\alpha \cdot (x, y) = (x^\alpha, y^\alpha)

Show that, with these operations, W\mathbb{W} is a vector space. Hint: Consider a bijection between W\mathbb{W} and the space R2\mathbb{R}^2 with W(x,y)(log(x),log(y))R2\mathbb{W} \ni (x, y) \mapsto (\log(x), \log(y)) \in \mathbb{R}^2.

Exercise (Hölder's and Minkowski's Inequalities)

Let 1p,q1 \leq p, q \leq \infty with 1/p+1/q=11/p + 1/q = 1. Let xlp(Z+)x \in l_p(\mathbb{Z}_+) and ylq(Z+)y \in l_q(\mathbb{Z}_+). Then,

i=0xiyixpyq\sum_{i=0}^{\infty} |x_i y_i| \leq \|x\|_p \|y\|_q

This is known as Hölder's inequality. Equality holds if and only if

\left(\frac{|x_i|}{\|x\|_p}\right)^{p} = \left(\frac{|y_i|}{\|y\|_q}\right)^{q},

for each i \in \mathbb{Z}_+.

To prove this, perform the following:

(a) Show that for a0,b0,c(0,1)a \geq 0, b \geq 0, c \in (0, 1): acb1cca+(1c)ba^c b^{1-c} \leq ca + (1 - c)b with equality if and only if a=ba = b. To show this, you may consider the function f(t)=tcct+c1f(t) = t^c - ct + c - 1 and see how it behaves for t0t \geq 0 and let t=a/bt = a/b.

(b) Apply the inequality acb1cca+(1c)ba^c b^{1-c} \leq ca + (1 - c)b to the numbers:

a=(xixp)p,b=(yiyq)q,c=1/pa = \left(\frac{|x_i|}{\|x\|_p}\right)^p, \quad b = \left(\frac{|y_i|}{\|y\|_q}\right)^q, \quad c = 1/p

Hölder's inequality is useful for proving Minkowski's inequality, which states that for 1 < p < \infty,

x+ypxp+yp\|x + y\|_p \leq \|x\|_p + \|y\|_p

This proceeds as follows:

\sum_{i=1}^{n} |x_i + y_i|^p = \sum_{i=1}^{n} |x_i + y_i|^{p-1}|x_i + y_i| \leq \sum_{i=1}^{n} |x_i + y_i|^{p-1}|x_i| + \sum_{i=1}^{n} |x_i + y_i|^{p-1}|y_i|

\leq \left(\sum_{i=1}^{n} |x_i + y_i|^{(p-1)q}\right)^{1/q}\left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |x_i + y_i|^{(p-1)q}\right)^{1/q}\left(\sum_{i=1}^{n} |y_i|^p\right)^{1/p}

= \left(\sum_{i=1}^{n} |x_i + y_i|^p\right)^{1/q}\left(\left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |y_i|^p\right)^{1/p}\right)

Thus, dividing both sides by \left(\sum_{i=1}^{n} |x_i + y_i|^p\right)^{1/q} and using that 1 - 1/q = 1/p,

(i=1nxi+yip)1/p(i=1nxip)1/p+(i=1nyip)1/p,\left(\sum_{i=1}^{n} |x_i + y_i|^p\right)^{1/p} \leq \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |y_i|^p\right)^{1/p},

Now, the above holds for every nn. Taking the limit nn \to \infty (first on the right and then on the left), it follows that

x+ypxp+yp\|x + y\|_p \leq \|x\|_p + \|y\|_p

which is the desired inequality.
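A numerical sanity check of both inequalities (the random data and the choice p = 3 are ours; this verifies, but of course does not prove, the claims):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = rng.standard_normal(100)
p, q = 3.0, 1.5                          # conjugate exponents: 1/p + 1/q = 1

norm = lambda v, r: np.sum(np.abs(v) ** r) ** (1.0 / r)

# Hoelder: sum |x_i y_i| <= ||x||_p ||y||_q
assert np.sum(np.abs(x * y)) <= norm(x, p) * norm(y, q) + 1e-12

# Minkowski: ||x + y||_p <= ||x||_p + ||y||_p
assert norm(x + y, p) <= norm(x, p) + norm(y, p) + 1e-12
```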

Exercise

Given a normed linear space (X,)(X, \|\cdot\|), introduce a map n:X×XRn : X \times X \to \mathbb{R}:

n(x,y)=xy1+xyn(x, y) = \frac{\|x - y\|}{1 + \|x - y\|}

Show that n(x,y)n(x, y) is a metric: That is, it satisfies the triangle inequality:

n(x,y)n(x,z)+n(z,y),x,y,zX,n(x, y) \leq n(x, z) + n(z, y), \qquad \forall x, y, z \in X,

and that n(x,y)=0n(x, y) = 0 iff x=yx = y, and finally n(x,y)=n(y,x)n(x, y) = n(y, x).
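A numerical spot-check of the claimed metric properties (random points in \mathbb{R}^5; a check, not the required proof):

```python
import numpy as np

# n(x, y) = ||x - y|| / (1 + ||x - y||), built from the Euclidean norm
n = lambda a, b: np.linalg.norm(a - b) / (1.0 + np.linalg.norm(a - b))

rng = np.random.default_rng(2)
for _ in range(1000):
    xv, yv, zv = rng.standard_normal((3, 5))
    # triangle inequality and symmetry hold on every sampled triple
    assert n(xv, yv) <= n(xv, zv) + n(zv, yv) + 1e-12
    assert np.isclose(n(xv, yv), n(yv, xv))
```

Note that n is bounded by 1, so it metrizes the same topology as the norm while remaining bounded.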

Exercise

Let {en,nN}\{e_n, n \in \mathbb{N}\} be a complete orthonormal sequence in a real Hilbert space HH. Let M\mathcal{M} be a subspace of HH, spanned by {ek,kS}\{e_k, k \in S\}, for some finite set SNS \subset \mathbb{N}. That is,

M={vH:αkR,kS,v=kSαkek}\mathcal{M} = \{v \in H : \exists \alpha_k \in \mathbb{R}, k \in S,\quad v = \sum_{k \in S} \alpha_k e_k\}

Let xHx \in H be given. Find xMx^* \in \mathcal{M} which is the solution to the following:

minx0Mxx0,\min_{x_0 \in \mathcal{M}} \|x - x_0\|,

in terms of xx, and {en,nN}\{e_n, n \in \mathbb{N}\}.

Hint: Any vector in HH can be written as x=nNx,enenx = \sum_{n \in \mathbb{N}} \langle x, e_n \rangle e_n.

Exercise

Let T:L2(R+;R)RT : L_2(\mathbb{R}_+; \mathbb{R}) \to \mathbb{R} be a mapping given by:

T(f)=1f(t)1+t2t4dtT(f) = \int_1^{\infty} f(t) \frac{1 + t^2}{t^4}\,dt

Is TT continuous at any given f0L2(R+;R)f_0 \in L_2(\mathbb{R}_+; \mathbb{R})?

Exercise

Consider an inner-product defined by:

\langle x, y \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} x(t) y(t)\,dt

Is the resulting inner-product (pre-Hilbert) space separable?

Exercise

Let x,yf(Z;R)x, y \in f(\mathbb{Z}; \mathbb{R}); that is, x,yx, y map Z\mathbb{Z} to R\mathbb{R}, such that x={,x2,x1,x0,x1,x2,}x = \{\ldots, x_{-2}, x_{-1}, x_0, x_1, x_2, \ldots\} and y={,y2,y1,y0,y1,y2,}y = \{\ldots, y_{-2}, y_{-1}, y_0, y_1, y_2, \ldots\} and xk,ykRx_k, y_k \in \mathbb{R}, for all kZk \in \mathbb{Z}.

For (a)-(c) below, state if the following are true or false with justifications in a few sentences:

(a) x,y=iZi2xiyi\langle x, y \rangle = \sum_{i \in \mathbb{Z}} i^2 x_i y_i is an inner-product.

(b) x,y=iZxiyi\langle x, y \rangle = \sum_{i \in \mathbb{Z}} x_i y_i is an inner-product.

(c) {x:x22<}\{x : \|x\|_2^2 < \infty\} contains a complete orthonormal sequence, where x2=iZx(i)2\|x\|_2 = \sqrt{\sum_{i \in \mathbb{Z}} |x(i)|^2}.

Exercise

Let X\mathbb{X} be a Hilbert space and x,yXx, y \in \mathbb{X}. Prove the following:

(a) (Parallelogram Law)

x+y2+xy2=2x2+2y2.\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.

(b)

x,y(x)(y).|\langle x, y \rangle| \leq (\|x\|)(\|y\|).

(c)

2x,yx2+y2.2|\langle x, y \rangle| \leq \|x\|^2 + \|y\|^2.

Exercise

Let HH be a finite dimensional Hilbert space and {v1,v2}\{v_1, v_2\} be two linearly independent vectors in HH.

Let b1,b2Rb_1, b_2 \in \mathbb{R}. Show that, among all vectors xHx \in H, which satisfies

x,v1=b1,\langle x, v_1 \rangle = b_1,

x,v2=b2,\langle x, v_2 \rangle = b_2,

the vector xHx^* \in H has the minimum norm if xx^* satisfies:

x=α1v1+α2v2,x^* = \alpha_1 v_1 + \alpha_2 v_2,

with

v1,v1α1+v2,v1α2=b1,\langle v_1, v_1 \rangle \alpha_1 + \langle v_2, v_1 \rangle \alpha_2 = b_1,

v1,v2α1+v2,v2α2=b2.\langle v_1, v_2 \rangle \alpha_1 + \langle v_2, v_2 \rangle \alpha_2 = b_2.

Exercise

Let H be a Hilbert space and C \subset H be a dense subset of H. Suppose that every element h_C in C is such that for every \epsilon > 0, there exist n \in \mathbb{N} and \beta_0, \beta_1, \ldots, \beta_n \in \mathbb{R} so that

i=0nβieihCϵ\left\|\sum_{i=0}^{n} \beta_i e_i - h_C\right\| \leq \epsilon

where \{e_i, i \in \mathbb{N}\} is a countable sequence of orthonormal vectors in H.

Is it the case that HH is separable?

Exercise

Let xx be in the real Hilbert space L2([0,1];R)L_2([0, 1]; \mathbb{R}) with the inner product

x,y=01x(t)y(t)dt.\langle x, y \rangle = \int_0^1 x(t) y(t)\,dt.

We would like to express xx in terms of the following two signals (which belong to the Haar signal space)

u1(t)=1{t[0,1/2)}1{t[1/2,1]},t[0,1]u_1(t) = 1_{\{t \in [0, 1/2)\}} - 1_{\{t \in [1/2, 1]\}}, \qquad t \in [0, 1]

u2(t)=1{t[0,1]},t[0,1]u_2(t) = 1_{\{t \in [0, 1]\}}, \qquad t \in [0, 1]

such that

01x(t)i=12αiui(t)2dt\int_0^1 \left|x(t) - \sum_{i=1}^{2} \alpha_i u_i(t)\right|^2 dt

is minimized, for {α1,α2R}\{\alpha_1, \alpha_2 \in \mathbb{R}\}.

(a) Using the Gram-Schmidt procedure, obtain two orthonormal vectors {e1(t),e2(t)}\{e_1(t), e_2(t)\} such that these vectors linearly span the same space spanned by {u1(t),u2(t)}\{u_1(t), u_2(t)\}.

(b) State the problem as a projection theorem problem by clearly identifying the Hilbert space and the projected subspace.

(c) Let x(t)=1{t[1/2,1]}x(t) = 1_{\{t \in [1/2, 1]\}}. Find the minimizing α1,α2\alpha_1, \alpha_2 values.

Exercise

Let C([0, 1]; \mathbb{R}) denote the normed linear space of continuous functions from [0, 1] to \mathbb{R} under the supremum norm. We observed earlier that polynomials can be used to approximate any function in this space with arbitrary precision, under the supremum norm (Weierstrass Theorem).

Given this, repeating the arguments we made in class, argue that the family of polynomials {1,t,t2,}\{1, t, t^2, \cdots\} can be used to form a complete orthonormal sequence in L2([0,1];R)L_2([0, 1]; \mathbb{R}). This also establishes that L2([0,1];R)L_2([0, 1]; \mathbb{R}) is separable.

Exercise

Alice and Bob are approached by a generous company and asked to solve the following problem: The company wishes to store any signal ff in L2(R+;R)L_2(\mathbb{R}_+; \mathbb{R}) in a computer with a given error of ϵ>0\epsilon > 0, that is for every fL2(R+)f \in L_2(\mathbb{R}_+), there exists some signal hHh \in H such that fh2ϵ\|f - h\|_2 \leq \epsilon (thus the error is uniform over all possible signals), where HH is the stored family of signals (in the computer's memory).

To achieve this, they encourage Alice or Bob to use a finite or a countable expansion to represent the signal and later store this signal in an arbitrarily large memory. Hence, they allow Alice or Bob to purchase as much memory as they would like for a given error value of ϵ\epsilon.

Alice turns down the offer and says it is impossible to do that for any ϵ\epsilon with a finite memory and argues then she needs infinite memory, which is impossible.

Bob accepts the offer and says he may need a very large, but finite, memory for any given ϵ>0\epsilon > 0; thus, the task is possible.

Which one is the accurate assessment?

(a) If you think Alice is right, which further conditions can she impose to make this possible? Why is she right?

(b) If you think Bob is right, can you suggest a method? Why is he right?

Exercise

Prove the Bernstein-Weierstrass Theorem using a probability-theoretic method. Proceed as follows: The Bernstein polynomial

B_{n,f}(t) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) \binom{n}{k} t^k (1-t)^{n-k}

can be expressed as the expectation E_t\left[f\left(\frac{S_n}{n}\right)\right], where S_n = X_1 + X_2 + \cdots + X_n and \{X_i\} is an i.i.d. collection of Bernoulli random variables with X_i = 1 with probability t and X_i = 0 with probability 1 - t. Here, observe that the sum S_n has a binomial distribution. Thus,

supt[0,1]f(t)Bn,f(t)=supt[0,1]Et[f(Snn)]f(t),\sup_{t \in [0,1]} |f(t) - B_{n,f}(t)| = \sup_{t \in [0,1]} |E_t[f(\frac{S_n}{n})] - f(t)|,

where EtE_t, for each tt, denotes the expectation with respect to the i.i.d. Bernoulli random variables XiX_i each with P(Xi=1)=tP(X_i = 1) = t. Let PtP_t denote the probability measure induced by these tt-parametrized i.i.d. sequence of Bernoulli random variables.

Since ff is continuous and [0,1][0,1] is compact, ff is uniformly continuous. Thus, for every ϵ>0\epsilon > 0, there exists δ>0\delta > 0 such that xy<δ|x - y| < \delta implies that f(x)f(y)ϵ|f(x) - f(y)| \leq \epsilon. Thus,

Et[f(Snn)]f(t)|E_t[f(\frac{S_n}{n})] - f(t)|

= \int_{\omega: |\frac{S_n}{n} - t| \leq \delta} |f(\tfrac{S_n}{n}) - f(t)|\, P_t(d\omega) + \int_{\omega: |\frac{S_n}{n} - t| > \delta} |f(\tfrac{S_n}{n}) - f(t)|\, P_t(d\omega)

\leq \epsilon + 2 \sup_{y \in [0,1]} |f(y)|\, P_t(|\tfrac{S_n}{n} - t| > \delta)

The second term converges to 0 as n \to \infty by the law of large numbers, so the bound tends to \epsilon. The above holds for every \epsilon > 0.

Now, one needs to show that this convergence is uniform in tt: For this show that for all t[0,1]t \in [0, 1], via Markov's inequality and the independence of XiX_i

Pt(Snnt>δ)=Pt(Snnt2>δ2)14nδ2,P_t(|\frac{S_n}{n} - t| > \delta) = P_t(|\frac{S_n}{n} - t|^2 > \delta^2) \leq \frac{1}{4n\delta^2},

establishing uniform convergence (over t[0,1]t \in [0, 1]), and thus complete the proof.

Exercise (A useful result on countability properties)

Let F:RRF : \mathbb{R} \to \mathbb{R} be a monotonically increasing function (that is, x1x2x_1 \leq x_2 implies that F(x1)F(x2)F(x_1) \leq F(x_2)). Show that FF can have at most countably many points of discontinuity.

Hint: If F is discontinuous at a point x, then there exists n \in \mathbb{N} with F(x^+) - F(x^-) > \frac{1}{n}, where F(x^+) := \lim_{x_k \downarrow x} F(x_k) and F(x^-) := \lim_{x_k \uparrow x} F(x_k) (these one-sided limits exist by monotonicity). Express \mathbb{R} = \cup_{m \in \mathbb{Z}} (m, m+1]. Let B_n^m := \{x \in (m, m+1] : F(x^+) - F(x^-) > \frac{1}{n}\}. The set B_n^m must be finite, for otherwise the sum of the jumps in (m, m+1] would be unbounded, contradicting monotonicity of F. Then the countable union \cup_n B_n^m is countable, and finally \cup_m \cup_n B_n^m is also countable.

Exercise

Prove Theorem (that the Haar system is a complete orthonormal sequence in L2([0,1];R)L_2([0,1]; \mathbb{R})).