Merged
Changes from 8 commits

11 changes: 7 additions & 4 deletions lectures/_static/lecture_specific/orth_proj/orth_proj_thm2.tex
@@ -3,9 +3,11 @@
\usetikzlibrary{arrows.meta, arrows}
\begin{document}

%.. tikz::
\begin{tikzpicture}
[scale=5, axis/.style={<->, >=stealth'}, important line/.style={thick}, dotted line/.style={dotted, thick,red}, dashed line/.style={dashed, thin}, every node/.style={color=black}] \coordinate(O) at (0,0);
[scale=5, axis/.style={<->, >=stealth'}, important line/.style={thick},
dotted line/.style={dotted, thick,red}, dashed line/.style={dashed, thin},
every node/.style={color=black}]
\coordinate(O) at (0,0);
\coordinate (y') at (-0.4,0.1);
\coordinate (Py) at (0.6,0.3);
\coordinate (y) at (0.4,0.7);
@@ -14,11 +16,12 @@
\coordinate (Py') at (-0.28,-0.14);
\draw[axis] (-0.5,0) -- (0.9,0) node(xline)[right] {};
\draw[axis] (0,-0.3) -- (0,0.7) node(yline)[above] {};
\draw[important line, thick] (Z1) -- (O);
\draw[important line, thick] (Py) -- (Z2) node[right] {$S$};
\draw[important line,blue,thick, ->] (O) -- (Py) node[anchor = north west, text width=2em] {$P y$};
\draw[important line,blue, ->] (O) -- (y') node[left] {$y'$};
\draw[important line, thick] (Z1) -- (O) node[right] {};
\draw[important line, thick] (Py) -- (Z2) node[right] {$S$};
\draw[important line, blue,->] (O) -- (y) node[right] {$y$};
\draw[important line,blue,thick, ->] (O) -- (Py');
Member

I built this but it still generates the same old figure, so I updated this line to

    \draw[important line, blue,->]  (O) -- (Py') node[anchor = north west, text width=5em] {$P y'$};

following the previous lines, which give the blue arrow!

\draw[dotted line] (0.54,0.27) -- (0.51,0.33);
\draw[dotted line] (0.57,0.36) -- (0.51,0.33);
\draw[dotted line] (-0.22,-0.11) -- (-0.25,-0.05);
79 changes: 54 additions & 25 deletions lectures/orth_proj.md
@@ -131,7 +131,10 @@ What vector within a linear subspace of $\mathbb R^n$ best approximates a given

The next theorem answers this question.

**Theorem** (OPT) Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$,
```{prf:theorem} Orthogonal Projection Theorem
:label: opt

Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$,
there exists a unique solution to the minimization problem

$$
@@ -144,6 +147,7 @@ The minimizer $\hat y$ is the unique vector in $\mathbb R^n$ that satisfies
* $y - \hat y \perp S$

The vector $\hat y$ is called the **orthogonal projection** of $y$ onto $S$.
```
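
As a quick numerical illustration of the theorem (a minimal sketch, not part of the lecture source: the matrix `X` whose columns span $S$, the vector `y`, and the use of `np.linalg.lstsq` are all illustrative choices), a generic least squares solve recovers the minimizer, and the residual is orthogonal to $S$:

```python
import numpy as np

# Hypothetical example: the subspace S is spanned by the columns of X
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 5.0])

# Minimize ||y - X b|| over b; X b then ranges over all of S
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat            # the minimizer, i.e. the projection of y onto S

print(y_hat)                 # lies in S by construction
print(X.T @ (y - y_hat))     # approximately zero, so y - y_hat is orthogonal to S
```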

The next figure provides some intuition

@@ -179,7 +183,7 @@ $$
y \in Y\; \mapsto \text{ its orthogonal projection } \hat y \in S
$$

By the OPT, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$.
By the {prf:ref}`opt`, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$.

In what follows we denote this operator by a matrix $P$

@@ -192,7 +196,7 @@ The operator $P$ is called the **orthogonal projection mapping onto** $S$.

```

It is immediate from the OPT that for any $y \in \mathbb R^n$
It is immediate from the {prf:ref}`opt` that for any $y \in \mathbb R^n$

1. $P y \in S$ and
1. $y - P y \perp S$
@@ -224,16 +228,20 @@ such that $y = x_1 + x_2$.

Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$.

This amounts to another version of the OPT:
This amounts to another version of the {prf:ref}`opt`:

```{prf:theorem} Orthogonal Projection Theorem (another version)
:label: opt_another

**Theorem**. If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then
If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then

$$
P y \perp M y
\quad \text{and} \quad
y = P y + M y
\quad \text{for all } \, y \in \mathbb R^n
$$
```
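
A small numerical sketch of this decomposition (assuming NumPy; the matrix formula $P = X(X'X)^{-1}X'$ used here for $P$ is only derived later in the lecture, and `X`, `y` are made-up examples):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])              # columns span S (illustrative)
y = np.array([1.0, 2.0, 5.0])

P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection onto S (formula derived below)
M = np.eye(3) - P                       # projection onto the orthogonal complement of S

print(np.allclose(y, P @ y + M @ y))    # y = P y + M y
print(np.isclose((P @ y) @ (M @ y), 0)) # P y is orthogonal to M y
```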

The next figure illustrates

@@ -285,7 +293,9 @@ Combining this result with {eq}`pob` verifies the claim.

When a subspace onto which we project is orthonormal, computing the projection simplifies:

**Theorem** If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then
````{prf:theorem}

If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then

```{math}
:label: exp_for_op
@@ -294,14 +304,15 @@
\quad
\forall \; y \in \mathbb R^n
```
````

Proof: Fix $y \in \mathbb R^n$ and let $P y$ be defined as in {eq}`exp_for_op`.
```{prf:proof} Fix $y \in \mathbb{R}^n$ and let $P y$ be defined as in {eq}`exp_for_op`.

Clearly, $P y \in S$.

We claim that $y - P y \perp S$ also holds.

It sufficies to show that $y - P y \perp$ any basis vector $u_i$.
It suffices to show that $y - P y \perp$ any basis vector $u_i$.

This is true because

@@ -310,9 +321,11 @@ $$
= \langle y, u_j \rangle - \sum_{i=1}^k \langle y, u_i \rangle
\langle u_i, u_j \rangle = 0
$$
```

(Why is this sufficient to establish the claim that $y - P y \perp S$?)
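
Here is a minimal numerical sketch of {eq}`exp_for_op` (the orthonormal basis is obtained from `np.linalg.qr` purely for illustration, and the cross-check uses a generic least squares solve):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 5.0])

U, _ = np.linalg.qr(X)    # columns of U form an orthonormal basis for span(X)

# P y = sum_i <y, u_i> u_i
Py = sum((y @ U[:, i]) * U[:, i] for i in range(U.shape[1]))

# Cross-check: the projection obtained from a direct least squares solve
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(Py, X @ b))    # True
```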


## Projection Via Matrix Algebra

Let $S$ be a linear subspace of $\mathbb R^n$ and let $y \in \mathbb R^n$.
@@ -327,13 +340,17 @@ Evidently $Py$ is a linear function from $y \in \mathbb R^n$ to $P y \in \mathb

[This reference](https://en.wikipedia.org/wiki/Linear_map#Matrices) is useful.

**Theorem.** Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then
```{prf:theorem}
:label: proj_matrix

Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then

$$
P = X (X'X)^{-1} X'
$$
```

Proof: Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that
```{prf:proof} Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that

1. $P y \in S$, and
2. $y - P y \perp S$
@@ -367,18 +384,19 @@ y]
$$

The proof is now complete.
```
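
A short numerical check of the formula (a sketch with a hypothetical basis matrix; the second basis illustrates that $P$ depends only on $S$, not on the particular basis chosen):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 5.0])

P = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(X.T @ (y - P @ y), 0))   # y - P y is orthogonal to S

# A different basis of the same subspace yields the same projection matrix,
# so P depends only on S, not on the basis used to compute it
X2 = X @ np.array([[1.0, 1.0],
                   [0.0, 2.0]])            # invertible change of basis
P2 = X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
print(np.allclose(P, P2))                  # True
```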

### Starting with the Basis

It is common in applications to start with $n \times k$ matrix $X$ with linearly independent columns and let

$$
S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\col_1 X, \ldots, \col_k X \}
S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\mathop{\mathrm{col}}_1 X, \ldots, \mathop{\mathrm{col}}_k X \}
$$

Then the columns of $X$ form a basis of $S$.

From the preceding theorem, $P = X (X' X)^{-1} X' y$ projects $y$ onto $S$.
From the {prf:ref}`proj_matrix`, $P y = X (X' X)^{-1} X' y$ is the projection of $y$ onto $S$.

In this context, $P$ is often called the **projection matrix**

@@ -388,7 +406,7 @@ In this context, $P$ is often called the **projection matrix**

Suppose that $U$ is $n \times k$ with orthonormal columns.

Let $u_i := \mathop{\mathrm{col}} U_i$ for each $i$, let $S := \mathop{\mathrm{span}} U$ and let $y \in \mathbb R^n$.
Let $u_i := \mathop{\mathrm{col}}_i U$ for each $i$, let $S := \mathop{\mathrm{span}} U$ and let $y \in \mathbb R^n$.

We know that the projection of $y$ onto $S$ is

@@ -428,15 +446,18 @@ By approximate solution, we mean a $b \in \mathbb R^k$ such that $X b$ is close

The next theorem shows that a best approximation is well defined and unique.

The proof uses the OPT.
The proof uses the {prf:ref}`opt`.

```{prf:theorem}

**Theorem** The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^K$ is
The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^k$ is

$$
\hat \beta := (X' X)^{-1} X' y
$$
```

Proof: Note that
```{prf:proof} Note that

$$
X \hat \beta = X (X' X)^{-1} X' y =
@@ -458,6 +479,7 @@ $$
$$

This is what we aimed to show.
```
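
As a quick cross-check (a sketch with made-up data), the closed-form expression agrees with a generic least squares routine:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(20, 3)    # made-up regressors
y = np.random.randn(20)       # made-up observations

beta_closed = np.linalg.inv(X.T @ X) @ X.T @ y       # (X'X)^{-1} X' y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # generic minimizer of ||y - X b||

print(np.allclose(beta_closed, beta_lstsq))          # True
```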

## Least Squares Regression

@@ -594,9 +616,9 @@ Here are some more standard definitions:

> TSS = ESS + SSR
Member

I think the more commonly seen definitions of (centered) TSS, ESS, and SSR are given here, where the decomposition only holds when $X$ includes an intercept term.

So, I think it might be clearer if we replace

Here are some more standard definitions:

* The **total sum of squares** is $:=  \| y \|^2$.
* The **sum of squared residuals** is $:= \| \hat u \|^2$.
* The **explained sum of squares** is $:= \| \hat y \|^2$.

> TSS = ESS + SSR

with

Define:

* The (uncentered) **total sum of squares** (TSS) is $:=  \| y \|^2$.
* The (uncentered) **sum of squared residuals** (SSR) is $:= \| \hat u \|^2$.
* The **explained sum of squares** (ESS) is $:= \| \hat y \|^2$.

 We have the relationship:

$$
\text{TSS} = \text{ESS} + \text{SSR}
$$

```{note}
For the centered case, see [here](https://en.wikipedia.org/wiki/Explained_sum_of_squares).
```

Please let me know your thoughts on this!

Contributor Author

Hi Humphrey @HumphreyYang,

Thank you for your review.
Maybe we can put this part into a separate issue and discuss it with John first.

Best,
Longye

Contributor

@HumphreyYang I am not familiar with the differences between centred and uncentred. Most econometrics books I have read focus on ESS, TSS, etc. without specifying.

Contributor

I do think

$$
\text{TSS} = \text{ESS} + \text{SSR}
$$

would be a nice tidy-up addition.

Contributor Author

@mmcky, I'll update this part to tidy it up


We can prove this easily using the OPT.
We can prove this easily using the {prf:ref}`opt`.

From the OPT we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.
From the {prf:ref}`opt` we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.

Applying the Pythagorean law completes the proof.
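
A numerical sketch of this decomposition, using the uncentered definitions above and made-up data:

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(50, 3)
y = np.random.randn(50)

P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y            # fitted values
u_hat = y - y_hat        # residuals

TSS = y @ y
ESS = y_hat @ y_hat
SSR = u_hat @ u_hat
print(np.isclose(TSS, ESS + SSR))    # True
```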

@@ -611,7 +633,9 @@ The next section gives details.
(gram_schmidt)=
### Gram-Schmidt Orthogonalization

**Theorem** For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an
```{prf:theorem}

For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an
orthonormal set $\{u_1, \ldots, u_k\}$ with

$$
@@ -620,6 +644,7 @@ $$
\quad \text{for} \quad
i = 1, \ldots, k
$$
```

The **Gram-Schmidt orthogonalization** procedure constructs an orthogonal set $\{ u_1, u_2, \ldots, u_n\}$.

@@ -639,16 +664,19 @@ In some exercises below, you are asked to implement this algorithm and test it u

The following result uses the preceding algorithm to produce a useful decomposition.

**Theorem** If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where
```{prf:theorem}

If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where

* $R$ is $k \times k$, upper triangular, and nonsingular
* $Q$ is $n \times k$ with orthonormal columns
```

Proof sketch: Let
```{prf:proof} Let

* $x_j := \col_j (X)$
* $x_j := \mathop{\mathrm{col}}_j (X)$
* $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram--Schmidt)
* $Q$ be formed from cols $u_i$
* $Q$ be formed from columns $u_i$

Since $x_j \in \mathop{\mathrm{span}}\{u_1, \ldots, u_j\}$, we have

@@ -658,6 +686,7 @@ x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i
$$

Some rearranging gives $X = Q R$.
```
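
A minimal numerical illustration of the factorization (using NumPy's built-in reduced QR routine rather than a hand-rolled Gram-Schmidt):

```python
import numpy as np

np.random.seed(2)
X = np.random.randn(6, 3)               # columns are linearly independent with probability one

Q, R = np.linalg.qr(X)                  # reduced factorization: Q is 6 x 3, R is 3 x 3

print(np.allclose(X, Q @ R))            # X = Q R
print(np.allclose(Q.T @ Q, np.eye(3)))  # Q has orthonormal columns
print(np.allclose(R, np.triu(R)))       # R is upper triangular
```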

### Linear Regression via QR Decomposition

Expand Down Expand Up @@ -788,7 +817,7 @@ def gram_schmidt(X):
U = np.empty((n, k))
I = np.eye(n)

# The first col of U is just the normalized first col of X
# The first column of U is just the normalized first column of X
v1 = X[:,0]
U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

@@ -797,7 +826,7 @@ def gram_schmidt(X):
b = X[:, i] # The vector we're going to project
Z = X[:, 0:i] # First i-1 columns of X

# Project onto the orthogonal complement of the col span of Z
# Project onto the orthogonal complement of the column span of Z
M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
u = M @ b
