[org_proj] review and update #217
Changes from 13 commits
@@ -60,7 +60,7 @@ For an advanced treatment of projection in the context of least squares predicti | |
|
|
||
| ## Key Definitions | ||
|
|
||
| Assume $x, z \in \mathbb R^n$. | ||
| Assume $x, z \in \mathbb R^n$. | ||
|
|
||
| Define $\langle x, z\rangle = \sum_i x_i z_i$. | ||
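As a quick numerical sketch of these definitions (the vectors `x` and `z` below are made-up examples, not from the lecture source), the inner product and the induced norm can be checked directly with NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
z = np.array([3.0, 0.0, -1.0])

inner = np.sum(x * z)            # <x, z> = sum_i x_i z_i
norm_x = np.sqrt(np.sum(x * x))  # ||x|| = sqrt(<x, x>)

print(inner)                                   # 0.0, so x and z are orthogonal
print(np.isclose(norm_x, np.linalg.norm(x)))   # True: same value as NumPy's norm
```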
|
|
||
|
|
@@ -86,7 +86,7 @@ The **orthogonal complement** of linear subspace $S \subset \mathbb R^n$ is the | |
|
|
||
| ``` | ||
|
|
||
| $S^\perp$ is a linear subspace of $\mathbb R^n$ | ||
| $S^\perp$ is a linear subspace of $\mathbb R^n$ | ||
|
|
||
| * To see this, fix $x, y \in S^{\perp}$ and $\alpha, \beta \in \mathbb R$. | ||
| * Observe that if $z \in S$, then | ||
|
|
@@ -131,7 +131,10 @@ What vector within a linear subspace of $\mathbb R^n$ best approximates a given | |
|
|
||
| The next theorem answers this question. | ||
|
|
||
| **Theorem** (OPT) Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$, | ||
| ```{prf:theorem} Orthogonal Projection Theorem | ||
| :label: opt | ||
|
|
||
| Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$, | ||
| there exists a unique solution to the minimization problem | ||
|
|
||
| $$ | ||
|
|
@@ -144,6 +147,7 @@ The minimizer $\hat y$ is the unique vector in $\mathbb R^n$ that satisfies | |
| * $y - \hat y \perp S$ | ||
|
|
||
| The vector $\hat y$ is called the **orthogonal projection** of $y$ onto $S$. | ||
| ``` | ||
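A minimal numerical sketch of the theorem's content, assuming the matrix formula for the projection that appears later in the lecture and using made-up data: the projection $\hat y$ is orthogonal to $S$ in the residual sense and is at least as close to $y$ as randomly drawn points of $S$.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 2)        # columns span a 2-dimensional subspace S of R^5
y = np.random.randn(5)

# orthogonal projection of y onto S (formula derived later in the lecture)
y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)

# y - y_hat is orthogonal to every column of X, hence to S
print(np.allclose(X.T @ (y - y_hat), 0))       # True

# no randomly drawn point of S gets closer to y than y_hat
trials = X @ np.random.randn(2, 1000)          # 1000 random elements of S
dists = np.linalg.norm(y[:, None] - trials, axis=0)
print(np.all(np.linalg.norm(y - y_hat) <= dists))   # True
```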
|
|
||
| The next figure provides some intuition | ||
|
|
||
|
|
@@ -179,7 +183,7 @@ $$ | |
| y \in Y\; \mapsto \text{ its orthogonal projection } \hat y \in S | ||
| $$ | ||
|
|
||
| By the OPT, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$. | ||
| By the {prf:ref}`opt`, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$. | ||
|
|
||
| In what follows we denote this operator by a matrix $P$ | ||
|
|
||
|
|
@@ -192,7 +196,7 @@ The operator $P$ is called the **orthogonal projection mapping onto** $S$. | |
|
|
||
| ``` | ||
|
|
||
| It is immediate from the OPT that for any $y \in \mathbb R^n$ | ||
| It is immediate from the {prf:ref}`opt` that for any $y \in \mathbb R^n$ | ||
|
|
||
| 1. $P y \in S$ and | ||
| 1. $y - P y \perp S$ | ||
|
|
@@ -224,16 +228,20 @@ such that $y = x_1 + x_2$. | |
|
|
||
| Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$. | ||
|
|
||
| This amounts to another version of the OPT: | ||
| This amounts to another version of the {prf:ref}`opt`: | ||
|
|
||
| ```{prf:theorem} Orthogonal Projection Theorem (another version) | ||
| :label: opt_another | ||
|
|
||
| **Theorem**. If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then | ||
| If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then | ||
|
|
||
| $$ | ||
| P y \perp M y | ||
| \quad \text{and} \quad | ||
| y = P y + M y | ||
| \quad \text{for all } \, y \in \mathbb R^n | ||
| $$ | ||
| ``` | ||
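For a concrete illustration (with a made-up subspace), take $S$ to be the span of the first two coordinate vectors in $\mathbb R^3$, so $P y$ keeps the first two coordinates and $M y$ keeps the third:

```python
import numpy as np

P = np.diag([1.0, 1.0, 0.0])   # projection onto S = span{e_1, e_2}
M = np.eye(3) - P              # projection onto the orthogonal complement span{e_3}

y = np.array([2.0, -1.0, 5.0])
Py, My = P @ y, M @ y

print(Py, My)                   # [ 2. -1.  0.] and [0. 0. 5.]
print(np.isclose(Py @ My, 0))   # True: P y ⟂ M y
print(np.allclose(Py + My, y))  # True: y = P y + M y
```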
|
|
||
| The next figure illustrates | ||
|
|
||
|
|
@@ -285,7 +293,9 @@ Combining this result with {eq}`pob` verifies the claim. | |
|
|
||
| When a subspace onto which we project is orthonormal, computing the projection simplifies: | ||
|
|
||
| **Theorem** If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then | ||
| ```{prf:theorem} | ||
|
|
||
| If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then | ||
|
|
||
| ```{math} | ||
| :label: exp_for_op | ||
|
|
@@ -294,14 +304,15 @@ P y = \sum_{i=1}^k \langle y, u_i \rangle u_i, | |
| \quad | ||
| \forall \; y \in \mathbb R^n | ||
| ``` | ||
| ``` | ||
|
|
||
| Proof: Fix $y \in \mathbb R^n$ and let $P y$ be defined as in {eq}`exp_for_op`. | ||
| ```{prf:proof} Fix $y \in \mathbb{R}^n$ and let $P y$ be defined as in {eq}`exp_for_op`. | ||
|
|
||
| Clearly, $P y \in S$. | ||
|
|
||
| We claim that $y - P y \perp S$ also holds. | ||
|
|
||
| It sufficies to show that $y - P y \perp$ any basis vector $u_i$. | ||
| It suffices to show that $y - P y \perp u_i$ for any basis vector $u_i$. | ||
|
|
||
| This is true because | ||
|
|
||
|
|
@@ -310,9 +321,11 @@ $$ | |
| = \langle y, u_j \rangle - \sum_{i=1}^k \langle y, u_i \rangle | ||
| \langle u_i, u_j \rangle = 0 | ||
| $$ | ||
| ``` | ||
|
|
||
| (Why is this sufficient to establish the claim that $y - P y \perp S$?) | ||
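Here is a minimal sketch of {eq}`exp_for_op`, assuming a made-up orthonormal basis obtained from NumPy's QR routine:

```python
import numpy as np

np.random.seed(1)
A = np.random.randn(4, 2)
U, _ = np.linalg.qr(A)     # columns u_1, u_2: an orthonormal basis of S = span(A)

y = np.random.randn(4)

# P y = sum_i <y, u_i> u_i
Py = sum((y @ U[:, i]) * U[:, i] for i in range(U.shape[1]))

print(np.allclose(U.T @ (y - Py), 0))  # residual is orthogonal to each u_i, hence to S
print(np.allclose(Py, U @ U.T @ y))    # agrees with the matrix form U U' y
```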
|
|
||
|
|
||
| ## Projection Via Matrix Algebra | ||
|
|
||
| Let $S$ be a linear subspace of $\mathbb R^n$ and let $y \in \mathbb R^n$. | ||
|
|
@@ -323,17 +336,21 @@ $$ | |
| \hat E_S y = P y | ||
| $$ | ||
|
|
||
| Evidently $Py$ is a linear function from $y \in \mathbb R^n$ to $P y \in \mathbb R^n$. | ||
| Evidently the mapping $y \mapsto P y$ is a linear function from $\mathbb R^n$ to $\mathbb R^n$. | ||
|
|
||
| [This reference](https://en.wikipedia.org/wiki/Linear_map#Matrices) is useful. | ||
|
|
||
| **Theorem.** Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then | ||
| ```{prf:theorem} | ||
| :label: proj_matrix | ||
|
|
||
| Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then | ||
|
|
||
| $$ | ||
| P = X (X'X)^{-1} X' | ||
| $$ | ||
| ``` | ||
|
|
||
| Proof: Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that | ||
| ```{prf:proof} Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that | ||
|
||
|
|
||
| 1. $P y \in S$, and | ||
| 2. $y - P y \perp S$ | ||
|
|
@@ -367,18 +384,19 @@ y] | |
| $$ | ||
|
|
||
| The proof is now complete. | ||
| ``` | ||
|
|
||
| ### Starting with the Basis | ||
|
|
||
| It is common in applications to start with $n \times k$ matrix $X$ with linearly independent columns and let | ||
|
|
||
| $$ | ||
| S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\col_1 X, \ldots, \col_k X \} | ||
| S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\mathop{\mathrm{col}}_1 X, \ldots, \mathop{\mathrm{col}}_k X \} | ||
| $$ | ||
|
|
||
| Then the columns of $X$ form a basis of $S$. | ||
|
|
||
| From the preceding theorem, $P = X (X' X)^{-1} X' y$ projects $y$ onto $S$. | ||
| From the {prf:ref}`proj_matrix`, $P y = X (X' X)^{-1} X' y$ projects $y$ onto $S$. | ||
|
|
||
| In this context, $P$ is often called the **projection matrix** | ||
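As a hedged check (the basis matrix `X` and vector `y` below are arbitrary examples), the projection matrix can be formed directly and its defining properties verified numerically:

```python
import numpy as np

np.random.seed(2)
X = np.random.randn(6, 3)             # linearly independent columns, a basis of S
P = X @ np.linalg.inv(X.T @ X) @ X.T  # projection matrix onto S = span(X)

y = np.random.randn(6)

print(np.allclose(P @ P, P))              # idempotent: projecting twice changes nothing
print(np.allclose(P, P.T))                # symmetric
print(np.allclose(X.T @ (y - P @ y), 0))  # y - P y is orthogonal to span(X)
```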
|
|
||
|
|
@@ -388,7 +406,7 @@ In this context, $P$ is often called the **projection matrix** | |
|
|
||
| Suppose that $U$ is $n \times k$ with orthonormal columns. | ||
|
|
||
| Let $u_i := \mathop{\mathrm{col}} U_i$ for each $i$, let $S := \mathop{\mathrm{span}} U$ and let $y \in \mathbb R^n$. | ||
| Let $u_i := \mathop{\mathrm{col}}_i U$ for each $i$, let $S := \mathop{\mathrm{span}} U$ and let $y \in \mathbb R^n$. | ||
|
|
||
| We know that the projection of $y$ onto $S$ is | ||
|
|
||
|
|
@@ -415,7 +433,7 @@ Let $y \in \mathbb R^n$ and let $X$ be $n \times k$ with linearly independent co | |
|
|
||
| Given $X$ and $y$, we seek $b \in \mathbb R^k$ that satisfies the system of linear equations $X b = y$. | ||
|
|
||
| If $n > k$ (more equations than unknowns), then $b$ is said to be **overdetermined**. | ||
| If $n > k$ (more equations than unknowns), then the system is said to be **overdetermined**. | ||
|
|
||
| Intuitively, we may not be able to find a $b$ that satisfies all $n$ equations. | ||
|
|
||
|
|
@@ -428,15 +446,18 @@ By approximate solution, we mean a $b \in \mathbb R^k$ such that $X b$ is close | |
|
|
||
| The next theorem shows that a best approximation is well defined and unique. | ||
|
|
||
| The proof uses the OPT. | ||
| The proof uses the {prf:ref}`opt`. | ||
|
|
||
| ```{prf:theorem} | ||
|
|
||
| **Theorem** The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^K$ is | ||
| The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^k$ is | ||
|
|
||
| $$ | ||
| \hat \beta := (X' X)^{-1} X' y | ||
| $$ | ||
| ``` | ||
|
|
||
| Proof: Note that | ||
| ```{prf:proof} Note that | ||
|
||
|
|
||
| $$ | ||
| X \hat \beta = X (X' X)^{-1} X' y = | ||
|
|
@@ -454,16 +475,17 @@ Because $Xb \in \mathop{\mathrm{span}}(X)$ | |
|
|
||
| $$ | ||
| \| y - X \hat \beta \| | ||
| \leq \| y - X b \| \text{ for any } b \in \mathbb R^K | ||
| \leq \| y - X b \| \text{ for any } b \in \mathbb R^k | ||
| $$ | ||
|
|
||
| This is what we aimed to show. | ||
| ``` | ||
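A brief numerical sketch (with simulated data) comparing the closed-form expression with NumPy's built-in least squares routine:

```python
import numpy as np

np.random.seed(3)
X = np.random.randn(50, 4)
y = np.random.randn(50)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)        # (X'X)^{-1} X' y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # NumPy's minimizer of ||y - Xb||

print(np.allclose(beta_hat, beta_lstsq))            # True
```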
|
|
||
| ## Least Squares Regression | ||
|
|
||
| Let's apply the theory of orthogonal projection to least squares regression. | ||
|
|
||
| This approach provides insights about many geometric properties of linear regression. | ||
| This approach provides insights about many geometric properties of linear regression. | ||
|
|
||
| We treat only some examples. | ||
|
|
||
|
|
@@ -594,9 +616,9 @@ Here are some more standard definitions: | |
|
|
||
| > TSS = ESS + SSR | ||
|
||
|
|
||
| We can prove this easily using the OPT. | ||
| We can prove this easily using the {prf:ref}`opt`. | ||
|
|
||
| From the OPT we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$. | ||
| From the {prf:ref}`opt` we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$. | ||
|
|
||
| Applying the Pythagorean law completes the proof. | ||
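As a quick check with simulated data, assuming the uncentered sums of squares defined above (TSS $= \sum_t y_t^2$, ESS $= \sum_t \hat y_t^2$, SSR $= \sum_t \hat u_t^2$):

```python
import numpy as np

np.random.seed(4)
X = np.random.randn(30, 3)
y = np.random.randn(30)

y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)   # fitted values P y
u_hat = y - y_hat                               # residuals M y

TSS = np.sum(y**2)
ESS = np.sum(y_hat**2)
SSR = np.sum(u_hat**2)

print(np.isclose(TSS, ESS + SSR))   # True, by the Pythagorean law
```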
|
|
||
|
|
@@ -611,7 +633,9 @@ The next section gives details. | |
| (gram_schmidt)= | ||
| ### Gram-Schmidt Orthogonalization | ||
|
|
||
| **Theorem** For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an | ||
| ```{prf:theorem} | ||
|
|
||
| For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an | ||
| orthonormal set $\{u_1, \ldots, u_k\}$ with | ||
|
|
||
| $$ | ||
|
|
@@ -620,6 +644,7 @@ $$ | |
| \quad \text{for} \quad | ||
| i = 1, \ldots, k | ||
| $$ | ||
| ``` | ||
|
|
||
| The **Gram-Schmidt orthogonalization** procedure constructs an orthogonal set $\{ u_1, u_2, \ldots, u_k\}$. | ||
|
|
||
|
|
@@ -639,16 +664,19 @@ In some exercises below, you are asked to implement this algorithm and test it u | |
|
|
||
| The following result uses the preceding algorithm to produce a useful decomposition. | ||
|
|
||
| **Theorem** If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where | ||
| ```{prf:theorem} | ||
|
|
||
| If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where | ||
|
|
||
| * $R$ is $k \times k$, upper triangular, and nonsingular | ||
| * $Q$ is $n \times k$ with orthonormal columns | ||
| ``` | ||
|
|
||
| Proof sketch: Let | ||
| ```{prf:proof} Let | ||
|
|
||
| * $x_j := \col_j (X)$ | ||
| * $x_j := \mathop{\mathrm{col}}_j (X)$ | ||
| * $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram--Schmidt) | ||
| * $Q$ be formed from cols $u_i$ | ||
| * $Q$ be formed from columns $u_i$ | ||
|
|
||
| Since $x_j \in \mathop{\mathrm{span}}\{u_1, \ldots, u_j\}$, we have | ||
|
|
||
|
|
@@ -658,6 +686,7 @@ x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i | |
| $$ | ||
|
|
||
| Some rearranging gives $X = Q R$. | ||
| ``` | ||
|
|
||
| ### Linear Regression via QR Decomposition | ||
|
|
||
|
|
@@ -671,11 +700,12 @@ $$ | |
| \hat \beta | ||
| & = (R'Q' Q R)^{-1} R' Q' y \\ | ||
| & = (R' R)^{-1} R' Q' y \\ | ||
| & = R^{-1} (R')^{-1} R' Q' y | ||
| = R^{-1} Q' y | ||
| & = R^{-1} Q' y | ||
| \end{aligned} | ||
| $$ | ||
|
|
||
| where the last step uses the fact that $(R' R)^{-1} R' = R^{-1}$ since $R$ is nonsingular. | ||
|
|
||
| Numerical routines would in this case use the alternative form $R \hat \beta = Q' y$ and back substitution. | ||
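A hedged sketch of that computation, assuming SciPy is available for the triangular solve and using simulated data:

```python
import numpy as np
from scipy.linalg import solve_triangular

np.random.seed(5)
X = np.random.randn(40, 3)
y = np.random.randn(40)

Q, R = np.linalg.qr(X)                  # reduced QR: Q is 40x3, R is 3x3 upper triangular
beta_qr = solve_triangular(R, Q.T @ y)  # back substitution for R beta = Q'y

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations, for comparison
print(np.allclose(beta_qr, beta_normal))         # True
```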
|
|
||
| ## Exercises | ||
|
|
@@ -788,16 +818,16 @@ def gram_schmidt(X): | |
| U = np.empty((n, k)) | ||
| I = np.eye(n) | ||
|
|
||
| # The first col of U is just the normalized first col of X | ||
| v1 = X[:,0] | ||
| U[:, 0] = v1 / np.sqrt(v1 @ v1) | ||
| # The first column of U is just the normalized first column of X | ||
| v1 = X[:, 0] | ||
| U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1)) | ||
|
|
||
| for i in range(1, k): | ||
| # Set up | ||
| b = X[:, i] # The vector we're going to project | ||
| Z = X[:, 0:i] # First i-1 columns of X | ||
| Z = X[:, :i] # First i columns of X | ||
|
|
||
| # Project onto the orthogonal complement of the col span of Z | ||
| # Project onto the orthogonal complement of the column span of Z | ||
|
||
| M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T | ||
| u = M @ b | ||
|
|
||
|
|
||