[org_proj] review and update #217
@@ -131,7 +131,10 @@ What vector within a linear subspace of $\mathbb R^n$ best approximates a given

The next theorem answers this question.

```{prf:theorem} Orthogonal Projection Theorem
:label: opt

Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$,
there exists a unique solution to the minimization problem

$$
\min_{z \in S} \| y - z \|
$$

The minimizer $\hat y$ is the unique vector in $\mathbb R^n$ that satisfies

* $\hat y \in S$
* $y - \hat y \perp S$

The vector $\hat y$ is called the **orthogonal projection** of $y$ onto $S$.
```
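The two conditions in the theorem are easy to check numerically. Here is a minimal sketch, assuming $S$ is spanned by the columns of a random matrix `X` and using the matrix formula for the projection that is derived later in the lecture:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 2)     # columns span a 2-dimensional subspace S of R^5
y = np.random.randn(5)

# Orthogonal projection of y onto S (formula derived later in the lecture)
P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y

# y - y_hat is orthogonal to S: zero inner product with each basis vector
assert np.allclose(X.T @ (y - y_hat), 0)

# y_hat minimizes distance: any other point of S is weakly farther from y
for _ in range(100):
    z = X @ np.random.randn(2)
    assert np.linalg.norm(y - z) >= np.linalg.norm(y - y_hat)
```

The loop only spot-checks minimality at random points of $S$; the theorem is what guarantees it globally.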

The next figure provides some intuition

@@ -179,7 +183,7 @@ $$
y \in Y\; \mapsto \text{ its orthogonal projection } \hat y \in S
$$

By the {prf:ref}`opt`, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$.

In what follows we denote this operator by a matrix $P$
@@ -192,7 +196,7 @@ The operator $P$ is called the **orthogonal projection mapping onto** $S$.

It is immediate from the {prf:ref}`opt` that for any $y \in \mathbb R^n$

1. $P y \in S$ and
1. $y - P y \perp S$
@@ -224,16 +228,20 @@ such that $y = x_1 + x_2$.

Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$.

This amounts to another version of the {prf:ref}`opt`:

```{prf:theorem} Orthogonal Projection Theorem (another version)
:label: opt_another

If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then

$$
P y \perp M y
\quad \text{and} \quad
y = P y + M y
\quad \text{for all } \, y \in \mathbb R^n
$$
```
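This decomposition can also be checked numerically. A sketch, assuming the complement projection is $M := I - P$ with $P$ built from a random basis of $S$:

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(6, 3)                      # columns span S
P = X @ np.linalg.inv(X.T @ X) @ X.T           # projection onto S
M = np.eye(6) - P                              # projection onto the orthogonal complement of S

y = np.random.randn(6)

# The two pieces are orthogonal and recover y exactly
assert np.allclose(P @ y + M @ y, y)
assert abs((P @ y) @ (M @ y)) < 1e-10
```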

The next figure illustrates

@@ -285,7 +293,9 @@ Combining this result with {eq}`pob` verifies the claim.

When a subspace onto which we project is orthonormal, computing the projection simplifies:

````{prf:theorem}

If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then

```{math}
:label: exp_for_op

P y = \sum_{i=1}^k \langle y, u_i \rangle u_i,
\quad
\forall \; y \in \mathbb R^n
```
````

```{prf:proof}
Fix $y \in \mathbb{R}^n$ and let $P y$ be defined as in {eq}`exp_for_op`.

Clearly, $P y \in S$.

We claim that $y - P y \perp S$ also holds.

It suffices to show that $y - P y \perp$ any basis vector $u_i$.

This is true because

$$
\langle y - P y, u_j \rangle
= \langle y, u_j \rangle - \sum_{i=1}^k \langle y, u_i \rangle
\langle u_i, u_j \rangle = 0
$$
```

(Why is this sufficient to establish the claim that $y - P y \perp S$?)
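As a numerical check of {eq}`exp_for_op`, we can build an orthonormal basis for a random subspace (here via `np.linalg.qr`, an assumption of convenience) and compare the sum of inner products against the projection computed from the original, non-orthonormal basis:

```python
import numpy as np

np.random.seed(2)
X = np.random.randn(5, 2)
U, _ = np.linalg.qr(X)   # columns of U: an orthonormal basis for S = span(X)
y = np.random.randn(5)

# Right-hand side of the formula: sum of <y, u_i> u_i
Py_formula = sum((y @ U[:, i]) * U[:, i] for i in range(U.shape[1]))

# Same projection computed from the (non-orthonormal) basis X
P = X @ np.linalg.inv(X.T @ X) @ X.T
assert np.allclose(Py_formula, P @ y)
```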

## Projection Via Matrix Algebra

Let $S$ be a linear subspace of $\mathbb R^n$ and let $y \in \mathbb R^n$.

@@ -327,13 +340,17 @@ Evidently $Py$ is a linear function from $y \in \mathbb R^n$ to $P y \in \mathbb R^n$.

[This reference](https://en.wikipedia.org/wiki/Linear_map#Matrices) is useful.

```{prf:theorem}
:label: proj_matrix

Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then

$$
P = X (X'X)^{-1} X'
$$
```

```{prf:proof}
Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that

1. $P y \in S$, and
2. $y - P y \perp S$

@@ -367,18 +384,19 @@ y]

The proof is now complete.
```
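The matrix $P = X (X'X)^{-1} X'$ also has the standard algebraic properties of an orthogonal projection matrix (well-known facts, stated here as a sketch rather than part of the proof):

```python
import numpy as np

np.random.seed(3)
X = np.random.randn(7, 3)
P = X @ np.linalg.inv(X.T @ X) @ X.T

# Known properties of an orthogonal projection matrix
assert np.allclose(P @ P, P)             # idempotent: projecting twice changes nothing
assert np.allclose(P, P.T)               # symmetric
assert np.linalg.matrix_rank(P) == 3     # rank equals dim(S)
```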

### Starting with the Basis

It is common in applications to start with $n \times k$ matrix $X$ with linearly independent columns and let

$$
S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\mathop{\mathrm{col}}_1 X, \ldots, \mathop{\mathrm{col}}_k X \}
$$

Then the columns of $X$ form a basis of $S$.

From the {prf:ref}`proj_matrix`, $P = X (X' X)^{-1} X'$ projects $y$ onto $S$.

In this context, $P$ is often called the **projection matrix**

@@ -388,7 +406,7 @@

Suppose that $U$ is $n \times k$ with orthonormal columns.

Let $u_i := \mathop{\mathrm{col}}_i U$ for each $i$, let $S := \mathop{\mathrm{span}} U$ and let $y \in \mathbb R^n$.

We know that the projection of $y$ onto $S$ is

$$
\hat y = U (U' U)^{-1} U' y = U U' y
$$
@@ -428,15 +446,18 @@ By approximate solution, we mean a $b \in \mathbb R^k$ such that $X b$ is close

The next theorem shows that a best approximation is well defined and unique.

The proof uses the {prf:ref}`opt`.

```{prf:theorem}

The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^k$ is

$$
\hat \beta := (X' X)^{-1} X' y
$$
```

```{prf:proof}
Note that

$$
X \hat \beta = X (X' X)^{-1} X' y = P y
$$

@@ -458,6 +479,7 @@ $$

This is what we aimed to show.
```
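The closed form $\hat \beta = (X'X)^{-1} X' y$ can be checked against a library least-squares solver. A sketch using random data:

```python
import numpy as np

np.random.seed(4)
X = np.random.randn(50, 3)
y = np.random.randn(50)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y          # closed form from the theorem
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # library least-squares solver

assert np.allclose(beta_hat, beta_lstsq)
```

In practice `lstsq` (or the QR route discussed below) is preferred numerically over forming $(X'X)^{-1}$ explicitly.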

## Least Squares Regression

@@ -594,9 +616,9 @@ Here are some more standard definitions:

> TSS = ESS + SSR

We can prove this easily using the {prf:ref}`opt`.

From the {prf:ref}`opt` we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.

Applying the Pythagorean law completes the proof.
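A numerical sketch of this identity, assuming the uncentered definitions $TSS = \sum y_i^2$, $ESS = \sum \hat y_i^2$, $SSR = \sum \hat u_i^2$ under which the Pythagorean argument applies directly:

```python
import numpy as np

np.random.seed(5)
X = np.random.randn(30, 2)
y = np.random.randn(30)

beta = np.linalg.inv(X.T @ X) @ X.T @ y
y_hat = X @ beta          # fitted values: projection of y onto the column space of X
u_hat = y - y_hat         # residuals, orthogonal to y_hat

TSS = np.sum(y**2)        # uncentered sums of squares (assumed definitions)
ESS = np.sum(y_hat**2)
SSR = np.sum(u_hat**2)

assert np.isclose(TSS, ESS + SSR)   # the Pythagorean law
assert abs(y_hat @ u_hat) < 1e-8    # orthogonality from the OPT
```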

@@ -611,7 +633,9 @@ The next section gives details.

(gram_schmidt)=
### Gram-Schmidt Orthogonalization

```{prf:theorem}

For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an
orthonormal set $\{u_1, \ldots, u_k\}$ with

$$
\mathop{\mathrm{span}} \{x_1, \ldots, x_i\} = \mathop{\mathrm{span}} \{u_1, \ldots, u_i\}
\quad \text{for} \quad
i = 1, \ldots, k
$$
```

The **Gram-Schmidt orthogonalization** procedure constructs an orthonormal set $\{ u_1, u_2, \ldots, u_k\}$ with this property.

@@ -639,16 +664,19 @@ In some exercises below, you are asked to implement this algorithm and test it u

The following result uses the preceding algorithm to produce a useful decomposition.

```{prf:theorem}

If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where

* $R$ is $k \times k$, upper triangular, and nonsingular
* $Q$ is $n \times k$ with orthonormal columns
```

```{prf:proof}
Let

* $x_j := \mathop{\mathrm{col}}_j (X)$
* $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram-Schmidt)
* $Q$ be formed from columns $u_i$

Since $x_j \in \mathop{\mathrm{span}}\{u_1, \ldots, u_j\}$, we have

@@ -658,6 +686,7 @@

$$
x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i
$$

Some rearranging gives $X = Q R$.
```
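The claimed properties of the factorization are easy to verify with NumPy's built-in (reduced) QR routine:

```python
import numpy as np

np.random.seed(6)
X = np.random.randn(5, 3)
Q, R = np.linalg.qr(X)    # reduced QR: Q is 5x3, R is 3x3

assert np.allclose(Q @ R, X)             # the factorization X = QR
assert np.allclose(Q.T @ Q, np.eye(3))   # orthonormal columns
assert np.allclose(R, np.triu(R))        # upper triangular
assert abs(np.linalg.det(R)) > 1e-10     # nonsingular
```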

### Linear Regression via QR Decomposition
@@ -788,7 +817,7 @@ def gram_schmidt(X):

    U = np.empty((n, k))
    I = np.eye(n)

    # The first column of U is just the normalized first column of X
    v1 = X[:, 0]
    U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))
@@ -797,7 +826,7 @@ def gram_schmidt(X):

    b = X[:, i]       # The vector we're going to project
    Z = X[:, 0:i]     # First i columns of X

    # Project onto the orthogonal complement of the column span of Z
    M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
    u = M @ b
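The hunks above show only fragments of `gram_schmidt`. Assembled into a runnable whole, it might look as follows; the loop bounds and the final normalization step are assumptions filled in from context, not lines shown in the diff:

```python
import numpy as np

def gram_schmidt(X):
    """Return U whose columns are an orthonormal basis for the
    column span of X (columns of X assumed linearly independent)."""
    n, k = X.shape
    U = np.empty((n, k))
    I = np.eye(n)

    # The first column of U is just the normalized first column of X
    v1 = X[:, 0]
    U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

    for i in range(1, k):
        b = X[:, i]       # The vector we're going to project
        Z = X[:, 0:i]     # First i columns of X

        # Project onto the orthogonal complement of the column span of Z
        M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
        u = M @ b
        U[:, i] = u / np.sqrt(np.sum(u * u))   # normalize (assumed step)

    return U
```

A quick check: for a random full-column-rank `X`, the output should have orthonormal columns and the same column span as `X`.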

> Review comment: I built this but it still generates the same old figure, so I updated this line following the previous lines, which gives the blue arrow!