[org_proj] review and update #217
Changes from 13 commits
@@ -60,7 +60,7 @@ For an advanced treatment of projection in the context of least squares predicti | |
|
|
||
| ## Key Definitions | ||
|
|
||
| Assume $x, z \in \mathbb R^n$. | ||
| Assume $x, z \in \mathbb R^n$. | ||
|
|
||
| Define $\langle x, z\rangle = \sum_i x_i z_i$. | ||
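As a quick numerical sketch of these definitions (the vectors `x` and `z` below are made-up examples, not from the lecture source), the inner product and the induced norm can be checked directly with NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
z = np.array([3.0, 0.0, -1.0])

inner = np.sum(x * z)            # <x, z> = sum_i x_i z_i
norm_x = np.sqrt(np.sum(x * x))  # ||x|| = sqrt(<x, x>)

print(inner)                                   # 0.0, so x and z are orthogonal
print(np.isclose(norm_x, np.linalg.norm(x)))   # True: same value as NumPy's norm
```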
|
|
||
|
|
@@ -86,7 +86,7 @@ The **orthogonal complement** of linear subspace $S \subset \mathbb R^n$ is the | |
|
|
||
| ``` | ||
|
|
||
| $S^\perp$ is a linear subspace of $\mathbb R^n$ | ||
| $S^\perp$ is a linear subspace of $\mathbb R^n$ | ||
|
|
||
| * To see this, fix $x, y \in S^{\perp}$ and $\alpha, \beta \in \mathbb R$. | ||
| * Observe that if $z \in S$, then | ||
|
|
@@ -131,7 +131,10 @@ What vector within a linear subspace of $\mathbb R^n$ best approximates a given | |
|
|
||
| The next theorem answers this question. | ||
|
|
||
| **Theorem** (OPT) Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$, | ||
| ```{prf:theorem} Orthogonal Projection Theorem | ||
| :label: opt | ||
|
|
||
| Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$, | ||
| there exists a unique solution to the minimization problem | ||
|
|
||
| $$ | ||
|
|
@@ -144,6 +147,7 @@ The minimizer $\hat y$ is the unique vector in $\mathbb R^n$ that satisfies | |
| * $y - \hat y \perp S$ | ||
|
|
||
| The vector $\hat y$ is called the **orthogonal projection** of $y$ onto $S$. | ||
| ``` | ||
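A minimal numerical sketch of the theorem's content, assuming the matrix formula for the projection that appears later in the lecture and using made-up data: the projection $\hat y$ is orthogonal to $S$ in the residual sense and is at least as close to $y$ as randomly drawn points of $S$.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 2)        # columns span a 2-dimensional subspace S of R^5
y = np.random.randn(5)

# orthogonal projection of y onto S (formula derived later in the lecture)
y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)

# y - y_hat is orthogonal to every column of X, hence to S
print(np.allclose(X.T @ (y - y_hat), 0))       # True

# no randomly drawn point of S gets closer to y than y_hat
trials = X @ np.random.randn(2, 1000)          # 1000 random elements of S
dists = np.linalg.norm(y[:, None] - trials, axis=0)
print(np.all(np.linalg.norm(y - y_hat) <= dists))   # True
```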
|
|
||
| The next figure provides some intuition | ||
|
|
||
|
|
@@ -179,7 +183,7 @@ $$ | |
| y \in Y\; \mapsto \text{ its orthogonal projection } \hat y \in S | ||
| $$ | ||
|
|
||
| By the OPT, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$. | ||
| By the {prf:ref}`opt`, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$. | ||
|
|
||
| In what follows we denote this operator by a matrix $P$ | ||
|
|
||
|
|
@@ -192,7 +196,7 @@ The operator $P$ is called the **orthogonal projection mapping onto** $S$. | |
|
|
||
| ``` | ||
|
|
||
| It is immediate from the OPT that for any $y \in \mathbb R^n$ | ||
| It is immediate from the {prf:ref}`opt` that for any $y \in \mathbb R^n$ | ||
|
|
||
| 1. $P y \in S$ and | ||
| 1. $y - P y \perp S$ | ||
|
|
@@ -224,16 +228,20 @@ such that $y = x_1 + x_2$. | |
|
|
||
| Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$. | ||
|
|
||
| This amounts to another version of the OPT: | ||
| This amounts to another version of the {prf:ref}`opt`: | ||
|
|
||
| ```{prf:theorem} Orthogonal Projection Theorem (another version) | ||
| :label: opt_another | ||
|
|
||
| **Theorem**. If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then | ||
| If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then | ||
|
|
||
| $$ | ||
| P y \perp M y | ||
| \quad \text{and} \quad | ||
| y = P y + M y | ||
| \quad \text{for all } \, y \in \mathbb R^n | ||
| $$ | ||
| ``` | ||
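For a concrete illustration (with a made-up subspace), take $S$ to be the span of the first two coordinate vectors in $\mathbb R^3$, so $P y$ keeps the first two coordinates and $M y$ keeps the third:

```python
import numpy as np

P = np.diag([1.0, 1.0, 0.0])   # projection onto S = span{e_1, e_2}
M = np.eye(3) - P              # projection onto the orthogonal complement span{e_3}

y = np.array([2.0, -1.0, 5.0])
Py, My = P @ y, M @ y

print(Py, My)                   # [ 2. -1.  0.] and [0. 0. 5.]
print(np.isclose(Py @ My, 0))   # True: P y ⟂ M y
print(np.allclose(Py + My, y))  # True: y = P y + M y
```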
|
|
||
| The next figure illustrates | ||
|
|
||
|
|
@@ -285,7 +293,9 @@ Combining this result with {eq}`pob` verifies the claim. | |
|
|
||
| When a subspace onto which we project is orthonormal, computing the projection simplifies: | ||
|
|
||
| **Theorem** If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then | ||
| ```{prf:theorem} | ||
|
|
||
| If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then | ||
|
|
||
| ```{math} | ||
| :label: exp_for_op | ||
|
|
@@ -294,14 +304,15 @@ P y = \sum_{i=1}^k \langle y, u_i \rangle u_i, | |
| \quad | ||
| \forall \; y \in \mathbb R^n | ||
| ``` | ||
| ``` | ||
|
|
||
| Proof: Fix $y \in \mathbb R^n$ and let $P y$ be defined as in {eq}`exp_for_op`. | ||
| ```{prf:proof} Fix $y \in \mathbb{R}^n$ and let $P y$ be defined as in {eq}`exp_for_op`. | ||
|
|
||
| Clearly, $P y \in S$. | ||
|
|
||
| We claim that $y - P y \perp S$ also holds. | ||
|
|
||
| It sufficies to show that $y - P y \perp$ any basis vector $u_i$. | ||
| It suffices to show that $y - P y \perp u_i$ for any basis vector $u_i$. | ||
|
|
||
| This is true because | ||
|
|
||
|
|
@@ -310,9 +321,11 @@ $$ | |
| = \langle y, u_j \rangle - \sum_{i=1}^k \langle y, u_i \rangle | ||
| \langle u_i, u_j \rangle = 0 | ||
| $$ | ||
| ``` | ||
|
|
||
| (Why is this sufficient to establish the claim that $y - P y \perp S$?) | ||
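Here is a minimal sketch of {eq}`exp_for_op`, assuming a made-up orthonormal basis obtained from NumPy's QR routine:

```python
import numpy as np

np.random.seed(1)
A = np.random.randn(4, 2)
U, _ = np.linalg.qr(A)     # columns u_1, u_2: an orthonormal basis of S = span(A)

y = np.random.randn(4)

# P y = sum_i <y, u_i> u_i
Py = sum((y @ U[:, i]) * U[:, i] for i in range(U.shape[1]))

print(np.allclose(U.T @ (y - Py), 0))  # residual is orthogonal to each u_i, hence to S
print(np.allclose(Py, U @ U.T @ y))    # agrees with the matrix form U U' y
```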
|
|
||
|
|
||
| ## Projection Via Matrix Algebra | ||
|
|
||
| Let $S$ be a linear subspace of $\mathbb R^n$ and let $y \in \mathbb R^n$. | ||
|
|
@@ -323,17 +336,21 @@ $$ | |
| \hat E_S y = P y | ||
| $$ | ||
|
|
||
| Evidently $Py$ is a linear function from $y \in \mathbb R^n$ to $P y \in \mathbb R^n$. | ||
| Evidently the mapping $y \mapsto P y$ is a linear function from $\mathbb R^n$ to $\mathbb R^n$. | ||
|
|
||
| [This reference](https://en.wikipedia.org/wiki/Linear_map#Matrices) is useful. | ||
|
|
||
| **Theorem.** Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then | ||
| ```{prf:theorem} | ||
| :label: proj_matrix | ||
|
|
||
| Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then | ||
|
|
||
| $$ | ||
| P = X (X'X)^{-1} X' | ||
| $$ | ||
| ``` | ||
|
|
||
| Proof: Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that | ||
| ```{prf:proof} Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that | ||
|
||
|
|
||
| 1. $P y \in S$, and | ||
| 2. $y - P y \perp S$ | ||
|
|
@@ -367,18 +384,19 @@ y] | |
| $$ | ||
|
|
||
| The proof is now complete. | ||
| ``` | ||
|
|
||
| ### Starting with the Basis | ||
|
|
||
| It is common in applications to start with $n \times k$ matrix $X$ with linearly independent columns and let | ||
|
|
||
| $$ | ||
| S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\col_1 X, \ldots, \col_k X \} | ||
| S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\mathop{\mathrm{col}}_1 X, \ldots, \mathop{\mathrm{col}}_k X \} | ||
| $$ | ||
|
|
||
| Then the columns of $X$ form a basis of $S$. | ||
|
|
||
| From the preceding theorem, $P = X (X' X)^{-1} X' y$ projects $y$ onto $S$. | ||
| From the {prf:ref}`proj_matrix`, $P y = X (X' X)^{-1} X' y$ projects $y$ onto $S$. | ||
|
|
||
| In this context, $P$ is often called the **projection matrix** | ||
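As a hedged check (the basis matrix `X` and vector `y` below are arbitrary examples), the projection matrix can be formed directly and its defining properties verified numerically:

```python
import numpy as np

np.random.seed(2)
X = np.random.randn(6, 3)             # linearly independent columns, a basis of S
P = X @ np.linalg.inv(X.T @ X) @ X.T  # projection matrix onto S = span(X)

y = np.random.randn(6)

print(np.allclose(P @ P, P))              # idempotent: projecting twice changes nothing
print(np.allclose(P, P.T))                # symmetric
print(np.allclose(X.T @ (y - P @ y), 0))  # y - P y is orthogonal to span(X)
```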
|
|
||
|
|
@@ -388,7 +406,7 @@ In this context, $P$ is often called the **projection matrix** | |
|
|
||
| Suppose that $U$ is $n \times k$ with orthonormal columns. | ||
|
|
||
| Let $u_i := \mathop{\mathrm{col}} U_i$ for each $i$, let $S := \mathop{\mathrm{span}} U$ and let $y \in \mathbb R^n$. | ||
| Let $u_i := \mathop{\mathrm{col}}_i U$ for each $i$, let $S := \mathop{\mathrm{span}} U$ and let $y \in \mathbb R^n$. | ||
|
|
||
| We know that the projection of $y$ onto $S$ is | ||
|
|
||
|
|
@@ -415,7 +433,7 @@ Let $y \in \mathbb R^n$ and let $X$ be $n \times k$ with linearly independent co | |
|
|
||
| Given $X$ and $y$, we seek $b \in \mathbb R^k$ that satisfies the system of linear equations $X b = y$. | ||
|
|
||
| If $n > k$ (more equations than unknowns), then $b$ is said to be **overdetermined**. | ||
| If $n > k$ (more equations than unknowns), then the system is said to be **overdetermined**. | ||
|
|
||
| Intuitively, we may not be able to find a $b$ that satisfies all $n$ equations. | ||
|
|
||
|
|
@@ -428,15 +446,18 @@ By approximate solution, we mean a $b \in \mathbb R^k$ such that $X b$ is close | |
|
|
||
| The next theorem shows that a best approximation is well defined and unique. | ||
|
|
||
| The proof uses the OPT. | ||
| The proof uses the {prf:ref}`opt`. | ||
|
|
||
| ```{prf:theorem} | ||
|
|
||
| **Theorem** The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^K$ is | ||
| The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^k$ is | ||
|
|
||
| $$ | ||
| \hat \beta := (X' X)^{-1} X' y | ||
| $$ | ||
| ``` | ||
|
|
||
| Proof: Note that | ||
| ```{prf:proof} Note that | ||
|
||
|
|
||
| $$ | ||
| X \hat \beta = X (X' X)^{-1} X' y = | ||
|
|
@@ -454,16 +475,17 @@ Because $Xb \in \mathop{\mathrm{span}}(X)$ | |
|
|
||
| $$ | ||
| \| y - X \hat \beta \| | ||
| \leq \| y - X b \| \text{ for any } b \in \mathbb R^K | ||
| \leq \| y - X b \| \text{ for any } b \in \mathbb R^k | ||
| $$ | ||
|
|
||
| This is what we aimed to show. | ||
| ``` | ||
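A brief numerical sketch (with simulated data) comparing the closed-form expression with NumPy's built-in least squares routine:

```python
import numpy as np

np.random.seed(3)
X = np.random.randn(50, 4)
y = np.random.randn(50)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)        # (X'X)^{-1} X' y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # NumPy's minimizer of ||y - Xb||

print(np.allclose(beta_hat, beta_lstsq))            # True
```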
|
|
||
| ## Least Squares Regression | ||
|
|
||
| Let's apply the theory of orthogonal projection to least squares regression. | ||
|
|
||
| This approach provides insights about many geometric properties of linear regression. | ||
| This approach provides insights about many geometric properties of linear regression. | ||
|
|
||
| We treat only some examples. | ||
|
|
||
|
|
@@ -594,9 +616,9 @@ Here are some more standard definitions: | |
|
|
||
| > TSS = ESS + SSR | ||
|
||
|
|
||
| We can prove this easily using the OPT. | ||
| We can prove this easily using the {prf:ref}`opt`. | ||
|
|
||
| From the OPT we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$. | ||
| From the {prf:ref}`opt` we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$. | ||
|
|
||
| Applying the Pythagorean law completes the proof. | ||
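As a quick check with simulated data, assuming the uncentered sums of squares defined above (TSS $= \sum_t y_t^2$, ESS $= \sum_t \hat y_t^2$, SSR $= \sum_t \hat u_t^2$):

```python
import numpy as np

np.random.seed(4)
X = np.random.randn(30, 3)
y = np.random.randn(30)

y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)   # fitted values P y
u_hat = y - y_hat                               # residuals M y

TSS = np.sum(y**2)
ESS = np.sum(y_hat**2)
SSR = np.sum(u_hat**2)

print(np.isclose(TSS, ESS + SSR))   # True, by the Pythagorean law
```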
|
|
||
|
|
@@ -611,7 +633,9 @@ The next section gives details. | |
| (gram_schmidt)= | ||
| ### Gram-Schmidt Orthogonalization | ||
|
|
||
| **Theorem** For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an | ||
| ```{prf:theorem} | ||
|
|
||
| For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an | ||
| orthonormal set $\{u_1, \ldots, u_k\}$ with | ||
|
|
||
| $$ | ||
|
|
@@ -620,6 +644,7 @@ $$ | |
| \quad \text{for} \quad | ||
| i = 1, \ldots, k | ||
| $$ | ||
| ``` | ||
|
|
||
| The **Gram-Schmidt orthogonalization** procedure constructs an orthogonal set $\{ u_1, u_2, \ldots, u_k\}$. | ||
|
|
||
|
|
@@ -639,16 +664,19 @@ In some exercises below, you are asked to implement this algorithm and test it u | |
|
|
||
| The following result uses the preceding algorithm to produce a useful decomposition. | ||
|
|
||
| **Theorem** If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where | ||
| ```{prf:theorem} | ||
|
|
||
| If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where | ||
|
|
||
| * $R$ is $k \times k$, upper triangular, and nonsingular | ||
| * $Q$ is $n \times k$ with orthonormal columns | ||
| ``` | ||
|
|
||
| Proof sketch: Let | ||
| ```{prf:proof} Let | ||
|
|
||
| * $x_j := \col_j (X)$ | ||
| * $x_j := \mathop{\mathrm{col}}_j (X)$ | ||
| * $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram--Schmidt) | ||
| * $Q$ be formed from cols $u_i$ | ||
| * $Q$ be formed from columns $u_i$ | ||
|
|
||
| Since $x_j \in \mathop{\mathrm{span}}\{u_1, \ldots, u_j\}$, we have | ||
|
|
||
|
|
@@ -658,6 +686,7 @@ x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i | |
| $$ | ||
|
|
||
| Some rearranging gives $X = Q R$. | ||
| ``` | ||
|
|
||
| ### Linear Regression via QR Decomposition | ||
|
|
||
|
|
@@ -671,11 +700,12 @@ $$ | |
| \hat \beta | ||
| & = (R'Q' Q R)^{-1} R' Q' y \\ | ||
| & = (R' R)^{-1} R' Q' y \\ | ||
| & = R^{-1} (R')^{-1} R' Q' y | ||
| = R^{-1} Q' y | ||
| & = R^{-1} Q' y | ||
| \end{aligned} | ||
| $$ | ||
|
|
||
| where the last step uses the fact that $(R' R)^{-1} R' = R^{-1}$ since $R$ is nonsingular. | ||
|
|
||
| Numerical routines would in this case use the alternative form $R \hat \beta = Q' y$ and back substitution. | ||
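A hedged sketch of that computation, assuming SciPy is available for the triangular solve and using simulated data:

```python
import numpy as np
from scipy.linalg import solve_triangular

np.random.seed(5)
X = np.random.randn(40, 3)
y = np.random.randn(40)

Q, R = np.linalg.qr(X)                  # reduced QR: Q is 40x3, R is 3x3 upper triangular
beta_qr = solve_triangular(R, Q.T @ y)  # back substitution for R beta = Q'y

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations, for comparison
print(np.allclose(beta_qr, beta_normal))         # True
```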
|
|
||
| ## Exercises | ||
|
|
@@ -788,16 +818,16 @@ def gram_schmidt(X): | |
| U = np.empty((n, k)) | ||
| I = np.eye(n) | ||
|
|
||
| # The first col of U is just the normalized first col of X | ||
| v1 = X[:,0] | ||
| U[:, 0] = v1 / np.sqrt(v1 @ v1) | ||
| # The first column of U is just the normalized first column of X | ||
| v1 = X[:, 0] | ||
| U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1)) | ||
|
|
||
| for i in range(1, k): | ||
| # Set up | ||
| b = X[:, i] # The vector we're going to project | ||
| Z = X[:, 0:i] # First i-1 columns of X | ||
| Z = X[:, :i] # First i columns of X | ||
|
|
||
| # Project onto the orthogonal complement of the col span of Z | ||
| # Project onto the orthogonal complement of the column span of Z | ||
|
||
| M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T | ||
| u = M @ b | ||
|
|
||
|
|
||