Merged
Changes from 8 commits

11 changes: 7 additions & 4 deletions lectures/_static/lecture_specific/orth_proj/orth_proj_thm2.tex
@@ -3,9 +3,11 @@
\usetikzlibrary{arrows.meta, arrows}
\begin{document}

%.. tikz::
\begin{tikzpicture}
[scale=5, axis/.style={<->, >=stealth'}, important line/.style={thick}, dotted line/.style={dotted, thick,red}, dashed line/.style={dashed, thin}, every node/.style={color=black}] \coordinate(O) at (0,0);
[scale=5, axis/.style={<->, >=stealth'}, important line/.style={thick},
dotted line/.style={dotted, thick,red}, dashed line/.style={dashed, thin},
every node/.style={color=black}]
\coordinate(O) at (0,0);
\coordinate (y') at (-0.4,0.1);
\coordinate (Py) at (0.6,0.3);
\coordinate (y) at (0.4,0.7);
@@ -14,11 +16,12 @@
\coordinate (Py') at (-0.28,-0.14);
\draw[axis] (-0.5,0) -- (0.9,0) node(xline)[right] {};
\draw[axis] (0,-0.3) -- (0,0.7) node(yline)[above] {};
\draw[important line, thick] (Z1) -- (O);
\draw[important line, thick] (Py) -- (Z2) node[right] {$S$};
\draw[important line,blue,thick, ->] (O) -- (Py) node[anchor = north west, text width=2em] {$P y$};
\draw[important line,blue, ->] (O) -- (y') node[left] {$y'$};
\draw[important line, thick] (Z1) -- (O) node[right] {};
\draw[important line, thick] (Py) -- (Z2) node[right] {$S$};
\draw[important line, blue,->] (O) -- (y) node[right] {$y$};
\draw[important line,blue,thick, ->] (O) -- (Py');
Member

I built this but it still generates the same old figure, so I updated this line to

    \draw[important line, blue,->]  (O) -- (Py') node[anchor = north west, text width=5em] {$P y'$};

following the previous lines, which give the blue arrow!

\draw[dotted line] (0.54,0.27) -- (0.51,0.33);
\draw[dotted line] (0.57,0.36) -- (0.51,0.33);
\draw[dotted line] (-0.22,-0.11) -- (-0.25,-0.05);
79 changes: 54 additions & 25 deletions lectures/orth_proj.md
@@ -131,7 +131,10 @@ What vector within a linear subspace of $\mathbb R^n$ best approximates a given

The next theorem answers this question.

**Theorem** (OPT) Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$,
```{prf:theorem} Orthogonal Projection Theorem
:label: opt

Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$,
there exists a unique solution to the minimization problem

$$
@@ -144,6 +147,7 @@ The minimizer $\hat y$ is the unique vector in $\mathbb R^n$ that satisfies
* $y - \hat y \perp S$

The vector $\hat y$ is called the **orthogonal projection** of $y$ onto $S$.
```
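
As a quick numerical illustration of the theorem (a minimal sketch, not part of the lecture source: the matrix `X` whose columns span $S$, the vector `y`, and the use of `np.linalg.lstsq` are all illustrative choices), a generic least squares solve recovers the minimizer, and the residual is orthogonal to $S$:

```python
import numpy as np

# Hypothetical example: the subspace S is spanned by the columns of X
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 5.0])

# Minimize ||y - X b|| over b; X b then ranges over all of S
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat            # the minimizer, i.e. the projection of y onto S

print(y_hat)                 # lies in S by construction
print(X.T @ (y - y_hat))     # approximately zero, so y - y_hat is orthogonal to S
```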

The next figure provides some intuition

@@ -179,7 +183,7 @@ $$
y \in Y\; \mapsto \text{ its orthogonal projection } \hat y \in S
$$

By the OPT, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$.
By the {prf:ref}`opt`, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$.

In what follows we denote this operator by a matrix $P$

@@ -192,7 +196,7 @@ The operator $P$ is called the **orthogonal projection mapping onto** $S$.

```

It is immediate from the OPT that for any $y \in \mathbb R^n$
It is immediate from the {prf:ref}`opt` that for any $y \in \mathbb R^n$

1. $P y \in S$ and
1. $y - P y \perp S$
@@ -224,16 +228,20 @@ such that $y = x_1 + x_2$.

Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$.

This amounts to another version of the OPT:
This amounts to another version of the {prf:ref}`opt`:

```{prf:theorem} Orthogonal Projection Theorem (another version)
:label: opt_another

**Theorem**. If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then
If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then

$$
P y \perp M y
\quad \text{and} \quad
y = P y + M y
\quad \text{for all } \, y \in \mathbb R^n
$$
```
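
A small numerical sketch of this decomposition (assuming NumPy; the matrix formula $P = X(X'X)^{-1}X'$ used here for $P$ is only derived later in the lecture, and `X`, `y` are made-up examples):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])              # columns span S (illustrative)
y = np.array([1.0, 2.0, 5.0])

P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection onto S (formula derived below)
M = np.eye(3) - P                       # projection onto the orthogonal complement of S

print(np.allclose(y, P @ y + M @ y))    # y = P y + M y
print(np.isclose((P @ y) @ (M @ y), 0)) # P y is orthogonal to M y
```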

The next figure illustrates

@@ -285,7 +293,9 @@ Combining this result with {eq}`pob` verifies the claim.

When a subspace onto which we project is orthonormal, computing the projection simplifies:

**Theorem** If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then
````{prf:theorem}

If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then

```{math}
:label: exp_for_op
@@ -294,14 +304,15 @@
\quad
\forall \; y \in \mathbb R^n
```
````

Proof: Fix $y \in \mathbb R^n$ and let $P y$ be defined as in {eq}`exp_for_op`.
```{prf:proof} Fix $y \in \mathbb{R}^n$ and let $P y$ be defined as in {eq}`exp_for_op`.

Clearly, $P y \in S$.

We claim that $y - P y \perp S$ also holds.

It sufficies to show that $y - P y \perp$ any basis vector $u_i$.
It suffices to show that $y - P y \perp$ any basis vector $u_i$.

This is true because

@@ -310,9 +321,11 @@ $$
= \langle y, u_j \rangle - \sum_{i=1}^k \langle y, u_i \rangle
\langle u_i, u_j \rangle = 0
$$
```

(Why is this sufficient to establish the claim that $y - P y \perp S$?)
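
Here is a minimal numerical sketch of {eq}`exp_for_op` (the orthonormal basis is obtained from `np.linalg.qr` purely for illustration, and the cross-check uses a generic least squares solve):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 5.0])

U, _ = np.linalg.qr(X)    # columns of U form an orthonormal basis for span(X)

# P y = sum_i <y, u_i> u_i
Py = sum((y @ U[:, i]) * U[:, i] for i in range(U.shape[1]))

# Cross-check: the projection obtained from a direct least squares solve
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(Py, X @ b))    # True
```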


## Projection Via Matrix Algebra

Let $S$ be a linear subspace of $\mathbb R^n$ and let $y \in \mathbb R^n$.
@@ -327,13 +340,17 @@ Evidently $Py$ is a linear function from $y \in \mathbb R^n$ to $P y \in \mathb

[This reference](https://en.wikipedia.org/wiki/Linear_map#Matrices) is useful.

**Theorem.** Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then
```{prf:theorem}
:label: proj_matrix

Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then

$$
P = X (X'X)^{-1} X'
$$
```

Proof: Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that
```{prf:proof} Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that

1. $P y \in S$, and
2. $y - P y \perp S$
@@ -367,18 +384,19 @@ y]
$$

The proof is now complete.
```
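
A short numerical check of the formula (a sketch with a hypothetical basis matrix; the second basis illustrates that $P$ depends only on $S$, not on the particular basis chosen):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 5.0])

P = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(X.T @ (y - P @ y), 0))   # y - P y is orthogonal to S

# A different basis of the same subspace yields the same projection matrix,
# so P depends only on S, not on the basis used to compute it
X2 = X @ np.array([[1.0, 1.0],
                   [0.0, 2.0]])            # invertible change of basis
P2 = X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
print(np.allclose(P, P2))                  # True
```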

### Starting with the Basis

It is common in applications to start with $n \times k$ matrix $X$ with linearly independent columns and let

$$
S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\col_1 X, \ldots, \col_k X \}
S := \mathop{\mathrm{span}} X := \mathop{\mathrm{span}} \{\mathop{\mathrm{col}}_1 X, \ldots, \mathop{\mathrm{col}}_k X \}
$$

Then the columns of $X$ form a basis of $S$.

From the preceding theorem, $P = X (X' X)^{-1} X' y$ projects $y$ onto $S$.
From the {prf:ref}`proj_matrix`, $P y = X (X' X)^{-1} X' y$ is the projection of $y$ onto $S$.

In this context, $P$ is often called the **projection matrix**

@@ -388,7 +406,7 @@ In this context, $P$ is often called the **projection matrix**

Suppose that $U$ is $n \times k$ with orthonormal columns.

Let $u_i := \mathop{\mathrm{col}} U_i$ for each $i$, let $S := \mathop{\mathrm{span}} U$ and let $y \in \mathbb R^n$.
Let $u_i := \mathop{\mathrm{col}}_i U$ for each $i$, let $S := \mathop{\mathrm{span}} U$ and let $y \in \mathbb R^n$.

We know that the projection of $y$ onto $S$ is

@@ -428,15 +446,18 @@ By approximate solution, we mean a $b \in \mathbb R^k$ such that $X b$ is close

The next theorem shows that a best approximation is well defined and unique.

The proof uses the OPT.
The proof uses the {prf:ref}`opt`.

```{prf:theorem}

**Theorem** The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^K$ is
The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^k$ is

$$
\hat \beta := (X' X)^{-1} X' y
$$
```

Proof: Note that
```{prf:proof} Note that

$$
X \hat \beta = X (X' X)^{-1} X' y =
@@ -458,6 +479,7 @@ $$
$$

This is what we aimed to show.
```
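
As a quick cross-check (a sketch with made-up data), the closed-form expression agrees with a generic least squares routine:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(20, 3)    # made-up regressors
y = np.random.randn(20)       # made-up observations

beta_closed = np.linalg.inv(X.T @ X) @ X.T @ y       # (X'X)^{-1} X' y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # generic minimizer of ||y - X b||

print(np.allclose(beta_closed, beta_lstsq))          # True
```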

## Least Squares Regression

@@ -594,9 +616,9 @@ Here are some more standard definitions:

> TSS = ESS + SSR
Member

I think the more commonly seen definitions of (centered) TSS, ESS, and SSR are given here, where the decomposition only holds when $X$ includes an intercept term.

So, I think it might be clearer if we replace

Here are some more standard definitions:

* The **total sum of squares** is $:=  \| y \|^2$.
* The **sum of squared residuals** is $:= \| \hat u \|^2$.
* The **explained sum of squares** is $:= \| \hat y \|^2$.

> TSS = ESS + SSR

with

Define:

* The (uncentered) **total sum of squares** (TSS) is $:=  \| y \|^2$.
* The (uncentered) **sum of squared residuals** (SSR) is $:= \| \hat u \|^2$.
* The **explained sum of squares** (ESS) is $:= \| \hat y \|^2$.

 We have the relationship:

$$
\text{TSS} = \text{ESS} + \text{SSR}
$$

```{note}
For the centered case, see [here](https://en.wikipedia.org/wiki/Explained_sum_of_squares).
```

Please let me know your thoughts on this!

Contributor Author

Hi Humphrey @HumphreyYang,

Thank you for your review.
Maybe we can put this part into a separate issue and discuss it with John first.

Best,
Longye

Contributor

@HumphreyYang I am not familiar with the differences between centred and uncentred. Most econometrics books I have read focus on ESS, TSS, etc. without specifying.

Contributor

I do think

$$
\text{TSS} = \text{ESS} + \text{SSR}
$$

would be a nice tidy-up addition.

Contributor Author

@mmcky, I'll update this part to tidy it up


We can prove this easily using the OPT.
We can prove this easily using the {prf:ref}`opt`.

From the OPT we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.
From the {prf:ref}`opt` we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.

Applying the Pythagorean law completes the proof.
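
A numerical sketch of this decomposition, using the uncentered definitions above and made-up data:

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(50, 3)
y = np.random.randn(50)

P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y            # fitted values
u_hat = y - y_hat        # residuals

TSS = y @ y
ESS = y_hat @ y_hat
SSR = u_hat @ u_hat
print(np.isclose(TSS, ESS + SSR))    # True
```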

@@ -611,7 +633,9 @@ The next section gives details.
(gram_schmidt)=
### Gram-Schmidt Orthogonalization

**Theorem** For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an
```{prf:theorem}

For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb R^n$, there exists an
orthonormal set $\{u_1, \ldots, u_k\}$ with

$$
@@ -620,6 +644,7 @@ $$
\quad \text{for} \quad
i = 1, \ldots, k
$$
```

The **Gram-Schmidt orthogonalization** procedure constructs an orthogonal set $\{ u_1, u_2, \ldots, u_n\}$.

@@ -639,16 +664,19 @@ In some exercises below, you are asked to implement this algorithm and test it u

The following result uses the preceding algorithm to produce a useful decomposition.

**Theorem** If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where
```{prf:theorem}

If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where

* $R$ is $k \times k$, upper triangular, and nonsingular
* $Q$ is $n \times k$ with orthonormal columns
```

Proof sketch: Let
```{prf:proof} Let

* $x_j := \col_j (X)$
* $x_j := \mathop{\mathrm{col}}_j (X)$
* $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram--Schmidt)
* $Q$ be formed from cols $u_i$
* $Q$ be formed from columns $u_i$

Since $x_j \in \mathop{\mathrm{span}}\{u_1, \ldots, u_j\}$, we have

@@ -658,6 +686,7 @@ x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i
$$

Some rearranging gives $X = Q R$.
```
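
A minimal numerical illustration of the factorization (using NumPy's built-in reduced QR routine rather than a hand-rolled Gram-Schmidt):

```python
import numpy as np

np.random.seed(2)
X = np.random.randn(6, 3)               # columns are linearly independent with probability one

Q, R = np.linalg.qr(X)                  # reduced factorization: Q is 6 x 3, R is 3 x 3

print(np.allclose(X, Q @ R))            # X = Q R
print(np.allclose(Q.T @ Q, np.eye(3)))  # Q has orthonormal columns
print(np.allclose(R, np.triu(R)))       # R is upper triangular
```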

### Linear Regression via QR Decomposition

Expand Down Expand Up @@ -788,7 +817,7 @@ def gram_schmidt(X):
U = np.empty((n, k))
I = np.eye(n)

# The first col of U is just the normalized first col of X
# The first column of U is just the normalized first column of X
v1 = X[:,0]
U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

@@ -797,7 +826,7 @@ def gram_schmidt(X):
b = X[:, i] # The vector we're going to project
Z = X[:, 0:i] # First i-1 columns of X

# Project onto the orthogonal complement of the col span of Z
# Project onto the orthogonal complement of the column span of Z
M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
u = M @ b
