@@ -131,7 +131,10 @@ What vector within a linear subspace of $\mathbb R^n$ best approximates a given

 The next theorem answers this question.

-**Theorem** (OPT) Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$,
+```{prf:theorem} Orthogonal Projection Theorem
+:label: opt
+
+Given $y \in \mathbb R^n$ and linear subspace $S \subset \mathbb R^n$,
 there exists a unique solution to the minimization problem

 $$
@@ -144,6 +147,7 @@ The minimizer $\hat y$ is the unique vector in $\mathbb R^n$ that satisfies
 * $y - \hat y \perp S$

 The vector $\hat y$ is called the **orthogonal projection** of $y$ onto $S$.
+```

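The claim is easy to sanity-check numerically. Here is a minimal sketch (assuming NumPy; `X` and `y` are hypothetical stand-ins for a basis of $S$ and a vector to project, and the formula used for `P` is derived later in the lecture):

```python
import numpy as np

# Hypothetical subspace S = span of the columns of X, and a point y to project
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 3.0, -3.0])

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto S, derived later
y_hat = P @ y                          # candidate minimizer

# The two OPT conditions: y_hat lies in S by construction, and y - y_hat ⊥ S
print(np.allclose(X.T @ (y - y_hat), 0.0))  # True: residual orthogonal to S
```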
 The next figure provides some intuition

@@ -179,7 +183,7 @@
 y \in Y\; \mapsto \text{ its orthogonal projection } \hat y \in S
 $$

-By the OPT, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$.
+By {prf:ref}`opt`, this is a well-defined mapping or *operator* from $\mathbb R^n$ to $\mathbb R^n$.

 In what follows we denote this operator by a matrix $P$

@@ -192,7 +196,7 @@ The operator $P$ is called the **orthogonal projection mapping onto** $S$.

 ```

-It is immediate from the OPT that for any $y \in \mathbb R^n$
+It is immediate from {prf:ref}`opt` that for any $y \in \mathbb R^n$

 1. $P y \in S$ and
 1. $y - P y \perp S$
@@ -224,16 +228,20 @@ such that $y = x_1 + x_2$.

 Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$.

-This amounts to another version of the OPT:
+This amounts to another version of {prf:ref}`opt`:

-**Theorem**. If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then
+```{prf:theorem} Orthogonal Projection Theorem (another version)
+:label: opt_another
+
+If $S$ is a linear subspace of $\mathbb R^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then

 $$
 P y \perp M y
  \quad \text{and} \quad
 y = P y + M y
  \quad \text{for all } \, y \in \mathbb R^n
 $$
+```

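A quick numerical check of this version, under the same hypothetical setup as the sketch above, with `M = I - P` playing the role of projection onto $S^{\perp}$:

```python
import numpy as np

X = np.array([[1.0, 0.0],        # hypothetical basis of S
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 3.0, -3.0])   # arbitrary point in R^3

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto S
M = np.eye(3) - P                      # projection onto S-perp

print(np.isclose((P @ y) @ (M @ y), 0.0))  # P y ⊥ M y
print(np.allclose(P @ y + M @ y, y))       # y = P y + M y
```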
 The next figure illustrates

@@ -285,7 +293,7 @@ Combining this result with {eq}`pob` verifies the claim.

 When the subspace onto which we project has an orthonormal basis, computing the projection simplifies:

-**Theorem** If $\{ u_1, \ldots, u_k \}$ is an orthonormal basis for $S$, then
+```{prf:theorem}
+If $\{ u_1, \ldots, u_k \}$ is an orthonormal basis for $S$, then

 ```{math}
 :label: exp_for_op
@@ -294,8 +302,9 @@ P y = \sum_{i=1}^k \langle y, u_i \rangle u_i,
 \quad
 \forall \; y \in \mathbb R^n
 ```
+```

-Proof: Fix $y \in \mathbb R^n$ and let $P y$ be defined as in {eq}`exp_for_op`.
+```{prf:proof}
+Fix $y \in \mathbb R^n$ and let $P y$ be defined as in {eq}`exp_for_op`.

 Clearly, $P y \in S$.

@@ -312,6 +321,7 @@
 $$

 (Why is this sufficient to establish the claim that $y - P y \perp S$?)
+```

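Formula {eq}`exp_for_op` can also be checked in code. A minimal sketch (assuming NumPy; the orthonormal basis is produced here by `np.linalg.qr`, a routine discussed later in this lecture):

```python
import numpy as np

X = np.array([[1.0, 0.0],   # hypothetical basis of S
              [0.0, 1.0],
              [1.0, 1.0]])
U, _ = np.linalg.qr(X)      # columns of U are orthonormal, span(U) = span(X)
y = np.array([1.0, 3.0, -3.0])

# P y = sum_i <y, u_i> u_i
y_hat = sum((y @ U[:, i]) * U[:, i] for i in range(U.shape[1]))

# agrees with the projection computed from the matrix formula
P = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(y_hat, P @ y))  # True
```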
 ## Projection Via Matrix Algebra

@@ -327,13 +337,17 @@ Evidently $Py$ is a linear function from $y \in \mathbb R^n$ to $P y \in \mathb

 [This reference](https://en.wikipedia.org/wiki/Linear_map#Matrices) is useful.

-**Theorem.** Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then
+```{prf:theorem}
+:label: proj_matrix
+
+Let the columns of $n \times k$ matrix $X$ form a basis of $S$. Then

 $$
 P = X (X'X)^{-1} X'
 $$
+```

-Proof: Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that
+```{prf:proof}
+Given arbitrary $y \in \mathbb R^n$ and $P = X (X'X)^{-1} X'$, our claim is that

 1. $P y \in S$, and
 2. $y - P y \perp S$
@@ -367,6 +381,7 @@
 $$

 The proof is now complete.
+```

 ### Starting with the Basis

@@ -378,7 +393,7 @@

 Then the columns of $X$ form a basis of $S$.

-From the preceding theorem, $P y = X (X' X)^{-1} X' y$ projects $y$ onto $S$.
+From {prf:ref}`proj_matrix`, $P y = X (X' X)^{-1} X' y$ projects $y$ onto $S$.

 In this context, $P$ is often called the **projection matrix**

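A small sketch of the projection matrix in action (hypothetical data, assuming NumPy), checking the familiar algebraic properties of $P$:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 2))    # hypothetical n x k basis matrix

P = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(P @ P, P))   # idempotent: projecting twice changes nothing
print(np.allclose(P.T, P))     # symmetric
print(np.allclose(P @ X, X))   # every element of S = span(X) is left fixed
```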
@@ -428,15 +443,16 @@ By approximate solution, we mean a $b \in \mathbb R^k$ such that $X b$ is close

 The next theorem shows that a best approximation is well defined and unique.

-The proof uses the OPT.
+The proof uses {prf:ref}`opt`.

-**Theorem** The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^k$ is
+```{prf:theorem}
+The unique minimizer of $\| y - X b \|$ over $b \in \mathbb R^k$ is

 $$
 \hat \beta := (X' X)^{-1} X' y
 $$
+```

-Proof: Note that
+```{prf:proof}
+Note that

 $$
 X \hat \beta = X (X' X)^{-1} X' y =
@@ -458,6 +474,7 @@
 $$

 This is what we aimed to show.
+```

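The closed form $\hat \beta = (X' X)^{-1} X' y$ is easy to verify against a library solver. A minimal sketch with hypothetical data (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # hypothetical regressor matrix
y = rng.normal(size=20)        # hypothetical observations

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y          # (X'X)^{-1} X' y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # NumPy's least squares routine

print(np.allclose(beta_hat, beta_lstsq))  # True
```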
 ## Least Squares Regression

@@ -594,9 +611,9 @@ Here are some more standard definitions:

 > TSS = ESS + SSR

-We can prove this easily using the OPT.
+We can prove this easily using {prf:ref}`opt`.

-From the OPT we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.
+From {prf:ref}`opt` we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.

 Applying the Pythagorean law completes the proof.

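The identity can be confirmed numerically as well. A sketch with hypothetical data (assuming NumPy, and TSS, ESS, SSR as defined above):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))   # hypothetical regressors
y = rng.normal(size=20)

y_hat = X @ np.linalg.inv(X.T @ X) @ X.T @ y   # fitted values P y
u_hat = y - y_hat                              # residuals

TSS, ESS, SSR = y @ y, y_hat @ y_hat, u_hat @ u_hat
print(np.isclose(TSS, ESS + SSR))  # True, by the Pythagorean law
```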
@@ -611,7 +628,7 @@ The next section gives details.
 (gram_schmidt)=
 ### Gram-Schmidt Orthogonalization

-**Theorem** For each linearly independent set $\{ x_1, \ldots, x_k \} \subset \mathbb R^n$, there exists an
+```{prf:theorem}
+For each linearly independent set $\{ x_1, \ldots, x_k \} \subset \mathbb R^n$, there exists an
 orthonormal set $\{u_1, \ldots, u_k\}$ with

 $$
@@ -620,6 +637,7 @@
 \quad \text{for} \quad
 i = 1, \ldots, k
 $$
+```

 The **Gram-Schmidt orthogonalization** procedure constructs an orthogonal set $\{ u_1, u_2, \ldots, u_k \}$.

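The exercises below ask for an implementation of this algorithm; for orientation, here is one possible sketch (assuming NumPy; classical Gram-Schmidt, without the refinements a production routine would add for numerical stability):

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the linearly independent columns of X.

    Returns U with orthonormal columns and span(U) = span(X).
    """
    n, k = X.shape
    U = np.empty((n, k))
    for j in range(k):
        # subtract the projection of x_j onto span{u_1, ..., u_{j-1}}
        v = X[:, j] - U[:, :j] @ (U[:, :j].T @ X[:, j])
        U[:, j] = v / np.linalg.norm(v)   # normalize
    return U

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
U = gram_schmidt(X)
print(np.allclose(U.T @ U, np.eye(2)))  # True: columns are orthonormal
```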
@@ -639,12 +657,13 @@ In some exercises below, you are asked to implement this algorithm and test it u

 The following result uses the preceding algorithm to produce a useful decomposition.

-**Theorem** If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where
+```{prf:theorem}
+If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where

 * $R$ is $k \times k$, upper triangular, and nonsingular
 * $Q$ is $n \times k$ with orthonormal columns
+```

-Proof sketch: Let
+```{prf:proof}
+Let

 * $x_j := \col_j (X)$
 * $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram--Schmidt)
@@ -658,6 +677,7 @@ x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i
 $$

 Some rearranging gives $X = Q R$.
+```

 ### Linear Regression via QR Decomposition

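Previewing where this heading leads: since $X = Q R$ with $Q' Q = I$, the normal equations reduce to $R \hat \beta = Q' y$, so $\hat \beta = R^{-1} Q' y$. A minimal sketch with hypothetical data (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))   # hypothetical regressors
y = rng.normal(size=20)

Q, R = np.linalg.qr(X)                  # X = Q R, Q orthonormal, R upper triangular
beta_qr = np.linalg.solve(R, Q.T @ y)   # solve R b = Q'y; no need to form (X'X)^{-1}

beta_ols = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.allclose(beta_qr, beta_ols))   # True
```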