
Commit 9371a4f

FIX: Minor Fixes to Entropy Lecture (#105)
* FIX: Minor Fixes to Entropy Lecture
* remove additional \label
1 parent 7fda828 commit 9371a4f


lectures/entropy.md

Lines changed: 33 additions & 56 deletions
@@ -48,7 +48,7 @@ with probabilities $p_i = \textrm{Prob}(X = x_i) \geq 0, \sum_i p_i =1$.
 Claude Shannon's {cite}`Shannon_1949` definition of entropy is
 
 $$
-H(p) = \sum_i p_i \log_b (p_i^{-1}) = - \sum_i p_i \log_b (p_i) .
+H(p) = \sum_i p_i \log_b (p_i^{-1}) = - \sum_i p_i \log_b (p_i) .
 $$ (eq:Shannon1)
 
 where $\log_b$ denotes the log function with base $b$.
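
The definition restored in this hunk is easy to check numerically. Below is a minimal sketch (not part of the lecture's source; the helper name `shannon_entropy` is mine) that evaluates $H(p) = -\sum_i p_i \log_b (p_i)$ in bits and in nats for a small probability vector.

```python
import numpy as np

def shannon_entropy(p, base=2):
    """H(p) = -sum_i p_i log_b(p_i), skipping zero-probability states."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(base)

p = [0.5, 0.25, 0.25]
print(shannon_entropy(p, base=2))      # 1.5 bits
print(shannon_entropy(p, base=np.e))   # ≈ 1.04 nats
```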
@@ -78,8 +78,6 @@ For a discrete random variable $X$ with probability density $p = \{p_i\}_{i=1}^n
 for state $i$ is $ s_i = \log\left(\frac{1}{p_i}\right) $.
 
 
-
-
 The quantity $ \log\left(\frac{1}{p_i}\right) $ is called the **surprisal** because it is inversely related to the likelihood that state
 $i$ will occur.
 
@@ -122,9 +120,6 @@ Entropy as a function of $\hat \pi_1$ when $\pi_1 = .5$.
 ```
 
 
-
-
-
 ### Example
 
 Take an $n$-sided possibly unfair die with a probability distribution $\{p_i\}_{i=1}^n$.
@@ -135,8 +130,6 @@ Among all dies, a fair die maximizes entropy.
 For a fair die,
 entropy equals $H(p) = - n^{-1} \sum_i \log \left( \frac{1}{n} \right) = \log(n)$.
 
-
-
 To specify the expected number of bits needed to isolate the outcome of one roll of a fair $n$-sided die requires $\log_2 (n)$ bits of information.
 
 For example,
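
As a quick numerical companion to the die example in this hunk (my own sketch, using base-2 logs and an arbitrary unfair die), a fair $n$-sided die attains $\log_2 n$ bits while the unfair die falls short:

```python
import numpy as np

def H2(p):
    """Entropy in bits."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

n = 6
fair = np.full(n, 1 / n)
unfair = np.array([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])   # an arbitrary unfair die

print(H2(fair), np.log2(n))   # both ≈ 2.585 bits
print(H2(unfair))             # strictly smaller, ≈ 2.146 bits
```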
@@ -146,18 +139,16 @@ For $n=3$, $\log_2(3) = 1.585$.
 
 
 
-
 ## Mathematical Properties of Entropy
 
 For a discrete random variable with probability vector $p$, entropy $H(p)$ is
 a function that satisfies
-* $H$ is *continuous*.
-* $H$ is *symmetric*: $H(p_1, p_2, \ldots, p_n) = H(p_{r_1}, \ldots, p_{r_n})$ for any permutation $r_1, \ldots, r_n$ of $1,\ldots, n$.
-* A uniform distribution maximizes $H(p)$:
-$ H(p_1, \ldots, p_n) \leq H(\frac{1}{n}, \ldots, \frac{1}{n}) .$
-* Maximum entropy increases with the number of states:
+* $H$ is *continuous*.
+* $H$ is *symmetric*: $H(p_1, p_2, \ldots, p_n) = H(p_{r_1}, \ldots, p_{r_n})$ for any permutation $r_1, \ldots, r_n$ of $1,\ldots, n$.
+* A uniform distribution maximizes $H(p)$: $ H(p_1, \ldots, p_n) \leq H(\frac{1}{n}, \ldots, \frac{1}{n}) .$
+* Maximum entropy increases with the number of states:
 $ H(\frac{1}{n}, \ldots, \frac{1}{n} ) \leq H(\frac{1}{n+1} , \ldots, \frac{1}{n+1})$.
-* Entropy is not affected by events zero probability.
+* Entropy is not affected by events zero probability.
 
 
 ## Conditional Entropy
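
The properties listed in this hunk can also be spot-checked numerically; the sketch below (my own, using an arbitrary Dirichlet draw) illustrates symmetry, the maximizing role of the uniform distribution, the growth of $\log n$ with $n$, and invariance to zero-probability events.

```python
import numpy as np

def H(p):
    """Entropy in nats, skipping zero-probability states."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(5))                         # an arbitrary 5-state distribution

assert np.isclose(H(p), H(rng.permutation(p)))        # symmetry under permutations
assert H(p) <= H(np.full(5, 1 / 5)) + 1e-12           # the uniform distribution maximizes H
print([round(np.log(n), 3) for n in (2, 3, 4, 5)])    # maximum entropy log(n) grows with n
assert np.isclose(H([0.3, 0.7]), H([0.3, 0.7, 0.0]))  # zero-probability events are ignored
```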
@@ -168,7 +159,7 @@ occurring with probability density $p(x_i, y_i)$.
 Conditional entropy $H(X| Y)$ is
 defined as
 
-$$ \label{Shannon2}
+$$
 H(X | Y) = \sum_{i,j} p(x_i,y_j) \log \frac{p(y_j)}{p(x_i,y_j)}.
 $$ (eq:Shannon2)
 
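
A minimal sketch of the conditional entropy formula in this hunk (my own, applied to a hypothetical $2 \times 2$ joint distribution):

```python
import numpy as np

# hypothetical joint distribution p(x_i, y_j): rows index x, columns index y
p_xy = np.array([[0.3, 0.2],
                 [0.1, 0.4]])
p_y = p_xy.sum(axis=0)                       # marginal p(y_j)

# H(X|Y) = sum_{i,j} p(x_i, y_j) * log( p(y_j) / p(x_i, y_j) )
H_X_given_Y = np.sum(p_xy * np.log(p_y / p_xy))
print(H_X_given_Y)                           # ≈ 0.607 nats
```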
@@ -229,7 +220,7 @@ Assume that $\frac{p_i}{q_t} \in (0,\infty)$ for all $i$ for which $p_i >0$.
 Then the Kullback-Leibler statistical divergence, also called **relative entropy**,
 is defined as
 
-$$ \label{Shannon3}
+$$
 D(p|q) = \sum_i p_i \log \left(\frac{p_i}{q_i}\right) = \sum_i q_i \left( \frac{p_i}{q_i}\right) \log\left( \frac{p_i}{q_i}\right) .
 $$ (eq:Shanno3)
 
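
The Kullback-Leibler divergence defined in this hunk is equally short to code; the sketch below (my own helper `kl_divergence`, with arbitrary vectors) also illustrates that $D(p|q) \geq 0$ and that $D$ is not symmetric in its arguments.

```python
import numpy as np

def kl_divergence(p, q):
    """D(p|q) = sum_i p_i log(p_i / q_i), assuming q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.4, 0.4, 0.2])

print(kl_divergence(p, q))   # > 0
print(kl_divergence(q, p))   # > 0 but different: D is not symmetric
print(kl_divergence(p, p))   # 0.0
```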
@@ -262,39 +253,39 @@ $$
 ## Relative entropy and Gaussian distributions
 
 We want to compute relative entropy for two continuous densities $\phi$ and $\hat \phi$ when
-$\phi$ is ${\cal N}(0,I)$
-and ${\hat \phi}$ is ${\cal N}(w, \Sigma)$, where the covariance matrix $\Sigma$ is nonsingular.
+$\phi$ is ${\cal N}(0,I)$ and ${\hat \phi}$ is ${\cal N}(w, \Sigma)$, where the covariance matrix $\Sigma$ is nonsingular.
 
-We seek
-a formula for
+We seek a formula for
 
-$$ \textrm{ent} = \int (\log {\hat \phi(\varepsilon)} - \log \phi(\varepsilon) ){\hat \phi(\varepsilon)} d \varepsilon.
-$$
+$$
+\textrm{ent} = \int (\log {\hat \phi(\varepsilon)} - \log \phi(\varepsilon) ){\hat \phi(\varepsilon)} d \varepsilon.
+$$
 
 **Claim**
 
 $$
 \textrm{ent} = %\int (\log {\hat \phi} - \log \phi ){\hat \phi} d \varepsilon=
 -{1 \over 2} \log
 \det \Sigma + {1 \over 2}w'w + {1 \over 2}\mathrm{trace} (\Sigma - I)
-. \label{relentropy101}
+.
 $$ (eq:relentropy101)
 
 **Proof**
 
 The log likelihood ratio is
 
-\begin{equation} \log {\hat \phi}(\varepsilon) - \log \phi(\varepsilon) =
+$$
+\log {\hat \phi}(\varepsilon) - \log \phi(\varepsilon) =
 {1 \over 2} \left[ - (\varepsilon - w)' \Sigma^{-1} (\varepsilon - w)
 + \varepsilon' \varepsilon - \log \det
-\Sigma\right] .\label{footnote2} \end{equation}
+\Sigma\right] .
+$$ (footnote2)
 
 
-Observe
-that
+Observe that
 
 $$
-- \int {1 \over 2} (\varepsilon - w)' \Sigma^{-1} (\varepsilon -
+- \int {1 \over 2} (\varepsilon - w)' \Sigma^{-1} (\varepsilon -
 w) {\hat \phi}(\varepsilon) d\varepsilon = - {1 \over 2}\mathrm{trace}(I).
 $$
 
@@ -318,7 +309,7 @@ Combining terms gives
 $$
 \textrm{ent} = \int (\log {\hat \phi} - \log \phi ){\hat \phi} d \varepsilon= -{1 \over 2} \log
 \det \Sigma + {1 \over 2}w'w + {1 \over 2}\mathrm{trace} (\Sigma - I)
-. \label{relentropy}
+.
 $$ (eq:relentropy)
 
 which agrees with equation {eq}`eq:relentropy101`.
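
The claim these two hunks reformat has a simple Monte Carlo check. Below is a sketch of my own (with an arbitrary $w$ and $\Sigma$) that averages $\log\hat\phi - \log\phi$ over draws from ${\cal N}(w,\Sigma)$ and compares the result with the closed form in {eq}`eq:relentropy101`.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
w = np.array([0.5, -0.3])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.8]])
k = len(w)

phi = multivariate_normal(mean=np.zeros(k), cov=np.eye(k))   # N(0, I)
phi_hat = multivariate_normal(mean=w, cov=Sigma)             # N(w, Sigma)

# Monte Carlo estimate of ent = E_{phi_hat}[ log phi_hat - log phi ]
draws = rng.multivariate_normal(w, Sigma, size=200_000)
ent_mc = np.mean(phi_hat.logpdf(draws) - phi.logpdf(draws))

# closed form: -1/2 log det Sigma + 1/2 w'w + 1/2 trace(Sigma - I)
ent_formula = (-0.5 * np.log(np.linalg.det(Sigma))
               + 0.5 * w @ w
               + 0.5 * np.trace(Sigma - np.eye(k)))

print(ent_mc, ent_formula)   # the two numbers should agree to a couple of decimals
```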
@@ -334,7 +325,7 @@ distributions.
 
 Then
 
-$$ \label{Shannon5}
+$$
 D(N_0|N_1) = \frac{1}{2} \left(\mathrm {trace} (\Sigma_1^{-1} \Sigma_0)
 + (\mu_1 -\mu_0)' \Sigma_1^{-1} (\mu_1 - \mu_0) - \log\left( \frac{ \mathrm {det }\Sigma_0 }{\mathrm {det}\Sigma_1}\right)
 - k \right).
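
The general two-Gaussian divergence in this hunk codes up directly; the sketch below (my own helper `kl_gaussian`) also checks that taking $N_1 = {\cal N}(0, I)$ reproduces the `ent` expression of equation {eq}`eq:relentropy101`.

```python
import numpy as np

def kl_gaussian(mu0, Sigma0, mu1, Sigma1):
    """D(N_0|N_1) for multivariate Gaussians, following the displayed formula."""
    k = len(mu0)
    Sigma1_inv = np.linalg.inv(Sigma1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(Sigma1_inv @ Sigma0)
                  + diff @ Sigma1_inv @ diff
                  - np.log(np.linalg.det(Sigma0) / np.linalg.det(Sigma1))
                  - k)

w = np.array([0.5, -0.3])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.8]])

# With N_0 = N(w, Sigma) and N_1 = N(0, I) this should equal
# -1/2 log det Sigma + 1/2 w'w + 1/2 trace(Sigma - I).
lhs = kl_gaussian(w, Sigma, np.zeros(2), np.eye(2))
rhs = -0.5 * np.log(np.linalg.det(Sigma)) + 0.5 * w @ w + 0.5 * np.trace(Sigma - np.eye(2))
print(lhs, rhs)   # equal up to rounding
```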
@@ -369,17 +360,15 @@ After flipping signs, {cite}`Backus_Chernov_Zin` use Kullback-Leibler relative
 assert is useful for characterizing features of both the data and various theoretical models of stochastic discount factors.
 
 Where $p_{t+1}$ is the physical or true measure, $p_{t+1}^*$ is the risk-neutral measure, and $E_t$ denotes conditional
-expectation under the $p_{t+1}$ measure,
-{cite}`Backus_Chernov_Zin`
-define entropy as
+expectation under the $p_{t+1}$ measure, {cite}`Backus_Chernov_Zin` define entropy as
 
-$$ \label{eq:BCZ1}
+$$
 L_t (p_{t+1}^*/p_{t+1}) = - E_t \log( p_{t+1}^*/p_{t+1}).
 $$ (eq:BCZ1)
 
 Evidently, by virtue of the minus sign in equation {eq}`eq:BCZ1`,
 
-$$ \label{eqn:BCZ2}
+$$
 L_t (p_{t+1}^*/p_{t+1}) = D_{KL,t}( p_{t+1}^*|p_{t+1}),
 $$ (eq:BCZ2)
 
@@ -420,7 +409,7 @@ $$
 
 As described in chapter XIV of {cite}`Sargent1987`, the Wiener-Kolmogorov formula for the one-period ahead prediction error is
 
-$$\label{Shannon6}
+$$
 \sigma_\epsilon^2 = \exp\left[\left( \frac{1}{2\pi}\right) \int_{-\pi}^\pi \log S_x (\omega) d \omega \right].
 $$ (eq:Shannon6)
 
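
As a numerical illustration of the Wiener-Kolmogorov formula in this hunk, here is a sketch of my own for an MA(1) process $x_t = \epsilon_t + \theta \epsilon_{t-1}$, assuming the convention $S_x(\omega) = \sum_j \gamma_j e^{-i\omega j}$, so that $S_x(\omega) = \sigma_\epsilon^2(1 + \theta^2 + 2\theta\cos\omega)$; the geometric-mean integral recovers $\sigma_\epsilon^2$ even though the unconditional variance is $\sigma_\epsilon^2(1+\theta^2)$.

```python
import numpy as np
from scipy.integrate import quad

theta, sigma_eps2 = 0.5, 1.0

# spectral density of the MA(1) process under the assumed convention
S_x = lambda w: sigma_eps2 * (1 + theta**2 + 2 * theta * np.cos(w))

# Wiener-Kolmogorov one-step-ahead prediction error variance
integral, _ = quad(lambda w: np.log(S_x(w)), -np.pi, np.pi)
print(np.exp(integral / (2 * np.pi)))    # ≈ 1.0 = sigma_eps2
print(sigma_eps2 * (1 + theta**2))       # unconditional variance = 1.25
```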
@@ -434,11 +423,10 @@ Consider the following problem reminiscent of one described earlier.
 Among all covariance stationary univariate processes with unconditional variance $\sigma_x^2$, find a process with maximal
 one-step-ahead prediction error.
 
-
-The maximizer is a process with spectral density
+The maximizer is a process with spectral density
 
 $$
-S_x(\omega) = 2 \pi \sigma_x^2.
+S_x(\omega) = 2 \pi \sigma_x^2.
 $$
 
 Thus, among
@@ -475,7 +463,7 @@ $$
 $$ (eq:Shannon22)
 
 Being a measure of the unpredictability of an $n \times 1$ vector covariance stationary stochastic process,
-the left side of {eq}`eq:Shannon22` is sometimes called entropy.
+the left side of {eq}`eq:Shannon22` is sometimes called entropy.
 
 
 ## Frequency Domain Robust Control
@@ -484,7 +472,6 @@ Chapter 8 of {cite}`hansen2008robustness` adapts work in the control theory lit
 **frequency domain entropy** criterion for robust control as
 
 $$
-\label{Shannon21}
 \int_\Gamma \log \det [ \theta I - G_F(\zeta)' G_F(\zeta) ] d \lambda(\zeta) ,
 $$ (eq:Shannon21)
 
@@ -494,7 +481,6 @@ objective function.
 Hansen and Sargent {cite}`hansen2008robustness` show that criterion {eq}`eq:Shannon21` can be represented as
 
 $$
-\label{Shannon220}
 \log \det [ D(0)' D(0)] = \int_\Gamma \log \det [ \theta I - G_F(\zeta)' G_F(\zeta) ] d \lambda(\zeta) ,
 $$ (eq:Shannon220)
 
@@ -504,8 +490,6 @@ This explains the
 moniker **maximum entropy** robust control for decision rules $F$ designed to maximize criterion {eq}`eq:Shannon21`.
 
 
-
-
 ## Relative Entropy for a Continuous Random Variable
 
 Let $x$ be a continuous random variable with density $\phi(x)$, and let $g(x) $ be a nonnegative random variable satisfying $\int g(x) \phi(x) dx =1$.
@@ -521,29 +505,24 @@ $$
 over the interval $g \geq 0$.
 
 
-That relative entropy $\textrm{ent}(g) \geq 0$ can be established by noting (a) that $g \log g \geq g-1$ (see {numref}`figure-example2`)
-and (b) that under $\phi$, $E g =1$.
+That relative entropy $\textrm{ent}(g) \geq 0$ can be established by noting (a) that $g \log g \geq g-1$ (see {numref}`figure-example2`)
+and (b) that under $\phi$, $E g =1$.
 
 
-{numref}`figure-example3` and {numref}`figure-example4` display aspects of relative entropy visually for a continuous random variable $x$ for
+{numref}`figure-example3` and {numref}`figure-example4` display aspects of relative entropy visually for a continuous random variable $x$ for
 two densities with likelihood ratio $g \geq 0$.
 
 Where the numerator density is ${\mathcal N}(0,1)$, for two denominator Gaussian densities ${\mathcal N}(0,1.5)$ and ${\mathcal N}(0,.95)$, respectively, {numref}`figure-example3` and {numref}`figure-example4` display the functions $g \log g$ and $g -1$ as functions of $x$.
 
 
 
-
-
-
 ```{figure} entropy_glogg.png
 :height: 350px
 :name: figure-example2
 
 The function $g \log g$ for $g \geq 0$. For a random variable $g$ with $E g =1$, $E g \log g \geq 0$.
 ```
 
-
-
 ```{figure} entropy_1_over_15.jpg
 :height: 350px
 :name: figure-example3
@@ -553,13 +532,11 @@ Under the ${\mathcal N}(0,1.5)$ density, $E g =1$.
 ```
 
 
-
-
 ```{figure} entropy_1_over_95.png
 :height: 350px
 :name: figure-example4
 
-$g \log g$ and $g-1$ where $g$ is the ratio of the density of a ${\mathcal N}(0,1)$ random variable to the density of a ${\mathcal N}(0,1.5)$ random variable.
+$g \log g$ and $g-1$ where $g$ is the ratio of the density of a ${\mathcal N}(0,1)$ random variable to the density of a ${\mathcal N}(0,1.5)$ random variable.
 Under the ${\mathcal N}(0,1.5)$ density, $E g =1$.
 ```
 
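As a numerical companion to the figures touched in these last hunks, the sketch below (my own, treating 1.5 as a variance so that the denominator density has scale $\sqrt{1.5}$) checks that $g \log g \geq g - 1$ pointwise and that, under the ${\mathcal N}(0,1.5)$ density, $E g = 1$ and $E g \log g \geq 0$.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

f = norm(0, 1.0)                    # numerator density N(0, 1)
h = norm(0, np.sqrt(1.5))           # denominator density N(0, 1.5), 1.5 read as a variance
g = lambda x: f.pdf(x) / h.pdf(x)   # likelihood ratio

# pointwise inequality g log g >= g - 1
x = np.linspace(-5, 5, 1001)
assert np.all(g(x) * np.log(g(x)) >= g(x) - 1 - 1e-12)

# under the denominator density: E g = 1 and E g log g >= 0
Eg, _ = quad(lambda x: g(x) * h.pdf(x), -np.inf, np.inf)
Eglogg, _ = quad(lambda x: g(x) * np.log(g(x)) * h.pdf(x), -np.inf, np.inf)
print(Eg)        # ≈ 1.0
print(Eglogg)    # ≥ 0: the relative entropy of N(0,1) with respect to N(0,1.5)
```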
