A Julian implementation of single- and multi-ellipsoidal nested sampling algorithms using the [AbstractMCMC](https://github.com/TuringLang/AbstractMCMC.jl) interface.
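For orientation, here is a rough usage sketch modeled on the package's README. The toy model and priors are our own illustration, and keyword names can differ between versions, so check the current documentation:

```julia
using Distributions
using NestedSamplers
using StatsBase: sample

# toy 2-D Gaussian likelihood centered at (0.5, 0.5)
loglike(θ) = -sum(abs2, θ .- 0.5) / (2 * 0.1^2)
priors = [Uniform(0, 1), Uniform(0, 1)]

model = NestedModel(loglike, priors)
sampler = Nested(2, 500)  # 2 dimensions, 500 live points

# run until the estimated remaining evidence drops below dlogz
chain, state = sample(model, sampler; dlogz=0.2)
```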
This package was heavily influenced by [`nestle`](https://github.com/kbarbary/nestle), [`dynesty`](https://github.com/joshspeagle/dynesty), and [`NestedSampling.jl`](https://github.com/kbarbary/NestedSampling.jl).
If you use this library, or a derivative of it, in your work, please consider citing it. This code builds on a multitude of academic works, which are noted in the docstrings where appropriate. These references, along with references for the more general calculations, can all be found in [CITATIONS.bib](https://github.com/TuringLang/NestedSamplers.jl/blob/main/CITATIONS.bib).
Nested sampling is a statistical technique first described in Skilling (2004)[^1] as a method for estimating the Bayesian evidence. Conveniently, it also produces samples with importance weighting proportional to the posterior distribution. To understand what this means, we first need to look at [Bayes' theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem).
## Bayes' theorem
Bayes' theorem, in our nomenclature, describes the relationship between the *prior*, the *likelihood*, the *evidence*, and the *posterior*. In its entirety:

```math
p(\theta | x) = \frac{p(x | \theta)\,p(\theta)}{p(x)}
```

### Posterior
``p(\theta | x)`` - the probability of the model parameters (``\theta``) conditioned on the data (``x``)
### Likelihood
``p(x | \theta)`` - the probability of the data (``x``) conditioned on the model parameters (``\theta``)
### Prior
``p(\theta)`` - the probability of the model parameters
### Evidence
``p(x)`` - the probability of the data
If you are familiar with Bayesian statistics and Markov chain Monte Carlo (MCMC) techniques, you should already know the relationships between the posterior, the likelihood, and the prior. The evidence, though, is harder to describe: what does "the probability of the data" mean? Well, another way of writing the evidence is as this integral
```math
p(x) \equiv Z = \int_\Omega{p(x | \theta)\,p(\theta)\,\mathrm{d}\theta}
```
which is like saying "the likelihood of the data [``p(x | \theta)``], weighted by the prior [``p(\theta)``], integrated over *all of parameter space* [``\Omega``]". We have to write the probability this way because the data are statistically dependent on the model parameters. This integral is intractable for all but the simplest combinations of distributions ([conjugate distributions](https://en.wikipedia.org/wiki/Conjugate_prior)), and therefore it must be estimated or approximated in some way.
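To make the integral concrete, here is a toy illustration of ours (not part of the package): a single datum ``x = 1.2``, a unit-variance Gaussian likelihood, and a ``\mathrm{Uniform}(-5, 5)`` prior, with the evidence brute-forced on a grid:

```julia
# brute-force Z = ∫ p(x | θ) p(θ) dθ on a grid (feasible only in 1-D!)
x = 1.2
likelihood(θ) = exp(-(x - θ)^2 / 2) / sqrt(2π)  # Normal(θ, 1) density at x
prior(θ) = 1 / 10                               # Uniform(-5, 5) density

θs = range(-5, 5; length=10_001)
Z = sum(likelihood(θ) * prior(θ) for θ in θs) * step(θs)  # Riemann sum, Z ≈ 0.1
```

A grid with ``n`` points per dimension costs ``n^d`` evaluations in ``d`` dimensions, which is exactly why evidence estimation needs smarter machinery like nested sampling.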
## What can we do with the evidence?
Before we get into approximating the Bayesian evidence, let's talk about why it's important. After all, for most MCMC applications it is simply a normalization factor to be ignored (how convenient!).
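The headline answer is model comparison. For two models ``M_1`` and ``M_2`` fit to the same data, the ratio of their evidences is the *Bayes factor*,

```math
K = \frac{Z_1}{Z_2} = \frac{p(x | M_1)}{p(x | M_2)},
```

which quantifies how strongly the data favor one model over the other.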
## Further reading
For further reading, I recommend the sources cited in the footnotes, as well as the references collected in [CITATIONS.bib](https://github.com/TuringLang/NestedSamplers.jl/blob/main/CITATIONS.bib).
`src/bounds/ellipsoid.jl`
An `N`-dimensional ellipsoid defined by

```math
(x - center)^T A (x - center) = 1
```

where `size(center) == (N,)` and `size(A) == (N,N)`.
This implementation follows the algorithm presented in Mukherjee et al. (2006).[^1]
[^1]: Pia Mukherjee, et al., 2006, ApJ 638 L51 ["A Nested Sampling Algorithm for Cosmological Model Selection"](https://iopscience.iop.org/article/10.1086/501068)
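As an illustration of the geometry (our own sketch, not the package's optimized routine), a uniform sample inside the ellipsoid ``(x - center)^T A (x - center) \le 1`` can be drawn by mapping a uniform draw from the unit ``N``-ball through a Cholesky factor of ``A``:

```julia
using LinearAlgebra

function sample_ellipsoid(rng, center::AbstractVector, A::AbstractMatrix)
    N = length(center)
    z = randn(rng, N)
    y = z ./ norm(z) .* rand(rng)^(1 / N)  # uniform draw in the unit N-ball
    B = inv(cholesky(Symmetric(A)).U)      # B * B' == inv(A)
    return center .+ B * y                 # image of the unit ball is the ellipsoid
end
```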
Use multiple [`Ellipsoid`](@ref)s in an optimal clustering to bound prior space. This implementation follows the MultiNest implementation outlined in Feroz et al. (2009).[^1] For more details about the bounding algorithm, see the extended help (`??Bounds.MultiEllipsoid`).
[^1]: Feroz et al., 2009, MNRAS 398, 4 ["MultiNest: an efficient and robust Bayesian inference tool for cosmology and particle physics"](https://academic.oup.com/mnras/article/398/4/1601/981502)
## Extended help
The multiple-ellipsoidal implementation is defined as follows:
1. Fit a [`Bounds.Ellipsoid`](@ref) to the sample.
2. Perform K-means clustering (here using [Clustering.jl](https://github.com/JuliaStats/Clustering.jl)) seeded at the endpoints of the bounding ellipsoid. This defines two clusters within the sample.
3. If either cluster has fewer than two points, consider it ill-defined and end any recursion.
4. Fit a [`Bounds.Ellipsoid`](@ref) to each of the clusters assigned in (2).
5. If the volume of the parent ellipsoid is more than twice the total volume of the two child ellipsoids, recurse (repeating steps 1-5) on each child, as in the sketch below.
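Schematically, the recursion might look like the following sketch (ours, not the package's code; `fit_ellipsoid` and `volume` are hypothetical stand-ins for the `Bounds.Ellipsoid` machinery, and the k-means seeding at the ellipsoid endpoints is elided for brevity):

```julia
using Clustering: kmeans, assignments

# `points` is a d × n matrix of samples
function split_recursively(points, fit_ellipsoid, volume)
    parent = fit_ellipsoid(points)                          # step 1
    labels = assignments(kmeans(points, 2))                 # step 2
    clusters = (points[:, labels .== 1], points[:, labels .== 2])
    any(c -> size(c, 2) < 2, clusters) && return [parent]   # step 3
    children = map(fit_ellipsoid, clusters)                 # step 4
    if volume(parent) > 2 * sum(volume, children)           # step 5
        return mapreduce(c -> split_recursively(c, fit_ellipsoid, volume),
                         vcat, clusters)
    end
    return [parent]
end
```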
To sample from this distribution, an ellipsoid is selected at random and a point is drawn uniformly from within it. To account for overlapping ellipsoids, we then find all of the ellipsoids which enclose the sampled point and randomly select one of those as the enclosing bound.
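One common way to realize this correction (a sketch of ours, not the package's code; `sample_from` and `contains` are hypothetical single-ellipsoid helpers) is to accept a draw with probability ``1/n``, where ``n`` is the number of ellipsoids enclosing it. This is equivalent to picking one of the enclosing ellipsoids at random and keeping the draw only when it is the one we sampled from:

```julia
function sample_multi(rng, ellipsoids, sample_from, contains)
    while true
        ell = rand(rng, ellipsoids)                  # pick an ellipsoid
        x = sample_from(rng, ell)                    # uniform draw inside it
        n = count(e -> contains(e, x), ellipsoids)   # how many ellipsoids enclose x?
        rand(rng) < 1 / n && return x                # accept with probability 1/n
    end
end
```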
Propose a new live point by uniformly sampling within the bounding volume and rejecting samples that do not meet the likelihood constraints. This follows the original nested sampling algorithm proposed in Skilling (2004).[^1]
[^1]: John Skilling, 2004, AIP 735, 395 ["Nested Sampling"](https://aip.scitation.org/doi/abs/10.1063/1.1835238)
## Parameters
- `maxiter` is the maximum number of samples that can be rejected before giving up and throwing an error.
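A minimal sketch of this proposal (ours, not the package's internals; `sample_bounds` is a hypothetical function drawing uniformly within the current bounding volume):

```julia
function propose_uniform(rng, sample_bounds, loglike, logl_star; maxiter=100_000)
    for _ in 1:maxiter
        u = sample_bounds(rng)               # uniform draw inside the bound
        logl = loglike(u)
        logl > logl_star && return u, logl   # keep only points above the constraint
    end
    error("rejected $maxiter proposals; the bound may be a poor fit")
end
```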
Propose a new live point by random walking away from an existing live point. This follows the algorithm outlined in Skilling (2006).[^1]
[^1]: Skilling, 2006, Bayesian Anal. 1(4), ["Nested sampling for general Bayesian computation"](https://projecteuclid.org/journals/bayesian-analysis/volume-1/issue-4/Nested-sampling-for-general-Bayesian-computation/10.1214/06-BA127.full)
## Parameters
- `ratio` is the target acceptance ratio
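A minimal sketch of the idea (ours, not the package's internals; the real implementation scales its steps using the bounding volume rather than a fixed Gaussian kernel):

```julia
function propose_walk(rng, point, loglike, logl_star; scale=0.1, nsteps=25)
    current = copy(point)
    naccept = 0
    for _ in 1:nsteps
        candidate = current .+ scale .* randn(rng, length(current))
        if loglike(candidate) > logl_star  # hard likelihood constraint
            current = candidate
            naccept += 1
        end
    end
    return current, naccept / nsteps       # new point and acceptance ratio
end
```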
Propose a new live point by random staggering away from an existing live point.
This differs from the random walk proposal in that the step size here is exponentially adjusted
to reach a target acceptance rate _during_ each proposal, in addition to _between_
proposals. This follows the algorithm outlined in Skilling (2006).[^1]
[^1]: Skilling, 2006, Bayesian Anal. 1(4), ["Nested sampling for general Bayesian computation"](https://projecteuclid.org/journals/bayesian-analysis/volume-1/issue-4/Nested-sampling-for-general-Bayesian-computation/10.1214/06-BA127.full)
## Parameters
- `ratio` is the target acceptance ratio
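A sketch of the within-proposal adjustment (ours, not the package's internals): the scale grows or shrinks geometrically whenever the running acceptance ratio drifts from the target:

```julia
function adjust_scale(scale, naccept, nreject, target_ratio)
    total = naccept + nreject
    total == 0 && return scale
    ratio = naccept / total
    ratio > target_ratio && return scale * exp(1 / naccept)  # accepting too easily: widen
    ratio < target_ratio && return scale / exp(1 / nreject)  # rejecting too often: narrow
    return scale
end
```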
Proposals.Slice(;slices=5, scale=1)
Propose a new live point by a series of random slices away from an existing live point.
This is a standard _Gibbs-like_ implementation where a single multivariate slice is a combination of `slices` univariate slices through each axis. This follows the algorithm outlined in Neal (2003).[^1]
[^1]: Neal, 2003, Ann. Statist. 31(3), ["Slice Sampling"](https://projecteuclid.org/journals/annals-of-statistics/volume-31/issue-3/Slice-sampling/10.1214/aos/1056562461.full)
## Parameters
- `slices` is the minimum number of slices
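A sketch of a single axis-aligned slice (ours, not the package's internals), adapting Neal's stepping-out and shrinkage procedure to the hard constraint that the new point's log-likelihood exceed the current threshold:

```julia
function slice_axis(rng, x, i, loglike, logl_star; width=1.0)
    trial(v) = (y = copy(x); y[i] = v; y)
    # randomly position an initial bracket of size `width` around x[i]
    lo = x[i] - width * rand(rng)
    hi = lo + width
    # step out until both ends fall below the likelihood threshold
    while loglike(trial(lo)) > logl_star; lo -= width; end
    while loglike(trial(hi)) > logl_star; hi += width; end
    # shrink the bracket until a valid point is drawn
    while true
        v = lo + (hi - lo) * rand(rng)
        loglike(trial(v)) > logl_star && return trial(v)
        v < x[i] ? (lo = v) : (hi = v)
    end
end
```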
Proposals.RSlice(;slices=5, scale=1)
Propose a new live point by a series of random slices away from an existing live point. This is a standard _random_ implementation where each slice is along a random direction based on the provided axes. This more closely matches the PolyChord implementation outlined in Handley et al. (2015a,b).[^1][^2]
[^1]: Handley, et al., 2015a, MNRAS 450(1), ["polychord: nested sampling for cosmology"](https://academic.oup.com/mnrasl/article/450/1/L61/986122)
[^2]: Handley, et al., 2015b, MNRAS 453(4), ["POLYCHORD: next-generation nested sampling"](https://academic.oup.com/mnras/article/453/4/4384/2593718)
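The random-direction variant differs from the axis-aligned sketch above mainly in where the 1-D slice runs (again ours, not the package's internals; we use a raw unit direction for brevity, where the package bases directions on the provided axes):

```julia
using LinearAlgebra: normalize

function rslice(rng, x, loglike, logl_star; width=1.0)
    d = normalize(randn(rng, length(x)))   # random unit direction
    lo = -width * rand(rng)                # bracket contains t = 0 (the point x)
    hi = lo + width
    while loglike(x .+ lo .* d) > logl_star; lo -= width; end
    while loglike(x .+ hi .* d) > logl_star; hi += width; end
    while true
        t = lo + (hi - lo) * rand(rng)
        loglike(x .+ t .* d) > logl_star && return x .+ t .* d
        t < 0 ? (lo = t) : (hi = t)
    end
end
```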