1 change: 1 addition & 0 deletions .gitignore
@@ -11,3 +11,4 @@ wheels/
.hypothesis
src/contingency/__coconut_cache__
site
examples/.ipynb_checkpoints
27 changes: 24 additions & 3 deletions .gitlab-ci.yml
@@ -1,5 +1,5 @@
stages:
- pages
- install_and_deploy

variables:
UV_VERSION: "0.9.28"
@@ -9,13 +9,33 @@ variables:
# so we need to copy instead of using hard links.
UV_LINK_MODE: copy

zensical:
uv-setup:
stage: install_and_deploy
image: ghcr.io/astral-sh/uv:$UV_VERSION-python$PYTHON_VERSION-$BASE_LAYER
stage: pages
variables:
UV_CACHE_DIR: .uv-cache
cache:
- key:
files:
- uv.lock
paths:
- $UV_CACHE_DIR

before_script:
- apk add g++ build-base linux-headers
script:
- uv sync
- uv cache prune --ci
# pytest:
# stage: install_and_deploy
# needs: ["uv-setup"]
# script:
- uv run pytest "tests/test_contingency.py"

# zensical:
# stage: install_and_deploy
# needs: ["uv-setup"]
# script:
- uv run zensical build
# - mv site public
artifacts:
@@ -26,3 +46,4 @@ zensical:
publish: site
rules:
- if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
- if: $CI_PIPELINE_SOURCE == "merge_request_event" && $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == $CI_DEFAULT_BRANCH
6 changes: 4 additions & 2 deletions docs/api/contingent.md
@@ -1,7 +1,9 @@
---
title: Contingent
---



::: contingency.Contingent
::: contingency.contingent
handler: python
options:
show_root_heading: true
4 changes: 4 additions & 0 deletions docs/api/plotting.md
@@ -3,3 +3,7 @@ title: Plotting Utilities
---

::: contingency.plots
handler: python
options:
show_root_heading: true

49 changes: 49 additions & 0 deletions docs/css/mkdocstrings.css
@@ -3,4 +3,53 @@ div.doc-contents:not(.first) {
padding-left: 25px;
border-left: 4px solid rgb(230, 230, 230);
margin-bottom: 80px;
}


/* Tree-like output for backlinks. */
.doc-backlink-list {
--tree-clr: var(--md-default-fg-color);
--tree-font-size: 1rem;
--tree-item-height: 1;
--tree-offset: 1rem;
--tree-thickness: 1px;
--tree-style: solid;
display: grid;
list-style: none !important;
}

.doc-backlink-list li>span:first-child {
text-indent: .3rem;
}

.doc-backlink-list li {
padding-inline-start: var(--tree-offset);
border-left: var(--tree-thickness) var(--tree-style) var(--tree-clr);
position: relative;
margin-left: 0 !important;

&:last-child {
border-color: transparent;
}

&::before {
content: '';
position: absolute;
top: calc(var(--tree-item-height) / 2 * -1 * var(--tree-font-size) + var(--tree-thickness));
left: calc(var(--tree-thickness) * -1);
width: calc(var(--tree-offset) + var(--tree-thickness) * 2);
height: calc(var(--tree-item-height) * var(--tree-font-size));
border-left: var(--tree-thickness) var(--tree-style) var(--tree-clr);
border-bottom: var(--tree-thickness) var(--tree-style) var(--tree-clr);
}

&::after {
content: '';
position: absolute;
border-radius: 50%;
background-color: var(--tree-clr);
top: calc(var(--tree-item-height) / 2 * 1rem);
left: var(--tree-offset);
translate: calc(var(--tree-thickness) * -1) calc(var(--tree-thickness) * -1);
}
}
121 changes: 57 additions & 64 deletions docs/getting-started/02-tutorial.md

Large diffs are not rendered by default.

19 changes: 15 additions & 4 deletions docs/getting-started/03-performance.md
@@ -6,22 +6,23 @@ icon: lucide/trending-up
When datasets become increasingly large, the number of unique thresholds can grow significantly.

## Vectorize & Memoize
Because looping in Python is slow, we rely on boolean matrix operations to calculate the contingency counts. At the core of `Contingent.from_scalar` is a call to `numpy.less_equal.outer()`, which broadcasts the thresholding operation over all possible levels simultaneously.
Because looping in Python is slow, we rely on boolean matrix operations to calculate the contingency counts. At the core of [`Contingent.from_scalar`][contingency.contingent.Contingent.from_scalar] is a call to [`numpy.less_equal.outer`](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.outer.html), which broadcasts the thresholding operation over all possible levels simultaneously.

This is reasonably fast, able to calculate e.g. APS only marginally slower than the scikit-learn implementation.
In addition, the one-time calculation of the "full" contingency set has the added benefit of significantly amortizing the cost of subsequent metric calculations.
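As an illustrative sketch of this broadcasting trick (using made-up sample data, and not necessarily the library's exact implementation), the full set of contingency counts can be computed for every threshold at once:

```python
import numpy as np

# Hypothetical tiny dataset for illustration.
y_true = np.array([0, 1, 1, 0, 1], dtype=bool)
y_pred = np.array([0.1, 0.8, 0.6, 0.4, 0.9])

# Each unique prediction value is a candidate threshold.
thresholds = np.unique(y_pred)

# Boolean matrix: entry (i, j) is True when thresholds[i] <= y_pred[j],
# i.e. sample j is predicted positive at threshold i.
predicted_pos = np.less_equal.outer(thresholds, y_pred)

# Contingency counts for all thresholds simultaneously, with no Python loop.
tp = (predicted_pos & y_true).sum(axis=1)
fp = (predicted_pos & ~y_true).sum(axis=1)
fn = (~predicted_pos & y_true).sum(axis=1)
tn = (~predicted_pos & ~y_true).sum(axis=1)
```

Any threshold-dependent metric can then be evaluated over all levels at once from these four arrays.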



Let's make a much larger test case than before, by adding white noise to a known ground truth.

```ipython
rng = np.random.default_rng(24) ## mph, the avg cruising airspeed velocity of an unladen (european) swallow
rng = np.random.default_rng(24) # (1)!
y_src = rng.random(1000)
y_true = y_src>0.7

y_pred = y_src + 0.05*rng.normal(size=1000)
```

1. Did you know? 24 mph is the cruising airspeed velocity of an unladen (European) swallow.

```ipython
from sklearn.metrics import average_precision_score, matthews_corrcoef
@@ -51,10 +52,20 @@ Say you wish to find the expected value of the MCC score over all thresholds:
1.36 s ± 576 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
176 μs ± 10.9 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

!!! tip

This is one of the key features of `contingency`!

If you have many individual datasets or runs and want to compare cross-threshold metrics (like APS) across many experiments, needing over a second per run adds up quickly!
This is a common problem in feature engineering and model-selection pipelines.

For an example, see the [MENDR benchmark](https://github.com/usnistgov/mendr), where tens of thousands of individual prediction arrays need to be systematically compared via APS and expected MCC.
Using the mean of many `matthews_corrcoef` calls would take a very long time, if not for the optimizations made by `contingency`!

## Subsampling Approximation

The limit to this amortization comes from your RAM: the outer-product matrix can get huge.
The limit to this amortization comes from your RAM: the outer-product matrix we use to vectorize contingency counting can get _huge_.

To mitigate this, `Contingent.from_scalar` has a `subsamples` option, which allows you to approximate the threshold values with an interpolated subset, distributed according to the originals.

With only a few subsamples, the score curves quickly converge to their "true" values.
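The idea can be sketched in plain NumPy (a hypothetical illustration of the approach, not the library's internals): replace the full threshold set with `k` interpolated quantiles, so the subset follows the original distribution of prediction values:

```python
import numpy as np

rng = np.random.default_rng(0)
y_pred = rng.random(100_000)

# The full set of unique thresholds can be enormous for large datasets.
full_thresholds = np.unique(y_pred)

# Approximate it with k interpolated quantiles: the subset is distributed
# like the original thresholds, so score curves stay close to the exact ones.
k = 64
sub_thresholds = np.quantile(full_thresholds, np.linspace(0, 1, k))

# The boolean outer-product matrix now shrinks from
# (n_thresholds, n_samples) to (k, n_samples), bounding memory use.
```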
23 changes: 10 additions & 13 deletions docs/index.md
@@ -4,23 +4,20 @@ icon: lucide/house

# Contingency Documentation

## Welcome


![Contingency logo](./images/logo.svg){ align=right }

> Fast, vectorized metrology with binary contingency counts.
> _Fast, vectorized metrology with binary contingency counts._

Rapidly calculate binary classifier metrics like MCC, F-Scores, and Average Precision Scores from scalar and binary predictions.

For an overview of features, usage, and performance, see the [tutorial](./getting-started/02-tutorial.md).
For an overview of features and usage, see the [tutorial](getting-started/02-tutorial).
For more details about Contingency's performance and intended use cases, see [Performance](getting-started/03-performance).

!!! example "Contact the PI"

## Contact the PI
[Rachael Sexton](https://www.nist.gov/people/rachael-t-sexton)
Email: [`rachael.sexton@nist.gov`](mailto:rachael.sexton@nist.gov)

[Rachael Sexton](https://www.nist.gov/people/rachael-t-sexton)
> [`rachael.sexton@nist.gov`](mailto:rachael.sexton@nist.gov)
```
NIST Engineering Laboratory
Systems Integration Division
Information Modeling & Testing Group
```
NIST Engineering Laboratory
Systems Integration Division
Information Modeling & Testing Group