Commit f0df88e

feat: Improve documentation and add getting started tutorial
This commit significantly improves the project's documentation by:

- **Improving Docstrings:** Added detailed, NumPy-style docstrings to all public functions, which will automatically populate the API reference.
- **Creating a "Getting Started" Tutorial:** Added a new tutorial to `docs/tutorials/getting_started.qmd` to guide new users through a basic workflow.
- **Updating the Documentation Website:** Configured the `quartodoc` website with a new landing page, a sidebar for navigation, and a dedicated "Tutorials" section.
- **Enhancing the README:** Updated the `README.md` with a project description, key features, and a link to the full documentation website.

All tests pass, and the documentation website renders correctly.
1 parent b9a0648 commit f0df88e

11 files changed: +509 / -272 lines changed

README.md

Lines changed: 23 additions & 1 deletion
@@ -1 +1,23 @@
-# rtichoke_python
+# rtichoke
+
+`rtichoke` is a Python library for visualizing the performance of predictive models. It provides a flexible and intuitive way to create a variety of common evaluation plots, including:
+
+* **ROC Curves**
+* **Precision-Recall Curves**
+* **Gains and Lift Charts**
+* **Decision Curves**
+
+The library is designed to be easy to use, while still offering a high degree of control over the final plots.
+
+## Key Features
+
+* **Simple API**: Create complex visualizations with just a few lines of code.
+* **Time-to-Event Analysis**: Native support for models with time-dependent outcomes, including censoring and competing risks.
+* **Interactive Plots**: Built on Plotly for interactive, publication-quality figures.
+* **Flexible Data Handling**: Works seamlessly with NumPy and Polars.
+
+## Documentation
+
+For a complete guide to the library, including a "Getting Started" tutorial and a full API reference, please see the **[official documentation](https://your-documentation-url.com)**.
+
+*(Note: The documentation URL will need to be updated once the website is deployed.)*

docs/_quarto.yml

Lines changed: 14 additions & 14 deletions
@@ -1,30 +1,30 @@
 project:
   type: website
 
-metadata-files:
-  - _sidebar.yml
-
 website:
   title: "rtichoke"
-  navbar:
-    left:
-      - href: reference/
-        text: Reference
+  sidebar:
+    - id: user-guide
+      title: "User Guide"
+      style: "docked"
+      contents:
+        - text: "Getting Started"
+          href: tutorials/getting_started.qmd
+    - id: api-reference
+      title: "API Reference"
+      style: "docked"
+      contents:
+        - href: reference/index.qmd
+          text: "Reference"
 
 quartodoc:
-  # the name used to import the package you want to create reference docs for
   package: rtichoke
-  sidebar: "_sidebar.yml"
   sections:
     - title: Performance Data
       desc: Functions for creating performance data.
       contents:
         - prepare_performance_data
         - prepare_performance_data_times
-    # - title: Calibration
-    #   desc: Functions for Calibration.
-    #   contents:
-    #     - create_calibration_curve
     - title: Discrimination
       desc: Functions for Discrimination.
       contents:
@@ -40,4 +40,4 @@ quartodoc:
       desc: Functions for Utility.
       contents:
         - create_decision_curve
-        - plot_decision_curve
+        - plot_decision_curve

docs/index.qmd

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+---
+title: "rtichoke Documentation"
+---
+
+Welcome to the official documentation for `rtichoke`, a Python library for visualizing the performance of predictive models.
+
+## Getting Started
+
+If you're new to `rtichoke`, the best place to start is the **[Getting Started Tutorial](./tutorials/getting_started.qmd)**. It will walk you through the basics of installing the library, preparing your data, and creating your first plot.
+
+## API Reference
+
+For detailed information on the functions and classes provided by `rtichoke`, please refer to the **[API Reference](./reference/index.qmd)**.

docs/tutorials/getting_started.qmd

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+---
+title: "Getting Started with Rtichoke"
+---
+
+This tutorial provides a basic introduction to the `rtichoke` library. We'll walk through the process of preparing data, creating a decision curve, and visualizing the results.
+
+## 1. Import Libraries
+
+First, let's import the necessary libraries. We'll need `numpy` for data manipulation and `rtichoke` for the core functionality.
+
+```python
+import numpy as np
+import rtichoke as rk
+```
+
+## 2. Prepare Your Data
+
+`rtichoke` expects data in a specific format. You'll need two main components:
+
+* **Probabilities (`probs`)**: A dictionary where keys are model names and values are NumPy arrays of predicted probabilities.
+* **Real Outcomes (`reals`)**: A NumPy array containing the true binary outcomes (0 or 1).
+
+Let's create some sample data for two different models:
+
+```python
+# Sample data from the dcurves_example.py script
+probs_dict = {
+    "Marker": np.array([
+        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5,
+        0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
+    ]),
+    "Marker2": np.array([
+        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5,
+        0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
+    ])
+}
+reals = np.array([
+    1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1
+])
+```
+
+## 3. Create a Decision Curve
+
+Now that we have our data, we can create a decision curve. This is a simple one-liner with `rtichoke`:
+
+```python
+fig = rk.create_decision_curve(
+    probs=probs_dict,
+    reals=reals,
+)
+```
+
+## 4. Show the Plot
+
+Finally, let's display the plot. Since `rtichoke` uses Plotly under the hood, you can show the figure just like any other Plotly object.
+
+```python
+# To display the plot in an interactive environment (like a Jupyter notebook)
+fig.show()
+```
+
+And that's it! You've created your first decision curve with `rtichoke`. From here, you can explore the other curve types and options that the library has to offer.
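The tutorial calls `rk.create_decision_curve` without stating what the curve plots. A decision curve shows net benefit across probability thresholds; the sketch below computes it in plain Python with the standard decision-curve-analysis definition, net benefit = TP/n - (FP/n) * p_t / (1 - p_t), applied to the tutorial's `Marker` data. It is not taken from rtichoke's source: `net_benefit` and `net_benefit_treat_all` are hypothetical helpers, and counting a prediction equal to the threshold as positive (`>=`) is an assumption about rtichoke's convention.

```python
from typing import Sequence


def net_benefit(probs: Sequence[float], reals: Sequence[int], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted probability is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(probs) != len(reals):
        raise ValueError("probs and reals must have the same length")
    n = len(reals)
    treated = [p >= threshold for p in probs]
    tp = sum(1 for t, y in zip(treated, reals) if t and y == 1)
    fp = sum(1 for t, y in zip(treated, reals) if t and y == 0)
    # The threshold odds weight each false positive relative to a true positive.
    odds = threshold / (1.0 - threshold)
    return tp / n - fp / n * odds


def net_benefit_treat_all(reals: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat everyone, ignoring the model."""
    prevalence = sum(reals) / len(reals)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# The tutorial's "Marker" probabilities and outcomes repeat a 9-value pattern three times.
marker = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] * 3
reals = [1, 0, 0, 0, 0, 1, 1, 1, 1] * 3

for pt in (0.1, 0.2, 0.5, 0.8):
    print(
        f"pt={pt:.1f}  model={net_benefit(marker, reals, pt):+.3f}  "
        f"treat_all={net_benefit_treat_all(reals, pt):+.3f}"
    )
```

The "treat none" strategy has a net benefit of 0 at every threshold, so a model is only useful where its curve sits above both that line and the treat-all curve. Because `Marker` and `Marker2` in the tutorial hold identical arrays, their two decision curves would coincide.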

src/rtichoke/discrimination/gains.py

Lines changed: 56 additions & 35 deletions
@@ -42,39 +42,33 @@ def create_gains_curve(
         "#585123",
     ],
 ) -> Figure:
-    """Create Gains Curve.
+    """Creates a Gains curve.
+
+    A Gains curve is a marketing and business analytics tool that evaluates
+    the performance of a predictive model. It shows the percentage of
+    positive outcomes (the "gain") that can be captured by targeting a
+    certain percentage of the population, sorted by predicted probability.
 
     Parameters
     ----------
     probs : Dict[str, np.ndarray]
-        Dictionary mapping a label or group name to an array of predicted
-        probabilities for the positive class.
+        A dictionary mapping model or dataset names to 1-D numpy arrays of
+        predicted probabilities.
     reals : Union[np.ndarray, Dict[str, np.ndarray]]
-        Ground-truth binary labels (0/1) as a single array, or a dictionary
-        mapping the same label/group keys used in ``probs`` to arrays of
-        ground-truth labels.
+        The true binary labels (0 or 1).
     by : float, optional
-        Resolution for probability thresholds when computing the curve
-        (step size). Default is 0.01.
+        The step size for the probability thresholds. Defaults to 0.01.
     stratified_by : Sequence[str], optional
-        Sequence of column names to stratify the performance data by.
-        Default is ["probability_threshold"].
+        Variables for stratification. Defaults to ``["probability_threshold"]``.
     size : int, optional
-        Plot size in pixels (width and height). Default is 600.
+        The width and height of the plot in pixels. Defaults to 600.
     color_values : List[str], optional
-        List of color hex strings to use for the plotted lines. If not
-        provided, a default palette is used.
+        A list of hex color strings for the plot lines.
 
     Returns
     -------
     Figure
-        A Plotly ``Figure`` containing the Gains curve(s).
-
-    Notes
-    -----
-    The function delegates computation and plotting to
-    ``_create_rtichoke_plotly_curve_binary`` and returns the resulting
-    Plotly figure.
+        A Plotly ``Figure`` object representing the Gains curve.
     """
     fig = _create_rtichoke_plotly_curve_binary(
         probs,
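The new `create_gains_curve` docstring defines a Gains curve as the share of positive outcomes captured when targeting a given share of the population, ranked by predicted probability. A minimal stdlib-only sketch of that definition follows, together with the lift ratio the README mentions (gain divided by the fraction targeted). `gains_points` and `lift_points` are hypothetical helpers; rtichoke delegates the real computation to `_create_rtichoke_plotly_curve_binary`, whose internals are not shown here, so details such as tie handling may differ (this sketch keeps tied observations in input order).

```python
from typing import List, Sequence, Tuple


def gains_points(probs: Sequence[float], reals: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (fraction of population targeted, fraction of positives captured) pairs."""
    if len(probs) != len(reals):
        raise ValueError("probs and reals must have the same length")
    total_positives = sum(reals)
    if total_positives == 0:
        raise ValueError("at least one positive outcome is required")
    n = len(probs)
    # Rank observations from highest to lowest predicted probability (stable for ties).
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    points = [(0.0, 0.0)]
    captured = 0
    for k, i in enumerate(order, start=1):
        captured += reals[i]
        points.append((k / n, captured / total_positives))
    return points


def lift_points(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Lift = gain / fraction targeted; undefined at 0, so that point is skipped."""
    return [(frac, gain / frac) for frac, gain in points if frac > 0]


pts = gains_points([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(pts)
print(lift_points(pts))
```

A random ordering would follow the diagonal (gain equal to fraction targeted, lift of 1), so the further the curve bows above the diagonal, the better the model concentrates positives at the top of its ranking.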
@@ -93,30 +87,27 @@ def plot_gains_curve(
     stratified_by: Sequence[str] = ["probability_threshold"],
     size: int = 600,
 ) -> Figure:
-    """Plot Gains curve from performance data.
+    """Plots a Gains curve from pre-computed performance data.
+
+    This function is useful for plotting a Gains curve directly from a
+    DataFrame that already contains the necessary performance metrics.
 
     Parameters
     ----------
     performance_data : pl.DataFrame
-        A Polars DataFrame containing performance metrics for the Gains curve.
-        Expected columns include (but may not be limited to)
-        ``probability_threshold`` and gains-related metrics, plus any
-        stratification columns.
+        A Polars DataFrame with performance metrics. It must include columns
+        for the percentage of the population targeted and the corresponding
+        gain, along with any stratification variables.
     stratified_by : Sequence[str], optional
-        Sequence of column names used for stratification in the
-        ``performance_data``. Default is ["probability_threshold"].
+        The columns in `performance_data` used for stratification. Defaults to
+        ``["probability_threshold"]``.
     size : int, optional
-        Plot size in pixels (width and height). Default is 600.
+        The width and height of the plot in pixels. Defaults to 600.
 
     Returns
     -------
     Figure
-        A Plotly ``Figure`` containing the Gains plot.
-
-    Notes
-    -----
-    This function wraps ``_plot_rtichoke_curve_binary`` to produce a
-    ready-to-render Plotly figure from precomputed performance data.
+        A Plotly ``Figure`` object representing the Gains curve.
     """
     fig = _plot_rtichoke_curve_binary(
         performance_data,
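`plot_gains_curve` expects performance data that already holds the share of the population targeted and the corresponding gain, stratified by `probability_threshold` by default. The sketch below builds a table of that shape from raw predictions on a threshold grid with step `by`, echoing the `by` and `stratified_by` parameters documented for `create_gains_curve`. It uses plain dictionaries instead of a Polars DataFrame, and the column names `ppcr` (predicted positives rate) and `sensitivity` are assumptions for illustration, not rtichoke's confirmed schema.

```python
from typing import Dict, List, Sequence


def threshold_grid(by: float = 0.01) -> List[float]:
    """Evenly spaced thresholds from 0 to 1; assumes 1/by is a whole number."""
    steps = int(round(1 / by))
    # Rounding avoids values such as 0.30000000000000004 from float multiplication.
    return [round(i * by, 10) for i in range(steps + 1)]


def gains_table(
    probs: Sequence[float], reals: Sequence[int], by: float = 0.01
) -> List[Dict[str, float]]:
    """One row per probability threshold (hypothetical column names)."""
    n = len(reals)
    positives = sum(reals)
    if positives == 0:
        raise ValueError("at least one positive outcome is required")
    rows = []
    for t in threshold_grid(by):
        predicted = [p >= t for p in probs]
        tp = sum(1 for flag, y in zip(predicted, reals) if flag and y == 1)
        rows.append(
            {
                "probability_threshold": t,
                "ppcr": sum(predicted) / n,  # share of the population targeted
                "sensitivity": tp / positives,  # share of positives captured (the gain)
            }
        )
    return rows


for row in gains_table([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0], by=0.25):
    print(row)
```

Plotting `sensitivity` against `ppcr` from such a table traces the same Gains curve as ranking individual predictions, but only at the resolution set by `by`.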
@@ -163,7 +154,37 @@ def create_gains_curve_times(
         "#585123",
     ],
 ) -> Figure:
-    """Create time-dependent Lift Curve."""
+    """Creates a time-dependent Gains curve.
+
+    Generates a Gains curve for time-to-event models, which is evaluated at
+    specified time horizons and handles censored data and competing risks.
+
+    Parameters
+    ----------
+    probs : Dict[str, np.ndarray]
+        A dictionary of predicted probabilities.
+    reals : Union[np.ndarray, Dict[str, np.ndarray]]
+        The true event statuses.
+    times : Union[np.ndarray, Dict[str, np.ndarray]]
+        The event or censoring times.
+    fixed_time_horizons : list[float]
+        A list of time points for performance evaluation.
+    heuristics_sets : list[Dict], optional
+        Specifies how to handle censored data and competing events.
+    by : float, optional
+        The step size for probability thresholds. Defaults to 0.01.
+    stratified_by : Sequence[str], optional
+        Variables for stratification. Defaults to ``["probability_threshold"]``.
+    size : int, optional
+        The width and height of the plot in pixels. Defaults to 600.
+    color_values : List[str], optional
+        A list of hex color strings for the plot lines.
+
+    Returns
+    -------
+    Figure
+        A Plotly ``Figure`` object for the time-dependent Gains curve.
+    """
 
     fig = _create_rtichoke_plotly_curve_times(
         probs,
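The new `create_gains_curve_times` docstring says performance is evaluated at fixed time horizons, with `heuristics_sets` controlling how censored observations and competing events are handled, but the accepted values are not shown in this diff. The sketch below illustrates one common way to turn time-to-event data into binary outcomes at a single horizon before a binary curve is computed. The status coding (0 = censored, 1 = event of interest, 2 = competing event) and the heuristic names `"excluded"` and `"adjusted_as_negative"` are assumptions for illustration, not rtichoke's documented options.

```python
from typing import List, Optional, Sequence

# Hypothetical status coding: 0 = censored, 1 = event of interest, 2 = competing event.
CENSORED, EVENT, COMPETING = 0, 1, 2
HEURISTICS = ("excluded", "adjusted_as_negative")  # hypothetical names


def binarize_at_horizon(
    reals: Sequence[int],
    times: Sequence[float],
    horizon: float,
    censoring: str = "excluded",
    competing: str = "adjusted_as_negative",
) -> List[Optional[int]]:
    """Binary outcome per observation at `horizon`; None marks an excluded observation."""
    for name in (censoring, competing):
        if name not in HEURISTICS:
            raise ValueError(f"unknown heuristic {name!r}")
    labels: List[Optional[int]] = []
    for status, time in zip(reals, times):
        if status not in (CENSORED, EVENT, COMPETING):
            raise ValueError(f"unexpected status {status!r}")
        if status == EVENT and time <= horizon:
            labels.append(1)  # event of interest observed by the horizon
        elif status == CENSORED and time < horizon:
            # Lost to follow-up before the horizon: the outcome at the horizon is unknown.
            labels.append(None if censoring == "excluded" else 0)
        elif status == COMPETING and time <= horizon:
            # A competing event before the horizon typically rules out the event of interest.
            labels.append(None if competing == "excluded" else 0)
        else:
            labels.append(0)  # event-free at the horizon
    return labels


status = [1, 0, 2, 1, 0]
follow_up = [2.0, 3.0, 1.0, 8.0, 10.0]
print(binarize_at_horizon(status, follow_up, horizon=5.0))
print(binarize_at_horizon(status, follow_up, horizon=5.0, censoring="adjusted_as_negative"))
```

Observations labelled `None` would be dropped, together with their predicted probabilities, before applying a binary calculation such as a Gains curve; repeating the step for each entry in `fixed_time_horizons` gives one curve per horizon.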
