Merged
143 changes: 143 additions & 0 deletions docs/data-tests/data-freshness-sla.mdx
@@ -0,0 +1,143 @@
---
title: "data_freshness_sla"
sidebarTitle: "Data Freshness SLA"
---

import AiGenerateTest from '/snippets/ai-generate-test.mdx';


<AiGenerateTest />

`elementary.data_freshness_sla`

Verifies that data in a model was updated before a specified SLA deadline time.

This test checks the maximum timestamp value of a specified column in your data to determine whether the data was refreshed before your deadline. Unlike `freshness_anomalies` (which uses ML-based anomaly detection), this test validates against a fixed, explicit SLA time — making it ideal when you have a concrete contractual or operational deadline.

### Use Case

"Was the data in my model updated before 7 AM Pacific today?"

### Test Logic

1. If today is not a scheduled check day → **PASS** (skip)
2. Query the model for the maximum value of `timestamp_column`
3. If the max timestamp is from today → **PASS** (data is fresh)
4. If the SLA deadline hasn't passed yet → **PASS** (still time)
5. If the max timestamp is from a previous day → **FAIL** (DATA_STALE)
6. If no data exists in the table → **FAIL** (NO_DATA)
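The steps above can be sketched in Python as follows. This is only an illustration, not Elementary's implementation: `freshness_sla_status` and its parameters are hypothetical names, "today" is assumed to mean the current calendar day in the SLA timezone, and the timestamps are assumed to be UTC as described in the notes. Because the steps are applied in their listed order, an empty table still passes while the deadline has not been reached.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Optional


def freshness_sla_status(
    max_ts_utc: Optional[datetime],
    now_utc: datetime,
    sla_time: time,
    tz: tzinfo,
    scheduled_today: bool = True,
) -> str:
    """Hypothetical helper that mirrors the documented steps in order."""
    # Step 1: not a scheduled check day, so the test is skipped as a pass.
    if not scheduled_today:
        return "PASS"
    now_local = now_utc.astimezone(tz)
    today = now_local.date()
    # Steps 2-3: the newest timestamp falls on today's date (assumed local).
    if max_ts_utc is not None and max_ts_utc.astimezone(tz).date() == today:
        return "PASS"
    # Step 4: the deadline has not been reached yet, so there is still time.
    deadline = datetime.combine(today, sla_time, tzinfo=tz)
    if now_local < deadline:
        return "PASS"
    # Steps 5-6: the deadline has passed without fresh data.
    if max_ts_utc is None:
        return "FAIL (NO_DATA)"
    return "FAIL (DATA_STALE)"


# Fixed UTC-8 offset standing in for "America/Los_Angeles" in winter.
pst = timezone(timedelta(hours=-8))
now = datetime(2024, 1, 15, 16, 0, tzinfo=timezone.utc)    # 08:00 local
stale = datetime(2024, 1, 14, 20, 0, tzinfo=timezone.utc)  # previous local day
print(freshness_sla_status(stale, now, time(7, 0), pst))
```

A real IANA zone (for example `zoneinfo.ZoneInfo("America/Los_Angeles")` from the Python standard library) can be passed as `tz` in place of the fixed offset, which is what gives DST-aware deadlines.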

### Test configuration

_Required configuration: `timestamp_column`, `sla_time`, `timezone`_

{/* prettier-ignore */}
<pre>
<code>
data_tests:
&nbsp;&nbsp;-- elementary.data_freshness_sla:
&nbsp;&nbsp;&nbsp;&nbsp;arguments:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">timestamp_column: column name</font> # Required - timestamp column to check for freshness
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">sla_time: string</font> # Required - e.g., "07:00", "7am", "2:30pm", "14:30"
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">timezone: string</font> # Required - IANA timezone name, e.g., "America/Los_Angeles"
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">day_of_week: string | array</font> # Optional - Day(s) to check: "Monday" or ["Monday", "Wednesday"]
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">day_of_month: int | array</font> # Optional - Day(s) of month to check: 1 or [1, 15]
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/data-tests/anomaly-detection-configuration/where-expression"><font color="#CD7D55">where_expression: sql expression</font></a> # Optional - filter the data before checking
</code>
</pre>

<RequestExample>

```yml Models
models:
  - name: < model name >
    data_tests:
      - elementary.data_freshness_sla:
          arguments:
            timestamp_column: < column name > # Required
            sla_time: < deadline time > # Required - e.g., "07:00", "7am", "2:30pm"
            timezone: < IANA timezone > # Required - e.g., "America/Los_Angeles"
            day_of_week: < day or array > # Optional
            day_of_month: < day or array > # Optional
            where_expression: < sql expression > # Optional
```

```yml Daily check
models:
  - name: daily_revenue
    data_tests:
      - elementary.data_freshness_sla:
          arguments:
            timestamp_column: updated_at
            sla_time: "07:00"
            timezone: "America/Los_Angeles"
          config:
            tags: ["elementary"]
            severity: error
```

```yml With filter expression
models:
  - name: daily_events
    data_tests:
      - elementary.data_freshness_sla:
          arguments:
            timestamp_column: event_timestamp
            sla_time: "6am"
            timezone: "Europe/Amsterdam"
            where_expression: "event_type = 'completed'"
          config:
            tags: ["elementary"]
```

```yml Weekly - only Mondays
models:
  - name: weekly_report_data
    data_tests:
      - elementary.data_freshness_sla:
          arguments:
            timestamp_column: report_date
            sla_time: "09:00"
            timezone: "Asia/Tokyo"
            day_of_week: ["Monday"]
          config:
            tags: ["elementary"]
```

</RequestExample>

### Features

- **Data-level freshness**: Checks actual data timestamps, not just pipeline execution time
- **Flexible time formats**: Supports `"07:00"`, `"7am"`, `"2:30pm"`, `"14:30"`, and other common formats
- **IANA timezone support**: Uses standard timezone names like `"America/Los_Angeles"`, `"Europe/Amsterdam"`, etc.
- **Automatic DST handling**: Uses `pytz` for timezone conversions with automatic daylight saving time handling
- **Database-agnostic**: All timezone logic happens at compile time
- **Schedule filters**: Optional `day_of_week` and `day_of_month` parameters to check only specific days
- **Filter support**: Use `where_expression` to check freshness of a specific subset of data
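To make the accepted `sla_time` formats concrete, here is a minimal Python sketch that normalizes the documented examples (`"07:00"`, `"7am"`, `"2:30pm"`, `"14:30"`) to a 24-hour time. The function name `parse_sla_time` and the regular expression are assumptions made for illustration; the actual parser in Elementary may accept more or fewer formats.

```python
import re
from datetime import time

# Hypothetical pattern: hour, optional ":minutes", optional am/pm suffix.
_SLA_TIME = re.compile(r"^\s*(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\s*$", re.IGNORECASE)


def parse_sla_time(value: str) -> time:
    """Parse strings such as "07:00", "7am", "2:30pm" or "14:30"."""
    match = _SLA_TIME.match(value)
    if match is None:
        raise ValueError(f"Unrecognized sla_time: {value!r}")
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    meridiem = (match.group(3) or "").lower()
    if meridiem:
        if not 1 <= hour <= 12:
            raise ValueError(f"Invalid 12-hour value in sla_time: {value!r}")
        # 12am maps to hour 0 and 12pm to hour 12; other pm hours add 12.
        hour = hour % 12 + (12 if meridiem == "pm" else 0)
    # datetime.time raises ValueError for out-of-range hours or minutes.
    return time(hour, minute)


for example in ["07:00", "7am", "2:30pm", "14:30"]:
    print(example, "->", parse_sla_time(example).isoformat(timespec="minutes"))
```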

### Parameters

| Parameter | Required | Description |
| ------------------ | -------- | -------------------------------------------------------------- |
| `timestamp_column` | Yes | Column name containing timestamps to check for freshness |
| `sla_time` | Yes | Deadline time (e.g., `"07:00"`, `"7am"`, `"2:30pm"`) |
| `timezone` | Yes | IANA timezone name (e.g., `"America/Los_Angeles"`) |
| `day_of_week` | No | Day(s) to check: `"Monday"` or `["Monday", "Wednesday"]` |
| `day_of_month` | No | Day(s) of month to check: `1` or `[1, 15]` |
| `where_expression` | No | SQL expression to filter the data before checking |

### Comparison with other freshness tests

| Feature | `data_freshness_sla` | `freshness_anomalies` | `execution_sla` |
| --- | --- | --- | --- |
| What it checks | Data timestamps | Data timestamps | Pipeline run time |
| Detection method | Fixed SLA deadline | ML-based anomaly detection | Fixed SLA deadline |
| Best for | Contractual/operational deadlines | Detecting unexpected delays | Pipeline execution deadlines |
| Works with sources | Yes | Yes | No (models only) |

### Notes

- The `timestamp_column` values are assumed to be in **UTC** (or timezone-naive timestamps that represent UTC). If your data stores local timestamps, the comparison may be incorrect.
- If both `day_of_week` and `day_of_month` are set, the test uses OR logic: the check runs when the current day matches either filter.
- The test passes if the SLA deadline hasn't been reached yet, giving your data time to be updated.
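The OR logic between the two schedule filters can be sketched as below. `is_scheduled_day` is a hypothetical helper, and the case-insensitive matching of day names is an assumption rather than documented behavior.

```python
from datetime import date
from typing import List, Optional, Union

_WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def is_scheduled_day(
    today: date,
    day_of_week: Optional[Union[str, List[str]]] = None,
    day_of_month: Optional[Union[int, List[int]]] = None,
) -> bool:
    """Return True when the check should run on `today`."""
    # Both parameters accept a single value or a list, as in the docs.
    weekdays = [day_of_week] if isinstance(day_of_week, str) else list(day_of_week or [])
    month_days = [day_of_month] if isinstance(day_of_month, int) else list(day_of_month or [])
    # No filters configured: every day is a check day.
    if not weekdays and not month_days:
        return True
    weekday_match = _WEEKDAYS[today.weekday()] in [d.lower() for d in weekdays]
    month_day_match = today.day in month_days
    # OR logic: matching either filter is enough.
    return weekday_match or month_day_match


# 2024-01-16 is a Tuesday: it fails the Monday filter but matches day 16.
print(is_scheduled_day(date(2024, 1, 16), day_of_week="Monday", day_of_month=[1, 16]))
```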
152 changes: 152 additions & 0 deletions docs/data-tests/volume-threshold.mdx
@@ -0,0 +1,152 @@
---
title: "volume_threshold"
sidebarTitle: "Volume Threshold"
---

import AiGenerateTest from '/snippets/ai-generate-test.mdx';


<AiGenerateTest />

`elementary.volume_threshold`

Monitors row count changes between time buckets using configurable percentage thresholds with multiple severity levels.

Unlike `volume_anomalies` (which uses ML-based anomaly detection to determine what's "normal"), this test lets you define explicit percentage thresholds for warnings and errors — giving you precise control over when to be alerted. It uses Elementary's metric caching infrastructure to avoid recalculating row counts for buckets that have already been computed.

### Use Case

"Alert me if my table's row count drops or spikes by more than 10% compared to the previous period."

### Test Logic

1. Collect row count metrics per time bucket (using Elementary's incremental metric caching)
2. Compare the most recent completed bucket against the previous bucket
3. Calculate the percentage change between the two
4. If the previous bucket has fewer rows than `min_row_count` → **PASS** (insufficient baseline)
5. If the change in the monitored `direction` exceeds `error_threshold_percent` → **ERROR**
6. If the change in the monitored `direction` exceeds `warn_threshold_percent` → **WARN**
7. Otherwise → **PASS**
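As a rough illustration of steps 3 to 7, the comparison can be written in Python as below. This is a sketch under stated assumptions, not the macro's SQL: `volume_threshold_status` is a hypothetical name, the percentage change is assumed to be relative to the previous bucket, `direction` is assumed to restrict which sign of change counts, and an empty previous bucket is treated as an insufficient baseline.

```python
def volume_threshold_status(
    previous_rows: int,
    current_rows: int,
    warn_threshold_percent: float = 5,
    error_threshold_percent: float = 10,
    direction: str = "both",
    min_row_count: int = 100,
) -> str:
    """Classify the change between two completed buckets."""
    if warn_threshold_percent > error_threshold_percent:
        raise ValueError("warn_threshold_percent must be <= error_threshold_percent")
    # Insufficient baseline (assumption: an empty bucket also counts as insufficient).
    if previous_rows < min_row_count or previous_rows == 0:
        return "PASS"
    change_percent = (current_rows - previous_rows) / previous_rows * 100
    if direction == "spike":
        magnitude = max(change_percent, 0.0)   # only increases count
    elif direction == "drop":
        magnitude = max(-change_percent, 0.0)  # only decreases count
    else:
        magnitude = abs(change_percent)        # "both"
    if magnitude > error_threshold_percent:
        return "ERROR"
    if magnitude > warn_threshold_percent:
        return "WARN"
    return "PASS"


# A 6% drop from 1000 to 940 rows with the default thresholds.
print(volume_threshold_status(1000, 940))
```

With the defaults, a drop from 1000 to 940 rows is a 6% change and lands between the warning and error thresholds, while the same 6% as a spike would be ignored under `direction="drop"`.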

### Test configuration

_Required configuration: `timestamp_column`_

{/* prettier-ignore */}
<pre>
<code>
data_tests:
&nbsp;&nbsp;-- elementary.volume_threshold:
&nbsp;&nbsp;&nbsp;&nbsp;arguments:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/data-tests/anomaly-detection-configuration/timestamp-column"><font color="#CD7D55">timestamp_column: column name</font></a> # Required
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">warn_threshold_percent: int</font> # Optional - default: 5
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">error_threshold_percent: int</font> # Optional - default: 10
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">direction: [both | spike | drop]</font> # Optional - default: both
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/data-tests/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">time_bucket:</font></a> # Optional
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/data-tests/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">period: [hour | day | week | month]</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/data-tests/anomaly-detection-configuration/time-bucket"><font color="#CD7D55">count: int</font></a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/data-tests/anomaly-detection-configuration/where-expression"><font color="#CD7D55">where_expression: sql expression</font></a> # Optional
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">days_back: int</font> # Optional - default: 14
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">backfill_days: int</font> # Optional - default: 2
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color="#CD7D55">min_row_count: int</font> # Optional - default: 100
</code>
</pre>

<RequestExample>

```yml Models
models:
  - name: < model name >
    data_tests:
      - elementary.volume_threshold:
          arguments:
            timestamp_column: < column name > # Required
            warn_threshold_percent: < int > # Optional - default: 5
            error_threshold_percent: < int > # Optional - default: 10
            direction: < both | spike | drop > # Optional - default: both
```

```yml Default thresholds (5% warn, 10% error)
models:
  - name: daily_orders
    data_tests:
      - elementary.volume_threshold:
          arguments:
            timestamp_column: created_at
          config:
            tags: ["elementary"]
```

```yml Custom thresholds
models:
  - name: critical_transactions
    data_tests:
      - elementary.volume_threshold:
          arguments:
            timestamp_column: transaction_time
            warn_threshold_percent: 3
            error_threshold_percent: 8
            direction: drop
          config:
            tags: ["elementary"]
            severity: error
```

```yml With time bucket and filter
models:
  - name: hourly_events
    data_tests:
      - elementary.volume_threshold:
          arguments:
            timestamp_column: event_timestamp
            warn_threshold_percent: 10
            error_threshold_percent: 25
            direction: both
            time_bucket:
              period: hour
              count: 1
            where_expression: "event_type = 'purchase'"
          config:
            tags: ["elementary"]
```

</RequestExample>

### Features

- **Dual severity levels**: Separate thresholds for warnings and errors, giving you graduated alerting
- **Directional monitoring**: Choose to monitor `both` directions, only `spike` (increases), or only `drop` (decreases)
- **Incremental metric caching**: Uses Elementary's `data_monitoring_metrics` table to avoid recalculating row counts for previously computed time buckets
- **Minimum baseline protection**: The `min_row_count` parameter prevents false alerts when the baseline is too small
- **Configurable time buckets**: Works with hourly, daily, weekly, or monthly buckets

### Parameters

| Parameter | Required | Default | Description |
| ------------------------- | -------- | ------- | ---------------------------------------------------------------------------- |
| `timestamp_column` | Yes | — | Column to determine time periods |
| `warn_threshold_percent` | No | 5 | Percentage change that triggers a warning |
| `error_threshold_percent` | No | 10 | Percentage change that triggers an error |
| `direction` | No | `both` | Direction to monitor: `both`, `spike`, or `drop` |
| `time_bucket` | No | `{period: day, count: 1}` | Time bucket configuration |
| `where_expression` | No | — | SQL expression to filter the data |
| `days_back` | No | 14 | Days of metric history to retain |
| `backfill_days` | No | 2 | Days to recalculate on each run |
| `min_row_count` | No | 100 | Minimum rows in the previous bucket required to trigger the check |

### Comparison with volume_anomalies

| Feature | `volume_threshold` | `volume_anomalies` |
| --- | --- | --- |
| Detection method | Fixed percentage thresholds | ML-based anomaly detection |
| Severity levels | Dual (warn + error) | Single (pass/fail) |
| Best for | Known acceptable ranges | Unknown/variable patterns |
| Configuration | Explicit thresholds | Sensitivity tuning |
| Baseline | Previous bucket | Training period average |

### Notes

- The `warn_threshold_percent` must be less than or equal to `error_threshold_percent`
- The test uses Elementary's metric caching infrastructure — row counts for previously computed time buckets are reused across runs
- If the previous bucket has fewer rows than `min_row_count`, the test passes (insufficient data for a meaningful comparison)
- The test only evaluates completed time buckets
4 changes: 3 additions & 1 deletion docs/docs.json
@@ -436,7 +436,9 @@
"group": "Other Tests",
"pages": [
"data-tests/python-tests",
- "data-tests/execution-sla"
+ "data-tests/execution-sla",
+ "data-tests/data-freshness-sla",
+ "data-tests/volume-threshold"
]
},
{