Did you come across smoke tests in The Turing Way? https://book.the-turing-way.org/reproducible-research/testing/testing-smoketest/
They are typically the first tests that you run and are usually very fast. They test that your code can "run end to end" and not much more, i.e. they don't care whether the results are correct, just that the code runs. If they fail, all other tests are cancelled.
For analysis code I think this is very similar to your functional tests, but it would purposely use a very small dummy dataset so that it is fast, and it would perform only the most basic asserts at the end, just to check that it ran. If that fails, the remaining tests would be abandoned.
I asked GPT5.2 to convert your functional example to a smoke test where all it does is check that it has produced the final dict. Note I think you could also consider monkey patching (#16) this so that there is no real write to and read from the user's directory. See below.
We need a way to run this so that if it fails then the remaining tests are skipped. If we assume your smoke test is in test_smoke.py, I think this approach would work:
pytest test_smoke.py && pytest --ignore=test_smoke.py

import pandas as pd

from waitingtimes.patient_analysis import (
    import_patient_data, calculate_wait_times, summary_stats
)


def test_workflow_smoke_end_product(tmp_path):
    """Smoke: end-to-end workflow produces the expected final output shape."""
    # Create test data
    test_data = pd.DataFrame(
        {
            "PATIENT_ID": ["p1", "p2", "p3"],
            "ARRIVAL_DATE": ["2024-01-01", "2024-01-01", "2024-01-02"],
            "ARRIVAL_TIME": ["0800", "0930", "1015"],
            "SERVICE_DATE": ["2024-01-01", "2024-01-01", "2024-01-02"],
            "SERVICE_TIME": ["0830", "1000", "1045"],
        }
    )

    # Write test CSV
    csv_path = tmp_path / "patients.csv"
    test_data.to_csv(csv_path, index=False)

    # Run complete workflow
    df = import_patient_data(csv_path)
    df = calculate_wait_times(df)
    stats = summary_stats(df["waittime"])

    # Final “end product” check only
    # TM note I think you could also just use `assert stats is not None` for a smoke test.
    assert isinstance(stats, dict)
    assert {"mean", "std_dev", "ci_lower", "ci_upper"}.issubset(stats)

Again, this could just be a caveat or a note? It's useful for simulation testing, imo.
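If chaining with `&&` is awkward (for example on a shell that doesn't support it), the same "smoke test first, everything else only if it passes" idea could be sketched as a small Python runner. This is only a rough sketch, not something from the repo: the function name `run_smoke_then_rest` and its `run` parameter are made up for illustration, and it assumes pytest is installed and the smoke test lives in test_smoke.py as above.

```python
"""Run the smoke test first; run the remaining tests only if it passes."""
import subprocess
import sys


def run_smoke_then_rest(smoke_file="test_smoke.py", run=subprocess.call):
    """Return the exit code of the first pytest run that fails, else 0.

    `run` (hypothetical parameter) takes a command list and returns an exit
    code. It defaults to subprocess.call and can be swapped for a fake.
    """
    pytest_cmd = [sys.executable, "-m", "pytest"]

    # Step 1: run the smoke test on its own.
    smoke_code = run(pytest_cmd + [smoke_file])
    if smoke_code != 0:
        # Smoke test failed, so the remaining tests are never started.
        return smoke_code

    # Step 2: run everything except the smoke test.
    return run(pytest_cmd + [f"--ignore={smoke_file}"])
```

Saved in a script, it could be called as `sys.exit(run_smoke_then_rest())` so that CI sees the failing exit code. Making `run` injectable is just a convenience so the ordering logic can be checked without actually launching pytest.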