diff --git a/README.md b/README.md index 4275ebf0..1fde93b5 100644 --- a/README.md +++ b/README.md @@ -2,9 +2,9 @@ PyEarthTools is a Python framework, containing modules for loading data; pre-processing, normalising and standardising data; defining machine learning (ML) models; training ML models; performing inference with ML models; and evaluating ML models. It contains specialised support for weather and climate data sources and models. It has an emphasis on reproducibility, shareable pipelines, and human-readable low-code pipeline definition. -Source Code: [github.com/ACCESS-Community-Hub/PyEarthTools](https://github.com/ACCESS-Community-Hub/PyEarthTools) -Documentation: [pyearthtools.readthedocs.io](https://pyearthtools.readthedocs.io) -Tutorial Gallery: [available here](https://pyearthtools.readthedocs.io/en/latest/notebooks/Gallery.html) +Source Code: [github.com/ACCESS-Community-Hub/PyEarthTools](https://github.com/ACCESS-Community-Hub/PyEarthTools) +Documentation: [pyearthtools.readthedocs.io](https://pyearthtools.readthedocs.io) +Tutorial Gallery: [available here](https://pyearthtools.readthedocs.io/en/latest/notebooks/Gallery.html) > [!NOTE] diff --git a/docs/data/data_api.md b/docs/data/data_api.md index 1ebc8516..f81adfce 100644 --- a/docs/data/data_api.md +++ b/docs/data/data_api.md @@ -79,9 +79,9 @@ :members: .. autoclass:: pyearthtools.data.indexes.IntakeIndex - :members: + :members: .. autoclass:: pyearthtools.data.indexes.IntakeIndexCache - :members: + :members: -``` \ No newline at end of file +``` diff --git a/docs/data/data_index.md b/docs/data/data_index.md index 5193acad..0cdf17ab 100644 --- a/docs/data/data_index.md +++ b/docs/data/data_index.md @@ -11,7 +11,7 @@ The use of the data package within PyEarthTools includes: - Loading that data into memory for efficient use in machine learning - Performing scientific operations on that data as part of data pre-processing -These tasks are aided by the API presented by the data package. 
Users looking for "how-to guides" or worked examples should review the [Tutorial Gallery](https://pyearthtools.readthedocs.io/en/latest/notebooks/Gallery.html). +These tasks are aided by the API presented by the data package. Users looking for "how-to guides" or worked examples should review the [Tutorial Gallery](https://pyearthtools.readthedocs.io/en/latest/notebooks/Gallery.html). The rest of this page contains reference information for the components of the Data package. The entire data API docs can be viewed at [Data API](data_api.md) @@ -38,6 +38,3 @@ The rest of this page contains reference information for the components of the D | | | - [CachingForecastIndex](data_api.md#pyearthtools.data.indexes.CachingForecastIndex) | | | | - [IntakeIndex](data_api.md#pyearthtools.data.indexes.IntakeIndex) | | | | - [IntakeIndexCache](data_api.md#pyearthtools.data.indexes.IntakeIndexCache) | - - - diff --git a/docs/index.md b/docs/index.md index 25447d00..71d82f49 100644 --- a/docs/index.md +++ b/docs/index.md @@ -7,9 +7,9 @@ PyEarthTools is a Python framework, containing modules for loading data; pre-processing, normalising and standardising data; defining machine learning (ML) models; training ML models; performing inference with ML models; and evaluating ML models. It contains specialised support for weather and climate data sources and models. It has an emphasis on reproducibility, shareable pipelines, and human-readable low-code pipeline definition. 
-Source Code: [github.com/ACCESS-Community-Hub/PyEarthTools](https://github.com/ACCESS-Community-Hub/PyEarthTools) -Documentation: [pyearthtools.readthedocs.io](https://pyearthtools.readthedocs.io) -Tutorial Gallery: [available here](https://pyearthtools.readthedocs.io/en/latest/notebooks/Gallery.html) +Source Code: [github.com/ACCESS-Community-Hub/PyEarthTools](https://github.com/ACCESS-Community-Hub/PyEarthTools) +Documentation: [pyearthtools.readthedocs.io](https://pyearthtools.readthedocs.io) +Tutorial Gallery: [available here](https://pyearthtools.readthedocs.io/en/latest/notebooks/Gallery.html) > [!NOTE] diff --git a/notebooks/Gallery.ipynb b/notebooks/Gallery.ipynb index fb826ed8..bd37d333 100644 --- a/notebooks/Gallery.ipynb +++ b/notebooks/Gallery.ipynb @@ -10,12 +10,26 @@ "(Please note - many of these tutorials are being migrated from a previous code version. They are being re-tested and should be corrected in the coming days). Each tutorial will be marked with its last-tested date to make it clear what state the notebooks are in. Testing is done at NCI with a data archive already established. Some notebooks also draw data from cloud hosted data sources. A subsequent update effort will make it clearer which notebooks rely on an on-disk catalogue and which ones run from cloud sources." 
] }, + { + "cell_type": "markdown", + "id": "7b089a07-cb3a-4edb-9bd4-3ffdc60b2d86", + "metadata": {}, + "source": [ + "## Demonstrations\n", + "\n", + "These notebooks demonstrate useful tasks that can be accomplished with little setup effort, highlighting major capabilities and illustrating the main functionality of the package.\n", + "\n", + "- [Make a weather prediction with FourCastNeXt](./demo/FourCastNeXt_Inference.ipynb) (working as at 7/5/2025)\n" + ] + }, { "cell_type": "markdown", "id": "e3fa9f72-2f10-4200-963c-2cf06e9e6487", "metadata": {}, "source": [ - "## PyEarthTools - Tutorial Examples" + "## PyEarthTools - Tutorial Examples\n", + "\n", + "These notebooks start with the basics and build up to more complex examples, showing how to use the classes and functions within the package to achieve objectives." ] }, { @@ -37,7 +51,9 @@ "id": "34ae92a9-a5d7-4465-945b-092228ad358f", "metadata": {}, "source": [ - "## PyEarthTools - Data Module" + "## PyEarthTools - Data Module\n", + "\n", + "These notebooks demonstrate how to work with some of the specific functionality within the data module that users may wish to know about." ] }, { @@ -61,7 +77,9 @@ "id": "2a1824da-0fb6-4e9b-8382-957cf26effe2", "metadata": {}, "source": [ - "## PyEarthTools - Pipeline Module" + "## PyEarthTools - Pipeline Module\n", + "\n", + "These notebooks demonstrate the concepts included in the `pipeline` module, which users may need in order to construct more complex data processing logic for multi-modal models."
] }, { @@ -100,7 +118,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.9" + "version": "3.11.7" } }, "nbformat": 4, diff --git a/notebooks/demo/FourCastNeXt_Inference.ipynb b/notebooks/demo/FourCastNeXt_Inference.ipynb new file mode 100644 index 00000000..6cfcec32 --- /dev/null +++ b/notebooks/demo/FourCastNeXt_Inference.ipynb @@ -0,0 +1,1330 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b400e532-c7dd-41df-b802-dcf45f70c9d7", + "metadata": {}, + "source": [ + "# FourCastNeXt Inference Demonstration" + ] + }, + { + "cell_type": "markdown", + "id": "f8484971-a92e-47a7-9e98-97481be6e747", + "metadata": {}, + "source": [ + "**Note:** This model is not identical to the model from the paper. This version uses a reduced set of variables, has a slightly different grid definition, and uses a simpler training strategy that is easier to explain.\n", + "\n", + "The goal is eventually to provide a version which reproduces the results of the paper, and also a modified version which can support some specific use cases.\n", + "\n", + "Trained models are typically \"registered\", in a similar fashion to the data catalogue, so they can be easily accessed by name using a common API. Models can be accessed either by registered name, or using a lower-level API interaction. For models under development, where there may be many different checkpoint files and versions to choose from, it may be preferable to supply various configuration overrides to easily switch between experiment runs.\n", + "\n", + "Model inferencing can be done:\n", + " 1. Inside a Jupyter notebook, as Python code\n", + " 2. From the command-line or in a Jupyter notebook using the command-line execution magic, leveraging \"Hydra\" for experiment tracking\n", + " 3. Interactively from the command-line using the 'pet predict' command, which will ask the user about data and configuration preferences\n", + " 4. 
Using a supercomputer job scheduler to submit training jobs as part of a queuing system\n", + "\n", + "This notebook will start with the first approach to illuminate the process, but for those involved in research into new model architectures, (2) and (3) offer more flexibility for validating multiple models at once, and for operating across multiple HPC nodes or cloud instances to accelerate training and discovery.\n", + "\n", + "Model researchers will also be interested in validating trained models to ensure they perform well on a scorecard of metrics, and not limit their analysis to loss function fitting performance. Standardised validation will be added to PyEarthTools in the coming months, and validation is beyond the scope of this notebook." ] }, { "cell_type": "markdown", "id": "b6b1f850-425a-4388-bf7c-64706ce8996a", "metadata": {}, "source": [ "## Inferencing full model" ] }, { "cell_type": "code", "execution_count": 1, "id": "84401a84-f733-4a29-b242-ee2d4e13013d", "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# %%capture is included so that debugging output isn't presented in the docs,\n", "# users may find the output helpful\n", "\n", "# Note - this would more commonly be loaded from a supplied configuration file, but this approach spells \n", "# out how to do it using code if wanted, and users can experiment with modifying the pipeline more easily.\n", "\n", "import pyearthtools.pipeline\n", "import site_archive_nci\n", "import fourcastnext\n", "data_pipeline = pyearthtools.pipeline.Pipeline(\n", " pyearthtools.data.archive.ERA5([\"msl\", \"10u\", \"10v\", \"2t\"]),\n", " pyearthtools.data.transforms.coordinates.StandardLongitude(type=\"-180-180\"),\n", " fourcastnext.CropToRectangle(),\n", " pyearthtools.pipeline.modifications.TemporalRetrieval(\n", " concat=True, samples=((-6, 1), (6, 2, 6))\n", " ), \n", " pyearthtools.pipeline.operations.xarray.conversion.ToNumpy(),\n", 
pyearthtools.pipeline.operations.numpy.reshape.Rearrange(\"c t h w -> t c h w\"),\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "45194bc3-e107-418c-85e0-9af20ff92be5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Pipeline\n",
+ "\tDescription `pyearthtools.pipeline` Data Pipeline\n",
+ "\n",
+ "\n",
+ "\tInitialisation \n",
+ "\t\t exceptions_to_ignore None\n",
+ "\t\t iterator None\n",
+ "\t\t sampler None\n",
+ "\tSteps \n",
+ "\t\t ERA5 {'ERA5': {'level_value': 'None', 'product': "'reanalysis'", 'variables': "['msl', '10u', '10v', '2t']"}}\n",
+ "\t\t coordinates.StandardLongitude {'StandardLongitude': {'longitude_name': "'longitude'", 'type': "'-180-180'"}}\n",
+ "\t\t fourcastnext.CropToRectangle {'CropToRectangle': {'warn': 'True'}}\n",
+ "\t\t idx_modification.TemporalRetrieval {'TemporalRetrieval': {'concat': 'True', 'delta_unit': 'None', 'merge_function': 'None', 'merge_kwargs': 'None', 'samples': '((-6, 1), (6, 2, 6))'}}\n",
+ "\t\t conversion.ToNumpy {'ToNumpy': {'reference_dataset': 'None', 'run_parallel': 'False', 'saved_records': 'None', 'warn': 'True'}}\n",
+ "\t\t reshape.Rearrange {'Rearrange': {'rearrange': "'c t h w -> t c h w'", 'rearrange_kwargs': 'None', 'reverse_rearrange': 'None', 'skip': 'False'}}<xarray.Dataset> Size: 66MB\n", + "Dimensions: (time: 4, latitude: 720, longitude: 1440)\n", + "Coordinates:\n", + " * latitude (latitude) float32 3kB 90.0 89.75 89.5 ... -89.25 -89.5 -89.75\n", + " * longitude (longitude) float32 6kB -180.0 -179.8 -179.5 ... 179.5 179.8\n", + " * time (time) datetime64[ns] 32B 2001-01-01T06:00:00 ... 2001-01-02\n", + "Data variables:\n", + " msl (time, latitude, longitude) float32 17MB 1.032e+05 ... 1.018e+05\n", + " t2m (time, latitude, longitude) float32 17MB 250.6 251.6 ... 243.4\n", + " u10 (time, latitude, longitude) float32 17MB -1.14 ... -0.04982\n", + " v10 (time, latitude, longitude) float32 17MB -2.262 -1.345 ... 1.987\n", + "Attributes:\n", + " Conventions: CF-1.6\n", + " license: Licence to use Copernicus Products: https://apps.ec...\n", + " summary: ERA5 is the fifth generation ECMWF atmospheric rean...\n", + " pyearthtools_models: Development/FourCastNextRM: 0.1.0\n", + " purpose: Research Use Only.\n", + " contact: For further information or support, contact the Dat...\n", + " crpyearthtools: Generated with `pyearthtools`, a research endeavour...