diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0176bced1..68d3d6b92 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -51,9 +51,12 @@ If upgrading from v2.x, see the [v3.0.0 release notes](https://github.com/flixOp
 Until here -->

-## [5.1.0] - Upcoming
+## [6.0.0] - Upcoming

-**Summary**: Time-series clustering for faster optimization with configurable storage behavior across typical periods. Improved weights API with always-normalized scenario weights.
+**Summary**: Major release introducing time-series clustering with storage inter-cluster linking, the new `fxplot` accessor for universal xarray plotting, and removal of deprecated v5.0 classes. Includes configurable storage behavior across typical periods and an improved weights API.
+
+!!! warning "Breaking Changes"
+    This release removes `ClusteredOptimization` and `ClusteringParameters`, which were deprecated in v5.0.0. Use `flow_system.transform.cluster()` instead. See [Migration](#migration-from-clusteredoptimization) below.

 ### ✨ Added

@@ -121,6 +124,44 @@ charge_state = fs_expanded.solution['SeasonalPit|charge_state']
 Use `'cyclic'` for short-term storage like batteries or hot water tanks where only daily patterns matter. Use `'independent'` for quick estimates when storage behavior isn't critical.

+**FXPlot Accessor**: New global xarray accessors for universal plotting with automatic faceting and smart dimension handling. Works on any xarray Dataset, not just flixopt results.
+
+```python
+import flixopt as fx  # Registers accessors automatically
+
+# Plot any xarray Dataset with automatic faceting
+dataset.fxplot.bar(x='component')
+dataset.fxplot.area(x='time')
+dataset.fxplot.heatmap(x='time', y='component')
+dataset.fxplot.line(x='time', facet_col='scenario')
+
+# DataArray support
+data_array.fxplot.line()
+
+# Statistics transformations
+dataset.fxstats.to_duration_curve()
+```
+
+**Available Plot Methods**:
+
+| Method | Description |
+|--------|-------------|
+| `.fxplot.bar()` | Grouped bar charts |
+| `.fxplot.stacked_bar()` | Stacked bar charts |
+| `.fxplot.line()` | Line charts with faceting |
+| `.fxplot.area()` | Stacked area charts |
+| `.fxplot.heatmap()` | Heatmap visualizations |
+| `.fxplot.scatter()` | Scatter plots |
+| `.fxplot.pie()` | Pie charts with faceting |
+| `.fxstats.to_duration_curve()` | Transform to duration curve format |
+
+**Key Features**:
+
+- **Auto-faceting**: Automatically assigns extra dimensions (period, scenario, cluster) to `facet_col`, `facet_row`, or `animation_frame`
+- **Smart x-axis**: Intelligently selects x dimension based on priority (time > duration > period > scenario)
+- **Universal**: Works on any xarray Dataset/DataArray, not limited to flixopt
+- **Configurable**: Customize via `CONFIG.Plotting` (colorscales, facet columns, line shapes)
+
 ### 💥 Breaking Changes

 - `FlowSystem.scenario_weights` are now always normalized to sum to 1 when set (including after `.sel()` subsetting)

@@ -132,12 +173,94 @@ charge_state = fs_expanded.solution['SeasonalPit|charge_state']
 ### 🗑️ Deprecated

+The following items are deprecated and will be removed in **v7.0.0**:
+
+**Classes** (use FlowSystem methods instead):
+
+- `Optimization` class → Use `flow_system.optimize(solver)`
+- `SegmentedOptimization` class → Use `flow_system.optimize.rolling_horizon()`
+- `Results` class → Use `flow_system.solution` and `flow_system.statistics`
+- `SegmentedResults` class → Use segment FlowSystems directly
+
+**FlowSystem methods** (use `transform` or `topology` accessor instead, as shown in the sketch below):
+
+- `flow_system.sel()` → Use `flow_system.transform.sel()`
+- `flow_system.isel()` → Use `flow_system.transform.isel()`
+- `flow_system.resample()` → Use `flow_system.transform.resample()`
+- `flow_system.plot_network()` → Use `flow_system.topology.plot()`
+- `flow_system.start_network_app()` → Use `flow_system.topology.start_app()`
+- `flow_system.stop_network_app()` → Use `flow_system.topology.stop_app()`
+- `flow_system.network_infos()` → Use `flow_system.topology.infos()`
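+
+A minimal before/after sketch of the accessor-based replacements listed above (illustrative only; the `'4h'` resampling argument and the HiGHS solver are example choices, not part of the deprecation):
+
+```python
+import flixopt as fx
+
+# v5.x (deprecated): flow_system.resample('4h'); flow_system.plot_network()
+# v6.0.0 accessor-based equivalents:
+fs_coarse = flow_system.transform.resample('4h')  # time-axis transforms live on .transform
+flow_system.topology.plot()                       # network/topology helpers live on .topology
+fs_coarse.optimize(fx.solvers.HighsSolver())      # optimize directly on the FlowSystem
+```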
+
+**Parameters:**
+
+- `normalize_weights` parameter in `create_model()`, `build_model()`, `optimize()`
+
+**Topology method name simplifications** (old names still work with deprecation warnings, removal in v7.0.0):
+
+| Old (v5.x) | New (v6.0.0) |
+|------------|--------------|
+| `topology.plot_network()` | `topology.plot()` |
+| `topology.start_network_app()` | `topology.start_app()` |
+| `topology.stop_network_app()` | `topology.stop_app()` |
+| `topology.network_infos()` | `topology.infos()` |
+
+Note: `topology.plot()` now renders a Sankey diagram. The old PyVis visualization is available via `topology.plot_legacy()`.
+
+### 🔥 Removed
+
+**Clustering classes removed** (deprecated in v5.0.0):
+
+- `ClusteredOptimization` class - Use `flow_system.transform.cluster()` then `optimize()`
+- `ClusteringParameters` class - Parameters are now passed directly to `transform.cluster()`
+- `flixopt/clustering.py` module - Restructured to `flixopt/clustering/` package with new classes
+
+#### Migration from ClusteredOptimization
+
+=== "v5.x (Old - No longer works)"
+    ```python
+    from flixopt import ClusteredOptimization, ClusteringParameters
+
+    params = ClusteringParameters(hours_per_period=24, nr_of_periods=8)
+    calc = ClusteredOptimization('model', flow_system, params)
+    calc.do_modeling_and_solve(solver)
+    results = calc.results
+    ```
+
+=== "v6.0.0 (New)"
+    ```python
+    # Cluster using transform accessor
+    fs_clustered = flow_system.transform.cluster(
+        n_clusters=8,  # was: nr_of_periods
+        cluster_duration='1D',  # was: hours_per_period=24
+    )
+    fs_clustered.optimize(solver)
+
+    # Results on the clustered FlowSystem
+    costs = fs_clustered.solution['costs'].item()
+
+    # Expand back to full resolution if needed
+    fs_expanded = fs_clustered.transform.expand_solution()
+    ```
+
 ### 🐛 Fixed

 - `temporal_weight` and `sum_temporal()` now use consistent implementation

+### 📝 Docs
+
+**New Documentation Pages:**
+
+- [Time-Series Clustering Guide](https://flixopt.github.io/flixopt/latest/user-guide/optimization/clustering/) - Comprehensive guide to clustering workflows
+
+**New Jupyter Notebooks:**
+
+- **08c-clustering.ipynb** - Introduction to time-series clustering
+- **08c2-clustering-storage-modes.ipynb** - Comparison of all 4 storage cluster modes
+- **08d-clustering-multiperiod.ipynb** - Clustering with periods and scenarios
+- **08e-clustering-internals.ipynb** - Understanding clustering internals
+- **fxplot_accessor_demo.ipynb** - Demo of the new fxplot accessor
+
 ### 👷 Development

 **New Test Suites for Clustering**:

@@ -147,6 +270,11 @@ charge_state = fs_expanded.solution['SeasonalPit|charge_state']
 - `TestMultiPeriodClustering`: Tests for clustering with periods and scenarios dimensions
 - `TestPeakSelection`: Tests for `time_series_for_high_peaks` and `time_series_for_low_peaks` parameters

+**New Test Suites for Other Features**:
+
+- `test_clustering_io.py` - Tests for clustering serialization roundtrip
+- `test_sel_isel_single_selection.py` - Tests for transform selection methods
+
 ---

 ## [5.0.4] - 2026-01-05
diff --git 
a/docs/notebooks/01-quickstart.ipynb b/docs/notebooks/01-quickstart.ipynb index 9e6850214..1500bce77 100644 --- a/docs/notebooks/01-quickstart.ipynb +++ b/docs/notebooks/01-quickstart.ipynb @@ -34,6 +34,7 @@ "outputs": [], "source": [ "import pandas as pd\n", + "import plotly.express as px\n", "import xarray as xr\n", "\n", "import flixopt as fx\n", @@ -58,7 +59,8 @@ "metadata": {}, "outputs": [], "source": [ - "timesteps = pd.date_range('2024-01-15 08:00', periods=4, freq='h')" + "timesteps = pd.date_range('2024-01-15 08:00', periods=4, freq='h')\n", + "print(f'Optimizing from {timesteps[0]} to {timesteps[-1]}')" ] }, { @@ -86,8 +88,9 @@ " name='Heat Demand [kW]',\n", ")\n", "\n", - "# Visualize the demand with fxplot accessor\n", - "heat_demand.to_dataset().fxplot.bar(title='Heat Demand')" + "# Visualize the demand with plotly\n", + "fig = px.bar(x=heat_demand.time.values, y=heat_demand.values, labels={'x': 'Time', 'y': 'Heat Demand [kW]'})\n", + "fig" ] }, { @@ -200,18 +203,14 @@ "metadata": {}, "outputs": [], "source": [ + "total_costs = flow_system.solution['costs'].item()\n", "total_heat = float(heat_demand.sum())\n", "gas_consumed = total_heat / 0.9 # Account for boiler efficiency\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Total heat demand [kWh]': total_heat,\n", - " 'Gas consumed [kWh]': gas_consumed,\n", - " 'Total costs [EUR]': flow_system.solution['costs'].item(),\n", - " 'Average cost [EUR/kWh_heat]': flow_system.solution['costs'].item() / total_heat,\n", - " },\n", - " index=['Value'],\n", - ").T" + "print(f'Total heat demand: {total_heat:.1f} kWh')\n", + "print(f'Gas consumed: {gas_consumed:.1f} kWh')\n", + "print(f'Total costs: {total_costs:.2f} €')\n", + "print(f'Average cost: {total_costs / total_heat:.3f} €/kWh_heat')" ] }, { diff --git a/docs/notebooks/02-heat-system.ipynb b/docs/notebooks/02-heat-system.ipynb index 9d71dc69a..3ff933ec3 100644 --- a/docs/notebooks/02-heat-system.ipynb +++ b/docs/notebooks/02-heat-system.ipynb @@ -32,7 +32,9 @@ "metadata": {}, "outputs": [], "source": [ + "import numpy as np\n", "import pandas as pd\n", + "import plotly.express as px\n", "import xarray as xr\n", "\n", "import flixopt as fx\n", @@ -57,12 +59,33 @@ "metadata": {}, "outputs": [], "source": [ - "from data.tutorial_data import get_heat_system_data\n", + "# One week, hourly resolution\n", + "timesteps = pd.date_range('2024-01-15', periods=168, freq='h')\n", "\n", - "data = get_heat_system_data()\n", - "timesteps = data['timesteps']\n", - "heat_demand = data['heat_demand']\n", - "gas_price = data['gas_price']" + "# Create realistic office heat demand pattern\n", + "hours = np.arange(168)\n", + "hour_of_day = hours % 24\n", + "day_of_week = (hours // 24) % 7\n", + "\n", + "# Base demand pattern (kW)\n", + "base_demand = np.where(\n", + " (hour_of_day >= 7) & (hour_of_day <= 18), # Office hours\n", + " 80, # Daytime\n", + " 30, # Night setback\n", + ")\n", + "\n", + "# Reduce on weekends (days 5, 6)\n", + "weekend_factor = np.where(day_of_week >= 5, 0.5, 1.0)\n", + "heat_demand = base_demand * weekend_factor\n", + "\n", + "# Add some random variation\n", + "np.random.seed(42)\n", + "heat_demand = heat_demand + np.random.normal(0, 5, len(heat_demand))\n", + "heat_demand = np.clip(heat_demand, 20, 100)\n", + "\n", + "print(f'Time range: {timesteps[0]} to {timesteps[-1]}')\n", + "print(f'Peak demand: {heat_demand.max():.1f} kW')\n", + "print(f'Total demand: {heat_demand.sum():.0f} kWh')" ] }, { @@ -72,13 +95,15 @@ "metadata": {}, "outputs": [], "source": [ - "# Visualize the 
demand pattern with fxplot\n", - "demand_ds = xr.Dataset(\n", - " {\n", - " 'Heat Demand [kW]': xr.DataArray(heat_demand, dims=['time'], coords={'time': timesteps}),\n", - " }\n", + "# Visualize the demand pattern with plotly\n", + "demand_series = xr.DataArray(heat_demand, dims=['time'], coords={'time': timesteps}, name='Heat Demand [kW]')\n", + "fig = px.line(\n", + " x=demand_series.time.values,\n", + " y=demand_series.values,\n", + " title='Office Heat Demand Profile',\n", + " labels={'x': 'Time', 'y': 'kW'},\n", ")\n", - "demand_ds.fxplot.line(title='Office Heat Demand Profile')" + "fig" ] }, { @@ -98,13 +123,15 @@ "metadata": {}, "outputs": [], "source": [ - "# Visualize gas price with fxplot\n", - "price_ds = xr.Dataset(\n", - " {\n", - " 'Gas Price [EUR/kWh]': xr.DataArray(gas_price, dims=['time'], coords={'time': timesteps}),\n", - " }\n", + "# Time-of-use gas prices (€/kWh)\n", + "gas_price = np.where(\n", + " (hour_of_day >= 6) & (hour_of_day <= 22),\n", + " 0.08, # Peak: 6am-10pm\n", + " 0.05, # Off-peak: 10pm-6am\n", ")\n", - "price_ds.fxplot.line(title='Gas Price')" + "\n", + "fig = px.line(x=timesteps, y=gas_price, title='Gas Price [€/kWh]', labels={'x': 'Time', 'y': '€/kWh'})\n", + "fig" ] }, { @@ -282,16 +309,12 @@ "metadata": {}, "outputs": [], "source": [ + "total_costs = flow_system.solution['costs'].item()\n", "total_heat = heat_demand.sum()\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Total operating costs [EUR]': flow_system.solution['costs'].item(),\n", - " 'Total heat delivered [kWh]': total_heat,\n", - " 'Average cost [ct/kWh]': flow_system.solution['costs'].item() / total_heat * 100,\n", - " },\n", - " index=['Value'],\n", - ").T" + "print(f'Total operating costs: {total_costs:.2f} €')\n", + "print(f'Total heat delivered: {total_heat:.0f} kWh')\n", + "print(f'Average cost: {total_costs / total_heat * 100:.2f} ct/kWh')" ] }, { diff --git a/docs/notebooks/03-investment-optimization.ipynb b/docs/notebooks/03-investment-optimization.ipynb index 9ddb17df4..349c84ccf 100644 --- a/docs/notebooks/03-investment-optimization.ipynb +++ b/docs/notebooks/03-investment-optimization.ipynb @@ -32,7 +32,9 @@ "metadata": {}, "outputs": [], "source": [ + "import numpy as np\n", "import pandas as pd\n", + "import plotly.express as px\n", "import xarray as xr\n", "\n", "import flixopt as fx\n", @@ -82,15 +84,26 @@ "metadata": {}, "outputs": [], "source": [ - "from data.tutorial_data import get_investment_data\n", - "\n", - "data = get_investment_data()\n", - "timesteps = data['timesteps']\n", - "solar_profile = data['solar_profile']\n", - "pool_demand = data['pool_demand']\n", - "GAS_PRICE = data['gas_price']\n", - "SOLAR_COST_WEEKLY = data['solar_cost_per_kw_week']\n", - "TANK_COST_WEEKLY = data['tank_cost_per_kwh_week']" + "# One week in summer, hourly\n", + "timesteps = pd.date_range('2024-07-15', periods=168, freq='h')\n", + "hours = np.arange(168)\n", + "hour_of_day = hours % 24\n", + "\n", + "# Solar radiation profile (kW/m² equivalent, simplified)\n", + "# Peak around noon, zero at night\n", + "solar_profile = np.maximum(0, np.sin((hour_of_day - 6) * np.pi / 12)) * 0.8\n", + "solar_profile = np.where((hour_of_day >= 6) & (hour_of_day <= 20), solar_profile, 0)\n", + "\n", + "# Add some cloud variation\n", + "np.random.seed(42)\n", + "cloud_factor = np.random.uniform(0.6, 1.0, len(timesteps))\n", + "solar_profile = solar_profile * cloud_factor\n", + "\n", + "# Pool operates 8am-10pm, constant demand when open\n", + "pool_demand = np.where((hour_of_day >= 8) & (hour_of_day <= 
22), 150, 50) # kW\n", + "\n", + "print(f'Peak solar: {solar_profile.max():.2f} kW/kW_installed')\n", + "print(f'Pool demand: {pool_demand.max():.0f} kW (open), {pool_demand.min():.0f} kW (closed)')" ] }, { @@ -100,14 +113,20 @@ "metadata": {}, "outputs": [], "source": [ - "# Visualize profiles with fxplot\n", + "# Visualize profiles with plotly - using xarray and faceting\n", "profiles = xr.Dataset(\n", " {\n", " 'Solar Profile [kW/kW]': xr.DataArray(solar_profile, dims=['time'], coords={'time': timesteps}),\n", " 'Pool Demand [kW]': xr.DataArray(pool_demand, dims=['time'], coords={'time': timesteps}),\n", " }\n", ")\n", - "profiles.fxplot.line(title='Input Profiles')" + "\n", + "# Convert to long format for faceting\n", + "df = profiles.to_dataframe().reset_index().melt(id_vars='time', var_name='variable', value_name='value')\n", + "fig = px.line(df, x='time', y='value', facet_col='variable', height=300)\n", + "fig.update_yaxes(matches=None, showticklabels=True)\n", + "fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1]))\n", + "fig" ] }, { @@ -121,9 +140,36 @@ ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "id": "8", "metadata": {}, + "outputs": [], + "source": [ + "# Cost parameters\n", + "GAS_PRICE = 0.12 # €/kWh - high gas price makes solar attractive\n", + "\n", + "# Solar collectors: 400 €/kW installed, 20-year lifetime → ~25 €/kW/year annualized\n", + "# (simplified, real calculation would include interest rate)\n", + "SOLAR_COST_PER_KW = 20 # €/kW/year\n", + "\n", + "# Buffer tank: 50 €/kWh capacity, 30-year lifetime → ~2 €/kWh/year\n", + "TANK_COST_PER_KWH = 1.5 # €/kWh/year\n", + "\n", + "# Scale factor: We model 1 week, but costs are annual\n", + "# So we scale investment costs to weekly equivalent\n", + "WEEKS_PER_YEAR = 52\n", + "SOLAR_COST_WEEKLY = SOLAR_COST_PER_KW / WEEKS_PER_YEAR\n", + "TANK_COST_WEEKLY = TANK_COST_PER_KWH / WEEKS_PER_YEAR\n", + "\n", + "print(f'Solar cost: {SOLAR_COST_WEEKLY:.3f} €/kW/week')\n", + "print(f'Tank cost: {TANK_COST_WEEKLY:.4f} €/kWh/week')" + ] + }, + { + "cell_type": "markdown", + "id": "9", + "metadata": {}, "source": [ "## Build the System with Investment Options\n", "\n", @@ -133,7 +179,7 @@ { "cell_type": "code", "execution_count": null, - "id": "9", + "id": "10", "metadata": {}, "outputs": [], "source": [ @@ -204,7 +250,7 @@ }, { "cell_type": "markdown", - "id": "10", + "id": "11", "metadata": {}, "source": [ "## Run Optimization" @@ -213,7 +259,7 @@ { "cell_type": "code", "execution_count": null, - "id": "11", + "id": "12", "metadata": {}, "outputs": [], "source": [ @@ -222,7 +268,7 @@ }, { "cell_type": "markdown", - "id": "12", + "id": "13", "metadata": {}, "source": [ "## Analyze Investment Decisions\n", @@ -233,26 +279,21 @@ { "cell_type": "code", "execution_count": null, - "id": "13", + "id": "14", "metadata": {}, "outputs": [], "source": [ "solar_size = flow_system.statistics.sizes['SolarCollectors(Heat)'].item()\n", "tank_size = flow_system.statistics.sizes['BufferTank'].item()\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Solar [kW]': solar_size,\n", - " 'Tank [kWh]': tank_size,\n", - " 'Ratio [kWh/kW]': tank_size / solar_size,\n", - " },\n", - " index=['Optimal Size'],\n", - ").T" + "print(\n", + " f'Optimal sizes: Solar {solar_size:.0f} kW, Tank {tank_size:.0f} kWh (ratio: {tank_size / solar_size:.1f} kWh/kW)'\n", + ")" ] }, { "cell_type": "markdown", - "id": "14", + "id": "15", "metadata": {}, "source": [ "### Visualize Sizes" @@ -261,7 +302,7 @@ { "cell_type": "code", 
"execution_count": null, - "id": "15", + "id": "16", "metadata": {}, "outputs": [], "source": [ @@ -270,7 +311,7 @@ }, { "cell_type": "markdown", - "id": "16", + "id": "17", "metadata": {}, "source": [ "### Cost Breakdown" @@ -279,7 +320,7 @@ { "cell_type": "code", "execution_count": null, - "id": "17", + "id": "18", "metadata": {}, "outputs": [], "source": [ @@ -290,19 +331,14 @@ "tank_invest = tank_size * TANK_COST_WEEKLY\n", "gas_costs = total_costs - solar_invest - tank_invest\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Solar Investment': {'EUR': solar_invest, '%': solar_invest / total_costs * 100},\n", - " 'Tank Investment': {'EUR': tank_invest, '%': tank_invest / total_costs * 100},\n", - " 'Gas Costs': {'EUR': gas_costs, '%': gas_costs / total_costs * 100},\n", - " 'Total': {'EUR': total_costs, '%': 100.0},\n", - " }\n", + "print(\n", + " f'Weekly costs: Solar {solar_invest:.1f}€ ({solar_invest / total_costs * 100:.0f}%) + Tank {tank_invest:.1f}€ ({tank_invest / total_costs * 100:.0f}%) + Gas {gas_costs:.1f}€ ({gas_costs / total_costs * 100:.0f}%) = {total_costs:.1f}€'\n", ")" ] }, { "cell_type": "markdown", - "id": "18", + "id": "19", "metadata": {}, "source": [ "### System Operation" @@ -311,7 +347,7 @@ { "cell_type": "code", "execution_count": null, - "id": "19", + "id": "20", "metadata": {}, "outputs": [], "source": [ @@ -321,7 +357,7 @@ { "cell_type": "code", "execution_count": null, - "id": "20", + "id": "21", "metadata": {}, "outputs": [], "source": [ @@ -331,7 +367,7 @@ { "cell_type": "code", "execution_count": null, - "id": "21", + "id": "22", "metadata": {}, "outputs": [], "source": [ @@ -340,7 +376,7 @@ }, { "cell_type": "markdown", - "id": "22", + "id": "23", "metadata": {}, "source": [ "## Compare: What if No Solar?\n", @@ -351,30 +387,23 @@ { "cell_type": "code", "execution_count": null, - "id": "23", + "id": "24", "metadata": {}, "outputs": [], "source": [ - "# Gas-only scenario for comparison\n", + "# Gas-only scenario\n", "total_demand = pool_demand.sum()\n", "gas_only_cost = total_demand / 0.92 * GAS_PRICE # All heat from gas boiler\n", - "savings = gas_only_cost - total_costs\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Gas-only [EUR/week]': gas_only_cost,\n", - " 'With Solar [EUR/week]': total_costs,\n", - " 'Savings [EUR/week]': savings,\n", - " 'Savings [%]': savings / gas_only_cost * 100,\n", - " 'Savings [EUR/year]': savings * 52,\n", - " },\n", - " index=['Value'],\n", - ").T" + "savings = gas_only_cost - total_costs\n", + "print(\n", + " f'Solar saves {savings:.1f}€/week ({savings / gas_only_cost * 100:.0f}%) vs gas-only ({gas_only_cost:.1f}€) → {savings * 52:.0f}€/year'\n", + ")" ] }, { "cell_type": "markdown", - "id": "24", + "id": "25", "metadata": {}, "source": [ "### Energy Flow Sankey\n", @@ -385,7 +414,7 @@ { "cell_type": "code", "execution_count": null, - "id": "25", + "id": "26", "metadata": {}, "outputs": [], "source": [ @@ -394,7 +423,7 @@ }, { "cell_type": "markdown", - "id": "26", + "id": "27", "metadata": {}, "source": [ "## Key Concepts\n", diff --git a/docs/notebooks/04-operational-constraints.ipynb b/docs/notebooks/04-operational-constraints.ipynb index 2f2163886..fbb611d1c 100644 --- a/docs/notebooks/04-operational-constraints.ipynb +++ b/docs/notebooks/04-operational-constraints.ipynb @@ -32,6 +32,7 @@ "metadata": {}, "outputs": [], "source": [ + "import numpy as np\n", "import pandas as pd\n", "import plotly.express as px\n", "import xarray as xr\n", @@ -72,11 +73,32 @@ "metadata": {}, "outputs": [], "source": [ - "from 
data.tutorial_data import get_constraints_data\n", + "# 3 days, hourly resolution\n", + "timesteps = pd.date_range('2024-03-11', periods=72, freq='h')\n", + "hours = np.arange(72)\n", + "hour_of_day = hours % 24\n", + "\n", + "# Factory operates in shifts:\n", + "# - Day shift (6am-2pm): 400 kW\n", + "# - Evening shift (2pm-10pm): 350 kW\n", + "# - Night (10pm-6am): 80 kW (maintenance heating only)\n", + "\n", + "steam_demand = np.select(\n", + " [\n", + " (hour_of_day >= 6) & (hour_of_day < 14), # Day shift\n", + " (hour_of_day >= 14) & (hour_of_day < 22), # Evening shift\n", + " ],\n", + " [400, 350],\n", + " default=80, # Night\n", + ")\n", + "\n", + "# Add some variation\n", + "np.random.seed(123)\n", + "steam_demand = steam_demand + np.random.normal(0, 20, len(steam_demand))\n", + "steam_demand = np.clip(steam_demand, 50, 450).astype(float)\n", "\n", - "data = get_constraints_data()\n", - "timesteps = data['timesteps']\n", - "steam_demand = data['steam_demand']" + "print(f'Peak demand: {steam_demand.max():.0f} kW')\n", + "print(f'Min demand: {steam_demand.min():.0f} kW')" ] }, { @@ -86,13 +108,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Visualize the demand with fxplot\n", - "demand_ds = xr.Dataset(\n", - " {\n", - " 'Steam Demand [kW]': xr.DataArray(steam_demand, dims=['time'], coords={'time': timesteps}),\n", - " }\n", - ")\n", - "demand_ds.fxplot.line(title='Factory Steam Demand')" + "px.line(x=timesteps, y=steam_demand, title='Factory Steam Demand', labels={'x': 'Time', 'y': 'kW'})" ] }, { @@ -278,12 +294,8 @@ "startup_costs = total_startups * 50\n", "gas_costs = total_costs - startup_costs\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Startups': {'Count': total_startups, 'EUR': startup_costs},\n", - " 'Gas': {'Count': '-', 'EUR': gas_costs},\n", - " 'Total': {'Count': '-', 'EUR': total_costs},\n", - " }\n", + "print(\n", + " f'{total_startups} startups × 50€ = {startup_costs:.0f}€ startup + {gas_costs:.0f}€ gas = {total_costs:.0f}€ total'\n", ")" ] }, @@ -365,16 +377,8 @@ "fs_unconstrained.optimize(fx.solvers.HighsSolver())\n", "unconstrained_costs = fs_unconstrained.solution['costs'].item()\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Without Constraints': {'Cost [EUR]': unconstrained_costs},\n", - " 'With Constraints': {'Cost [EUR]': total_costs},\n", - " 'Overhead': {\n", - " 'Cost [EUR]': total_costs - unconstrained_costs,\n", - " '%': (total_costs - unconstrained_costs) / unconstrained_costs * 100,\n", - " },\n", - " }\n", - ")" + "constraint_overhead = (total_costs - unconstrained_costs) / unconstrained_costs * 100\n", + "print(f'Constraints add {constraint_overhead:.1f}% cost: {unconstrained_costs:.0f}€ → {total_costs:.0f}€')" ] }, { diff --git a/docs/notebooks/05-multi-carrier-system.ipynb b/docs/notebooks/05-multi-carrier-system.ipynb index 707ef517a..a1a9543fa 100644 --- a/docs/notebooks/05-multi-carrier-system.ipynb +++ b/docs/notebooks/05-multi-carrier-system.ipynb @@ -32,7 +32,9 @@ "metadata": {}, "outputs": [], "source": [ + "import numpy as np\n", "import pandas as pd\n", + "import plotly.express as px\n", "import xarray as xr\n", "\n", "import flixopt as fx\n", @@ -83,15 +85,40 @@ "metadata": {}, "outputs": [], "source": [ - "from data.tutorial_data import get_multicarrier_data\n", - "\n", - "data = get_multicarrier_data()\n", - "timesteps = data['timesteps']\n", - "electricity_demand = data['electricity_demand']\n", - "heat_demand = data['heat_demand']\n", - "elec_buy_price = data['elec_buy_price']\n", - "elec_sell_price = data['elec_sell_price']\n", - 
"gas_price = data['gas_price']" + "# One week, hourly\n", + "timesteps = pd.date_range('2024-02-05', periods=168, freq='h')\n", + "hours = np.arange(168)\n", + "hour_of_day = hours % 24\n", + "\n", + "# Hospital electricity demand (kW)\n", + "# Base load + daily pattern (higher during day for equipment, lighting)\n", + "elec_base = 150 # 24/7 critical systems\n", + "elec_daily = 100 * np.sin((hour_of_day - 6) * np.pi / 12) # Peak at noon\n", + "elec_daily = np.maximum(0, elec_daily)\n", + "electricity_demand = elec_base + elec_daily\n", + "\n", + "# Hospital heat demand (kW)\n", + "# Higher in morning, drops during day, increases for hot water in evening\n", + "heat_pattern = np.select(\n", + " [\n", + " (hour_of_day >= 5) & (hour_of_day < 9), # Morning warmup\n", + " (hour_of_day >= 9) & (hour_of_day < 17), # Daytime\n", + " (hour_of_day >= 17) & (hour_of_day < 22), # Evening\n", + " ],\n", + " [350, 250, 300],\n", + " default=200, # Night\n", + ")\n", + "heat_demand = heat_pattern.astype(float)\n", + "\n", + "# Add random variation\n", + "np.random.seed(456)\n", + "electricity_demand += np.random.normal(0, 15, len(timesteps))\n", + "heat_demand += np.random.normal(0, 20, len(timesteps))\n", + "electricity_demand = np.clip(electricity_demand, 100, 300)\n", + "heat_demand = np.clip(heat_demand, 150, 400)\n", + "\n", + "print(f'Electricity: {electricity_demand.min():.0f} - {electricity_demand.max():.0f} kW')\n", + "print(f'Heat: {heat_demand.min():.0f} - {heat_demand.max():.0f} kW')" ] }, { @@ -101,20 +128,47 @@ "metadata": {}, "outputs": [], "source": [ - "# Visualize demands and prices with fxplot\n", + "# Electricity prices (€/kWh)\n", + "# Time-of-use: expensive during day, cheaper at night\n", + "elec_buy_price = np.where(\n", + " (hour_of_day >= 7) & (hour_of_day <= 21),\n", + " 0.35, # Peak - high electricity prices make CHP attractive\n", + " 0.20, # Off-peak\n", + ")\n", + "\n", + "# Feed-in tariff (sell price) - allows selling excess CHP electricity\n", + "elec_sell_price = 0.12 # Fixed feed-in rate\n", + "\n", + "# Gas price - relatively low, favoring gas-based generation\n", + "gas_price = 0.05 # €/kWh" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7", + "metadata": {}, + "outputs": [], + "source": [ + "# Visualize demands and prices with plotly - using xarray and faceting\n", "profiles = xr.Dataset(\n", " {\n", " 'Electricity Demand [kW]': xr.DataArray(electricity_demand, dims=['time'], coords={'time': timesteps}),\n", " 'Heat Demand [kW]': xr.DataArray(heat_demand, dims=['time'], coords={'time': timesteps}),\n", - " 'Elec. Buy Price [EUR/kWh]': xr.DataArray(elec_buy_price, dims=['time'], coords={'time': timesteps}),\n", + " 'Elec. 
Buy Price [€/kWh]': xr.DataArray(elec_buy_price, dims=['time'], coords={'time': timesteps}),\n", " }\n", ")\n", - "profiles.fxplot.line(title='Input Profiles')" + "\n", + "df = profiles.to_dataframe().reset_index().melt(id_vars='time', var_name='variable', value_name='value')\n", + "fig = px.line(df, x='time', y='value', facet_col='variable', height=300)\n", + "fig.update_yaxes(matches=None, showticklabels=True)\n", + "fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1]))\n", + "fig" ] }, { "cell_type": "markdown", - "id": "7", + "id": "8", "metadata": {}, "source": [ "## Build the Multi-Carrier System" @@ -123,7 +177,7 @@ { "cell_type": "code", "execution_count": null, - "id": "8", + "id": "9", "metadata": {}, "outputs": [], "source": [ @@ -216,7 +270,7 @@ }, { "cell_type": "markdown", - "id": "9", + "id": "10", "metadata": {}, "source": [ "## Run Optimization" @@ -225,7 +279,7 @@ { "cell_type": "code", "execution_count": null, - "id": "10", + "id": "11", "metadata": {}, "outputs": [], "source": [ @@ -234,7 +288,7 @@ }, { "cell_type": "markdown", - "id": "11", + "id": "12", "metadata": {}, "source": [ "## Analyze Results\n", @@ -245,7 +299,7 @@ { "cell_type": "code", "execution_count": null, - "id": "12", + "id": "13", "metadata": {}, "outputs": [], "source": [ @@ -254,7 +308,7 @@ }, { "cell_type": "markdown", - "id": "13", + "id": "14", "metadata": {}, "source": [ "### Heat Balance" @@ -263,7 +317,7 @@ { "cell_type": "code", "execution_count": null, - "id": "14", + "id": "15", "metadata": {}, "outputs": [], "source": [ @@ -272,7 +326,7 @@ }, { "cell_type": "markdown", - "id": "15", + "id": "16", "metadata": {}, "source": [ "### Gas Balance" @@ -281,7 +335,7 @@ { "cell_type": "code", "execution_count": null, - "id": "16", + "id": "17", "metadata": {}, "outputs": [], "source": [ @@ -290,7 +344,7 @@ }, { "cell_type": "markdown", - "id": "17", + "id": "18", "metadata": {}, "source": [ "### CHP Operation Pattern" @@ -299,7 +353,7 @@ { "cell_type": "code", "execution_count": null, - "id": "18", + "id": "19", "metadata": {}, "outputs": [], "source": [ @@ -308,7 +362,7 @@ }, { "cell_type": "markdown", - "id": "19", + "id": "20", "metadata": {}, "source": [ "### Cost and Emissions Summary" @@ -317,10 +371,13 @@ { "cell_type": "code", "execution_count": null, - "id": "20", + "id": "21", "metadata": {}, "outputs": [], "source": [ + "total_costs = flow_system.solution['costs'].item()\n", + "total_co2 = flow_system.solution['CO2'].item()\n", + "\n", "# Energy flows\n", "flow_rates = flow_system.statistics.flow_rates\n", "grid_buy = flow_rates['GridBuy(Electricity)'].sum().item()\n", @@ -332,25 +389,17 @@ "total_elec = electricity_demand.sum()\n", "total_heat = heat_demand.sum()\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'CHP Electricity [kWh]': chp_elec,\n", - " 'CHP Electricity [%]': chp_elec / total_elec * 100,\n", - " 'Grid Buy [kWh]': grid_buy,\n", - " 'Grid Sell [kWh]': grid_sell,\n", - " 'CHP Heat [kWh]': chp_heat,\n", - " 'CHP Heat [%]': chp_heat / total_heat * 100,\n", - " 'Boiler Heat [kWh]': boiler_heat,\n", - " 'Total Costs [EUR]': flow_system.solution['costs'].item(),\n", - " 'Total CO2 [kg]': flow_system.solution['CO2'].item(),\n", - " },\n", - " index=['Value'],\n", - ").T" + "# Display as compact summary\n", + "print(\n", + " f'Electricity: {chp_elec:.0f} kWh CHP ({chp_elec / total_elec * 100:.0f}%) + {grid_buy:.0f} kWh grid, {grid_sell:.0f} kWh sold'\n", + ")\n", + "print(f'Heat: {chp_heat:.0f} kWh CHP ({chp_heat / total_heat * 100:.0f}%) + {boiler_heat:.0f} kWh 
boiler')\n", + "print(f'Costs: {total_costs:.2f} € | CO2: {total_co2:.0f} kg')" ] }, { "cell_type": "markdown", - "id": "21", + "id": "22", "metadata": {}, "source": [ "### Compare: What if No CHP?\n", @@ -361,7 +410,7 @@ { "cell_type": "code", "execution_count": null, - "id": "22", + "id": "23", "metadata": {}, "outputs": [], "source": [ @@ -405,30 +454,19 @@ "\n", "fs_no_chp.optimize(fx.solvers.HighsSolver())\n", "\n", - "total_costs = flow_system.solution['costs'].item()\n", - "total_co2 = flow_system.solution['CO2'].item()\n", "no_chp_costs = fs_no_chp.solution['costs'].item()\n", "no_chp_co2 = fs_no_chp.solution['CO2'].item()\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Without CHP': {'Cost [EUR]': no_chp_costs, 'CO2 [kg]': no_chp_co2},\n", - " 'With CHP': {'Cost [EUR]': total_costs, 'CO2 [kg]': total_co2},\n", - " 'Savings': {\n", - " 'Cost [EUR]': no_chp_costs - total_costs,\n", - " 'CO2 [kg]': no_chp_co2 - total_co2,\n", - " },\n", - " 'Savings [%]': {\n", - " 'Cost [EUR]': (no_chp_costs - total_costs) / no_chp_costs * 100,\n", - " 'CO2 [kg]': (no_chp_co2 - total_co2) / no_chp_co2 * 100,\n", - " },\n", - " }\n", + "cost_saving = (no_chp_costs - total_costs) / no_chp_costs * 100\n", + "co2_saving = (no_chp_co2 - total_co2) / no_chp_co2 * 100\n", + "print(\n", + " f'CHP saves {cost_saving:.1f}% costs ({no_chp_costs:.0f}→{total_costs:.0f} €) and {co2_saving:.1f}% CO2 ({no_chp_co2:.0f}→{total_co2:.0f} kg)'\n", ")" ] }, { "cell_type": "markdown", - "id": "23", + "id": "24", "metadata": {}, "source": [ "### Energy Flow Sankey\n", @@ -439,7 +477,7 @@ { "cell_type": "code", "execution_count": null, - "id": "24", + "id": "25", "metadata": {}, "outputs": [], "source": [ @@ -448,7 +486,7 @@ }, { "cell_type": "markdown", - "id": "25", + "id": "26", "metadata": {}, "source": [ "## Key Concepts\n", diff --git a/docs/notebooks/06a-time-varying-parameters.ipynb b/docs/notebooks/06a-time-varying-parameters.ipynb index 4eaba9854..5c833b2ea 100644 --- a/docs/notebooks/06a-time-varying-parameters.ipynb +++ b/docs/notebooks/06a-time-varying-parameters.ipynb @@ -32,6 +32,7 @@ "outputs": [], "source": [ "import numpy as np\n", + "import pandas as pd\n", "import plotly.express as px\n", "import xarray as xr\n", "\n", @@ -77,13 +78,20 @@ "metadata": {}, "outputs": [], "source": [ - "from data.tutorial_data import get_time_varying_data\n", - "\n", - "data = get_time_varying_data()\n", - "timesteps = data['timesteps']\n", - "outdoor_temp = data['outdoor_temp']\n", - "heat_demand = data['heat_demand']\n", - "cop = data['cop']" + "# One winter week\n", + "timesteps = pd.date_range('2024-01-22', periods=168, freq='h')\n", + "hours = np.arange(168)\n", + "hour_of_day = hours % 24\n", + "\n", + "# Outdoor temperature: daily cycle with cold nights\n", + "temp_base = 2 # Average temp in °C\n", + "temp_amplitude = 5 # Daily variation\n", + "outdoor_temp = temp_base + temp_amplitude * np.sin((hour_of_day - 6) * np.pi / 12)\n", + "\n", + "# Add day-to-day variation for realism\n", + "np.random.seed(789)\n", + "daily_offset = np.repeat(np.random.uniform(-3, 3, 7), 24)\n", + "outdoor_temp = outdoor_temp + daily_offset" ] }, { @@ -93,24 +101,41 @@ "metadata": {}, "outputs": [], "source": [ - "# Visualize input profiles with fxplot\n", + "# Heat demand: inversely related to outdoor temp (higher demand when colder)\n", + "heat_demand = 200 - 8 * outdoor_temp\n", + "heat_demand = np.clip(heat_demand, 100, 300)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7", + "metadata": {}, + "outputs": [], + 
"source": [ + "# Visualize input profiles\n", "profiles = xr.Dataset(\n", " {\n", " 'Outdoor Temp [°C]': xr.DataArray(outdoor_temp, dims=['time'], coords={'time': timesteps}),\n", " 'Heat Demand [kW]': xr.DataArray(heat_demand, dims=['time'], coords={'time': timesteps}),\n", " }\n", ")\n", - "profiles.fxplot.line(title='Input Profiles')" + "\n", + "df = profiles.to_dataframe().reset_index().melt(id_vars='time', var_name='variable', value_name='value')\n", + "fig = px.line(df, x='time', y='value', facet_col='variable', height=300)\n", + "fig.update_yaxes(matches=None, showticklabels=True)\n", + "fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1]))\n", + "fig" ] }, { "cell_type": "markdown", - "id": "7", + "id": "8", "metadata": {}, "source": [ - "## Time-Varying COP\n", + "## Calculate Time-Varying COP\n", "\n", - "The COP depends on outdoor temperature. The helper function uses a simplified Carnot-based formula:\n", + "The COP depends on outdoor temperature. We use a simplified Carnot-based formula:\n", "\n", "$$\\text{COP}_{\\text{real}} \\approx 0.45 \\times \\text{COP}_{\\text{Carnot}} = 0.45 \\times \\frac{T_{\\text{supply}}}{T_{\\text{supply}} - T_{\\text{source}}}$$\n", "\n", @@ -120,14 +145,30 @@ { "cell_type": "code", "execution_count": null, - "id": "8", + "id": "9", + "metadata": {}, + "outputs": [], + "source": [ + "# COP calculation\n", + "T_supply = 45 + 273.15 # Supply temperature 45°C in Kelvin\n", + "T_source = outdoor_temp + 273.15 # Outdoor temp in Kelvin\n", + "\n", + "carnot_cop = T_supply / (T_supply - T_source)\n", + "real_cop = 0.45 * carnot_cop\n", + "real_cop = np.clip(real_cop, 2.0, 5.0) # Physical limits" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", "metadata": {}, "outputs": [], "source": [ "# Visualize COP vs temperature relationship\n", "px.scatter(\n", " x=outdoor_temp,\n", - " y=cop,\n", + " y=real_cop,\n", " title='Heat Pump COP vs Outdoor Temperature',\n", " labels={'x': 'Outdoor Temperature [°C]', 'y': 'COP'},\n", " opacity=0.5,\n", @@ -136,7 +177,7 @@ }, { "cell_type": "markdown", - "id": "9", + "id": "11", "metadata": {}, "source": [ "## Build the Model\n", @@ -151,7 +192,7 @@ { "cell_type": "code", "execution_count": null, - "id": "10", + "id": "12", "metadata": {}, "outputs": [], "source": [ @@ -173,7 +214,7 @@ " 'HeatPump',\n", " inputs=[fx.Flow('Elec', bus='Electricity', size=150)],\n", " outputs=[fx.Flow('Heat', bus='Heat', size=500)],\n", - " conversion_factors=[{'Elec': cop, 'Heat': 1}], # <-- Array for time-varying COP\n", + " conversion_factors=[{'Elec': real_cop, 'Heat': 1}], # <-- Array for time-varying COP\n", " ),\n", " # Heat demand\n", " fx.Sink('Building', inputs=[fx.Flow('Heat', bus='Heat', size=1, fixed_relative_profile=heat_demand)]),\n", @@ -184,7 +225,7 @@ }, { "cell_type": "markdown", - "id": "11", + "id": "13", "metadata": {}, "source": [ "## Analyze Results" @@ -193,7 +234,7 @@ { "cell_type": "code", "execution_count": null, - "id": "12", + "id": "14", "metadata": {}, "outputs": [], "source": [ @@ -203,7 +244,7 @@ { "cell_type": "code", "execution_count": null, - "id": "13", + "id": "15", "metadata": {}, "outputs": [], "source": [ @@ -213,7 +254,7 @@ { "cell_type": "code", "execution_count": null, - "id": "14", + "id": "16", "metadata": {}, "outputs": [], "source": [ @@ -242,7 +283,7 @@ }, { "cell_type": "markdown", - "id": "15", + "id": "17", "metadata": {}, "source": [ "## Key Concepts\n", @@ -282,7 +323,7 @@ }, { "cell_type": "markdown", - "id": "16", + "id": "18", "metadata": 
{}, "source": [ "## Summary\n", diff --git a/docs/notebooks/07-scenarios-and-periods.ipynb b/docs/notebooks/07-scenarios-and-periods.ipynb index 52d04b0d8..db74afefb 100644 --- a/docs/notebooks/07-scenarios-and-periods.ipynb +++ b/docs/notebooks/07-scenarios-and-periods.ipynb @@ -32,8 +32,9 @@ "metadata": {}, "outputs": [], "source": [ + "import numpy as np\n", "import pandas as pd\n", - "import xarray as xr\n", + "import plotly.express as px\n", "\n", "import flixopt as fx\n", "\n", @@ -71,16 +72,20 @@ "metadata": {}, "outputs": [], "source": [ - "from data.tutorial_data import get_scenarios_data\n", - "\n", - "data = get_scenarios_data()\n", - "timesteps = data['timesteps']\n", - "periods = data['periods']\n", - "scenarios = data['scenarios']\n", - "scenario_weights = data['scenario_weights']\n", - "heat_demand = data['heat_demand']\n", - "gas_prices = data['gas_prices']\n", - "elec_prices = data['elec_prices']" + "# Time horizon: one representative winter week\n", + "timesteps = pd.date_range('2024-01-15', periods=168, freq='h') # 7 days\n", + "\n", + "# Planning periods (years)\n", + "periods = pd.Index([2024, 2025, 2026], name='period')\n", + "\n", + "# Scenarios with probabilities\n", + "scenarios = pd.Index(['Mild Winter', 'Harsh Winter'], name='scenario')\n", + "scenario_weights = np.array([0.6, 0.4]) # 60% mild, 40% harsh\n", + "\n", + "print(f'Time dimension: {len(timesteps)} hours')\n", + "print(f'Periods: {list(periods)}')\n", + "print(f'Scenarios: {list(scenarios)}')\n", + "print(f'Scenario weights: {dict(zip(scenarios, scenario_weights, strict=False))}')" ] }, { @@ -88,7 +93,7 @@ "id": "6", "metadata": {}, "source": [ - "## Scenario-Dependent Demand Profiles\n", + "## Create Scenario-Dependent Demand Profiles\n", "\n", "Heat demand differs significantly between mild and harsh winters:" ] @@ -100,24 +105,98 @@ "metadata": {}, "outputs": [], "source": [ - "# Visualize demand scenarios with fxplot\n", - "demand_ds = xr.Dataset(\n", + "hours = np.arange(168)\n", + "hour_of_day = hours % 24\n", + "\n", + "# Base daily pattern (kW): higher in morning/evening\n", + "daily_pattern = np.select(\n", + " [\n", + " (hour_of_day >= 6) & (hour_of_day < 9), # Morning peak\n", + " (hour_of_day >= 9) & (hour_of_day < 17), # Daytime\n", + " (hour_of_day >= 17) & (hour_of_day < 22), # Evening peak\n", + " ],\n", + " [180, 120, 160],\n", + " default=100, # Night\n", + ").astype(float)\n", + "\n", + "# Add random variation\n", + "np.random.seed(42)\n", + "noise = np.random.normal(0, 10, len(timesteps))\n", + "\n", + "# Mild winter: lower demand\n", + "mild_demand = daily_pattern * 0.8 + noise\n", + "mild_demand = np.clip(mild_demand, 60, 200)\n", + "\n", + "# Harsh winter: higher demand\n", + "harsh_demand = daily_pattern * 1.3 + noise * 1.5\n", + "harsh_demand = np.clip(harsh_demand, 100, 280)\n", + "\n", + "# Create DataFrame with scenario columns (flixopt uses column names to match scenarios)\n", + "heat_demand = pd.DataFrame(\n", " {\n", - " scenario: xr.DataArray(\n", - " heat_demand[scenario].values,\n", - " dims=['time'],\n", - " coords={'time': timesteps},\n", - " )\n", - " for scenario in scenarios\n", - " }\n", + " 'Mild Winter': mild_demand,\n", + " 'Harsh Winter': harsh_demand,\n", + " },\n", + " index=timesteps,\n", ")\n", - "demand_ds.fxplot.line(title='Heat Demand by Scenario')" + "\n", + "print(f'Mild winter demand: {mild_demand.min():.0f} - {mild_demand.max():.0f} kW')\n", + "print(f'Harsh winter demand: {harsh_demand.min():.0f} - {harsh_demand.max():.0f} kW')" ] }, { - 
"cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "id": "8", "metadata": {}, + "outputs": [], + "source": [ + "# Visualize demand scenarios with plotly\n", + "fig = px.line(\n", + " heat_demand.iloc[:48],\n", + " title='Heat Demand by Scenario (First 2 Days)',\n", + " labels={'index': 'Time', 'value': 'kW', 'variable': 'Scenario'},\n", + ")\n", + "fig.update_traces(mode='lines')\n", + "fig" + ] + }, + { + "cell_type": "markdown", + "id": "9", + "metadata": {}, + "source": [ + "## Create Period-Dependent Prices\n", + "\n", + "Energy prices change across planning years:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": {}, + "outputs": [], + "source": [ + "# Gas prices by period (€/kWh) - expected to rise\n", + "gas_prices = np.array([0.06, 0.08, 0.10]) # 2024, 2025, 2026\n", + "\n", + "# Electricity sell prices by period (€/kWh) - CHP revenue\n", + "elec_prices = np.array([0.28, 0.34, 0.43]) # Rising with gas\n", + "\n", + "print('Gas prices by period:')\n", + "for period, price in zip(periods, gas_prices, strict=False):\n", + " print(f' {period}: {price:.2f} €/kWh')\n", + "\n", + "print('\\nElectricity sell prices by period:')\n", + "for period, price in zip(periods, elec_prices, strict=False):\n", + " print(f' {period}: {price:.2f} €/kWh')" + ] + }, + { + "cell_type": "markdown", + "id": "11", + "metadata": {}, "source": [ "## Build the Flow System\n", "\n", @@ -127,7 +206,7 @@ { "cell_type": "code", "execution_count": null, - "id": "9", + "id": "12", "metadata": {}, "outputs": [], "source": [ @@ -143,12 +222,12 @@ " fx.Carrier('heat', '#e74c3c', 'kW'),\n", ")\n", "\n", - "flow_system" + "print(flow_system)" ] }, { "cell_type": "markdown", - "id": "10", + "id": "13", "metadata": {}, "source": [ "## Add Components" @@ -157,7 +236,7 @@ { "cell_type": "code", "execution_count": null, - "id": "11", + "id": "14", "metadata": {}, "outputs": [], "source": [ @@ -234,7 +313,7 @@ }, { "cell_type": "markdown", - "id": "12", + "id": "15", "metadata": {}, "source": [ "## Run Optimization" @@ -243,7 +322,7 @@ { "cell_type": "code", "execution_count": null, - "id": "13", + "id": "16", "metadata": {}, "outputs": [], "source": [ @@ -252,7 +331,7 @@ }, { "cell_type": "markdown", - "id": "14", + "id": "17", "metadata": {}, "source": [ "## Analyze Results\n", @@ -263,25 +342,20 @@ { "cell_type": "code", "execution_count": null, - "id": "15", + "id": "18", "metadata": {}, "outputs": [], "source": [ "chp_size = flow_system.statistics.sizes['CHP(P_el)']\n", + "total_cost = flow_system.solution['costs']\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'CHP Electrical [kW]': float(chp_size.max()),\n", - " 'CHP Thermal [kW]': float(chp_size.max()) * 0.50 / 0.35,\n", - " 'Expected Cost [EUR]': float(flow_system.solution['costs'].sum()),\n", - " },\n", - " index=['Optimal'],\n", - ").T" + "print(f'Optimal CHP: {float(chp_size.max()):.0f} kW electrical ({float(chp_size.max()) * 0.50 / 0.35:.0f} kW thermal)')\n", + "print(f'Expected cost: {float(total_cost.sum()):.0f} €')" ] }, { "cell_type": "markdown", - "id": "16", + "id": "19", "metadata": {}, "source": [ "### Heat Balance by Scenario\n", @@ -292,7 +366,7 @@ { "cell_type": "code", "execution_count": null, - "id": "17", + "id": "20", "metadata": {}, "outputs": [], "source": [ @@ -301,7 +375,7 @@ }, { "cell_type": "markdown", - "id": "18", + "id": "21", "metadata": {}, "source": [ "### CHP Operation Patterns" @@ -310,7 +384,7 @@ { "cell_type": "code", "execution_count": null, - "id": "19", + "id": "22", 
"metadata": {}, "outputs": [], "source": [ @@ -319,7 +393,7 @@ }, { "cell_type": "markdown", - "id": "20", + "id": "23", "metadata": {}, "source": [ "### Multi-Dimensional Data Access\n", @@ -330,11 +404,13 @@ { "cell_type": "code", "execution_count": null, - "id": "21", + "id": "24", "metadata": {}, "outputs": [], "source": [ + "# View dimensions\n", "flow_rates = flow_system.statistics.flow_rates\n", + "print('Flow rates dimensions:', dict(flow_rates.sizes))\n", "\n", "# Plot flow rates\n", "flow_system.statistics.plot.flows()" @@ -343,27 +419,22 @@ { "cell_type": "code", "execution_count": null, - "id": "22", + "id": "25", "metadata": {}, "outputs": [], "source": [ "# CHP operation summary by scenario\n", "chp_heat = flow_rates['CHP(Q_th)']\n", "\n", - "pd.DataFrame(\n", - " {\n", - " scenario: {\n", - " 'Avg [kW]': float(chp_heat.sel(scenario=scenario).mean()),\n", - " 'Max [kW]': float(chp_heat.sel(scenario=scenario).max()),\n", - " }\n", - " for scenario in scenarios\n", - " }\n", - ")" + "for scenario in scenarios:\n", + " scenario_avg = float(chp_heat.sel(scenario=scenario).mean())\n", + " scenario_max = float(chp_heat.sel(scenario=scenario).max())\n", + " print(f'{scenario}: avg {scenario_avg:.0f} kW, max {scenario_max:.0f} kW')" ] }, { "cell_type": "markdown", - "id": "23", + "id": "26", "metadata": {}, "source": [ "## Sensitivity: What if Only Mild Winter?\n", @@ -374,7 +445,7 @@ { "cell_type": "code", "execution_count": null, - "id": "24", + "id": "27", "metadata": {}, "outputs": [], "source": [ @@ -385,18 +456,14 @@ "chp_size_mild = float(fs_mild.statistics.sizes['CHP(P_el)'].max())\n", "chp_size_both = float(chp_size.max())\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Mild Only': {'CHP Size [kW]': chp_size_mild},\n", - " 'Both Scenarios': {'CHP Size [kW]': chp_size_both},\n", - " 'Uncertainty Buffer': {'CHP Size [kW]': chp_size_both - chp_size_mild},\n", - " }\n", + "print(\n", + " f'CHP sizing: {chp_size_mild:.0f} kW (mild only) vs {chp_size_both:.0f} kW (both scenarios) → +{chp_size_both - chp_size_mild:.0f} kW for uncertainty'\n", ")" ] }, { "cell_type": "markdown", - "id": "25", + "id": "28", "metadata": {}, "source": [ "### Energy Flow Sankey\n", @@ -407,7 +474,7 @@ { "cell_type": "code", "execution_count": null, - "id": "26", + "id": "29", "metadata": {}, "outputs": [], "source": [ @@ -416,7 +483,7 @@ }, { "cell_type": "markdown", - "id": "27", + "id": "30", "metadata": {}, "source": [ "## Key Concepts\n", diff --git a/docs/notebooks/08a-aggregation.ipynb b/docs/notebooks/08a-aggregation.ipynb index cccb7dce0..8bc1a4774 100644 --- a/docs/notebooks/08a-aggregation.ipynb +++ b/docs/notebooks/08a-aggregation.ipynb @@ -65,8 +65,8 @@ "flow_system.connect_and_transform() # Align all data as xarray\n", "\n", "timesteps = flow_system.timesteps\n", - "\n", - "flow_system" + "print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 24:.0f} days at hourly resolution)')\n", + "print(f'Components: {list(flow_system.components.keys())}')" ] }, { @@ -111,7 +111,10 @@ "solver = fx.solvers.HighsSolver(mip_gap=0.01)\n", "\n", "# Resample from 15-min to 4h resolution\n", - "fs_resampled = flow_system.transform.resample('4h')" + "fs_resampled = flow_system.transform.resample('4h')\n", + "\n", + "reduction = (1 - len(fs_resampled.timesteps) / len(flow_system.timesteps)) * 100\n", + "print(f'Resampled: {len(flow_system.timesteps)} → {len(fs_resampled.timesteps)} timesteps ({reduction:.0f}% reduction)')" ] }, { @@ -124,7 +127,9 @@ "# Optimize resampled system\n", "start = 
timeit.default_timer()\n", "fs_resampled.optimize(solver)\n", - "time_resampled = timeit.default_timer() - start" + "time_resampled = timeit.default_timer() - start\n", + "\n", + "print(f'Resampled: {time_resampled:.1f}s, {fs_resampled.solution[\"costs\"].item():,.0f} €')" ] }, { @@ -151,7 +156,10 @@ "fs_sizing.optimize(solver)\n", "time_stage1 = timeit.default_timer() - start\n", "\n", - "sizes = {k: float(v.item()) for k, v in fs_sizing.statistics.sizes.items()}" + "sizes = {k: float(v.item()) for k, v in fs_sizing.statistics.sizes.items()}\n", + "print(\n", + " f'Stage 1 (sizing): {time_stage1:.1f}s → CHP {sizes[\"CHP(Q_th)\"]:.0f}, Boiler {sizes[\"Boiler(Q_th)\"]:.0f}, Storage {sizes[\"Storage\"]:.0f}'\n", + ")" ] }, { @@ -165,7 +173,11 @@ "start = timeit.default_timer()\n", "fs_dispatch = flow_system.transform.fix_sizes(fs_sizing.statistics.sizes)\n", "fs_dispatch.optimize(solver)\n", - "time_stage2 = timeit.default_timer() - start" + "time_stage2 = timeit.default_timer() - start\n", + "\n", + "print(\n", + " f'Stage 2 (dispatch): {time_stage2:.1f}s, {fs_dispatch.solution[\"costs\"].item():,.0f} € (total: {time_stage1 + time_stage2:.1f}s)'\n", + ")" ] }, { @@ -188,7 +200,9 @@ "start = timeit.default_timer()\n", "fs_full = flow_system.copy()\n", "fs_full.optimize(solver)\n", - "time_full = timeit.default_timer() - start" + "time_full = timeit.default_timer() - start\n", + "\n", + "print(f'Full optimization: {time_full:.1f}s, {fs_full.solution[\"costs\"].item():,.0f} €')" ] }, { diff --git a/docs/notebooks/08b-rolling-horizon.ipynb b/docs/notebooks/08b-rolling-horizon.ipynb index 3b91fd980..c0d7bdf24 100644 --- a/docs/notebooks/08b-rolling-horizon.ipynb +++ b/docs/notebooks/08b-rolling-horizon.ipynb @@ -69,8 +69,8 @@ "flow_system.connect_and_transform() # Align all data as xarray\n", "\n", "timesteps = flow_system.timesteps\n", - "\n", - "flow_system" + "print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 24:.0f} days at hourly resolution)')\n", + "print(f'Components: {list(flow_system.components.keys())}')" ] }, { @@ -95,7 +95,9 @@ "start = timeit.default_timer()\n", "fs_full = flow_system.copy()\n", "fs_full.optimize(solver)\n", - "time_full = timeit.default_timer() - start" + "time_full = timeit.default_timer() - start\n", + "\n", + "print(f'Full: {time_full:.1f}s, {fs_full.solution[\"costs\"].item():,.0f} €')" ] }, { @@ -136,7 +138,9 @@ " horizon=192, # 2-day segments (192 timesteps at 15-min resolution)\n", " overlap=96, # 1-day lookahead\n", ")\n", - "time_rolling = timeit.default_timer() - start" + "time_rolling = timeit.default_timer() - start\n", + "\n", + "print(f'Rolling ({len(segments)} segments): {time_rolling:.1f}s, {fs_rolling.solution[\"costs\"].item():,.0f} €')" ] }, { @@ -250,16 +254,11 @@ "metadata": {}, "outputs": [], "source": [ - "pd.DataFrame(\n", - " {\n", - " f'Segment {i + 1}': {\n", - " 'Start': f'{seg.timesteps[0]:%m-%d %H:%M}',\n", - " 'End': f'{seg.timesteps[-1]:%m-%d %H:%M}',\n", - " 'Cost [EUR]': seg.solution['costs'].item(),\n", - " }\n", - " for i, seg in enumerate(segments)\n", - " }\n", - ")" + "print(f'{len(segments)} segments:')\n", + "for i, seg in enumerate(segments):\n", + " print(\n", + " f' {i + 1}: {seg.timesteps[0]:%m-%d %H:%M} → {seg.timesteps[-1]:%m-%d %H:%M} | {seg.solution[\"costs\"].item():,.0f} €'\n", + " )" ] }, { diff --git a/docs/notebooks/08c-clustering.ipynb b/docs/notebooks/08c-clustering.ipynb index 825aadd3a..0f3b4cc29 100644 --- a/docs/notebooks/08c-clustering.ipynb +++ b/docs/notebooks/08c-clustering.ipynb 
@@ -29,8 +29,7 @@ "import timeit\n", "\n", "import pandas as pd\n", - "import plotly.graph_objects as go\n", - "from plotly.subplots import make_subplots\n", + "import xarray as xr\n", "\n", "import flixopt as fx\n", "\n", @@ -72,18 +71,13 @@ "outputs": [], "source": [ "# Visualize input data\n", - "heat_demand = flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile\n", - "electricity_price = flow_system.components['GridBuy'].outputs[0].effects_per_flow_hour['costs']\n", - "\n", - "fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)\n", - "fig.add_trace(go.Scatter(x=timesteps, y=heat_demand.values, name='Heat Demand', line=dict(width=0.5)), row=1, col=1)\n", - "fig.add_trace(\n", - " go.Scatter(x=timesteps, y=electricity_price.values, name='Electricity Price', line=dict(width=0.5)), row=2, col=1\n", + "input_ds = xr.Dataset(\n", + " {\n", + " 'Heat Demand': flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile,\n", + " 'Electricity Price': flow_system.components['GridBuy'].outputs[0].effects_per_flow_hour['costs'],\n", + " }\n", ")\n", - "fig.update_layout(height=400, title='One Month of Input Data')\n", - "fig.update_yaxes(title_text='Heat Demand [MW]', row=1, col=1)\n", - "fig.update_yaxes(title_text='El. Price [€/MWh]', row=2, col=1)\n", - "fig.show()" + "input_ds.fxplot.line(facet_row='variable', title='One Month of Input Data')" ] }, { @@ -172,7 +166,7 @@ "source": [ "## Understanding the Clustering\n", "\n", - "The clustering algorithm groups similar days together. Let's inspect the cluster structure:" + "The clustering algorithm groups similar days together. Access all metadata via `fs.clustering`:" ] }, { @@ -181,14 +175,150 @@ "id": "11", "metadata": {}, "outputs": [], + "source": [ + "# Access clustering metadata directly\n", + "clustering = fs_clustered.clustering\n", + "clustering" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": {}, + "outputs": [], "source": [ "# Show clustering info using __repr__\n", "fs_clustered.clustering" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": {}, + "outputs": [], + "source": [ + "# Quality metrics - how well do the clusters represent the original data?\n", + "# Lower RMSE/MAE = better representation\n", + "clustering.metrics.to_dataframe().style.format('{:.3f}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14", + "metadata": {}, + "outputs": [], + "source": [ + "# Visual comparison: original vs clustered time series\n", + "clustering.plot.compare()" + ] + }, { "cell_type": "markdown", - "id": "12", + "id": "15", + "metadata": {}, + "source": [ + "## Advanced Clustering Options\n", + "\n", + "The `cluster()` method exposes many parameters for fine-tuning:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": {}, + "outputs": [], + "source": [ + "# Try different clustering algorithms\n", + "fs_kmeans = flow_system.transform.cluster(\n", + " n_clusters=8,\n", + " cluster_duration='1D',\n", + " cluster_method='k_means', # Alternative: 'hierarchical' (default), 'k_medoids', 'averaging'\n", + ")\n", + "\n", + "# Compare cluster assignments between algorithms\n", + "print('hierarchical clusters:', fs_clustered.clustering.cluster_order.values)\n", + "print('k_means clusters: ', fs_kmeans.clustering.cluster_order.values)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17", + "metadata": {}, + "outputs": [], + "source": [ + "# 
Compare RMSE between algorithms\n", + "print('Quality comparison (RMSE for HeatDemand):')\n", + "print(\n", + " f' hierarchical: {float(fs_clustered.clustering.metrics[\"RMSE\"].sel(time_series=\"HeatDemand(Q_th)|fixed_relative_profile\")):.4f}'\n", + ")\n", + "print(\n", + " f' k_means: {float(fs_kmeans.clustering.metrics[\"RMSE\"].sel(time_series=\"HeatDemand(Q_th)|fixed_relative_profile\")):.4f}'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": {}, + "outputs": [], + "source": [ + "# Visualize cluster structure with heatmap\n", + "clustering.plot.heatmap()" + ] + }, + { + "cell_type": "markdown", + "id": "19", + "metadata": {}, + "source": [ + "### Manual Cluster Assignment\n", + "\n", + "When comparing design variants or performing sensitivity analysis, you often want to\n", + "use the **same cluster structure** across different FlowSystem configurations.\n", + "Use `predef_cluster_order` to ensure comparable results:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": {}, + "outputs": [], + "source": [ + "# Save the cluster order from our optimized system\n", + "cluster_order = fs_clustered.clustering.cluster_order.values\n", + "print(f'Cluster order to reuse: {cluster_order}')\n", + "\n", + "# Now modify the FlowSystem (e.g., increase storage capacity limits)\n", + "flow_system_modified = flow_system.copy()\n", + "flow_system_modified.components['Storage'].capacity_in_flow_hours.maximum_size = 2000 # Larger storage option\n", + "\n", + "# Cluster with the SAME cluster structure for fair comparison\n", + "fs_modified_clustered = flow_system_modified.transform.cluster(\n", + " n_clusters=8,\n", + " cluster_duration='1D',\n", + " predef_cluster_order=cluster_order, # Reuse cluster assignments\n", + ")\n", + "\n", + "# Optimize the modified system\n", + "fs_modified_clustered.optimize(solver)\n", + "\n", + "print('\\nComparison (same cluster structure):')\n", + "print(f' Original storage size: {fs_clustered.statistics.sizes[\"Storage\"].item():.0f}')\n", + "print(f' Modified storage size: {fs_modified_clustered.statistics.sizes[\"Storage\"].item():.0f}')\n", + "print(f' Original cost: {fs_clustered.solution[\"costs\"].item():,.0f} €')\n", + "print(f' Modified cost: {fs_modified_clustered.solution[\"costs\"].item():,.0f} €')" + ] + }, + { + "cell_type": "markdown", + "id": "21", "metadata": {}, "source": [ "## Method 3: Two-Stage Workflow (Recommended)\n", @@ -206,7 +336,7 @@ { "cell_type": "code", "execution_count": null, - "id": "13", + "id": "22", "metadata": {}, "outputs": [], "source": [ @@ -218,7 +348,7 @@ { "cell_type": "code", "execution_count": null, - "id": "14", + "id": "23", "metadata": {}, "outputs": [], "source": [ @@ -236,7 +366,7 @@ }, { "cell_type": "markdown", - "id": "15", + "id": "24", "metadata": {}, "source": [ "## Compare Results" @@ -245,7 +375,7 @@ { "cell_type": "code", "execution_count": null, - "id": "16", + "id": "25", "metadata": {}, "outputs": [], "source": [ @@ -294,7 +424,7 @@ }, { "cell_type": "markdown", - "id": "17", + "id": "26", "metadata": {}, "source": [ "## Expand Solution to Full Resolution\n", @@ -306,7 +436,7 @@ { "cell_type": "code", "execution_count": null, - "id": "18", + "id": "27", "metadata": {}, "outputs": [], "source": [ @@ -317,34 +447,29 @@ { "cell_type": "code", "execution_count": null, - "id": "19", + "id": "28", "metadata": {}, "outputs": [], "source": [ - "# Compare heat balance: Full vs Expanded\n", - "fig = make_subplots(rows=2, cols=1, 
shared_xaxes=True, subplot_titles=['Full Optimization', 'Expanded from Clustering'])\n", + "# Compare heat production: Full vs Expanded\n", + "heat_flows = ['CHP(Q_th)|flow_rate', 'Boiler(Q_th)|flow_rate']\n", "\n", - "# Full\n", - "for var in ['CHP(Q_th)', 'Boiler(Q_th)']:\n", - " values = fs_full.solution[f'{var}|flow_rate'].values\n", - " fig.add_trace(go.Scatter(x=fs_full.timesteps, y=values, name=var, legendgroup=var, showlegend=True), row=1, col=1)\n", - "\n", - "# Expanded\n", - "for var in ['CHP(Q_th)', 'Boiler(Q_th)']:\n", - " values = fs_expanded.solution[f'{var}|flow_rate'].values\n", - " fig.add_trace(\n", - " go.Scatter(x=fs_expanded.timesteps, y=values, name=var, legendgroup=var, showlegend=False), row=2, col=1\n", - " )\n", + "# Create comparison dataset\n", + "comparison_ds = xr.Dataset(\n", + " {\n", + " name.replace('|flow_rate', ''): xr.concat(\n", + " [fs_full.solution[name], fs_expanded.solution[name]], dim=pd.Index(['Full', 'Expanded'], name='method')\n", + " )\n", + " for name in heat_flows\n", + " }\n", + ")\n", "\n", - "fig.update_layout(height=500, title='Heat Production Comparison')\n", - "fig.update_yaxes(title_text='MW', row=1, col=1)\n", - "fig.update_yaxes(title_text='MW', row=2, col=1)\n", - "fig.show()" + "comparison_ds.fxplot.line(facet_col='variable', color='method', title='Heat Production Comparison')" ] }, { "cell_type": "markdown", - "id": "20", + "id": "29", "metadata": {}, "source": [ "## Visualize Clustered Heat Balance" @@ -353,7 +478,7 @@ { "cell_type": "code", "execution_count": null, - "id": "21", + "id": "30", "metadata": {}, "outputs": [], "source": [ @@ -363,7 +488,7 @@ { "cell_type": "code", "execution_count": null, - "id": "22", + "id": "31", "metadata": {}, "outputs": [], "source": [ @@ -372,24 +497,45 @@ }, { "cell_type": "markdown", - "id": "23", + "id": "32", "metadata": {}, "source": [ "## API Reference\n", "\n", "### `transform.cluster()` Parameters\n", "\n", - "| Parameter | Type | Description |\n", - "|-----------|------|-------------|\n", - "| `n_clusters` | `int` | Number of typical periods (e.g., 8 typical days) |\n", - "| `cluster_duration` | `str \\| float` | Duration per cluster ('1D', '24h') or hours |\n", - "| `weights` | `dict[str, float]` | Optional weights for time series in clustering |\n", - "| `time_series_for_high_peaks` | `list[str]` | **Essential**: Force inclusion of peak periods |\n", - "| `time_series_for_low_peaks` | `list[str]` | Force inclusion of minimum periods |\n", + "| Parameter | Type | Default | Description |\n", + "|-----------|------|---------|-------------|\n", + "| `n_clusters` | `int` | - | Number of typical periods (e.g., 8 typical days) |\n", + "| `cluster_duration` | `str \\| float` | - | Duration per cluster ('1D', '24h') or hours |\n", + "| `weights` | `dict[str, float]` | None | Optional weights for time series in clustering |\n", + "| `time_series_for_high_peaks` | `list[str]` | None | **Essential**: Force inclusion of peak periods |\n", + "| `time_series_for_low_peaks` | `list[str]` | None | Force inclusion of minimum periods |\n", + "| `cluster_method` | `str` | 'hierarchical' | Algorithm: 'hierarchical', 'k_means', 'k_medoids', 'k_maxoids', 'averaging' |\n", + "| `representation_method` | `str` | 'medoidRepresentation' | 'medoidRepresentation', 'meanRepresentation', 'distributionAndMinMaxRepresentation' |\n", + "| `extreme_period_method` | `str \\| None` | None | How peaks are integrated: None, 'append', 'new_cluster_center', 'replace_cluster_center' |\n", + "| `rescale_cluster_periods` | 
`bool` | True | Rescale clusters to match original means |\n", + "| `predef_cluster_order` | `array` | None | Manual cluster assignments |\n", + "| `**tsam_kwargs` | - | - | Additional tsam parameters |\n", + "\n", + "### Clustering Object Properties\n", + "\n", + "After clustering, access metadata via `fs.clustering`:\n", + "\n", + "| Property | Description |\n", + "|----------|-------------|\n", + "| `n_clusters` | Number of clusters |\n", + "| `n_original_clusters` | Number of original time segments (e.g., 365 days) |\n", + "| `timesteps_per_cluster` | Timesteps in each cluster (e.g., 24 for daily) |\n", + "| `cluster_order` | xr.DataArray mapping original segment → cluster ID |\n", + "| `occurrences` | How many original segments each cluster represents |\n", + "| `metrics` | xr.Dataset with RMSE, MAE per time series |\n", + "| `plot.compare()` | Compare original vs clustered time series |\n", + "| `plot.heatmap()` | Visualize cluster structure |\n", "\n", "### Storage Behavior\n", "\n", - "Each `Storage` component has a `cluster_storage_mode` parameter that controls how it behaves during clustering:\n", + "Each `Storage` component has a `cluster_mode` parameter:\n", "\n", "| Mode | Description |\n", "|------|-------------|\n", @@ -398,37 +544,12 @@ "| `'cyclic'` | Each cluster is independent but cyclic (start = end) |\n", "| `'independent'` | Each cluster is independent, free start/end |\n", "\n", - "For a detailed comparison of storage modes, see [08c2-clustering-storage-modes](08c2-clustering-storage-modes.ipynb).\n", - "\n", - "### Peak Forcing Format\n", - "\n", - "```python\n", - "time_series_for_high_peaks = ['ComponentName(FlowName)|fixed_relative_profile']\n", - "```\n", - "\n", - "### Recommended Workflow\n", - "\n", - "```python\n", - "# Stage 1: Fast sizing\n", - "fs_sizing = flow_system.transform.cluster(\n", - " n_clusters=8,\n", - " cluster_duration='1D',\n", - " time_series_for_high_peaks=['Demand(Flow)|fixed_relative_profile'],\n", - ")\n", - "fs_sizing.optimize(solver)\n", - "\n", - "# Apply safety margin\n", - "sizes = {k: v.item() * 1.05 for k, v in fs_sizing.statistics.sizes.items()}\n", - "\n", - "# Stage 2: Accurate dispatch\n", - "fs_dispatch = flow_system.transform.fix_sizes(sizes)\n", - "fs_dispatch.optimize(solver)\n", - "```" + "For a detailed comparison of storage modes, see [08c2-clustering-storage-modes](08c2-clustering-storage-modes.ipynb)." ] }, { "cell_type": "markdown", - "id": "24", + "id": "33", "metadata": {}, "source": [ "## Summary\n", @@ -439,13 +560,18 @@ "- Apply **peak forcing** to capture extreme demand days\n", "- Use **two-stage optimization** for fast yet accurate investment decisions\n", "- **Expand solutions** back to full resolution with `expand_solution()`\n", + "- Access **clustering metadata** via `fs.clustering` (metrics, cluster_order, occurrences)\n", + "- Use **advanced options** like different algorithms\n", + "- **Manually assign clusters** using `predef_cluster_order`\n", "\n", "### Key Takeaways\n", "\n", "1. **Always use peak forcing** (`time_series_for_high_peaks`) for demand time series\n", "2. **Add safety margin** (5-10%) when fixing sizes from clustering\n", "3. **Two-stage is recommended**: clustering for sizing, full resolution for dispatch\n", - "4. **Storage handling** is configurable via `storage_mode`\n", + "4. **Storage handling** is configurable via `cluster_mode`\n", + "5. **Check metrics** to evaluate clustering quality\n", + "6. 
**Use `predef_cluster_order`** to reproduce or define custom cluster assignments\n", "\n", "### Next Steps\n", "\n", diff --git a/docs/notebooks/09-plotting-and-data-access.ipynb b/docs/notebooks/09-plotting-and-data-access.ipynb index 20a7e6f4f..39fa788da 100644 --- a/docs/notebooks/09-plotting-and-data-access.ipynb +++ b/docs/notebooks/09-plotting-and-data-access.ipynb @@ -71,7 +71,10 @@ "multiperiod = create_multiperiod_system()\n", "multiperiod.optimize(solver)\n", "\n", - "simple" + "print('Created systems:')\n", + "print(f' simple: {len(simple.components)} components, {len(simple.buses)} buses')\n", + "print(f' complex_sys: {len(complex_sys.components)} components, {len(complex_sys.buses)} buses')\n", + "print(f' multiperiod: {len(multiperiod.components)} components, dims={dict(multiperiod.solution.sizes)}')" ] }, { @@ -611,12 +614,13 @@ "source": [ "nodes, edges = simple.topology.infos()\n", "\n", - "pd.DataFrame(\n", - " {\n", - " 'Nodes': {label: info['class'] for label, info in nodes.items()},\n", - " 'Edges': {label: f'{info[\"start\"]} -> {info[\"end\"]}' for label, info in edges.items()},\n", - " }\n", - ")" + "print('Nodes:')\n", + "for label, info in nodes.items():\n", + " print(f' {label}: {info[\"class\"]}')\n", + "\n", + "print('\\nEdges (flows):')\n", + "for label, info in edges.items():\n", + " print(f' {info[\"start\"]} -> {info[\"end\"]}: {label}')" ] }, { @@ -636,7 +640,10 @@ "metadata": {}, "outputs": [], "source": [ - "multiperiod" + "print('Multiperiod system dimensions:')\n", + "print(f' Periods: {list(multiperiod.periods)}')\n", + "print(f' Scenarios: {list(multiperiod.scenarios)}')\n", + "print(f' Solution dims: {dict(multiperiod.solution.sizes)}')" ] }, { @@ -741,7 +748,11 @@ "outputs": [], "source": [ "# Get plot result\n", - "result = simple.statistics.plot.balance('Heat')" + "result = simple.statistics.plot.balance('Heat')\n", + "\n", + "print('PlotResult contains:')\n", + "print(f' data: {type(result.data).__name__} with vars {list(result.data.data_vars)}')\n", + "print(f' figure: {type(result.figure).__name__}')" ] }, { diff --git a/docs/user-guide/optimization/clustering.md b/docs/user-guide/optimization/clustering.md index 7ec5faac1..793fbf8fe 100644 --- a/docs/user-guide/optimization/clustering.md +++ b/docs/user-guide/optimization/clustering.md @@ -52,6 +52,10 @@ flow_rates = fs_expanded.solution['Boiler(Q_th)|flow_rate'] | `cluster_duration` | Duration of each cluster | `'1D'`, `'24h'`, or `24` (hours) | | `time_series_for_high_peaks` | Time series where peak clusters must be captured | `['HeatDemand(Q)|fixed_relative_profile']` | | `time_series_for_low_peaks` | Time series where minimum clusters must be captured | `['SolarGen(P)|fixed_relative_profile']` | +| `cluster_method` | Clustering algorithm | `'k_means'`, `'hierarchical'`, `'k_medoids'` | +| `representation_method` | How clusters are represented | `'meanRepresentation'`, `'medoidRepresentation'` | +| `random_state` | Random seed for reproducibility | `42` | +| `rescale_cluster_periods` | Rescale clusters to match original means | `True` (default) | ### Peak Selection @@ -68,6 +72,58 @@ fs_clustered = flow_system.transform.cluster( Without peak selection, the clustering algorithm might average out extreme days, leading to undersized equipment. 
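+
+To capture both demand peaks and renewable minima in one run, the two options can be combined. A sketch, with illustrative component and flow labels:
+
+```python
+fs_clustered = flow_system.transform.cluster(
+    n_clusters=8,
+    cluster_duration='1D',
+    time_series_for_high_peaks=['HeatDemand(Q)|fixed_relative_profile'],
+    time_series_for_low_peaks=['SolarGen(P)|fixed_relative_profile'],
+)
+```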
+### Advanced Clustering Options + +Fine-tune the clustering algorithm with advanced parameters: + +```python +fs_clustered = flow_system.transform.cluster( + n_clusters=8, + cluster_duration='1D', + cluster_method='hierarchical', # Alternative to k_means + representation_method='medoidRepresentation', # Use actual periods, not averages + rescale_cluster_periods=True, # Match original time series means + random_state=42, # Reproducible results +) +``` + +**Available clustering algorithms** (`cluster_method`): + +| Method | Description | +|--------|-------------| +| `'k_means'` | Fast, good for most cases (default) | +| `'hierarchical'` | Produces consistent hierarchical groupings | +| `'k_medoids'` | Uses actual periods as representatives | +| `'k_maxoids'` | Maximizes representativeness | +| `'averaging'` | Simple averaging of similar periods | + +For advanced tsam parameters not exposed directly, use `**kwargs`: + +```python +# Pass any tsam.TimeSeriesAggregation parameter +fs_clustered = flow_system.transform.cluster( + n_clusters=8, + cluster_duration='1D', + sameMean=True, # Normalize all time series to same mean + sortValues=True, # Cluster by duration curves instead of shape +) +``` + +### Clustering Quality Metrics + +After clustering, access quality metrics to evaluate the aggregation accuracy: + +```python +fs_clustered = flow_system.transform.cluster(n_clusters=8, cluster_duration='1D') + +# Access clustering metrics (xr.Dataset) +metrics = fs_clustered.clustering.metrics +print(metrics) # Shows RMSE, MAE, etc. per time series + +# Access specific metric +rmse = metrics['RMSE'] # xr.DataArray with dims [time_series, period?, scenario?] +``` + ## Storage Modes Storage behavior during clustering is controlled via the `cluster_mode` parameter: diff --git a/docs/user-guide/results-plotting.md b/docs/user-guide/results-plotting.md index 1ecd26aa1..28e3d2b2b 100644 --- a/docs/user-guide/results-plotting.md +++ b/docs/user-guide/results-plotting.md @@ -2,6 +2,9 @@ After solving an optimization, flixOpt provides a powerful plotting API to visualize and analyze your results. The API is designed to be intuitive and chainable, giving you quick access to common plots while still allowing deep customization. +!!! tip "Plotting Custom Data" + For plotting arbitrary xarray data (not just flixopt results), see the [Custom Data Plotting](recipes/plotting-custom-data.md) guide which covers the `.fxplot` accessor. + ## The Plot Accessor All plotting is accessed through the `statistics.plot` accessor on your FlowSystem: diff --git a/flixopt/clustering/base.py b/flixopt/clustering/base.py index 4b31832e4..0f154484b 100644 --- a/flixopt/clustering/base.py +++ b/flixopt/clustering/base.py @@ -38,15 +38,15 @@ class ClusterStructure: which is needed for proper storage state-of-charge tracking across typical periods when using cluster(). - Note: "original_period" here refers to the original time chunks before - clustering (e.g., 365 original days), NOT the model's "period" dimension - (years/months). Each original time chunk gets assigned to a cluster. + Note: The "original_cluster" dimension indexes the original cluster-sized + time segments (e.g., 0..364 for 365 days), NOT the model's "period" dimension + (years). Each original segment gets assigned to a representative cluster. Attributes: - cluster_order: Maps each original time chunk index to its cluster ID. - dims: [original_period] for simple case, or - [original_period, period, scenario] for multi-period/scenario systems. 
- Values are cluster indices (0 to n_clusters-1). + cluster_order: Maps original cluster index → representative cluster ID. + dims: [original_cluster] for simple case, or + [original_cluster, period, scenario] for multi-period/scenario systems. + Values are cluster IDs (0 to n_clusters-1). cluster_occurrences: Count of how many original time chunks each cluster represents. dims: [cluster] for simple case, or [cluster, period, scenario] for multi-dim. n_clusters: Number of distinct clusters (typical periods). @@ -60,7 +60,7 @@ class ClusterStructure: - timesteps_per_cluster: 24 (for hourly data) For multi-scenario (e.g., 2 scenarios): - - cluster_order: shape (365, 2) with dims [original_period, scenario] + - cluster_order: shape (365, 2) with dims [original_cluster, scenario] - cluster_occurrences: shape (8, 2) with dims [cluster, scenario] """ @@ -73,7 +73,7 @@ def __post_init__(self): """Validate and ensure proper DataArray formatting.""" # Ensure cluster_order is a DataArray with proper dims if not isinstance(self.cluster_order, xr.DataArray): - self.cluster_order = xr.DataArray(self.cluster_order, dims=['original_period'], name='cluster_order') + self.cluster_order = xr.DataArray(self.cluster_order, dims=['original_cluster'], name='cluster_order') elif self.cluster_order.name is None: self.cluster_order = self.cluster_order.rename('cluster_order') @@ -92,7 +92,7 @@ def __repr__(self) -> str: occ = [int(self.cluster_occurrences.sel(cluster=c).values) for c in range(n_clusters)] return ( f'ClusterStructure(\n' - f' {self.n_original_periods} original periods → {n_clusters} clusters\n' + f' {self.n_original_clusters} original periods → {n_clusters} clusters\n' f' timesteps_per_cluster={self.timesteps_per_cluster}\n' f' occurrences={occ}\n' f')' @@ -124,9 +124,9 @@ def _create_reference_structure(self) -> tuple[dict, dict[str, xr.DataArray]]: return ref, arrays @property - def n_original_periods(self) -> int: + def n_original_clusters(self) -> int: """Number of original periods (before clustering).""" - return len(self.cluster_order.coords['original_period']) + return len(self.cluster_order.coords['original_cluster']) @property def has_multi_dims(self) -> bool: @@ -197,20 +197,20 @@ def get_cluster_weight_per_timestep(self) -> xr.DataArray: name='cluster_weight', ) - def plot(self, show: bool | None = None) -> PlotResult: + def plot(self, colors: str | list[str] | None = None, show: bool | None = None) -> PlotResult: """Plot cluster assignment visualization. Shows which cluster each original period belongs to, and the number of occurrences per cluster. Args: + colors: Colorscale name (str) or list of colors. + Defaults to CONFIG.Plotting.default_sequential_colorscale. show: Whether to display the figure. Defaults to CONFIG.Plotting.default_show. Returns: PlotResult containing the figure and underlying data. 
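+
+        Example (illustrative; ``cs`` is an existing ClusterStructure):
+            >>> result = cs.plot(colors='Viridis', show=False)
+            >>> result.figure  # single-row heatmap of cluster assignments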
""" - import plotly.express as px - from ..config import CONFIG from ..plot_result import PlotResult @@ -218,27 +218,24 @@ def plot(self, show: bool | None = None) -> PlotResult: int(self.n_clusters) if isinstance(self.n_clusters, (int, np.integer)) else int(self.n_clusters.values) ) - # Create DataFrame for plotting - import pandas as pd - cluster_order = self.get_cluster_order_for_slice() - df = pd.DataFrame( - { - 'Original Period': range(1, len(cluster_order) + 1), - 'Cluster': cluster_order, - } + + # Build DataArray for fxplot heatmap + cluster_da = xr.DataArray( + cluster_order.reshape(1, -1), + dims=['y', 'original_cluster'], + coords={'y': ['Cluster'], 'original_cluster': range(1, len(cluster_order) + 1)}, + name='cluster_assignment', ) - # Bar chart showing cluster assignment - fig = px.bar( - df, - x='Original Period', - y=[1] * len(df), - color='Cluster', - color_continuous_scale='Viridis', - title=f'Cluster Assignment ({self.n_original_periods} periods → {n_clusters} clusters)', + # Use fxplot.heatmap for smart defaults + colorscale = colors or CONFIG.Plotting.default_sequential_colorscale + fig = cluster_da.fxplot.heatmap( + colors=colorscale, + title=f'Cluster Assignment ({self.n_original_clusters} periods → {n_clusters} clusters)', ) - fig.update_layout(yaxis_visible=False, coloraxis_colorbar_title='Cluster') + fig.update_yaxes(showticklabels=False) + fig.update_coloraxes(colorbar_title='Cluster') # Build data for PlotResult data = xr.Dataset( @@ -532,30 +529,30 @@ def validate(self) -> None: # (each weight is how many original periods that cluster represents) # Sum should be checked per period/scenario slice, not across all dimensions if self.cluster_structure is not None: - n_original_periods = self.cluster_structure.n_original_periods + n_original_clusters = self.cluster_structure.n_original_clusters # Sum over cluster dimension only (keep period/scenario if present) weight_sum_per_slice = self.representative_weights.sum(dim='cluster') # Check each slice if weight_sum_per_slice.size == 1: # Simple case: no period/scenario weight_sum = float(weight_sum_per_slice.values) - if abs(weight_sum - n_original_periods) > 1e-6: + if abs(weight_sum - n_original_clusters) > 1e-6: import warnings warnings.warn( f'representative_weights sum ({weight_sum}) does not match ' - f'n_original_periods ({n_original_periods})', + f'n_original_clusters ({n_original_clusters})', stacklevel=2, ) else: # Multi-dimensional: check each slice for val in weight_sum_per_slice.values.flat: - if abs(float(val) - n_original_periods) > 1e-6: + if abs(float(val) - n_original_clusters) > 1e-6: import warnings warnings.warn( f'representative_weights sum per slice ({float(val)}) does not match ' - f'n_original_periods ({n_original_periods})', + f'n_original_clusters ({n_original_clusters})', stacklevel=2, ) break # Only warn once @@ -585,8 +582,10 @@ def compare( *, select: SelectType | None = None, colors: ColorType | None = None, - facet_col: str | None = 'period', - facet_row: str | None = 'scenario', + color: str | None = 'auto', + line_dash: str | None = 'representation', + facet_col: str | None = 'auto', + facet_row: str | None = 'auto', show: bool | None = None, **plotly_kwargs: Any, ) -> PlotResult: @@ -600,8 +599,14 @@ def compare( or None to plot all time-varying variables. select: xarray-style selection dict, e.g. {'scenario': 'Base Case'}. colors: Color specification (colorscale name, color list, or label-to-color dict). - facet_col: Dimension for subplot columns (default: 'period'). 
- facet_row: Dimension for subplot rows (default: 'scenario'). + color: Dimension for line colors. 'auto' uses CONFIG priority (typically 'variable'). + Use 'representation' to color by Original/Clustered instead of line_dash. + line_dash: Dimension for line dash styles. Defaults to 'representation'. + Set to None to disable line dash differentiation. + facet_col: Dimension for subplot columns. 'auto' uses CONFIG priority. + Use 'variable' to create separate columns per variable. + facet_row: Dimension for subplot rows. 'auto' uses CONFIG priority. + Use 'variable' to create separate rows per variable. show: Whether to display the figure. Defaults to CONFIG.Plotting.default_show. **plotly_kwargs: Additional arguments passed to plotly. @@ -610,9 +615,7 @@ def compare( PlotResult containing the comparison figure and underlying data. """ import pandas as pd - import plotly.express as px - from ..color_processing import process_colors from ..config import CONFIG from ..plot_result import PlotResult from ..statistics_accessor import _apply_selection @@ -626,7 +629,7 @@ def compare( resolved_variables = self._resolve_variables(variables) - # Build Dataset with 'representation' dimension for Original/Clustered + # Build Dataset with variables as data_vars data_vars = {} for var in resolved_variables: original = result.original_data[var] @@ -650,54 +653,41 @@ def compare( { var: xr.DataArray( [sorted_vars[(var, r)] for r in ['Original', 'Clustered']], - dims=['representation', 'rank'], - coords={'representation': ['Original', 'Clustered'], 'rank': range(n)}, + dims=['representation', 'duration'], + coords={'representation': ['Original', 'Clustered'], 'duration': range(n)}, ) for var in resolved_variables } ) - # Resolve facets (only for timeseries) - actual_facet_col = facet_col if kind == 'timeseries' and facet_col in ds.dims else None - actual_facet_row = facet_row if kind == 'timeseries' and facet_row in ds.dims else None - - # Convert to long-form DataFrame - df = ds.to_dataframe().reset_index() - coord_cols = [c for c in ds.coords.keys() if c in df.columns] - df = df.melt(id_vars=coord_cols, var_name='variable', value_name='value') - - variable_labels = df['variable'].unique().tolist() - color_map = process_colors(colors, variable_labels, CONFIG.Plotting.default_qualitative_colorscale) - - # Set x-axis and title based on kind - x_col = 'time' if kind == 'timeseries' else 'rank' + # Set title based on kind if kind == 'timeseries': title = ( 'Original vs Clustered' if len(resolved_variables) > 1 else f'Original vs Clustered: {resolved_variables[0]}' ) - labels = {} else: title = 'Duration Curve' if len(resolved_variables) > 1 else f'Duration Curve: {resolved_variables[0]}' - labels = {'rank': 'Hours (sorted)', 'value': 'Value'} - - fig = px.line( - df, - x=x_col, - y='value', - color='variable', - line_dash='representation', - facet_col=actual_facet_col, - facet_row=actual_facet_row, + + # Use fxplot for smart defaults + line_kwargs = {} + if line_dash is not None: + line_kwargs['line_dash'] = line_dash + if line_dash == 'representation': + line_kwargs['line_dash_map'] = {'Original': 'dot', 'Clustered': 'solid'} + + fig = ds.fxplot.line( + colors=colors, + color=color, title=title, - labels=labels, - color_discrete_map=color_map, + facet_col=facet_col, + facet_row=facet_row, + **line_kwargs, **plotly_kwargs, ) - if actual_facet_row or actual_facet_col: - fig.update_yaxes(matches=None) - fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1])) + fig.update_yaxes(matches=None) + 
fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1])) plot_result = PlotResult(data=ds, figure=fig) @@ -743,8 +733,8 @@ def heatmap( *, select: SelectType | None = None, colors: str | list[str] | None = None, - facet_col: str | None = 'period', - animation_frame: str | None = 'scenario', + facet_col: str | None = 'auto', + animation_frame: str | None = 'auto', show: bool | None = None, **plotly_kwargs: Any, ) -> PlotResult: @@ -762,8 +752,8 @@ def heatmap( colors: Colorscale name (str) or list of colors for heatmap coloring. Dicts are not supported for heatmaps. Defaults to CONFIG.Plotting.default_sequential_colorscale. - facet_col: Dimension to facet on columns (default: 'period'). - animation_frame: Dimension for animation slider (default: 'scenario'). + facet_col: Dimension to facet on columns. 'auto' uses CONFIG priority. + animation_frame: Dimension for animation slider. 'auto' uses CONFIG priority. show: Whether to display the figure. Defaults to CONFIG.Plotting.default_show. **plotly_kwargs: Additional arguments passed to plotly. @@ -773,7 +763,6 @@ def heatmap( The data has 'cluster' variable with time dimension, matching original timesteps. """ import pandas as pd - import plotly.express as px from ..config import CONFIG from ..plot_result import PlotResult @@ -833,34 +822,29 @@ def heatmap( else: cluster_da = cluster_slices[(None, None)] - # Resolve facet_col and animation_frame - only use if dimension exists - actual_facet_col = facet_col if facet_col and facet_col in cluster_da.dims else None - actual_animation = animation_frame if animation_frame and animation_frame in cluster_da.dims else None - # Add dummy y dimension for heatmap visualization (single row) heatmap_da = cluster_da.expand_dims('y', axis=-1) heatmap_da = heatmap_da.assign_coords(y=['Cluster']) + heatmap_da.name = 'cluster_assignment' - colorscale = colors or CONFIG.Plotting.default_sequential_colorscale + # Reorder dims so 'time' and 'y' are first (heatmap x/y axes) + # Other dims (period, scenario) will be used for faceting/animation + target_order = ['time', 'y'] + [d for d in heatmap_da.dims if d not in ('time', 'y')] + heatmap_da = heatmap_da.transpose(*target_order) - # Use px.imshow with xr.DataArray - fig = px.imshow( - heatmap_da, - color_continuous_scale=colorscale, - facet_col=actual_facet_col, - animation_frame=actual_animation, + # Use fxplot.heatmap for smart defaults + fig = heatmap_da.fxplot.heatmap( + colors=colors, title='Cluster Assignments', - labels={'time': 'Time', 'color': 'Cluster'}, + facet_col=facet_col, + animation_frame=animation_frame, aspect='auto', **plotly_kwargs, ) - # Clean up facet labels - if actual_facet_col: - fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1])) - - # Hide y-axis since it's just a single row + # Clean up: hide y-axis since it's just a single row fig.update_yaxes(showticklabels=False) + fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1])) # Data is exactly what we plotted (without dummy y dimension) cluster_da.name = 'cluster' @@ -880,21 +864,27 @@ def clusters( *, select: SelectType | None = None, colors: ColorType | None = None, - facet_col_wrap: int | None = None, + color: str | None = 'auto', + facet_col: str | None = 'cluster', + facet_cols: int | None = None, show: bool | None = None, **plotly_kwargs: Any, ) -> PlotResult: """Plot each cluster's typical period profile. - Shows each cluster as a separate faceted subplot. Useful for - understanding what each cluster represents. 
+ Shows each cluster as a separate faceted subplot with all variables + colored differently. Useful for understanding what each cluster represents. Args: variables: Variable(s) to plot. Can be a string, list of strings, or None to plot all time-varying variables. select: xarray-style selection dict, e.g. {'scenario': 'Base Case'}. colors: Color specification (colorscale name, color list, or label-to-color dict). - facet_col_wrap: Max columns before wrapping facets. + color: Dimension for line colors. 'auto' uses CONFIG priority (typically 'variable'). + Use 'cluster' to color by cluster instead of faceting. + facet_col: Dimension for subplot columns. Defaults to 'cluster'. + Use 'variable' to facet by variable instead. + facet_cols: Max columns before wrapping facets. Defaults to CONFIG.Plotting.default_facet_cols. show: Whether to display the figure. Defaults to CONFIG.Plotting.default_show. @@ -903,10 +893,6 @@ def clusters( Returns: PlotResult containing the figure and underlying data. """ - import pandas as pd - import plotly.express as px - - from ..color_processing import process_colors from ..config import CONFIG from ..plot_result import PlotResult from ..statistics_accessor import _apply_selection @@ -929,45 +915,37 @@ def clusters( n_clusters = int(cs.n_clusters) if isinstance(cs.n_clusters, (int, np.integer)) else int(cs.n_clusters.values) timesteps_per_cluster = cs.timesteps_per_cluster - # Build long-form DataFrame with cluster labels including occurrence counts - rows = [] + # Build Dataset with cluster dimension, using labels with occurrence counts + cluster_labels = [ + f'Cluster {c} (×{int(cs.cluster_occurrences.sel(cluster=c).values)})' for c in range(n_clusters) + ] + data_vars = {} for var in resolved_variables: data = aggregated_data[var].values data_by_cluster = data.reshape(n_clusters, timesteps_per_cluster) data_vars[var] = xr.DataArray( data_by_cluster, - dims=['cluster', 'timestep'], - coords={'cluster': range(n_clusters), 'timestep': range(timesteps_per_cluster)}, + dims=['cluster', 'time'], + coords={'cluster': cluster_labels, 'time': range(timesteps_per_cluster)}, ) - for c in range(n_clusters): - occurrence = int(cs.cluster_occurrences.sel(cluster=c).values) - label = f'Cluster {c} (×{occurrence})' - for t in range(timesteps_per_cluster): - rows.append({'cluster': label, 'timestep': t, 'value': data_by_cluster[c, t], 'variable': var}) - df = pd.DataFrame(rows) - - cluster_labels = df['cluster'].unique().tolist() - color_map = process_colors(colors, cluster_labels, CONFIG.Plotting.default_qualitative_colorscale) - facet_col_wrap = facet_col_wrap or CONFIG.Plotting.default_facet_cols + + ds = xr.Dataset(data_vars) title = 'Clusters' if len(resolved_variables) > 1 else f'Clusters: {resolved_variables[0]}' - fig = px.line( - df, - x='timestep', - y='value', - facet_col='cluster', - facet_row='variable' if len(resolved_variables) > 1 else None, - facet_col_wrap=facet_col_wrap if len(resolved_variables) == 1 else None, + # Use fxplot for smart defaults + fig = ds.fxplot.line( + colors=colors, + color=color, title=title, - color_discrete_map=color_map, + facet_col=facet_col, + facet_cols=facet_cols, **plotly_kwargs, ) - fig.update_layout(showlegend=False) - if len(resolved_variables) > 1: - fig.update_yaxes(matches=None) - fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1])) + fig.update_yaxes(matches=None) + fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1])) + # Include occurrences in result data data_vars['occurrences'] = 
cs.cluster_occurrences result_data = xr.Dataset(data_vars) plot_result = PlotResult(data=result_data, figure=fig) @@ -993,6 +971,9 @@ class Clustering: Attributes: result: The ClusterResult from the aggregation backend. backend_name: Name of the aggregation backend used (e.g., 'tsam', 'manual'). + metrics: Clustering quality metrics (RMSE, MAE, etc.) as xr.Dataset. + Each metric (e.g., 'RMSE', 'MAE') is a DataArray with dims + ``[time_series, period?, scenario?]``. Example: >>> fs_clustered = flow_system.transform.cluster(n_clusters=8, cluster_duration='1D') @@ -1004,6 +985,7 @@ class Clustering: result: ClusterResult backend_name: str = 'unknown' + metrics: xr.Dataset | None = None def _create_reference_structure(self) -> tuple[dict, dict[str, xr.DataArray]]: """Create reference structure for serialization.""" @@ -1026,7 +1008,7 @@ def __repr__(self) -> str: n_clusters = ( int(cs.n_clusters) if isinstance(cs.n_clusters, (int, np.integer)) else int(cs.n_clusters.values) ) - structure_info = f'{cs.n_original_periods} periods → {n_clusters} clusters' + structure_info = f'{cs.n_original_clusters} periods → {n_clusters} clusters' else: structure_info = 'no structure' return f'Clustering(\n backend={self.backend_name!r}\n {structure_info}\n)' @@ -1071,11 +1053,11 @@ def n_clusters(self) -> int: return int(n) if isinstance(n, (int, np.integer)) else int(n.values) @property - def n_original_periods(self) -> int: + def n_original_clusters(self) -> int: """Number of original periods (before clustering).""" if self.result.cluster_structure is None: raise ValueError('No cluster_structure available') - return self.result.cluster_structure.n_original_periods + return self.result.cluster_structure.n_original_clusters @property def timesteps_per_period(self) -> int: @@ -1152,17 +1134,17 @@ def create_cluster_structure_from_mapping( ClusterStructure derived from the mapping. """ n_original = len(timestep_mapping) - n_original_periods = n_original // timesteps_per_cluster + n_original_clusters = n_original // timesteps_per_cluster # Determine cluster order from the mapping # Each original period maps to the cluster of its first timestep cluster_order = [] - for p in range(n_original_periods): + for p in range(n_original_clusters): start_idx = p * timesteps_per_cluster cluster_idx = int(timestep_mapping.isel(original_time=start_idx).values) // timesteps_per_cluster cluster_order.append(cluster_idx) - cluster_order_da = xr.DataArray(cluster_order, dims=['original_period'], name='cluster_order') + cluster_order_da = xr.DataArray(cluster_order, dims=['original_cluster'], name='cluster_order') # Count occurrences of each cluster unique_clusters = np.unique(cluster_order) diff --git a/flixopt/clustering/intercluster_helpers.py b/flixopt/clustering/intercluster_helpers.py index d2a5eb9d3..a89a80862 100644 --- a/flixopt/clustering/intercluster_helpers.py +++ b/flixopt/clustering/intercluster_helpers.py @@ -132,7 +132,7 @@ def extract_capacity_bounds( def build_boundary_coords( - n_original_periods: int, + n_original_clusters: int, flow_system: FlowSystem, ) -> tuple[dict, list[str]]: """Build coordinates and dimensions for SOC_boundary variable. @@ -146,7 +146,7 @@ def build_boundary_coords( multi-period or stochastic optimizations. Args: - n_original_periods: Number of original (non-aggregated) time periods. + n_original_clusters: Number of original (non-aggregated) time periods. For example, if a year is clustered into 8 typical days but originally had 365 days, this would be 365. 
flow_system: The FlowSystem containing optional period/scenario dimensions. @@ -163,7 +163,7 @@ def build_boundary_coords( >>> coords['cluster_boundary'] array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]) """ - n_boundaries = n_original_periods + 1 + n_boundaries = n_original_clusters + 1 coords = {'cluster_boundary': np.arange(n_boundaries)} dims = ['cluster_boundary'] diff --git a/flixopt/components.py b/flixopt/components.py index 390fc6f02..e962791d8 100644 --- a/flixopt/components.py +++ b/flixopt/components.py @@ -1195,7 +1195,7 @@ class InterclusterStorageModel(StorageModel): Variables Created ----------------- - ``SOC_boundary``: Absolute SOC at each original period boundary. - Shape: (n_original_periods + 1,) plus any period/scenario dimensions. + Shape: (n_original_clusters + 1,) plus any period/scenario dimensions. Constraints Created ------------------- @@ -1330,7 +1330,7 @@ def _add_intercluster_linking(self) -> None: else int(cluster_structure.n_clusters.values) ) timesteps_per_cluster = cluster_structure.timesteps_per_cluster - n_original_periods = cluster_structure.n_original_periods + n_original_clusters = cluster_structure.n_original_clusters cluster_order = cluster_structure.cluster_order # 1. Constrain ΔE = 0 at cluster starts @@ -1338,7 +1338,7 @@ def _add_intercluster_linking(self) -> None: # 2. Create SOC_boundary variable flow_system = self._model.flow_system - boundary_coords, boundary_dims = build_boundary_coords(n_original_periods, flow_system) + boundary_coords, boundary_dims = build_boundary_coords(n_original_clusters, flow_system) capacity_bounds = extract_capacity_bounds(self.element.capacity_in_flow_hours, boundary_coords, boundary_dims) soc_boundary = self.add_variables( @@ -1360,12 +1360,14 @@ def _add_intercluster_linking(self) -> None: delta_soc = self._compute_delta_soc(n_clusters, timesteps_per_cluster) # 5. Add linking constraints - self._add_linking_constraints(soc_boundary, delta_soc, cluster_order, n_original_periods, timesteps_per_cluster) + self._add_linking_constraints( + soc_boundary, delta_soc, cluster_order, n_original_clusters, timesteps_per_cluster + ) # 6. Add cyclic or initial constraint if self.element.cluster_mode == 'intercluster_cyclic': self.add_constraints( - soc_boundary.isel(cluster_boundary=0) == soc_boundary.isel(cluster_boundary=n_original_periods), + soc_boundary.isel(cluster_boundary=0) == soc_boundary.isel(cluster_boundary=n_original_clusters), short_name='cyclic', ) else: @@ -1375,7 +1377,8 @@ def _add_intercluster_linking(self) -> None: if isinstance(initial, str): # 'equals_final' means cyclic self.add_constraints( - soc_boundary.isel(cluster_boundary=0) == soc_boundary.isel(cluster_boundary=n_original_periods), + soc_boundary.isel(cluster_boundary=0) + == soc_boundary.isel(cluster_boundary=n_original_clusters), short_name='initial_SOC_boundary', ) else: @@ -1389,7 +1392,7 @@ def _add_intercluster_linking(self) -> None: soc_boundary, cluster_order, capacity_bounds.has_investment, - n_original_periods, + n_original_clusters, timesteps_per_cluster, ) @@ -1438,7 +1441,7 @@ def _add_linking_constraints( soc_boundary: xr.DataArray, delta_soc: xr.DataArray, cluster_order: xr.DataArray, - n_original_periods: int, + n_original_clusters: int, timesteps_per_cluster: int, ) -> None: """Add constraints linking consecutive SOC_boundary values. @@ -1455,17 +1458,17 @@ def _add_linking_constraints( soc_boundary: SOC_boundary variable. delta_soc: Net SOC change per cluster. 
cluster_order: Mapping from original periods to representative clusters. - n_original_periods: Number of original (non-clustered) periods. + n_original_clusters: Number of original (non-clustered) periods. timesteps_per_cluster: Number of timesteps in each cluster period. """ soc_after = soc_boundary.isel(cluster_boundary=slice(1, None)) soc_before = soc_boundary.isel(cluster_boundary=slice(None, -1)) # Rename for alignment - soc_after = soc_after.rename({'cluster_boundary': 'original_period'}) - soc_after = soc_after.assign_coords(original_period=np.arange(n_original_periods)) - soc_before = soc_before.rename({'cluster_boundary': 'original_period'}) - soc_before = soc_before.assign_coords(original_period=np.arange(n_original_periods)) + soc_after = soc_after.rename({'cluster_boundary': 'original_cluster'}) + soc_after = soc_after.assign_coords(original_cluster=np.arange(n_original_clusters)) + soc_before = soc_before.rename({'cluster_boundary': 'original_cluster'}) + soc_before = soc_before.assign_coords(original_cluster=np.arange(n_original_clusters)) # Get delta_soc for each original period using cluster_order delta_soc_ordered = delta_soc.isel(cluster=cluster_order) @@ -1484,7 +1487,7 @@ def _add_combined_bound_constraints( soc_boundary: xr.DataArray, cluster_order: xr.DataArray, has_investment: bool, - n_original_periods: int, + n_original_clusters: int, timesteps_per_cluster: int, ) -> None: """Add constraints ensuring actual SOC stays within bounds. @@ -1498,21 +1501,21 @@ def _add_combined_bound_constraints( middle, and end of each cluster. With 2D (cluster, time) structure, we simply select charge_state at a - given time offset, then reorder by cluster_order to get original_period order. + given time offset, then reorder by cluster_order to get original_cluster order. Args: soc_boundary: SOC_boundary variable. cluster_order: Mapping from original periods to clusters. has_investment: Whether the storage has investment sizing. - n_original_periods: Number of original periods. + n_original_clusters: Number of original periods. timesteps_per_cluster: Timesteps in each cluster. """ charge_state = self.charge_state # soc_d: SOC at start of each original period soc_d = soc_boundary.isel(cluster_boundary=slice(None, -1)) - soc_d = soc_d.rename({'cluster_boundary': 'original_period'}) - soc_d = soc_d.assign_coords(original_period=np.arange(n_original_periods)) + soc_d = soc_d.rename({'cluster_boundary': 'original_cluster'}) + soc_d = soc_d.assign_coords(original_cluster=np.arange(n_original_clusters)) # Get self-discharge rate for decay calculation # Keep as DataArray to respect per-period/scenario values @@ -1523,13 +1526,13 @@ def _add_combined_bound_constraints( for sample_name, offset in zip(['start', 'mid', 'end'], sample_offsets, strict=False): # With 2D structure: select time offset, then reorder by cluster_order cs_at_offset = charge_state.isel(time=offset) # Shape: (cluster, ...) 
- # Reorder to original_period order using cluster_order indexer + # Reorder to original_cluster order using cluster_order indexer cs_t = cs_at_offset.isel(cluster=cluster_order) # Suppress xarray warning about index loss - we immediately assign new coords anyway with warnings.catch_warnings(): warnings.filterwarnings('ignore', message='.*does not create an index anymore.*') - cs_t = cs_t.rename({'cluster': 'original_period'}) - cs_t = cs_t.assign_coords(original_period=np.arange(n_original_periods)) + cs_t = cs_t.rename({'cluster': 'original_cluster'}) + cs_t = cs_t.assign_coords(original_cluster=np.arange(n_original_clusters)) # Apply decay factor (1-loss)^t to SOC_boundary per Eq. 9 decay_t = (1 - rel_loss) ** offset diff --git a/flixopt/config.py b/flixopt/config.py index 454f8ad3e..3bc3d5ebf 100644 --- a/flixopt/config.py +++ b/flixopt/config.py @@ -30,7 +30,7 @@ logging.addLevelName(SUCCESS_LEVEL, 'SUCCESS') # Deprecation removal version - update this when planning the next major version -DEPRECATION_REMOVAL_VERSION = '6.0.0' +DEPRECATION_REMOVAL_VERSION = '7.0.0' class MultilineFormatter(logging.Formatter): @@ -164,9 +164,8 @@ def format(self, record): 'default_sequential_colorscale': 'turbo', 'default_qualitative_colorscale': 'plotly', 'default_line_shape': 'hv', - 'extra_dim_priority': ('cluster', 'period', 'scenario'), - 'dim_slot_priority': ('facet_col', 'facet_row', 'animation_frame'), - 'x_dim_priority': ('time', 'duration', 'duration_pct', 'period', 'scenario', 'cluster'), + 'dim_priority': ('time', 'duration', 'duration_pct', 'variable', 'cluster', 'period', 'scenario'), + 'slot_priority': ('x', 'color', 'facet_col', 'facet_row', 'animation_frame'), } ), 'solving': MappingProxyType( @@ -562,9 +561,10 @@ class Plotting: default_facet_cols: Default number of columns for faceted plots. default_sequential_colorscale: Default colorscale for heatmaps and continuous data. default_qualitative_colorscale: Default colormap for categorical plots (bar/line/area charts). - extra_dim_priority: Order of extra dimensions when auto-assigning to slots. - dim_slot_priority: Order of slots to fill with extra dimensions. - x_dim_priority: Order of dimensions to prefer for x-axis when 'auto'. + dim_priority: Priority order for assigning dimensions to plot slots. + Dimensions are assigned to slots based on this order. + slot_priority: Order in which slots are filled during auto-assignment. + Default: x → color → facet_col → facet_row → animation_frame. Examples: ```python @@ -573,9 +573,11 @@ class Plotting: CONFIG.Plotting.default_sequential_colorscale = 'plasma' CONFIG.Plotting.default_qualitative_colorscale = 'Dark24' - # Customize dimension handling for faceting - CONFIG.Plotting.extra_dim_priority = ('scenario', 'period', 'cluster') - CONFIG.Plotting.dim_slot_priority = ('facet_row', 'facet_col', 'animation_frame') + # Customize dimension priority for auto-assignment + CONFIG.Plotting.dim_priority = ('time', 'scenario', 'variable', 'period', 'cluster') + + # Change slot fill order (e.g., prioritize facets over color) + CONFIG.Plotting.slot_priority = ('x', 'facet_col', 'facet_row', 'color', 'animation_frame') ``` """ @@ -586,9 +588,8 @@ class Plotting: default_sequential_colorscale: str = _DEFAULTS['plotting']['default_sequential_colorscale'] default_qualitative_colorscale: str = _DEFAULTS['plotting']['default_qualitative_colorscale'] default_line_shape: str = _DEFAULTS['plotting']['default_line_shape'] - extra_dim_priority: tuple[str, ...] 
= _DEFAULTS['plotting']['extra_dim_priority'] - dim_slot_priority: tuple[str, ...] = _DEFAULTS['plotting']['dim_slot_priority'] - x_dim_priority: tuple[str, ...] = _DEFAULTS['plotting']['x_dim_priority'] + dim_priority: tuple[str, ...] = _DEFAULTS['plotting']['dim_priority'] + slot_priority: tuple[str, ...] = _DEFAULTS['plotting']['slot_priority'] class Carriers: """Default carrier definitions for common energy types. @@ -690,9 +691,8 @@ def to_dict(cls) -> dict: 'default_sequential_colorscale': cls.Plotting.default_sequential_colorscale, 'default_qualitative_colorscale': cls.Plotting.default_qualitative_colorscale, 'default_line_shape': cls.Plotting.default_line_shape, - 'extra_dim_priority': cls.Plotting.extra_dim_priority, - 'dim_slot_priority': cls.Plotting.dim_slot_priority, - 'x_dim_priority': cls.Plotting.x_dim_priority, + 'dim_priority': cls.Plotting.dim_priority, + 'slot_priority': cls.Plotting.slot_priority, }, } diff --git a/flixopt/dataset_plot_accessor.py b/flixopt/dataset_plot_accessor.py index a022f3988..6c833e652 100644 --- a/flixopt/dataset_plot_accessor.py +++ b/flixopt/dataset_plot_accessor.py @@ -2,6 +2,7 @@ from __future__ import annotations +import warnings from typing import Any, Literal import pandas as pd @@ -13,59 +14,130 @@ from .config import CONFIG -def _get_x_dim(dims: list[str], x: str | Literal['auto'] | None = 'auto') -> str: - """Select x-axis dim from priority list, or 'variable' for scalar data.""" - if x and x != 'auto': - return x - - # Check priority list first - for dim in CONFIG.Plotting.x_dim_priority: - if dim in dims: - return dim - - # Fallback to first available dimension, or 'variable' for scalar data - return dims[0] if dims else 'variable' - - -def _resolve_auto_facets( +def assign_slots( ds: xr.Dataset, - facet_col: str | Literal['auto'] | None, - facet_row: str | Literal['auto'] | None, - animation_frame: str | Literal['auto'] | None = None, + *, + x: str | Literal['auto'] | None = 'auto', + color: str | Literal['auto'] | None = 'auto', + facet_col: str | Literal['auto'] | None = 'auto', + facet_row: str | Literal['auto'] | None = 'auto', + animation_frame: str | Literal['auto'] | None = 'auto', exclude_dims: set[str] | None = None, -) -> tuple[str | None, str | None, str | None]: - """Assign 'auto' facet slots from available dims using CONFIG priority lists.""" - # Get available extra dimensions with size > 1, excluding specified dims +) -> dict[str, str | None]: + """Assign dimensions to plot slots using CONFIG.Plotting.dim_priority. + + Dimensions are assigned in priority order to slots based on CONFIG.Plotting.slot_priority. + + Slot values: + - 'auto': auto-assign from available dims using priority + - None: skip this slot (not available for this plot type) + - str: use this specific dimension + + 'variable' is treated as a dimension when len(data_vars) > 1. It represents + the data_var names column in the melted DataFrame. + + Args: + ds: Dataset to analyze for available dimensions. + x: X-axis dimension. 'auto' assigns first available from priority. + color: Color grouping dimension. + facet_col: Column faceting dimension. + facet_row: Row faceting dimension. + animation_frame: Animation slider dimension. + exclude_dims: Dimensions to exclude from auto-assignment (e.g., already used for x elsewhere). + + Returns: + Dict with keys 'x', 'color', 'facet_col', 'facet_row', 'animation_frame' + and values being assigned dimension names (or None if slot skipped/unfilled). 
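+
+    Example (illustrative, assuming the default CONFIG priorities):
+        For a Dataset with dims ``('time', 'scenario')`` (both of size > 1) and two
+        data_vars, leaving every slot on ``'auto'`` resolves to
+        ``{'x': 'time', 'color': 'variable', 'facet_col': 'scenario',
+        'facet_row': None, 'animation_frame': None}``.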
+ """ + # Get available dimensions with size > 1, excluding specified dims exclude = exclude_dims or set() available = {d for d in ds.dims if ds.sizes[d] > 1 and d not in exclude} - extra_dims = [d for d in CONFIG.Plotting.extra_dim_priority if d in available] - used: set[str] = set() + # 'variable' is available when there are multiple data_vars (and not excluded) + if len(ds.data_vars) > 1 and 'variable' not in exclude: + available.add('variable') + + # Get priority-ordered list of available dims + priority_dims = [d for d in CONFIG.Plotting.dim_priority if d in available] + # Add any available dims not in priority list (fallback) + priority_dims.extend(d for d in available if d not in priority_dims) - # Map slot names to their input values + # Slot specification slots = { + 'x': x, + 'color': color, 'facet_col': facet_col, 'facet_row': facet_row, 'animation_frame': animation_frame, } - results: dict[str, str | None] = {'facet_col': None, 'facet_row': None, 'animation_frame': None} + # Slot fill order from config + slot_order = CONFIG.Plotting.slot_priority + + results: dict[str, str | None] = {k: None for k in slot_order} + used: set[str] = set() # First pass: resolve explicit dimensions (not 'auto' or None) to mark them as used for slot_name, value in slots.items(): if value is not None and value != 'auto': - if value in available and value not in used: - used.add(value) - results[slot_name] = value - - # Second pass: resolve 'auto' slots in dim_slot_priority order - dim_iter = iter(d for d in extra_dims if d not in used) - for slot_name in CONFIG.Plotting.dim_slot_priority: - if slots.get(slot_name) == 'auto': + used.add(value) + results[slot_name] = value + + # Second pass: resolve 'auto' slots in config-defined fill order + dim_iter = iter(d for d in priority_dims if d not in used) + for slot_name in slot_order: + if slots[slot_name] == 'auto': next_dim = next(dim_iter, None) if next_dim: used.add(next_dim) results[slot_name] = next_dim - return results['facet_col'], results['facet_row'], results['animation_frame'] + # Warn if any dimensions were not assigned to any slot + unassigned = available - used + if unassigned: + available_slots = [k for k, v in slots.items() if v is not None] + unavailable_slots = [k for k, v in slots.items() if v is None] + if unavailable_slots: + warnings.warn( + f'Dimensions {unassigned} not assigned to any plot dimension. ' + f'Not available for this plot type: {unavailable_slots}. ' + f'Reduce dimensions before plotting (e.g., .sel(), .isel(), .mean()).', + stacklevel=3, + ) + else: + warnings.warn( + f'Dimensions {unassigned} not assigned to any plot dimension ({available_slots}). ' + f'Reduce dimensions before plotting (e.g., .sel(), .isel(), .mean()).', + stacklevel=3, + ) + + return results + + +def _build_fig_kwargs( + slots: dict[str, str | None], + ds_sizes: dict[str, int], + px_kwargs: dict[str, Any], + facet_cols: int | None = None, +) -> dict[str, Any]: + """Build plotly express kwargs from slot assignments. + + Adds facet/animation args only if slots are assigned and not overridden in px_kwargs. + Handles facet_col_wrap based on dimension size. 
+ """ + facet_col_wrap = facet_cols or CONFIG.Plotting.default_facet_cols + result: dict[str, Any] = {} + + # Add facet/animation kwargs from slots (skip if None or already in px_kwargs) + for slot in ('color', 'facet_col', 'facet_row', 'animation_frame'): + if slots.get(slot) and slot not in px_kwargs: + result[slot] = slots[slot] + + # Add facet_col_wrap if facet_col is set and dimension is large enough + if result.get('facet_col'): + dim_size = ds_sizes.get(result['facet_col'], facet_col_wrap + 1) + if facet_col_wrap < dim_size: + result['facet_col_wrap'] = facet_col_wrap + + return result def _dataset_to_long_df(ds: xr.Dataset, value_name: str = 'value', var_name: str = 'variable') -> pd.DataFrame: @@ -120,6 +192,7 @@ def bar( self, *, x: str | Literal['auto'] | None = 'auto', + color: str | Literal['auto'] | None = 'auto', colors: ColorType | None = None, title: str = '', xlabel: str = '', @@ -128,12 +201,15 @@ def bar( facet_row: str | Literal['auto'] | None = 'auto', animation_frame: str | Literal['auto'] | None = 'auto', facet_cols: int | None = None, + exclude_dims: set[str] | None = None, **px_kwargs: Any, ) -> go.Figure: """Create a grouped bar chart from the dataset. Args: - x: Dimension for x-axis. 'auto' uses CONFIG.Plotting.x_dim_priority. + x: Dimension for x-axis. 'auto' uses CONFIG.Plotting.dim_priority. + color: Dimension for color grouping. 'auto' uses 'variable' (data_var names) + if available, otherwise uses CONFIG priority. colors: Color specification (colorscale name, color list, or dict mapping). title: Plot title. xlabel: X-axis label. @@ -142,57 +218,46 @@ def bar( facet_row: Dimension for row facets. 'auto' uses CONFIG priority. animation_frame: Dimension for animation slider. facet_cols: Number of columns in facet grid wrap. + exclude_dims: Dimensions to exclude from auto-assignment. **px_kwargs: Additional arguments passed to plotly.express.bar. Returns: Plotly Figure. 
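+
+        Example (illustrative; ``ds`` is any Dataset with ``component`` and ``scenario`` dims):
+            >>> ds.fxplot.bar(x='component', color='scenario')  # grouped bars per component, colored by scenario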
""" - # Determine x-axis first, then resolve facets from remaining dims - dims = list(self._ds.dims) - x_col = _get_x_dim(dims, x) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - self._ds, facet_col, facet_row, animation_frame, exclude_dims={x_col} + slots = assign_slots( + self._ds, + x=x, + color=color, + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, + exclude_dims=exclude_dims, ) - df = _dataset_to_long_df(self._ds) if df.empty: return go.Figure() - variables = df['variable'].unique().tolist() - color_map = process_colors(colors, variables, default_colorscale=CONFIG.Plotting.default_qualitative_colorscale) + color_labels = df[slots['color']].unique().tolist() if slots['color'] and slots['color'] in df.columns else [] + color_map = process_colors(colors, color_labels, CONFIG.Plotting.default_qualitative_colorscale) - facet_col_wrap = facet_cols or CONFIG.Plotting.default_facet_cols - fig_kwargs: dict[str, Any] = { + labels = {**(({slots['x']: xlabel}) if xlabel and slots['x'] else {}), **({'value': ylabel} if ylabel else {})} + fig_kwargs = { 'data_frame': df, - 'x': x_col, + 'x': slots['x'], 'y': 'value', 'title': title, 'barmode': 'group', + 'color_discrete_map': color_map, + **({'labels': labels} if labels else {}), + **_build_fig_kwargs(slots, dict(self._ds.sizes), px_kwargs, facet_cols), } - # Only color by variable if it's not already on x-axis (and user didn't override) - if x_col != 'variable' and 'color' not in px_kwargs: - fig_kwargs['color'] = 'variable' - fig_kwargs['color_discrete_map'] = color_map - if xlabel: - fig_kwargs['labels'] = {x_col: xlabel} - if ylabel: - fig_kwargs['labels'] = {**fig_kwargs.get('labels', {}), 'value': ylabel} - - if actual_facet_col and 'facet_col' not in px_kwargs: - fig_kwargs['facet_col'] = actual_facet_col - if facet_col_wrap < self._ds.sizes.get(actual_facet_col, facet_col_wrap + 1): - fig_kwargs['facet_col_wrap'] = facet_col_wrap - if actual_facet_row and 'facet_row' not in px_kwargs: - fig_kwargs['facet_row'] = actual_facet_row - if actual_anim and 'animation_frame' not in px_kwargs: - fig_kwargs['animation_frame'] = actual_anim - return px.bar(**{**fig_kwargs, **px_kwargs}) def stacked_bar( self, *, x: str | Literal['auto'] | None = 'auto', + color: str | Literal['auto'] | None = 'auto', colors: ColorType | None = None, title: str = '', xlabel: str = '', @@ -201,6 +266,7 @@ def stacked_bar( facet_row: str | Literal['auto'] | None = 'auto', animation_frame: str | Literal['auto'] | None = 'auto', facet_cols: int | None = None, + exclude_dims: set[str] | None = None, **px_kwargs: Any, ) -> go.Figure: """Create a stacked bar chart from the dataset. @@ -209,7 +275,9 @@ def stacked_bar( values are stacked separately. Args: - x: Dimension for x-axis. 'auto' uses CONFIG.Plotting.x_dim_priority. + x: Dimension for x-axis. 'auto' uses CONFIG.Plotting.dim_priority. + color: Dimension for color grouping. 'auto' uses 'variable' (data_var names) + if available, otherwise uses CONFIG priority. colors: Color specification (colorscale name, color list, or dict mapping). title: Plot title. xlabel: X-axis label. @@ -223,45 +291,32 @@ def stacked_bar( Returns: Plotly Figure. 
""" - # Determine x-axis first, then resolve facets from remaining dims - dims = list(self._ds.dims) - x_col = _get_x_dim(dims, x) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - self._ds, facet_col, facet_row, animation_frame, exclude_dims={x_col} + slots = assign_slots( + self._ds, + x=x, + color=color, + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, + exclude_dims=exclude_dims, ) - df = _dataset_to_long_df(self._ds) if df.empty: return go.Figure() - variables = df['variable'].unique().tolist() - color_map = process_colors(colors, variables, default_colorscale=CONFIG.Plotting.default_qualitative_colorscale) + color_labels = df[slots['color']].unique().tolist() if slots['color'] and slots['color'] in df.columns else [] + color_map = process_colors(colors, color_labels, CONFIG.Plotting.default_qualitative_colorscale) - facet_col_wrap = facet_cols or CONFIG.Plotting.default_facet_cols - fig_kwargs: dict[str, Any] = { + labels = {**(({slots['x']: xlabel}) if xlabel and slots['x'] else {}), **({'value': ylabel} if ylabel else {})} + fig_kwargs = { 'data_frame': df, - 'x': x_col, + 'x': slots['x'], 'y': 'value', 'title': title, + 'color_discrete_map': color_map, + **({'labels': labels} if labels else {}), + **_build_fig_kwargs(slots, dict(self._ds.sizes), px_kwargs, facet_cols), } - # Only color by variable if it's not already on x-axis (and user didn't override) - if x_col != 'variable' and 'color' not in px_kwargs: - fig_kwargs['color'] = 'variable' - fig_kwargs['color_discrete_map'] = color_map - if xlabel: - fig_kwargs['labels'] = {x_col: xlabel} - if ylabel: - fig_kwargs['labels'] = {**fig_kwargs.get('labels', {}), 'value': ylabel} - - if actual_facet_col and 'facet_col' not in px_kwargs: - fig_kwargs['facet_col'] = actual_facet_col - if facet_col_wrap < self._ds.sizes.get(actual_facet_col, facet_col_wrap + 1): - fig_kwargs['facet_col_wrap'] = facet_col_wrap - if actual_facet_row and 'facet_row' not in px_kwargs: - fig_kwargs['facet_row'] = actual_facet_row - if actual_anim and 'animation_frame' not in px_kwargs: - fig_kwargs['animation_frame'] = actual_anim - fig = px.bar(**{**fig_kwargs, **px_kwargs}) fig.update_layout(barmode='relative', bargap=0, bargroupgap=0) fig.update_traces(marker_line_width=0) @@ -271,6 +326,7 @@ def line( self, *, x: str | Literal['auto'] | None = 'auto', + color: str | Literal['auto'] | None = 'auto', colors: ColorType | None = None, title: str = '', xlabel: str = '', @@ -280,6 +336,7 @@ def line( animation_frame: str | Literal['auto'] | None = 'auto', facet_cols: int | None = None, line_shape: str | None = None, + exclude_dims: set[str] | None = None, **px_kwargs: Any, ) -> go.Figure: """Create a line chart from the dataset. @@ -287,7 +344,9 @@ def line( Each variable in the dataset becomes a separate line. Args: - x: Dimension for x-axis. 'auto' uses CONFIG.Plotting.x_dim_priority. + x: Dimension for x-axis. 'auto' uses CONFIG.Plotting.dim_priority. + color: Dimension for color grouping. 'auto' uses 'variable' (data_var names) + if available, otherwise uses CONFIG priority. colors: Color specification (colorscale name, color list, or dict mapping). title: Plot title. xlabel: X-axis label. @@ -303,52 +362,40 @@ def line( Returns: Plotly Figure. 
""" - # Determine x-axis first, then resolve facets from remaining dims - dims = list(self._ds.dims) - x_col = _get_x_dim(dims, x) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - self._ds, facet_col, facet_row, animation_frame, exclude_dims={x_col} + slots = assign_slots( + self._ds, + x=x, + color=color, + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, + exclude_dims=exclude_dims, ) - df = _dataset_to_long_df(self._ds) if df.empty: return go.Figure() - variables = df['variable'].unique().tolist() - color_map = process_colors(colors, variables, default_colorscale=CONFIG.Plotting.default_qualitative_colorscale) + color_labels = df[slots['color']].unique().tolist() if slots['color'] and slots['color'] in df.columns else [] + color_map = process_colors(colors, color_labels, CONFIG.Plotting.default_qualitative_colorscale) - facet_col_wrap = facet_cols or CONFIG.Plotting.default_facet_cols - fig_kwargs: dict[str, Any] = { + labels = {**(({slots['x']: xlabel}) if xlabel and slots['x'] else {}), **({'value': ylabel} if ylabel else {})} + fig_kwargs = { 'data_frame': df, - 'x': x_col, + 'x': slots['x'], 'y': 'value', 'title': title, 'line_shape': line_shape or CONFIG.Plotting.default_line_shape, + 'color_discrete_map': color_map, + **({'labels': labels} if labels else {}), + **_build_fig_kwargs(slots, dict(self._ds.sizes), px_kwargs, facet_cols), } - # Only color by variable if it's not already on x-axis (and user didn't override) - if x_col != 'variable' and 'color' not in px_kwargs: - fig_kwargs['color'] = 'variable' - fig_kwargs['color_discrete_map'] = color_map - if xlabel: - fig_kwargs['labels'] = {x_col: xlabel} - if ylabel: - fig_kwargs['labels'] = {**fig_kwargs.get('labels', {}), 'value': ylabel} - - if actual_facet_col and 'facet_col' not in px_kwargs: - fig_kwargs['facet_col'] = actual_facet_col - if facet_col_wrap < self._ds.sizes.get(actual_facet_col, facet_col_wrap + 1): - fig_kwargs['facet_col_wrap'] = facet_col_wrap - if actual_facet_row and 'facet_row' not in px_kwargs: - fig_kwargs['facet_row'] = actual_facet_row - if actual_anim and 'animation_frame' not in px_kwargs: - fig_kwargs['animation_frame'] = actual_anim - return px.line(**{**fig_kwargs, **px_kwargs}) def area( self, *, x: str | Literal['auto'] | None = 'auto', + color: str | Literal['auto'] | None = 'auto', colors: ColorType | None = None, title: str = '', xlabel: str = '', @@ -358,12 +405,15 @@ def area( animation_frame: str | Literal['auto'] | None = 'auto', facet_cols: int | None = None, line_shape: str | None = None, + exclude_dims: set[str] | None = None, **px_kwargs: Any, ) -> go.Figure: """Create a stacked area chart from the dataset. Args: - x: Dimension for x-axis. 'auto' uses CONFIG.Plotting.x_dim_priority. + x: Dimension for x-axis. 'auto' uses CONFIG.Plotting.dim_priority. + color: Dimension for color grouping. 'auto' uses 'variable' (data_var names) + if available, otherwise uses CONFIG priority. colors: Color specification (colorscale name, color list, or dict mapping). title: Plot title. xlabel: X-axis label. @@ -378,46 +428,33 @@ def area( Returns: Plotly Figure. 
""" - # Determine x-axis first, then resolve facets from remaining dims - dims = list(self._ds.dims) - x_col = _get_x_dim(dims, x) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - self._ds, facet_col, facet_row, animation_frame, exclude_dims={x_col} + slots = assign_slots( + self._ds, + x=x, + color=color, + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, + exclude_dims=exclude_dims, ) - df = _dataset_to_long_df(self._ds) if df.empty: return go.Figure() - variables = df['variable'].unique().tolist() - color_map = process_colors(colors, variables, default_colorscale=CONFIG.Plotting.default_qualitative_colorscale) + color_labels = df[slots['color']].unique().tolist() if slots['color'] and slots['color'] in df.columns else [] + color_map = process_colors(colors, color_labels, CONFIG.Plotting.default_qualitative_colorscale) - facet_col_wrap = facet_cols or CONFIG.Plotting.default_facet_cols - fig_kwargs: dict[str, Any] = { + labels = {**(({slots['x']: xlabel}) if xlabel and slots['x'] else {}), **({'value': ylabel} if ylabel else {})} + fig_kwargs = { 'data_frame': df, - 'x': x_col, + 'x': slots['x'], 'y': 'value', 'title': title, 'line_shape': line_shape or CONFIG.Plotting.default_line_shape, + 'color_discrete_map': color_map, + **({'labels': labels} if labels else {}), + **_build_fig_kwargs(slots, dict(self._ds.sizes), px_kwargs, facet_cols), } - # Only color by variable if it's not already on x-axis (and user didn't override) - if x_col != 'variable' and 'color' not in px_kwargs: - fig_kwargs['color'] = 'variable' - fig_kwargs['color_discrete_map'] = color_map - if xlabel: - fig_kwargs['labels'] = {x_col: xlabel} - if ylabel: - fig_kwargs['labels'] = {**fig_kwargs.get('labels', {}), 'value': ylabel} - - if actual_facet_col and 'facet_col' not in px_kwargs: - fig_kwargs['facet_col'] = actual_facet_col - if facet_col_wrap < self._ds.sizes.get(actual_facet_col, facet_col_wrap + 1): - fig_kwargs['facet_col_wrap'] = facet_col_wrap - if actual_facet_row and 'facet_row' not in px_kwargs: - fig_kwargs['facet_row'] = actual_facet_row - if actual_anim and 'animation_frame' not in px_kwargs: - fig_kwargs['animation_frame'] = actual_anim - return px.area(**{**fig_kwargs, **px_kwargs}) def heatmap( @@ -467,21 +504,54 @@ def heatmap( colors = colors or CONFIG.Plotting.default_sequential_colorscale facet_col_wrap = facet_cols or CONFIG.Plotting.default_facet_cols - actual_facet_col, _, actual_anim = _resolve_auto_facets(self._ds, facet_col, None, animation_frame) + # Heatmap uses imshow - first 2 dims are the x/y axes of the heatmap + # Only call assign_slots if we need to resolve 'auto' values + if facet_col == 'auto' or animation_frame == 'auto': + heatmap_axes = set(list(da.dims)[:2]) if len(da.dims) >= 2 else set() + slots = assign_slots( + self._ds, + x=None, + color=None, + facet_col=facet_col, + facet_row=None, + animation_frame=animation_frame, + exclude_dims=heatmap_axes, + ) + resolved_facet = slots['facet_col'] + resolved_animation = slots['animation_frame'] + else: + # Values already resolved (or None), use directly without re-resolving + resolved_facet = facet_col + resolved_animation = animation_frame imshow_args: dict[str, Any] = { - 'img': da, 'color_continuous_scale': colors, 'title': title or variable, } - if actual_facet_col and actual_facet_col in da.dims: - imshow_args['facet_col'] = actual_facet_col - if facet_col_wrap < da.sizes[actual_facet_col]: + if resolved_facet and resolved_facet in da.dims: + imshow_args['facet_col'] = 
resolved_facet + if facet_col_wrap < da.sizes[resolved_facet]: imshow_args['facet_col_wrap'] = facet_col_wrap - if actual_anim and actual_anim in da.dims: - imshow_args['animation_frame'] = actual_anim + if resolved_animation and resolved_animation in da.dims: + imshow_args['animation_frame'] = resolved_animation + + # Squeeze singleton dimensions not used for faceting/animation + # px.imshow can't handle extra singleton dims in multi-dimensional data + dims_to_preserve = set(list(da.dims)[:2]) # First 2 dims are heatmap x/y axes + if resolved_facet and resolved_facet in da.dims: + dims_to_preserve.add(resolved_facet) + if resolved_animation and resolved_animation in da.dims: + dims_to_preserve.add(resolved_animation) + for dim in list(da.dims): + if dim not in dims_to_preserve and da.sizes[dim] == 1: + da = da.squeeze(dim) + imshow_args['img'] = da + + # Use binary_string=False to handle non-numeric coords (e.g., string labels) + if 'binary_string' not in imshow_kwargs: + imshow_args['binary_string'] = False return px.imshow(**{**imshow_args, **imshow_kwargs}) @@ -525,8 +595,9 @@ def scatter( if df.empty: return go.Figure() - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - self._ds, facet_col, facet_row, animation_frame + # Scatter uses explicit x/y variable names, not dimensions + slots = assign_slots( + self._ds, x=None, color=None, facet_col=facet_col, facet_row=facet_row, animation_frame=animation_frame ) facet_col_wrap = facet_cols or CONFIG.Plotting.default_facet_cols @@ -542,14 +613,16 @@ def scatter( if ylabel: fig_kwargs['labels'] = {**fig_kwargs.get('labels', {}), y: ylabel} - if actual_facet_col: - fig_kwargs['facet_col'] = actual_facet_col - if facet_col_wrap < self._ds.sizes.get(actual_facet_col, facet_col_wrap + 1): + # Only use facets if the column actually exists in the dataframe + # (scatter uses wide format, so 'variable' column doesn't exist) + if slots['facet_col'] and slots['facet_col'] in df.columns: + fig_kwargs['facet_col'] = slots['facet_col'] + if facet_col_wrap < self._ds.sizes.get(slots['facet_col'], facet_col_wrap + 1): fig_kwargs['facet_col_wrap'] = facet_col_wrap - if actual_facet_row: - fig_kwargs['facet_row'] = actual_facet_row - if actual_anim: - fig_kwargs['animation_frame'] = actual_anim + if slots['facet_row'] and slots['facet_row'] in df.columns: + fig_kwargs['facet_row'] = slots['facet_row'] + if slots['animation_frame'] and slots['animation_frame'] in df.columns: + fig_kwargs['animation_frame'] = slots['animation_frame'] return px.scatter(**fig_kwargs) @@ -568,6 +641,9 @@ def pie( Extra dimensions are auto-assigned to facet_col and facet_row. For scalar values, a single pie is shown. + Note: + ``px.pie()`` does not support animation_frame, so only facets are available. + Args: colors: Color specification (colorscale name, color list, or dict mapping). title: Plot title. 
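# Illustrative usage sketch (``ds`` and the dimension names are placeholders): since px.pie()
# has no animation_frame, any extra dimensions of ``ds`` can only be assigned to facet_col or
# facet_row, e.g.
#
#     fig = ds.fxplot.pie(facet_col='scenario', title='Cost share by contributor')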
@@ -602,13 +678,15 @@ def pie( **px_kwargs, ) - # Multi-dimensional case - faceted/animated pies + # Multi-dimensional case - faceted pies (px.pie doesn't support animation_frame) df = _dataset_to_long_df(self._ds) if df.empty: return go.Figure() - # Note: px.pie doesn't support animation_frame - actual_facet_col, actual_facet_row, _ = _resolve_auto_facets(self._ds, facet_col, facet_row, None) + # Pie uses 'variable' for names and 'value' for values, no x/color/animation_frame + slots = assign_slots( + self._ds, x=None, color=None, facet_col=facet_col, facet_row=facet_row, animation_frame=None + ) facet_col_wrap = facet_cols or CONFIG.Plotting.default_facet_cols fig_kwargs: dict[str, Any] = { @@ -621,12 +699,12 @@ def pie( **px_kwargs, } - if actual_facet_col: - fig_kwargs['facet_col'] = actual_facet_col - if facet_col_wrap < self._ds.sizes.get(actual_facet_col, facet_col_wrap + 1): + if slots['facet_col']: + fig_kwargs['facet_col'] = slots['facet_col'] + if facet_col_wrap < self._ds.sizes.get(slots['facet_col'], facet_col_wrap + 1): fig_kwargs['facet_col_wrap'] = facet_col_wrap - if actual_facet_row: - fig_kwargs['facet_row'] = actual_facet_row + if slots['facet_row']: + fig_kwargs['facet_row'] = slots['facet_row'] return px.pie(**fig_kwargs) @@ -758,6 +836,7 @@ def stacked_bar( facet_row: str | Literal['auto'] | None = 'auto', animation_frame: str | Literal['auto'] | None = 'auto', facet_cols: int | None = None, + exclude_dims: set[str] | None = None, **px_kwargs: Any, ) -> go.Figure: """Create a stacked bar chart. See DatasetPlotAccessor.stacked_bar for details.""" @@ -865,22 +944,54 @@ def heatmap( colors = colors or CONFIG.Plotting.default_sequential_colorscale facet_col_wrap = facet_cols or CONFIG.Plotting.default_facet_cols - # Use Dataset for facet resolution - ds_for_resolution = da.to_dataset(name='_temp') - actual_facet_col, _, actual_anim = _resolve_auto_facets(ds_for_resolution, facet_col, None, animation_frame) + # Heatmap uses imshow - first 2 dims are the x/y axes of the heatmap + # Only call assign_slots if we need to resolve 'auto' values + if facet_col == 'auto' or animation_frame == 'auto': + heatmap_axes = set(list(da.dims)[:2]) if len(da.dims) >= 2 else set() + ds_for_resolution = da.to_dataset(name='_temp') + slots = assign_slots( + ds_for_resolution, + x=None, + color=None, + facet_col=facet_col, + facet_row=None, + animation_frame=animation_frame, + exclude_dims=heatmap_axes, + ) + resolved_facet = slots['facet_col'] + resolved_animation = slots['animation_frame'] + else: + # Values already resolved (or None), use directly without re-resolving + resolved_facet = facet_col + resolved_animation = animation_frame imshow_args: dict[str, Any] = { - 'img': da, 'color_continuous_scale': colors, 'title': title or (da.name if da.name else ''), } - if actual_facet_col and actual_facet_col in da.dims: - imshow_args['facet_col'] = actual_facet_col - if facet_col_wrap < da.sizes[actual_facet_col]: + if resolved_facet and resolved_facet in da.dims: + imshow_args['facet_col'] = resolved_facet + if facet_col_wrap < da.sizes[resolved_facet]: imshow_args['facet_col_wrap'] = facet_col_wrap - if actual_anim and actual_anim in da.dims: - imshow_args['animation_frame'] = actual_anim + if resolved_animation and resolved_animation in da.dims: + imshow_args['animation_frame'] = resolved_animation + + # Squeeze singleton dimensions not used for faceting/animation + # px.imshow can't handle extra singleton dims in multi-dimensional data + dims_to_preserve = set(list(da.dims)[:2]) # First 2 
dims are heatmap x/y axes + if resolved_facet and resolved_facet in da.dims: + dims_to_preserve.add(resolved_facet) + if resolved_animation and resolved_animation in da.dims: + dims_to_preserve.add(resolved_animation) + for dim in list(da.dims): + if dim not in dims_to_preserve and da.sizes[dim] == 1: + da = da.squeeze(dim) + imshow_args['img'] = da + + # Use binary_string=False to handle non-numeric coords (e.g., string labels) + if 'binary_string' not in imshow_kwargs: + imshow_args['binary_string'] = False return px.imshow(**{**imshow_args, **imshow_kwargs}) diff --git a/flixopt/statistics_accessor.py b/flixopt/statistics_accessor.py index 382ed1bf0..e3581e4e3 100644 --- a/flixopt/statistics_accessor.py +++ b/flixopt/statistics_accessor.py @@ -127,6 +127,50 @@ def _reshape_time_for_heatmap( # --- Helper functions --- +def _prepare_for_heatmap( + da: xr.DataArray, + reshape: tuple[str, str] | Literal['auto'] | None, + facet_col: str | Literal['auto'] | None, + animation_frame: str | Literal['auto'] | None, +) -> xr.DataArray: + """Prepare DataArray for heatmap: determine axes, reshape if needed, transpose/squeeze.""" + + def finalize(da: xr.DataArray, heatmap_dims: list[str]) -> xr.DataArray: + """Transpose, squeeze, and clear name if needed.""" + other = [d for d in da.dims if d not in heatmap_dims] + da = da.transpose(*[d for d in heatmap_dims if d in da.dims], *other) + for dim in [d for d in da.dims if d not in heatmap_dims and da.sizes[d] == 1]: + da = da.squeeze(dim, drop=True) + return da.rename('') if da.sizes.get('variable', 1) > 1 else da + + def fallback_dims() -> list[str]: + """Default dims: (variable, time) if multi-var, else first 2 dims with size > 1.""" + if da.sizes.get('variable', 1) > 1: + return ['variable', 'time'] + dims = [d for d in da.dims if da.sizes[d] > 1][:2] + return dims if len(dims) >= 2 else list(da.dims)[:2] + + is_clustered = 'cluster' in da.dims and da.sizes['cluster'] > 1 + has_time = 'time' in da.dims + + # Clustered: use (time, cluster) as natural 2D + if is_clustered and reshape in ('auto', None): + return finalize(da, ['time', 'cluster']) + + # Explicit reshape: always apply + if reshape and reshape != 'auto' and has_time: + return finalize(_reshape_time_for_heatmap(da, reshape), ['timestep', 'timeframe']) + + # Auto reshape (non-clustered): apply only if extra dims fit in available slots + if reshape == 'auto' and has_time: + extra = [d for d in da.dims if d not in ('time', 'variable') and da.sizes[d] > 1] + slots = (facet_col == 'auto') + (animation_frame == 'auto') + if len(extra) <= slots: + return finalize(_reshape_time_for_heatmap(da, ('D', 'h')), ['timestep', 'timeframe']) + + return finalize(da, fallback_dims()) + + def _filter_by_pattern( names: list[str], include: FilterType | None, @@ -180,73 +224,6 @@ def _filter_by_carrier(ds: xr.Dataset, carrier: str | list[str] | None) -> xr.Da return ds[matching_vars] if matching_vars else xr.Dataset() -def _resolve_auto_facets( - ds: xr.Dataset, - facet_col: str | Literal['auto'] | None, - facet_row: str | Literal['auto'] | None, - animation_frame: str | Literal['auto'] | None = None, -) -> tuple[str | None, str | None, str | None]: - """Resolve 'auto' facet/animation dimensions based on available data dimensions. - - When 'auto' is specified, extra dimensions are assigned to slots based on: - - CONFIG.Plotting.extra_dim_priority: Order of dimensions to assign. - - CONFIG.Plotting.dim_slot_priority: Order of slots to fill. - - Args: - ds: Dataset to check for available dimensions. 
- facet_col: Dimension name, 'auto', or None. - facet_row: Dimension name, 'auto', or None. - animation_frame: Dimension name, 'auto', or None. - - Returns: - Tuple of (resolved_facet_col, resolved_facet_row, resolved_animation_frame). - Each is either a valid dimension name or None. - """ - # Get available extra dimensions with size > 1, sorted by priority - available = {d for d in ds.dims if ds.sizes[d] > 1} - extra_dims = [d for d in CONFIG.Plotting.extra_dim_priority if d in available] - used: set[str] = set() - - # Map slot names to their input values - slots = { - 'facet_col': facet_col, - 'facet_row': facet_row, - 'animation_frame': animation_frame, - } - results: dict[str, str | None] = {'facet_col': None, 'facet_row': None, 'animation_frame': None} - - # First pass: resolve explicit dimensions (not 'auto' or None) to mark them as used - for slot_name, value in slots.items(): - if value is not None and value != 'auto': - if value in available and value not in used: - used.add(value) - results[slot_name] = value - - # Second pass: resolve 'auto' slots in dim_slot_priority order - dim_iter = iter(d for d in extra_dims if d not in used) - for slot_name in CONFIG.Plotting.dim_slot_priority: - if slots.get(slot_name) == 'auto': - next_dim = next(dim_iter, None) - if next_dim: - used.add(next_dim) - results[slot_name] = next_dim - - return results['facet_col'], results['facet_row'], results['animation_frame'] - - -def _resolve_facets( - ds: xr.Dataset, - facet_col: str | Literal['auto'] | None, - facet_row: str | Literal['auto'] | None, -) -> tuple[str | None, str | None]: - """Resolve facet dimensions, returning None if not present in data. - - Legacy wrapper for _resolve_auto_facets for backward compatibility. - """ - resolved_col, resolved_row, _ = _resolve_auto_facets(ds, facet_col, facet_row, None) - return resolved_col, resolved_row - - def _dataset_to_long_df(ds: xr.Dataset, value_name: str = 'value', var_name: str = 'variable') -> pd.DataFrame: """Convert xarray Dataset to long-form DataFrame for plotly express.""" if not ds.data_vars: @@ -1382,9 +1359,6 @@ def balance( ds[label] = -ds[label] ds = _apply_selection(ds, select) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - ds, facet_col, facet_row, animation_frame - ) # Build color map from Element.color attributes if no colors specified if colors is None: @@ -1399,9 +1373,9 @@ def balance( fig = ds.fxplot.stacked_bar( colors=colors, title=f'{node} [{unit_label}]' if unit_label else node, - facet_col=actual_facet_col, - facet_row=actual_facet_row, - animation_frame=actual_anim, + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, **plotly_kwargs, ) @@ -1493,9 +1467,6 @@ def carrier_balance( ds[label] = -ds[label] ds = _apply_selection(ds, select) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - ds, facet_col, facet_row, animation_frame - ) # Use cached component colors for flows if colors is None: @@ -1523,9 +1494,9 @@ def carrier_balance( fig = ds.fxplot.stacked_bar( colors=colors, title=f'{carrier.capitalize()} Balance [{unit_label}]' if unit_label else f'{carrier.capitalize()} Balance', - facet_col=actual_facet_col, - facet_row=actual_facet_row, - animation_frame=actual_anim, + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, **plotly_kwargs, ) @@ -1544,7 +1515,7 @@ def heatmap( reshape: tuple[str, str] | Literal['auto'] | None = 'auto', colors: str | list[str] | None = None, facet_col: str | Literal['auto'] | None 
= 'auto', - animation_frame: str | Literal['auto'] | None = None, + animation_frame: str | Literal['auto'] | None = 'auto', show: bool | None = None, **plotly_kwargs: Any, ) -> PlotResult: @@ -1576,97 +1547,25 @@ def heatmap( PlotResult with processed data and figure. """ solution = self._stats._require_solution() - if isinstance(variables, str): variables = [variables] - # Resolve flow labels to variable names - resolved_variables = self._resolve_variable_names(variables, solution) - - ds = solution[resolved_variables] - ds = _apply_selection(ds, select) - - # Stack variables into single DataArray - variable_names = list(ds.data_vars) - dataarrays = [ds[var] for var in variable_names] - da = xr.concat(dataarrays, dim=pd.Index(variable_names, name='variable')) - - # Check if data is clustered (has cluster dimension with size > 1) - is_clustered = 'cluster' in da.dims and da.sizes['cluster'] > 1 - - # Determine facet and animation from available dims - has_multiple_vars = 'variable' in da.dims and da.sizes['variable'] > 1 - - if has_multiple_vars: - actual_facet = 'variable' - # Resolve animation using auto logic, excluding 'variable' which is used for facet - _, _, actual_animation = _resolve_auto_facets(da.to_dataset(name='value'), None, None, animation_frame) - if actual_animation == 'variable': - actual_animation = None - else: - # Resolve facet and animation using auto logic - actual_facet, _, actual_animation = _resolve_auto_facets( - da.to_dataset(name='value'), facet_col, None, animation_frame - ) - - # Determine heatmap dimensions based on data structure - if is_clustered and (reshape == 'auto' or reshape is None): - # Clustered data: use (time, cluster) as natural 2D heatmap axes - heatmap_dims = ['time', 'cluster'] - elif reshape and reshape != 'auto' and 'time' in da.dims: - # Non-clustered with explicit reshape: reshape time to (day, hour) etc. 
- # Extra dims will be handled via facet/animation or dropped - da = _reshape_time_for_heatmap(da, reshape) - heatmap_dims = ['timestep', 'timeframe'] - elif reshape == 'auto' and 'time' in da.dims and not is_clustered: - # Auto mode for non-clustered: use default ('D', 'h') reshape - # Extra dims will be handled via facet/animation or dropped - da = _reshape_time_for_heatmap(da, ('D', 'h')) - heatmap_dims = ['timestep', 'timeframe'] - elif has_multiple_vars: - # Can't reshape but have multiple vars: use variable + time as heatmap axes - heatmap_dims = ['variable', 'time'] - # variable is now a heatmap dim, use period/scenario for facet/animation - actual_facet, _, actual_animation = _resolve_auto_facets( - da.to_dataset(name='value'), facet_col, None, animation_frame - ) - else: - # Fallback: use first two available dimensions - available_dims = [d for d in da.dims if da.sizes[d] > 1] - if len(available_dims) >= 2: - heatmap_dims = available_dims[:2] - elif 'time' in da.dims: - heatmap_dims = ['time'] - else: - heatmap_dims = list(da.dims)[:1] - - # Keep only dims we need - keep_dims = set(heatmap_dims) | {d for d in [actual_facet, actual_animation] if d is not None} - for dim in [d for d in da.dims if d not in keep_dims]: - da = da.isel({dim: 0}, drop=True) if da.sizes[dim] > 1 else da.squeeze(dim, drop=True) + # Resolve, select, and stack into single DataArray + resolved = self._resolve_variable_names(variables, solution) + ds = _apply_selection(solution[resolved], select) + da = xr.concat([ds[v] for v in ds.data_vars], dim=pd.Index(list(ds.data_vars), name='variable')) - # Transpose to expected order - dim_order = heatmap_dims + [d for d in [actual_facet, actual_animation] if d] - da = da.transpose(*dim_order) + # Prepare for heatmap (reshape, transpose, squeeze) + da = _prepare_for_heatmap(da, reshape, facet_col, animation_frame) - # Clear name for multiple variables (colorbar would show first var's name) - if has_multiple_vars: - da = da.rename('') - - fig = da.fxplot.heatmap( - colors=colors, - facet_col=actual_facet, - animation_frame=actual_animation, - **plotly_kwargs, - ) + fig = da.fxplot.heatmap(colors=colors, facet_col=facet_col, animation_frame=animation_frame, **plotly_kwargs) if show is None: show = CONFIG.Plotting.default_show if show: fig.show() - reshaped_ds = da.to_dataset(name='value') if isinstance(da, xr.DataArray) else da - return PlotResult(data=reshaped_ds, figure=fig) + return PlotResult(data=da.to_dataset(name='value'), figure=fig) def flows( self, @@ -1737,9 +1636,6 @@ def flows( ds = ds[[lbl for lbl in matching_labels if lbl in ds]] ds = _apply_selection(ds, select) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - ds, facet_col, facet_row, animation_frame - ) # Get unit label from first data variable's attributes unit_label = '' @@ -1750,9 +1646,9 @@ def flows( fig = ds.fxplot.line( colors=colors, title=f'Flows [{unit_label}]' if unit_label else 'Flows', - facet_col=actual_facet_col, - facet_row=actual_facet_row, - animation_frame=actual_anim, + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, **plotly_kwargs, ) @@ -1798,27 +1694,18 @@ def sizes( valid_labels = [lbl for lbl in ds.data_vars if float(ds[lbl].max()) < max_size] ds = ds[valid_labels] - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - ds, facet_col, facet_row, animation_frame - ) - - df = _dataset_to_long_df(ds) - if df.empty: + if not ds.data_vars: fig = go.Figure() else: - variables = 
df['variable'].unique().tolist() - color_map = process_colors(colors, variables) - fig = px.bar( - df, + fig = ds.fxplot.bar( x='variable', - y='value', color='variable', - facet_col=actual_facet_col, - facet_row=actual_facet_row, - animation_frame=actual_anim, - color_discrete_map=color_map, + colors=colors, title='Investment Sizes', - labels={'variable': 'Flow', 'value': 'Size'}, + ylabel='Size', + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, **plotly_kwargs, ) @@ -1913,10 +1800,6 @@ def sort_descending(arr: np.ndarray) -> np.ndarray: duration_coord = np.linspace(0, 100, n_timesteps) if normalize else np.arange(n_timesteps) result_ds = result_ds.assign_coords({duration_name: duration_coord}) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - result_ds, facet_col, facet_row, animation_frame - ) - # Get unit label from first data variable's attributes unit_label = '' if ds.data_vars: @@ -1926,9 +1809,9 @@ def sort_descending(arr: np.ndarray) -> np.ndarray: fig = result_ds.fxplot.line( colors=colors, title=f'Duration Curve [{unit_label}]' if unit_label else 'Duration Curve', - facet_col=actual_facet_col, - facet_row=actual_facet_row, - animation_frame=actual_anim, + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, **plotly_kwargs, ) @@ -2057,14 +1940,14 @@ def effects( else: raise ValueError(f"'by' must be one of 'component', 'contributor', 'time', or None, got {by!r}") - # Resolve facets - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - combined.to_dataset(name='value'), facet_col, facet_row, animation_frame - ) - # Convert to DataFrame for plotly express df = combined.to_dataframe(name='value').reset_index() + # Resolve facet/animation: 'auto' means None for DataFrames (no dimension priority) + resolved_facet_col = None if facet_col == 'auto' else facet_col + resolved_facet_row = None if facet_row == 'auto' else facet_row + resolved_animation = None if animation_frame == 'auto' else animation_frame + # Build color map if color_col and color_col in df.columns: color_items = df[color_col].unique().tolist() @@ -2087,9 +1970,9 @@ def effects( y='value', color=color_col, color_discrete_map=color_map, - facet_col=actual_facet_col, - facet_row=actual_facet_row, - animation_frame=actual_anim, + facet_col=resolved_facet_col, + facet_row=resolved_facet_row, + animation_frame=resolved_animation, title=title, **plotly_kwargs, ) @@ -2138,16 +2021,13 @@ def charge_states( ds = ds[[s for s in storages if s in ds]] ds = _apply_selection(ds, select) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - ds, facet_col, facet_row, animation_frame - ) fig = ds.fxplot.line( colors=colors, title='Storage Charge States', - facet_col=actual_facet_col, - facet_row=actual_facet_row, - animation_frame=actual_anim, + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, **plotly_kwargs, ) fig.update_yaxes(title_text='Charge State') @@ -2231,61 +2111,46 @@ def storage( # Apply selection ds = _apply_selection(ds, select) - actual_facet_col, actual_facet_row, actual_anim = _resolve_auto_facets( - ds, facet_col, facet_row, animation_frame - ) - # Build color map + # Separate flow data from charge_state flow_labels = [lbl for lbl in ds.data_vars if lbl != 'charge_state'] + flow_ds = ds[flow_labels] + charge_da = ds['charge_state'] + + # Build color map for flows if colors is None: colors = self._get_color_map_for_balance(storage, flow_labels) - color_map = 
process_colors(colors, flow_labels) - color_map['charge_state'] = 'black' - # Convert to long-form DataFrame - df = _dataset_to_long_df(ds) - - # Create figure with facets using px.bar for flows - flow_df = df[df['variable'] != 'charge_state'] - charge_df = df[df['variable'] == 'charge_state'] - - fig = px.bar( - flow_df, + # Create stacked bar chart for flows using fxplot + fig = flow_ds.fxplot.stacked_bar( x='time', - y='value', color='variable', - facet_col=actual_facet_col, - facet_row=actual_facet_row, - animation_frame=actual_anim, - color_discrete_map=color_map, + colors=colors, title=f'{storage} Operation ({unit})', + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, **plotly_kwargs, ) - fig.update_layout(bargap=0, bargroupgap=0) - fig.update_traces(marker_line_width=0) # Add charge state as line on secondary y-axis - if not charge_df.empty: - # Create line figure with same facets to get matching trace structure - line_fig = px.line( - charge_df, + if charge_da.size > 0: + # Create line figure with same facets + line_fig = charge_da.fxplot.line( x='time', - y='value', - facet_col=actual_facet_col, - facet_row=actual_facet_row, - animation_frame=actual_anim, + color=None, # Single line, no color grouping + facet_col=facet_col, + facet_row=facet_row, + animation_frame=animation_frame, ) # Get the primary y-axes from the bar figure to create matching secondary axes - # px creates axes named: yaxis, yaxis2, yaxis3, etc. primary_yaxes = [key for key in fig.layout if key.startswith('yaxis')] # For each primary y-axis, create a secondary y-axis for i, primary_key in enumerate(sorted(primary_yaxes, key=lambda x: int(x[5:]) if x[5:] else 0)): - # Determine secondary axis name (y -> y2, y2 -> y3 pattern won't work) - # Instead use a consistent offset: yaxis -> yaxis10, yaxis2 -> yaxis11, etc. 
primary_num = primary_key[5:] if primary_key[5:] else '1' - secondary_num = int(primary_num) + 100 # Use high offset to avoid conflicts + secondary_num = int(primary_num) + 100 secondary_key = f'yaxis{secondary_num}' secondary_anchor = f'x{primary_num}' if primary_num != '1' else 'x' @@ -2299,14 +2164,13 @@ def storage( # Add line traces with correct axis assignments for i, trace in enumerate(line_fig.data): - # Map trace index to secondary y-axis primary_num = i + 1 if i > 0 else 1 secondary_yaxis = f'y{primary_num + 100}' trace.name = 'charge_state' trace.line = dict(color=charge_state_color, width=2) trace.yaxis = secondary_yaxis - trace.showlegend = i == 0 # Only show legend for first trace + trace.showlegend = i == 0 trace.legendgroup = 'charge_state' fig.add_trace(trace) diff --git a/flixopt/transform_accessor.py b/flixopt/transform_accessor.py index 3a13dbb63..6a5b51caa 100644 --- a/flixopt/transform_accessor.py +++ b/flixopt/transform_accessor.py @@ -582,6 +582,14 @@ def cluster( weights: dict[str, float] | None = None, time_series_for_high_peaks: list[str] | None = None, time_series_for_low_peaks: list[str] | None = None, + cluster_method: Literal['k_means', 'k_medoids', 'hierarchical', 'k_maxoids', 'averaging'] = 'hierarchical', + representation_method: Literal[ + 'meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation' + ] = 'medoidRepresentation', + extreme_period_method: Literal['append', 'new_cluster_center', 'replace_cluster_center'] | None = None, + rescale_cluster_periods: bool = True, + predef_cluster_order: xr.DataArray | np.ndarray | list[int] | None = None, + **tsam_kwargs: Any, ) -> FlowSystem: """ Create a FlowSystem with reduced timesteps using typical clusters. @@ -591,7 +599,7 @@ def cluster( through time series aggregation using the tsam package. The method: - 1. Performs time series clustering using tsam (k-means) + 1. Performs time series clustering using tsam (hierarchical by default) 2. Extracts only the typical clusters (not all original timesteps) 3. Applies timestep weighting for accurate cost representation 4. Handles storage states between clusters based on each Storage's ``cluster_mode`` @@ -607,6 +615,25 @@ def cluster( time_series_for_high_peaks: Time series labels for explicitly selecting high-value clusters. **Recommended** for demand time series to capture peak demand days. time_series_for_low_peaks: Time series labels for explicitly selecting low-value clusters. + cluster_method: Clustering algorithm to use. Options: + ``'hierarchical'`` (default), ``'k_means'``, ``'k_medoids'``, + ``'k_maxoids'``, ``'averaging'``. + representation_method: How cluster representatives are computed. Options: + ``'medoidRepresentation'`` (default), ``'meanRepresentation'``, + ``'distributionAndMinMaxRepresentation'``. + extreme_period_method: How extreme periods (peaks) are integrated. Options: + ``None`` (default, no special handling), ``'append'``, + ``'new_cluster_center'``, ``'replace_cluster_center'``. + rescale_cluster_periods: If True (default), rescale cluster periods so their + weighted mean matches the original time series mean. + predef_cluster_order: Predefined cluster assignments for manual clustering. + Array of cluster indices (0 to n_clusters-1) for each original period. + If provided, clustering is skipped and these assignments are used directly. + For multi-dimensional FlowSystems, use an xr.DataArray with dims + ``[original_cluster, period?, scenario?]`` to specify different assignments + per period/scenario combination. 
+ **tsam_kwargs: Additional keyword arguments passed to + ``tsam.TimeSeriesAggregation``. See tsam documentation for all options. Returns: A new FlowSystem with reduced timesteps (only typical clusters). @@ -676,11 +703,47 @@ def cluster( ds = self._fs.to_dataset(include_solution=False) + # Validate tsam_kwargs doesn't override explicit parameters + reserved_tsam_keys = { + 'noTypicalPeriods', + 'hoursPerPeriod', + 'resolution', + 'clusterMethod', + 'extremePeriodMethod', + 'representationMethod', + 'rescaleClusterPeriods', + 'predefClusterOrder', + 'weightDict', + 'addPeakMax', + 'addPeakMin', + } + conflicts = reserved_tsam_keys & set(tsam_kwargs.keys()) + if conflicts: + raise ValueError( + f'Cannot override explicit parameters via tsam_kwargs: {conflicts}. ' + f'Use the corresponding cluster() parameters instead.' + ) + + # Validate predef_cluster_order dimensions if it's a DataArray + if isinstance(predef_cluster_order, xr.DataArray): + expected_dims = {'original_cluster'} + if has_periods: + expected_dims.add('period') + if has_scenarios: + expected_dims.add('scenario') + if set(predef_cluster_order.dims) != expected_dims: + raise ValueError( + f'predef_cluster_order dimensions {set(predef_cluster_order.dims)} ' + f'do not match expected {expected_dims} for this FlowSystem.' + ) + # Cluster each (period, scenario) combination using tsam directly tsam_results: dict[tuple, tsam.TimeSeriesAggregation] = {} cluster_orders: dict[tuple, np.ndarray] = {} cluster_occurrences_all: dict[tuple, dict] = {} - use_extreme_periods = bool(time_series_for_high_peaks or time_series_for_low_peaks) + + # Collect metrics per (period, scenario) slice + clustering_metrics_all: dict[tuple, pd.DataFrame] = {} for period_label in periods: for scenario_label in scenarios: @@ -693,18 +756,34 @@ def cluster( if selector: logger.info(f'Clustering {", ".join(f"{k}={v}" for k, v in selector.items())}...') + # Handle predef_cluster_order for multi-dimensional case + predef_order_slice = None + if predef_cluster_order is not None: + if isinstance(predef_cluster_order, xr.DataArray): + # Extract slice for this (period, scenario) combination + predef_order_slice = predef_cluster_order.sel(**selector, drop=True).values + else: + # Simple array/list - use directly + predef_order_slice = predef_cluster_order + # Use tsam directly clustering_weights = weights or self._calculate_clustering_weights(temporaly_changing_ds) + # tsam expects 'None' as a string, not Python None + tsam_extreme_method = 'None' if extreme_period_method is None else extreme_period_method tsam_agg = tsam.TimeSeriesAggregation( df, noTypicalPeriods=n_clusters, hoursPerPeriod=hours_per_cluster, resolution=dt, - clusterMethod='k_means', - extremePeriodMethod='new_cluster_center' if use_extreme_periods else 'None', + clusterMethod=cluster_method, + extremePeriodMethod=tsam_extreme_method, + representationMethod=representation_method, + rescaleClusterPeriods=rescale_cluster_periods, + predefClusterOrder=predef_order_slice, weightDict={name: w for name, w in clustering_weights.items() if name in df.columns}, addPeakMax=time_series_for_high_peaks or [], addPeakMin=time_series_for_low_peaks or [], + **tsam_kwargs, ) # Suppress tsam warning about minimal value constraints (informational, not actionable) with warnings.catch_warnings(): @@ -714,10 +793,60 @@ def cluster( tsam_results[key] = tsam_agg cluster_orders[key] = tsam_agg.clusterOrder cluster_occurrences_all[key] = tsam_agg.clusterPeriodNoOccur + # Compute accuracy metrics with error handling + try: + 
clustering_metrics_all[key] = tsam_agg.accuracyIndicators() + except Exception as e: + logger.warning(f'Failed to compute clustering metrics for {key}: {e}') + clustering_metrics_all[key] = pd.DataFrame() # Use first result for structure first_key = (periods[0], scenarios[0]) first_tsam = tsam_results[first_key] + + # Convert metrics to xr.Dataset with period/scenario dims if multi-dimensional + # Filter out empty DataFrames (from failed accuracyIndicators calls) + non_empty_metrics = {k: v for k, v in clustering_metrics_all.items() if not v.empty} + if not non_empty_metrics: + # All metrics failed - create empty Dataset + clustering_metrics = xr.Dataset() + elif len(non_empty_metrics) == 1 or len(clustering_metrics_all) == 1: + # Simple case: convert single DataFrame to Dataset + metrics_df = non_empty_metrics.get(first_key) + if metrics_df is None: + metrics_df = next(iter(non_empty_metrics.values())) + clustering_metrics = xr.Dataset( + { + col: xr.DataArray( + metrics_df[col].values, dims=['time_series'], coords={'time_series': metrics_df.index} + ) + for col in metrics_df.columns + } + ) + else: + # Multi-dim case: combine metrics into Dataset with period/scenario dims + # First, get the metric columns from any non-empty DataFrame + sample_df = next(iter(non_empty_metrics.values())) + metric_names = list(sample_df.columns) + time_series_names = list(sample_df.index) + + # Build DataArrays for each metric + data_vars = {} + for metric in metric_names: + # Shape: (time_series, period?, scenario?) + slices = {} + for (p, s), df in clustering_metrics_all.items(): + if df.empty: + # Use NaN for failed metrics + slices[(p, s)] = xr.DataArray(np.full(len(time_series_names), np.nan), dims=['time_series']) + else: + slices[(p, s)] = xr.DataArray(df[metric].values, dims=['time_series']) + + da = self._combine_slices_to_dataarray_generic(slices, ['time_series'], periods, scenarios, metric) + da = da.assign_coords(time_series=time_series_names) + data_vars[metric] = da + + clustering_metrics = xr.Dataset(data_vars) n_reduced_timesteps = len(first_tsam.typicalPeriods) actual_n_clusters = len(first_tsam.clusterPeriodNoOccur) @@ -851,7 +980,7 @@ def _build_cluster_occurrences_for_key(key: tuple) -> np.ndarray: # Build multi-dimensional arrays if has_periods or has_scenarios: # Multi-dimensional case: build arrays for each (period, scenario) combination - # cluster_order: dims [original_period, period?, scenario?] + # cluster_order: dims [original_cluster, period?, scenario?] 
cluster_order_slices = {} timestep_mapping_slices = {} cluster_occurrences_slices = {} @@ -863,7 +992,7 @@ def _build_cluster_occurrences_for_key(key: tuple) -> np.ndarray: for s in scenarios: key = (p, s) cluster_order_slices[key] = xr.DataArray( - cluster_orders[key], dims=['original_period'], name='cluster_order' + cluster_orders[key], dims=['original_cluster'], name='cluster_order' ) timestep_mapping_slices[key] = xr.DataArray( _build_timestep_mapping_for_key(key), @@ -877,7 +1006,7 @@ def _build_cluster_occurrences_for_key(key: tuple) -> np.ndarray: # Combine slices into multi-dimensional DataArrays cluster_order_da = self._combine_slices_to_dataarray_generic( - cluster_order_slices, ['original_period'], periods, scenarios, 'cluster_order' + cluster_order_slices, ['original_cluster'], periods, scenarios, 'cluster_order' ) timestep_mapping_da = self._combine_slices_to_dataarray_generic( timestep_mapping_slices, ['original_time'], periods, scenarios, 'timestep_mapping' @@ -887,7 +1016,7 @@ def _build_cluster_occurrences_for_key(key: tuple) -> np.ndarray: ) else: # Simple case: single (None, None) slice - cluster_order_da = xr.DataArray(cluster_orders[first_key], dims=['original_period'], name='cluster_order') + cluster_order_da = xr.DataArray(cluster_orders[first_key], dims=['original_cluster'], name='cluster_order') # Use renamed timesteps as coordinates original_timesteps_coord = self._fs.timesteps.rename('original_time') timestep_mapping_da = xr.DataArray( @@ -932,6 +1061,7 @@ def _build_cluster_weights_for_key(key: tuple) -> xr.DataArray: reduced_fs.clustering = Clustering( result=aggregation_result, backend_name='tsam', + metrics=clustering_metrics, ) return reduced_fs @@ -996,7 +1126,7 @@ def _combine_slices_to_dataarray_generic( Args: slices: Dict mapping (period, scenario) tuples to DataArrays. - base_dims: Base dimensions of each slice (e.g., ['original_period'] or ['original_time']). + base_dims: Base dimensions of each slice (e.g., ['original_cluster'] or ['original_time']). periods: List of period labels ([None] if no periods dimension). scenarios: List of scenario labels ([None] if no scenarios dimension). name: Name for the resulting DataArray. @@ -1085,7 +1215,7 @@ def expand_solution(self) -> FlowSystem: disaggregates the FlowSystem by: 1. Expanding all time series data from typical clusters to full timesteps 2. Expanding the solution by mapping each typical cluster back to all - original segments it represents + original clusters it represents For FlowSystems with periods and/or scenarios, each (period, scenario) combination is expanded using its own cluster assignment. @@ -1121,7 +1251,7 @@ def expand_solution(self) -> FlowSystem: Note: The expanded FlowSystem repeats the typical cluster values for all - segments belonging to the same cluster. Both input data and solution + original clusters belonging to the same cluster. Both input data and solution are consistently expanded, so they match. This is an approximation - the actual dispatch at full resolution would differ due to intra-cluster variations in time series data. 
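# Illustrative end-to-end sketch (``flow_system``, ``solver`` and the flow label are
# placeholders; parameter values are examples, not recommendations):
#
#     fs_clustered = flow_system.transform.cluster(
#         n_clusters=8,
#         cluster_duration='1D',
#         cluster_method='hierarchical',
#         extreme_period_method='append',
#         rescale_cluster_periods=True,
#     )
#     fs_clustered.optimize(solver)
#     fs_expanded = fs_clustered.transform.expand_solution()
#     fs_expanded.solution['Boiler(Q_th)|flow_rate']  # back on original timesteps (+1 extra step)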
@@ -1162,18 +1292,44 @@ def expand_solution(self) -> FlowSystem: scenarios = list(self._fs.scenarios) if has_scenarios else [None] n_original_timesteps = len(original_timesteps) n_reduced_timesteps = n_clusters * timesteps_per_cluster + n_original_clusters = cluster_structure.n_original_clusters # Expand function using ClusterResult.expand_data() - handles multi-dimensional cases - def expand_da(da: xr.DataArray) -> xr.DataArray: + # For charge_state with cluster dim, also includes the extra timestep + # Clamp to valid bounds to handle partial clusters at the end + last_original_cluster_idx = min( + (n_original_timesteps - 1) // timesteps_per_cluster, + n_original_clusters - 1, + ) + + def expand_da(da: xr.DataArray, var_name: str = '') -> xr.DataArray: if 'time' not in da.dims: return da.copy() - return info.result.expand_data(da, original_time=original_timesteps) + expanded = info.result.expand_data(da, original_time=original_timesteps) + + # For charge_state with cluster dim, append the extra timestep value + if var_name.endswith('|charge_state') and 'cluster' in da.dims: + # Get extra timestep from last cluster using vectorized selection + cluster_order = cluster_structure.cluster_order # (n_original_clusters,) or with period/scenario + if cluster_order.ndim == 1: + last_cluster = int(cluster_order[last_original_cluster_idx]) + extra_val = da.isel(cluster=last_cluster, time=-1) + else: + # Multi-dimensional: select last cluster for each period/scenario slice + last_clusters = cluster_order.isel(original_cluster=last_original_cluster_idx) + extra_val = da.isel(cluster=last_clusters, time=-1) + # Drop 'cluster'/'time' coords created by isel (kept as non-dim coords) + extra_val = extra_val.drop_vars(['cluster', 'time'], errors='ignore') + extra_val = extra_val.expand_dims(time=[original_timesteps_extra[-1]]) + expanded = xr.concat([expanded, extra_val], dim='time') + + return expanded # 1. Expand FlowSystem data (with cluster_weight set to 1.0 for all timesteps) reduced_ds = self._fs.to_dataset(include_solution=False) # Filter out cluster-related variables and copy attrs without clustering info data_vars = { - name: expand_da(da) + name: expand_da(da, name) for name, da in reduced_ds.data_vars.items() if name != 'cluster_weight' and not name.startswith('clustering|') } @@ -1201,17 +1357,22 @@ def expand_da(da: xr.DataArray) -> xr.DataArray: expanded_fs = FlowSystem.from_dataset(expanded_ds) # 2. Expand solution + # charge_state variables get their extra timestep via expand_da; others get NaN via reindex reduced_solution = self._fs.solution expanded_fs._solution = xr.Dataset( - {name: expand_da(da) for name, da in reduced_solution.data_vars.items()}, + {name: expand_da(da, name) for name, da in reduced_solution.data_vars.items()}, attrs=reduced_solution.attrs, ) + # Reindex to timesteps_extra for consistency with non-expanded FlowSystems + # (variables without extra timestep data will have NaN at the final timestep) + expanded_fs._solution = expanded_fs._solution.reindex(time=original_timesteps_extra) # 3. Combine charge_state with SOC_boundary for InterclusterStorageModel storages # For intercluster storages, charge_state is relative (ΔE) and can be negative. # Per Blanke et al. (2022) Eq. 9, actual SOC at time t in period d is: # SOC(t) = SOC_boundary[d] * (1 - loss)^t_within_period + charge_state(t) # where t_within_period is hours from period start (accounts for self-discharge decay). 
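# Numeric illustration of the formula above (hypothetical values): with SOC_boundary[d] = 100,
# relative_loss_per_hour = 0.01 and t_within_period = 5 h, the decay factor is 0.99**5 ≈ 0.951,
# so SOC(5) = 100 * 0.951 + charge_state(5) = 95.1 + charge_state(5).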
+ n_original_timesteps_extra = len(original_timesteps_extra) soc_boundary_vars = [name for name in reduced_solution.data_vars if name.endswith('|SOC_boundary')] for soc_boundary_name in soc_boundary_vars: storage_name = soc_boundary_name.rsplit('|', 1)[0] @@ -1222,30 +1383,42 @@ def expand_da(da: xr.DataArray) -> xr.DataArray: soc_boundary = reduced_solution[soc_boundary_name] expanded_charge_state = expanded_fs._solution[charge_state_name] - # Map each original timestep to its original period index - original_period_indices = np.arange(n_original_timesteps) // timesteps_per_cluster + # Map each original timestep (including extra) to its original period index + # The extra timestep belongs to the last period + original_cluster_indices = np.minimum( + np.arange(n_original_timesteps_extra) // timesteps_per_cluster, + n_original_clusters - 1, + ) # Select SOC_boundary for each timestep (boundary[d] for period d) - # SOC_boundary has dim 'cluster_boundary', we select indices 0..n_original_periods-1 + # SOC_boundary has dim 'cluster_boundary', we select indices 0..n_original_clusters-1 soc_boundary_per_timestep = soc_boundary.isel( - cluster_boundary=xr.DataArray(original_period_indices, dims=['time']) + cluster_boundary=xr.DataArray(original_cluster_indices, dims=['time']) ) - soc_boundary_per_timestep = soc_boundary_per_timestep.assign_coords(time=original_timesteps) + soc_boundary_per_timestep = soc_boundary_per_timestep.assign_coords(time=original_timesteps_extra) # Apply self-discharge decay to SOC_boundary based on time within period # Get the storage's relative_loss_per_hour from the clustered flow system storage = self._fs.storages.get(storage_name) if storage is not None: # Time within period for each timestep (0, 1, 2, ..., timesteps_per_cluster-1, 0, 1, ...) 
- time_within_period = np.arange(n_original_timesteps) % timesteps_per_cluster + # The extra timestep is at index timesteps_per_cluster (one past the last within-cluster index) + time_within_period = np.arange(n_original_timesteps_extra) % timesteps_per_cluster + # The extra timestep gets the correct decay (timesteps_per_cluster) + time_within_period[-1] = timesteps_per_cluster time_within_period_da = xr.DataArray( - time_within_period, dims=['time'], coords={'time': original_timesteps} + time_within_period, dims=['time'], coords={'time': original_timesteps_extra} ) # Decay factor: (1 - loss)^t, using mean loss over time - # Keep as DataArray to respect per-period/scenario values loss_value = storage.relative_loss_per_hour.mean('time') if (loss_value > 0).any(): decay_da = (1 - loss_value) ** time_within_period_da + if 'cluster' in decay_da.dims: + # Map each timestep to its cluster's decay value + cluster_per_timestep = cluster_structure.cluster_order.values[original_cluster_indices] + decay_da = decay_da.isel(cluster=xr.DataArray(cluster_per_timestep, dims=['time'])).drop_vars( + 'cluster', errors='ignore' + ) soc_boundary_per_timestep = soc_boundary_per_timestep * decay_da # Combine: actual_SOC = SOC_boundary * decay + charge_state @@ -1254,15 +1427,22 @@ def expand_da(da: xr.DataArray) -> xr.DataArray: combined_charge_state = (expanded_charge_state + soc_boundary_per_timestep).clip(min=0) expanded_fs._solution[charge_state_name] = combined_charge_state.assign_attrs(expanded_charge_state.attrs) + # Remove SOC_boundary variables - they're cluster-specific and now incorporated into charge_state + for soc_boundary_name in soc_boundary_vars: + if soc_boundary_name in expanded_fs._solution: + del expanded_fs._solution[soc_boundary_name] + # Also drop the cluster_boundary coordinate (orphaned after removing SOC_boundary) + if 'cluster_boundary' in expanded_fs._solution.coords: + expanded_fs._solution = expanded_fs._solution.drop_vars('cluster_boundary') + n_combinations = len(periods) * len(scenarios) - n_original_segments = cluster_structure.n_original_periods logger.info( f'Expanded FlowSystem from {n_reduced_timesteps} to {n_original_timesteps} timesteps ' f'({n_clusters} clusters' + ( f', {n_combinations} period/scenario combinations)' if n_combinations > 1 - else f' → {n_original_segments} original segments)' + else f' → {n_original_clusters} original clusters)' ) ) diff --git a/tests/test_cluster_reduce_expand.py b/tests/test_cluster_reduce_expand.py index 7072fe22e..4059470ee 100644 --- a/tests/test_cluster_reduce_expand.py +++ b/tests/test_cluster_reduce_expand.py @@ -167,7 +167,7 @@ def test_expand_solution_enables_statistics_accessor(solver_fixture, timesteps_8 # These should work without errors flow_rates = fs_expanded.statistics.flow_rates assert 'Boiler(Q_th)' in flow_rates - assert len(flow_rates['Boiler(Q_th)'].coords['time']) == 192 + assert len(flow_rates['Boiler(Q_th)'].coords['time']) == 193 # 192 + 1 extra timestep flow_hours = fs_expanded.statistics.flow_hours assert 'Boiler(Q_th)' in flow_hours @@ -321,7 +321,7 @@ def test_cluster_and_expand_with_scenarios(solver_fixture, timesteps_8_days, sce flow_var = 'Boiler(Q_th)|flow_rate' assert flow_var in fs_expanded.solution assert 'scenario' in fs_expanded.solution[flow_var].dims - assert len(fs_expanded.solution[flow_var].coords['time']) == 192 + assert len(fs_expanded.solution[flow_var].coords['time']) == 193 # 192 + 1 extra timestep def test_expand_solution_maps_scenarios_independently(solver_fixture, timesteps_8_days, 
scenarios_2): @@ -449,9 +449,9 @@ def test_storage_cluster_mode_intercluster(self, solver_fixture, timesteps_8_day soc_boundary = fs_clustered.solution['Battery|SOC_boundary'] assert 'cluster_boundary' in soc_boundary.dims - # Number of boundaries = n_original_periods + 1 - n_original_periods = fs_clustered.clustering.result.cluster_structure.n_original_periods - assert soc_boundary.sizes['cluster_boundary'] == n_original_periods + 1 + # Number of boundaries = n_original_clusters + 1 + n_original_clusters = fs_clustered.clustering.result.cluster_structure.n_original_clusters + assert soc_boundary.sizes['cluster_boundary'] == n_original_clusters + 1 def test_storage_cluster_mode_intercluster_cyclic(self, solver_fixture, timesteps_8_days): """Storage with cluster_mode='intercluster_cyclic' - linked with yearly cycling.""" @@ -693,7 +693,7 @@ def test_expand_solution_with_periods(self, solver_fixture, timesteps_8_days, pe # Solution should have period dimension flow_var = 'Boiler(Q_th)|flow_rate' assert 'period' in fs_expanded.solution[flow_var].dims - assert len(fs_expanded.solution[flow_var].coords['time']) == 192 + assert len(fs_expanded.solution[flow_var].coords['time']) == 193 # 192 + 1 extra timestep def test_cluster_with_periods_and_scenarios(self, solver_fixture, timesteps_8_days, periods_2, scenarios_2): """Clustering should work with both periods and scenarios.""" @@ -719,7 +719,7 @@ def test_cluster_with_periods_and_scenarios(self, solver_fixture, timesteps_8_da fs_expanded = fs_clustered.transform.expand_solution() assert 'period' in fs_expanded.solution[flow_var].dims assert 'scenario' in fs_expanded.solution[flow_var].dims - assert len(fs_expanded.solution[flow_var].coords['time']) == 192 + assert len(fs_expanded.solution[flow_var].coords['time']) == 193 # 192 + 1 extra timestep # ==================== Peak Selection Tests ==================== diff --git a/tests/test_clustering/test_base.py b/tests/test_clustering/test_base.py index 9c63f25f6..9cca4de81 100644 --- a/tests/test_clustering/test_base.py +++ b/tests/test_clustering/test_base.py @@ -17,7 +17,7 @@ class TestClusterStructure: def test_basic_creation(self): """Test basic ClusterStructure creation.""" - cluster_order = xr.DataArray([0, 1, 0, 1, 2, 0], dims=['original_period']) + cluster_order = xr.DataArray([0, 1, 0, 1, 2, 0], dims=['original_cluster']) cluster_occurrences = xr.DataArray([3, 2, 1], dims=['cluster']) structure = ClusterStructure( @@ -29,7 +29,7 @@ def test_basic_creation(self): assert structure.n_clusters == 3 assert structure.timesteps_per_cluster == 24 - assert structure.n_original_periods == 6 + assert structure.n_original_clusters == 6 def test_creation_from_numpy(self): """Test ClusterStructure creation from numpy arrays.""" @@ -42,12 +42,12 @@ def test_creation_from_numpy(self): assert isinstance(structure.cluster_order, xr.DataArray) assert isinstance(structure.cluster_occurrences, xr.DataArray) - assert structure.n_original_periods == 5 + assert structure.n_original_clusters == 5 def test_get_cluster_weight_per_timestep(self): """Test weight calculation per timestep.""" structure = ClusterStructure( - cluster_order=xr.DataArray([0, 1, 0], dims=['original_period']), + cluster_order=xr.DataArray([0, 1, 0], dims=['original_cluster']), cluster_occurrences=xr.DataArray([2, 1], dims=['cluster']), n_clusters=2, timesteps_per_cluster=4, @@ -136,7 +136,7 @@ def test_basic_creation(self): structure = create_cluster_structure_from_mapping(mapping, timesteps_per_cluster=4) assert 
structure.timesteps_per_cluster == 4 - assert structure.n_original_periods == 3 + assert structure.n_original_clusters == 3 class TestClustering: diff --git a/tests/test_clustering/test_integration.py b/tests/test_clustering/test_integration.py index 2d04a51c1..16c638c95 100644 --- a/tests/test_clustering/test_integration.py +++ b/tests/test_clustering/test_integration.py @@ -170,6 +170,104 @@ def test_cluster_reduces_timesteps(self): assert len(fs_clustered.timesteps) * len(fs_clustered.clusters) == 48 +class TestClusterAdvancedOptions: + """Tests for advanced clustering options.""" + + @pytest.fixture + def basic_flow_system(self): + """Create a basic FlowSystem for testing.""" + pytest.importorskip('tsam') + from flixopt import Bus, Flow, Sink, Source + from flixopt.core import TimeSeriesData + + n_hours = 168 # 7 days + fs = FlowSystem(timesteps=pd.date_range('2024-01-01', periods=n_hours, freq='h')) + + demand_data = np.sin(np.linspace(0, 14 * np.pi, n_hours)) + 2 + bus = Bus('electricity') + grid_flow = Flow('grid_in', bus='electricity', size=100) + demand_flow = Flow( + 'demand_out', bus='electricity', size=100, fixed_relative_profile=TimeSeriesData(demand_data / 100) + ) + source = Source('grid', outputs=[grid_flow]) + sink = Sink('demand', inputs=[demand_flow]) + fs.add_elements(source, sink, bus) + return fs + + def test_cluster_method_parameter(self, basic_flow_system): + """Test that cluster_method parameter works.""" + fs_clustered = basic_flow_system.transform.cluster( + n_clusters=2, cluster_duration='1D', cluster_method='hierarchical' + ) + assert len(fs_clustered.clusters) == 2 + + def test_hierarchical_is_deterministic(self, basic_flow_system): + """Test that hierarchical clustering (default) produces deterministic results.""" + fs1 = basic_flow_system.transform.cluster(n_clusters=2, cluster_duration='1D') + fs2 = basic_flow_system.transform.cluster(n_clusters=2, cluster_duration='1D') + + # Hierarchical clustering should produce identical cluster orders + xr.testing.assert_equal(fs1.clustering.cluster_order, fs2.clustering.cluster_order) + + def test_metrics_available(self, basic_flow_system): + """Test that clustering metrics are available after clustering.""" + fs_clustered = basic_flow_system.transform.cluster(n_clusters=2, cluster_duration='1D') + + assert fs_clustered.clustering.metrics is not None + assert isinstance(fs_clustered.clustering.metrics, xr.Dataset) + assert 'time_series' in fs_clustered.clustering.metrics.dims + assert len(fs_clustered.clustering.metrics.data_vars) > 0 + + def test_representation_method_parameter(self, basic_flow_system): + """Test that representation_method parameter works.""" + fs_clustered = basic_flow_system.transform.cluster( + n_clusters=2, cluster_duration='1D', representation_method='medoidRepresentation' + ) + assert len(fs_clustered.clusters) == 2 + + def test_rescale_cluster_periods_parameter(self, basic_flow_system): + """Test that rescale_cluster_periods parameter works.""" + fs_clustered = basic_flow_system.transform.cluster( + n_clusters=2, cluster_duration='1D', rescale_cluster_periods=False + ) + assert len(fs_clustered.clusters) == 2 + + def test_tsam_kwargs_passthrough(self, basic_flow_system): + """Test that additional kwargs are passed to tsam.""" + # sameMean is a valid tsam parameter + fs_clustered = basic_flow_system.transform.cluster(n_clusters=2, cluster_duration='1D', sameMean=True) + assert len(fs_clustered.clusters) == 2 + + def test_metrics_with_periods(self): + """Test that metrics have period dimension 
for multi-period FlowSystems.""" + pytest.importorskip('tsam') + from flixopt import Bus, Flow, Sink, Source + from flixopt.core import TimeSeriesData + + n_hours = 168 # 7 days + fs = FlowSystem( + timesteps=pd.date_range('2024-01-01', periods=n_hours, freq='h'), + periods=pd.Index([2025, 2030], name='period'), + ) + + demand_data = np.sin(np.linspace(0, 14 * np.pi, n_hours)) + 2 + bus = Bus('electricity') + grid_flow = Flow('grid_in', bus='electricity', size=100) + demand_flow = Flow( + 'demand_out', bus='electricity', size=100, fixed_relative_profile=TimeSeriesData(demand_data / 100) + ) + source = Source('grid', outputs=[grid_flow]) + sink = Sink('demand', inputs=[demand_flow]) + fs.add_elements(source, sink, bus) + + fs_clustered = fs.transform.cluster(n_clusters=2, cluster_duration='1D') + + # Metrics should have period dimension + assert fs_clustered.clustering.metrics is not None + assert 'period' in fs_clustered.clustering.metrics.dims + assert len(fs_clustered.clustering.metrics.period) == 2 + + class TestClusteringModuleImports: """Tests for flixopt.clustering module imports."""