
Conversation

@davisethan
Contributor

@davisethan davisethan commented Jan 14, 2026

Problem: MOABB currently prevents CodeCarbon's detailed compute profiling metrics from being saved to a file.

Proposed Solution: Make CodeCarbon fully configurable outside the script level, e.g. via environment variables or configuration files [1][2]. When CodeCarbon is installed, the MOABB tabular results gain an additional column, codecarbon_task_name, a unique UUID4 that can be joined with the related rows of the CodeCarbon tabular results via their task_name column. CodeCarbon writes multiple files, so the programmer has to combine the relevant CodeCarbon tables and join them with the MOABB tables to see the detailed compute profiling metrics per cross-validation.
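As an illustration of that join, a minimal sketch with pandas, assuming illustrative file paths; the column names codecarbon_task_name and task_name are the ones described above:

```python
from glob import glob

import pandas as pd

# MOABB tabular results; they include codecarbon_task_name when CodeCarbon is installed
results = pd.read_csv("results.csv")  # illustrative path

# Combine the CodeCarbon output files before joining (paths are illustrative)
emissions = pd.concat(
    [pd.read_csv(path) for path in glob("codecarbon_output/*.csv")],
    ignore_index=True,
)

# Attach detailed compute profiling metrics to each cross-validation row
profiled = results.merge(
    emissions,
    left_on="codecarbon_task_name",
    right_on="task_name",
    how="left",
)
```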

Google Colab: https://colab.research.google.com/drive/1YOUe47Easrj-FVbVrpsLfMmHsmcQGE2_?usp=sharing

Additional Changes: Python's time.time is wall-clock time that may be adjusted by NTP server synchronization, so it is unreliable for tracking benchmark duration [3]. A better alternative for benchmarking is a performance counter: time.perf_counter [4].
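A minimal sketch of the difference (the benchmarked function is a placeholder):

```python
import time


def run_benchmark():
    # Placeholder for the benchmarked work
    sum(range(1_000_000))


# time.time is wall-clock time: the system clock may be adjusted (e.g. by NTP),
# so the difference of two readings is not a reliable duration
start = time.time()
run_benchmark()
wall_elapsed = time.time() - start

# time.perf_counter is a monotonic performance counter intended for durations
start = time.perf_counter()
run_benchmark()
perf_elapsed = time.perf_counter() - start
```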

Footnotes

  1. https://mlco2.github.io/codecarbon/usage.html#configuration

  2. https://mlco2.github.io/codecarbon/parameters.html

  3. https://peps.python.org/pep-0418/#time-time

  4. https://peps.python.org/pep-0418/#time-perf-counter

@davisethan davisethan mentioned this pull request Jan 14, 2026
Collaborator

@bruAristimunha bruAristimunha left a comment


Small adjustments: please fill in the what's new, and can you also adjust the tutorial?

@bruAristimunha
Collaborator

To make the tutorial run within the documentation, you will need to rename the file so that its name begins with plot or tutorial.

@davisethan
Contributor Author

This PR does not contain the changes made in #870. If both PRs are accepted, I can add the changes from #870 into this PR, or alternatively, I can resolve any merge conflicts that arise when both PRs are merged.

@davisethan
Contributor Author

I've written a tutorial that can be included in the MOABB documentation. Currently it clones the feature branch of this PR and installs MOABB in development mode. The tutorial can be finalized and added in a separate PR once the features in this PR are available in the MOABB PyPI release.

Tutorial: https://colab.research.google.com/drive/1XfHtbDqtEIcS4SlBNy_4cYU13eI4PPud?usp=sharing

davisethan and others added 11 commits January 16, 2026 08:15
This commit enables fine-grained control over CodeCarbon emissions tracking
through the new codecarbon_config parameter in the benchmark() function.

Changes:
- Added codecarbon_config parameter to benchmark() with sensible defaults
  (save_to_file=False, log_level='error' for minimal overhead)
- Parameter is propagated to all evaluation types (WithinSession,
  CrossSession, CrossSubject)
- Comprehensive documentation with examples of available options
- Updated example files to demonstrate CodeCarbon configuration

Features enabled:
- 28+ configurable CodeCarbon parameters
- Multiple output backends (CSV, API, Prometheus, Logfire)
- GPU power tracking (gpu_ids parameter)
- Real-time carbon intensity data (Electricity Maps API)
- Environment variable and config file support
- Process-level or machine-level tracking modes

The implementation maintains full backward compatibility. Existing code
continues to work without any changes. Default configuration minimizes
overhead while still enabling emissions tracking.

Examples added:
- plot_example_codecarbon.py (renamed from example_codecarbon.py)
  with 7 different configuration scenarios
- plot_benchmark.py with CodeCarbon documentation
- plot_benchmark_grid_search.py with GridSearch-specific notes

Tutorial verified: Emissions tracking produces accurate CSV output with
energy consumption, power, and CO2 emissions data.
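A minimal usage sketch of the new parameter; the dictionary keys follow CodeCarbon's EmissionsTracker parameters, and the pipeline folder and evaluation names are illustrative:

```python
from moabb import benchmark

# Illustrative configuration; keys are forwarded to CodeCarbon's EmissionsTracker
codecarbon_config = {
    "save_to_file": True,         # write emissions CSV files
    "output_dir": "./emissions",  # where CodeCarbon writes its output
    "log_level": "error",         # keep tracker logging quiet
    "tracking_mode": "process",   # process-level rather than machine-level tracking
}

results = benchmark(
    pipelines="./pipelines",      # illustrative pipeline folder
    evaluations=["WithinSession"],
    codecarbon_config=codecarbon_config,
)
```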
Significantly improve the CodeCarbon visualization capabilities by adding
support for detailed emissions analysis and efficiency metrics.

Changes:
- Enhanced codecarbon_plot() with optional multi-subplot analysis
  * include_efficiency: Shows energy efficiency ranking (accuracy/kg CO2)
  * include_power_vs_score: Shows accuracy vs emissions trade-off
  * Maintains backward compatibility (default behavior unchanged)

- New emissions_summary() function provides detailed metrics:
  * Average and total CO2 emissions per pipeline
  * Energy efficiency metrics (score per kg CO2)
  * Emissions per evaluation and standard deviations
  * Useful for identifying optimal sustainable pipelines

- Improved plotting implementation:
  * Better subplot handling for multiple visualizations
  * Enhanced legends, grids, and annotations
  * Support for optional efficiency ranking visualization
  * Support for Pareto frontier visualization (accuracy vs emissions)

- Updated example file to demonstrate new features:
  * Basic emissions visualization
  * Efficiency analysis example
  * Full multi-plot analysis example
  * Emissions summary report generation

The improvements enable users to:
1. Visualize CO2 emissions across datasets and pipelines
2. Identify most efficient pipelines (best accuracy per kg CO2)
3. Analyze performance-sustainability trade-offs
4. Generate detailed emissions reports for decision-making

All changes maintain full backward compatibility.
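A hedged usage sketch of the new options, assuming results is the benchmark results DataFrame and that both functions live in moabb.analysis.plotting (the import path is an assumption):

```python
from moabb.analysis.plotting import codecarbon_plot, emissions_summary  # assumed path

# Default behaviour (unchanged): CO2 emissions per dataset and pipeline
codecarbon_plot(results)

# Optional multi-subplot analysis added here: efficiency ranking and the
# accuracy vs emissions trade-off
codecarbon_plot(results, include_efficiency=True, include_power_vs_score=True)

# Detailed per-pipeline metrics: average/total emissions and score per kg CO2
summary = emissions_summary(results)
print(summary)
```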
Update the example to comprehensively demonstrate all three CodeCarbon
visualization modes and detailed emissions analysis capabilities.

Changes:
- Reorganized plotting section into three distinct visualization modes:
  * Mode 1 (Default): Basic CO2 emissions per dataset/algorithm
  * Mode 2 (Efficiency): Add energy efficiency analysis (score/kg CO2)
  * Mode 3 (Complete): Full analysis with Pareto frontier visualization

- Enhanced emissions summary section with:
  * Detailed explanation of each metric
  * Sustainability rankings (efficiency, emissions, accuracy)
  * Recommendations for choosing sustainable pipelines
  * Comparison of trade-offs between pipelines

- Added comprehensive documentation:
  * Explanation of efficiency metric (accuracy per kg CO2)
  * Pareto frontier concept and optimization
  * Use cases for each visualization mode
  * Practical analysis and interpretation

Tutorial now demonstrates:
1. Creating three different visualizations
2. Generating detailed emissions summary reports
3. Identifying most efficient and sustainable pipelines
4. Understanding performance-sustainability trade-offs
5. Making data-driven decisions about pipeline selection

Example output includes rankings and recommendations to help users
choose pipelines that balance accuracy and environmental impact.
Remove all print statements from plot_example_codecarbon.py and replace
the Emissions Summary Report section with plot-based visualizations that
better demonstrate the data analysis capabilities.

Changes:
- Removed console output (print statements) from tutorial
- Added 4-subplot summary visualization showing:
  - Pipeline efficiency rankings (colored bar chart)
  - Average emissions comparison per pipeline
  - Accuracy performance with variability error bars
  - Total emissions summary
- Added Pareto frontier visualization for accuracy vs emissions trade-off
- Demonstrates optimal decision-making for pipeline selection

Bug fixes in codecarbon_plot:
- Fixed axes indexing when creating multiple subplots (n_plots > 1)
- Removed incorrect list wrapping of numpy axes array
- Fixed unique_pipelines to always be a list for consistent .index() calls
- Converted numpy array to list to prevent AttributeError

The tutorial now focuses entirely on visualizations rather than console
output, making it clearer and more visually informative for users.
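A minimal sketch of the axes-handling pattern behind the fix (not the actual MOABB code): plt.subplots returns a single Axes for one subplot and a numpy array otherwise, so the array should be flattened to a plain list rather than wrapped in another list.

```python
import matplotlib.pyplot as plt
import numpy as np

# Keep categorical lookups consistent: a list supports .index(), a numpy array does not
unique_pipelines = ["CSP+LDA", "TS+LR"]  # illustrative pipeline names

n_plots = len(unique_pipelines)
fig, axes = plt.subplots(1, n_plots, figsize=(5 * n_plots, 4))

# plt.subplots returns a bare Axes when n_plots == 1 and an ndarray otherwise;
# flatten to a plain list so axes[i] indexing works in both cases
axes = [axes] if n_plots == 1 else list(np.ravel(axes))

for ax, pipeline in zip(axes, unique_pipelines):
    ax.set_title(pipeline)
```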
Remove deprecated 'penalty' parameter from LogisticRegression classifiers
and use 'l1_ratio' parameter instead, following scikit-learn 1.8+ guidance:

- penalty='l1' → l1_ratio=1.0 with solver='saga'
- penalty='elasticnet' → removed, keeping only l1_ratio (0 < l1_ratio < 1)
- Updated Python examples and YAML pipeline configurations
- Resolves FutureWarning about penalty parameter removal in scikit-learn 1.10
- Remove invalid 'disable_rapl' parameter that doesn't exist in codecarbon>=2.1.4
  Process-level tracking already avoids RAPL permission issues
- Optimize learning curve evaluation by reusing EmissionsTracker per session
  instead of creating new instances for each iteration (major speedup)
- Move tracker initialization from inner loop to session level for efficiency
Re-add penalty='elasticnet' for ElasticNet configurations where 0 < l1_ratio < 1.
The previous deprecation fix incorrectly removed the penalty parameter while keeping
l1_ratio values between 0 and 1. Without penalty='elasticnet', scikit-learn defaults
to penalty='l2' and ignores l1_ratio, causing a UserWarning.

Updated in:
- Python pipeline examples with l1_ratio 0.70 and 0.75
- YAML pipeline configurations for ElasticNet gridsearch
- Documentation in whats_new.rst

Fixes issue where l1_ratio parameter was ignored due to missing penalty parameter.
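For reference, a minimal sketch of the corrected configuration (the l1_ratio value is one of those mentioned above; the rest of the pipeline is omitted):

```python
from sklearn.linear_model import LogisticRegression

# 0 < l1_ratio < 1 only takes effect together with penalty='elasticnet' and
# solver='saga'; without penalty='elasticnet', scikit-learn falls back to L2
# and ignores l1_ratio, raising a UserWarning
clf = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.75)
```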
bruAristimunha and others added 4 commits January 18, 2026 20:33
Implement two complementary parallelization strategies:

1. Sphinx parallel build (-j auto flag):
   - Add -j auto to SPHINXOPTS in Makefile
   - Parallelizes the write-phase of Sphinx builds
   - Uses automatic CPU count for optimal performance
   - Expected ~40% speedup on multi-core systems for large projects

2. Sphinx-Gallery parallel example execution:
   - Add parallel: True to sphinx_gallery_conf in conf.py
   - Enables parallel execution of tutorial/example scripts
   - Examples are processed concurrently during gallery generation

Both optimizations are transparent to users - no content changes required.
Documentation builds will now use all available CPU cores for faster generation.

Addresses performance concerns about documentation generation time.
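For illustration, the conf.py side of the change might look like the sketch below (the -j auto flag goes into SPHINXOPTS in the Makefile; the gallery keys other than parallel are placeholders for the project's existing settings):

```python
# docs/conf.py (illustrative excerpt)
sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # placeholder for the existing settings
    "gallery_dirs": "auto_examples",  # placeholder for the existing settings
    "parallel": True,                 # execute example scripts concurrently
}
```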
Updated author email for Ethan Davis in the plot example.

Signed-off-by: Ethan Davis <89031823+davisethan@users.noreply.github.com>
- Handle corrupted partial extractions in _correct_path by removing the target directory before renaming
- Add shutil import for cleanup operations
- Update CI cache key to use github.run_id to prevent stale cache reuse
- Add cleanup step to remove incomplete dataset extractions
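A minimal sketch of the cleanup pattern described above (illustrative, not MOABB's actual implementation):

```python
import shutil
from pathlib import Path


def _correct_path(extracted: str, target: str) -> str:
    """Move a freshly extracted dataset into place, discarding any
    corrupted partial extraction left behind by an earlier run."""
    target_path = Path(target)
    if target_path.exists():
        shutil.rmtree(target_path)  # remove the stale/partial directory first
    Path(extracted).rename(target_path)
    return str(target_path)
```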
@bruAristimunha bruAristimunha merged commit b9273fb into NeuroTechX:develop Jan 18, 2026
15 checks passed
@bruAristimunha
Collaborator

Many thanks for the contribution here @davisethan 🙏🏽

@davisethan
Contributor Author

@bruAristimunha Happy to get this data tracking started, it was fun, and thank you for the plotting, the tutorials, and the last-leg repo-specific logistics 🚀

@davisethan davisethan deleted the codecarbon-to-file branch January 19, 2026 16:47