Codecarbon to file #866
Conversation
bruAristimunha left a comment:
Small adjustments: please fill in the what's new, and can you also adjust the tutorial?

To make the tutorial run within the documentation, you will need to rename the file so that its name begins with `plot` or `tutorial`.
I've written a tutorial that can be included in the MOABB documentation. Currently it clones the feature branch of this PR and installs MOABB in development mode. The tutorial can be finalized and added in a separate PR once the features in this PR are available on the MOABB PyPI release. Tutorial: https://colab.research.google.com/drive/1XfHtbDqtEIcS4SlBNy_4cYU13eI4PPud?usp=sharing
This commit enables fine-grained control over CodeCarbon emissions tracking through the new `codecarbon_config` parameter in the `benchmark()` function.

Changes:
- Added a `codecarbon_config` parameter to `benchmark()` with sensible defaults (`save_to_file=False`, `log_level='error'` for minimal overhead)
- The parameter is propagated to all evaluation types (WithinSession, CrossSession, CrossSubject)
- Comprehensive documentation with examples of the available options
- Updated example files to demonstrate CodeCarbon configuration

Features enabled:
- 28+ configurable CodeCarbon parameters
- Multiple output backends (CSV, API, Prometheus, Logfire)
- GPU power tracking (`gpu_ids` parameter)
- Real-time carbon intensity data (Electricity Maps API)
- Environment variable and config file support
- Process-level or machine-level tracking modes

The implementation maintains full backward compatibility: existing code continues to work without any changes, and the default configuration minimizes overhead while still enabling emissions tracking.

Examples added:
- `plot_example_codecarbon.py` (renamed from `example_codecarbon.py`) with 7 different configuration scenarios
- `plot_benchmark.py` with CodeCarbon documentation
- `plot_benchmark_grid_search.py` with GridSearch-specific notes

Tutorial verified: emissions tracking produces accurate CSV output with energy consumption, power, and CO2 emissions data.
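As a rough sketch of how such a configuration might be assembled: the `codecarbon_config` parameter name comes from this PR, and the option keys below (`save_to_file`, `output_dir`, `log_level`, `tracking_mode`) are standard CodeCarbon parameters, but exactly which ones MOABB forwards is an assumption here, as is the commented-out `benchmark()` call.

```python
# Hypothetical CodeCarbon configuration for a MOABB benchmark run.
# Keys are standard CodeCarbon parameters; whether MOABB forwards each
# of them via codecarbon_config is an assumption.
codecarbon_config = {
    "save_to_file": True,         # write an emissions CSV instead of the silent default
    "output_dir": "./emissions",  # where CodeCarbon writes its CSV output
    "log_level": "error",         # keep CodeCarbon quiet during the benchmark
    "tracking_mode": "process",   # profile this process only, not the whole machine
}

# The actual call would then look roughly like:
# from moabb import benchmark
# results = benchmark(
#     pipelines="./pipelines",
#     evaluations=["WithinSession"],
#     codecarbon_config=codecarbon_config,
# )
```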
Significantly improve the CodeCarbon visualization capabilities by adding support for detailed emissions analysis and efficiency metrics.

Changes:
- Enhanced `codecarbon_plot()` with optional multi-subplot analysis:
  * `include_efficiency`: shows the energy efficiency ranking (accuracy/kg CO2)
  * `include_power_vs_score`: shows the accuracy vs emissions trade-off
  * Maintains backward compatibility (default behavior unchanged)
- New `emissions_summary()` function provides detailed metrics:
  * Average and total CO2 emissions per pipeline
  * Energy efficiency metrics (score per kg CO2)
  * Emissions per evaluation and standard deviations
  * Useful for identifying optimal sustainable pipelines
- Improved plotting implementation:
  * Better subplot handling for multiple visualizations
  * Enhanced legends, grids, and annotations
  * Support for an optional efficiency ranking visualization
  * Support for a Pareto frontier visualization (accuracy vs emissions)
- Updated example file to demonstrate the new features:
  * Basic emissions visualization
  * Efficiency analysis example
  * Full multi-plot analysis example
  * Emissions summary report generation

The improvements enable users to:
1. Visualize CO2 emissions across datasets and pipelines
2. Identify the most efficient pipelines (best accuracy per kg CO2)
3. Analyze performance-sustainability trade-offs
4. Generate detailed emissions reports for decision-making

All changes maintain full backward compatibility.
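The summary metrics described above can be sketched with pandas. The column names (`pipeline`, `score`, `carbon_emission`) and the toy values are assumptions for illustration, not the exact schema `emissions_summary()` consumes.

```python
import pandas as pd

# Toy results table; real MOABB results have more columns, and these
# column names are assumptions for illustration.
results = pd.DataFrame({
    "pipeline": ["CSP+LDA", "CSP+LDA", "TS+LR", "TS+LR"],
    "score":           [0.80, 0.84, 0.90, 0.88],
    "carbon_emission": [0.002, 0.003, 0.010, 0.012],  # kg CO2 per evaluation
})

# Per-pipeline metrics in the spirit of emissions_summary():
summary = results.groupby("pipeline").agg(
    mean_score=("score", "mean"),
    total_co2=("carbon_emission", "sum"),
    mean_co2=("carbon_emission", "mean"),
    std_co2=("carbon_emission", "std"),
)
# Energy efficiency: accuracy obtained per kg of CO2 emitted.
summary["efficiency"] = summary["mean_score"] / summary["mean_co2"]
print(summary.sort_values("efficiency", ascending=False))
```

Here the slightly less accurate pipeline comes out far more efficient per kg of CO2, which is exactly the trade-off the new plots are meant to surface.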
Update the example to comprehensively demonstrate all three CodeCarbon visualization modes and the detailed emissions analysis capabilities.

Changes:
- Reorganized the plotting section into three distinct visualization modes:
  * Mode 1 (Default): basic CO2 emissions per dataset/algorithm
  * Mode 2 (Efficiency): adds energy efficiency analysis (score/kg CO2)
  * Mode 3 (Complete): full analysis with Pareto frontier visualization
- Enhanced emissions summary section with:
  * Detailed explanation of each metric
  * Sustainability rankings (efficiency, emissions, accuracy)
  * Recommendations for choosing sustainable pipelines
  * Comparison of trade-offs between pipelines
- Added comprehensive documentation:
  * Explanation of the efficiency metric (accuracy per kg CO2)
  * The Pareto frontier concept and optimization
  * Use cases for each visualization mode
  * Practical analysis and interpretation

The tutorial now demonstrates:
1. Creating three different visualizations
2. Generating detailed emissions summary reports
3. Identifying the most efficient and sustainable pipelines
4. Understanding performance-sustainability trade-offs
5. Making data-driven decisions about pipeline selection

Example output includes rankings and recommendations to help users choose pipelines that balance accuracy and environmental impact.
Remove all print statements from `plot_example_codecarbon.py` and replace the Emissions Summary Report section with plot-based visualizations that better demonstrate the data analysis capabilities.

Changes:
- Removed console output (print statements) from the tutorial
- Added a 4-subplot summary visualization showing:
  - Pipeline efficiency rankings (colored bar chart)
  - Average emissions comparison per pipeline
  - Accuracy performance with variability error bars
  - Total emissions summary
- Added a Pareto frontier visualization for the accuracy vs emissions trade-off
- Demonstrates optimal decision-making for pipeline selection

Bug fixes in `codecarbon_plot`:
- Fixed axes indexing when creating multiple subplots (`n_plots > 1`)
- Removed the incorrect list wrapping of the numpy axes array
- Fixed `unique_pipelines` to always be a list for consistent `.index()` calls
- Converted the numpy array to a list to prevent an AttributeError

The tutorial now focuses entirely on visualizations rather than console output, making it clearer and more visually informative for users.
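The two bugs fixed here are common matplotlib/numpy pitfalls, sketched below without the actual plotting code (the `normalize_axes` helper and the pipeline names are illustrative, not from the PR):

```python
import numpy as np

# Bug 1: plt.subplots(1, n) returns a numpy array of Axes when n > 1,
# but a bare Axes object when n == 1. Wrapping the array in another
# list ([axes]) breaks axes[i] indexing; np.atleast_1d handles both
# cases and always yields a flat, indexable array.
def normalize_axes(axes):
    return np.atleast_1d(axes)

# Bug 2: numpy arrays have no .index() method, so a lookup such as
# unique_pipelines.index(name) raises AttributeError unless the array
# is converted to a plain list first.
unique_pipelines = np.array(["CSP+LDA", "TS+LR", "Cov+MDM"])
unique_pipelines = list(unique_pipelines)  # restores .index() support
position = unique_pipelines.index("TS+LR")
```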
Remove the deprecated 'penalty' parameter from LogisticRegression classifiers and use the 'l1_ratio' parameter instead, following scikit-learn 1.8+ guidance:
- penalty='l1' → l1_ratio=1.0 with solver='saga'
- penalty='elasticnet' → removed, keeping only l1_ratio (0 < l1_ratio < 1)
- Updated Python examples and YAML pipeline configurations
- Resolves the FutureWarning about penalty parameter removal in scikit-learn 1.10
- Remove the invalid 'disable_rapl' parameter, which doesn't exist in codecarbon>=2.1.4; process-level tracking already avoids RAPL permission issues
- Optimize learning curve evaluation by reusing the EmissionsTracker per session instead of creating new instances for each iteration (major speedup)
- Move tracker initialization from the inner loop to the session level for efficiency
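The tracker-reuse restructuring can be sketched with a stand-in `Tracker` class in place of codecarbon's real `EmissionsTracker` (whose construction is the expensive step being hoisted out of the inner loop); `run_learning_curve` and its arguments are illustrative, not MOABB code:

```python
# Stand-in for codecarbon's EmissionsTracker; counts constructions
# to make the saving visible.
class Tracker:
    instances = 0

    def __init__(self):
        Tracker.instances += 1  # construction is the expensive part

    def start(self):
        pass

    def stop(self):
        return 0.0  # the real tracker returns emissions in kg CO2


def run_learning_curve(sessions, data_fractions):
    # Before: Tracker() was created inside the inner loop, once per
    # (session, fraction) pair. After: one tracker per session, reused.
    for session in sessions:
        tracker = Tracker()           # session-level initialization
        for frac in data_fractions:   # inner loop only starts/stops it
            tracker.start()
            # ... fit and score the pipeline on `frac` of the data ...
            tracker.stop()


run_learning_curve(sessions=["S1", "S2"],
                   data_fractions=[0.2, 0.4, 0.6, 0.8, 1.0])
# Two tracker instances instead of ten under the old per-iteration scheme.
```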
Re-add penalty='elasticnet' for ElasticNet configurations where 0 < l1_ratio < 1.

The previous deprecation fix incorrectly removed the penalty parameter while keeping l1_ratio values between 0 and 1. Without penalty='elasticnet', scikit-learn defaults to penalty='l2' and ignores l1_ratio, causing a UserWarning.

Updated in:
- Python pipeline examples with l1_ratio 0.70 and 0.75
- YAML pipeline configurations for the ElasticNet grid search
- Documentation in whats_new.rst

Fixes the issue where the l1_ratio parameter was ignored due to the missing penalty parameter.
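A minimal scikit-learn example of the corrected configuration (the toy data is illustrative; the 0.75 mixing ratio matches one of the pipeline configs mentioned above):

```python
from sklearn.linear_model import LogisticRegression

# Elastic-net logistic regression: l1_ratio is only honored when
# penalty='elasticnet' is set explicitly, and only the saga solver
# supports it. Without penalty='elasticnet', scikit-learn falls back
# to L2 regularization and silently ignores l1_ratio (with a UserWarning).
clf = LogisticRegression(
    penalty="elasticnet",
    solver="saga",
    l1_ratio=0.75,   # 75% L1 / 25% L2 mix, as in the pipeline configs
    max_iter=1000,
)

# Tiny separable toy problem, just to show the estimator fits cleanly.
X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
clf.fit(X, y)
```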
Force-pushed from 76b3886 to 5ba0dd1
Implement two complementary parallelization strategies:

1. Sphinx parallel build (`-j auto` flag):
   - Add `-j auto` to SPHINXOPTS in the Makefile
   - Parallelizes the write phase of Sphinx builds
   - Uses the automatic CPU count for optimal performance
   - Expected ~40% speedup on multi-core systems for large projects

2. Sphinx-Gallery parallel example execution:
   - Add `parallel: True` to `sphinx_gallery_conf` in conf.py
   - Enables parallel execution of tutorial/example scripts
   - Examples are processed concurrently during gallery generation

Both optimizations are transparent to users; no content changes are required. Documentation builds will now use all available CPU cores for faster generation.

Addresses performance concerns about documentation generation time.
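The conf.py side of this might look as follows; the `examples_dirs`/`gallery_dirs` paths are assumed, and note that Sphinx-Gallery's `parallel` option requires joblib (the Makefile change is just `SPHINXOPTS ?= -j auto`):

```python
# docs/conf.py (sketch): enable parallel example execution in
# Sphinx-Gallery. `parallel` accepts True (use all cores) or an
# integer worker count. The directory paths below are assumptions
# about the MOABB docs layout.
sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # assumed source of example scripts
    "gallery_dirs": "auto_examples",  # assumed output directory
    "parallel": True,                 # run example scripts concurrently
}

# docs/Makefile (sketch), for the Sphinx write-phase parallelism:
# SPHINXOPTS ?= -j auto
```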
Updated author email for Ethan Davis in the plot example. Signed-off-by: Ethan Davis <89031823+davisethan@users.noreply.github.com>
- Handle corrupted partial extractions in `_correct_path` by removing the target directory before renaming
- Add a `shutil` import for cleanup operations
- Update the CI cache key to use `github.run_id` to prevent stale cache reuse
- Add a cleanup step to remove incomplete dataset extractions
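The cleanup-before-rename pattern can be sketched as below; `_correct_path` is internal to MOABB, so this reproduces the described pattern under an illustrative name rather than the exact function:

```python
import shutil
from pathlib import Path

def correct_path(extracted: Path, target: Path) -> Path:
    """Move a freshly extracted dataset directory into place.

    If an earlier download was interrupted, a partial extraction may
    already occupy the target path, making the rename fail or leave
    corrupt data behind. Removing it first makes the move idempotent.
    """
    if target.exists():
        shutil.rmtree(target)   # drop the corrupted partial extraction
    extracted.rename(target)    # move the fresh extraction into place
    return target
```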
Many thanks for the contribution here @davisethan 🙏🏽
@bruAristimunha Happy to get this data tracking started, it was fun, and thank you for the plotting, tutorials, and last-leg repo-specific logistics 🚀
Problem: MOABB prevents detailed CodeCarbon compute profiling metrics from being saved to file.
Proposed Solution: Make CodeCarbon fully configurable outside the script level, e.g. via environment variables and configuration files [1][2]. When CodeCarbon is installed, the MOABB tabular results gain an additional column, `codecarbon_task_name`, a unique UUID4 that can be joined with the related rows of the CodeCarbon tabular results via their `task_name` column. CodeCarbon writes multiple files, so the programmer must combine the relevant CodeCarbon tables before joining them with the MOABB tables to see detailed compute profiling metrics per cross-validation.

Google Colab: https://colab.research.google.com/drive/1YOUe47Easrj-FVbVrpsLfMmHsmcQGE2_?usp=sharing
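The join described above can be sketched with pandas. The key columns (`codecarbon_task_name` on the MOABB side, `task_name` on the CodeCarbon side) follow the description; the remaining columns and values are illustrative.

```python
import pandas as pd

# Toy MOABB results: one row per cross-validation, tagged with the
# CodeCarbon task UUID. Columns other than the keys are illustrative.
moabb_results = pd.DataFrame({
    "pipeline": ["CSP+LDA", "TS+LR"],
    "score": [0.82, 0.89],
    "codecarbon_task_name": ["uuid-aaa", "uuid-bbb"],
})

# Toy CodeCarbon output (in practice, concatenate the relevant
# CodeCarbon CSVs with pd.concat before joining).
codecarbon_log = pd.DataFrame({
    "task_name": ["uuid-aaa", "uuid-bbb"],
    "emissions": [0.002, 0.011],    # kg CO2
    "cpu_energy": [0.004, 0.019],   # kWh
})

# Join MOABB scores with per-task compute profiling metrics.
merged = moabb_results.merge(
    codecarbon_log, left_on="codecarbon_task_name", right_on="task_name"
)
```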
Additional Changes: Python's `time.time` is wall-clock time, possibly relying on NTP server synchronization, and is unreliable for tracking benchmark duration [3]. A better alternative for benchmarking is to use a performance counter: `time.perf_counter` [4].

Footnotes
1. https://mlco2.github.io/codecarbon/usage.html#configuration
2. https://mlco2.github.io/codecarbon/parameters.html
3. https://peps.python.org/pep-0418/#time-time
4. https://peps.python.org/pep-0418/#time-perf-counter
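The `time.perf_counter` substitution proposed in the Additional Changes is a one-line pattern, shown here with a stand-in workload:

```python
import time

# time.time() reports wall-clock time, which can jump when NTP adjusts
# the system clock, so intervals measured with it can be wrong.
# time.perf_counter() is a monotonic, high-resolution timer intended
# precisely for measuring elapsed durations.
start = time.perf_counter()
total = sum(i * i for i in range(100_000))  # stand-in for a benchmark run
elapsed = time.perf_counter() - start
print(f"benchmark took {elapsed:.6f} s")
```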