@HyperPS HyperPS commented Nov 29, 2025

This PR adds validation for extremely large HDF5 dataset metadata to prevent
remote DoS via HDF5 shape bombs in .keras files.
Vuln (Reported on Huntr and GHSA)

Includes:

• Defensive size validation
• Rejection of hostile shapes and dimension overflows
• Recursion path fix for correct inner_path handling
• Avoids unbounded numpy allocation of multi-gigabyte tensors

Backward compatible with valid Keras weight files.
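
As a rough illustration of why shape metadata alone can be dangerous: the memory a reader tries to allocate depends on the declared shape and element size, not on how small the file is on disk. The helper name `implied_nbytes` and the example shape are hypothetical and are not taken from this PR.

```python
import math

def implied_nbytes(shape, itemsize):
    """Bytes a dense array with this declared shape would occupy in memory."""
    # math.prod(()) is 1, so a scalar dataset costs exactly one element.
    return math.prod(shape) * itemsize

# Hypothetical hostile metadata: a 2**20 x 2**20 dataset of 4-byte floats.
hostile = implied_nbytes((2**20, 2**20), itemsize=4)
print(hostile / 2**40, "TiB")  # 4.0 TiB
```

A loader that allocates first and validates later would request the full amount before reading any data.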

google-cla bot commented Nov 29, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Contributor

Summary of Changes

Hello @HyperPS, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical security vulnerability by enhancing the KerasFileEditor to robustly validate HDF5 dataset metadata. It introduces comprehensive checks for malformed shapes, excessively high tensor ranks, and large memory allocations, effectively preventing potential Denial of Service attacks that could arise from specially crafted .keras files. The update ensures safer loading of model weights without compromising backward compatibility.

Highlights

  • DoS Prevention: Fixes a Denial of Service (DoS) vulnerability related to malicious HDF5 dataset metadata in .keras files, specifically preventing 'HDF5 shape bombs'.
  • Metadata Validation: Implements defensive size validation, rejection of hostile shapes (e.g., negative dimensions, rank > 64), and checks for dimension overflows in HDF5 datasets.
  • Memory Allocation Control: Prevents unbounded NumPy allocation of multi-gigabyte tensors by enforcing a maximum memory size limit of 1 GiB for HDF5 datasets during loading.
  • Recursion Path Fix: Corrects the handling of inner_path during recursive calls within _extract_weights_from_store to ensure proper state management.
  • Backward Compatibility: Ensures that the implemented security measures are backward compatible with valid Keras weight files.
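
The checks listed in the highlights could look roughly like the sketch below. It is not the code from this PR: the function name `validate_dataset_metadata` and its parameters are hypothetical, and only the two limits (rank 64, 1 GiB) come from the summary above. Python integers do not overflow, so the byte count is computed exactly and then compared against the limit.

```python
import math
import operator

def validate_dataset_metadata(shape, itemsize, max_rank=64, max_bytes=1 << 30):
    """Validate declared shape metadata before any array is allocated.

    Returns the implied size in bytes, or raises ValueError.
    """
    if len(shape) > max_rank:
        raise ValueError(f"rank {len(shape)} exceeds limit {max_rank}")
    dims = []
    for dim in shape:
        try:
            dim = operator.index(dim)  # accepts int-like values, rejects floats
        except TypeError:
            raise ValueError(f"non-integer dimension: {dim!r}") from None
        if dim < 0:
            raise ValueError(f"negative dimension: {dim}")
        dims.append(dim)
    nbytes = math.prod(dims) * itemsize
    if nbytes > max_bytes:
        raise ValueError(f"dataset needs {nbytes} bytes, limit is {max_bytes}")
    return nbytes

print(validate_dataset_metadata((128, 256), itemsize=4))  # 131072
```

Under these assumptions, a caller would run the check on each dataset's shape and element size and convert the dataset to an array only if no exception is raised.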

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a critical Denial of Service vulnerability by introducing comprehensive validation for HDF5 dataset metadata before loading. The changes include checks for hostile shapes, dimension overflows, and excessive memory allocation, which successfully prevents 'shape bomb' attacks. Additionally, the refactoring of the _extract_weights_from_store method significantly improves code clarity and correctness, notably by fixing a bug in the recursive path handling. The overall implementation is robust and enhances the security of file loading operations.
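
The diff itself is not reproduced in this conversation, so the following is only a generic sketch of the recursive path handling the review refers to. Nested dictionaries stand in for HDF5 groups, and `collect_paths` is a hypothetical name. Each call receives its own path as an argument and never mutates shared state, so sibling branches cannot see each other's prefixes.

```python
def collect_paths(node, inner_path=""):
    """Flatten a nested mapping into {"a/b/c": leaf} entries."""
    result = {}
    for key, value in node.items():
        # Build the child's path locally; the parent's inner_path is unchanged.
        child_path = f"{inner_path}/{key}" if inner_path else key
        if isinstance(value, dict):
            result.update(collect_paths(value, child_path))
        else:
            result[child_path] = value
    return result

store = {
    "layers": {"dense": {"kernel": 1, "bias": 2}, "conv": {"kernel": 3}},
    "optimizer": 4,
}
print(sorted(collect_paths(store)))
# ['layers/conv/kernel', 'layers/dense/bias', 'layers/dense/kernel', 'optimizer']
```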

codecov-commenter commented Nov 29, 2025

Codecov Report

❌ Patch coverage is 41.66667% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.55%. Comparing base (f2c00fe) to head (376885d).

Files with missing lines           Patch %   Lines
keras/src/saving/file_editor.py    41.66%    9 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21880      +/-   ##
==========================================
- Coverage   82.57%   82.55%   -0.02%     
==========================================
  Files         577      577              
  Lines       59599    59620      +21     
  Branches     9351     9355       +4     
==========================================
+ Hits        49213    49220       +7     
- Misses       7978     7987       +9     
- Partials     2408     2413       +5     
Flag               Coverage Δ
keras              82.37% <41.66%> (-0.02%) ⬇️
keras-jax          62.86% <41.66%> (-0.02%) ⬇️
keras-numpy        57.51% <41.66%> (-0.01%) ⬇️
keras-openvino     34.32% <0.00%> (-0.02%) ⬇️
keras-tensorflow   64.39% <41.66%> (-0.02%) ⬇️
keras-torch        63.56% <41.66%> (-0.02%) ⬇️

@hertschuh
Collaborator

@HyperPS

Thank you for the PR!

> Vuln (Reported on Huntr and GHSA)

Do you have references/links for these?

@HyperPS
Author

HyperPS commented Dec 1, 2025

Thanks @hertschuh. The Huntr report is still private (I reported the issue there), so there's no public link yet; maintainers can access it via the magic-link email from Huntr. I also submitted the same vulnerability to GHSA, which is currently in the private review queue, so there isn't a public link for that yet either.
