
Add support for overriding model architecture in Hugging Face conversion #3094

Open

gagika wants to merge 1 commit into main from agagik-to-hf
Conversation

@gagika (Collaborator) commented Feb 5, 2026

Description

This PR adds a new flag `--override_model_architecture` to the Hugging Face checkpoint conversion script (`src/MaxText/utils/ckpt_conversion/to_huggingface.py`).

Why is this change being made?
Previously, the conversion script enforced a strict validation check that required the running MaxText configuration (passed via CLI/YAML) to exactly match the hardcoded Hugging Face configuration for a given `model_name`. This prevented users from converting modified or experimental model architectures (e.g., a Llama 3.1 model with a custom head dimension) without editing the source code to add a new static model config entry.

The Solution:
This change introduces a boolean flag `override_model_architecture`.

  • Default Behavior (False): Retains the existing safety check. If the architecture parameters mismatch, the script raises a `ValueError` listing the differences.
  • Override Behavior (True): Explicitly overwrites the standard Hugging Face configuration object with the architecture parameters defined in the MaxText configuration (e.g., `num_heads`, `hidden_size`, `num_layers`, `vocab_size`) before saving the `config.json`.

This allows for greater flexibility when working with custom model variants while maintaining safety defaults.
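The validate-or-override behavior described above could look roughly like the sketch below. The field mapping (`ARCH_FIELDS`) between MaxText and Hugging Face attribute names is an illustrative assumption, not the PR's actual mapping; only the function name and signature come from the diff.

```python
# Hypothetical sketch of the validation/override logic. The MaxText-to-HF
# attribute mapping here is an assumption for illustration only.
ARCH_FIELDS = {
    # MaxText config key -> corresponding HF config attribute
    "num_query_heads": "num_attention_heads",
    "base_emb_dim": "hidden_size",
    "base_num_decoder_layers": "num_hidden_layers",
    "vocab_size": "vocab_size",
}


def validate_or_update_architecture(hf_config, max_config, override: bool):
  """Compare architecture params; update hf_config in-place or raise."""
  mismatches = []
  for mt_key, hf_key in ARCH_FIELDS.items():
    mt_val = getattr(max_config, mt_key)
    hf_val = getattr(hf_config, hf_key)
    if mt_val != hf_val:
      mismatches.append(f"{hf_key}: HF={hf_val}, MaxText={mt_val}")

  if not mismatches:
    return
  if override:
    # Overwrite the HF config object in-place with the MaxText values,
    # so the later config.json save picks them up.
    for mt_key, hf_key in ARCH_FIELDS.items():
      setattr(hf_config, hf_key, getattr(max_config, mt_key))
  else:
    raise ValueError("Architecture mismatch:\n" + "\n".join(mismatches))
```

Note that because the update is in-place, callers holding a reference to the HF config object see the overridden values without any reassignment.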

Tests

Tested the conversion script locally with a modified Llama 3.1 8B architecture (custom heads and head dimension).

Command used:

JAX_PLATFORMS=cpu PYTHONPATH=$(pwd)/src python3 src/MaxText/utils/ckpt_conversion/to_huggingface.py \
    src/MaxText/configs/base.yml \
    --override_model_architecture \
    skip_jax_distributed_system=true \
    model_name="llama3.1-8b" \
    hf_access_token=${HF_TOKEN} \
    base_output_directory="${HF_MODEL_CHECKPOINT}" \
    load_parameters_path="${MAXTEXT_CHECKPOINT}" \
    scan_layers=True \
    head_dim=256 \
    hardware=cpu

Checklist

Before submitting this PR, please make sure (put X in square brackets):
- [x] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [x] I have necessary comments in my code, particularly in hard-to-understand areas.
- [x] I have run end-to-end tests and provided workload links above if applicable.
- [x] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in [our documentation](https://maxtext.readthedocs.io/en/latest/development.html#adding-new-documentation-files).

@codecov codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 0% with 31 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...rc/MaxText/utils/ckpt_conversion/to_huggingface.py 0.00% 31 Missing ⚠️


```python
hf_config_obj = HF_MODEL_CONFIGS[model_key]

# Validate architecture consistency (raising ValueError on mismatch) or override HF config if specified.
validate_or_update_architecture(hf_config_obj, config, override=FLAGS.override_model_architecture)
```
Collaborator
Great feature! This will update `hf_config_obj` in-place, right?

```python
}


def validate_or_update_architecture(hf_config, max_config, override: bool):
```
Collaborator
nit: I wonder if we should move this function to https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_conversion/utils/utils.py. But either way works since this is only used in `to_huggingface`.
