
Add support for overriding model architecture in Hugging Face conversion #3094

Open

gagika wants to merge 1 commit into main from agagik-to-hf
Conversation

@gagika (Collaborator) commented Feb 5, 2026

Description

This PR adds a new flag `--override_model_architecture` to the Hugging Face checkpoint conversion script (`src/MaxText/utils/ckpt_conversion/to_huggingface.py`).

Why is this change being made?
Previously, the conversion script enforced a strict validation check that required the running MaxText configuration (passed via CLI/YAML) to exactly match the hardcoded Hugging Face configuration for a given `model_name`. This prevented users from converting modified or experimental model architectures (e.g., a Llama 3.1 model with a custom head dimension) without editing the source code to add a new static model config entry.

The Solution:
This change introduces a boolean flag `override_model_architecture`.

  • Default Behavior (False): Retains the existing safety check. If the architecture parameters mismatch, the script raises a `ValueError` listing the differences.
  • Override Behavior (True): Explicitly overwrites the standard Hugging Face configuration object with the architecture parameters defined in the MaxText configuration (e.g., `num_heads`, `hidden_size`, `num_layers`, `vocab_size`) before saving the `config.json`.

This allows for greater flexibility when working with custom model variants while maintaining safety defaults.
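The validate-or-override behavior described above could look roughly like the sketch below. The field mapping (`ARCH_FIELDS`) between MaxText and Hugging Face attribute names is an illustrative assumption, not the PR's actual mapping; only the function name and signature come from the diff.

```python
# Hypothetical sketch of the validation/override logic. The MaxText-to-HF
# attribute mapping here is an assumption for illustration only.
ARCH_FIELDS = {
    # MaxText config key -> corresponding HF config attribute
    "num_query_heads": "num_attention_heads",
    "base_emb_dim": "hidden_size",
    "base_num_decoder_layers": "num_hidden_layers",
    "vocab_size": "vocab_size",
}


def validate_or_update_architecture(hf_config, max_config, override: bool):
  """Compare architecture params; update hf_config in-place or raise."""
  mismatches = []
  for mt_key, hf_key in ARCH_FIELDS.items():
    mt_val = getattr(max_config, mt_key)
    hf_val = getattr(hf_config, hf_key)
    if mt_val != hf_val:
      mismatches.append(f"{hf_key}: HF={hf_val}, MaxText={mt_val}")

  if not mismatches:
    return
  if override:
    # Overwrite the HF config object in-place with the MaxText values,
    # so the later config.json save picks them up.
    for mt_key, hf_key in ARCH_FIELDS.items():
      setattr(hf_config, hf_key, getattr(max_config, mt_key))
  else:
    raise ValueError("Architecture mismatch:\n" + "\n".join(mismatches))
```

Note that because the update is in-place, callers holding a reference to the HF config object see the overridden values without any reassignment.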

Tests

Tested the conversion script locally with a modified Llama 3.1 8B architecture (custom heads and head dimension).

Command used:

JAX_PLATFORMS=cpu PYTHONPATH=$(pwd)/src python3 src/MaxText/utils/ckpt_conversion/to_huggingface.py \
    src/MaxText/configs/base.yml \
    --override_model_architecture \
    skip_jax_distributed_system=true \
    model_name="llama3.1-8b" \
    hf_access_token=${HF_TOKEN} \
    base_output_directory="${HF_MODEL_CHECKPOINT}" \
    load_parameters_path="${MAXTEXT_CHECKPOINT}" \
    scan_layers=True \
    head_dim=256 \
    hardware=cpu

Checklist

Before submitting this PR, please make sure (put X in square brackets):
- [x] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [x] I have necessary comments in my code, particularly in hard-to-understand areas.
- [x] I have run end-to-end tests and provided workload links above if applicable.
- [x] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in [our documentation](https://maxtext.readthedocs.io/en/latest/development.html#adding-new-documentation-files).

@codecov codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 0% with 31 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...rc/MaxText/utils/ckpt_conversion/to_huggingface.py 0.00% 31 Missing ⚠️


```python
hf_config_obj = HF_MODEL_CONFIGS[model_key]

# Validate architecture consistency (raising ValueError on mismatch) or override HF config if specified.
validate_or_update_architecture(hf_config_obj, config, override=FLAGS.override_model_architecture)
```
Collaborator
Great feature! This will update `hf_config_obj` in-place, right?

```python
}


def validate_or_update_architecture(hf_config, max_config, override: bool):
```
Collaborator
nit: I wonder if we should move this function to https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_conversion/utils/utils.py. But either way works since this is only used in `to_huggingface`.
