checkpoint_storage_concurrent_gb flag is only respected when load_parameters_path is passed #2829

@mkmg

Bug report

I set checkpoint_storage_concurrent_gb to 900.

When I run with load_parameters_path set, I see the following logs:
Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7934458b4c50>, enable_pinned_host_transfer=False, save_concurrent_bytes: 900000000000 (838.2 GiB), restore_concurrent_bytes: 900000000000 (838.2 GiB)

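This is as expected. However, for runs where (A) load_full_state_path is set or (B) neither load_full_state_path nor load_parameters_path is set, I see the following logs, in which save_concurrent_bytes and restore_concurrent_bytes are 96000000000 rather than 900000000000: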
Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f107ea088f0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
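
For anyone comparing the two log lines, here is a minimal sketch of the unit conversion they appear to reflect. It assumes the flag is interpreted as decimal gigabytes (10^9 bytes) and that the figure in parentheses is GiB (2^30 bytes). The helper names `gb_to_bytes` and `describe` are hypothetical and not part of Orbax or any other library. Under those assumptions, 900 reproduces the first log line and 96 reproduces the second, which suggests the second case uses a value of 96 GB instead of the configured 900. Where that 96 comes from is not shown in this report.

```python
# Hypothetical helpers for illustration only; not an Orbax API.
GIB = 2**30  # bytes per GiB (binary unit shown in parentheses in the logs)


def gb_to_bytes(gb: int) -> int:
    """Convert decimal gigabytes to bytes (assumes 1 GB = 10**9 bytes)."""
    return gb * 10**9


def describe(gb: int) -> str:
    """Format a GB setting the way the byte counts appear in the log lines."""
    n = gb_to_bytes(gb)
    return f"{n} ({n / GIB:.1f} GiB)"


print(describe(900))  # value configured via checkpoint_storage_concurrent_gb
print(describe(96))   # value observed when load_parameters_path is not set
```

The two printed strings can be compared directly against the save_concurrent_bytes and restore_concurrent_bytes fields in the logs above.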

Logs/Output

No response

Environment Information

orbax-checkpoint version: 0.11.30

Additional Context

No response

Labels: bug (Something isn't working)
