[FLINK-38990][Runtime/Checkpointing] Support configurable initial delay for first checkpoint trigger #27484
Open
Myracle wants to merge 1 commit into apache:master from
davidradl reviewed Jan 28, 2026
- `execution.checkpointing.dir`: The directory to write checkpoints to. This takes a path URI like *s3://mybucket/flink-app/checkpoints* or *hdfs://namenode:port/flink/checkpoints*.
- `execution.checkpointing.savepoint-dir`: The default directory for savepoints. Takes a path URI, similar to `execution.checkpointing.dir`.
- `execution.checkpointing.interval`: The base interval setting. To enable checkpointing, you need to set this value larger than 0.
- `execution.checkpointing.initial-delay`: The initial delay before the first checkpoint is triggered. This is useful for jobs that need time to warm up or catch up with backlogs (e.g., consuming from Kafka with large lag).
Contributor
I am curious:
- Can we detect the warm-up or backlog catch-up activity and dynamically wait as long as is appropriate?
- I suggest documenting the impact of hitting a warm-up or backlog catch-up phase without this delay, along with some discussion of the trade-offs of using this option.
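One way to approach the first question is a gate that polls some measure of backlog and triggers the first checkpoint once the backlog is small enough, with an upper bound on how long it is willing to wait. This is a hypothetical sketch: `BacklogAwareFirstCheckpointGate`, the lag supplier, and both thresholds are illustrative names, not Flink APIs, and how a job would obtain a lag figure is left open. The upper bound reflects the trade-off in the second point: until the first checkpoint completes, recovery has no checkpoint from the current run to fall back on.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical gate deciding when to trigger the first checkpoint.
 * Not a Flink API; the lag source and thresholds are assumed inputs.
 */
class BacklogAwareFirstCheckpointGate {

    private final LongSupplier lagSupplier; // hypothetical source of consumer lag (e.g. records behind)
    private final long lagThreshold;        // trigger once lag is at or below this value
    private final long maxWaitMillis;       // never wait longer than this, whatever the lag

    BacklogAwareFirstCheckpointGate(LongSupplier lagSupplier, long lagThreshold, long maxWaitMillis) {
        this.lagSupplier = lagSupplier;
        this.lagThreshold = lagThreshold;
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns true when the first checkpoint should be triggered. */
    boolean shouldTrigger(long elapsedMillisSinceStart) {
        if (elapsedMillisSinceStart >= maxWaitMillis) {
            return true; // cap the window in which the job runs without a checkpoint
        }
        return lagSupplier.getAsLong() <= lagThreshold;
    }

    public static void main(String[] args) {
        BacklogAwareFirstCheckpointGate gate =
                new BacklogAwareFirstCheckpointGate(() -> 1_000_000L, 10_000L, 600_000L);
        System.out.println(gate.shouldTrigger(60_000L));  // false: lag still above threshold
        System.out.println(gate.shouldTrigger(600_000L)); // true: maximum wait reached
    }
}
```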
davidradl reviewed Jan 28, 2026
```java
.text(
        "The initial delay before the first checkpoint is triggered after the job starts. "
                + "This is useful for jobs that need time to warm up or catch up with backlogs. "
                + "If set to 0 (default), the initial delay will be randomly chosen between "
```
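The option description implies a simple selection rule: a positive configured value is used as the delay, and 0 keeps the existing randomized behavior, which the PR description gives as a random value between `minPauseBetweenCheckpoints` and `baseInterval`. A minimal plain-Java sketch of that rule, assuming an inclusive upper bound and treating an interval shorter than the minimum pause as equal to it; `chooseInitialDelayMillis` is a hypothetical name, not the method added in this PR.

```java
import java.util.concurrent.ThreadLocalRandom;

class InitialDelaySketch {

    /**
     * Hypothetical helper: returns the delay (ms) before the first checkpoint.
     * A positive configured value wins; 0 falls back to a random delay in
     * [minPause, baseInterval], both bounds inclusive.
     */
    static long chooseInitialDelayMillis(long configuredDelay, long minPause, long baseInterval) {
        if (configuredDelay < 0 || minPause < 0) {
            throw new IllegalArgumentException("delays must be non-negative");
        }
        if (configuredDelay > 0) {
            return configuredDelay; // explicit user setting is used as-is
        }
        // Assumption: an interval shorter than the min pause is treated as equal to it.
        long upper = Math.max(baseInterval, minPause);
        // nextLong(origin, bound) excludes bound, so add 1 to make `upper` reachable.
        return ThreadLocalRandom.current().nextLong(minPause, upper + 1L);
    }

    public static void main(String[] args) {
        System.out.println(chooseInitialDelayMillis(300_000L, 10_000L, 60_000L)); // 300000
        long sampled = chooseInitialDelayMillis(0L, 10_000L, 60_000L);
        System.out.println(sampled >= 10_000L && sampled <= 60_000L); // true
    }
}
```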
Contributor
I wonder if it would be better to make this a random jitter above the minimum pause. Otherwise, the first checkpoint could randomly be delayed for a very long time if `CHECKPOINTING_INTERVAL` is large.
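To make this concern concrete, the sketch below compares the worst-case wait before the first checkpoint under the behavior described in this PR (uniform between the minimum pause and the interval) with the suggested alternative (minimum pause plus a bounded random jitter). `maxJitter` is a hypothetical parameter; neither the PR nor this comment specifies how the jitter would be bounded.

```java
import java.util.concurrent.ThreadLocalRandom;

class JitterComparisonSketch {

    /** Worst case when the delay is drawn uniformly from [minPause, baseInterval]. */
    static long worstCaseUniform(long minPause, long baseInterval) {
        return Math.max(minPause, baseInterval);
    }

    /** Worst case for the suggested "min pause + jitter" scheme (maxJitter is hypothetical). */
    static long worstCaseJitter(long minPause, long maxJitter) {
        return minPause + maxJitter;
    }

    /** One sample of the jittered delay: minPause plus a random value in [0, maxJitter]. */
    static long sampleJitteredDelay(long minPause, long maxJitter) {
        return minPause + ThreadLocalRandom.current().nextLong(maxJitter + 1L);
    }

    public static void main(String[] args) {
        long minPause = 0L;
        long baseInterval = 3_600_000L; // 1 h checkpoint interval
        long maxJitter = 30_000L;       // hypothetical 30 s jitter cap
        System.out.println("uniform worst case (ms): " + worstCaseUniform(minPause, baseInterval)); // 3600000
        System.out.println("jitter worst case (ms):  " + worstCaseJitter(minPause, maxJitter));     // 30000
        System.out.println("jitter sample (ms):      " + sampleJitteredDelay(minPause, maxJitter));
    }
}
```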
What is the purpose of the change
This pull request adds a new configuration option, `execution.checkpointing.initial-delay`, that lets users configure the delay before the first checkpoint is triggered after job startup. This is particularly useful for jobs that need time to warm up or catch up with backlogs (e.g., consuming from Kafka with a large lag) before taking their first checkpoint.
Currently, the initial delay before the first checkpoint is chosen randomly between `minPauseBetweenCheckpoints` and `baseInterval`. This behavior is not configurable and may not suit scenarios where jobs need a longer warm-up period. With this change, users can explicitly configure the initial delay to avoid checkpoint overhead during the critical catch-up phase.
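Whatever value is chosen, the initial delay in this model only shifts when the periodic trigger first fires; later triggers follow the interval. The plain-Java sketch below illustrates that with `ScheduledExecutorService.scheduleAtFixedRate`, using short millisecond values so it runs quickly. It is an analogy for the scheduling behavior described here, not Flink's `CheckpointCoordinator` code; the class name and values are illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PeriodicTriggerSketch {

    /**
     * Runs a stand-in "trigger checkpoint" task on a fixed-rate timer and returns
     * the elapsed time (ms since scheduling) of its first n runs.
     */
    static long[] triggerTimesMillis(long initialDelayMillis, long intervalMillis, int n)
            throws InterruptedException {
        long[] times = new long[n];
        CountDownLatch done = new CountDownLatch(n);
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        try {
            timer.scheduleAtFixedRate(() -> {
                int i = runs.getAndIncrement();
                if (i < n) {
                    times[i] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    done.countDown();
                }
            }, initialDelayMillis, intervalMillis, TimeUnit.MILLISECONDS);
            done.await(); // countDown/await make the timer thread's writes to times[] visible here
        } finally {
            timer.shutdownNow();
        }
        return times;
    }

    public static void main(String[] args) throws InterruptedException {
        // 200 ms stands in for a configured initial delay, 100 ms for the checkpoint interval.
        for (long t : triggerTimesMillis(200L, 100L, 3)) {
            System.out.println("trigger checkpoint at ~" + t + " ms");
        }
    }
}
```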
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
- `@Public(Evolving)`: (no)

Documentation