@vasilchev vasilchev commented Jan 8, 2026

Try to introduce a 'PAUSE' Success Action.

Idea:
Currently only the 'NEXTGROUP' Success Action is available: once the Group Success Condition is fulfilled, the next group is started.
In some use cases, Success results from the Targets are not enough, and external checks must be made to verify that the installation really succeeded (e.g. the installation itself was successful but introduced regression bugs that only show up later).

For that, we need a Success Action that supports this: once the Group Success Condition is met -> PAUSE the Rollout so that external checks can be made; after the checks pass, the Rollout is resumed externally so it can continue.

Current issues found:

  1. On each Rollout scheduler run -> Success/Error Conditions are evaluated on Running Groups, i.e. Groups that have not finished (there are still active Actions)
    1.1) If the Error Condition is fulfilled -> ErrorAction('PAUSE') is triggered and the Group is set to 'FINISHED' (even though there are still active Actions for this 'error' Group)
    1.2) If the Success Condition is fulfilled -> SuccessAction is triggered, but the Group is not set to 'FINISHED' until all Actions are in a terminal status, i.e. the Group is left running, which means it is evaluated for success/error conditions again on the next scheduler run

Leaving Groups in the Running state is valid for Dynamic groups, as they should never finish: there are always Targets that show up later and must be included and have actions assigned (on the periodic Rollout scheduler runs).

But with a 'PAUSE' Success Action, a successful Group that is left in the RUNNING state (with still active Actions) has its success condition evaluated multiple times -> the success action is executed each time.
This leads to flows like:

  1. start a rollout with 3 groups with the 'pause' success action
  2. the first group's success condition is met -> execute success action -> rollout is paused
  3. external checks finish, resume the rollout (setting the Rollout to the Running state)
  4. the rollout scheduler kicks in and handles all running rollouts
    4.1) the first group (that initially succeeded and paused the rollout) is now Finished - OK scenario - it is not evaluated anymore
    4.2) the first group (that initially succeeded and paused the rollout) is still in the Running state - it is evaluated again and the PAUSE action is executed again -> not the desired result
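
The flow above can be reproduced with a small model. This is only a sketch under stated assumptions: `PauseLoopSketch`, its enums and its methods are hypothetical names made up for illustration and are not hawkBit classes, and the scheduler is reduced to "evaluate every group that is still RUNNING".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the scheduler loop (not hawkBit code).
class PauseLoopSketch {
    enum RolloutStatus { RUNNING, PAUSED }
    enum GroupStatus { SCHEDULED, RUNNING, FINISHED }

    static final class Group {
        GroupStatus status;
        final boolean successConditionMet;

        Group(GroupStatus status, boolean successConditionMet) {
            this.status = status;
            this.successConditionMet = successConditionMet;
        }
    }

    static final class Rollout {
        RolloutStatus status = RolloutStatus.RUNNING;
        final List<Group> groups = new ArrayList<>();
        int pauseCount = 0; // how often the PAUSE success action fired
    }

    // One scheduler run: only running rollouts are handled, and every group
    // that is still RUNNING gets its success condition evaluated again.
    static void schedulerRun(Rollout rollout) {
        if (rollout.status != RolloutStatus.RUNNING) {
            return;
        }
        for (Group group : rollout.groups) {
            if (group.status == GroupStatus.RUNNING && group.successConditionMet) {
                // Naive PAUSE success action: no memory of earlier executions.
                rollout.status = RolloutStatus.PAUSED;
                rollout.pauseCount++;
            }
        }
    }

    // External resume: puts the rollout back into the RUNNING state.
    static void resume(Rollout rollout) {
        rollout.status = RolloutStatus.RUNNING;
    }
}
```

In this model, calling `schedulerRun`, then `resume`, then `schedulerRun` again on a rollout whose first group is still RUNNING pauses the rollout a second time (scenario 4.2); if the group is set to FINISHED before the second run, the rollout stays RUNNING (scenario 4.1).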

Option 0) (in this PR)
The Rollout executor overrides the Group Success Action with "NEXTGROUP" after it has executed it once (so that after an external resume it continues with the normal flow).

  • no explicit trigger of the next group; after resume it picks up "NEXTGROUP" automatically and works as it does now
  • this is a hacky approach that overrides the initial settings
  • it is neither intuitive nor expected - if a Rollout fails and you want to copy it, it will probably be copied with the overridden SuccessAction

Option 1)
Introduce a new Group 'Running' status indicating that the Group is RUNNING but its success condition + action have already been executed once. That way 'Success' Actions could check it and decide whether to do something or not (in the case of 'PAUSE', it sees that it was already evaluated and triggered once in the past and does not execute again).
It should be a kind of 'Running' status so that dynamic groups are still taken into account and filled with new targets/actions.

  • seems logical and intuitive - even now success actions are sometimes executed and run queries that may be obsolete (in case of a static group that is already in the success state but still running, with the next group already started?)
  • no db migration needed (re-use the group status, just introduce a new value)
  • difficult to pick a proper 'running' status name that indicates the group is still running, not finished, and has already executed its success action once - e.g. 'Running-Success'?
  • the resume operation has to check whether the last group had the Pause Action and already executed it, and then trigger the next group
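
A minimal sketch of what Option 1 could look like, assuming a hypothetical status value named `RUNNING_SUCCESS` (the name is an open question above) and hypothetical helper names; none of this is taken from the hawkBit code base.

```java
// Hypothetical sketch of Option 1 (not hawkBit code).
class RunningSuccessSketch {
    enum GroupStatus {
        SCHEDULED, RUNNING, RUNNING_SUCCESS, FINISHED;

        // Both running flavours keep the group visible to the scheduler,
        // so dynamic groups can still receive new targets and actions.
        boolean isRunning() {
            return this == RUNNING || this == RUNNING_SUCCESS;
        }
    }

    static final class Group {
        GroupStatus status = GroupStatus.RUNNING;
        int successActionRuns = 0;
    }

    // Executes the success action at most once per group.
    // Returns true only when the action was executed in this call.
    static boolean evaluateSuccess(Group group, boolean successConditionMet) {
        if (group.status != GroupStatus.RUNNING || !successConditionMet) {
            return false; // not running, already handled, or condition not met
        }
        group.successActionRuns++;                   // the success action would run here
        group.status = GroupStatus.RUNNING_SUCCESS;  // remember that it already ran
        return true;
    }
}
```

The design choice here is to encode the "already executed" fact in the status itself, which is why no extra column (and no db migration) would be needed.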

Option 2)
Introduce a new Rollout Group field in the db indicating that the success action was already executed.

  • db migration needed
  • the resume operation has to check whether the last group had the Pause Action and already executed it, and then trigger the next group

EDIT:

After additional review, a different approach was implemented:

  1. The PAUSE Success Action now pauses only if the next group is in 'Scheduled' status (i.e. not started yet); if the next group is already running, it does nothing
  2. The Rollout Resume MGMT API now checks whether the last Group of the 'Paused' Rollout is in one of these states:
  • Error state (this was actually not covered by the previous implementation)
  • Success state with the 'PAUSE' success action
    In either case, the Resume MGMT API:
  • Triggers the next group
  • Resumes the Rollout

That way we do not override the Success Action, i.e. copying a Rollout now works as expected.
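
The two rules of the implemented approach can be sketched as follows. This is an illustration under assumptions, not the merged code: all type and method names are hypothetical, "last Group" is interpreted as the latest group that has left the scheduled state, and a PAUSE on the final group (no next group) is assumed to do nothing.

```java
import java.util.List;

// Hypothetical sketch of the implemented approach (not hawkBit code).
class PauseResumeSketch {
    enum RolloutStatus { RUNNING, PAUSED }
    enum GroupStatus { SCHEDULED, RUNNING, FINISHED, ERROR }
    enum SuccessAction { NEXTGROUP, PAUSE }

    static final class Group {
        GroupStatus status;
        final SuccessAction successAction;
        boolean successConditionMet;

        Group(GroupStatus status, SuccessAction successAction) {
            this.status = status;
            this.successAction = successAction;
        }
    }

    static final class Rollout {
        RolloutStatus status = RolloutStatus.RUNNING;
        final List<Group> groups;

        Rollout(List<Group> groups) {
            this.groups = groups;
        }
    }

    // Rule 1: PAUSE only pauses while the next group has not started yet.
    static void executePauseAction(Rollout rollout, int groupIndex) {
        int next = groupIndex + 1;
        if (next < rollout.groups.size()
                && rollout.groups.get(next).status == GroupStatus.SCHEDULED) {
            rollout.status = RolloutStatus.PAUSED;
        }
    }

    // Assumption: the "last group" is the latest one that is no longer SCHEDULED.
    static int lastStartedIndex(Rollout rollout) {
        int last = -1;
        for (int i = 0; i < rollout.groups.size(); i++) {
            if (rollout.groups.get(i).status != GroupStatus.SCHEDULED) {
                last = i;
            }
        }
        return last;
    }

    // Rule 2: resume triggers the next group if the pause came from an error
    // or from a PAUSE success action, and then resumes the rollout.
    static void resume(Rollout rollout) {
        if (rollout.status != RolloutStatus.PAUSED) {
            return;
        }
        int last = lastStartedIndex(rollout);
        if (last >= 0) {
            Group group = rollout.groups.get(last);
            boolean pausedByError = group.status == GroupStatus.ERROR;
            boolean pausedBySuccess = group.successConditionMet
                    && group.successAction == SuccessAction.PAUSE;
            int next = last + 1;
            if ((pausedByError || pausedBySuccess)
                    && next < rollout.groups.size()
                    && rollout.groups.get(next).status == GroupStatus.SCHEDULED) {
                rollout.groups.get(next).status = GroupStatus.RUNNING; // trigger next group
            }
        }
        rollout.status = RolloutStatus.RUNNING;
    }
}
```

Because the next group is already running after a resume, a repeated evaluation of the first group's PAUSE action in this model no longer pauses the rollout, and the configured Success Action is never modified.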

@vasilchev vasilchev force-pushed the rollout/pause_success_action branch 2 times, most recently from e278faa to a60b82f Compare January 8, 2026 22:38
Signed-off-by: vasilchev <vasil.ilchev@bosch.com>
@vasilchev vasilchev force-pushed the rollout/pause_success_action branch 5 times, most recently from de351b9 to c91dc1f Compare January 9, 2026 17:29
…ollout

Fix Rollout Mgmt Resource to accept new Pause Action

Signed-off-by: vasilchev <vasil.ilchev@bosch.com>
@vasilchev vasilchev force-pushed the rollout/pause_success_action branch from c91dc1f to a317398 Compare January 9, 2026 17:39
@avgustinmm

@vasilchev please check that the commits are compliant with the ECA - see https://api.eclipse.org/git/eca/status/gh/eclipse-hawkbit/hawkbit/2867
Maybe it is the bosch.io vs bosch.com email

Signed-off-by: vasilchev <vasil.ilchev@bosch.com>

@avgustinmm avgustinmm merged commit 0083d55 into eclipse-hawkbit:master Jan 13, 2026
5 checks passed
@vasilchev vasilchev deleted the rollout/pause_success_action branch January 13, 2026 13:13