-
Notifications
You must be signed in to change notification settings - Fork 140
Moves API heartbeat to it's own thread. #7331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
dkliban
wants to merge
1
commit into
pulp:main
Choose a base branch
from
dkliban:7315
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Fixed bug where API process would miss heartbeats during long uploads. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -2,6 +2,8 @@ | |||||
| from logging import getLogger | ||||||
| import os | ||||||
| import sys | ||||||
| import threading | ||||||
| import time | ||||||
|
|
||||||
| import click | ||||||
| import django | ||||||
|
|
@@ -22,9 +24,40 @@ | |||||
|
|
||||||
|
|
||||||
| class PulpApiWorker(SyncWorker): | ||||||
| def __init__(self, *args, **kwargs): | ||||||
| super().__init__(*args, **kwargs) | ||||||
| self.heartbeat_thread = None | ||||||
|
|
||||||
| def _heartbeat_loop(self): | ||||||
| """Run heartbeat in a loop. Exit process if heartbeat fails.""" | ||||||
| try: | ||||||
| while self.alive: | ||||||
| try: | ||||||
| self.heartbeat() | ||||||
| except Exception as e: | ||||||
| # Any unhandled exception in heartbeat should restart the worker | ||||||
| logger.error( | ||||||
| f"Heartbeat thread encountered fatal error for worker {self.name}: {e}" | ||||||
| ) | ||||||
| # Exit with error code to trigger worker restart | ||||||
| os._exit(Arbiter.WORKER_BOOT_ERROR) | ||||||
|
|
||||||
| time.sleep(self.heartbeat_interval) | ||||||
| except Exception as e: | ||||||
| # Safety net: if even the loop itself fails, restart | ||||||
| logger.error(f"Heartbeat thread loop failed for worker {self.name}: {e}") | ||||||
| os._exit(Arbiter.WORKER_BOOT_ERROR) | ||||||
|
|
||||||
| def notify(self): | ||||||
| super().notify() | ||||||
| self.heartbeat() | ||||||
|
|
||||||
| # Check if heartbeat thread is still alive | ||||||
| if not self.heartbeat_thread.is_alive(): | ||||||
| logger.error( | ||||||
| f"Heartbeat thread died unexpectedly for worker {self.name}. " | ||||||
| "Exiting to trigger restart." | ||||||
| ) | ||||||
| os._exit(Arbiter.WORKER_BOOT_ERROR) | ||||||
|
|
||||||
| def heartbeat(self): | ||||||
| try: | ||||||
|
|
@@ -37,7 +70,8 @@ def heartbeat(self): | |||||
| logger.debug(self.beat_msg) | ||||||
| except (InterfaceError, DatabaseError) as e: | ||||||
| logger.error(f"{self.fail_beat_msg} Exception: {str(e)}") | ||||||
| exit(Arbiter.WORKER_BOOT_ERROR) | ||||||
| # This will be caught by _heartbeat_loop and trigger os._exit | ||||||
| raise | ||||||
|
|
||||||
| def init_process(self): | ||||||
| os.environ.setdefault("DJANGO_SETTINGS_MODULE", "pulpcore.app.settings") | ||||||
|
|
@@ -76,6 +110,15 @@ def init_process(self): | |||||
| logger.error(f"An API app with name {self.name} already exists in the database.") | ||||||
| exit(Arbiter.WORKER_BOOT_ERROR) | ||||||
|
|
||||||
| # Store heartbeat interval for the heartbeat thread | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
:mild sarcasm: |
||||||
| self.heartbeat_interval = settings.API_APP_TTL // 3 | ||||||
|
|
||||||
| # Start heartbeat thread before entering the main loop | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now why don't we then put all of that stuff right before entering the main loop? |
||||||
| self.heartbeat_thread = threading.Thread( | ||||||
| target=self._heartbeat_loop, daemon=True, name=f"heartbeat-{self.name}" | ||||||
| ) | ||||||
| self.heartbeat_thread.start() | ||||||
|
|
||||||
| super().init_process() | ||||||
|
|
||||||
| def run(self): | ||||||
|
|
||||||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First i thought, this is dead code. Now i think we have the opportunity to check in on the thread here.