-
Notifications
You must be signed in to change notification settings - Fork 3
fix(docworker): Fix handling deleted document #353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR improves the handling of deleted documents in the document worker by introducing a DocumentNotFoundException exception class and adding logic to check if a document exists before attempting to set its state to FAILED.
Changes:
- Adds a new
DocumentNotFoundExceptionexception that inherits fromJobException - Introduces a
check_document_exists()method to verify document presence before operations - Updates exception handling to gracefully handle cases where documents are deleted during processing
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/dsw-document-worker/dsw/document_worker/exceptions.py | Adds DocumentNotFoundException class and updates create_job_exception to return it when document is not found |
| packages/dsw-document-worker/dsw/document_worker/worker.py | Adds document existence check in _set_failed, handles DocumentNotFoundException in run method, and uses new exception in get_document |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| LOG.warning('Document %s does not exist, cannot set to FAILED', | ||
| self.doc_uuid) | ||
| return |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check_document_exists() method returns False on any exception (line 98-101), which could hide database connectivity or permission issues. When database errors occur, the code will incorrectly assume the document doesn't exist and silently return without setting the FAILED state or reporting the actual error. Consider distinguishing between 'document not found' and 'error checking document existence' to avoid masking genuine database problems.
| LOG.warning('Document %s does not exist, cannot set to FAILED', | |
| self.doc_uuid) | |
| return | |
| LOG.warning( | |
| 'Document %s may not exist or could not be checked; ' | |
| 'attempting to set state to FAILED anyway', | |
| self.doc_uuid, | |
| ) |
| try: | ||
| self._run() | ||
| except DocumentNotFoundException as e: | ||
| LOG.warning('Document not found: %s', e.log_message()) |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DocumentNotFoundException is caught and logged, but no error reporting to Sentry occurs. This creates inconsistent error reporting compared to the JobException handler (lines 350-353) which captures the exception in Sentry. Add SentryReporter.capture_exception(e) to ensure these errors are properly tracked.
| LOG.warning('Document not found: %s', e.log_message()) | |
| LOG.warning('Document not found: %s', e.log_message()) | |
| SentryReporter.capture_exception(e) |
| def check_document_exists(self) -> bool: | ||
| LOG.debug('Checking if document "%s" exists in DB', self.doc_uuid) | ||
| try: | ||
| doc = self.ctx.app.db.fetch_document( | ||
| document_uuid=self.doc_uuid, | ||
| tenant_uuid=self.tenant_uuid, | ||
| ) | ||
| exists = doc is not None | ||
| LOG.debug('Document "%s" exists: %s', self.doc_uuid, exists) | ||
| return exists | ||
| except Exception as e: | ||
| LOG.error('Failed to check if document "%s" exists: %s', | ||
| self.doc_uuid, str(e)) | ||
| return False |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check_document_exists() method duplicates the document fetching logic from get_document() (lines 113-116). Both methods call fetch_document with the same parameters. This duplication could lead to maintenance issues if the fetching logic needs to change. Consider refactoring to have a single method that performs the fetch, or have this method leverage the existing get_document() method's fetch logic.
No description provided.