fix: Skip redundant S3 upload when file already exists after rollback #1400
Merged
dimitri-yatsenko merged 1 commit into maint/0.14 on Feb 17, 2026
Conversation
After a transaction rollback, S3 files survive but DB tracking entries are lost. On retry, upload_filepath would re-upload the entire file (potentially multi-GB) because it only checked the DB.

It now checks S3 via a single stat_object call before uploading. If the object exists with matching size and contents_hash metadata, the upload is skipped. The DB tracking entry is always (re-)inserted regardless.

Also adds an s3.Folder.stat() method and refactors exists() to use it, avoiding redundant stat_object calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
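The skip decision described above can be sketched as follows. This is a hypothetical sketch: should_skip_upload and the Stat shape are illustrative names, not DataJoint's actual internals, but the logic mirrors the description (size and contents_hash must both match; skip_checksum mode matches on size only).

```python
from collections import namedtuple

# Illustrative stand-in for a stat_object result (size + user metadata).
Stat = namedtuple("Stat", ["size", "metadata"])


def should_skip_upload(stat, local_size, local_hash):
    """Return True if the object already on S3 matches the local file,
    so the (potentially multi-GB) re-upload after a rollback can be skipped.

    `stat` is the result of a single stat_object-style HEAD call:
    None if the object is missing, otherwise its size and metadata.
    `local_hash` is None in skip_checksum mode.
    """
    if stat is None:
        return False  # nothing on S3: must upload
    if stat.size != local_size:
        return False  # stale or partial object: re-upload
    if local_hash is None:
        return True   # skip_checksum mode: match on size only
    return stat.metadata.get("contents_hash") == local_hash
```

Whether or not the upload is skipped, the DB tracking entry is (re-)inserted afterward, restoring the state lost in the rollback.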
ttngu207 approved these changes on Feb 17, 2026
This was referenced Feb 17, 2026
upload_filepath re-uploads files after transaction rollback — should check S3 before uploading
#1397
Closed
Summary

- upload_filepath re-uploads multi-GB files after transaction rollback
- A single stat_object call for an existing object with matching size and contents_hash metadata lets the upload be skipped
- New s3.Folder.stat() method; refactors exists() to use it

Root cause
S3 uploads are not transactional. When a transaction rolls back after a successful upload (e.g., DB connection timeout during a long make()), the file remains on S3 but the tracking entry is lost. On retry, upload_filepath checks only the DB, finds no entry, and re-uploads the entire file — creating an infinite retry loop for large files.

What changed
datajoint/s3.py: New stat() method returns the full stat_object result (size, metadata) or None — a single HTTP HEAD request. exists() refactored to use it.

datajoint/external.py: In upload_filepath's else branch (no DB entry), before calling _upload_file:

- calls s3.stat() on the expected S3 path
- matching contents_hash from metadata → skip upload, log info
- in skip_checksum mode, match on size only

Test plan
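One way to exercise the stat()/exists() behavior without a live S3 server is a fake client. This is a hypothetical sketch, not the PR's actual tests: the real s3.Folder wraps a minio client, whose missing-object signal is an S3Error with code NoSuchKey; FileNotFoundError stands in here to keep the sketch self-contained.

```python
class Folder:
    """Sketch of an s3.Folder-style wrapper (hypothetical: the real class
    wraps a minio client and catches S3Error/NoSuchKey instead of
    FileNotFoundError)."""

    def __init__(self, client):
        self.client = client  # anything exposing stat_object(name)

    def stat(self, name):
        # Single HTTP HEAD request: full result (size, metadata) or None.
        try:
            return self.client.stat_object(name)
        except FileNotFoundError:  # stand-in for minio's NoSuchKey
            return None

    def exists(self, name):
        # Reuses stat(), so existence checks need no second stat_object call.
        return self.stat(name) is not None


class FakeClient:
    """In-memory stand-in for an S3 client, for testing without a server."""

    def __init__(self, objects):
        self.objects = objects

    def stat_object(self, name):
        if name not in self.objects:
            raise FileNotFoundError(name)
        return self.objects[name]


folder = Folder(FakeClient({"schema/file.dat": {"size": 3}}))
assert folder.exists("schema/file.dat")
assert folder.stat("schema/file.dat") == {"size": 3}
assert folder.stat("schema/missing.dat") is None
assert not folder.exists("schema/missing.dat")
```

Checking exists() and stat() against the same fake also verifies the refactor's point: one stat_object call serves both.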
🤖 Generated with Claude Code