fix: Skip redundant S3 upload when file already exists after rollback#1400

Merged
dimitri-yatsenko merged 1 commit into maint/0.14 from fix/skip-s3-reupload-1397
Feb 17, 2026

@dimitri-yatsenko
Member

Summary

Root cause

S3 uploads are not transactional. When a transaction rolls back after a successful upload (e.g., DB connection timeout during a long make()), the file remains on S3 but the tracking entry is lost. On retry, upload_filepath checks only the DB, finds no entry, and re-uploads the entire file. For large files the re-upload can itself outlast the connection, so the retry can fail the same way and repeat indefinitely.
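The failure mode can be reproduced in a few lines. This is a simplified simulation, not DataJoint code: `attempt`, the `s3` and `db` dictionaries, and the `commit` flag are hypothetical stand-ins for the object store, the tracking table, and the outcome of the surrounding transaction.

```python
def attempt(s3, db, path, data, commit):
    """One attempt using DB-only logic; returns the number of uploads performed."""
    pending = {}  # tracking rows written inside the (simulated) transaction
    uploads = 0
    if path not in db:
        # Only the DB is consulted, so a leftover S3 object is invisible here.
        s3[path] = data  # the upload takes effect immediately, outside the transaction
        uploads += 1
        pending[path] = len(data)
    if commit:
        db.update(pending)  # the tracking entry becomes durable only on commit
    # On rollback, `pending` is discarded while the S3 object stays in place.
    return uploads


s3, db = {}, {}
first = attempt(s3, db, "a/big.dat", b"payload", commit=False)   # rolled back
second = attempt(s3, db, "a/big.dat", b"payload", commit=False)  # uploads again
```

Both attempts upload the file even though the object was already on S3 after the first one, because nothing was ever recorded in `db`.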

What changed

datajoint/s3.py: New stat() method returns the full stat_object result (size, metadata) or None — single HTTP HEAD request. exists() refactored to use it.
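A minimal sketch of how stat() and exists() might relate. `FakeClient`, `ObjectStat`, and `ObjectNotFound` are hypothetical stand-ins so the example runs without an S3 endpoint; the merged code wraps the real client's stat_object call, and its error type and return object may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ObjectStat:
    """Hypothetical stand-in for the result of a stat_object call."""
    size: int
    metadata: dict = field(default_factory=dict)


class ObjectNotFound(Exception):
    """Hypothetical stand-in for the client's 'no such key' error."""


class FakeClient:
    """In-memory stand-in for an S3 client; counts HEAD-style requests."""

    def __init__(self, objects):
        self.objects = objects
        self.head_requests = 0

    def stat_object(self, bucket, name):
        self.head_requests += 1
        try:
            return self.objects[(bucket, name)]
        except KeyError:
            raise ObjectNotFound(name) from None


class Folder:
    def __init__(self, client, bucket):
        self.client = client
        self.bucket = bucket

    def stat(self, name) -> Optional[ObjectStat]:
        """Return the stat result for `name`, or None if the object is absent."""
        try:
            return self.client.stat_object(self.bucket, str(name))
        except ObjectNotFound:
            return None

    def exists(self, name) -> bool:
        """Existence check expressed through stat(): one request per call."""
        return self.stat(name) is not None
```

A caller that needs both existence and size can call stat() once and inspect the result, rather than calling exists() and then issuing a second request for the size.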

datajoint/external.py: In upload_filepath's else branch (no DB entry), before calling _upload_file:

  1. Call s3.stat() on the expected S3 path
  2. If the object exists and both its size and the contents_hash in its metadata match the local file → skip upload, log info
  3. In skip_checksum mode, match on size only
  4. Always insert the DB tracking entry, whether or not the upload was skipped
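The steps above can be sketched as a small decision function plus a wrapper. This is an illustration under assumptions, not the merged implementation: `can_skip_upload`, `upload_if_needed`, and the callback parameters are hypothetical names, the remote stat is modeled as a plain dict, and the metadata key is assumed to be exactly `contents_hash` (a real client may return it under a prefixed header name).

```python
import logging

logger = logging.getLogger(__name__)


def can_skip_upload(remote, local_size, local_hash, skip_checksum=False):
    """Decide whether an object already on S3 matches the local file."""
    if remote is None:
        return False  # nothing on S3: upload is required
    if remote["size"] != local_size:
        return False  # size mismatch: treat as a different or partial file
    if skip_checksum:
        return True  # size-only match, as in step 3
    # Assumed metadata layout; compare as strings to tolerate UUID-like hashes.
    return remote["metadata"].get("contents_hash") == str(local_hash)


def upload_if_needed(stat, upload, track, path, local_size, local_hash,
                     skip_checksum=False):
    """Run steps 1-4 using injected callables; returns True if upload was skipped."""
    remote = stat(path)  # step 1: one stat request
    skipped = can_skip_upload(remote, local_size, local_hash, skip_checksum)
    if skipped:
        logger.info("Skipping upload of %s: matching object already on S3", path)
    else:
        upload(path)
    track(path)  # step 4: the tracking entry is inserted either way
    return skipped
```

Passing `stat`, `upload`, and `track` as callables keeps the sketch independent of any particular S3 client or table class; in the PR these correspond to s3.stat(), _upload_file, and the DB insert.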

Test plan

  • Verify existing external storage tests pass
  • Manual test: upload large filepath, kill connection before commit, verify retry skips re-upload

🤖 Generated with Claude Code

After a transaction rollback, S3 files survive but DB tracking entries
are lost. On retry, upload_filepath would re-upload the entire file
(potentially multi-GB) because it only checked the DB.

Now checks S3 via a single stat_object call before uploading. If the
object exists with matching size and contents_hash metadata, the upload
is skipped. The DB tracking entry is always (re-)inserted regardless.

Also adds s3.Folder.stat() method and refactors exists() to use it,
avoiding redundant stat_object calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimitri-yatsenko dimitri-yatsenko merged commit f401a20 into maint/0.14 Feb 17, 2026
3 checks passed
@dimitri-yatsenko dimitri-yatsenko deleted the fix/skip-s3-reupload-1397 branch February 17, 2026 19:01