@cmeesters cmeesters commented Nov 14, 2025

Porting the workflow to a new cluster. It turned out that many design decisions were faulty. In particular, the need for so many absolute paths is gone.

Note: the porting and fixing are only half done. The remaining changes will be more fine-grained.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added ZINC and ENAMINE database integration with mirror support and URL validation
    • Introduced re-screening functionality with metadata tracking
    • Enhanced logging and error diagnostics throughout workflow execution
  • Chores

    • Updated HPC (SLURM) resource configurations for improved cluster scheduling
    • Reorganized output directory structure for streamlined data management
    • Updated toolchain and environment version requirements

coderabbitai bot commented Nov 14, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This PR restructures a Snakemake-based drug screening workflow, replacing configuration-driven output directories with hardcoded "results" paths, refactoring ZINC download logic from HTTP requests to curl-based chunking with local assembly, introducing new docking preparation rules with spacing calculations, and adding comprehensive logging across preparation and analysis rules.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & Profiles: config/config.yaml, profiles/Mogon-NHR/config.yaml | Config restructuring: introduces EXPERIMENT_NAME, removes directory configs (PREPARED_DATA_DIR, etc.), adds ZINC-specific settings (ZINC_MIRROR, TRANCHES), adds ENAMINE_INPUT and RESCREENING metadata, updates TARGETS format with PDB_ID/CHAINS examples. Profile adds SLURM executor, account, and runtime configurations. |
| Workflow Core: workflow/Snakefile | Removes global config vars (INPUT_DIR, PREPARED_LIGAND_DIR, OUTPUT_DIR, TEMP_DATA_DIR, SUBSET, RESCREENING_TARGETS), replaces OUTPUT_DIR paths with hardcoded "results", deletes localrules block, reorders rule includes. |
| Analysis Rules: workflow/rules/analyse.smk | Renames library → library_files function, introduces robust ZINC URL checking with check_zinc_url and url_reachable helpers supporting mirrors and user overrides, refactors path construction to use "docking"/"rescreening" directories, expands ZINC_INPUT dataset/name combinations. |
| Docking Rules: workflow/rules/docking.smk | Adds get_spacing() parser for grid files, introduces prepare_docking_local and prepare_docking_ligand local rules, refactors docking rule to consume prepared inputs with dynamic spacing calculation, updates aggregate_in and mergeDocking for new directory structure, switches MPI launcher to mpiexec. |
| Preparation Rules: workflow/rules/preparation.smk | Relocates outputs from INPUT_DIR/TMP_DIR to project-relative paths ("PDB", "receptor", "scratch", "prepared", "minimized", "grid", "library"), adds comprehensive logging to all rules, converts several outputs to temporary (temp/scratch-based), updates rule inputs to reference new identifiers, introduces explicit error handling. |
| Download Scripts: workflow/scripts/ENAMINEdownload.py, workflow/scripts/ZINCdownload.py | ENAMINEdownload: replaces INPUT_DIR with DATABASE base path. ZINCdownload: complete rewrite from requests-based to curl-based chunking, adds itertools for tranche/subset combinations, implements local hash tracking (hashes.txt), adds per-chunk download with SHA-256 checksums, implements assembly logic to decompress/concatenate chunks. |
| Preparation Scripts: workflow/scripts/prepareDocking.py, workflow/scripts/prepareReceptor.py | prepareDocking: breaks output_directory assignment (references undefined snakemake.output). prepareReceptor: adds tempfile import, gzip decompression for .gz inputs, enhanced debugging logs for unzipping/chain handling, redirects stdout/stderr to log. |
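
The gzip handling noted for prepareReceptor could look roughly like the sketch below. This is not the code from the PR: the helper name `materialize_plain` is hypothetical, and the choice to keep the inner file extension on the temporary file is an assumption about what downstream tools need.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def materialize_plain(path):
    """Return a path to an uncompressed copy of `path`.

    Hypothetical helper: if `path` ends in .gz, decompress it into a
    temporary file and return that file's path; otherwise return `path`
    unchanged. The caller is responsible for deleting the temporary file.
    """
    path = Path(path)
    if path.suffix != ".gz":
        return path
    # Keep the inner suffix (e.g. ".pdb" from "1abc.pdb.gz") so tools that
    # dispatch on the file extension still recognise the format.
    inner_suffix = Path(path.stem).suffix
    with tempfile.NamedTemporaryFile(suffix=inner_suffix, delete=False) as tmp:
        with gzip.open(path, "rb") as src:
            shutil.copyfileobj(src, tmp)
    return Path(tmp.name)
```

Streaming with `shutil.copyfileobj` avoids loading a large structure file into memory at once, which may matter on shared cluster nodes.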

Sequence Diagram(s)

sequenceDiagram
    participant Config
    participant Snakefile
    participant Rules
    participant Scripts
    
    Note over Config,Scripts: OLD FLOW: Config-driven Paths
    Config->>Snakefile: OUTPUT_DIR, INPUT_DIR, TMP_DIR
    Snakefile->>Rules: Global vars (INPUT_DIR, OUTPUT_DIR, etc.)
    Rules->>Scripts: Path references via config
    Scripts->>Scripts: Write to TMP_DIR, INPUT_DIR, OUTPUT_DIR
    
    Note over Config,Scripts: NEW FLOW: Hardcoded Relative Paths
    Config->>Snakefile: EXPERIMENT_NAME, ZINC_MIRROR
    Snakefile->>Rules: Hardcoded "results", "scratch", "docking"
    Rules->>Rules: prepare_docking_local/ligand (new)
    Rules->>Scripts: Direct path strings
    Scripts->>Scripts: Write to "scratch", "prepared", "minimized", "grid"
sequenceDiagram
    participant Config
    participant ZINCdownload
    participant ZINC_Server
    participant Local_Storage
    
    Note over Config,Local_Storage: OLD: Single HTTP Request
    Config->>ZINCdownload: dataset, name
    ZINCdownload->>ZINC_Server: requests.get(URL/file.pdbqt.gz)
    ZINC_Server-->>ZINCdownload: Single large file
    ZINCdownload->>Local_Storage: Save to INPUT_DIR/ZINC/...
    
    Note over Config,Local_Storage: NEW: Chunked Download with Assembly
    Config->>ZINCdownload: WEIGHT, LOGP, REACT, PURCHASE, PH, CHARGE
    ZINCdownload->>ZINCdownload: Generate tranche/subset combinations
    loop For Each Chunk
        ZINCdownload->>ZINC_Server: curl (with mirror support)
        ZINC_Server-->>ZINCdownload: Chunk file
        ZINCdownload->>Local_Storage: Store chunk + SHA-256 (hashes.txt)
    end
    ZINCdownload->>ZINCdownload: Decompress + concatenate chunks
    ZINCdownload->>Local_Storage: Final gzip assembly + checksum
    ZINCdownload->>Local_Storage: Cleanup chunk files
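
The hashing, assembly, and cleanup steps of the new flow can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the rewritten ZINCdownload.py: the function names are hypothetical, the download itself (curl in the PR) is left out, and the `digest  filename` line format for hashes.txt is assumed rather than taken from the script.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def assemble_chunks(chunks, target, hash_file):
    """Concatenate gzip-compressed chunks into a single gzip file.

    Chunk digests are recorded first, then the chunks are decompressed and
    re-compressed into `target`, whose digest is recorded as well. Chunk
    files are deleted only after the target has been written completely.
    """
    chunks = [Path(c) for c in chunks]
    with open(hash_file, "a") as hashes:
        for chunk in chunks:
            # Assumed line format: "<digest>  <file name>"
            hashes.write(f"{sha256_of(chunk)}  {chunk.name}\n")
    with gzip.open(target, "wb") as out:
        for chunk in chunks:
            with gzip.open(chunk, "rb") as src:
                shutil.copyfileobj(src, out)
    with open(hash_file, "a") as hashes:
        hashes.write(f"{sha256_of(target)}  {Path(target).name}\n")
    for chunk in chunks:
        chunk.unlink()
    return target
```

Deleting the chunks last means a failure during assembly leaves them on disk, so a retry does not need to download them again.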

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Critical areas requiring attention:
    • workflow/scripts/prepareDocking.py: Breaking change in output_directory assignment (snakemake.output. is incomplete/undefined); this will cause runtime failures if the output directory is accessed.
    • workflow/rules/docking.smk: New get_spacing() function with WorkflowError handling requires validation of grid file parsing logic and edge cases.
    • workflow/scripts/ZINCdownload.py: Complete rewrite with curl-based chunking, local hash tracking, and assembly logic is complex and requires verification of download robustness, checksum handling, and cleanup on failure.
    • workflow/rules/preparation.smk: Pervasive path restructuring with new directory hierarchy; all output paths must be verified for consistency across dependent rules.
    • workflow/rules/analyse.smk: New ZINC URL checking with redirects and fallback logic; verify user prompt handling and fallback to local data.
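
For the grid-file parsing flagged under docking.smk, a defensive parser might look like the sketch below. The file format is an assumption (a line of the form `spacing 0.375`, optionally followed by a `#` comment), since the review does not show the actual grid files. The sketch raises `ValueError` to stay free of a Snakemake import, whereas the rule in the PR reportedly raises WorkflowError.

```python
import math


def get_spacing(grid_file):
    """Return the grid spacing declared in `grid_file` as a float.

    Assumed format: a line such as "spacing 0.375  # comment".
    Raises ValueError if the line is missing, the value is not numeric,
    or the value is not a finite positive number.
    """
    with open(grid_file) as fh:
        for line_no, line in enumerate(fh, start=1):
            # Drop trailing comments, then split into whitespace fields.
            fields = line.split("#", 1)[0].split()
            if len(fields) >= 2 and fields[0].lower() == "spacing":
                try:
                    spacing = float(fields[1])
                except ValueError:
                    raise ValueError(
                        f"{grid_file}:{line_no}: spacing is not a number: "
                        f"{fields[1]!r}"
                    ) from None
                if not math.isfinite(spacing) or spacing <= 0:
                    raise ValueError(
                        f"{grid_file}:{line_no}: spacing must be finite and positive"
                    )
                return spacing
    raise ValueError(f"{grid_file}: no 'spacing' line found")
```

The edge cases worth checking in review are the ones handled explicitly here: a missing line, a non-numeric value, and zero, negative, or non-finite spacing.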

Possibly related PRs

Poem

🐰 Through scratch and docking dirs we hop,
With ZINC chunks downloaded—curl won't stop!
Grid spacing parsed, logs now run deep,
Results hardcoded, our paths to keep.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/profile_and_localizing

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5443dad and 0c46ed2.

📒 Files selected for processing (10)
  • config/config.yaml (2 hunks)
  • profiles/Mogon-NHR/config.yaml (1 hunks)
  • workflow/Snakefile (4 hunks)
  • workflow/rules/analyse.smk (20 hunks)
  • workflow/rules/docking.smk (2 hunks)
  • workflow/rules/preparation.smk (8 hunks)
  • workflow/scripts/ENAMINEdownload.py (1 hunks)
  • workflow/scripts/ZINCdownload.py (1 hunks)
  • workflow/scripts/prepareDocking.py (1 hunks)
  • workflow/scripts/prepareReceptor.py (2 hunks)


@cmeesters cmeesters merged commit b955211 into main Nov 14, 2025
2 of 5 checks passed