Update training workflow to handle CRISPR data with multiple cell types #85

mayasheth · 2025-06-09T15:04:57Z

No description provided.

kaybrand

I like how you pulled the model directories out of the dataset directories. It is also nice that you supply a single merged CRISPRi training dataset to the train_model rule, using the CellType column to combine distinct datasets.

These are my suggestions from my first read-through:
In line 58 of the README, please correct 'saples' to 'samples'.
You use 'ct' and 'cd' in several variable names. Renaming them to 'cell_type' and 'crispr_dataset' (or something similar) would make the code more readable.
On line 19 of Snakefile_training, you convert config["results_dir"] to an absolute path. If it is already an absolute path, the results dir could end up with a duplicated prefix, something like /oak/stanford/groups/engreitz/Users/kaybrand/ENCODE_rE2G/oak/stanford/groups/engreitz/Users/kaybrand/ENCODE_rE2G/results. I suggest checking whether the path is already absolute (starts from the root) before converting it, creating the directory if it does not exist, and then saving the path to config["results_dir"].
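
To illustrate the last suggestion, here is a minimal sketch of the check. The helper name `resolve_results_dir` and its `base_dir` argument are hypothetical and not taken from the PR. I am also assuming the current code prepends a base directory to the configured path, so the actual logic on line 19 may need a different fix.

```python
import os


def resolve_results_dir(results_dir, base_dir):
    """Return an absolute results path, creating the directory if needed.

    Hypothetical helper: `base_dir` stands in for whatever directory the
    workflow treats as its root (an assumption, not from the PR).
    """
    # Only prepend base_dir when the configured path is relative, so an
    # already-absolute path is never prefixed a second time.
    if not os.path.isabs(results_dir):
        results_dir = os.path.join(base_dir, results_dir)
    results_dir = os.path.normpath(results_dir)
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(results_dir, exist_ok=True)
    return results_dir
```

In the Snakefile this could be called as `config["results_dir"] = resolve_results_dir(config["results_dir"], os.getcwd())`, with the second argument swapped for whichever base directory the workflow actually uses.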

Maya Sheth and others added 6 commits June 6, 2025 10:07

first attempt at implementation, untested

393b751

fix some typos

a0a10c6

allow for multiple crispr datasets

fc84661

update snakefile

88396be

update to fully working

9fa5b11

update readme and example configs

e8bc14a

kaybrand reviewed Jun 9, 2025

Maya Sheth and others added 5 commits August 7, 2025 10:29

fix pandas indexing

110e62f

Merge pull request #87 from EngreitzLab/ms_indexing_fix

e39d27e

fix indexing error in utils

fix typo

a85df94

clarify variable names

77d1e05

actually clarify variable names

6b81c64
