Parallel checkpointing #1810
base: v5.0
…nt folder deleted
Checkpointing is not working for me (tested on macOS). To test it, I created a new configuration file using […]
Then I ran the workflow on a dataset with 19 images: […]
This is what I see inside the temp directories: […]

@nfahlgren Thank you for catching that. I think I broke it at some point around when I switched the order of […]
The fix that I went with is to define a new attribute […]
Testing with a workflow that should find 38 images: […]
Describe your changes
Implements parallel checkpointing using a new attribute of `WorkflowConfig` and `jupyterconfig` called `checkpoint`. Most changes are downstream of changing `parallel.workflow_inputs` from a function into a class, so that when it is initialized with the CLI arguments from a parallel job it uses them to touch a dummy file recording that the workflow was attempted. When `workflow_inputs.result` is used (the getter is called), the checkpointing file is updated to "complete". Those changes are used throughout `parallel` to support two use cases: (1) checkpointing, where your jobs all get killed by a server error or similar, and (2) continuous/interim analysis, where you add images over time and run the same workflow on them but don't need to reanalyze the images that were already analyzed.
Maybe the `checkpoint` directory should be renamed to something less likely to exist, like `_PCV_PARALLEL_CHECKPOINT_`? I'd hate to accidentally delete someone's folder because they didn't read the docs.
Type of update
This is a new feature.
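The attempted/complete marker idea from the description above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `WorkflowInputsSketch`, the helper `pending_jobs`, the one-JSON-file-per-job layout, and the `status` key are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


class WorkflowInputsSketch:
    """Hypothetical stand-in for the workflow inputs class; not PlantCV's actual code."""

    def __init__(self, checkpoint_dir, job_name, result_path):
        self._result = result_path
        # Assumed layout: one small marker file per job inside the checkpoint directory
        self._marker = Path(checkpoint_dir) / f"{job_name}.json"
        self._marker.parent.mkdir(parents=True, exist_ok=True)
        # Record that the workflow was attempted as soon as the inputs are created
        self._marker.write_text(json.dumps({"status": "attempted"}))

    @property
    def result(self):
        # Calling the getter is treated as the signal that the workflow
        # reached its output step, so the marker is updated to "complete"
        self._marker.write_text(json.dumps({"status": "complete"}))
        return self._result


def pending_jobs(checkpoint_dir, job_names):
    """Return the job names that have no 'complete' marker and so still need to run."""
    pending = []
    for name in job_names:
        marker = Path(checkpoint_dir) / f"{name}.json"
        if not marker.exists():
            pending.append(name)
            continue
        if json.loads(marker.read_text()).get("status") != "complete":
            pending.append(name)
    return pending


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = WorkflowInputsSketch(tmp, "img1", "img1_results.json")
        WorkflowInputsSketch(tmp, "img2", "img2_results.json")
        _ = first.result  # img1 reaches its output step; img2 is "killed" before that
        print(pending_jobs(tmp, ["img1", "img2", "img3"]))
```

Under these assumptions, a rerun would skip `img1` and process `img2` (attempted but never completed) and `img3` (added later), which corresponds to the two use cases named in the description.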
Associated issues
Closes #1807
Additional context
See #1807 for some comments and examples on the implementation. This builds on changes from the `jupyter-parallelization` branch from PR 1803.
Pretty sure the data frame creation on lines 71:84 is less than ideal; someone with more pandas/`json.load` experience might have a much better way to do that (i.e., reviewer, please help).
For the reviewer
See this page for instructions on how to review the pull request.
`plantcv/mkdocs.yml`
`updating.md`