Add resume support to HybridTop simulation unit#1774
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1774 +/- ##
==========================================
- Coverage 95.54% 93.18% -2.37%
==========================================
Files 197 197
Lines 17046 17060 +14
==========================================
- Hits 16287 15897 -390
- Misses 759 1163 +404
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
| or (system.getNumForces() != sampler_system.getNumForces()) | ||
| or (system.getNumParticles() != sampler_system.getNumParticles()) | ||
| or (system.getNumConstraints() != sampler_system.getNumConstraints()) |
There was a problem hiding this comment.
I like this check but could we go even further and check more things, like comparing the seralised xmls of the systems that should compare every force and parameter in the system, assuming ordering is consistent?
There was a problem hiding this comment.
Ordering is not consistent unfortunately - the thermodynamic state object adds/pops forces, which reorders things.
There was a problem hiding this comment.
Do you have a particular edge case in mind here? I went for a lightweight check as purely as set of guardrails for our users - i.e. the likelihood of a clash like this is rather low.
There was a problem hiding this comment.
If the order changes, could we zip over the matched forces and compare the seralised versions of just that force? If we are doing this, why not check everything if its possible to be sure?
One case that this might miss is if the openmm version is different between restarts should we be worried about this?
| trajectory = shared_path / output_settings.output_filename | ||
| checkpoint = shared_path / output_settings.checkpoint_storage_filename | ||
|
|
||
| if trajectory.is_file() and checkpoint.is_file(): |
There was a problem hiding this comment.
Do we need to check if there is anything in the files, what happens if they were made but are empty?
There was a problem hiding this comment.
That's already handled by multisate reporter.
There was a problem hiding this comment.
To explain this a little bit more - in that case the reporter/sampler will fail saying it can't read the file.
One option would be to recover by turning off the restart and trying from scratch again in the unit, in practice the only way you could do that is by deleting files - which could be dangerous. Instead, given how rare this situation should be, the way I think it should recover is by executing a fresh version of the unit again, that will give you fresh new inputs that aren't likely corrupted without the problematic case of deleting files.
There was a problem hiding this comment.
Okay as long as that is covered somewhere and an error is raised thats fine, even if the user has to delete the files and run again in the meantime before we have an automated solution.
There was a problem hiding this comment.
I'll add a specific test for this - it's easy enough to make it fail on an empty file.
There was a problem hiding this comment.
Can there be any cases where one of the two files is corrupted , leading to a failure/things being out of sync, or is that also handled by the multistatereporter?
There was a problem hiding this comment.
files are empty
If the files are actually empty, then it will fail with an OSError
checkpoint file is missing
This is actually a case that shouldn't happen - we shouldn't be in a situation where either file is missing, I have added a check in _check_restart to cover that case and raise an IOError.
one of the two files is corrupted
If either of the files are corrupted & unreadable then you will get an OSError (i.e. NetCDF won't be able to recognise the file).
There's no real way to look for bit flips (we don't store checksums or anything).
checkpoint is not associated with the trajectory
That's checked by multistatereporter (I won't add a test for it).
|
pre-commit.ci autofix |
|
🚨 API breaking changes detected! 🚨 |
Blocked by #1773
Fixes #1722
TODO:
Checklist
newsentry, or the changes are not user-facing.pre-commit.ci autofixbefore requesting review.Manual Tests: these are slow so don't need to be run every commit, only before merging and when relevant changes are made (generally at reviewer-discretion).
Developers certificate of origin