Produce better reduced data in nightly runs#231
Conversation
| def test_pipeline_save_data_to_disk(workflow, output_folder: Path): | ||
| workflow = powder.with_pixel_mask_filenames(workflow, []) |
There was a problem hiding this comment.
Is this the fixture you mentioned? If it leads to some unintentional test coupling/interaction it is worth investigating that better. What is the fixture scope? Should workflows from the fixture get copied instead of modified directly?
There was a problem hiding this comment.
The problem is not coupling between tests. It is this parametrised fixture:
It means that all tests depending onworkflow run once for each detector name. And they will override previous files. So only the file written in the last test to run will remain.(Note that the
workflow fixture is function-scoped. So there is no issue with coupling modifications from functions.)
IMHO, the change in this PR makes sense and makes the outcome predictable.
There was a problem hiding this comment.
👍 ok I misunderstood, so it was just that they all write to the same filename!
| def test_pipeline_save_data_to_disk(workflow, output_folder: Path): | ||
| workflow = powder.with_pixel_mask_filenames(workflow, []) |
There was a problem hiding this comment.
The problem is not coupling between tests. It is this parametrised fixture:
It means that all tests depending onworkflow run once for each detector name. And they will override previous files. So only the file written in the last test to run will remain.(Note that the
workflow fixture is function-scoped. So there is no issue with coupling modifications from functions.)
IMHO, the change in this PR makes sense and makes the outcome predictable.
There was a problem hiding this comment.
The new test runs on all CI and locally, right? That means that we have to download the large files now. How about only running this test in nightly? Or if it's too risky with detecting broken tests too late, how about using a small file in regular CI and a large file in a nightly run?
There was a problem hiding this comment.
I don't see having to download the large file locally as a big issue. With pooch, you probably only have to do this once. After that, the test runs quickly.
I was more concerned about having to download it every time on CI but I don't see a way around it.
Because of the above, I prefer to keep things simple?
There was a problem hiding this comment.
The test will also take a little while with a larger file. But I get the argument about simplicity.
In the nightly runs, we upload an artifact which contains a reduced CIF file from the Dream Geant4 simulation.
The data we used to save was:
endcap_forwarddetector (because the test was using a parametrized fixture and the last file that was written was from the endcap, overwriting files from previous banks)As a result, there has very little or no signal, and looked nothing like a d-spacing spectrum:

Here, we use the data files with more events, and use the

mantledetector bank.We also increase the number of bins from 200 to 2000, to get something that analysis hopefully can make better use of.
This was discovered by @AndrewSazonov