Skip to content

Conversation

@Jjm321814
Copy link
Contributor

A few bugs existed in old versions of blip reco

  1. Running blipreco on mostly empty event displays (somewhat common in simple MC productions of single neutrino events) would lead to empty hit collections being accessed and a subsequent memory issue/crash (Not specifically a segfault though).
  2. A few blip-related hitcluster variables were not copying quite correctly. Effected variables are blip.clusters[i].blipID, and blip.clusters[i].isMatched, which both had default values. The independently saved clusters objects were correctly updated.
  3. The MC truth PDG values for blip-energy deposits have puzzling labels and need their evaluation looked into.

The first bug was related to a hardcoded hitHandleGH label, which looked at gaushit rather than specialBlipGausHit. On ~emtpy events the lower threshold (specialBlipGausHit) will have a few entries, but the code uses hitHandleGH for truth matching, and that higher threshold hit collection can be empty. That leads to crashes.
For now the hardcoded label is just updated to accommodate specialBlipGausHit.

The second bug is not critical, as access hitclusters this way is done through a blip, so the ID is the same as the blip you are looking at and the clusters included must be matched (into this blip). The hitcluster collection values are updated after the struct assignment (copy constructor), so I updated the blip alg to also update the blip.Cluster[i].blipID, and isMatched.

The third bug I am still digging into. I was worried that the switch from sim::energy_deposition to sim::channel->IDE was causing issues, but in the new arrangement (with an optional interface to use either) gives consistent results between them.

I have some notes I can build into slides for the first two but would like to spend a little more time on the third bug.

@Jjm321814 Jjm321814 self-assigned this Jan 20, 2026
@Jjm321814 Jjm321814 added bug Something isn't working reco1/reco2 Reconstruction labels Jan 20, 2026
@linyan-w linyan-w moved this to Waiting on Reviewer in SBND 2025 Fall Production Jan 21, 2026
@Jjm321814
Copy link
Contributor Author

https://docs.google.com/presentation/d/1igXPcS_Qst99layiHuMEZiUxz4EHMxGHutf7Mr-jbT0/edit?slide=id.g3bbdbff95f2_0_1#slide=id.g3bbdbff95f2_0_1 here is a useful overview set of slides for these changes. Sorry for the many commits most are just adding/removing debug.

For the hardcoded gaushit parameter the change in this PR is sufficient for any standard running someone does/for Fall production. Longterm I think these should not be hardcoded and just use the fcl-provided hit collection, and we assume the user knows what they are doing.

The existing MC truth back tracking works. I just added a few additional breaks in MC truth blips. I found a way to get labeling I am totally happy with but the performance hit is too large for anything other than a small production aimed specifically at blips.

I am happy with the current state of the MCTruth matching in this PR now. I promise some future slides looking at how the MC matching is doing in Fall production pre/post this PR in the next ~week

Copy link
Member

@henrylay97 henrylay97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jacob, thanks for the PR. I have a few comments.

My main concern is around the fix for the empty event problem. Rather than finding the root cause of the problem, the solution seems to be to provide a second vector that you hope won't be empty as well.

I would rather the actual cause of the segfault be found and prevented. This would also remove any need for hard coded module labels.

The rest looks good to me, happy to discuss more offline!

Comment on lines 388 to 389
if(evt.getByLabel("specialblipgaushit",hitHandleGH)) art::fill_ptr_vector(hitlistGH, hitHandleGH);
else if(evt.getByLabel("gaushit",hitHandleGH)) art::fill_ptr_vector(hitlistGH, hitHandleGH);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module labels should never be hard coded. I appreciate this was true before this PR but given you're touching the code at this point it needs fixing.

I also don't understand the use of two different hit collections - only one of them will have been an input to the BlipReco and so only one of them is a valid input for the truth matching. If the code crashes on an empty vector somewhere then the fix should be a catch to stop that happening rather than the providing of a different vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the hitHandleGH, and will remove the other spots that gaushit gets a special default privileging. It will just be the user's responsibility to make sure they have ran all the necessary inputs

… some branches to assume the hitHandle the user provided was run through truthmatching and pandoraTrack
@Jjm321814
Copy link
Contributor Author

Thanks for the suggestions, most are in. Let me check compilation and a simple output in a typical Fall production run

@Jjm321814
Copy link
Contributor Author

Merged in develop and the compiler works and entries in the reco2 files look good.

After getting the normal successful processing table I got this error
*** Error in `lar': corrupted size vs. prev_size: 0x000000002a0308b0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x80c37)[0x7fb0b2cb2c37]
/lib64/libc.so.6(+0x8120e)[0x7fb0b2cb320e]
/lib64/libc.so.6(+0x39ce9)[0x7fb0b2c6bce9]
/lib64/libc.so.6(+0x39d37)[0x7fb0b2c6bd37]
/lib64/libc.so.6(__libc_start_main+0xfc)[0x7fb0b2c5455c]
lar[0x4012d0]
Followed by a long memory map. I don't think this is my fault.

@Jjm321814
Copy link
Contributor Author

Fixed the above error.
After merging in develop it became v10_14_02_01 and my localproducts setup script was still asking for v10_14_02. Fixing that removed the error on exit.

@henrylay97
Copy link
Member

Thanks so much for responding so quickly. Looking much better.

I have a question. Will this now not segfault on empty events? Regardless of the label being applied.

I note you say: "I have removed the hitHandleGH, and will remove the other spots that gaushit gets a special default privileging. It will just be the user's responsibility to make sure they have ran all the necessary inputs"

Does this mean if the user mislabels the input we get a segfault? If so we need to avoid that and instead have a more graceful failure mode :)

@Jjm321814
Copy link
Contributor Author

I'll check that now. Noting down the possible combinations of wrong label:

  1. Hitlabel doesn't exist
  2. Track label and hit label are from different sets (regular pandora and special blip hits for example)
  3. MC Truth Matching done on a different hit set
  4. Track label doesn't exist
  5. MC truth label doesn't exist
    I'll check each combination and report back how it exits

@Jjm321814
Copy link
Contributor Author

Jjm321814 commented Jan 29, 2026

We don't ever segfault. But we do have some cases where a normal output will be produced and the user must be careful that the fcl used to make it was correct.

For case 1

%MSG-s ArtException:  PostEndJob 29-Jan-2026 09:25:51 CST ModuleEndJob
---- EventProcessorFailure BEGIN
  EventProcessor: an exception occurred during current event processing
  ---- ScheduleExecutionFailure BEGIN
    Path: ProcessingStopped.
    ---- ProductNotFound BEGIN
      Found zero products matching all selection criteria
        C++ type: std::vector<recob::Hit>
        Module label: 'specialblipgaushitIdontexist'
        Product instance name: ''
        Process name: (empty)
      The above exception was thrown while processing module BlipRecoProducer/blipreco run: 1001 subRun: 18 event: 1
    ---- ProductNotFound END
    Exception going through path reco2
  ---- ScheduleExecutionFailure END
---- EventProcessorFailure END
---- FatalRootError BEGIN
  Fatal Root Error: TTree::SetEntries
  Tree branches have different numbers of entries, eg EventAuxiliary has 0 entries while sim::AuxDetSimChannels_genericcrt__G4. has 13 entries.
  ROOT severity: 2000
---- FatalRootError END

For case 2
The code seems to just run normally. The track input is really just to skip a few hits that are not-blip-like. Different indexing in the two track associations may cause some hits that are really blips to be skipped. This would be a tricky bug to catch so its important the fcl are ironed out.

For case 3:
Runs normally. Inspection of output should give you some clue that things aren't working well. But again this is a be careful with fcl case.

For case 4:
This one actually just runs normally and I would expect basically correct outputs. It is mildly slower, but not enough for me to notice in the time I did this check. It will just run over extra hits that form tracks and notice they don't look blip like, so it will end up dropping the cluster.

For case 5:

%MSG-s ArtException:  PostEndJob 29-Jan-2026 09:54:03 CST ModuleEndJob
---- EventProcessorFailure BEGIN
  EventProcessor: an exception occurred during current event processing
  ---- ScheduleExecutionFailure BEGIN
    Path: ProcessingStopped.
    ---- LogicError BEGIN
      Invalid FindMany
      ---- ProductNotFound BEGIN
        Found zero products matching all selection criteria
          C++ type: art::Assns<recob::Hit,simb::MCParticle,anab::BackTrackerHitMatchingData>
          Module label: 'blipgaushitTruthMatchwjejefiej'
          Product instance name: ''
          Process name: (empty)
      ---- ProductNotFound END
      Attempt to use a FindMany where the underlying art::Assns product was not found.The above exception was thrown while processing module BlipRecoProducer/blipreco run: 860 subRun: 71 event: 1
    ---- LogicError END
    Exception going through path reco2
  ---- ScheduleExecutionFailure END
---- EventProcessorFailure END
---- FatalRootError BEGIN
  Fatal Root Error: TTree::SetEntries
  Tree branches have different numbers of entries, eg EventAuxiliary has 0 entries while sim::AuxDetSimChannels_genericcrt__G4. has 22 entries.
  ROOT severity: 2000
---- FatalRootError END
%MSG


Copy link
Member

@henrylay97 henrylay97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant. Thanks for the detailed checks Jacob! There is always some need for analysers / devs / release management to be careful with fcl labels so the responses to 2 and 3 are reasonable. Overall I'm happy that this will fail gracefully if at all.

As long as you're happy with the production fcl state then I'm happy with this now :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working reco1/reco2 Reconstruction

Projects

Status: Waiting on Reviewer

Development

Successfully merging this pull request may close these issues.

4 participants