TPC QC: Add GPUErrorQA task#14397
Conversation
|
REQUEST FOR PRODUCTION RELEASES: This will add The following labels are available |
wiechula
left a comment
There was a problem hiding this comment.
Thanks @ariedel-cern . Please check for compiler warnings and fix them.
Detectors/TPC/qc/src/GPUErrorQA.cxx
Outdated
| auto f = std::unique_ptr<TFile>(TFile::Open(filename.data(), "recreate")); | ||
| TObjArray arr; | ||
| arr.SetName("GPUErrorQA_Hists"); | ||
| for (const auto& [name, hist] : mMapHist) { |
There was a problem hiding this comment.
Avoid warning.
| for (const auto& [name, hist] : mMapHist) { | |
| for ([[maybe_unused]] const auto& [name, hist] : mMapHist) { |
There was a problem hiding this comment.
Added. Looking at other PRs, the full CI seems to be broken at the moment. Testing locally, I see no compiler warnings.
|
Error while checking build/O2/fullCI_slc9 for 41445c2 at 2025-06-12 14:55: Full log here. |
|
Error while checking build/O2/fullCI_slc9 for 46787b5 at 2025-06-15 21:24: Full log here. |
Detectors/TPC/qc/src/GPUErrorQA.cxx
Outdated
| // get gpu error names | ||
| // copied from GPUErrors.h | ||
| static std::unordered_map<uint32_t, const char*> errorNames = { | ||
| #define GPUCA_ERROR_CODE(num, name, ...) {num, GPUCA_M_STR(name)}, |
There was a problem hiding this comment.
Can't you just include GPUErrors.h instead of copy & pasting the code?
There was a problem hiding this comment.
Hi David, the errorNames were defined in GPUError.cxx, not in the header. I was not sure if this was done on purpose, so that's why I copied the code.
I moved the definition now to the header, so I can include it in the QA task.
There was a problem hiding this comment.
Ah, I was confused due to your comment // copied from GPUErrors.h. Indeed, the definition was in the CXX, and I'd prefer it to stay in the CXX. No need to full the header file with the strings.
Could you put a
#ifndef GPUCA_GPUCODE_DEVICE
static std::unordered_map<uint32_t, const char*> errorNames
#endif
in GPUErrors.h, but leave the actual definition in the cxx.
And please also protect the <unordered_map> include in the header with the same ifdef.
Detectors/TPC/qc/src/GPUErrorQA.cxx
Outdated
| // for convienence, label each bin with the error name | ||
| for (size_t bin = 1; bin < mMapHist["ErrorCounter"]->GetNbinsX(); bin++) { | ||
| auto const& it = o2::gpu::errorNames.find(bin); | ||
| mMapHist["ErrorCounter"]->GetXaxis()->SetBinLabel(bin, it->second); |
There was a problem hiding this comment.
Just in case we will have a gap in the error numbers in the future, I would propose to check if it is a valid error Name.
|
Error while checking build/O2/fullCI_slc9 for 7b5264a at 2025-06-17 10:44: Full log here. |
davidrohr
left a comment
There was a problem hiding this comment.
The new approach looks good in general, but see the 2 comments below:
GPU/GPUTracking/Global/GPUErrors.cxx
Outdated
| memset(mErrors, 0, GPUCA_MAX_ERRORS * sizeof(*mErrors)); | ||
| } | ||
|
|
||
| static std::unordered_map<uint32_t, const char*> errorNames = { |
There was a problem hiding this comment.
Can you move this as static variable inside getErrorNames, so that it is not a global symbol?
Alternatively, please put it in o2::gpu::internal namespace.
| #include "GPUCommonDef.h" | ||
| #include "GPUDefMacros.h" | ||
| #include <unordered_map> | ||
|
|
There was a problem hiding this comment.
Still, you need to protect unordered_map with #ifndeg GPUCA_GPUCODE_DEVICE.
We do not include any std headers in GPU kernel code.
And I think you don't need GPUDefMacros.h, do you?
There was a problem hiding this comment.
Done. I also fixed up the includes.
|
Error while checking build/O2/fullCI_slc9 for 8f676bc at 2025-06-17 18:10: Full log here. |
|
Hi @davidrohr , thanks a lot for all the feedback. Is this PR now OK to be merged? I think the 2 failed checks are due to some issue with O2Physics. |
|
Hi @ariedel-cern : The FullCI error is genuine, you also have to #ifdef-protect |
Following up on #13964