Skip to content

Conversation

@vikaskurapati
Copy link
Contributor

@vikaskurapati vikaskurapati commented Jan 26, 2026

On SuperMUC Phase 2, there were two double-free or Invalid Pointer Access errors, which caused the program to exit with a Segmentation fault (SIGSEGV). The issue was in freeMem method. It was crashing when we tried to destruct the map this->currentMemoryToSizeMap if it had already been destructed in another destructor. I modified the checks to return when we encounter empty objects.

@vikaskurapati vikaskurapati requested review from davschneller and removed request for davschneller January 26, 2026 08:27
@vikaskurapati vikaskurapati changed the title Chcek if currentMemoryToSizeMap is a nullptr for SYCL memory interface. Check if currentMemoryToSizeMap is a nullptr for SYCL memory interface. Jan 26, 2026
@vikaskurapati vikaskurapati force-pushed the vikas/fix-supermuc-segfault branch from 0ceee95 to 9701510 Compare January 26, 2026 09:31
@vikaskurapati vikaskurapati changed the title Check if currentMemoryToSizeMap is a nullptr for SYCL memory interface. Fix destructor Seg Faults for SYCl Jan 26, 2026
@vikaskurapati vikaskurapati force-pushed the vikas/fix-supermuc-segfault branch from 9701510 to 135acc1 Compare January 26, 2026 09:40
formatting
@vikaskurapati vikaskurapati force-pushed the vikas/fix-supermuc-segfault branch from 135acc1 to 00ba5f1 Compare January 26, 2026 09:42
@davschneller davschneller changed the title Fix destructor Seg Faults for SYCl Fix destructor Seg Faults for SYCL Jan 26, 2026
Copy link
Contributor

@davschneller davschneller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM (IMO almost merge-ready); but might it be possible to get a more exact picture of what's going on?
I.e. the probably statistics get deleted—and there's still memory to be de-allocated.
Maybe we could move the deletion of the statistics to the destructor?


// Use the first device context to free memory
DeviceContext* context = this->availableDevices[0];
if (!context)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add brackets

}

// Use the first device context to free memory
DeviceContext* context = this->availableDevices[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the current device ID

@vikaskurapati
Copy link
Contributor Author

Generally LGTM (IMO almost merge-ready); but might it be possible to get a more exact picture of what's going on? I.e. the probably statistics get deleted—and there's still memory to be de-allocated. Maybe we could move the deletion of the statistics to the destructor?

On SuperMUC NG phase 2, I was running a few cases to study scaling, but it always ended with a Segfault. The backtrace looked like this

Thread 1 "SeisSol_Release" received signal SIGSEGV, Segmentation fault.

0x00001555491512be in device::ConcreteAPI::freeMem(void*) () from /dss/dsshome1/00/ge26cet3/seissol/build-pvc/submodules/Device/libdevice_Release_oneapi.so

#0  0x00001555491512be in device::ConcreteAPI::freeMem(void*) () from /dss/dsshome1/00/ge26cet3/seissol/build-pvc/submodules/Device/libdevice_Release_oneapi.so

#1  0x00000000004aea54 in seissol::writer::EnergyOutput::~EnergyOutput() ()

#2  0x000000000044b505 in seissol::SeisSol::~SeisSol() ()

#3  0x00000000004341aa in main ()

I thought that the currentMemoryToSizeMap was being destroyed and then being used again in freeMem method. I tried adding a check for that too in the if(devPtr != nullptr) condition, but then CI tests involving acpp complained that it wasn't a pointer, and I had to modify it to what it is now to get it fixed.

Probably, you are right, the API is still new to me, if you think that is a better solution, we could try that.

@vikaskurapati vikaskurapati force-pushed the vikas/fix-supermuc-segfault branch from 16354c3 to 63d75dc Compare January 26, 2026 13:44
@vikaskurapati vikaskurapati force-pushed the vikas/fix-supermuc-segfault branch from 63d75dc to 3f490be Compare January 26, 2026 13:48
@davschneller davschneller merged commit 772be05 into master Jan 27, 2026
17 checks passed
@davschneller davschneller deleted the vikas/fix-supermuc-segfault branch January 27, 2026 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants