HDDS-14020. Fix Deadlock in Background service shutdown#9382
HDDS-14020. Fix Deadlock in Background service shutdown#9382swamirishi wants to merge 2 commits intoapache:masterfrom
Conversation
Change-Id: Iaf49655e17a210e39d64aefc88cd6d0acd38100c
c3f8c1c to
d2e9c1c
Compare
Change-Id: Ie07003d6db643b63c4be87fcd015ef72728de40d
| synchronized (BackgroundService.this) { | ||
|
|
||
| try { | ||
| semaphore.acquire(); |
There was a problem hiding this comment.
background task are added to async executor, and comes out of loop. it will be fraction of second to be in this block, and task count will be max 10s in number. So this is not a problem.
There was a problem hiding this comment.
No this would be a problem because the shutdown thread would wait for the threads to finish for the tasks already present in the queue of the threadPoolExecutor. Those have to finish first. Right now the shutdown function would always end up doing a force shutdown if the interval is small enough.
This wait would happen unnecessarily everytime.
|
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days. |
|
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days. |
|
Closing this one in favor of #9390 |
What changes were proposed in this pull request?
Currently both background task and shutdown acquire a synchronized lock which can lead to a deadlock on shutdown which means the threadPool shutdown takes time unnecessarily. The shutdown thread is waiting for the task to finish while acquiring a synchronized lock and task requires a lock to finish which is not possible.
The synchronized block on task was taken to ensure there are no parallel tasks that run together. Introducing a semaphore with 1 would solve this issue.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14020
How was this patch tested?
Unit test in Defrag service finishes faster after this which takes a lot of time to shutdown otherwise