HDDS-14070. set statemachine ready on election performed for follower#9671
HDDS-14070. set statemachine ready on election performed for follower#9671sumitagrawl wants to merge 4 commits intoapache:masterfrom
Conversation
szetszwo
left a comment
There was a problem hiding this comment.
@sumitagrawl , thanks for working on this! Please see the comments inlined.
| if (!refreshedAfterLeaderReady.get()) { | ||
| // refresh and validate safe mode rules if it can exit safe mode | ||
| // if being leader, all previous term transactions have been applied | ||
| // if other states, just refresh safe mode rules, and transaction keeps flushing from leader | ||
| // and does not depend on pending transactions. | ||
| refreshedAfterLeaderReady.set(true); |
There was a problem hiding this comment.
This is not atomic. Use
if (refreshedAfterLeaderReady.compareAndSet(false, true)) {| if (currentLeaderTerm.get() == term) { | ||
| // Means all transactions before this term have been applied. | ||
| // This means after a restart, all pending transactions have been applied. | ||
| // Perform | ||
| // 1. Refresh Safemode rules state. | ||
| // 2. Start DN Rpc server. | ||
| if (!refreshedAfterLeaderReady.get()) { |
There was a problem hiding this comment.
Similarly, it needs compareAndSet.
BTW, why refreshedAfterLeaderReady won't be set to false after set to true?
There was a problem hiding this comment.
We do not move back to safemode once it exit, even later rules are not satisfied later. that is the reason we are not setting to false.
This is the best effort on startup only to wait till system moves to healthy.
| StorageContainerManager mockScmManager = mock(StorageContainerManager.class); | ||
| SCMHAManager mockScmhaManager = mock(SCMHAManager.class); | ||
| when(mockScmManager.getScmHAManager()).thenReturn(mockScmhaManager); | ||
| SCMRatisServer mockScmRatisServer = mock(SCMRatisServer.class); | ||
| when(mockScmhaManager.getRatisServer()).thenReturn(mockScmRatisServer); | ||
| SCMStateMachine mockScmStateMachine = mock(SCMStateMachine.class); | ||
| when(mockScmRatisServer.getSCMStateMachine()).thenReturn(mockScmStateMachine); | ||
| when((mockScmStateMachine.isRefreshedAfterLeaderReady())).thenReturn(true); | ||
| scmContext = new SCMContext.Builder().setSCM(mockScmManager).build(); |
There was a problem hiding this comment.
Please don't use mock. We need to a real cluster test which test SCM changing leader multiple times.
There was a problem hiding this comment.
added integration test case also
priyeshkaratha
left a comment
There was a problem hiding this comment.
Thanks @sumitagrawl for the patch. Changes LGTM
szetszwo
left a comment
There was a problem hiding this comment.
+1 the change looks good.
What changes were proposed in this pull request?
Exit of safemode status for follower when all rules are satisfied. This is as:
This change in reference to below point in JIRA HDDS-5263:
But once the SCM Ratis server started it will replay logs from Transactioninfo last applied Index, so after that I see all pipelines are removed. (might be due to close pipeline)So this have no issue for leader node once choosen during leader election. And for follower node, no need follow same restriction as dynamically the pipeline can be closed on runtime and same can be created by leader continuously.
This check is present during startup to ensure SCM is writable and allow time to create pipeline as best effort and its not true always for the case of pipeline closure later on due to system unhealthy or other reason (System never moves back to Safemode later on as current behavior for any dynamic change later on).
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14070
How was this patch tested?