HDDS-14070. set statemachine ready on election performed for follower by sumitagrawl · Pull Request #9671 · apache/ozone

sumitagrawl · 2026-01-26T20:01:54Z

What changes were proposed in this pull request?

Allow a follower to exit safe mode once all safe mode rules are satisfied. The reasoning:

For the leader, completion of applyTransaction is implicit before it is chosen as leader.
For a follower, the safe mode exit rules can be evaluated directly, because transaction sync and apply keep happening from the leader node.

This change relates to the following point in JIRA HDDS-5263:
But once the SCM Ratis server started it will replay logs from Transactioninfo last applied Index, so after that I see all pipelines are removed. (might be due to close pipeline)

So this is not an issue for the leader node once it is chosen during leader election. A follower node does not need to follow the same restriction, because pipelines can be closed dynamically at runtime and the leader can continuously create new ones.
The check exists at startup to ensure SCM is writable and to allow time to create pipelines on a best-effort basis. It does not always hold later, when pipelines are closed because the system is unhealthy or for other reasons (in the current behavior, the system never moves back to safe mode after any later dynamic change).

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14070

How was this patch tested?

Verified the impact on existing test cases.
Added a new integration test case for follower exit validation (verified that before the fix the follower never exits safe mode).

szetszwo

@sumitagrawl , thanks for working on this! Please see the comments inlined.

szetszwo · 2026-02-01T17:34:13Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java

+    if (!refreshedAfterLeaderReady.get()) {
+      // refresh and validate safe mode rules if it can exit safe mode
+      // if being leader, all previous term transactions have been applied
+      // if other states, just refresh safe mode rules, and transaction keeps flushing from leader
+      // and does not depend on pending transactions.
+      refreshedAfterLeaderReady.set(true);


This is not atomic. Use

if (refreshedAfterLeaderReady.compareAndSet(false, true)) {
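
To illustrate the concern, here is a minimal standalone sketch, not the SCMStateMachine code. The class `RefreshOnceSketch`, its methods, and the counter are hypothetical names used only for this example. With a separate `get()` followed by `set(true)`, two threads can both observe `false` and both run the guarded block; `compareAndSet(false, true)` does the check and the update as one atomic step, so only one caller enters.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the state machine; not the Ozone class.
class RefreshOnceSketch {
  private final AtomicBoolean refreshed = new AtomicBoolean(false);
  private final AtomicInteger refreshCount = new AtomicInteger();

  // Check and set happen as one atomic step, so exactly one caller wins.
  void notifyReady() {
    if (refreshed.compareAndSet(false, true)) {
      refreshCount.incrementAndGet(); // placeholder for the real refresh work
    }
  }

  int getRefreshCount() {
    return refreshCount.get();
  }

  public static void main(String[] args) throws InterruptedException {
    RefreshOnceSketch sketch = new RefreshOnceSketch();
    Thread[] threads = new Thread[8];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(sketch::notifyReady);
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    // Only one compareAndSet(false, true) can succeed, so this prints 1.
    System.out.println(sketch.getRefreshCount());
  }
}
```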

szetszwo · 2026-02-01T17:35:23Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java

    if (currentLeaderTerm.get() == term) {
-      // Means all transactions before this term have been applied.
      // This means after a restart, all pending transactions have been applied.
-      // Perform
-      // 1. Refresh Safemode rules state.
-      // 2. Start DN Rpc server.
      if (!refreshedAfterLeaderReady.get()) {


Similarly, it needs compareAndSet.

BTW, why won't refreshedAfterLeaderReady be set back to false after it is set to true?

We do not move back to safe mode once we exit it, even if the rules are no longer satisfied later. That is why we are not setting it back to false.

This is a best effort on startup only, to wait until the system becomes healthy.
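
A rough sketch of that one-way behavior, under the assumption that safe mode is modeled as a single flag; `SafeModeLatchSketch` and its methods are hypothetical and are not the SCMSafeModeManager API. Once the rules are satisfied the flag flips, and later failing evaluations deliberately do nothing.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of a one-way safe mode exit; not the Ozone implementation.
class SafeModeLatchSketch {
  private final AtomicBoolean inSafeMode = new AtomicBoolean(true);

  // Called each time the rules are re-evaluated.
  void onRulesEvaluated(boolean allRulesSatisfied) {
    if (allRulesSatisfied) {
      // Flip once; repeated calls after exit are no-ops.
      inSafeMode.compareAndSet(true, false);
    }
    // Intentionally no else branch: failing rules never re-enter safe mode.
  }

  boolean isInSafeMode() {
    return inSafeMode.get();
  }
}
```

For example, evaluating with `false`, then `true`, then `false` leaves the sketch out of safe mode after the second call and keeps it there.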

szetszwo · 2026-02-01T17:37:13Z

...dds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/safemode/TestSCMSafeModeManager.java

+    StorageContainerManager mockScmManager = mock(StorageContainerManager.class);
+    SCMHAManager mockScmhaManager = mock(SCMHAManager.class);
+    when(mockScmManager.getScmHAManager()).thenReturn(mockScmhaManager);
+    SCMRatisServer mockScmRatisServer = mock(SCMRatisServer.class);
+    when(mockScmhaManager.getRatisServer()).thenReturn(mockScmRatisServer);
+    SCMStateMachine mockScmStateMachine = mock(SCMStateMachine.class);
+    when(mockScmRatisServer.getSCMStateMachine()).thenReturn(mockScmStateMachine);
+    when((mockScmStateMachine.isRefreshedAfterLeaderReady())).thenReturn(true);
+    scmContext = new SCMContext.Builder().setSCM(mockScmManager).build();


Please don't use mock. We need a real cluster test that tests SCM changing leader multiple times.

Added an integration test case as well.

priyeshkaratha

Thanks @sumitagrawl for the patch. Changes LGTM

szetszwo

+1 the change looks good.

sumitagrawl added 2 commits January 26, 2026 11:43

HDDS-14070. set statemachine ready on election performed for follower

7c1ece2

test case added

42dee1d

sumitagrawl requested a review from szetszwo January 26, 2026 20:02

sumitagrawl marked this pull request as ready for review January 27, 2026 17:40

szetszwo reviewed Feb 1, 2026

sumitagrawl added 2 commits February 2, 2026 11:25

fix review comment

65e1245

fix review comment and test case

e791b20

priyeshkaratha approved these changes Feb 3, 2026

szetszwo approved these changes Feb 4, 2026

sumitagrawl wants to merge 4 commits into apache:master from sumitagrawl:HDDS-14070

3 participants
