HDDS-13572. SCMSnapshotProvider should not auto-create Ratis directory#9689
HDDS-13572. SCMSnapshotProvider should not auto-create Ratis directory#9689echonesis wants to merge 1 commit intoapache:masterfrom
Conversation
szetszwo
left a comment
There was a problem hiding this comment.
@echonesis , thanks a lot for working on this!
Both the ratisStorageDir and ratisSnapshotDir should only be created only if either --init or --bootstrap option is set in StorageContainerManagerStarter. Without those options, the directories should already exist. If not, we should fail the command.
Please see also the comments inlined.
| @VisibleForTesting | ||
| protected SCMSnapshotProvider newScmSnapshotProvider(StorageContainerManager storageContainerManager) { | ||
| return new SCMSnapshotProvider(storageContainerManager.getConfiguration(), | ||
| protected SCMSnapshotProvider newScmSnapshotProvider( | ||
| StorageContainerManager storageContainerManager) { |
There was a problem hiding this comment.
This method is not used in test. Let's remove @ VisibleForTesting
| public SCMSnapshotProvider(ConfigurationSource conf, | ||
| List<SCMNodeDetails> peerNodes, | ||
| CertificateClient scmCertificateClient) { |
There was a problem hiding this comment.
This method is used only in TestSCMSnapshotProvider, let's remove it and change the test.
| File snapshotDirFile = new File(snapshotDir); | ||
|
|
||
| SCMSnapshotProvider.StartupOption startupOption; | ||
| if (snapshotDirFile.exists()) { |
There was a problem hiding this comment.
Let's use
if (Files.isDirectory(Paths.get(snapshotDir))) {| wait_for_om_leader | ||
|
|
||
| echo "Deleting test keys to create tombstones" | ||
| docker exec "${om_container}" bash -c "kinit -kt /etc/security/keytabs/testuser.keytab testuser/om@EXAMPLE.COM && ozone fs -rm -R -skipTrash ofs://omservice/vol1/bucket1" | ||
| echo "Test keys deleted" | ||
|
|
||
| echo "Restarting OM after key deletion to flush tombstones to sst files" | ||
| docker restart "${om_container}" | ||
| wait_for_om_leader |
There was a problem hiding this comment.
This change seems unrelated? If yes, please revert it.
There was a problem hiding this comment.
These changes are related - the stricter Ratis directory checks affect startup timing.
The wait_for_om_leader ensures proper ordering after container restarts.
| SST Size Should Be Reduced | ||
| [Arguments] ${original_size} | ||
| ${current_size} = Get OM DB SST Files Size | ||
| Should Be True ${current_size} < ${original_size} | ||
| ... OM DB size should be reduced after compaction. Before: ${original_size}, After: ${current_size} | ||
|
|
||
| *** Test Cases *** | ||
| Testing OM DB Size Reduction After Compaction | ||
| # Test keys are already created and flushed | ||
| # Delete keys to create tombstones that need compaction | ||
| Delete Test Keys | ||
|
|
||
| # Test keys are already created, deleted and flushed by the test script | ||
| ${size_before_compaction} = Get OM DB SST Files Size | ||
|
|
||
| Compact OM DB Column Family fileTable | ||
| Compact OM DB Column Family deletedTable | ||
| Compact OM DB Column Family deletedDirectoryTable | ||
|
|
||
| ${size_after_compaction} = Get OM DB SST Files Size | ||
|
|
||
| Should Be True ${size_after_compaction} < ${size_before_compaction} | ||
| ... OM DB size should be reduced after compaction. Before: ${size_before_compaction}, After: ${size_after_compaction} | ||
| # Compaction is asynchronous, poll until size is reduced | ||
| Wait Until Keyword Succeeds 60sec 5sec SST Size Should Be Reduced ${size_before_compaction} |
There was a problem hiding this comment.
This change seems unrelated? If yes, please revert it.
There was a problem hiding this comment.
Same as #9689 (comment)
This ensures proper ordering after container restarts.
|
|
||
| // Create Ratis snapshot directory during initialization | ||
| // This ensures the directory exists before SCM starts normally | ||
| String snapshotDir = SCMHAUtils.getSCMRatisSnapshotDirectory(conf); |
There was a problem hiding this comment.
snapshotDir should be created only for scmInit or scmBootstrap. So, it should not be created here.
Thanks @szetszwo
|
Thanks for pointing out the upgrade scenario. The We probably should have --upgrade option. Any "auto-create directories" behavior is error prone and leads to data loss. |
In String snapshotDir = SCMHAUtils.getSCMRatisSnapshotDirectory(conf);
HddsUtils.createDir(snapshotDir);For I agree that adding an |
Sorry, I mean to ask -- When is the snapshotDir created for upgrade before this change? |
Before this change, both directories were auto-created in SCMSnapshotProvider constructor (L71, L74-75) via |
@echonesis , thanks for the info. For now, let's add the auto-create only for upgrade case but not inside |
Yes, that works. I'll move the auto-create logic to the upgrade case instead of |
What changes were proposed in this pull request?
This PR prevents SCMSnapshotProvider from auto-creating Ratis directory when it is not configured.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13572
How was this patch tested?
GitHub Actions CI: https://github.com/echonesis/ozone/actions/runs/21465081605