Conversation

@kgeisz kgeisz commented Jan 22, 2026

https://issues.apache.org/jira/browse/HBASE-29826

This pull request deletes the bulk load output directory once a continuous incremental backup completes.

Summary

Originally, a continuous incremental backup created a backupRootDir/.tmp/backup_XXXX directory for bulk load output, but this directory was not deleted afterward. This caused a problem in MapReduceBackupMergeJob when performing a merge: the merge tries to move the backup directory into the .tmp directory, and since a directory with the backup's name already exists inside .tmp, the backup manifest ends up at .tmp/backup_XXXX/backup_XXXX/.backup.manifest. As a result, the manifest cannot be found and an error is thrown, because the code looks for the file at .tmp/backup_XXXX/.backup.manifest and not .tmp/backup_XXXX/backup_XXXX/.backup.manifest.
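
To make the path collision concrete, here is a minimal sketch of the two manifest locations; the root path and backup id are made-up values, and only org.apache.hadoop.fs.Path is used to build the paths:

import org.apache.hadoop.fs.Path;

public class ManifestPathCollision {
  public static void main(String[] args) {
    // Hypothetical values for illustration only.
    Path tmpDir = new Path("hdfs://nn/backupRootDir/.tmp");
    String backupId = "backup_1700000000000";

    // Where the merge job looks for the manifest after moving the backup
    // directory into .tmp:
    Path expected = new Path(new Path(tmpDir, backupId), ".backup.manifest");

    // Where the manifest actually lands when a stale .tmp/<backupId> bulk
    // load output directory is already present, nesting the moved directory:
    Path actual =
      new Path(new Path(new Path(tmpDir, backupId), backupId), ".backup.manifest");

    System.out.println("expected: " + expected);
    System.out.println("actual:   " + actual);
  }
}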

Testing

  • I have added an assert statement to a continuous incremental backup unit test that verifies the backup directory has been deleted after the incremental backup completes (a sketch of this kind of assertion follows this list).
  • I have manually verified that the bulk load directory is empty right before it is deleted, but this check is not implemented in a unit test. I looked into how it could be tested: the check would need to run after IncrementalTableBackupClient.handleBulkLoad(). The only place this method appears in current test code is TestBackupBase, and that code covers non-continuous incremental backups. From what I have found, covering this case would require changes beyond the scope of this PR: either creating a test similar to the one in TestBackupBase, or overriding IncrementalTableBackupClient.execute() in a test class so that it checks the directory is empty and then performs the delete.
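
A minimal sketch of such an assertion; fs, backupRootDir, and the backup-running helper are hypothetical stand-ins, and the actual test in TestIncrementalBackupWithContinuous may differ:

import static org.junit.Assert.assertFalse;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class BulkLoadDirCleanupAssertionSketch {
  // Hypothetical stand-ins for the test's FileSystem and backup root fields.
  private FileSystem fs;
  private Path backupRootDir;

  @Test
  public void bulkLoadOutputDirIsDeletedAfterContinuousIncrementalBackup() throws Exception {
    String backupId = runContinuousIncrementalBackup(); // hypothetical helper
    Path bulkOutputDir = new Path(new Path(backupRootDir, ".tmp"), backupId);
    assertFalse("bulk load output dir should be deleted after the backup completes",
      fs.exists(bulkOutputDir));
  }

  private String runContinuousIncrementalBackup() {
    throw new UnsupportedOperationException("placeholder for the real backup run");
  }
}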

…be found

Change-Id: Ifc517dbaee09b290315853e831dda09542bbf72a

kgeisz commented Jan 22, 2026

As noted in the PR description, I manually verified the bulk load directory is empty right before it is deleted, but that check is not implemented in a unit test.

Another thought I had is to add an assert directly in IncrementalTableBackupClient.execute() that checks the directory is empty before performing the delete, but I'm not sure we want the backup to fail on the off chance that the assert fails.
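
For reference, a plain Java assert only fires when the JVM runs with assertions enabled (-ea), so a sketch like the following would not fail a production backup by default; fs and bulkOutputPath are hypothetical stand-ins for the client's fields:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class EmptyCheckBeforeDeleteSketch {
  static void deleteBulkLoadDirectory(FileSystem fs, Path bulkOutputPath) throws IOException {
    // Fires only under -ea (e.g. in tests); skipped entirely in normal runs.
    assert fs.listStatus(bulkOutputPath).length == 0
      : "bulk load output dir should be empty before delete";
    fs.delete(bulkOutputPath, true);
  }
}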

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ HBASE-28957_rebased Compile Tests _
+1 💚 mvninstall 3m 31s HBASE-28957_rebased passed
+1 💚 compile 0m 39s HBASE-28957_rebased passed
+1 💚 checkstyle 0m 15s HBASE-28957_rebased passed
+1 💚 spotbugs 0m 38s HBASE-28957_rebased passed
+1 💚 spotless 0m 53s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 17s the patch passed
+1 💚 compile 0m 33s the patch passed
+1 💚 javac 0m 33s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 12s the patch passed
+1 💚 spotbugs 0m 36s the patch passed
+1 💚 hadoopcheck 12m 5s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 💚 spotless 0m 46s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
31m 56s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7664/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7664
JIRA Issue HBASE-29826
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 118b13ddc4ac 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957_rebased / 65b3453
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7664/1/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 1s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ HBASE-28957_rebased Compile Tests _
+1 💚 mvninstall 5m 16s HBASE-28957_rebased passed
+1 💚 compile 0m 36s HBASE-28957_rebased passed
+1 💚 javadoc 0m 32s HBASE-28957_rebased passed
+1 💚 shadedjars 8m 0s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 16s the patch passed
+1 💚 compile 0m 21s the patch passed
+1 💚 javac 0m 21s the patch passed
+1 💚 javadoc 0m 15s the patch passed
+1 💚 shadedjars 6m 13s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 19m 10s hbase-backup in the patch passed.
46m 8s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7664/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7664
JIRA Issue HBASE-29826
Optional Tests javac javadoc unit compile shadedjars
uname Linux 07f749332ef5 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957_rebased / 65b3453
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7664/1/testReport/
Max. process+thread count 3677 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7664/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Comment on lines +407 to +409
if (backupInfo.isContinuousBackupEnabled()) {
deleteBulkLoadDirectory();
}
Contributor


nit: why don't we always call deleteBulkLoadDirectory()? Please see the incrementalCopyHFiles function.

Contributor Author


As you pointed out, incrementalCopyHFiles() will delete the directory. However, for continuous backups the directory is created again when handleBulkLoad() is run. There is logic in handleBulkLoad() that runs BackupUtils.collectBulkFiles() for continuous backups. Going further down the method call chain, this calls BulkFilesCollector.collectFromWalDirs(), and eventually the BulkLoadCollectorJob is run. This job sets the output path here.
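
A rough outline of that call chain (class and method names from the comment above; arguments and signatures are omitted):

// Continuous backups only:
// IncrementalTableBackupClient.handleBulkLoad()
//   -> BackupUtils.collectBulkFiles(...)
//     -> BulkFilesCollector.collectFromWalDirs(...)
//       -> BulkLoadCollectorJob  (MapReduce job; per the description above, it
//          sets its output path under the bulk load output directory,
//          recreating it)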

Contributor Author


I added the if (backupInfo.isContinuousBackupEnabled()) statement since this directory is created again for continuous backups only. If we didn't have this if statement, then we would see something like Could not delete... for non-continuous backups (also pointed out by Copilot).

Contributor


I will approve it, but my point is that if two steps have to delete the same BulkOutputDir root, can't we delete it only once? Or delete only the directories related to the INCREMENTAL_COPY and INCR_BACKUP_COMPLETE steps, and delete the BulkOutputDir at the end.

It's also very hard to figure out why we would need to recreate the BulkOutputDir twice.

@taklwu
Contributor

taklwu commented Jan 23, 2026

Also, can you post results showing that IntegrationTestContinuousBackupRestore runs fine after this change?

Copilot AI left a comment


Pull request overview

This PR addresses HBASE-29826 by ensuring that the temporary bulk load output directory used during continuous incremental backups under BACKUP_ROOT_DIR/.tmp/<backupId> is cleaned up so it no longer interferes with subsequent backup merge jobs (which expect manifests at .../.tmp/<backupId>/.backup.manifest).

Changes:

  • In IncrementalTableBackupClient.execute, add a finally block for continuous backups to delete the bulk load output directory when the backup completes (successfully or with failure) and improve a Javadoc comment and logging style.
  • In TestIncrementalBackupWithContinuous, introduce a shared FileSystem field and a new assertion verifying that the bulk load output directory has been deleted after a continuous incremental backup completes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java Adds continuous-backup-specific cleanup of the bulk load directory at the end of execute, plus minor Javadoc and logging improvements.
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestIncrementalBackupWithContinuous.java Adjusts test setup/teardown to reuse a FileSystem and adds an assertion that the .tmp/<backupId> bulk load output directory is removed after a continuous incremental backup.

Comment on lines +406 to +409
} finally {
if (backupInfo.isContinuousBackupEnabled()) {
deleteBulkLoadDirectory();
}
Copilot AI Jan 23, 2026

deleteBulkLoadDirectory() is now invoked both in incrementalCopyHFiles's finally block and here in execute() when continuous backup is enabled, which means continuous incremental backups will attempt to delete the same bulk load directory twice. On the second call, FileSystem.delete will typically return false because the directory has already been removed, causing a misleading WARN log ("Could not delete ...") on every successful run and making operational diagnostics harder. Consider making the cleanup idempotent without emitting a warning when the directory is already gone (for example, by checking existence before deleting or relaxing the log level), or centralizing the cleanup in a single place so the directory is only deleted once per backup run.

Suggested change
} finally {
if (backupInfo.isContinuousBackupEnabled()) {
deleteBulkLoadDirectory();
}
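
A minimal sketch of the idempotent variant suggested above, assuming the cleanup receives a FileSystem and the bulk load output Path (names are hypothetical, not the actual method in IncrementalTableBackupClient):

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class IdempotentBulkLoadCleanupSketch {
  private static final Logger LOG =
    LoggerFactory.getLogger(IdempotentBulkLoadCleanupSketch.class);

  static void deleteBulkLoadDirectory(FileSystem fs, Path bulkOutputPath) throws IOException {
    // Existence check makes the cleanup idempotent: a second call after the
    // directory was already removed (e.g. by incrementalCopyHFiles) is a
    // no-op instead of a misleading WARN.
    if (!fs.exists(bulkOutputPath)) {
      return;
    }
    if (!fs.delete(bulkOutputPath, true)) {
      LOG.warn("Could not delete {}", bulkOutputPath);
    }
  }
}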

@kgeisz
Contributor Author

kgeisz commented Jan 23, 2026

also can you post the result of IntegrationTestContinuousBackupRestore is running fine after this change?

@taklwu, the test passes, apart from occasional failures due to HBASE-29825; those failures can be avoided depending on the time of day the test is run.

@taklwu taklwu merged commit 480fe04 into apache:HBASE-28957_rebased Jan 23, 2026
7 checks passed