HBASE-29826: Backup merge is failing because .backup.manifest cannot be found #7664
Conversation
Commit: …be found (Change-Id: Ifc517dbaee09b290315853e831dda09542bbf72a)
Another thought I had is to add an …
🎊 +1 overall

This message was automatically generated.

🎊 +1 overall

This message was automatically generated.
```java
if (backupInfo.isContinuousBackupEnabled()) {
  deleteBulkLoadDirectory();
}
```
nit: why don't we always call `deleteBulkLoadDirectory()`? Please see the cleanup in the `incrementalCopyHFiles` function.
As you pointed out, `incrementalCopyHFiles()` will delete the directory. However, the directory will be created again for continuous backups when `handleBulkLoad()` is run. There is logic in `handleBulkLoad()` that runs `BackupUtils.collectBulkFiles()` for continuous backups. Going further down the method call chain, this calls `BulkFilesCollector.collectFromWalDirs()`, and eventually the `BulkLoadCollectorJob` is run. That job is where the output path is set.
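For illustration, here is a minimal, runnable sketch of that delete-then-recreate ordering using plain Hadoop `FileSystem` calls; the class name and paths are made up, and the real flow goes through the job chain above rather than direct `mkdirs` calls:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical demo of the cleanup-then-recreate ordering described above.
public class RecreateDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path bulkOut = new Path("/tmp/backup-root/.tmp/backup_XXXX");
    fs.mkdirs(bulkOut);
    fs.delete(bulkOut, true);  // cleanup in incrementalCopyHFiles()'s finally
    fs.mkdirs(bulkOut);        // stand-in for BulkLoadCollectorJob's output path
    System.out.println(fs.exists(bulkOut)); // true: the directory is back
  }
}
```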
I added the `if (backupInfo.isContinuousBackupEnabled())` statement since this directory is created again for continuous backups only. If we didn't have this `if` statement, then we would see something like "Could not delete ..." for non-continuous backups (also pointed out by Copilot).
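To see why the guard matters: Hadoop's `FileSystem.delete` returns `false` when the path is absent, which is what would trigger that warning on every non-continuous run. A small sketch (paths hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MissingDirDeleteDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path gone = new Path("/tmp/backup-root/.tmp/backup_XXXX"); // already deleted
    if (!fs.delete(gone, true)) {
      // This is the misleading "Could not delete ..." WARN that a second,
      // unconditional delete would log.
      System.err.println("Could not delete " + gone);
    }
  }
}
```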
I will approve it, but my point is that if two steps have to delete the same root of `BulkOutputDir`, can't we delete it only once? Or delete only the directories related to the INCREMENTAL_COPY and INCR_BACKUP_COMPLETE steps, and delete the `BulkOutputDir` at the end.
It's also very hard to figure out why we would need to recreate the `BulkOutputDir` twice.
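A sketch of what that centralized shape could look like, using the method names from this thread (the real `execute()` does much more; this is not the code merged here):

```java
import java.io.IOException;

// Hypothetical outline: a single owner for the bulk output cleanup.
abstract class CentralizedCleanupSketch {
  // Stand-ins for the INCREMENTAL_COPY / INCR_BACKUP_COMPLETE phases
  // and handleBulkLoad().
  abstract void runBackupPhases() throws IOException;
  abstract void deleteBulkLoadDirectory() throws IOException;

  final void execute() throws IOException {
    try {
      runBackupPhases();
    } finally {
      deleteBulkLoadDirectory(); // runs exactly once per backup attempt
    }
  }
}
```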
Also, can you post the result of …
Pull request overview
This PR addresses HBASE-29826 by ensuring that the temporary bulk load output directory used during continuous incremental backups under `BACKUP_ROOT_DIR/.tmp/<backupId>` is cleaned up, so it no longer interferes with subsequent backup merge jobs (which expect manifests at `.../.tmp/<backupId>/.backup.manifest`).
Changes:
- In `IncrementalTableBackupClient.execute`, add a `finally` block for continuous backups to delete the bulk load output directory when the backup completes (successfully or with failure), and improve a Javadoc comment and logging style.
- In `TestIncrementalBackupWithContinuous`, introduce a shared `FileSystem` field and a new assertion verifying that the bulk load output directory has been deleted after a continuous incremental backup completes (see the sketch after this list).
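For reference, the new assertion could look roughly like the fragment below; `fs`, `backupId`, and `BACKUP_ROOT_DIR` are assumed to come from the surrounding test class and may be named differently in the actual test:

```java
// Sketch only; not the exact code in TestIncrementalBackupWithContinuous.
Path bulkOutputDir = new Path(BACKUP_ROOT_DIR, ".tmp/" + backupId);
assertFalse("bulk load output dir should be deleted after the backup",
  fs.exists(bulkOutputDir));
```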
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java | Adds continuous-backup-specific cleanup of the bulk load directory at the end of execute, plus minor Javadoc and logging improvements. |
| hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestIncrementalBackupWithContinuous.java | Adjusts test setup/teardown to reuse a FileSystem and adds an assertion that the .tmp/<backupId> bulk load output directory is removed after a continuous incremental backup. |
```java
} finally {
  if (backupInfo.isContinuousBackupEnabled()) {
    deleteBulkLoadDirectory();
  }
}
```
Copilot (AI) commented on Jan 23, 2026:
deleteBulkLoadDirectory() is now invoked both in incrementalCopyHFiles's finally block and here in execute() when continuous backup is enabled, which means continuous incremental backups will attempt to delete the same bulk load directory twice. On the second call, FileSystem.delete will typically return false because the directory has already been removed, causing a misleading WARN log ("Could not delete ...") on every successful run and making operational diagnostics harder. Consider making the cleanup idempotent without emitting a warning when the directory is already gone (for example, by checking existence before deleting or relaxing the log level), or centralizing the cleanup in a single place so the directory is only deleted once per backup run.
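One way to implement the idempotent variant Copilot describes, as a self-contained sketch (the helper name and logging here are illustrative, not the merged code):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class IdempotentCleanup {
  /** Deletes dir if present; stays silent when it is already gone. */
  static void deleteIfPresent(Configuration conf, Path dir) throws IOException {
    FileSystem fs = FileSystem.get(dir.toUri(), conf);
    if (fs.exists(dir) && !fs.delete(dir, true)) {
      // Warn only when the directory exists but could not be removed.
      System.err.println("Could not delete " + dir);
    }
  }
}
```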
@taklwu, the test passes with occasional failures due to HBASE-29825. These failures can be avoided depending on the time of day the test is run.
https://issues.apache.org/jira/browse/HBASE-29826
This pull request makes a change that deletes the bulk load output directory when a continuous incremental backup is complete.
Summary
Originally, a continuous incremental backup was creating a `backupRootDir/.tmp/backup_XXXX` directory for bulk load output, but this directory was not being deleted afterward. This would cause an issue in `MapReduceBackupMergeJob` when performing a merge. During the merge, it tries to move the backup directory into the `.tmp` directory. Since the backup directory already exists inside of `.tmp`, the location of the backup manifest ends up being `.tmp/backup_XXXX/backup_XXXX/.backup.manifest`. As a result, the backup manifest cannot be found and an error is thrown, because the code looks for the file at `.tmp/backup_XXXX/.backup.manifest` and not `.tmp/backup_XXXX/backup_XXXX/.backup.manifest`.
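To make the collision concrete, here is a toy, runnable illustration of how moving a directory into `.tmp` nests it when a stale, non-empty directory of the same name is already there (paths are made up; the actual merge code in `MapReduceBackupMergeJob` differs):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ManifestNestingDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path tmp = new Path("/tmp/backup-root/.tmp");
    Path stale = new Path(tmp, "backup_XXXX");            // leftover bulk load output
    fs.mkdirs(new Path(stale, "bulk-files"));             // make it non-empty
    Path src = new Path("/tmp/backup-root/backup_XXXX");  // directory the merge moves
    fs.mkdirs(src);
    // The destination already exists as a directory, so the source ends up
    // inside it rather than replacing it:
    fs.rename(src, stale);
    System.out.println(fs.exists(new Path(stale, "backup_XXXX"))); // true
  }
}
```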
Testing
- Added an `assert` statement to a continuous incremental backup unit test that verifies the backup directory has been deleted after the incremental backup is complete.
- `IncrementalTableBackupClient.handleBulkLoad()`: the only place I see this method in current test code is in `TestBackupBase`, and this code is for non-continuous incremental backups. Based on what I have found, handling this test case would require code changes beyond the scope of this PR, involving creating a test similar to the one in `TestBackupBase` or overriding `IncrementalTableBackupClient.execute()` in a test class to have it check that the directory is empty and then delete it (a hypothetical sketch follows below).
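For completeness, the override idea from the last bullet might look like the hypothetical sketch below; the constructor mirrors the pattern of the test subclasses in `TestBackupBase`, and none of this is part of the PR:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.backup.BackupRequest;
import org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient;
import org.apache.hadoop.hbase.client.Connection;

// Hypothetical test-only subclass; not part of this PR.
class VerifyingIncrementalBackupClient extends IncrementalTableBackupClient {
  VerifyingIncrementalBackupClient(Connection conn, String backupId,
      BackupRequest request) throws IOException {
    super(conn, backupId, request);
  }

  @Override
  public void execute() throws IOException {
    super.execute();
    // Verify the bulk load output directory is empty here, then delete it.
  }
}
```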