CLDSRV-836 Fix deep healthcheck fail status #6055

BourgoisMickael · 2026-01-20T12:26:49Z

The deep healthcheck now fails if ALL backends of any single client fail.

Previously, the deep healthcheck would fail only if ALL backends/locations
failed globally across all clients (data, metadata, vault, kms).

This change modifies the logic to fail if ANY client has ALL its
backends/locations failing. This ensures:

1. For a data backend with multiple sproxyd location constraints:
   - Returns HTTP 200 if at least ONE location is healthy
   - Returns HTTP 500 only if ALL locations fail
2. Each client (data, metadata, vault, kms) is evaluated independently:
   - If ALL locations of the data client fail, the overall check fails
   - If ALL locations of the metadata client fail, the overall check fails
   - etc.

The new logic uses:

- `results.some()` to check across clients
- `keys.every()` within each client to check all its locations


bert-e · 2026-01-20T12:26:53Z

Hello bourgoismickael,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

| name | description | privileged | authored |
| --- | --- | --- | --- |
| `/after_pull_request` | Wait for the given pull request id to be merged before continuing with the current one. | | |
| `/bypass_author_approval` | Bypass the pull request author's approval | ⭐ | |
| `/bypass_build_status` | Bypass the build and test status | ⭐ | |
| `/bypass_commit_size` | Bypass the check on the size of the changeset `TBA` | ⭐ | |
| `/bypass_incompatible_branch` | Bypass the check on the source branch prefix | ⭐ | |
| `/bypass_jira_check` | Bypass the Jira issue check | ⭐ | |
| `/bypass_peer_approval` | Bypass the pull request peers' approval | ⭐ | |
| `/bypass_leader_approval` | Bypass the pull request leaders' approval | ⭐ | |
| `/approve` | Instruct Bert-E that the author has approved the pull request. | | ✍️ |
| `/create_pull_requests` | Allow the creation of integration pull requests. | | |
| `/create_integration_branches` | Allow the creation of integration branches. | | |
| `/no_octopus` | Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead | | |
| `/unanimity` | Change review acceptance criteria from `one reviewer at least` to `all reviewers` | | |
| `/wait` | Instruct Bert-E not to run until further notice. | | |

Available commands

| name | description | privileged |
| --- | --- | --- |
| `/help` | Print Bert-E's manual in the pull request. | |
| `/status` | Print Bert-E's current status in the pull request `TBA` | |
| `/clear` | Remove all comments from Bert-E from the history `TBA` | |
| `/retry` | Re-start a fresh build `TBA` | |
| `/build` | Re-start a fresh build `TBA` | |
| `/force_reset` | Delete integration branches & pull requests, and restart merge process from the beginning. | |
| `/reset` | Try to remove integration branches unless there are commits on them which do not appear on the source branch. | |

Status report is not available.

bert-e · 2026-01-20T12:27:01Z

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

codecov · 2026-01-20T12:28:34Z

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.38%. Comparing base (9b02185) to head (e6007fc).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lib/utilities/healthcheckHandler.js | 83.33% | 1 Missing ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
| --- | --- |
| lib/utilities/healthcheckHandler.js | `88.88% <83.33%> (-1.31%)` ⬇️ |

... and 2 files with indirect coverage changes

@@                 Coverage Diff                 @@
##           development/9.2    #6055      +/-   ##
===================================================
- Coverage            84.41%   84.38%   -0.03%     
===================================================
  Files                  206      206              
  Lines                13016    13019       +3     
===================================================
- Hits                 10987    10986       -1     
- Misses                2029     2033       +4

| Flag | Coverage Δ |
| --- | --- |
| file-ft-tests | `67.44% <83.33%> (-0.01%)` ⬇️ |
| kmip-ft-tests | `28.11% <83.33%> (+<0.01%)` ⬆️ |
| mongo-v0-ft-tests | `68.69% <83.33%> (-0.01%)` ⬇️ |
| mongo-v1-ft-tests | `68.70% <83.33%> (+0.01%)` ⬆️ |
| multiple-backend | `35.28% <83.33%> (+<0.01%)` ⬆️ |
| sur-tests | `36.39% <83.33%> (+<0.01%)` ⬆️ |
| sur-tests-inflights | `37.40% <83.33%> (-0.03%)` ⬇️ |
| unit | `69.97% <83.33%> (-0.01%)` ⬇️ |
| utapi-v2-tests | `34.29% <83.33%> (-0.03%)` ⬇️ |

Flags with carried forward coverage won't be shown.

Copilot

Pull request overview

This PR fixes the deep healthcheck logic to fail when ANY client has ALL its backends/locations failing, rather than only failing when ALL backends across ALL clients fail. This ensures better detection of client-specific failures, particularly for multi-location data backends.

Changes:

- Modified the failure detection logic from checking all backends globally to checking each client independently
- Added empty client handling to skip clients with no backends


Copilot · 2026-01-20T13:13:30Z

lib/utilities/healthcheckHandler.js

+        const fail = results.some(clientResult => {
+            const keys = Object.keys(clientResult);
+            // If this client has no keys, skip it
+            if (keys.length === 0) {
+                return false;
+            }
+            // Check if ALL backends/locations of this client have errors
+            return keys.every(k =>
+                // if there is an error from an external backend,
+                // only return a 500 if it is on startup
+                // (flightCheckOnStartUp set to true)
+                clientResult[k].error && (flightCheckOnStartUp || !clientResult[k].external)
+            );
+        });


This PR introduces a significant change in the healthcheck failure logic - now failing when ANY client has ALL its backends failing (instead of only when ALL backends across ALL clients fail). This new behavior should be covered by tests to ensure it works as expected. Consider adding test cases for scenarios such as:

- All backends of the data client failing while metadata client is healthy (should fail)
- One data backend location failing while another is healthy and metadata is healthy (should succeed)
- All backends of multiple clients failing (should fail)

BourgoisMickael requested review from a team, anurag4DSB, Copilot, leif-scality and nicolas2bert and removed request for a team January 20, 2026 13:10

Copilot started reviewing on behalf of BourgoisMickael January 20, 2026 13:10

Copilot AI reviewed Jan 20, 2026


BourgoisMickael requested a review from francoisferrand January 20, 2026 13:14

leif-scality approved these changes Jan 20, 2026


tcarmet approved these changes Jan 20, 2026


jonathan-gramain approved these changes Jan 20, 2026


6 participants
