@BourgoisMickael
Contributor

The deep healthcheck now fails if ALL backends of any single client fail.

Previously, the deep healthcheck would fail if ALL backends/locations
failed globally across all clients (data, metadata, vault, kms).

This change modifies the logic to fail if ANY client has ALL its
backends/locations failing. This ensures:

  1. For data backend with multiple sproxyd location constraints:

    • Returns HTTP 200 if at least ONE location is healthy
    • Returns HTTP 500 only if ALL locations fail
  2. Each client (data, metadata, vault, kms) is evaluated independently

    • If ALL locations of the data client fail, overall check fails
    • If ALL locations of metadata fail, overall check fails
    • etc.

The new logic uses:

  • results.some() to check across clients
  • keys.every() within each client to check all its locations
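
The difference between the two aggregation strategies can be sketched as follows. This is a simplified illustration, not the code from `healthcheckHandler.js`: the function names `failsGlobally` and `failsPerClient`, the sample location names, and the shape of `results` (one object per client, keyed by backend/location) are assumptions, and the handling of external backends and empty clients is left out.

```javascript
// Hypothetical sample: one object per client, keyed by backend/location.
const results = [
    // data client: both sproxyd locations are down
    {
        'location-1': { error: new Error('down') },
        'location-2': { error: new Error('down') },
    },
    // metadata client: healthy
    { metadata: { code: 200 } },
];

// Assumed previous behavior: fail only if every backend of every client fails.
function failsGlobally(results) {
    return results.every(client =>
        Object.keys(client).every(k => Boolean(client[k].error)));
}

// New behavior: fail as soon as one client has all of its backends failing.
function failsPerClient(results) {
    return results.some(client =>
        Object.keys(client).every(k => Boolean(client[k].error)));
}

console.log(failsGlobally(results));  // false
console.log(failsPerClient(results)); // true
```

With the data client fully down and metadata healthy, the global check still passes, while the per-client check reports a failure.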

@bert-e
Contributor

bert-e commented Jan 20, 2026

Hello bourgoismickael,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

| name | description | privileged | authored |
|---|---|---|---|
| /after_pull_request | Wait for the given pull request id to be merged before continuing with the current one. | | |
| /bypass_author_approval | Bypass the pull request author's approval | | |
| /bypass_build_status | Bypass the build and test status | | |
| /bypass_commit_size | Bypass the check on the size of the changeset TBA | | |
| /bypass_incompatible_branch | Bypass the check on the source branch prefix | | |
| /bypass_jira_check | Bypass the Jira issue check | | |
| /bypass_peer_approval | Bypass the pull request peers' approval | | |
| /bypass_leader_approval | Bypass the pull request leaders' approval | | |
| /approve | Instruct Bert-E that the author has approved the pull request. | | ✍️ |
| /create_pull_requests | Allow the creation of integration pull requests. | | |
| /create_integration_branches | Allow the creation of integration branches. | | |
| /no_octopus | Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead | | |
| /unanimity | Change review acceptance criteria from one reviewer at least to all reviewers | | |
| /wait | Instruct Bert-E not to run until further notice. | | |

Available commands

| name | description | privileged |
|---|---|---|
| /help | Print Bert-E's manual in the pull request. | |
| /status | Print Bert-E's current status in the pull request TBA | |
| /clear | Remove all comments from Bert-E from the history TBA | |
| /retry | Re-start a fresh build TBA | |
| /build | Re-start a fresh build TBA | |
| /force_reset | Delete integration branches & pull requests, and restart merge process from the beginning. | |
| /reset | Try to remove integration branches unless there are commits on them which do not appear on the source branch. | |

Status report is not available.

@bert-e
Contributor

bert-e commented Jan 20, 2026

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@codecov

codecov bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.38%. Comparing base (9b02185) to head (e6007fc).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/utilities/healthcheckHandler.js | 83.33% | 1 Missing ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ | |
|---|---|---|
| lib/utilities/healthcheckHandler.js | 88.88% <83.33%> (-1.31%) | ⬇️ |

... and 2 files with indirect coverage changes

@@                 Coverage Diff                 @@
##           development/9.2    #6055      +/-   ##
===================================================
- Coverage            84.41%   84.38%   -0.03%     
===================================================
  Files                  206      206              
  Lines                13016    13019       +3     
===================================================
- Hits                 10987    10986       -1     
- Misses                2029     2033       +4     
| Flag | Coverage Δ | |
|---|---|---|
| file-ft-tests | 67.44% <83.33%> (-0.01%) | ⬇️ |
| kmip-ft-tests | 28.11% <83.33%> (+<0.01%) | ⬆️ |
| mongo-v0-ft-tests | 68.69% <83.33%> (-0.01%) | ⬇️ |
| mongo-v1-ft-tests | 68.70% <83.33%> (+0.01%) | ⬆️ |
| multiple-backend | 35.28% <83.33%> (+<0.01%) | ⬆️ |
| sur-tests | 36.39% <83.33%> (+<0.01%) | ⬆️ |
| sur-tests-inflights | 37.40% <83.33%> (-0.03%) | ⬇️ |
| unit | 69.97% <83.33%> (-0.01%) | ⬇️ |
| utapi-v2-tests | 34.29% <83.33%> (-0.03%) | ⬇️ |

Flags with carried forward coverage won't be shown.


Copilot AI left a comment

Pull request overview

This PR fixes the deep healthcheck logic to fail when ANY client has ALL its backends/locations failing, rather than only failing when ALL backends across ALL clients fail. This ensures better detection of client-specific failures, particularly for multi-location data backends.

Changes:

  • Modified the failure detection logic from checking all backends globally to checking each client independently
  • Added empty client handling to skip clients with no backends
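
The empty-client guard matters because `Array.prototype.every()` returns `true` for an empty array, so a client that reports no backends would otherwise count as fully failed. A minimal sketch of that pitfall, where `allFailing` and `allFailingGuarded` are hypothetical helper names used only for illustration:

```javascript
// Hypothetical helper without the guard: vacuously true for an empty client.
function allFailing(clientResult) {
    return Object.keys(clientResult)
        .every(k => Boolean(clientResult[k].error));
}

// Same check with the guard: a client with no backends is skipped.
function allFailingGuarded(clientResult) {
    const keys = Object.keys(clientResult);
    return keys.length > 0
        && keys.every(k => Boolean(clientResult[k].error));
}

console.log(allFailing({}));        // true
console.log(allFailingGuarded({})); // false
```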


Comment on lines +69 to +82
const fail = results.some(clientResult => {
    const keys = Object.keys(clientResult);
    // If this client has no keys, skip it
    if (keys.length === 0) {
        return false;
    }
    // Check if ALL backends/locations of this client have errors
    return keys.every(k =>
        // if there is an error from an external backend,
        // only return a 500 if it is on startup
        // (flightCheckOnStartUp set to true)
        clientResult[k].error && (flightCheckOnStartUp || !clientResult[k].external)
    );
});
Copilot AI Jan 20, 2026

This PR introduces a significant change in the healthcheck failure logic - now failing when ANY client has ALL its backends failing (instead of only when ALL backends across ALL clients fail). This new behavior should be covered by tests to ensure it works as expected. Consider adding test cases for scenarios such as:

  • All backends of the data client failing while metadata client is healthy (should fail)
  • One data backend location failing while another is healthy and metadata is healthy (should succeed)
  • All backends of multiple clients failing (should fail)
