Skip to content

Conversation

@mrm9084
Copy link
Member

@mrm9084 mrm9084 commented Jan 29, 2026

Description

Adds retry to startup. When all replicas fail there will be an attempt to retry the failed store for a period of time. By default 100s, minimal 30s, maximum 600s.

Also, refactors the load method to be split into a number of helper method to make this readable.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings January 29, 2026 20:04
@github-actions github-actions bot added the azure-spring All azure-spring related issues label Jan 29, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a startup retry mechanism to Azure App Configuration for Java to handle transient failures during application startup. When all replicas fail to load configuration, the provider will automatically retry with exponential backoff until a configurable timeout expires.

Changes:

  • Added startup-timeout configuration property (default: 100s, min: 30s, max: 600s) to control retry duration during startup
  • Refactored AzureAppConfigDataLoader.load() into smaller helper methods (loadConfiguration, attemptLoadFromClients, setupMonitoringState, handleReplicaFailure) for improved readability
  • Implemented retry loop with intelligent backoff that waits until the next client becomes available before retrying

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
README.md Added documentation for new startup-timeout configuration option
CHANGELOG.md Documented the new startup retry feature
AppConfigurationProperties.java Added startupTimeout field with default value and validation (30-600 seconds)
AzureAppConfigDataResource.java Added startupTimeout parameter to constructor and getter method
AzureAppConfigDataLocationResolver.java Passed startupTimeout from properties to resources
AzureAppConfigDataLoader.java Refactored load method and implemented retry logic with backoff for startup failures
ConnectionManager.java Added getMillisUntilNextClientAvailable() to calculate wait time until next replica is available
AppConfigurationReplicaClientFactory.java Added wrapper method to expose getMillisUntilNextClientAvailable
ConfigStore.java Minor code quality improvements (variable naming, isEmpty() usage)
ConnectionManagerTest.java Added comprehensive tests for getMillisUntilNextClientAvailable method
AzureAppConfigDataResourceTest.java Updated test constructor calls to include startupTimeout parameter
AzureAppConfigDataLoaderTest.java Added tests for startup retry behavior and refresh non-retry behavior

mrm9084 and others added 3 commits January 29, 2026 14:18
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Comment on lines +137 to +142
if (startupTimeout == null) {
throw new IllegalArgumentException("startupTimeout cannot be null.");
}
if (startupTimeout.getSeconds() < 30 || startupTimeout.getSeconds() > 600) {
throw new IllegalArgumentException("startupTimeout must be between 30 and 600 seconds.");
}
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tests verifying the startupTimeout validation logic that was added. Consider adding tests to verify that: 1) null startupTimeout throws IllegalArgumentException, 2) values below 30 seconds throw IllegalArgumentException, 3) values above 600 seconds throw IllegalArgumentException, and 4) values within the valid range (30-600) are accepted.

Copilot uses AI. Check for mistakes.
Comment on lines +192 to 215
if (Instant.now().isBefore(deadline)) {
long elapsedSeconds = Instant.now().getEpochSecond() - startTime.getEpochSecond();
Long backoffSeconds = getBackoffDuration(elapsedSeconds);

// If backoff is null, elapsed time exceeds fixed intervals - use exponential backoff
if (backoffSeconds == null) {
postFixedWindowAttempts++;
// Convert nanoseconds to seconds
backoffSeconds = BackoffTimeCalculator.calculateBackoff(postFixedWindowAttempts) / 1_000_000_000L;
}

// Don't wait longer than remaining time until deadline
long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();
long waitSeconds = Math.min(backoffSeconds, remainingSeconds);

if (waitSeconds > 0) {
logger.debug("All replicas in backoff for store: " + resource.getEndpoint()
+ ". Waiting " + waitSeconds + "s before retry (elapsed: " + elapsedSeconds + "s).");
try {
Thread.sleep(waitSeconds * 1000);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return lastException;
}
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition at line 192 is redundant because it's already guaranteed by the while loop condition at line 181. The code will only reach line 192 if Instant.now().isBefore(deadline) is true. Consider removing this redundant check to simplify the code.

Suggested change
if (Instant.now().isBefore(deadline)) {
long elapsedSeconds = Instant.now().getEpochSecond() - startTime.getEpochSecond();
Long backoffSeconds = getBackoffDuration(elapsedSeconds);
// If backoff is null, elapsed time exceeds fixed intervals - use exponential backoff
if (backoffSeconds == null) {
postFixedWindowAttempts++;
// Convert nanoseconds to seconds
backoffSeconds = BackoffTimeCalculator.calculateBackoff(postFixedWindowAttempts) / 1_000_000_000L;
}
// Don't wait longer than remaining time until deadline
long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();
long waitSeconds = Math.min(backoffSeconds, remainingSeconds);
if (waitSeconds > 0) {
logger.debug("All replicas in backoff for store: " + resource.getEndpoint()
+ ". Waiting " + waitSeconds + "s before retry (elapsed: " + elapsedSeconds + "s).");
try {
Thread.sleep(waitSeconds * 1000);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return lastException;
}
long elapsedSeconds = Instant.now().getEpochSecond() - startTime.getEpochSecond();
Long backoffSeconds = getBackoffDuration(elapsedSeconds);
// If backoff is null, elapsed time exceeds fixed intervals - use exponential backoff
if (backoffSeconds == null) {
postFixedWindowAttempts++;
// Convert nanoseconds to seconds
backoffSeconds = BackoffTimeCalculator.calculateBackoff(postFixedWindowAttempts) / 1_000_000_000L;
}
// Don't wait longer than remaining time until deadline
long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();
long waitSeconds = Math.min(backoffSeconds, remainingSeconds);
if (waitSeconds > 0) {
logger.debug("All replicas in backoff for store: " + resource.getEndpoint()
+ ". Waiting " + waitSeconds + "s before retry (elapsed: " + elapsedSeconds + "s).");
try {
Thread.sleep(waitSeconds * 1000);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return lastException;

Copilot uses AI. Check for mistakes.
}

// Don't wait longer than remaining time until deadline
long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At line 204, remainingSeconds could potentially be negative if there's a delay between the while condition check at line 181 and reaching line 204. While unlikely in practice, this could result in negative values being passed to Math.min() at line 205, which would then result in a negative waitSeconds. The check at line 207 prevents sleeping with negative values, but it would be clearer to use Math.max(0, deadline.getEpochSecond() - Instant.now().getEpochSecond()) to ensure remainingSeconds is never negative.

Suggested change
long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();
long remainingSeconds = Math.max(0L, deadline.getEpochSecond() - Instant.now().getEpochSecond());

Copilot uses AI. Check for mistakes.
Comment on lines +192 to +195
// Create a second client mock for the successful retry
AppConfigurationReplicaClient secondClientMock = Mockito.mock(AppConfigurationReplicaClient.class);
lenient().when(secondClientMock.getEndpoint()).thenReturn(ENDPOINT);

Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable secondClientMock is created but never used. It can be removed to clean up the test.

Suggested change
// Create a second client mock for the successful retry
AppConfigurationReplicaClient secondClientMock = Mockito.mock(AppConfigurationReplicaClient.class);
lenient().when(secondClientMock.getEndpoint()).thenReturn(ENDPOINT);

Copilot uses AI. Check for mistakes.
Comment on lines +251 to +280
long getMillisUntilNextClientAvailable() {
Instant now = Instant.now();
Instant earliestAvailable = Instant.MAX;

// Check configured clients
if (clients != null) {
for (AppConfigurationReplicaClient client : clients) {
Instant backoffEnd = client.getBackoffEndTime();
if (!backoffEnd.isAfter(now)) {
return 0; // Client available now
}
if (backoffEnd.isBefore(earliestAvailable)) {
earliestAvailable = backoffEnd;
}
}
}

// Check auto-failover clients
for (AppConfigurationReplicaClient client : autoFailoverClients.values()) {
Instant backoffEnd = client.getBackoffEndTime();
if (!backoffEnd.isAfter(now)) {
return 0; // Client available now
}
if (backoffEnd.isBefore(earliestAvailable)) {
earliestAvailable = backoffEnd;
}
}

return earliestAvailable.toEpochMilli() - now.toEpochMilli();
}
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method getMillisUntilNextClientAvailable is not used anywhere in the production code. It's only called in tests. This suggests that either the method should be removed as dead code, or the retry logic in AzureAppConfigDataLoader should be using this method to determine optimal wait times instead of fixed backoff intervals. Consider removing this method or integrating it into the actual retry logic.

Copilot uses AI. Check for mistakes.
Comment on lines +111 to 120
* Gets the duration in milliseconds until the next client becomes available for the specified store.
*
* @param originEndpoint the origin configuration store endpoint
* @return duration in milliseconds until next client is available, or 0 if one is available now
*/
long getMillisUntilNextClientAvailable(String originEndpoint) {
return CONNECTIONS.get(originEndpoint).getMillisUntilNextClientAvailable();
}

/**
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method getMillisUntilNextClientAvailable is not used anywhere in the production code. It's only called in tests. This suggests that either the method should be removed as dead code, or it should be integrated into the retry logic. Consider removing this method or using it in the startup retry implementation.

Suggested change
* Gets the duration in milliseconds until the next client becomes available for the specified store.
*
* @param originEndpoint the origin configuration store endpoint
* @return duration in milliseconds until next client is available, or 0 if one is available now
*/
long getMillisUntilNextClientAvailable(String originEndpoint) {
return CONNECTIONS.get(originEndpoint).getMillisUntilNextClientAvailable();
}
/**

Copilot uses AI. Check for mistakes.
Comment on lines +89 to +93
private static final int[][] STARTUP_BACKOFF_INTERVALS = {
{100, 5}, // 0-100 seconds elapsed: 5 second backoff
{200, 10}, // 100-200 seconds elapsed: 10 second backoff
{600, 30} // 200-600 seconds elapsed: 30 second backoff
};
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixed backoff intervals extend to 600 seconds, but the comment at line 90 says "0-100 seconds elapsed". This threshold value (100) should match the first interval threshold and appears correct. However, the last interval at line 92 covers "200-600 seconds elapsed" which seems inconsistent with the default and maximum timeout of 100-600 seconds. Since the default timeout is 100 seconds and the minimum is 30 seconds, many users will never reach the higher backoff intervals defined here. Consider whether these intervals align with the expected timeout ranges.

Copilot uses AI. Check for mistakes.
@rujche rujche added the azure-spring-app-configuration Spring app configuration related issues. label Feb 3, 2026
@rujche rujche moved this from Todo to In Progress in Spring Cloud Azure Feb 3, 2026
@rujche rujche added this to the 2026-02 milestone Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

azure-spring All azure-spring related issues azure-spring-app-configuration Spring app configuration related issues.

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

2 participants