App Config - Startup retry #47857

mrm9084 · 2026-01-29T20:04:51Z

Description

Adds a retry to startup. When all replicas fail, the provider retries the failed store for a period of time: 100s by default, with a minimum of 30s and a maximum of 600s.

Also refactors the load method by splitting it into a number of helper methods to make it more readable.

All SDK Contribution checklist:

The pull request does not introduce breaking changes.
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

Copilot

Pull request overview

This pull request adds a startup retry mechanism to Azure App Configuration for Java to handle transient failures during application startup. When all replicas fail to load configuration, the provider will automatically retry with exponential backoff until a configurable timeout expires.

Changes:

Added startup-timeout configuration property (default: 100s, min: 30s, max: 600s) to control retry duration during startup
Refactored AzureAppConfigDataLoader.load() into smaller helper methods (loadConfiguration, attemptLoadFromClients, setupMonitoringState, handleReplicaFailure) for improved readability
Implemented retry loop with intelligent backoff that waits until the next client becomes available before retrying
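The retry behavior summarized above can be illustrated with a minimal sketch. This is not the PR's implementation: the class `StartupRetrySketch`, the method `retryUntilDeadline`, and the fixed `pause` parameter are hypothetical names chosen for illustration, and a real loader would derive the pause from its backoff policy rather than take a constant.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical sketch of a deadline-bounded startup retry; not the PR's code. */
class StartupRetrySketch {

    /**
     * Calls {@code attempt} until it returns a non-null value or {@code timeout} elapses.
     * Returns null if every attempt failed before the deadline or the thread was interrupted.
     */
    static <T> T retryUntilDeadline(Supplier<T> attempt, Duration timeout, Duration pause) {
        Instant deadline = Instant.now().plus(timeout);
        while (true) {
            T result = attempt.get();
            if (result != null) {
                return result;
            }
            Duration remaining = Duration.between(Instant.now(), deadline);
            if (remaining.isZero() || remaining.isNegative()) {
                return null; // deadline reached, give up
            }
            // Never sleep past the deadline.
            Duration wait = pause.compareTo(remaining) < 0 ? pause : remaining;
            try {
                Thread.sleep(wait.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return null;
            }
        }
    }
}
```

A caller would pass the store load as the supplier, returning null on failure, and the configured startup timeout as `timeout`.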

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

File	Description
README.md	Added documentation for new startup-timeout configuration option
CHANGELOG.md	Documented the new startup retry feature
AppConfigurationProperties.java	Added startupTimeout field with default value and validation (30-600 seconds)
AzureAppConfigDataResource.java	Added startupTimeout parameter to constructor and getter method
AzureAppConfigDataLocationResolver.java	Passed startupTimeout from properties to resources
AzureAppConfigDataLoader.java	Refactored load method and implemented retry logic with backoff for startup failures
ConnectionManager.java	Added getMillisUntilNextClientAvailable() to calculate wait time until next replica is available
AppConfigurationReplicaClientFactory.java	Added wrapper method to expose getMillisUntilNextClientAvailable
ConfigStore.java	Minor code quality improvements (variable naming, isEmpty() usage)
ConnectionManagerTest.java	Added comprehensive tests for getMillisUntilNextClientAvailable method
AzureAppConfigDataResourceTest.java	Updated test constructor calls to include startupTimeout parameter
AzureAppConfigDataLoaderTest.java	Added tests for startup retry behavior and refresh non-retry behavior


Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Copilot · 2026-01-30T19:20:18Z

...ring/cloud/appconfiguration/config/implementation/properties/AppConfigurationProperties.java

+        if (startupTimeout == null) {
+            throw new IllegalArgumentException("startupTimeout cannot be null.");
+        }
+        if (startupTimeout.getSeconds() < 30 || startupTimeout.getSeconds() > 600) {
+            throw new IllegalArgumentException("startupTimeout must be between 30 and 600 seconds.");
+        }


There are no tests verifying the startupTimeout validation logic that was added. Consider adding tests to verify that: 1) null startupTimeout throws IllegalArgumentException, 2) values below 30 seconds throw IllegalArgumentException, 3) values above 600 seconds throw IllegalArgumentException, and 4) values within the valid range (30-600) are accepted.
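The four cases listed in this comment could be covered along the following lines. This is a self-contained sketch: `StartupTimeoutValidationSketch`, `validate`, `rejects`, and `runChecks` are hypothetical names, and `validate` only copies the check quoted in the diff above rather than calling the real `AppConfigurationProperties` class. In the repository these checks would more likely be written as JUnit tests.

```java
import java.time.Duration;

/** Hypothetical stand-in for the validation quoted in the diff; not the real properties class. */
class StartupTimeoutValidationSketch {

    // Same checks as the quoted diff: non-null and within 30-600 seconds.
    static void validate(Duration startupTimeout) {
        if (startupTimeout == null) {
            throw new IllegalArgumentException("startupTimeout cannot be null.");
        }
        if (startupTimeout.getSeconds() < 30 || startupTimeout.getSeconds() > 600) {
            throw new IllegalArgumentException("startupTimeout must be between 30 and 600 seconds.");
        }
    }

    // True when validate throws IllegalArgumentException for the given value.
    static boolean rejects(Duration value) {
        try {
            validate(value);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Runs the four cases from the review comment; throws AssertionError on the first failure.
    static void runChecks() {
        if (!rejects(null)) throw new AssertionError("null should be rejected");
        if (!rejects(Duration.ofSeconds(29))) throw new AssertionError("29s should be rejected");
        if (!rejects(Duration.ofSeconds(601))) throw new AssertionError("601s should be rejected");
        if (rejects(Duration.ofSeconds(30))) throw new AssertionError("30s should be accepted");
        if (rejects(Duration.ofSeconds(100))) throw new AssertionError("100s should be accepted");
        if (rejects(Duration.ofSeconds(600))) throw new AssertionError("600s should be accepted");
    }
}
```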

Copilot · 2026-01-30T19:20:18Z

.../com/azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoader.java

+            if (Instant.now().isBefore(deadline)) {
+                long elapsedSeconds = Instant.now().getEpochSecond() - startTime.getEpochSecond();
+                Long backoffSeconds = getBackoffDuration(elapsedSeconds);
+
+                // If backoff is null, elapsed time exceeds fixed intervals - use exponential backoff
+                if (backoffSeconds == null) {
+                    postFixedWindowAttempts++;
+                    // Convert nanoseconds to seconds
+                    backoffSeconds = BackoffTimeCalculator.calculateBackoff(postFixedWindowAttempts) / 1_000_000_000L;
+                }
+
+                // Don't wait longer than remaining time until deadline
+                long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();
+                long waitSeconds = Math.min(backoffSeconds, remainingSeconds);
+
+                if (waitSeconds > 0) {
+                    logger.debug("All replicas in backoff for store: " + resource.getEndpoint() 
+                        + ". Waiting " + waitSeconds + "s before retry (elapsed: " + elapsedSeconds + "s).");
+                    try {
+                        Thread.sleep(waitSeconds * 1000);
+                    } catch (InterruptedException e) {
+                        Thread.currentThread().interrupt();
+                        return lastException;
                    }


The condition at line 192 is redundant because it's already guaranteed by the while loop condition at line 181. The code will only reach line 192 if Instant.now().isBefore(deadline) is true. Consider removing this redundant check to simplify the code.

Suggested change

-            if (Instant.now().isBefore(deadline)) {
-                long elapsedSeconds = Instant.now().getEpochSecond() - startTime.getEpochSecond();
-                Long backoffSeconds = getBackoffDuration(elapsedSeconds);
-                // If backoff is null, elapsed time exceeds fixed intervals - use exponential backoff
-                if (backoffSeconds == null) {
-                    postFixedWindowAttempts++;
-                    // Convert nanoseconds to seconds
-                    backoffSeconds = BackoffTimeCalculator.calculateBackoff(postFixedWindowAttempts) / 1_000_000_000L;
-                }
-                // Don't wait longer than remaining time until deadline
-                long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();
-                long waitSeconds = Math.min(backoffSeconds, remainingSeconds);
-                if (waitSeconds > 0) {
-                    logger.debug("All replicas in backoff for store: " + resource.getEndpoint()
-                        + ". Waiting " + waitSeconds + "s before retry (elapsed: " + elapsedSeconds + "s).");
-                    try {
-                        Thread.sleep(waitSeconds * 1000);
-                    } catch (InterruptedException e) {
-                        Thread.currentThread().interrupt();
-                        return lastException;
-                    }
+            long elapsedSeconds = Instant.now().getEpochSecond() - startTime.getEpochSecond();
+            Long backoffSeconds = getBackoffDuration(elapsedSeconds);
+            // If backoff is null, elapsed time exceeds fixed intervals - use exponential backoff
+            if (backoffSeconds == null) {
+                postFixedWindowAttempts++;
+                // Convert nanoseconds to seconds
+                backoffSeconds = BackoffTimeCalculator.calculateBackoff(postFixedWindowAttempts) / 1_000_000_000L;
+            }
+            // Don't wait longer than remaining time until deadline
+            long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();
+            long waitSeconds = Math.min(backoffSeconds, remainingSeconds);
+            if (waitSeconds > 0) {
+                logger.debug("All replicas in backoff for store: " + resource.getEndpoint()
+                    + ". Waiting " + waitSeconds + "s before retry (elapsed: " + elapsedSeconds + "s).");
+                try {
+                    Thread.sleep(waitSeconds * 1000);
+                } catch (InterruptedException e) {
+                    Thread.currentThread().interrupt();
+                    return lastException;

Copilot · 2026-01-30T19:20:19Z

.../com/azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoader.java

+                }
+
+                // Don't wait longer than remaining time until deadline
+                long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();


At line 204, remainingSeconds could potentially be negative if there's a delay between the while condition check at line 181 and reaching line 204. While unlikely in practice, this could result in negative values being passed to Math.min() at line 205, which would then result in a negative waitSeconds. The check at line 207 prevents sleeping with negative values, but it would be clearer to use Math.max(0, deadline.getEpochSecond() - Instant.now().getEpochSecond()) to ensure remainingSeconds is never negative.

Suggested change

-                long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond();
+                long remainingSeconds = Math.max(0L, deadline.getEpochSecond() - Instant.now().getEpochSecond());
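The effect of the clamp can be seen in isolation with a small sketch. `RemainingTimeSketch` and `waitSeconds` are hypothetical names, and the current time is passed in as a parameter (an assumption made here so the arithmetic can be checked deterministically) instead of calling `Instant.now()` inline as the PR does.

```java
import java.time.Instant;

/** Hypothetical helper isolating the wait-time arithmetic discussed in this comment. */
class RemainingTimeSketch {

    /**
     * Returns how long to wait: the backoff, capped by the time left until the deadline.
     * The remaining time is clamped at zero so a passed deadline never yields a negative wait.
     */
    static long waitSeconds(long backoffSeconds, Instant now, Instant deadline) {
        long remainingSeconds = Math.max(0L, deadline.getEpochSecond() - now.getEpochSecond());
        return Math.min(backoffSeconds, remainingSeconds);
    }
}
```

With a deadline 10 seconds away, a 5 second backoff is returned unchanged and a 30 second backoff is capped at 10. Once the deadline has passed, the result is 0 rather than a negative number.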

Copilot · 2026-01-30T19:20:19Z

.../azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoaderTest.java

+        // Create a second client mock for the successful retry
+        AppConfigurationReplicaClient secondClientMock = Mockito.mock(AppConfigurationReplicaClient.class);
+        lenient().when(secondClientMock.getEndpoint()).thenReturn(ENDPOINT);
+


The variable secondClientMock is created but never used. It can be removed to clean up the test.

Suggested change

-        // Create a second client mock for the successful retry
-        AppConfigurationReplicaClient secondClientMock = Mockito.mock(AppConfigurationReplicaClient.class);
-        lenient().when(secondClientMock.getEndpoint()).thenReturn(ENDPOINT);

Copilot · 2026-01-30T19:20:19Z

...in/java/com/azure/spring/cloud/appconfiguration/config/implementation/ConnectionManager.java

+    long getMillisUntilNextClientAvailable() {
+        Instant now = Instant.now();
+        Instant earliestAvailable = Instant.MAX;
+
+        // Check configured clients
+        if (clients != null) {
+            for (AppConfigurationReplicaClient client : clients) {
+                Instant backoffEnd = client.getBackoffEndTime();
+                if (!backoffEnd.isAfter(now)) {
+                    return 0; // Client available now
+                }
+                if (backoffEnd.isBefore(earliestAvailable)) {
+                    earliestAvailable = backoffEnd;
+                }
+            }
+        }
+
+        // Check auto-failover clients
+        for (AppConfigurationReplicaClient client : autoFailoverClients.values()) {
+            Instant backoffEnd = client.getBackoffEndTime();
+            if (!backoffEnd.isAfter(now)) {
+                return 0; // Client available now
+            }
+            if (backoffEnd.isBefore(earliestAvailable)) {
+                earliestAvailable = backoffEnd;
+            }
+        }
+
+        return earliestAvailable.toEpochMilli() - now.toEpochMilli();
+    }


The method getMillisUntilNextClientAvailable is not used anywhere in the production code. It's only called in tests. This suggests that either the method should be removed as dead code, or the retry logic in AzureAppConfigDataLoader should be using this method to determine optimal wait times instead of fixed backoff intervals. Consider removing this method or integrating it into the actual retry logic.
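If the method is kept and wired into the retry logic, the core computation can be sketched on plain `Instant` values. `NextAvailableSketch` and `millisUntilNextAvailable` are hypothetical names, and the list of backoff end times stands in for the configured and auto-failover clients. One detail that may be worth checking in the quoted method: if both client collections were ever empty, `earliestAvailable` would stay at `Instant.MAX`, and `Instant.MAX.toEpochMilli()` throws `ArithmeticException`. The sketch returns 0 in that case, which is an assumption about the desired behavior, not something the PR states.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Hypothetical sketch of the earliest-backoff-end computation on plain Instants. */
class NextAvailableSketch {

    /**
     * Returns milliseconds until the earliest backoff end, or 0 if any backoff has already ended.
     * Also returns 0 for an empty list (an assumption; the PR does not specify this case).
     */
    static long millisUntilNextAvailable(List<Instant> backoffEnds, Instant now) {
        Instant earliest = null;
        for (Instant end : backoffEnds) {
            if (!end.isAfter(now)) {
                return 0; // something is available right now
            }
            if (earliest == null || end.isBefore(earliest)) {
                earliest = end;
            }
        }
        return earliest == null ? 0 : Duration.between(now, earliest).toMillis();
    }
}
```

A retry loop could then sleep for the smaller of this value and the time remaining until the startup deadline.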

Copilot · 2026-01-30T19:20:19Z

...pring/cloud/appconfiguration/config/implementation/AppConfigurationReplicaClientFactory.java

+     * Gets the duration in milliseconds until the next client becomes available for the specified store.
+     * 
+     * @param originEndpoint the origin configuration store endpoint
+     * @return duration in milliseconds until next client is available, or 0 if one is available now
+     */
+    long getMillisUntilNextClientAvailable(String originEndpoint) {
+        return CONNECTIONS.get(originEndpoint).getMillisUntilNextClientAvailable();
+    }
+
    /**


The method getMillisUntilNextClientAvailable is not used anywhere in the production code. It's only called in tests. This suggests that either the method should be removed as dead code, or it should be integrated into the retry logic. Consider removing this method or using it in the startup retry implementation.

Suggested change

-     * Gets the duration in milliseconds until the next client becomes available for the specified store.
-     *
-     * @param originEndpoint the origin configuration store endpoint
-     * @return duration in milliseconds until next client is available, or 0 if one is available now
-     */
-    long getMillisUntilNextClientAvailable(String originEndpoint) {
-        return CONNECTIONS.get(originEndpoint).getMillisUntilNextClientAvailable();
-    }
    /**

Copilot · 2026-01-30T19:20:20Z

.../com/azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoader.java

+    private static final int[][] STARTUP_BACKOFF_INTERVALS = {
+        {100, 5},   // 0-100 seconds elapsed: 5 second backoff
+        {200, 10},  // 100-200 seconds elapsed: 10 second backoff
+        {600, 30}   // 200-600 seconds elapsed: 30 second backoff
+    };


The fixed backoff intervals extend to 600 seconds, but the comment at line 90 says "0-100 seconds elapsed". This threshold value (100) should match the first interval threshold and appears correct. However, the last interval at line 92 covers "200-600 seconds elapsed" which seems inconsistent with the default and maximum timeout of 100-600 seconds. Since the default timeout is 100 seconds and the minimum is 30 seconds, many users will never reach the higher backoff intervals defined here. Consider whether these intervals align with the expected timeout ranges.
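To make the discussion concrete, here is a sketch of how a lookup over this table might behave. The body of `getBackoffDuration` does not appear in the quoted diff, so the implementation below is an assumption: it treats each threshold as an exclusive upper bound and returns null once the elapsed time is past the last interval, matching the null check seen in the retry loop. `StartupBackoffSketch` is a hypothetical class name.

```java
/** Hypothetical sketch of a lookup over the quoted interval table; not the PR's code. */
class StartupBackoffSketch {

    // {upper bound of elapsed seconds (exclusive, by assumption), backoff in seconds}
    private static final int[][] STARTUP_BACKOFF_INTERVALS = {
        {100, 5},   // 0-100 seconds elapsed: 5 second backoff
        {200, 10},  // 100-200 seconds elapsed: 10 second backoff
        {600, 30}   // 200-600 seconds elapsed: 30 second backoff
    };

    /** Returns the fixed backoff for the elapsed time, or null once past the last interval. */
    static Long getBackoffDuration(long elapsedSeconds) {
        for (int[] interval : STARTUP_BACKOFF_INTERVALS) {
            if (elapsedSeconds < interval[0]) {
                return (long) interval[1];
            }
        }
        return null; // caller falls back to exponential backoff
    }
}
```

Under these assumptions, a run with the default 100 second timeout only ever sees the 5 second backoff, which is the point the comment raises about the later intervals.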

mrm9084 added 4 commits January 29, 2026 10:44

Refactor + Startup Retry (3a43016)
Update AzureAppConfigDataLoader.java (1a87bd9)
Adding Tests (6c2a36e)
Updating readme, correct location (dc1db90)

Copilot AI review requested due to automatic review settings January 29, 2026 20:04

mrm9084 requested review from a team, Netyyyy, avanigupta, moarychan, rossgrambo, rujche and saragluna as code owners January 29, 2026 20:04

github-actions bot added the azure-spring All azure-spring related issues label Jan 29, 2026

Copilot started reviewing on behalf of mrm9084 January 29, 2026 20:05

Copilot AI reviewed Jan 29, 2026

mrm9084 and others added 3 commits January 29, 2026 14:18

Apply suggestions from code review (89f36c4)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
interval change (29368b5)
Merge branch 'main' into StartupRetry (3e187f7)

mrm9084 requested a review from Copilot January 30, 2026 19:10

Copilot started reviewing on behalf of mrm9084 January 30, 2026 19:10

Copilot AI reviewed Jan 30, 2026

rujche assigned mrm9084 Feb 3, 2026

rujche added the azure-spring-app-configuration Spring app configuration related issues. label Feb 3, 2026

rujche added this to Spring Cloud Azure Feb 3, 2026

github-project-automation bot moved this to Todo in Spring Cloud Azure Feb 3, 2026

rujche moved this from Todo to In Progress in Spring Cloud Azure Feb 3, 2026

rujche added this to the 2026-02 milestone Feb 3, 2026
