-
Notifications
You must be signed in to change notification settings - Fork 2.2k
App Config - Startup retry #47857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
App Config - Startup retry #47857
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds a startup retry mechanism to Azure App Configuration for Java to handle transient failures during application startup. When all replicas fail to load configuration, the provider will automatically retry with exponential backoff until a configurable timeout expires.
Changes:
- Added
startup-timeoutconfiguration property (default: 100s, min: 30s, max: 600s) to control retry duration during startup - Refactored
AzureAppConfigDataLoader.load()into smaller helper methods (loadConfiguration,attemptLoadFromClients,setupMonitoringState,handleReplicaFailure) for improved readability - Implemented retry loop with intelligent backoff that waits until the next client becomes available before retrying
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Added documentation for new startup-timeout configuration option |
| CHANGELOG.md | Documented the new startup retry feature |
| AppConfigurationProperties.java | Added startupTimeout field with default value and validation (30-600 seconds) |
| AzureAppConfigDataResource.java | Added startupTimeout parameter to constructor and getter method |
| AzureAppConfigDataLocationResolver.java | Passed startupTimeout from properties to resources |
| AzureAppConfigDataLoader.java | Refactored load method and implemented retry logic with backoff for startup failures |
| ConnectionManager.java | Added getMillisUntilNextClientAvailable() to calculate wait time until next replica is available |
| AppConfigurationReplicaClientFactory.java | Added wrapper method to expose getMillisUntilNextClientAvailable |
| ConfigStore.java | Minor code quality improvements (variable naming, isEmpty() usage) |
| ConnectionManagerTest.java | Added comprehensive tests for getMillisUntilNextClientAvailable method |
| AzureAppConfigDataResourceTest.java | Updated test constructor calls to include startupTimeout parameter |
| AzureAppConfigDataLoaderTest.java | Added tests for startup retry behavior and refresh non-retry behavior |
...ring/cloud/appconfiguration/config/implementation/properties/AppConfigurationProperties.java
Show resolved
Hide resolved
.../com/azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoader.java
Outdated
Show resolved
Hide resolved
.../com/azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoader.java
Show resolved
Hide resolved
.../com/azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoader.java
Outdated
Show resolved
Hide resolved
.../azure/spring/cloud/appconfiguration/config/implementation/AzureAppConfigDataLoaderTest.java
Show resolved
Hide resolved
...in/java/com/azure/spring/cloud/appconfiguration/config/implementation/ConnectionManager.java
Show resolved
Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
| if (startupTimeout == null) { | ||
| throw new IllegalArgumentException("startupTimeout cannot be null."); | ||
| } | ||
| if (startupTimeout.getSeconds() < 30 || startupTimeout.getSeconds() > 600) { | ||
| throw new IllegalArgumentException("startupTimeout must be between 30 and 600 seconds."); | ||
| } |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no tests verifying the startupTimeout validation logic that was added. Consider adding tests to verify that: 1) null startupTimeout throws IllegalArgumentException, 2) values below 30 seconds throw IllegalArgumentException, 3) values above 600 seconds throw IllegalArgumentException, and 4) values within the valid range (30-600) are accepted.
| if (Instant.now().isBefore(deadline)) { | ||
| long elapsedSeconds = Instant.now().getEpochSecond() - startTime.getEpochSecond(); | ||
| Long backoffSeconds = getBackoffDuration(elapsedSeconds); | ||
|
|
||
| // If backoff is null, elapsed time exceeds fixed intervals - use exponential backoff | ||
| if (backoffSeconds == null) { | ||
| postFixedWindowAttempts++; | ||
| // Convert nanoseconds to seconds | ||
| backoffSeconds = BackoffTimeCalculator.calculateBackoff(postFixedWindowAttempts) / 1_000_000_000L; | ||
| } | ||
|
|
||
| // Don't wait longer than remaining time until deadline | ||
| long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond(); | ||
| long waitSeconds = Math.min(backoffSeconds, remainingSeconds); | ||
|
|
||
| if (waitSeconds > 0) { | ||
| logger.debug("All replicas in backoff for store: " + resource.getEndpoint() | ||
| + ". Waiting " + waitSeconds + "s before retry (elapsed: " + elapsedSeconds + "s)."); | ||
| try { | ||
| Thread.sleep(waitSeconds * 1000); | ||
| } catch (InterruptedException e) { | ||
| Thread.currentThread().interrupt(); | ||
| return lastException; | ||
| } |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition at line 192 is redundant because it's already guaranteed by the while loop condition at line 181. The code will only reach line 192 if Instant.now().isBefore(deadline) is true. Consider removing this redundant check to simplify the code.
| if (Instant.now().isBefore(deadline)) { | |
| long elapsedSeconds = Instant.now().getEpochSecond() - startTime.getEpochSecond(); | |
| Long backoffSeconds = getBackoffDuration(elapsedSeconds); | |
| // If backoff is null, elapsed time exceeds fixed intervals - use exponential backoff | |
| if (backoffSeconds == null) { | |
| postFixedWindowAttempts++; | |
| // Convert nanoseconds to seconds | |
| backoffSeconds = BackoffTimeCalculator.calculateBackoff(postFixedWindowAttempts) / 1_000_000_000L; | |
| } | |
| // Don't wait longer than remaining time until deadline | |
| long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond(); | |
| long waitSeconds = Math.min(backoffSeconds, remainingSeconds); | |
| if (waitSeconds > 0) { | |
| logger.debug("All replicas in backoff for store: " + resource.getEndpoint() | |
| + ". Waiting " + waitSeconds + "s before retry (elapsed: " + elapsedSeconds + "s)."); | |
| try { | |
| Thread.sleep(waitSeconds * 1000); | |
| } catch (InterruptedException e) { | |
| Thread.currentThread().interrupt(); | |
| return lastException; | |
| } | |
| long elapsedSeconds = Instant.now().getEpochSecond() - startTime.getEpochSecond(); | |
| Long backoffSeconds = getBackoffDuration(elapsedSeconds); | |
| // If backoff is null, elapsed time exceeds fixed intervals - use exponential backoff | |
| if (backoffSeconds == null) { | |
| postFixedWindowAttempts++; | |
| // Convert nanoseconds to seconds | |
| backoffSeconds = BackoffTimeCalculator.calculateBackoff(postFixedWindowAttempts) / 1_000_000_000L; | |
| } | |
| // Don't wait longer than remaining time until deadline | |
| long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond(); | |
| long waitSeconds = Math.min(backoffSeconds, remainingSeconds); | |
| if (waitSeconds > 0) { | |
| logger.debug("All replicas in backoff for store: " + resource.getEndpoint() | |
| + ". Waiting " + waitSeconds + "s before retry (elapsed: " + elapsedSeconds + "s)."); | |
| try { | |
| Thread.sleep(waitSeconds * 1000); | |
| } catch (InterruptedException e) { | |
| Thread.currentThread().interrupt(); | |
| return lastException; |
| } | ||
|
|
||
| // Don't wait longer than remaining time until deadline | ||
| long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond(); |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At line 204, remainingSeconds could potentially be negative if there's a delay between the while condition check at line 181 and reaching line 204. While unlikely in practice, this could result in negative values being passed to Math.min() at line 205, which would then result in a negative waitSeconds. The check at line 207 prevents sleeping with negative values, but it would be clearer to use Math.max(0, deadline.getEpochSecond() - Instant.now().getEpochSecond()) to ensure remainingSeconds is never negative.
| long remainingSeconds = deadline.getEpochSecond() - Instant.now().getEpochSecond(); | |
| long remainingSeconds = Math.max(0L, deadline.getEpochSecond() - Instant.now().getEpochSecond()); |
| // Create a second client mock for the successful retry | ||
| AppConfigurationReplicaClient secondClientMock = Mockito.mock(AppConfigurationReplicaClient.class); | ||
| lenient().when(secondClientMock.getEndpoint()).thenReturn(ENDPOINT); | ||
|
|
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable secondClientMock is created but never used. It can be removed to clean up the test.
| // Create a second client mock for the successful retry | |
| AppConfigurationReplicaClient secondClientMock = Mockito.mock(AppConfigurationReplicaClient.class); | |
| lenient().when(secondClientMock.getEndpoint()).thenReturn(ENDPOINT); |
| long getMillisUntilNextClientAvailable() { | ||
| Instant now = Instant.now(); | ||
| Instant earliestAvailable = Instant.MAX; | ||
|
|
||
| // Check configured clients | ||
| if (clients != null) { | ||
| for (AppConfigurationReplicaClient client : clients) { | ||
| Instant backoffEnd = client.getBackoffEndTime(); | ||
| if (!backoffEnd.isAfter(now)) { | ||
| return 0; // Client available now | ||
| } | ||
| if (backoffEnd.isBefore(earliestAvailable)) { | ||
| earliestAvailable = backoffEnd; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Check auto-failover clients | ||
| for (AppConfigurationReplicaClient client : autoFailoverClients.values()) { | ||
| Instant backoffEnd = client.getBackoffEndTime(); | ||
| if (!backoffEnd.isAfter(now)) { | ||
| return 0; // Client available now | ||
| } | ||
| if (backoffEnd.isBefore(earliestAvailable)) { | ||
| earliestAvailable = backoffEnd; | ||
| } | ||
| } | ||
|
|
||
| return earliestAvailable.toEpochMilli() - now.toEpochMilli(); | ||
| } |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method getMillisUntilNextClientAvailable is not used anywhere in the production code. It's only called in tests. This suggests that either the method should be removed as dead code, or the retry logic in AzureAppConfigDataLoader should be using this method to determine optimal wait times instead of fixed backoff intervals. Consider removing this method or integrating it into the actual retry logic.
| * Gets the duration in milliseconds until the next client becomes available for the specified store. | ||
| * | ||
| * @param originEndpoint the origin configuration store endpoint | ||
| * @return duration in milliseconds until next client is available, or 0 if one is available now | ||
| */ | ||
| long getMillisUntilNextClientAvailable(String originEndpoint) { | ||
| return CONNECTIONS.get(originEndpoint).getMillisUntilNextClientAvailable(); | ||
| } | ||
|
|
||
| /** |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method getMillisUntilNextClientAvailable is not used anywhere in the production code. It's only called in tests. This suggests that either the method should be removed as dead code, or it should be integrated into the retry logic. Consider removing this method or using it in the startup retry implementation.
| * Gets the duration in milliseconds until the next client becomes available for the specified store. | |
| * | |
| * @param originEndpoint the origin configuration store endpoint | |
| * @return duration in milliseconds until next client is available, or 0 if one is available now | |
| */ | |
| long getMillisUntilNextClientAvailable(String originEndpoint) { | |
| return CONNECTIONS.get(originEndpoint).getMillisUntilNextClientAvailable(); | |
| } | |
| /** |
| private static final int[][] STARTUP_BACKOFF_INTERVALS = { | ||
| {100, 5}, // 0-100 seconds elapsed: 5 second backoff | ||
| {200, 10}, // 100-200 seconds elapsed: 10 second backoff | ||
| {600, 30} // 200-600 seconds elapsed: 30 second backoff | ||
| }; |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fixed backoff intervals extend to 600 seconds, but the comment at line 90 says "0-100 seconds elapsed". This threshold value (100) should match the first interval threshold and appears correct. However, the last interval at line 92 covers "200-600 seconds elapsed" which seems inconsistent with the default and maximum timeout of 100-600 seconds. Since the default timeout is 100 seconds and the minimum is 30 seconds, many users will never reach the higher backoff intervals defined here. Consider whether these intervals align with the expected timeout ranges.
Description
Adds retry to startup. When all replicas fail there will be an attempt to retry the failed store for a period of time. By default 100s, minimal 30s, maximum 600s.
Also, refactors the load method to be split into a number of helper method to make this readable.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines