fix: BatchResult completion reason logic #286

yaythomas · 2026-01-30T01:52:45Z

Description of changes:
Fix BatchResult.from_items() to match JavaScript SDK behavior. The previous implementation incorrectly checked conditions in the wrong order, causing incorrect completion reasons for concurrent operations with failure tolerance configurations.

The checkpointed completion reason is preserved during replay, so existing executions are unaffected. Code with conditional logic based on completion_reason might see different values after this update.

Example: With 3 items (1 success, 2 failures) and tolerated_failure_count=1:

Before: ALL_COMPLETED (incorrect - all items finished)
After: FAILURE_TOLERANCE_EXCEEDED (correct - tolerance breached)

Changes:

Extract completion reason logic to _get_completion_reason() method
Check failure tolerance BEFORE checking if all completed
Implement proper fail-fast when no completion config provided
Add comprehensive unit tests covering all edge cases

This ensures tolerance checks take precedence over success criteria, preventing operations from incorrectly reporting ALL_COMPLETED when failure tolerance has been exceeded.

new behaviour

Different completion_reason values for map/parallel operations in these scenarios:

No completion config with failures - Now returns FAILURE_TOLERANCE_EXCEEDED (previously ALL_COMPLETED)
Empty completion config with failures - Now returns FAILURE_TOLERANCE_EXCEEDED (previously ALL_COMPLETED)
Tolerance exceeded with all completed - Now returns FAILURE_TOLERANCE_EXCEEDED (previously ALL_COMPLETED)
Both min_successful and tolerance met - Now returns FAILURE_TOLERANCE_EXCEEDED (previously MIN_SUCCESSFUL_REACHED)

reference logic

https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/concurrent-execution-handler/concurrent-execution-handler.ts#L41-L75

  private getCompletionReason<T, R>(
    failureCount: number,
    successCount: number,
    completedCount: number,
    items: ConcurrentExecutionItem<T>[],
    config: ConcurrencyConfig<R>,
  ): "ALL_COMPLETED" | "MIN_SUCCESSFUL_REACHED" | "FAILURE_TOLERANCE_EXCEEDED" {
    // Check tolerance first, before checking if all completed
    const completion = config.completionConfig;

    // Handle fail-fast behavior (no completion config or empty completion config)
    if (!completion) {
      if (failureCount > 0) return "FAILURE_TOLERANCE_EXCEEDED";
    } else {
      const hasAnyCompletionCriteria = Object.values(completion).some(
        (value) => value !== undefined,
      );
      if (!hasAnyCompletionCriteria) {
        if (failureCount > 0) return "FAILURE_TOLERANCE_EXCEEDED";
      } else {
        // Check specific tolerance thresholds
        if (
          completion.toleratedFailureCount !== undefined &&
          failureCount > completion.toleratedFailureCount
        ) {
          return "FAILURE_TOLERANCE_EXCEEDED";
        }
        if (completion.toleratedFailurePercentage !== undefined) {
          const failurePercentage = (failureCount / items.length) * 100;
          if (failurePercentage > completion.toleratedFailurePercentage) {
            return "FAILURE_TOLERANCE_EXCEEDED";
          }
        }
      }
    }

Issue #, if available:
closes #280

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Fix BatchResult.from_items() to match JavaScript SDK behavior. The previous implementation incorrectly checked conditions in the wrong order, causing incorrect completion reasons for concurrent operations with failure tolerance configurations. The checkpointed completion reason is preserved during replay, so existing executions are unaffected. Code with conditional logic based on completion_reason might see different values after this update. Example: With 3 items (1 success, 2 failures) and tolerated_failure_count=1: - Before: ALL_COMPLETED (incorrect - all items finished) - After: FAILURE_TOLERANCE_EXCEEDED (correct - tolerance breached) Changes: - Extract completion reason logic to _get_completion_reason() method - Check failure tolerance BEFORE checking if all completed - Implement proper fail-fast when no completion config provided - Add comprehensive unit tests covering all edge cases This ensures tolerance checks take precedence over success criteria, preventing operations from incorrectly reporting ALL_COMPLETED when failure tolerance has been exceeded. closes #280

yaythomas requested a review from ParidelPooya January 30, 2026 02:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: BatchResult completion reason logic #286

fix: BatchResult completion reason logic #286

Uh oh!

yaythomas commented Jan 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix: BatchResult completion reason logic #286

Are you sure you want to change the base?

fix: BatchResult completion reason logic #286

Uh oh!

Conversation

yaythomas commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

new behaviour

reference logic

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

yaythomas commented Jan 30, 2026 •

edited

Loading