Skip to content

Conversation

@yaythomas
Copy link
Member

@yaythomas yaythomas commented Jan 30, 2026

Description of changes:
Fix BatchResult.from_items() to match JavaScript SDK behavior. The previous implementation incorrectly checked conditions in the wrong order, causing incorrect completion reasons for concurrent operations with failure tolerance configurations.

The checkpointed completion reason is preserved during replay, so existing executions are unaffected. Code with conditional logic based on completion_reason might see different values after this update.

Example: With 3 items (1 success, 2 failures) and tolerated_failure_count=1:

  • Before: ALL_COMPLETED (incorrect - all items finished)
  • After: FAILURE_TOLERANCE_EXCEEDED (correct - tolerance breached)

Changes:

  • Extract completion reason logic to _get_completion_reason() method
  • Check failure tolerance BEFORE checking if all completed
  • Implement proper fail-fast when no completion config provided
  • Add comprehensive unit tests covering all edge cases

This ensures tolerance checks take precedence over success criteria, preventing operations from incorrectly reporting ALL_COMPLETED when failure tolerance has been exceeded.

new behaviour

Different completion_reason values for map/parallel operations in these scenarios:

  1. No completion config with failures - Now returns FAILURE_TOLERANCE_EXCEEDED (previously ALL_COMPLETED)
  2. Empty completion config with failures - Now returns FAILURE_TOLERANCE_EXCEEDED (previously ALL_COMPLETED)
  3. Tolerance exceeded with all completed - Now returns FAILURE_TOLERANCE_EXCEEDED (previously ALL_COMPLETED)
  4. Both min_successful and tolerance met - Now returns FAILURE_TOLERANCE_EXCEEDED (previously MIN_SUCCESSFUL_REACHED)

reference logic

https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/concurrent-execution-handler/concurrent-execution-handler.ts#L41-L75

  private getCompletionReason<T, R>(
    failureCount: number,
    successCount: number,
    completedCount: number,
    items: ConcurrentExecutionItem<T>[],
    config: ConcurrencyConfig<R>,
  ): "ALL_COMPLETED" | "MIN_SUCCESSFUL_REACHED" | "FAILURE_TOLERANCE_EXCEEDED" {
    // Check tolerance first, before checking if all completed
    const completion = config.completionConfig;

    // Handle fail-fast behavior (no completion config or empty completion config)
    if (!completion) {
      if (failureCount > 0) return "FAILURE_TOLERANCE_EXCEEDED";
    } else {
      const hasAnyCompletionCriteria = Object.values(completion).some(
        (value) => value !== undefined,
      );
      if (!hasAnyCompletionCriteria) {
        if (failureCount > 0) return "FAILURE_TOLERANCE_EXCEEDED";
      } else {
        // Check specific tolerance thresholds
        if (
          completion.toleratedFailureCount !== undefined &&
          failureCount > completion.toleratedFailureCount
        ) {
          return "FAILURE_TOLERANCE_EXCEEDED";
        }
        if (completion.toleratedFailurePercentage !== undefined) {
          const failurePercentage = (failureCount / items.length) * 100;
          if (failurePercentage > completion.toleratedFailurePercentage) {
            return "FAILURE_TOLERANCE_EXCEEDED";
          }
        }
      }
    }

Issue #, if available:
closes #280

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Fix BatchResult.from_items() to match JavaScript SDK behavior.
The previous implementation incorrectly checked conditions in the
wrong order, causing incorrect completion reasons for concurrent
operations with failure tolerance configurations.

The checkpointed completion reason is preserved during replay,
so existing executions are unaffected. Code with conditional
logic based on completion_reason might see different values after
this update.

Example: With 3 items (1 success, 2 failures) and tolerated_failure_count=1:
- Before: ALL_COMPLETED (incorrect - all items finished)
- After: FAILURE_TOLERANCE_EXCEEDED (correct - tolerance breached)

Changes:
- Extract completion reason logic to _get_completion_reason() method
- Check failure tolerance BEFORE checking if all completed
- Implement proper fail-fast when no completion config provided
- Add comprehensive unit tests covering all edge cases

This ensures tolerance checks take precedence over success criteria,
preventing operations from incorrectly reporting ALL_COMPLETED when
failure tolerance has been exceeded.

closes #280
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: completion_reason add tolerance checks

1 participant