
[Bug]: completion_reason add tolerance checks #280

@yaythomas

Description

Expected Behavior

The Python SDK's completion reason logic for map() and parallel() operations does not match the TypeScript SDK implementation, leading to surprising CompletionReason values in BatchResult.

TypeScript implementation here: https://github.com/aws/aws-durable-execution-sdk-js/blob/64fc8752cc099dc96b45dbe96de69dd770c0bf50/packages/aws-durable-execution-sdk-js/src/handlers/concurrent-execution-handler/concurrent-execution-handler.ts#L41-L86

Based on the TypeScript SDK reference implementation (concurrent-execution-handler.ts lines 41-86), the completion reason should be determined using this logic:

  1. Check failure tolerance FIRST (before checking whether all items completed):
     - If no completion_config exists: return FAILURE_TOLERANCE_EXCEEDED if any failures
     - If completion_config exists but has no criteria defined: return FAILURE_TOLERANCE_EXCEEDED if any failures
     - If toleratedFailureCount is set and failed_count > toleratedFailureCount: return FAILURE_TOLERANCE_EXCEEDED
     - If toleratedFailurePercentage is set and failure percentage exceeds it: return FAILURE_TOLERANCE_EXCEEDED
  2. Then check if all items completed: return ALL_COMPLETED
  3. Then check if minSuccessful threshold met: return MIN_SUCCESSFUL_REACHED
  4. Default: return ALL_COMPLETED
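
The ordering above can be sketched as a small standalone function. This is only an illustration of the decision order described in this issue, not the SDK's code: `infer_completion_reason`, the local `CompletionReason` enum, and the local `CompletionConfig` dataclass are hypothetical stand-ins, and the percentage is assumed to be on a 0-100 scale computed over the total item count.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CompletionReason(Enum):
    # Local stand-in for the SDK enum; member names taken from this issue.
    ALL_COMPLETED = "ALL_COMPLETED"
    MIN_SUCCESSFUL_REACHED = "MIN_SUCCESSFUL_REACHED"
    FAILURE_TOLERANCE_EXCEEDED = "FAILURE_TOLERANCE_EXCEEDED"


@dataclass
class CompletionConfig:
    # Local stand-in; field names follow the snippets in this issue.
    min_successful: Optional[int] = None
    tolerated_failure_count: Optional[int] = None
    tolerated_failure_percentage: Optional[float] = None  # assumed 0-100 scale


def infer_completion_reason(
    total: int,
    succeeded: int,
    failed: int,
    config: Optional[CompletionConfig] = None,
) -> CompletionReason:
    """Hypothetical helper mirroring the four-step order described above."""
    criteria = () if config is None else (
        config.min_successful,
        config.tolerated_failure_count,
        config.tolerated_failure_percentage,
    )
    has_criteria = any(value is not None for value in criteria)

    # Step 1: failure tolerance is checked before anything else.
    if not has_criteria:
        # No config, or a config with no criteria: any failure is fatal.
        if failed > 0:
            return CompletionReason.FAILURE_TOLERANCE_EXCEEDED
    else:
        count_limit = config.tolerated_failure_count
        if count_limit is not None and failed > count_limit:
            return CompletionReason.FAILURE_TOLERANCE_EXCEEDED
        pct_limit = config.tolerated_failure_percentage
        if pct_limit is not None and total > 0:
            if failed / total * 100 > pct_limit:
                return CompletionReason.FAILURE_TOLERANCE_EXCEEDED

    # Step 2: every item finished, whether it succeeded or failed.
    if succeeded + failed == total:
        return CompletionReason.ALL_COMPLETED

    # Step 3: stopped early because enough items succeeded.
    if config is not None and config.min_successful is not None:
        if succeeded >= config.min_successful:
            return CompletionReason.MIN_SUCCESSFUL_REACHED

    # Step 4: default.
    return CompletionReason.ALL_COMPLETED
```

For example, under these assumptions `infer_completion_reason(total=5, succeeded=2, failed=0, config=CompletionConfig(min_successful=2))` falls through steps 1 and 2 and returns `MIN_SUCCESSFUL_REACHED` at step 3, while the same call with `total=2, succeeded=1, failed=1, config=None` stops at step 1.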

# Scenario 1: No completion config with 1 failure
result = context.map(items, func, config=None)
# Expected: result.completion_reason == CompletionReason.FAILURE_TOLERANCE_EXCEEDED

# Scenario 2: Empty completion config with 1 failure
result = context.map(items, func, config=MapConfig())
# Expected: result.completion_reason == CompletionReason.FAILURE_TOLERANCE_EXCEEDED

# Scenario 3: toleratedFailureCount exceeded
result = context.map(items, func, config=MapConfig(
    completion_config=CompletionConfig(tolerated_failure_count=0)
))
# With 1 failure
# Expected: result.completion_reason == CompletionReason.FAILURE_TOLERANCE_EXCEEDED

# Scenario 4: All items completed successfully
result = context.map(items, func, config=MapConfig(
    completion_config=CompletionConfig(min_successful=2)
))
# With 3 successes, 0 failures, all 3 items completed
# Expected: result.completion_reason == CompletionReason.ALL_COMPLETED

# Scenario 5: minSuccessful reached with incomplete items
result = context.map(items, func, config=MapConfig(
    completion_config=CompletionConfig(min_successful=2)
))
# With 2 successes, 0 failures, 2 of 5 items completed
# Expected: result.completion_reason == CompletionReason.MIN_SUCCESSFUL_REACHED

Actual Behavior

The current Python implementation uses older logic that has since been superseded. It differs from the TypeScript reference in four ways:

  1. ❌ Does not check toleratedFailureCount or toleratedFailurePercentage
  2. ❌ Checks "all completed" before checking tolerance thresholds (wrong order)
  3. ❌ No fail-fast logic when completion_config is None or empty
  4. ❌ Defaults to FAILURE_TOLERANCE_EXCEEDED instead of ALL_COMPLETED

# Scenario: All items completed with 1 failure and minSuccessful=2
result = context.map([item1, item2, item3], func, config=MapConfig(
    completion_config=CompletionConfig(min_successful=2)
))
# Result: 2 successes, 1 failure, all 3 completed

# Actual: result.completion_reason == CompletionReason.ALL_COMPLETED
# Expected: result.completion_reason == CompletionReason.ALL_COMPLETED
# ✅ This case works correctly

# Scenario: No completion config with 1 failure
result = context.map([item1, item2], func, config=None)
# Result: 1 success, 1 failure, all 2 completed

# Actual: result.completion_reason == CompletionReason.ALL_COMPLETED
# Expected: result.completion_reason == CompletionReason.FAILURE_TOLERANCE_EXCEEDED
# ❌ WRONG - should fail-fast when no config provided
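
To make the ordering problem concrete, here is a rough reconstruction of what the older decision order might look like, inferred only from the four gaps listed above. It is not copied from the Python SDK source; `legacy_completion_reason` and the local `CompletionReason` enum are hypothetical names used for illustration.

```python
from enum import Enum
from typing import Optional


class CompletionReason(Enum):
    # Local stand-in for the SDK enum; member names taken from this issue.
    ALL_COMPLETED = "ALL_COMPLETED"
    MIN_SUCCESSFUL_REACHED = "MIN_SUCCESSFUL_REACHED"
    FAILURE_TOLERANCE_EXCEEDED = "FAILURE_TOLERANCE_EXCEEDED"


def legacy_completion_reason(
    total: int,
    succeeded: int,
    failed: int,
    min_successful: Optional[int] = None,
) -> CompletionReason:
    """Hypothetical reconstruction of the older ordering (assumption)."""
    # "All completed" is checked first, so failures are never inspected
    # once every item has finished.
    if succeeded + failed == total:
        return CompletionReason.ALL_COMPLETED

    # Tolerated failure count/percentage are never consulted.
    if min_successful is not None and succeeded >= min_successful:
        return CompletionReason.MIN_SUCCESSFUL_REACHED

    # Fallback is the opposite of the TypeScript default.
    return CompletionReason.FAILURE_TOLERANCE_EXCEEDED


# No config, 1 success, 1 failure, both items finished:
print(legacy_completion_reason(total=2, succeeded=1, failed=1))
```

Under this reconstruction the no-config scenario above yields `ALL_COMPLETED`, which matches the reported actual behavior; checking tolerance before the "all completed" test is what changes the outcome to `FAILURE_TOLERANCE_EXCEEDED`.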

Steps to Reproduce

from aws_durable_execution_sdk_python import durable_execution, DurableContext
from aws_durable_execution_sdk_python.concurrency import MapConfig, CompletionConfig

@durable_execution
def handler(event: dict, context: DurableContext) -> dict:
    items = [1, 2, 3]
    
    def process_item(ctx: DurableContext, item: int, index: int):
        if item == 2:
            raise Exception("Item 2 failed")
        return item * 2
    
    # Test Case 1: No completion config (should fail-fast)
    result = context.map(
        items=items,
        func=process_item,
        config=None,  # No config
        name="test-map"
    )
    
    print(f"Completion reason: {result.completion_reason}")
    # Actual: ALL_COMPLETED
    # Expected: FAILURE_TOLERANCE_EXCEEDED
    
    return {"completion_reason": result.completion_reason}

Compare with the equivalent TypeScript handler:

import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';

export const handler = withDurableExecution(async (event, context: DurableContext) => {
  const items = [1, 2, 3];
  
  const result = await context.map(
    'test-map',
    items,
    async (ctx, item, index) => {
      if (item === 2) throw new Error('Item 2 failed');
      return item * 2;
    }
    // No config provided
  );
  
  console.log(`Completion reason: ${result.completionReason}`);
  // Returns: FAILURE_TOLERANCE_EXCEEDED (correct)
  
  return { completionReason: result.completionReason };
});

SDK Version

1.1.2

Python Version

3.14

Is this a regression?

No

Last Working Version

No response

Additional Context

No response

Metadata

Labels

bug: Something isn't working
