
Conversation

@blublinsky (Contributor) commented on Jan 30, 2026

Description

This PR achieves three main things:

  1. Fixes the header definitions for Llama Stack 0.4.x, which introduced a new MCP parameter, authorization (a hedged sketch of the resulting tool shape follows this list)
  2. Adds logging to simplify debugging of MCP execution
  3. Adds a new e2e test suite for MCP servers
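A minimal sketch of what item 1 implies for the prepared tool definition is shown below. The field names are assumptions inferred from this PR (MCP tools now expose a top-level authorization value), not verified Llama Stack 0.4.x API, so treat it as illustrative only:

# Hypothetical shape only: `server_label`, `server_url`, and the top-level
# `authorization` key are assumptions based on this PR's description, not
# confirmed Llama Stack 0.4.x documentation.
def build_mcp_tool(server_name: str, server_url: str, token: str | None) -> dict:
    """Build one MCP tool entry for the Responses API."""
    tool: dict = {
        "type": "mcp",
        "server_label": server_name,
        "server_url": server_url,
    }
    if token:
        # 0.4.x introduces a dedicated top-level authorization parameter,
        # replacing the per-header dict that older versions expected.
        tool["authorization"] = token
    return tool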

This PR turned out much larger than expected because of two issues:

Root Cause 1 - Streaming Response Bug: Llama Stack's MCP tool execution uses streaming responses (Server-Sent Events), which exposed critical bugs in FastAPI's BaseHTTPMiddleware - specifically the RuntimeError: "No response returned" error that occurs when middleware tries to handle streaming endpoints.
Root Cause 2 - MCP Cleanup & Connection Management: MCP server connections and LLM streaming calls need to be properly closed AFTER the response is fully streamed, but we also need to persist conversation data to the database without blocking the stream or delaying the client.

The fix required a complete architectural change:

  1. Pure ASGI middleware - Rewrote from decorator-based (@app.middleware("http")) to pure ASGI classes that directly implement the ASGI protocol, bypassing the buggy BaseHTTPMiddleware entirely
  2. Background tasks - Moved database operations and MCP cleanup to asyncio.create_task() so they run AFTER streaming completes, without blocking the response
  3. Thread-safe DB operations - Used asyncio.to_thread() to run blocking database writes in a separate thread
  4. Integration test fixes - Tests now properly wait for background tasks and mock asyncio.to_thread() to use the test database

In short: MCP streaming responses + required cleanup + database persistence forced a complete architectural change from decorator-based middleware to pure ASGI middleware, plus async background tasks for DB writes and MCP cleanup (see the sketch below).
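The following sketch is illustrative only. The names mirror those used in this PR (MetricsMiddleware, background_tasks_set, create_background_task), but the bodies are simplified assumptions rather than the exact implementation:

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


class MetricsMiddleware:
    """Pure ASGI middleware: wraps the app directly, so SSE/streaming
    responses pass through untouched instead of being buffered the way
    BaseHTTPMiddleware does."""

    def __init__(self, app: Callable[..., Awaitable[None]]) -> None:
        self.app = app

    async def __call__(self, scope: dict[str, Any], receive: Any, send: Any) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        status: dict[str, int] = {}

        async def send_wrapper(message: dict[str, Any]) -> None:
            # Capture the status code without touching the streamed body.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        await self.app(scope, receive, send_wrapper)
        # Metrics would be recorded here using scope["path"] and status.get("code").


background_tasks_set: set[asyncio.Task] = set()


def create_background_task(coro: Any) -> None:
    """Schedule persistence/cleanup to run after the response has streamed."""
    task = asyncio.create_task(coro)
    background_tasks_set.add(task)  # strong reference prevents GC
    task.add_done_callback(background_tasks_set.discard)


async def persist_conversation(write_to_db: Callable[[], None]) -> None:
    """Example background coroutine: wait briefly for MCP session cleanup,
    then run the blocking DB write in a worker thread."""
    await asyncio.sleep(0.5)
    await asyncio.to_thread(write_to_db)

On shutdown, the lifespan handler can await asyncio.gather(*background_tasks_set, return_exceptions=True) so pending background writes finish before the process exits.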

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude
  • Generated by: N/A

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • JSON‑RPC MCP support (initialize, tools/list, tools/call) with simulated tool execution and multiple auth modes (file, Kubernetes, client); conversation cache (SQLite).
  • Improvements

    • Detached background tasks for async persistence and streaming cleanup; ASGI middleware for metrics and global exception handling; richer MCP/RPC logging; container startup writes test token and MCP mock-server port remapped.
  • Tests

    • Large new E2E MCP suite, many step helpers, and test updates for background-task synchronization.

@coderabbitai bot (Contributor) commented on Jan 30, 2026

Warning

Rate limit exceeded

@blublinsky has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 34 minutes and 38 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Mock MCP server changed to JSON‑RPC (initialize, tools/list, tools/call) with error-mode and /debug/clear; app now records per-server authorization, schedules detached background tasks for cleanup and persistence, replaces REST middleware with ASGI middleware, and adds extensive MCP E2E tests and configs.

Changes

Cohort / File(s) Summary
MCP Mock Server & Docs
dev-tools/mcp-mock-server/server.py, dev-tools/mcp-mock-server/README.md
Rewrote mock server to JSON‑RPC (initialize, tools/list, tools/call); added error-mode token, enhanced request logging (rpc_method, tool_name), and /debug/clear; README updated to document JSON‑RPC methods and mocked tool execution.
Docker / Compose
docker-compose.yaml, docker-compose-library.yaml
Changed mcp-mock-server host mapping from 3000:3000 to 9000:3000; added lightspeed-stack entrypoint to write test token on startup.
Query, Streaming & Background Tasks
src/app/endpoints/query.py, src/app/endpoints/query_v2.py, src/app/endpoints/streaming_query_v2.py
Added detached background task utilities (background_tasks_set, create_background_task) and shifted persistence/cleanup to background; enhanced MCP/header resolution and logging; MCP tools now expose top-level authorization; quota computation guarded when limiters absent.
ASGI Middleware & App Lifespan
src/app/main.py, src/runners/uvicorn.py
Replaced REST middleware with ASGI MetricsMiddleware and ExceptionMiddleware; moved global exception handling to ASGI layer; startup logs MCP config and initializes tables; shutdown awaits pending background tasks.
Configs: MCP Test Modes
tests/e2e/configuration/*lightspeed-stack*.yaml, dev-tools/test-configs/mcp-mock-test-noop.yaml
Added MCP-specific configs (library/server modes) and conversation_cache; introduced MCP test servers demonstrating file-token, kubernetes, and client-provided Authorization strategies.
E2E Features & Steps
tests/e2e/features/mcp_tools.feature, tests/e2e/features/steps/mcp.py, tests/e2e/features/environment.py, tests/e2e/test_list.txt
Added comprehensive MCP E2E feature and many Behave steps covering header propagation, discovery, tool execution (streaming, multi-tool, error-mode), mock-server log inspection, and test-config switching.
E2E Utilities & Polling
tests/e2e/features/steps/conversation.py, tests/e2e/features/steps/feedback.py, tests/e2e/features/steps/llm_query_response.py, tests/e2e/features/steps/common.py
Added exponential-backoff polling helpers and test-isolation resets to wait for background persistence and avoid cross-scenario leakage.
Tests: Unit & Integration Updates
tests/unit/app/endpoints/test_query_v2.py, tests/unit/app/endpoints/test_streaming_query_v2.py, tests/integration/endpoints/test_query_v2_integration.py, tests/unit/app/test_main_middleware.py
Updated tests to expect top-level authorization instead of headers; streaming/unit tests adjusted to wait for background cleanup; integration tests patch asyncio.to_thread and add wait helpers; removed obsolete middleware unit tests.
Dev/Test Config Additions
dev-tools/test-configs/mcp-mock-test-noop.yaml, tests/e2e/configuration/*lightspeed-stack-mcp*.yaml
New test configs for MCP scenarios specifying auth sources and behaviors (file, kubernetes, client).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant App as Query Endpoint
    participant MCP as MCP Mock Server
    participant LLM as Llama Stack API

    Client->>App: POST query (may include MCP-HEADERS)
    App->>MCP: JSON-RPC initialize (any POST path)
    MCP-->>App: initialize response (protocolVersion, capabilities)
    App->>MCP: JSON-RPC tools/list
    MCP-->>App: list of tools (tool_name, inputSchema)
    App->>LLM: provide tool definitions (include per-server authorization)
    LLM-->>App: instruct tool call
    App->>MCP: JSON-RPC tools/call (tool_name, args)
    alt error_mode token present
        MCP-->>App: error result
    else
        MCP-->>App: success result (mocked)
    end
    App->>LLM: supply tool result
    LLM-->>App: final response
    App-->>Client: return augmented response
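For reference, one JSON-RPC exchange from the diagram can be exercised directly against the mock server. This is a hedged sketch: it assumes the host port 9000 mapping from the compose change, the test token written at container startup, and standard JSON-RPC 2.0 framing for the listed MCP methods; the exact params accepted by the mock are not shown in this thread.

import requests

MOCK_URL = "http://localhost:9000"  # host mapping 9000:3000 from docker-compose

rpc_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "mock_tool_file",          # tool advertised via tools/list
        "arguments": {"query": "hello"},   # illustrative arguments
    },
}

resp = requests.post(
    MOCK_URL,
    json=rpc_call,
    headers={"Authorization": "Bearer test-secret-token-123"},
    timeout=5,
)
print(resp.json())  # mocked success result, or an error result in error mode

The /debug/requests and /debug/clear endpoints mentioned in the changes can then be used to inspect or reset what the mock recorded.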

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • tisnik
  • eranco74
  • asimurka
🚥 Pre-merge checks: ✅ 3 passed

  • Description check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled)
  • Title check: ✅ Passed (the title accurately describes the main changes: fixing MCP headers for Llama Stack 0.4.x and adding e2e tests for MCP servers, which aligns with the PR's core objectives)
  • Docstring coverage: ✅ Passed (docstring coverage is 100.00%, which meets the required threshold of 80.00%)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/mcp.py`:
- Around line 699-713: The step function
check_streaming_response_has_tool_results only checks status_code but doesn't
parse SSE events; update it to decode context.response.content (or iterate
response.iter_lines() if streaming), extract lines starting with "data:",
collect/parses SSE payloads, then assert that at least one event contains
tool-related content (e.g., contains "tool" or expected tool result keys) and
print the count of matching events; keep using context.response and check for
None/status before parsing.
- Around line 268-281: The step function check_service_logs_contain currently
only prints a message and requires manual verification; replace the manual
prints with a deterministic assertion that programmatically fetches and searches
the service logs (e.g., read from a known log file or query the running
service/container logs via context or a helper like get_service_logs) and raise
an AssertionError if log_fragment is not present; update the
check_service_logs_contain implementation to obtain logs from the test context
(or a utility function), search for log_fragment, and assert presence with a
clear failure message so the test becomes automated and deterministic.
- Around line 428-445: The step function check_tools_list_calls currently only
confirms the mock server is reachable but does not assert any requests contained
the "tools/list" method; either implement a real inspection of request bodies by
fetching the debug request list from the mock server (use the requests.get to
retrieve the full payloads from mock_server_url+"/debug/requests", iterate the
returned requests and assert any body or JSON contains "tools/list") or
explicitly document this limitation in the feature file/tests rather than
silently passing; update check_tools_list_calls to perform the body/JSON check
or adjust the feature to note the verification is not possible without
request-body logging.
- Around line 253-266: The step function check_mcp_tool_in_log currently only
prints and must instead retrieve the MCP mock server request log from the test
context (e.g., context.mcp_request_log or via the MCP test client available in
context), scan the recorded requests for the expected tool_name by inspecting
the Authorization header or the response body (tools/list) entries, and assert
that at least one request contains the expected tool_name; update the function
to raise an assertion (or use behave's assert) with a clear failure message if
no matching entry is found so the test actually fails when the tool is missing.
- Around line 531-541: The configure_mock_server_errors step currently only sets
context.expect_tool_errors and prints a placeholder but the mock server and
subsequent assertions never consume that flag; implement real error-mode support
by adding an error injection mechanism in the MCP mock server (e.g., a query
param/header or a toggle handler) that, when enabled, causes tool execution
endpoints used by tests to return deterministic error responses; update the
configure_mock_server_errors function to flip that toggle on (and ensure it
resets after tests), ensure the flag/context value is actually passed to
requests sent by tests (or the mock reads the toggle) so the mock returns
errors, and implement the corresponding step definitions for "The response
should indicate tool execution failed" and "The service logs should contain tool
failure information" to assert the error payload and log entries are present
(reference configure_mock_server_errors and the failing test scenario names to
locate where to wire the toggle and assertions).
🧹 Nitpick comments (9)
dev-tools/mcp-mock-server/server.py (1)

149-154: Consider consistent truncation handling.

The auth preview truncation adds "..." unconditionally in the result text, even when the header is shorter than 50 characters (line 154 always appends "...").

Minor fix for truncation suffix
-                auth_preview = (
-                    auth_header[:50] if len(auth_header) > 50 else auth_header
-                )
-                result_text = (
-                    f"Mock tool '{tool_called}' executed successfully "
-                    f"with arguments: {arguments}. Auth used: {auth_preview}..."
-                )
+                if len(auth_header) > 50:
+                    auth_preview = f"{auth_header[:50]}..."
+                else:
+                    auth_preview = auth_header
+                result_text = (
+                    f"Mock tool '{tool_called}' executed successfully "
+                    f"with arguments: {arguments}. Auth used: {auth_preview}"
+                )
src/app/endpoints/query_v2.py (2)

1040-1044: Consider using debug level for per-request tool preparation.

The logger.info for prepared MCP tools will emit a log line for every request. For high-traffic scenarios, this could be verbose. Consider using logger.debug for the per-request detail and keeping info for startup/configuration logging only.

Use debug level for per-request logging
-        logger.info(
+        logger.debug(
             "Prepared %d MCP tool(s) for request: %s",
             len(mcp_tools),
             ", ".join(mcp_server_names),
         )

1064-1070: Summary logging at info level may be too verbose for production.

Similar to the above, the summary log at info level will emit for every request. Consider debug level or gating behind a more restrictive condition.

Consider debug level for per-request summary
-    if toolgroups:
-        logger.info(
+    if toolgroups:
+        logger.debug(
             "Prepared %d total tool(s) for Responses API: %d RAG + %d MCP",
             len(toolgroups),
             len(rag_tools) if rag_tools else 0,
             len(mcp_tools) if mcp_tools else 0,
         )
tests/e2e/features/mcp_tools.feature (2)

20-25: Consider adding MCP-specific assertions to the info endpoint test.

This scenario verifies only that the info endpoint returns 200. To validate MCP configuration is reported correctly, consider adding a step that checks for MCP-related fields in the response body.


74-74: Magic number for request count may be fragile.

The assertion at least 6 requests assumes a specific number of MCP protocol exchanges. If the mock server or protocol changes, this could break. Consider documenting why 6 is expected (e.g., 2 requests × 3 servers = 6) or using a more resilient assertion.

tests/e2e/features/steps/mcp.py (4)

39-39: Extract mock server URL to a constant or context variable.

The mock server URL http://localhost:9000 is hardcoded in multiple places throughout this file. Consider extracting it to a module-level constant or storing it in the context for easier configuration and maintenance.

♻️ Suggested approach
+MCP_MOCK_SERVER_URL = "http://localhost:9000"
+
+
 `@given`("The MCP mock server request log is cleared")
 def clear_mcp_mock_server_log(context: Context) -> None:
     ...
-    mock_server_url = "http://localhost:9000"
+    mock_server_url = getattr(context, "mcp_mock_server_url", MCP_MOCK_SERVER_URL)

Apply similarly to all other occurrences in this file.


88-98: Fixed sleep is fragile for synchronization.

Using time.sleep(2) for waiting on async operations can lead to flaky tests if the server is slow, or wasted time if it's fast. Consider polling the mock server's request count with a timeout instead.

♻️ Suggested polling approach
 `@when`("I wait for MCP server to receive requests")
 def wait_for_mcp_requests(context: Context) -> None:
-    # Wait for requests to be processed
-    time.sleep(2)
-    print("⏱️  Waited for MCP server to process requests")
+    """Poll MCP mock server until requests are received or timeout."""
+    mock_server_url = "http://localhost:9000"
+    timeout = 10
+    poll_interval = 0.5
+    elapsed = 0
+    
+    while elapsed < timeout:
+        try:
+            response = requests.get(f"{mock_server_url}/debug/requests", timeout=2)
+            if response.status_code == 200 and len(response.json()) > 0:
+                print(f"⏱️  MCP server received requests after {elapsed:.1f}s")
+                return
+        except requests.RequestException:
+            pass
+        time.sleep(poll_interval)
+        elapsed += poll_interval
+    
+    print("⚠️  Timeout waiting for MCP server requests")

100-151: Consider extracting a helper for fetching mock server requests.

Both check_mcp_server_received_requests and check_mcp_server_request_count share the same pattern for fetching and parsing the debug requests. A helper function would reduce duplication.

♻️ Suggested helper
def _get_mock_server_requests(mock_server_url: str = "http://localhost:9000") -> list:
    """Fetch requests log from MCP mock server."""
    response = requests.get(f"{mock_server_url}/debug/requests", timeout=5)
    if response.status_code != 200:
        raise AssertionError(f"Failed to get debug requests: {response.status_code}")
    return response.json()

153-251: Three auth header verification functions share nearly identical logic.

check_file_auth_header, check_k8s_auth_header, and check_client_auth_header have the same structure with minor variations (exact match vs. contains). Consider consolidating into a reusable helper to reduce duplication.
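One possible shape for such a helper is sketched below; the structure of the /debug/requests entries (a "headers" dict per recorded request) is an assumption, not confirmed in this thread:

def _assert_auth_header(
    requests_log: list[dict], expected: str, exact: bool = True
) -> None:
    """Assert that at least one recorded request carried the expected Authorization."""
    headers = [r.get("headers", {}).get("Authorization", "") for r in requests_log]
    matches = [h for h in headers if (h == expected if exact else expected in h)]
    assert matches, f"No request carried the expected Authorization ({expected!r})"

Each of the three step functions could then reduce to fetching the log and calling this helper with exact=True or exact=False.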

Comment on lines +699 to +861
@then("The streaming response should contain tool execution results")
def check_streaming_response_has_tool_results(context: Context) -> None:
"""Verify streaming response contains tool execution results.

Parameters:
context (Context): Behave context.
"""
assert context.response is not None, "No response received"
assert (
context.response.status_code == 200
), f"Bad status: {context.response.status_code}"

# For streaming responses, we'd need to parse SSE events
# For now, just verify we got a successful response
print("✅ Streaming response contains tool execution results")

⚠️ Potential issue | 🟡 Minor

Streaming response verification doesn't parse SSE events.

The step acknowledges that SSE events should be parsed to verify tool execution results but only checks the status code. This means the assertion "streaming response contains tool execution results" is not actually verified.

💡 Consider implementing SSE parsing
`@then`("The streaming response should contain tool execution results")
def check_streaming_response_has_tool_results(context: Context) -> None:
    assert context.response is not None, "No response received"
    assert context.response.status_code == 200
    
    # Parse SSE events from streaming response
    content = context.response.content.decode("utf-8")
    events = [line for line in content.split("\n") if line.startswith("data:")]
    
    # Verify at least one event contains tool-related content
    tool_events = [e for e in events if "tool" in e.lower()]
    assert len(tool_events) > 0, "No tool execution events found in streaming response"
    print(f"✅ Found {len(tool_events)} tool-related events in streaming response")
🤖 Prompt for AI Agents
In `@tests/e2e/features/steps/mcp.py` around lines 699 - 713, The step function
check_streaming_response_has_tool_results only checks status_code but doesn't
parse SSE events; update it to decode context.response.content (or iterate
response.iter_lines() if streaming), extract lines starting with "data:",
collect/parses SSE payloads, then assert that at least one event contains
tool-related content (e.g., contains "tool" or expected tool result keys) and
print the count of matching events; keep using context.response and check for
None/status before parsing.

@tisnik (Contributor) left a comment

  • please add Jira issue into title (see contribution guide)
  • it changes the LCORE code too, so the title is improper and wrong people will review it
  • e2e part will be reviewed by QA

@jrobertboos (Contributor) left a comment

Overall LGTM. I think there are some more test scenarios that can be added but for now really good job getting the e2e tests for MCP started :)


Great job adding all the logs!

Then The status code of the response is 200

# ============================================================================
# Authentication Methods

We should probably have more tests for incorrect tokens and make sure they return correct errors (e.g. 401). This can also be implemented later.

@blublinsky changed the title from "adding additional e2e tests for mcp servers" to "fixing MCP headers usag for Llama stack 0.4.x, adding additional e2e tests for mcp servers" on Jan 30, 2026
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/main.py (1)

120-125: ⚠️ Potential issue | 🟠 Major

Fix CORS allow_methods (currently bound to headers).

Line 124 sets allow_methods=cors.allow_headers, which breaks CORS preflight checks by exposing the headers list instead of the methods list. This should be cors.allow_methods.

🔧 Suggested fix
 app.add_middleware(
     CORSMiddleware,
     allow_origins=cors.allow_origins,
     allow_credentials=cors.allow_credentials,
-    allow_methods=cors.allow_headers,
+    allow_methods=cors.allow_methods,
     allow_headers=cors.allow_headers,
 )
🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 269-308: The exception handler in __call__ should track whether
the application has already sent "http.response.start" to avoid sending a second
start and violating ASGI; wrap the send callable (e.g., create a local
wrapped_send that sets response_started = True when it sees message["type"] ==
"http.response.start" and forwards messages to the original send) and use that
wrapped_send when awaiting self.app(scope, receive, wrapped_send); then in the
except block check response_started and if True return early (skip sending
headers/body), otherwise proceed to send the InternalServerErrorResponse
start/body using the original send path (still using JSONResponse and
InternalServerErrorResponse.generic()).

In `@tests/e2e/features/steps/mcp.py`:
- Around line 31-48: The clear_mcp_mock_server_log step currently swallows
failures via print statements which can leave stale data; change function
clear_mcp_mock_server_log to fail fast by replacing the print warnings with
assertions or raised exceptions: after calling
requests.get(f"{MOCK_MCP_SERVER_URL}/debug/clear", timeout=2) assert
response.status_code == 200 (or call response.raise_for_status()) and include
the response content in the assertion message, and in the except
requests.RequestException block re-raise an AssertionError (or let the original
exception propagate) with a clear message including the caught exception so the
test setup fails immediately when the clear endpoint cannot be reached or
returns a non-200 result.
- Around line 533-548: The test check_tools_call_method currently only asserts
that MCP returned some requests; update it to assert that at least one logged
request corresponds to the tools/call endpoint by inspecting the parsed
requests_log (from response.json()) and searching entries for a path or method
field that equals or contains "tools/call"; add an assertion like "No tools/call
request received by MCP server" if none found and keep using mock_server_url and
the /debug/requests endpoint to fetch the log.
🧹 Nitpick comments (8)
src/app/endpoints/query_v2.py (2)

1043-1048: Consider using logger.debug instead of logger.info for MCP tool preparation.

The log at line 1044 uses logger.info for routine MCP tool preparation details. Per coding guidelines, info is for general execution milestones while debug is for diagnostic info. Since this is per-request diagnostic information rather than a significant operational milestone, debug may be more appropriate to avoid log noise in production.

Proposed change
-        logger.info(
+        logger.debug(
             "Prepared %d MCP tool(s) for request: %s",
             len(mcp_tools),
             ", ".join(mcp_server_names),
         )

1068-1074: Same consideration for summary log - use debug for per-request diagnostics.

This summary log (line 1069) also uses logger.info for per-request tool counts. Consider debug level for consistency with the diagnostic nature of this information.

Proposed change
-        logger.info(
+        logger.debug(
             "Prepared %d total tool(s) for Responses API: %d RAG + %d MCP",
             len(toolgroups),
             len(rag_tools) if rag_tools else 0,
             len(mcp_tools) if mcp_tools else 0,
         )
src/app/endpoints/query.py (3)

81-106: Background task pattern is well-designed; minor logging format issue.

The pattern correctly:

  • Maintains strong references to prevent GC
  • Auto-removes completed tasks via done callback
  • Handles task creation errors

However, lines 103 and 106 use f-strings instead of %-formatting for logging, which bypasses lazy evaluation:

Use %-style formatting for logger calls
-        logger.debug(
-            f"Background task created, active tasks: {len(background_tasks_set)}"
-        )
+        logger.debug(
+            "Background task created, active tasks: %d", len(background_tasks_set)
+        )
     except Exception as e:
-        logger.error(f"Failed to create background task: {e}", exc_info=True)
+        logger.error("Failed to create background task: %s", e, exc_info=True)

415-432: Background persistence approach is sound; consider extracting the delay constant.

The 500ms delay for MCP session cleanup is a reasonable heuristic. Consider extracting it as a named constant for clarity and easier tuning:

Extract delay constant
+# Delay before background task runs to allow MCP session cleanup
+MCP_CLEANUP_DELAY_SECONDS = 0.5
+
 async def persist_with_topic_summary() -> None:
     ...
     logger.debug("Background task: waiting for MCP cleanup")
-    await asyncio.sleep(0.5)  # 500ms should be enough for cleanup
+    await asyncio.sleep(MCP_CLEANUP_DELAY_SECONDS)
     logger.debug("Background task: MCP cleanup complete")

467-469: Broad exception handling is acceptable here but loses stack trace.

For background task resilience, catching all exceptions is reasonable. However, consider adding exc_info=True to preserve the stack trace for debugging:

Add exc_info for better debugging
             except Exception as e:  # pylint: disable=broad-exception-caught
-                    logger.error("Failed to generate topic summary: %s", e)
+                    logger.error("Failed to generate topic summary: %s", e, exc_info=True)
                     topic_summary = None
src/app/endpoints/streaming_query_v2.py (1)

80-93: Code duplication: create_background_task differs from query.py version.

This implementation lacks error handling and logging present in src/app/endpoints/query.py (lines 87-106). The query.py version has:

  • Try/except around task creation
  • Debug logging for task count
  • Error logging on failure

Consider extracting to a shared utility module for consistency, or at minimum align the implementations.

Align with query.py implementation
 def create_background_task(coro: Any) -> None:
     """Create a detached background task with strong reference.
 
     Args:
         coro: Coroutine to run as background task
     """
-    task = asyncio.create_task(coro)
-    background_tasks_set.add(task)
-    task.add_done_callback(background_tasks_set.discard)
+    try:
+        task = asyncio.create_task(coro)
+        background_tasks_set.add(task)
+        task.add_done_callback(background_tasks_set.discard)
+        logger.debug(
+            "Background task created, active tasks: %d", len(background_tasks_set)
+        )
+    except Exception as e:
+        logger.error("Failed to create background task: %s", e, exc_info=True)
tests/unit/app/endpoints/test_streaming_query_v2.py (1)

226-227: Timing-based test synchronization may be flaky.

The 0.7s sleep waits for the 0.5s background task delay plus 0.2s buffer. This could be flaky on slow CI systems. Consider:

  1. Increasing the buffer margin (e.g., 1.0s total)
  2. Mocking asyncio.sleep to make tests deterministic
  3. Using an event or condition for synchronization
Option 1: Increase buffer for CI reliability
     # Wait for background cleanup task to complete (has 0.5s delay)
-    await asyncio.sleep(0.7)
+    await asyncio.sleep(1.0)  # 0.5s delay + 0.5s buffer for slow CI
src/app/main.py (1)

4-6: Introduce ASGI type aliases to reduce repeated complex signatures.

This improves readability and aligns with module-level alias guidance.

♻️ Suggested refactor
 from collections.abc import Awaitable, Callable
 from contextlib import asynccontextmanager
 from typing import Any, AsyncIterator

+ASGIScope = dict[str, Any]
+ASGIReceive = Callable[[], Awaitable[dict[str, Any]]]
+ASGISend = Callable[[dict[str, Any]], Awaitable[None]]
+
@@
     async def __call__(
         self,
-        scope: dict[str, Any],
-        receive: Callable[[], Awaitable[dict[str, Any]]],
-        send: Callable[[dict[str, Any]], Awaitable[None]],
+        scope: ASGIScope,
+        receive: ASGIReceive,
+        send: ASGISend,
     ) -> None:
@@
     async def __call__(
         self,
-        scope: dict[str, Any],
-        receive: Callable[[], Awaitable[dict[str, Any]]],
-        send: Callable[[dict[str, Any]], Awaitable[None]],
+        scope: ASGIScope,
+        receive: ASGIReceive,
+        send: ASGISend,
     ) -> None:
As per coding guidelines: Define type aliases at module level for clarity in Python files.

Also applies to: 196-201, 269-273

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/main.py (1)

120-125: ⚠️ Potential issue | 🟠 Major

CORS allow_methods is wired to headers instead of methods.

Line 124 incorrectly assigns cors.allow_headers to allow_methods, which breaks CORS preflight behavior by sending header names instead of HTTP method names. The CORSConfiguration class defines both allow_methods (for HTTP methods like GET, POST) and allow_headers (for header names like Content-Type, Authorization).

🔧 Suggested fix
 app.add_middleware(
     CORSMiddleware,
     allow_origins=cors.allow_origins,
     allow_credentials=cors.allow_credentials,
-    allow_methods=cors.allow_headers,
+    allow_methods=cors.allow_methods,
     allow_headers=cors.allow_headers,
 )
🤖 Fix all issues with AI agents
In `@src/app/endpoints/query_v2.py`:
- Around line 905-960: The comparison that determines missing auth headers is
case-sensitive and will treat "authorization" vs "Authorization" as different;
update the resolution check to be case-insensitive by normalizing header names
(e.g., lowercasing) before comparing: when building required_headers from
mcp_server.authorization_headers.keys(), when creating resolved_headers from
headers.keys(), and when appending the special Authorization resolution (the
local variable authorization), use a consistent case (like .lower()) so
missing_headers is computed with a case-insensitive match for Authorization and
other headers; ensure the logging output still shows original names or a
readable form if needed.

In `@src/app/endpoints/query.py`:
- Around line 446-459: The conversation existence check uses conversation_id
directly, which may include a "conv_" prefix while the DB stores normalized IDs;
update the logic so check_conversation_exists normalizes conversation_id before
querying (e.g., strip a "conv_" prefix or run the same normalization used when
storing IDs) and then query UserConversation with the normalized id; ensure the
same normalization function or rules are applied (call or replicate
normalization used elsewhere) and keep the await
asyncio.to_thread(check_conversation_exists) usage intact so the DB query still
runs in the thread pool.

In `@tests/e2e/features/mcp_tools.feature`:
- Around line 110-117: Update the system_in_default_state step in
tests/e2e/features/steps/common.py to fully reset the MCP error mode by setting
context.expect_tool_errors = False (in addition to resetting
context.auth_headers). Locate the step implementation named
system_in_default_state and add the assignment to ensure the flag cleared
between scenarios so MCP mock error behavior does not leak across tests.
🧹 Nitpick comments (4)
src/app/endpoints/streaming_query_v2.py (1)

85-94: Add error handling/logging when creating background tasks.

This keeps parity with the non‑streaming path and avoids silent failures if task creation fails.

♻️ Suggested update
 def create_background_task(coro: Any) -> None:
     """Create a detached background task with strong reference.

     Args:
         coro: Coroutine to run as background task
     """
-    task = asyncio.create_task(coro)
-    background_tasks_set.add(task)
-    task.add_done_callback(background_tasks_set.discard)
+    try:
+        task = asyncio.create_task(coro)
+        background_tasks_set.add(task)
+        task.add_done_callback(background_tasks_set.discard)
+        logger.debug("Background task created, active tasks: %d", len(background_tasks_set))
+    except Exception as e:  # pylint: disable=broad-exception-caught
+        logger.error("Failed to create background task: %s", e, exc_info=True)
tests/integration/endpoints/test_query_v2_integration.py (1)

36-59: Avoid fixed sleeps; poll background tasks with a timeout to reduce flakiness.

This keeps tests stable under variable CI load.

♻️ Suggested update
-    # Wait for the initial 500ms delay + buffer
-    await asyncio.sleep(0.6)
-
-    # Wait for any remaining background tasks to complete
-    # pylint: disable=import-outside-toplevel
-    from app.endpoints.query import background_tasks_set
-
-    if background_tasks_set:
-        await asyncio.gather(*background_tasks_set, return_exceptions=True)
-
-    # Give thread pool operations extra time to complete
-    await asyncio.sleep(0.2)
+    # Wait for the initial 500ms delay + buffer
+    await asyncio.sleep(0.6)
+
+    # Wait for any remaining background tasks to complete (bounded)
+    # pylint: disable=import-outside-toplevel
+    from app.endpoints.query import background_tasks_set
+
+    if background_tasks_set:
+        timeout = 3.0
+        loop = asyncio.get_running_loop()
+        start = loop.time()
+        while background_tasks_set and (loop.time() - start) < timeout:
+            await asyncio.sleep(0.05)
+        await asyncio.gather(*background_tasks_set, return_exceptions=True)
tests/e2e/features/mcp_tools.feature (2)

20-24: Scenario assertions may be incomplete for its stated purpose.

The scenario is named "Service reports MCP configuration correctly" but only asserts the HTTP status code. Consider adding assertions to validate the actual MCP configuration content in the response (e.g., verifying MCP servers are listed, their auth types, etc.) to match what the scenario name implies.

💡 Suggested additional assertions
   Scenario: Service reports MCP configuration correctly
     Given The system is in default state
       And I set the Authorization header to Bearer test-token
      When I access REST API endpoint "info" using HTTP GET method
      Then The status code of the response is 200
+      And The response should contain MCP configuration
+      And The response should list configured MCP servers

48-56: Consider adding explicit token verification for client-provided auth.

For consistency with the file-based (line 37) and Kubernetes (line 46) scenarios, this scenario could benefit from an explicit assertion verifying the captured Authorization header from the client-auth server, rather than only checking which tools were called.

💡 Suggested additional assertion
      Then The MCP mock server should have received requests
       And The MCP mock server request log should contain exactly tools mock_tool_file, mock_tool_k8s, mock_tool_client
+      And The MCP mock server should have captured Authorization header containing "client-test-token" from mock-client-auth server

Comment on lines +446 to +455
def check_conversation_exists() -> bool:
    """Check if conversation exists in database (runs in thread pool)."""
    with get_session() as session:
        existing = (
            session.query(UserConversation)
            .filter_by(id=conversation_id)
            .first()
        )
        return existing is not None

# Run database check in thread pool to avoid connection issues
conversation_exists = await asyncio.to_thread(
    check_conversation_exists
)

⚠️ Potential issue | 🟡 Minor

Normalize conversation IDs before the existence check.

conversation_id may include a conv_ prefix, while the DB stores normalized IDs. This check can misclassify existing conversations and trigger unnecessary topic-summary generation.

🔧 Suggested fix
                     def check_conversation_exists() -> bool:
                         """Check if conversation exists in database (runs in thread pool)."""
+                        normalized_id = normalize_conversation_id(conversation_id)
                         with get_session() as session:
                             existing = (
                                 session.query(UserConversation)
-                                .filter_by(id=conversation_id)
+                                .filter_by(id=normalized_id)
                                 .first()
                             )
                             return existing is not None
🤖 Prompt for AI Agents
In `@src/app/endpoints/query.py` around lines 446 - 459, The conversation
existence check uses conversation_id directly, which may include a "conv_"
prefix while the DB stores normalized IDs; update the logic so
check_conversation_exists normalizes conversation_id before querying (e.g.,
strip a "conv_" prefix or run the same normalization used when storing IDs) and
then query UserConversation with the normalized id; ensure the same
normalization function or rules are applied (call or replicate normalization
used elsewhere) and keep the await asyncio.to_thread(check_conversation_exists)
usage intact so the DB query still runs in the thread pool.

@blublinsky force-pushed the mcp-e2e-tests branch 2 times, most recently from 3c4e8a9 to a362b01 on February 1, 2026 at 15:16
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/main.py (1)

120-125: ⚠️ Potential issue | 🟠 Major

Fix CORS allow_methods configuration parameter.

The allow_methods parameter expects HTTP verbs (e.g., GET, POST, PUT), not header names. Currently it's receiving cors.allow_headers, which is semantically incorrect. Use cors.allow_methods instead to ensure proper CORS preflight handling, especially when custom methods are configured.

🛠 Suggested fix
-    allow_methods=cors.allow_headers,
+    allow_methods=cors.allow_methods,
🤖 Fix all issues with AI agents
In `@src/app/endpoints/query_v2.py`:
- Around line 932-938: The debug log in the failed header resolution block
currently prints the raw header value (variable value) which may leak secrets;
update the logger.debug call in query_v2.py (the block that references
mcp_server.name, name, value) to avoid emitting the raw value—either remove the
value from the message and log only mcp_server.name and name (header name/type),
or replace value with a redacted/masked placeholder (e.g. "<redacted>" or a
short hash) before logging; ensure you update only the logger.debug invocation
in that resolution failure path.
🧹 Nitpick comments (1)
docker-compose.yaml (1)

87-91: Consider parameterizing the test token value.

The entrypoint correctly chains commands with && to ensure the Python script only executes after the token file is successfully created. Since this compose file is clearly for E2E testing, the hardcoded token is acceptable.

For additional flexibility (e.g., testing different token values or scenarios), you could parameterize the token via an environment variable:

♻️ Optional: Parameterize the test token
     environment:
       - OPENAI_API_KEY=${OPENAI_API_KEY}
       # Azure Entra ID credentials (AZURE_API_KEY is obtained dynamically)
       - TENANT_ID=${TENANT_ID:-}
       - CLIENT_ID=${CLIENT_ID:-}
       - CLIENT_SECRET=${CLIENT_SECRET:-}
+      - MCP_TEST_TOKEN=${MCP_TEST_TOKEN:-test-secret-token-123}
     entrypoint: >
       /bin/bash -c "
-      echo 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
+      echo \"$MCP_TEST_TOKEN\" > /tmp/lightspeed-mcp-test-token &&
       python3.12 /app-root/src/lightspeed_stack.py
       "

Comment on lines +932 to +938
# Log failed resolution
logger.debug(
    "MCP server '%s': Header '%s' -> FAILED to resolve (value was: %s)",
    mcp_server.name,
    name,
    value,
)

⚠️ Potential issue | 🟠 Major

Avoid logging raw auth header values.

On failed resolution, the log prints value, which can be a static secret. This leaks credentials into debug logs; log only the header name/type or redact.

🛡 Suggested fix
-                logger.debug(
-                    "MCP server '%s': Header '%s' -> FAILED to resolve (value was: %s)",
-                    mcp_server.name,
-                    name,
-                    value,
-                )
+                logger.debug(
+                    "MCP server '%s': Header '%s' -> FAILED to resolve",
+                    mcp_server.name,
+                    name,
+                )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Log failed resolution
logger.debug(
    "MCP server '%s': Header '%s' -> FAILED to resolve (value was: %s)",
    mcp_server.name,
    name,
    value,
)
# Log failed resolution
logger.debug(
    "MCP server '%s': Header '%s' -> FAILED to resolve",
    mcp_server.name,
    name,
)
🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 932 - 938, The debug log in the
failed header resolution block currently prints the raw header value (variable
value) which may leak secrets; update the logger.debug call in query_v2.py (the
block that references mcp_server.name, name, value) to avoid emitting the raw
value—either remove the value from the message and log only mcp_server.name and
name (header name/type), or replace value with a redacted/masked placeholder
(e.g. "<redacted>" or a short hash) before logging; ensure you update only the
logger.debug invocation in that resolution failure path.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/endpoints/query_v2.py (1)

179-270: ⚠️ Potential issue | 🟠 Major

Avoid logging raw MCP arguments/output content.

The new debug logs print tool arguments and output previews, which can include secrets or PII. Consider redacting values (log only metadata such as sizes/types) or gating this behind an explicit safe-debug flag.

🛡️ Suggested redaction
-        logger.debug("  Arguments: %s", args)
+        logger.debug("  Arguments: <redacted>")
...
-            logger.debug("  Output preview: %s", output_preview)
+            logger.debug("  Output preview: <redacted>")
src/app/endpoints/query.py (1)

372-379: ⚠️ Potential issue | 🟠 Major

Cache is not updated with generated topic_summary; conversation list queries will show empty summary.

Line 378 passes None for topic_summary to the cache immediately. The background task generates the summary and updates the database via persist_user_conversation_details(), but it never calls cache.set_topic_summary(). Since conversation list endpoints read from the cache (e.g., conversations_v2.py:102), users will see an empty topic_summary even after generation completes. Call configuration.conversation_cache.set_topic_summary() in the background task after generating the summary to keep cache and database in sync.

🤖 Fix all issues with AI agents
In `@dev-tools/mcp-mock-server/server.py`:
- Around line 235-238: The /debug/clear handler currently clears request_log but
leaves last_headers set, causing stale data to persist; update the clear branch
that handles "/debug/clear" to also reset the last_headers variable (and any
similar header-tracking state) when calling request_log.clear(), and ensure the
response still uses
self._send_json_response({"status":"cleared","request_count":0}) so
/debug/headers returns an empty/cleared state after the reset.

In `@src/app/main.py`:
- Line 124: The CORS configuration in main.py sets
allow_methods=cors.allow_headers which passes header names where HTTP methods
are expected; update the call that configures CORS (the parameter allow_methods)
to use cors.allow_methods instead of cors.allow_headers so the allowed HTTP
methods (GET, POST, PUT, DELETE, etc.) are provided to the CORS middleware.

In `@tests/e2e/features/steps/mcp.py`:
- Around line 626-639: The test currently sets
context.auth_headers["Authorization"] which the stack may not forward; update
the configure_mock_server_errors function to instead ensure context.mcp_headers
exists and propagate the error-mode Authorization through it (e.g., set
context.mcp_headers["Authorization"] = "Bearer error-mode"), keep setting
context.expect_tool_errors = True, and remove or avoid relying solely on
context.auth_headers so the mock server receives the MCP-HEADERS error token.

src/app/main.py Outdated
allow_origins=cors.allow_origins,
allow_credentials=cors.allow_credentials,
allow_methods=cors.allow_methods,
allow_methods=cors.allow_headers,

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the file around line 124
find . -type f -name "main.py" | grep -E "src/app" | head -5

Repository: lightspeed-core/lightspeed-stack

Length of output: 93


🏁 Script executed:

# Read the main.py file to see context around line 124
head -150 src/app/main.py | tail -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 1824


🏁 Script executed:

# Also search for CORS configuration to understand the cors object structure
rg -A 5 -B 5 "cors\s*=" src/app/main.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 365


🏁 Script executed:

# Search for CORS configuration class/model definition
rg -A 20 "class.*[Cc]ors" src/

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Look for cors configuration in the configuration module
find . -type f -name "*.py" -path "*/configuration/*" | head -10

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Search for allow_methods in the codebase to understand what's expected
rg "allow_methods" src/

Repository: lightspeed-core/lightspeed-stack

Length of output: 322


🏁 Script executed:

# Examine the CORS configuration model
rg -B 5 -A 10 "allow_methods" src/models/config.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 1109


🏁 Script executed:

# Get the full CORS configuration class
rg -B 20 -A 30 "class.*Cors" src/models/config.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Let me see the full CORSConfiguration class definition
rg -B 40 "allow_methods.*Field" src/models/config.py | head -60

Repository: lightspeed-core/lightspeed-stack

Length of output: 1538


🌐 Web query:

FastAPI CORSMiddleware allow_methods parameter default behavior HTTP methods

💡 Result:

In FastAPI, CORSMiddleware is Starlette’s CORS middleware, and its allow_methods default is only GET:

  • FastAPI reference signature shows allow_methods=("GET",) by default. [1]
  • Starlette docs confirm allow_methods defaults to ['GET'], and you can use ['*'] to allow all standard methods. [2]

Practical effect: if your browser sends a CORS preflight (OPTIONS with Access-Control-Request-Method: POST/PUT/...) and that method isn’t in allow_methods, the preflight will fail and the browser won’t send the actual request. To allow all methods, set allow_methods=["*"] (or list the exact methods you need). [3]

Sources:
[1] FastAPI middleware reference (CORSMiddleware signature) (fastapi.tiangolo.com)
[2] Starlette middleware docs (CORSMiddleware defaults) (starlette.io)
[3] FastAPI CORS tutorial (using allow_methods=["*"]) (fastapi.tiangolo.com)

Citations:


Fix CORS allow_methods mapping.

Line 124 incorrectly passes cors.allow_headers to the allow_methods parameter. The allow_methods parameter expects HTTP method names (GET, POST, PUT, DELETE, etc.), while allow_headers contains header names (Content-Type, Authorization, etc.). This will cause CORS preflight failures because browsers request allowed methods but receive header names instead, blocking actual requests. Use cors.allow_methods instead.

🔧 Suggested fix
-    allow_methods=cors.allow_headers,
+    allow_methods=cors.allow_methods,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
allow_methods=cors.allow_headers,
allow_methods=cors.allow_methods,
🤖 Prompt for AI Agents
In `@src/app/main.py` at line 124, The CORS configuration in main.py sets
allow_methods=cors.allow_headers which passes header names where HTTP methods
are expected; update the call that configures CORS (the parameter allow_methods)
to use cors.allow_methods instead of cors.allow_headers so the allowed HTTP
methods (GET, POST, PUT, DELETE, etc.) are provided to the CORS middleware.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 184-244: The code compares scope["path"] (concrete request path)
to app_routes_paths strings, so parameterized routes (e.g. /conversations/{id})
are never matched and using the raw scope path as a metric label risks
cardinality explosion; replace the simple membership check and label usage with
a route match lookup and use the route's template for labels: iterate your
registered routes (instead of comparing path strings), call route.matches(scope)
(or the equivalent match method on your framework's Route object) to find the
matching Route, skip if no match, set a label_path = route.path or
route.template (the template string like "/conversations/{conversation_id}") and
use label_path in metrics.response_duration_seconds.labels(...) and
metrics.rest_api_calls_total.labels(label_path, status_code).inc(); keep
send_wrapper and status capture as-is. Ensure app_routes_paths is the collection
of Route objects (or convert strings to Route objects) so route.matches() and
route.path are available.
🧹 Nitpick comments (2)
tests/integration/endpoints/test_query_v2_integration.py (1)

36-59: Reduce fixed-sleep flakiness in background-task waits.
The hardcoded 0.6s/0.2s sleeps can be brittle under slower CI. Prefer a timeout-based wait on the tracked tasks so tests fail fast if tasks stall.

♻️ Proposed refactor
 async def wait_for_background_tasks() -> None:
     """Wait for background tasks to complete.
@@
-    # Wait for the initial 500ms delay + buffer
-    await asyncio.sleep(0.6)
-
-    # Wait for any remaining background tasks to complete
+    # Wait for any remaining background tasks to complete (with timeout)
     # pylint: disable=import-outside-toplevel
     from app.endpoints.query import background_tasks_set
 
-    if background_tasks_set:
-        await asyncio.gather(*background_tasks_set, return_exceptions=True)
-
-    # Give thread pool operations extra time to complete
-    await asyncio.sleep(0.2)
+    if background_tasks_set:
+        await asyncio.wait_for(
+            asyncio.gather(*background_tasks_set, return_exceptions=True),
+            timeout=2.0,
+        )
tests/unit/app/endpoints/test_streaming_query_v2.py (1)

226-227: Avoid fixed sleep; wait on the tracked cleanup task instead.
This makes the test deterministic and less timing-sensitive.

♻️ Proposed refactor
-from app.endpoints.streaming_query_v2 import (
+from app.endpoints.streaming_query_v2 import (
+    background_tasks_set,
     retrieve_response,
     streaming_query_endpoint_handler_v2,
 )
@@
-    # Wait for background cleanup task to complete (has 0.5s delay)
-    await asyncio.sleep(0.7)
+    # Wait for background cleanup task to complete
+    await asyncio.wait_for(
+        asyncio.gather(*background_tasks_set, return_exceptions=True),
+        timeout=2.0,
+    )

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/app/endpoints/streaming_query_v2.py`:
- Around line 85-94: Wrap the asyncio.create_task call in create_background_task
with a try/except and log any exceptions so task creation failures aren't
silent: call asyncio.create_task(coro) inside a try block, add the resulting
task to background_tasks_set and attach
task.add_done_callback(background_tasks_set.discard) in that try, and in the
except Exception as e block log the error (use the module's logger, e.g.,
logger.exception(...) or process_logger.error(...) if present) including the
exception details; do not change the existing background_tasks_set or callback
names.
🧹 Nitpick comments (7)
tests/e2e/features/steps/common.py (1)

39-41: Consider also resetting expect_tool_errors flag for complete test isolation.

The auth_headers reset is good, but per the past review comment on mcp_tools.feature, the context.expect_tool_errors flag set by "The MCP mock server is configured to return errors" step can also persist across scenarios. For complete test isolation, consider adding:

     # Clear auth headers to ensure clean state between scenarios
     if hasattr(context, "auth_headers"):
         context.auth_headers = {}
+    # Clear MCP error mode flag to ensure clean state between scenarios
+    if hasattr(context, "expect_tool_errors"):
+        context.expect_tool_errors = False

Based on learnings: "In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared."

src/app/endpoints/streaming_query_v2.py (1)

80-94: Consider extracting shared background task infrastructure to a utility module.

The background_tasks_set and create_background_task pattern is duplicated between this file and src/app/endpoints/query.py. Extracting this to a shared utility (e.g., utils/background_tasks.py) would reduce duplication and ensure consistent behavior.
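
A rough sketch of how the two suggestions above could combine: the try/except logging requested for create_background_task plus extraction into a shared module. The module path utils/background_tasks.py and the helper names are taken from the review text, not verified against the repository.

    """Shared fire-and-forget background task tracking (illustrative sketch)."""

    import asyncio
    import logging
    from typing import Any, Coroutine, Set

    logger = logging.getLogger(__name__)

    # Strong references keep tasks from being garbage-collected before they finish.
    background_tasks_set: Set[asyncio.Task] = set()


    def create_background_task(coro: Coroutine[Any, Any, Any]) -> asyncio.Task | None:
        """Schedule *coro* as a tracked background task; log and return None on failure."""
        try:
            task = asyncio.create_task(coro)
            background_tasks_set.add(task)
            task.add_done_callback(background_tasks_set.discard)
            return task
        except Exception as exc:  # pylint: disable=broad-exception-caught
            logger.error("Failed to create background task: %s", exc, exc_info=True)
            return None


    async def wait_for_background_tasks(timeout: float | None = None) -> None:
        """Await all currently tracked tasks (for shutdown hooks or tests)."""
        tasks = list(background_tasks_set)  # snapshot: the set may mutate while gathering
        if not tasks:
            return
        await asyncio.wait_for(
            asyncio.gather(*tasks, return_exceptions=True), timeout=timeout
        )

Both query.py and streaming_query_v2.py could then import from this one place, which also gives tests and the lifespan shutdown a single wait helper to call.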

dev-tools/mcp-mock-server/server.py (1)

97-113: Consider using if/elif instead of match True for token matching.

The match True with case guards is an unconventional pattern. While it works, standard if/elif chains are more idiomatic for this type of string matching:

♻️ Alternative with if/elif (optional)
-        match True:
-            case _ if "test-secret-token" in auth_header:
-                tool_name = "mock_tool_file"
-                tool_desc = "Mock tool with file-based auth"
-            case _ if "my-k8s-token" in auth_header:
-                tool_name = "mock_tool_k8s"
-                tool_desc = "Mock tool with Kubernetes token"
-            case _ if "my-client-token" in auth_header:
-                tool_name = "mock_tool_client"
-                tool_desc = "Mock tool with client-provided token"
-            case _ if "error-mode" in auth_header:
-                tool_name = "mock_tool_error"
-                tool_desc = "Mock tool configured to return errors"
-                error_mode = True
-            case _:
-                # Default case already set above
-                pass
+        if "test-secret-token" in auth_header:
+            tool_name = "mock_tool_file"
+            tool_desc = "Mock tool with file-based auth"
+        elif "my-k8s-token" in auth_header:
+            tool_name = "mock_tool_k8s"
+            tool_desc = "Mock tool with Kubernetes token"
+        elif "my-client-token" in auth_header:
+            tool_name = "mock_tool_client"
+            tool_desc = "Mock tool with client-provided token"
+        elif "error-mode" in auth_header:
+            tool_name = "mock_tool_error"
+            tool_desc = "Mock tool configured to return errors"
+            error_mode = True
tests/integration/endpoints/test_query_v2_integration.py (1)

161-168: Good approach for test database isolation, but kwargs not forwarded.

The mock_to_thread function accepts **kwargs but doesn't pass them to func(). While asyncio.to_thread typically doesn't use kwargs for the target function (they're passed as positional args), the signature mismatch could cause subtle issues if kwargs are ever used.

♻️ Minor fix to forward kwargs
     async def mock_to_thread(func: Any, *args: Any, **kwargs: Any) -> Any:
         """Run function synchronously instead of in thread pool for tests."""
-        return func(*args)
+        return func(*args, **kwargs)

Or, if kwargs are never needed, drop them from the signature so it matches the call:

     async def mock_to_thread(func: Any, *args: Any) -> Any:
         """Run function synchronously instead of in thread pool for tests."""
         return func(*args)
src/app/main.py (1)

81-85: Remove commented-out code.

The commented code block for MCP server registration (lines 81-85) should be removed rather than left as comments. If this code path is no longer needed for the Responses API, removing it improves maintainability.

♻️ Remove commented code
-    # NOTE: MCP server registration not needed for Responses API
-    # The Responses API takes inline tool definitions instead of pre-registered toolgroups
-    # logger.info("Registering MCP servers")
-    # await register_mcp_servers_async(logger, configuration.configuration)
-    # get_logger("app.endpoints.handlers")
     logger.info("App startup complete")
tests/e2e/features/steps/mcp.py (2)

831-848: Streaming response verification doesn't parse SSE events.

The step acknowledges that SSE events should be parsed (line 843 comment) but only checks the status code. This means the assertion "streaming response contains tool execution results" isn't actually verified.

💡 Consider implementing SSE parsing
`@then`("The streaming response should contain tool execution results")
def check_streaming_response_has_tool_results(context: Context) -> None:
    assert context.response is not None, "No response received"
    assert context.response.status_code == 200
    
    # Parse SSE events from streaming response
    content = context.response.content.decode("utf-8")
    events = [line for line in content.split("\n") if line.startswith("data:")]
    
    # Verify at least one event contains tool-related content
    tool_events = [e for e in events if "tool" in e.lower()]
    assert len(tool_events) > 0, "No tool execution events found in streaming response"
    print(f"✅ Found {len(tool_events)} tool-related events in streaming response")
    
    time.sleep(0.5)

78-87: Consider polling instead of fixed sleep for MCP request waits.

The fixed time.sleep(2) at line 86 may be insufficient under load or too long for fast scenarios. Consider polling the /debug/requests endpoint until requests appear or a timeout is reached.

💡 Polling approach example
`@when`("I wait for MCP server to receive requests")
def wait_for_mcp_requests(context: Context) -> None:
    """Wait for MCP server to receive and log requests with polling."""
    max_wait = 5  # seconds
    poll_interval = 0.2
    elapsed = 0
    
    while elapsed < max_wait:
        try:
            response = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=2)
            if response.status_code == 200 and len(response.json()) > 0:
                print(f"⏱️  MCP server received requests after {elapsed:.1f}s")
                return
        except requests.RequestException:
            pass
        time.sleep(poll_interval)
        elapsed += poll_interval
    
    print(f"⏱️  Waited {max_wait}s for MCP server (may have no requests)")

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/app/endpoints/query_v2.py`:
- Around line 179-203: The current logging of MCP tool calls exposes raw
arguments and output (see uses of mcp_call_item, args, content and
logger.debug/logger.warning); update these logs to redact sensitive data by
replacing raw args with a summary (e.g., if args is a dict/loggable mapping, log
keys and per-key lengths or types; if list/tuple, log element counts and types)
and remove or replace detailed output previews with only content length and a
fixed masked indicator (no raw substrings). Apply the same change to the
error/result logging paths (including where mcp_call_item.error is logged) so
only non-sensitive metadata (tool name, server_label, call_id, error
type/message without user data, args keys/counts, output length) are recorded
instead of raw values.
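
One way to express the redaction the prompt asks for, as a hedged sketch; the parameter names (tool_name, server_label, args, output) mirror the review text and may not match the real mcp_call_item attributes.

    import logging
    from typing import Any

    logger = logging.getLogger(__name__)


    def summarize_args(args: Any) -> str:
        """Describe tool-call arguments without logging their raw values."""
        if isinstance(args, dict):
            return ", ".join(f"{key}=<{type(value).__name__}>" for key, value in args.items())
        if isinstance(args, (list, tuple)):
            return f"{len(args)} positional argument(s)"
        return f"<{type(args).__name__}>"


    def log_mcp_call(tool_name: str, server_label: str, args: Any, output: str | None) -> None:
        """Log only non-sensitive metadata about an MCP tool call."""
        logger.debug(
            "MCP tool call: tool=%s server=%s args={%s} output_length=%d",
            tool_name,
            server_label,
            summarize_args(args),
            len(output) if output else 0,
        )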

In `@src/app/endpoints/query.py`:
- Around line 415-491: The app currently schedules background tasks via
create_background_task(persist_with_topic_summary()) but the lifespan shutdown
does not wait for background_tasks_set, causing tasks (like
persist_with_topic_summary) to be cancelled before completion; update the
application shutdown in main.py to import background_tasks_set from
app.endpoints.query and await asyncio.gather(*background_tasks_set,
return_exceptions=True) (then call A2AStorageFactory.cleanup()) so all pending
tasks are awaited during shutdown, ensuring persist_with_topic_summary finishes
persistence and topic summary work before process exit.
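
A sketch of the shutdown wiring described above, assuming a FastAPI-style lifespan context manager in main.py; the import path and the A2AStorageFactory.cleanup() call come from the review prompt and are not verified here.

    import asyncio
    from contextlib import asynccontextmanager
    from typing import AsyncIterator

    from fastapi import FastAPI


    @asynccontextmanager
    async def lifespan(app: FastAPI) -> AsyncIterator[None]:
        """Wait for tracked background tasks before tearing storage down."""
        yield  # application serves requests here
        # --- shutdown path ---
        # Late import to avoid circular imports at module load time (path per the review prompt).
        from app.endpoints.query import background_tasks_set

        tasks = list(background_tasks_set)  # snapshot before gathering
        if tasks:
            await asyncio.gather(*tasks, return_exceptions=True)
        # Only now is it safe to clean up storage, e.g. A2AStorageFactory.cleanup()
        # (name taken from the review prompt above, not verified here).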

In `@tests/e2e/features/mcp_tools.feature`:
- Around line 110-117: In the step function that implements "The system is in
default state" in tests/e2e/features/steps/common.py (the Given step for default
state), reset the MCP mock flag by adding context.expect_tool_errors = False
alongside the existing cleanup of context.auth_headers so test isolation is
preserved; ensure you set this attribute in the same function (the default-state
step) so tests that previously set context.expect_tool_errors = True are not
leaking into subsequent scenarios.
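
The default-state step could end up looking roughly like this; the step text comes from the feature file and the attribute names follow the review comments, so treat the exact shape as illustrative.

    from behave import given
    from behave.runner import Context


    @given("The system is in default state")
    def system_in_default_state(context: Context) -> None:
        """Reset per-scenario flags so MCP error mode does not leak between scenarios."""
        assert context is not None
        # Clear auth headers set by a previous scenario, if any.
        if hasattr(context, "auth_headers"):
            context.auth_headers = {}
        # Reset the MCP mock error flag for test isolation.
        context.expect_tool_errors = False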
🧹 Nitpick comments (6)
docker-compose-library.yaml (1)

69-73: Use exec so Python receives SIGTERM directly.
Without exec, bash stays PID 1 and may not forward signals, which can slow container shutdowns.

Suggested tweak
 entrypoint: >
   /bin/bash -c "
   echo 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-  /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+  exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
   "
tests/unit/app/endpoints/test_streaming_query_v2.py (1)

4-4: Prefer awaiting background tasks over a fixed sleep.
The 0.7s delay can be flaky on slower CI; waiting on the tracked tasks is deterministic.

Suggested tweak
-    # Wait for background cleanup task to complete (has 0.5s delay)
-    await asyncio.sleep(0.7)
+    # Wait for background cleanup task(s) deterministically
+    from app.endpoints.streaming_query_v2 import background_tasks_set
+
+    if background_tasks_set:
+        await asyncio.gather(*list(background_tasks_set), return_exceptions=True)

Also applies to: 226-227

tests/integration/endpoints/test_query_v2_integration.py (1)

7-59: Snapshot the background task set before gathering.
background_tasks_set can mutate while unpacking; snapshotting avoids potential RuntimeError.

Suggested tweak
-    if background_tasks_set:
-        await asyncio.gather(*background_tasks_set, return_exceptions=True)
+    tasks = list(background_tasks_set)
+    if tasks:
+        await asyncio.gather(*tasks, return_exceptions=True)
src/app/endpoints/query.py (2)

59-59: Logger name doesn't follow the __name__ pattern.

Per coding guidelines, modules should use logger = logging.getLogger(__name__). The hardcoded string "app.endpoints.handlers" may cause confusion if the module is renamed or moved.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

102-106: Use %-style formatting in logger calls instead of f-strings.

Using f-strings in logging eagerly evaluates the format expression even when the log level is disabled. The %-style is more efficient and consistent with the rest of the codebase.

-        logger.debug(
-            f"Background task created, active tasks: {len(background_tasks_set)}"
-        )
+        logger.debug(
+            "Background task created, active tasks: %d", len(background_tasks_set)
+        )
     except Exception as e:
-        logger.error(f"Failed to create background task: {e}", exc_info=True)
+        logger.error("Failed to create background task: %s", e, exc_info=True)
tests/e2e/features/steps/mcp.py (1)

831-845: Streaming response verification doesn't parse SSE events.

The step acknowledges that SSE events should be parsed but only checks the status code. This means "streaming response contains tool execution results" isn't actually verified.

💡 Consider implementing SSE parsing
`@then`("The streaming response should contain tool execution results")
def check_streaming_response_has_tool_results(context: Context) -> None:
    assert context.response is not None, "No response received"
    assert context.response.status_code == 200
    
    # Parse SSE events from streaming response
    content = context.response.content.decode("utf-8")
    events = [line for line in content.split("\n") if line.startswith("data:")]
    
    # Verify at least one event contains tool-related content
    tool_events = [e for e in events if "tool" in e.lower()]
    assert len(tool_events) > 0, "No tool execution events found in streaming response"
    print(f"✅ Found {len(tool_events)} tool-related events in streaming response")

@blublinsky blublinsky force-pushed the mcp-e2e-tests branch 2 times, most recently from 219ac4f to d2923bf Compare February 1, 2026 19:35
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/integration/endpoints/test_query_v2_integration.py (1)

1350-1382: ⚠️ Potential issue | 🟡 Minor

Add a wait after the second request in transcript test.

The second DB lookup can race the background persistence task. Add another wait_for_background_tasks() after response_disabled before querying conversation_disabled.

🔧 Suggested fix
     # Verify response succeeded with transcripts disabled
     assert response_disabled.conversation_id is not None
     assert response_disabled.response is not None

+    await wait_for_background_tasks()
+
     # Verify conversation was still persisted (transcripts are independent)
     conversation_disabled = (
         patch_db_session.query(UserConversation)
         .filter_by(id=response_disabled.conversation_id)
         .first()
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/conversation.py`:
- Around line 25-73: In poll_for_conversation, add an explicit guard at the top
to handle max_retries <= 0 (e.g., raise ValueError or set it to a safe default)
so that the variable response is never referenced before assignment; ensure the
function returns or raises immediately when max_retries is invalid to avoid
UnboundLocalError on the final return.
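
A hedged sketch of the guard; the helper name and signature follow the review prompt and may differ from the actual step file.

    import time
    from typing import Mapping

    import requests


    def poll_for_conversation(
        url: str,
        headers: Mapping[str, str],
        max_retries: int = 10,
        delay: float = 0.2,
    ) -> requests.Response:
        """Poll *url* until it returns 200 or the retry budget is exhausted."""
        if max_retries <= 0:
            # Guard: with zero iterations the loop below never assigns `response`.
            raise ValueError(f"max_retries must be positive, got {max_retries}")

        response: requests.Response
        for _ in range(max_retries):
            response = requests.get(url, headers=headers, timeout=10)
            if response.status_code == 200:
                return response
            time.sleep(delay)
            delay = min(delay * 2, 2.0)  # exponential backoff, capped at 2 seconds
        return response  # last non-200 response; caller decides how to fail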

In `@tests/e2e/features/steps/feedback.py`:
- Around line 150-154: The code builds conversation_url by concatenating base
and a path then calling .replace("//","/"), which breaks the URL scheme; instead
construct the path separately from base and join them safely: build the path
using context.api_prefix and context.conversation_id (e.g.,
f"{context.api_prefix}/conversations/{context.conversation_id}" with leading
slashes trimmed) and then prepend base after trimming trailing slashes (or use a
URL-joining utility) to form conversation_url so the scheme (http://) is
preserved and no .replace("//","/") is used.
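
A small sketch of the safer join; context.api_prefix and context.conversation_id come from the review prompt and are assumed to exist on the Behave context.

    def build_conversation_url(base: str, api_prefix: str, conversation_id: str) -> str:
        """Join base URL and path segments without collapsing the scheme's '//'."""
        path = f"{api_prefix.strip('/')}/conversations/{conversation_id}"
        return f"{base.rstrip('/')}/{path}"


    # Example:
    # build_conversation_url("http://localhost:8080", "/v1", "abc-123")
    # -> "http://localhost:8080/v1/conversations/abc-123"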

In `@tests/integration/endpoints/test_query_v2_integration.py`:
- Around line 51-56: background_tasks_set may mutate while being expanded into
asyncio.gather causing a "set changed size during iteration" error; snapshot it
first by converting background_tasks_set to a list (e.g., tasks =
list(background_tasks_set)) and then pass that list to await
asyncio.gather(*tasks, return_exceptions=True) so the set can't change during
iteration; update the code around background_tasks_set usage in
test_query_v2_integration.py accordingly.
🧹 Nitpick comments (5)
tests/e2e/features/steps/conversation.py (1)

251-289: Prefer shared retry/timeout constants for consistency.

Use MAX_RETRIES, INITIAL_RETRY_DELAY, MAX_RETRY_DELAY, and DEFAULT_TIMEOUT instead of literals to keep retry behavior aligned across the suite.

♻️ Proposed refactor
-    max_retries = 10
-    delay = 0.2  # Start with 200ms
+    max_retries = MAX_RETRIES
+    delay = INITIAL_RETRY_DELAY  # Start with configured delay
@@
-            delay = min(delay * 2, 2.0)  # Exponential backoff, cap at 2s
+            delay = min(delay * 2, MAX_RETRY_DELAY)  # Exponential backoff, cap
@@
-            context.response = requests.get(url, headers=headers, timeout=10)
+            context.response = requests.get(url, headers=headers, timeout=DEFAULT_TIMEOUT)
src/app/endpoints/streaming_query_v2.py (1)

80-100: Consider extracting shared background task utility.

The background_tasks_set and create_background_task pattern is duplicated between this file and query.py. Consider extracting to a shared utility module (e.g., utils/background_tasks.py) to avoid drift and ensure consistent behavior.

src/app/endpoints/query.py (1)

102-104: Use % formatting for logger calls instead of f-strings.

Line 103 uses an f-string (f"Background task created...") while other logger calls in this file and streaming_query_v2.py use % formatting (e.g., line 96 in streaming_query_v2.py). Use consistent formatting for better performance (deferred evaluation) and style uniformity.

♻️ Suggested fix
-        logger.debug(
-            f"Background task created, active tasks: {len(background_tasks_set)}"
-        )
+        logger.debug(
+            "Background task created, active tasks: %d", len(background_tasks_set)
+        )

Same applies to line 106:

-        logger.error(f"Failed to create background task: {e}", exc_info=True)
+        logger.error("Failed to create background task: %s", e, exc_info=True)
dev-tools/mcp-mock-server/server.py (1)

97-113: Consider simplifying the token matching logic.

The match True: with case _ if ... guards is unconventional. A more idiomatic approach would be using if/elif chains or matching directly on auth_header:

♻️ Alternative using if/elif
if "test-secret-token" in auth_header:
    tool_name = "mock_tool_file"
    tool_desc = "Mock tool with file-based auth"
elif "my-k8s-token" in auth_header:
    tool_name = "mock_tool_k8s"
    tool_desc = "Mock tool with Kubernetes token"
elif "my-client-token" in auth_header:
    tool_name = "mock_tool_client"
    tool_desc = "Mock tool with client-provided token"
elif "error-mode" in auth_header:
    tool_name = "mock_tool_error"
    tool_desc = "Mock tool configured to return errors"
    error_mode = True
# else: defaults already set above
src/app/main.py (1)

104-112: Bound shutdown waits to avoid hanging deployments.

If any background task stalls, shutdown will block indefinitely. Consider adding a timeout and canceling lingering tasks.

🔧 Suggested fix
-        all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
-        await asyncio.gather(*all_tasks, return_exceptions=True)
-        logger.info("All background tasks completed")
+        all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
+        try:
+            await asyncio.wait_for(
+                asyncio.gather(*all_tasks, return_exceptions=True),
+                timeout=10,
+            )
+            logger.info("All background tasks completed")
+        except asyncio.TimeoutError:
+            logger.warning("Background tasks did not finish before shutdown; cancelling")
+            for task in all_tasks:
+                task.cancel()
+            await asyncio.gather(*all_tasks, return_exceptions=True)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/features/steps/feedback.py (1)

116-150: ⚠️ Potential issue | 🟡 Minor

Docstring is misleading — claims polling occurs but it doesn't.

The docstring (lines 118-120) states the function "polls to ensure it's persisted to the database before proceeding," but there is no polling code in the function. The NOTE comment (lines 146-149) explicitly says polling is disabled. This contradiction will confuse future maintainers.

Either remove the polling claim from the docstring or, if polling is intended, implement it.

📝 Proposed fix to align docstring with actual behavior
 def create_conversation_with_user_id(
     context: Context, user_id: Optional[str] = None
 ) -> None:
-    """Create a conversation, optionally with a specific user_id query parameter.
-
-    After creating the conversation, polls to ensure it's persisted to the database
-    before proceeding. This handles the asynchronous background persistence.
-    """
+    """Create a conversation, optionally with a specific user_id query parameter."""
     endpoint = "query"

Then simplify the NOTE comment:

-    # NOTE: Polling disabled for feedback scenarios to avoid errors
-    # Feedback tests typically don't need to wait for background persistence
-    # as they interact with conversations immediately after creation.
-    # If polling is needed, it can be added back with proper error handling.
+    # NOTE: Feedback tests interact with conversations immediately after creation,
+    # so no polling for background persistence is performed here.
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/common.py`:
- Around line 27-45: Fix the duplicated wording in the docstring at the top of
the shown block (the docstring that currently reads "Check the default state
system state."); update it to read "Check the default system state." (or "Check
the default state.") so the sentence is grammatically correct while keeping the
rest of the docstring intact; the change should be made in the docstring
immediately preceding the assert context is not None check in
tests/e2e/features/steps/common.py.
🧹 Nitpick comments (7)
docker-compose.yaml (1)

87-91: Use exec in the entrypoint for proper signal handling.

Running Python as a child of bash can swallow SIGTERM/SIGINT; exec makes the interpreter PID 1 for clean shutdowns.

🔧 Suggested change
     entrypoint: >
       /bin/bash -c "
       echo 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-      /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+      exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
       "
tests/unit/app/endpoints/test_streaming_query_v2.py (1)

226-227: Avoid fixed sleep; await background cleanup deterministically.

Fixed delays can be flaky and slow. Consider awaiting the tracked background tasks instead.

🔧 Suggested change
 from app.endpoints.streaming_query_v2 import (
+    background_tasks_set,
     retrieve_response,
     streaming_query_endpoint_handler_v2,
 )
@@
-    # Wait for background cleanup task to complete (has 0.5s delay)
-    await asyncio.sleep(0.7)
+    # Wait for background cleanup task to complete deterministically
+    tasks = list(background_tasks_set)
+    if tasks:
+        await asyncio.gather(*tasks, return_exceptions=True)
docker-compose-library.yaml (1)

69-73: Use exec in the entrypoint for proper signal handling.

Running Python as a child of bash can swallow SIGTERM/SIGINT; exec makes the interpreter PID 1 for clean shutdowns.

🔧 Suggested change
     entrypoint: >
       /bin/bash -c "
       echo 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-      /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+      exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
       "
src/app/endpoints/streaming_query_v2.py (1)

80-100: Consider centralizing the background-task helper.

It’s duplicated with src/app/endpoints/query.py; a shared utility would prevent drift.

tests/e2e/features/steps/conversation.py (1)

257-305: Use module-level constants for retry configuration.

The retry logic here duplicates values from the module constants. Using MAX_RETRIES, INITIAL_RETRY_DELAY, and MAX_RETRY_DELAY would improve consistency and maintainability.

♻️ Suggested refactor
 def check_returned_conversation_id(context: Context) -> None:
     """Check the conversation id in response.

     If the conversation is not found in the list, retries the GET request
     with exponential backoff to handle asynchronous background persistence.
     """
-    max_retries = 10
-    delay = 0.2  # Start with 200ms
+    max_retries = MAX_RETRIES
+    delay = INITIAL_RETRY_DELAY

     for attempt in range(max_retries):
         # ... existing code ...
         
-            delay = min(delay * 2, 2.0)  # Exponential backoff, cap at 2s
+            delay = min(delay * 2, MAX_RETRY_DELAY)  # Exponential backoff with cap
tests/e2e/features/steps/mcp.py (2)

831-848: Streaming response verification doesn't parse SSE events.

The step only checks status code but doesn't verify tool execution results are actually present in the SSE stream. Consider parsing SSE events to make this assertion meaningful.

💡 Suggested enhancement
`@then`("The streaming response should contain tool execution results")
def check_streaming_response_has_tool_results(context: Context) -> None:
    assert context.response is not None, "No response received"
    assert context.response.status_code == 200
    
    # Parse SSE events from streaming response
    content = context.response.content.decode("utf-8")
    events = [line for line in content.split("\n") if line.startswith("data:")]
    
    # Verify at least one event contains tool-related content
    tool_events = [e for e in events if "tool" in e.lower()]
    assert len(tool_events) > 0, "No tool execution events found in streaming response"
    print(f"✅ Found {len(tool_events)} tool-related events in streaming response")
    
    # Cleanup delay
    time.sleep(0.5)

78-87: Consider making wait time configurable or using polling.

The fixed time.sleep(2) may be too short under load or too long for fast runs. Consider polling the debug endpoint until requests appear or making the timeout configurable.

Comment on lines 27 to 45
"""Check the default state system state.

Ensure the Behave test context is present for steps that assume the system
is in its default state.

Note: Does NOT clear auth headers, as those may be set in Background section
and should persist for the entire scenario. Auth headers are automatically
reset by Behave between scenarios.

Parameters:
context (Context): Behave Context instance used to store and share test state.

Raises:
AssertionError: If `context` is None.
"""
assert context is not None
# Reset MCP error expectation flag for test isolation
if hasattr(context, "expect_tool_errors"):
context.expect_tool_errors = False

⚠️ Potential issue | 🟡 Minor

Fix the docstring typo (“default state system state”).

Minor wording duplication in the docstring.

🔧 Suggested change
-    """Check the default state system state.
+    """Check the default system state.
🤖 Prompt for AI Agents
In `@tests/e2e/features/steps/common.py` around lines 27 - 45, Fix the duplicated
wording in the docstring at the top of the shown block (the docstring that
currently reads "Check the default state system state."); update it to read
"Check the default system state." (or "Check the default state.") so the
sentence is grammatically correct while keeping the rest of the docstring
intact; the change should be made in the docstring immediately preceding the
assert context is not None check in tests/e2e/features/steps/common.py.

@blublinsky blublinsky force-pushed the mcp-e2e-tests branch 2 times, most recently from ee1cec2 to d516d49 Compare February 1, 2026 22:22
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docker-compose-library.yaml`:
- Around line 69-73: In the entrypoint shell snippet, the test token is written
with echo which appends a trailing newline; replace the write command to use
printf without a newline (e.g., printf '%s' 'test-secret-token-123' >
/tmp/lightspeed-mcp-test-token) so the file contains the exact token, then
continue launching /app-root/.venv/bin/python3.12
/app-root/src/lightspeed_stack.py; update the entrypoint command that currently
uses echo to use printf '%s' instead.
🧹 Nitpick comments (2)
tests/e2e/features/steps/mcp.py (1)

646-661: Add an assertion for error indicators in the response.
The step name promises failure verification, but it currently doesn’t validate any error signal.

🧪 Suggested improvement
     assert context.response is not None, "No response received"
     assert (
         context.response.status_code == 200
     ), f"Bad status: {context.response.status_code}"
-
-    # In error mode, the response should still be 200 but contain error information
-    # The LLM will handle the tool error gracefully
-    print("✅ Response received (tool errors are handled gracefully by LLM)")
+    response_data = context.response.json()
+    response_text = json.dumps(response_data).lower()
+    assert (
+        "error" in response_text or "iserror" in response_text
+    ), "Response did not surface tool error information"
+    print("✅ Response indicates tool execution failure")
src/app/endpoints/streaming_query_v2.py (1)

337-369: Include stack traces in background cleanup failures.
This keeps post‑mortems actionable when async cleanup fails.

🔧 Suggested tweak
-            except Exception as e:  # pylint: disable=broad-exception-caught
-                logger.error("Background cleanup failed: %s", e)
+            except Exception as e:  # pylint: disable=broad-exception-caught
+                logger.error("Background cleanup failed: %s", e, exc_info=True)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/features/steps/feedback.py (1)

116-162: ⚠️ Potential issue | 🟡 Minor

Fail fast when persistence polling fails.

Line 156–161 only logs a warning on non-200 poll results, then continues. This can mask the root cause and push failures into later steps. Prefer failing immediately after retries to keep tests deterministic and errors actionable.

🔧 Suggested fix
     poll_response = poll_for_conversation(conversation_url, headers)
-    if poll_response.status_code != 200:
-        print(
-            f"⚠️  Warning: Conversation {context.conversation_id} not found after creation "
-            f"(status: {poll_response.status_code})"
-        )
+    assert poll_response.status_code == 200, (
+        f"Conversation {context.conversation_id} not found after creation "
+        f"(status: {poll_response.status_code})"
+    )
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/conversation.py`:
- Around line 312-365: The loop assumes context.response contains a 200 JSON
with "conversations"; add an explicit check right after response_json =
context.response.json() (or better, before parsing) to validate
context.response.status_code == 200 and that "conversations" is in the parsed
body. If the status is non-200 or the key is missing, raise/assert with a clear
message that includes context.response.status_code and the raw response text (or
response_json) so the test fails with the real error; keep using the same retry
flow (re-fetch via the existing endpoint/url/headers logic) when status is
non-200 so transient errors are retried, but on final attempt surface the
detailed error mentioning context.response_data["conversation_id"] and the last
response payload.
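
A minimal sketch of the validation step; the helper name is invented for illustration, while context.response and the "conversations" key come from the review prompt.

    from behave.runner import Context


    def extract_conversations(context: Context) -> list:
        """Validate the response before touching the payload, failing with full context."""
        response = context.response
        assert response.status_code == 200, (
            f"Unexpected status {response.status_code} while listing conversations: "
            f"{response.text}"
        )
        response_json = response.json()
        assert "conversations" in response_json, (
            f"Response body has no 'conversations' key: {response_json}"
        )
        return response_json["conversations"]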
🧹 Nitpick comments (2)
dev-tools/mcp-mock-server/server.py (1)

97-113: Consider using if-elif chain instead of match True pattern.

The match True with case _ if condition: is an unconventional pattern in Python. While functional, a standard if-elif chain would be more idiomatic and readable:

♻️ Suggested refactor
-        match True:
-            case _ if "test-secret-token" in auth_header:
-                tool_name = "mock_tool_file"
-                tool_desc = "Mock tool with file-based auth"
-            case _ if "my-k8s-token" in auth_header:
-                tool_name = "mock_tool_k8s"
-                tool_desc = "Mock tool with Kubernetes token"
-            case _ if "my-client-token" in auth_header:
-                tool_name = "mock_tool_client"
-                tool_desc = "Mock tool with client-provided token"
-            case _ if "error-mode" in auth_header:
-                tool_name = "mock_tool_error"
-                tool_desc = "Mock tool configured to return errors"
-                error_mode = True
-            case _:
-                # Default case already set above
-                pass
+        if "test-secret-token" in auth_header:
+            tool_name = "mock_tool_file"
+            tool_desc = "Mock tool with file-based auth"
+        elif "my-k8s-token" in auth_header:
+            tool_name = "mock_tool_k8s"
+            tool_desc = "Mock tool with Kubernetes token"
+        elif "my-client-token" in auth_header:
+            tool_name = "mock_tool_client"
+            tool_desc = "Mock tool with client-provided token"
+        elif "error-mode" in auth_header:
+            tool_name = "mock_tool_error"
+            tool_desc = "Mock tool configured to return errors"
+            error_mode = True
+        # else: defaults already set above
src/app/endpoints/streaming_query_v2.py (1)

337-369: Include stack traces for background cleanup failures.

Without exc_info=True (or logger.exception), failures in cleanup lose useful diagnostics.

🔧 Suggested tweak
-            except Exception as e:  # pylint: disable=broad-exception-caught
-                logger.error("Background cleanup failed: %s", e)
+            except Exception as e:  # pylint: disable=broad-exception-caught
+                logger.error("Background cleanup failed: %s", e, exc_info=True)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 319-334: The global ASGI exception middleware currently maps all
exceptions to a 500 InternalServerErrorResponse; update the try/except in
main.py to explicitly handle APIConnectionError first (before the broad
Exception catch) by logging the connection error and returning
ServiceUnavailableResponse.generic() when response_started is False (and still
logging the "response already started" case using response_started and logger),
otherwise keep the existing fallback behavior that uses
InternalServerErrorResponse.generic() for other exceptions; reference
APIConnectionError, ServiceUnavailableResponse, InternalServerErrorResponse,
response_started, and logger to locate and implement the change.
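
One shape the prompt above could take as a pure-ASGI middleware. This is a sketch, not the project's actual class: the APIConnectionError import location and the error-body fields (response/cause) are assumptions inferred from snippets elsewhere in this review.

    import json
    import logging

    from llama_stack_client import APIConnectionError  # assumed import location

    logger = logging.getLogger(__name__)


    class GlobalExceptionMiddleware:
        """Pure-ASGI middleware: specific 503 for backend connection errors, 500 otherwise."""

        def __init__(self, app) -> None:
            self.app = app

        async def __call__(self, scope, receive, send) -> None:
            if scope["type"] != "http":
                await self.app(scope, receive, send)
                return

            response_started = False

            async def send_wrapper(message) -> None:
                nonlocal response_started
                if message["type"] == "http.response.start":
                    response_started = True
                await send(message)

            try:
                await self.app(scope, receive, send_wrapper)
            except APIConnectionError as exc:
                logger.error("Llama Stack connection error in middleware: %s", exc)
                if response_started:
                    return  # too late to change the response (likely streaming)
                await _send_json(send, 503, {"detail": {"response": "Service unavailable", "cause": str(exc)}})
            except Exception as exc:  # pylint: disable=broad-exception-caught
                logger.exception("Uncaught exception in endpoint: %s", exc)
                if response_started:
                    return
                await _send_json(send, 500, {"detail": {"response": "Internal server error", "cause": str(exc)}})


    async def _send_json(send, status: int, payload: dict) -> None:
        """Emit a minimal JSON response over raw ASGI send."""
        await send(
            {
                "type": "http.response.start",
                "status": status,
                "headers": [[b"content-type", b"application/json"]],
            }
        )
        await send({"type": "http.response.body", "body": json.dumps(payload).encode("utf-8")})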
🧹 Nitpick comments (5)
tests/e2e/features/steps/llm_query_response.py (1)

60-66: Avoid hard‑coded 1s sleep; make the wait configurable/adaptive.

A fixed delay can be too short on slow CI (flaky) or too long on fast runs (adds latency). Prefer polling with a timeout, or at least make the delay configurable via env/context so CI can tune it.

♻️ Suggested diff (configurable delay)
-import json
-import time
+import json
+import os
+import time
@@
 DEFAULT_LLM_TIMEOUT = 60
+PERSIST_WAIT_SECONDS = float(os.getenv("E2E_PERSIST_WAIT_SECONDS", "1"))
@@
-    time.sleep(1)
+    time.sleep(PERSIST_WAIT_SECONDS)
src/app/main.py (2)

104-112: Surface background task failures during shutdown.

asyncio.gather(..., return_exceptions=True) will swallow failures unless you log them.

🔧 Example
-        all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
-        await asyncio.gather(*all_tasks, return_exceptions=True)
+        all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
+        results = await asyncio.gather(*all_tasks, return_exceptions=True)
+        for result in results:
+            if isinstance(result, BaseException):
+                logger.error("Background task failed during shutdown", exc_info=result)

207-212: Align new docstrings with Google style (“Args/Returns”).

Use Args: instead of Parameters: in the new docstrings for consistency with Google conventions.

🔧 Example
-        Parameters:
-            app: The ASGI application instance to wrap
-            app_routes_paths: List of route paths to monitor (others ignored)
+        Args:
+            app: The ASGI application instance to wrap
+            app_routes_paths: List of route paths to monitor (others ignored)

As per coding guidelines: follow Google Python docstring conventions, including Parameters, Returns, Raises, and Attributes sections as needed.

tests/e2e/features/steps/conversation.py (1)

317-361: Reuse retry constants for consistency.

This block redefines backoff values and timeout; consider using MAX_RETRIES, INITIAL_RETRY_DELAY, and DEFAULT_TIMEOUT so tuning stays centralized.

🔧 Example
-    max_retries = 10
-    delay = 0.2  # Start with 200ms
+    max_retries = MAX_RETRIES
+    delay = INITIAL_RETRY_DELAY
@@
-            context.response = requests.get(url, headers=headers, timeout=10)
+            context.response = requests.get(url, headers=headers, timeout=DEFAULT_TIMEOUT)
docker-compose.yaml (1)

87-91: Use exec in entrypoint to preserve signal handling.

Without exec, bash stays PID1 and may not forward SIGTERM to the Python process.

🔧 Suggested tweak
 entrypoint: >
   /bin/bash -c "
   printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-  /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+  exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
   "

@blublinsky blublinsky force-pushed the mcp-e2e-tests branch 2 times, most recently from 58a07bd to 116f3b5 Compare February 2, 2026 11:18
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/features/steps/feedback.py (1)

116-162: ⚠️ Potential issue | 🟡 Minor

Fail fast if conversation persistence never completes. The current warning allows tests to proceed and fail later with less context. Consider asserting so the failure is immediate and actionable.

🔧 Suggested fix (fail fast)
-    poll_response = poll_for_conversation(conversation_url, headers)
-    if poll_response.status_code != 200:
-        print(
-            f"⚠️  Warning: Conversation {context.conversation_id} not found after creation "
-            f"(status: {poll_response.status_code})"
-        )
+    poll_response = poll_for_conversation(conversation_url, headers)
+    if poll_response.status_code != 200:
+        raise AssertionError(
+            "Conversation persistence failed after creation "
+            f"(status: {poll_response.status_code}, body: {poll_response.text})"
+        )
🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 322-349: The 503 handler currently sends the full
ServiceUnavailableResponse (including status_code) which differs from the 500
response schema; change the body sent to match the 500 format by sending only
the detail wrapper. Specifically, after creating error_response
(ServiceUnavailableResponse) in the except APIConnectionError block, build the
payload as {"detail": error_response.detail} (or equivalent from error_response)
and send that JSON via send(...) instead of error_response.model_dump_json(), so
the response schema matches the 500 error format.

In `@tests/e2e/features/info.feature`:
- Line 19: Update the test assertion string that currently reads "And The body
of the response has llama-stack version 0.4.2" to expect "llama-stack version
0.4.3" so it matches the pinned dependency in pyproject.toml; locate and modify
the step definition or feature line containing that exact phrase to use 0.4.3.
🧹 Nitpick comments (4)
tests/e2e/configuration/library-mode/lightspeed-stack-mcp.yaml (1)

54-54: Add a trailing newline at end of file.

The file is missing a final newline, which is a common convention for text files and helps with certain tools (e.g., cat, diff utilities).

src/app/endpoints/query.py (2)

87-97: Consider using a more specific type hint for the coroutine parameter.

The coro parameter is typed as Any, but could be more precisely typed as Coroutine[Any, Any, Any] for better type checking.

♻️ Proposed fix
+from typing import Annotated, Any, Coroutine, Optional
-from typing import Annotated, Any, Optional

-def create_background_task(coro: Any) -> None:
+def create_background_task(coro: Coroutine[Any, Any, Any]) -> None:

415-433: Consider making the MCP cleanup delay configurable or documenting the rationale.

The 500ms delay is a reasonable pragmatic solution, but it's a magic number. Consider either:

  1. Extracting it to a named constant with documentation
  2. Making it configurable for different environments

This is a minor suggestion since the current implementation works.

♻️ Proposed constant extraction
+# Delay before making LLM calls in background tasks to allow MCP session cleanup
+MCP_CLEANUP_DELAY_SECONDS = 0.5

 async def persist_with_topic_summary() -> None:
     ...
     logger.debug("Background task: waiting for MCP cleanup")
-    await asyncio.sleep(0.5)  # 500ms should be enough for cleanup
+    await asyncio.sleep(MCP_CLEANUP_DELAY_SECONDS)
     logger.debug("Background task: MCP cleanup complete")
tests/e2e/features/mcp_tools.feature (1)

66-76: Consider documenting the magic number in the assertion.

The assertion should have received at least 6 requests uses a specific count. While this works, if the test configuration changes (e.g., adding/removing MCP servers), this number may become incorrect.

Consider adding a comment or using a more descriptive step that explains the expected breakdown (e.g., 2 requests per server × 3 servers = 6 requests minimum).

Comment on lines 322 to 371
except APIConnectionError as exc:
# Llama Stack connection failure - return 503 so clients know to retry
logger.error("Llama Stack connection error in middleware: %s", exc)

if response_started:
logger.error(
"Cannot send 503 response - response already started (likely streaming)"
)
return

# Return 503 Service Unavailable for Llama Stack connection issues
error_response = ServiceUnavailableResponse(
backend_name="Llama Stack", cause=str(exc)
)

Returns:
Response: The HTTP response produced by the next handler.
"""
path = request.url.path
logger.debug("Received request for path: %s", path)

# ignore paths that are not part of the app routes
if path not in app_routes_paths:
return await call_next(request)

logger.debug("Processing API request for path: %s", path)

# measure time to handle duration + update histogram
with metrics.response_duration_seconds.labels(path).time():
response = await call_next(request)

# ignore /metrics endpoint that will be called periodically
if not path.endswith("/metrics"):
# just update metrics
metrics.rest_api_calls_total.labels(path, response.status_code).inc()
return response


@app.middleware("http")
async def global_exception_middleware(
request: Request, call_next: Callable[[Request], Awaitable[Response]]
) -> Response:
"""Middleware to handle uncaught exceptions from all endpoints."""
try:
response = await call_next(request)
return response
except HTTPException:
raise
except Exception as exc: # pylint: disable=broad-exception-caught
logger.exception("Uncaught exception in endpoint: %s", exc)
error_response = InternalServerErrorResponse.generic()
return JSONResponse(
status_code=error_response.status_code,
content={"detail": error_response.detail.model_dump()},
)
await send(
{
"type": "http.response.start",
"status": error_response.status_code,
"headers": [[b"content-type", b"application/json"]],
}
)
await send(
{
"type": "http.response.body",
"body": error_response.model_dump_json().encode("utf-8"),
}
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "main.py" -path "*/app/*" 2>/dev/null | head -5

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# First, let's look at the file structure
git ls-files | grep -E "(main\.py|response|error)" | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 771


🏁 Script executed:

# Check if src/app/main.py exists and get its size
wc -l src/app/main.py 2>/dev/null || echo "File not found at src/app/main.py"

Repository: lightspeed-core/lightspeed-stack

Length of output: 95


🏁 Script executed:

# Let's search for error handling patterns in the codebase
rg "APIConnectionError|500|ServiceUnavailableResponse" -A 5 -B 2 --no-heading | head -100

Repository: lightspeed-core/lightspeed-stack

Length of output: 19631


🏁 Script executed:

# Read the main.py file focusing on the error handling section
sed -n '300,370p' src/app/main.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 3089


🏁 Script executed:

# Search for 500 error handling and other error responses in main.py
rg "500|APIConnectionError|except.*Error" src/app/main.py -B 2 -A 8

Repository: lightspeed-core/lightspeed-stack

Length of output: 2035


🏁 Script executed:

# Read the responses.py file to understand the response structures
head -150 src/models/responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 5278


🏁 Script executed:

# Continue reading main.py to see the 500 error handling
sed -n '350,380p' src/app/main.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 1465


🏁 Script executed:

# Search for InternalServerErrorResponse and ServiceUnavailableResponse definitions
rg "class.*ServerError|class.*ServiceUnavailable" src/models/responses.py -A 15

Repository: lightspeed-core/lightspeed-stack

Length of output: 1186


🏁 Script executed:

# Look for any ErrorResponse or base error response models
rg "class.*Error.*Response|class.*Response.*Error" src/models/responses.py -A 10

Repository: lightspeed-core/lightspeed-stack

Length of output: 4027


🏁 Script executed:

# Find the DetailModel class definition
rg "class DetailModel" src/models/responses.py -A 10

Repository: lightspeed-core/lightspeed-stack

Length of output: 406


🏁 Script executed:

# Check how ServiceUnavailableResponse is instantiated to understand its constructor
rg "ServiceUnavailableResponse" src/app/main.py -B 2 -A 5

Repository: lightspeed-core/lightspeed-stack

Length of output: 667


🏁 Script executed:

# Check if there are any other tests or examples showing expected response formats
rg "ServiceUnavailableResponse|InternalServerErrorResponse" tests/ -A 3 | head -40

Repository: lightspeed-core/lightspeed-stack

Length of output: 3753


🏁 Script executed:

# Check the ServiceUnavailableResponse test cases more fully
rg "ServiceUnavailableResponse" tests/unit/models/responses/test_error_responses.py -A 8

Repository: lightspeed-core/lightspeed-stack

Length of output: 2580


🏁 Script executed:

# Look at the ServiceUnavailableResponse class definition more completely
rg "class ServiceUnavailableResponse" src/models/responses.py -A 30

Repository: lightspeed-core/lightspeed-stack

Length of output: 1214


🏁 Script executed:

# Check if there's a factory method or __init__ for ServiceUnavailableResponse
ast-grep --pattern 'class ServiceUnavailableResponse {
  $$$
}'

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Look for any factory methods in error response classes
rg "@classmethod|def generic|def " src/models/responses.py | grep -A 2 -B 2 "ServiceUnavailable\|InternalServerError"

Repository: lightspeed-core/lightspeed-stack

Length of output: 765


🏁 Script executed:

# Read the complete ServiceUnavailableResponse __init__ to see the full constructor
sed -n '1,500p' src/models/responses.py | grep -A 30 "class ServiceUnavailableResponse"

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Check AbstractErrorResponse's __init__ to understand how detail is constructed
rg "class AbstractErrorResponse" src/models/responses.py -A 25 | head -40

Repository: lightspeed-core/lightspeed-stack

Length of output: 962


🏁 Script executed:

# Read the actual AbstractErrorResponse __init__ method
rg "class AbstractErrorResponse" src/models/responses.py -B 5 -A 50 | grep -A 20 "def __init__"

Repository: lightspeed-core/lightspeed-stack

Length of output: 913


🏁 Script executed:

# Verify what the actual serialized outputs look like by examining model_dump behavior
python3 << 'EOF'
# Simulate the response models
class DetailModel:
    def __init__(self, response: str, cause: str):
        self.response = response
        self.cause = cause
    
    def model_dump(self):
        return {"response": self.response, "cause": self.cause}

class ServiceUnavailableResponse:
    def __init__(self, backend_name: str, cause: str):
        self.status_code = 503
        self.detail = DetailModel(f"Unable to connect to {backend_name}", cause)
    
    def model_dump_json(self):
        import json
        return json.dumps({
            "status_code": self.status_code,
            "detail": self.detail.model_dump()
        })

# Test current 503 response
resp_503 = ServiceUnavailableResponse("Llama Stack", "Connection timeout")
print("Current 503 response body:")
print(resp_503.model_dump_json())

# Test what the 500 response format would be (from code snippet)
print("\n500 response format (from main.py):")
print('{"detail": {"response": "Internal server error", "cause": "..."}}')

# Test proposed fix format
print("\nProposed 503 fix format:")
import json
content = {"detail": resp_503.detail.model_dump()}
print(json.dumps(content))
EOF

Repository: lightspeed-core/lightspeed-stack

Length of output: 435


🏁 Script executed:

# Double-check what JSONResponse returns
python3 << 'EOF'
from fastapi.responses import JSONResponse
import json

# Simulate response detail
response_detail = {"response": "Unable to connect to Llama Stack", "cause": "Connection timeout"}

# Test the suggested fix approach
json_response = JSONResponse(content={"detail": response_detail})
print("JSONResponse body type:", type(json_response.body))
print("JSONResponse body value:", json_response.body)

# Verify it matches direct JSON encoding
direct_encoding = json.dumps({"detail": response_detail}).encode("utf-8")
print("\nDirect encoding matches:", json_response.body == direct_encoding)
EOF

Repository: lightspeed-core/lightspeed-stack

Length of output: 195


🏁 Script executed:

# Let's verify the actual current output one more time by reading the exact code
sed -n '322,349p' src/app/main.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 1147


Align 503 response schema with 500 response structure. The 503 response currently sends {"status_code": 503, "detail": {...}} while the 500 response sends only {"detail": {...}}, creating an inconsistent error schema. Fix by wrapping the detail object:

Suggested fix
-            await send(
-                {
-                    "type": "http.response.body",
-                    "body": error_response.model_dump_json().encode("utf-8"),
-                }
-            )
+            await send(
+                {
+                    "type": "http.response.body",
+                    "body": JSONResponse(
+                        content={"detail": error_response.detail.model_dump()}
+                    ).body,
+                }
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except APIConnectionError as exc:
# Llama Stack connection failure - return 503 so clients know to retry
logger.error("Llama Stack connection error in middleware: %s", exc)
if response_started:
logger.error(
"Cannot send 503 response - response already started (likely streaming)"
)
return
# Return 503 Service Unavailable for Llama Stack connection issues
error_response = ServiceUnavailableResponse(
backend_name="Llama Stack", cause=str(exc)
)
Returns:
Response: The HTTP response produced by the next handler.
"""
path = request.url.path
logger.debug("Received request for path: %s", path)
# ignore paths that are not part of the app routes
if path not in app_routes_paths:
return await call_next(request)
logger.debug("Processing API request for path: %s", path)
# measure time to handle duration + update histogram
with metrics.response_duration_seconds.labels(path).time():
response = await call_next(request)
# ignore /metrics endpoint that will be called periodically
if not path.endswith("/metrics"):
# just update metrics
metrics.rest_api_calls_total.labels(path, response.status_code).inc()
return response
@app.middleware("http")
async def global_exception_middleware(
request: Request, call_next: Callable[[Request], Awaitable[Response]]
) -> Response:
"""Middleware to handle uncaught exceptions from all endpoints."""
try:
response = await call_next(request)
return response
except HTTPException:
raise
except Exception as exc: # pylint: disable=broad-exception-caught
logger.exception("Uncaught exception in endpoint: %s", exc)
error_response = InternalServerErrorResponse.generic()
return JSONResponse(
status_code=error_response.status_code,
content={"detail": error_response.detail.model_dump()},
)
await send(
{
"type": "http.response.start",
"status": error_response.status_code,
"headers": [[b"content-type", b"application/json"]],
}
)
await send(
{
"type": "http.response.body",
"body": error_response.model_dump_json().encode("utf-8"),
}
)
except APIConnectionError as exc:
# Llama Stack connection failure - return 503 so clients know to retry
logger.error("Llama Stack connection error in middleware: %s", exc)
if response_started:
logger.error(
"Cannot send 503 response - response already started (likely streaming)"
)
return
# Return 503 Service Unavailable for Llama Stack connection issues
error_response = ServiceUnavailableResponse(
backend_name="Llama Stack", cause=str(exc)
)
await send(
{
"type": "http.response.start",
"status": error_response.status_code,
"headers": [[b"content-type", b"application/json"]],
}
)
await send(
{
"type": "http.response.body",
"body": json.dumps({"detail": error_response.detail.model_dump()}).encode("utf-8"),
}
)
🤖 Prompt for AI Agents
In `@src/app/main.py` around lines 322 - 349, The 503 handler currently sends the
full ServiceUnavailableResponse (including status_code) which differs from the
500 response schema; change the body sent to match the 500 format by sending
only the detail wrapper. Specifically, after creating error_response
(ServiceUnavailableResponse) in the except APIConnectionError block, build the
payload as {"detail": error_response.detail} (or equivalent from error_response)
and send that JSON via send(...) instead of error_response.model_dump_json(), so
the response schema matches the 500 error format.

Then The status code of the response is 200
And The body of the response has proper name Lightspeed Core Service (LCS) and version 0.4.0
And The body of the response has llama-stack version 0.4.3
And The body of the response has llama-stack version 0.4.2

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for llama-stack version references in config, dependency, and Docker files
echo "=== Searching for version references (0.4.x) ==="
rg "0\.4\.[0-9]" --type yaml --type toml --type txt -g '!*.lock' -g '!node_modules' 2>/dev/null || true

echo -e "\n=== Searching for llama-stack in requirements/dependencies ==="
fd -e "requirements.txt" -e "setup.py" -e "setup.cfg" -e "pyproject.toml" -e "poetry.lock" | head -20

echo -e "\n=== Searching for Dockerfile or docker-compose ==="
fd -e "Dockerfile*" -e "docker-compose*" | head -10

echo -e "\n=== Searching for all occurrences of 'llama-stack' in feature files ==="
rg "llama-stack" tests/e2e/features/ 2>/dev/null || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 4675


🏁 Script executed:

#!/bin/bash
# Check the feature file context around line 19
echo "=== Content of info.feature around the changed line ==="
cat -n tests/e2e/features/info.feature | head -30

echo -e "\n=== Search for version expectations in all feature files ==="
rg -n "version" tests/e2e/features/ --type txt 2>/dev/null || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 1650


🏁 Script executed:

#!/bin/bash
# Check git history to understand the context of this change
echo "=== Recent commits touching this file ==="
git log --oneline -n 10 -- tests/e2e/features/info.feature 2>/dev/null || echo "Git history not available"

echo -e "\n=== Check if there's a version file or environment configuration ==="
fd "version" -e "txt" -e "py" -e "yaml" -e "env" | grep -E "(version|config|env)" | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 371


🏁 Script executed:

#!/bin/bash
echo "=== Check llama_stack_version.py utility ==="
cat -n src/utils/llama_stack_version.py | head -50

echo -e "\n=== Check version.py ==="
cat -n src/version.py

echo -e "\n=== Check the test step implementation for version checking ==="
cat -n tests/e2e/features/steps/info.py | head -40

Repository: lightspeed-core/lightspeed-stack

Length of output: 4347


Fix the llama-stack version mismatch in the test assertion.

The test expects llama-stack version 0.4.2, but pyproject.toml pins llama-stack==0.4.3. This mismatch will cause the e2e test to fail when run. The assertion on line 19 should expect 0.4.3 to match the pinned dependency.

🤖 Prompt for AI Agents
In `@tests/e2e/features/info.feature` at line 19, Update the test assertion string
that currently reads "And The body of the response has llama-stack version
0.4.2" to expect "llama-stack version 0.4.3" so it matches the pinned dependency
in pyproject.toml; locate and modify the step definition or feature line
containing that exact phrase to use 0.4.3.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@docker-compose-library.yaml`:
- Around line 69-73: The entrypoint currently starts a bash shell that writes
the token and launches Python as a child; replace that with a shell sequence
that tightens token permissions and then uses exec to replace PID 1 with the
Python process: set umask 077 before creating the token (or explicitly chmod the
token to 0600 after printf to /tmp/lightspeed-mcp-test-token) and then exec
/app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py so the Python
process becomes PID 1 and receives signals correctly; update the entrypoint
block that uses /bin/bash -c and printf accordingly.

In `@docker-compose.yaml`:
- Around line 87-91: The entrypoint should drop into the Python process as PID 1
and create the token file with owner-only permissions; update the entrypoint
snippet that writes the token and launches /app-root/.venv/bin/python3.12
/app-root/src/lightspeed_stack.py to first set umask 077 (so the token file is
created with owner-only perms) and then use exec to replace the shell with the
Python process (so the Python process becomes PID 1 and receives signals);
locate the multiline entrypoint block that prints 'test-secret-token-123' into
/tmp/lightspeed-mcp-test-token and modify it accordingly.

In `@src/app/endpoints/query_v2.py`:
- Around line 261-268: The log prints the full args dict for MCP approval
requests (logger.debug("  Arguments: %s", args)), which may leak sensitive data;
change this to log a redacted version of args using the same redaction used for
mcp_call (apply the existing redaction helper used around mcp_call at line ~186)
and log that redacted payload instead, referencing
approval_request_item.name/server_label and the redacted args so raw user
data/secrets are never emitted.
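
A minimal sketch of that change, assuming a redaction helper along the lines of the one the reviewer references near the mcp_call logging; `redact_tool_args` is a placeholder name used only for illustration:

    # redact_tool_args is a placeholder for whatever redaction the mcp_call logging already applies
    redacted_args = redact_tool_args(args)
    logger.debug(
        "MCP approval request: tool=%s server=%s arguments=%s",
        approval_request_item.name,
        approval_request_item.server_label,
        redacted_args,
    )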
🧹 Nitpick comments (4)
tests/e2e/features/steps/llm_query_response.py (1)

63-68: Avoid a hardcoded 3s sleep; make the wait configurable (or poll with a deadline).
A fixed sleep can be too short in slower CI (flaky) and too long when persistence is fast. Consider a configurable wait (via behave userdata) or polling for readiness with a max timeout.

💡 Example tweak (configurable wait)
-    time.sleep(3)
+    wait_s = float(context.config.userdata.get("persistence_wait_s", 3))
+    time.sleep(wait_s)
src/app/endpoints/query.py (1)

430-433: Hardcoded delay for MCP cleanup is fragile.

The 500ms sleep is a best-effort heuristic. Under high load or slow network conditions, MCP cleanup might not complete in time. Consider making this configurable or implementing a more robust synchronization mechanism if issues arise in production.
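
One possible shape for making that delay tunable without changing current behavior; the environment variable name is an assumption, not an existing setting:

    import asyncio
    import os

    # Hypothetical knob; falls back to the current 500ms default when unset
    MCP_CLEANUP_DELAY_SECONDS = float(os.getenv("MCP_CLEANUP_DELAY_SECONDS", "0.5"))


    async def wait_before_mcp_cleanup() -> None:
        """Give the streamed response time to flush before MCP sessions are closed."""
        await asyncio.sleep(MCP_CLEANUP_DELAY_SECONDS)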

tests/e2e/features/steps/mcp.py (2)

553-613: Multiple verification steps only check status code.

Steps like check_response_has_tool_results, check_successful_tool_execution, check_response_has_tool_info, and check_tool_results_in_response all make claims about tool execution but only verify that the expected response key exists. This makes tests pass even when tools weren't actually used.

Consider strengthening these assertions to check for tool-specific fields (tool_calls, tool_results) when they mature beyond initial coverage.
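
A sketch of one such strengthened step, assuming the query response JSON exposes tool metadata under a tool_calls key (the field name is an assumption, not a confirmed API):

    from behave import then
    from behave.runner import Context


    @then("The response records at least one executed tool call")
    def check_tool_was_called(context: Context) -> None:
        """Fail unless the response body actually records a tool invocation."""
        assert context.response.status_code == 200, context.response.text
        body = context.response.json()
        tool_calls = body.get("tool_calls") or []  # assumed field name
        assert tool_calls, f"Expected at least one tool call, got: {body}"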


10-11: Consider making mock server URL configurable.

MOCK_MCP_SERVER_URL is hardcoded. If the mock server runs on a different port or host in CI environments, tests will fail. Consider using an environment variable with a default:

MOCK_MCP_SERVER_URL = os.environ.get("MCP_MOCK_SERVER_URL", "http://localhost:9000")

Comment on lines +69 to +73
entrypoint: >
/bin/bash -c "
printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
/app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
"
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "docker-compose-library.yaml" -o -name "*compose*.yaml" | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 127


🏁 Script executed:

sed -n '60,80p' ./docker-compose-library.yaml

Repository: lightspeed-core/lightspeed-stack

Length of output: 1078


🌐 Web query:

bash PID1 signal handling Docker entrypoint

💡 Result:

When your Docker container’s PID 1 is bash (or /bin/sh -c), signal handling often surprises you:

  • PID 1 is special in Linux namespaces: if PID 1 doesn’t install handlers, some signals (notably SIGTERM/SIGINT) won’t have their normal “default action” of terminating the process. So docker stop may hang until Docker sends SIGKILL. [1]
  • If you use shell-form CMD/ENTRYPOINT (string form), Docker effectively runs your command under /bin/sh -c, and signals may not reach your real app (your app isn’t PID 1). Docker explicitly warns that shell-form ENTRYPOINT “does not pass signals”. [2]

Best practices for Docker entrypoints

  1. Prefer exec-form ENTRYPOINT / CMD (JSON array)
ENTRYPOINT ["/usr/local/bin/my-app"]
# or
CMD ["/usr/local/bin/my-app"]

This avoids /bin/sh -c wrappers and lets your app receive signals directly. [2]

  2. If you must use a shell script entrypoint, exec the long-running process
#!/bin/sh
set -e
# setup...
exec my-app "$@"

This replaces the shell with your app, so the app becomes PID 1 and receives SIGTERM. Docker’s own docs call out using exec for clean docker stop behavior. [2]

  3. Use a minimal init as PID 1 (recommended): --init (tini) or bake it in
  • Docker CLI: docker run --init … makes an init process run as PID 1 (backed by tini) and handle duties like signal forwarding and zombie reaping. [3]
  • Or explicitly: ENTRYPOINT ["tini","--","/your/entrypoint.sh"] (tini docs). [4]

Most common fix: switch to exec-form + ensure your entrypoint script ends with exec …, or run with --init.

Sources:
[1] Docker run reference (PID 1 signal behavior) https://www.docs4dev.com/docs/docker/19.03/engine/reference/run/
[2] Dockerfile reference (shell vs exec form; shell form doesn’t pass signals; use exec) https://docs.docker.com/reference/builder
[3] docker container run docs (--init, docker-init backed by tini) https://docs.docker.com/reference/cli/docker/container/run
[4] tini README (--init and usage) https://github.com/krallin/tini


🌐 Web query:

umask 077 file permissions /tmp default permissions bash

💡 Result:

  • umask 077 affects new files/directories your process creates anywhere, including under /tmp. It removes all group/other permissions from the requested mode. (sans.org)

    • Typical result:
      • New files: 0666 & ~0077 = 0600 (rw-------)
      • New directories: 0777 & ~0077 = 0700 (rwx------) (sans.org)
  • /tmp itself is normally world-writable with the sticky bit: mode 1777 (drwxrwxrwt). Your umask does not change /tmp’s existing permissions; it only affects what you create inside it. (linuxconfig.org)

  • Bash is not special here: if you set umask 077 in your shell (or in ~/.bashrc, etc.), then commands you run from that shell inherit it and will create /tmp files accordingly (unless the program explicitly chmods after creation or requests a more restrictive mode). (bs2manuals.ts.fujitsu.com)

  • Many systems ensure /tmp has mode 1777 at boot via systemd-tmpfiles rules (example entries show 1777 directories under /tmp). (freedesktop.org)



🌐 Web query:

docker-compose entrypoint exec umask syntax YAML multiline string

💡 Result:

Use exec-form (entrypoint: as a YAML list) and run a shell explicitly so you can set umask and then exec the real process (so it becomes PID 1 and receives signals). Compose allows entrypoint as either a string or a list; the list form is the “Dockerfile-like”/exec style. [1]

services:
  app:
    image: your-image
    entrypoint:
      - /bin/sh
      - -c
      - |
          umask 002
          exec "$@"
      - --
    command: ["your-binary", "arg1", "arg2"]

Notes:

  • | is just YAML “multiline string”; it becomes the single script argument to sh -c (newlines included).
  • The -- is used as the $0 placeholder for sh -c; then command: becomes $@, which exec "$@" runs.
  • Using exec ... ensures the final process is PID 1 (important for signal handling), as recommended in Docker’s ENTRYPOINT docs. [2]

Sources: Docker Compose entrypoint reference [1]; Docker ENTRYPOINT shell vs exec form + exec for proper PID 1 behavior [2].

[1] (docs.docker.com)
[2] (docs.docker.com)



Use exec and tighten token file permissions in the entrypoint.

Bash as PID1 prevents proper signal forwarding—docker stop may hang until SIGKILL. The token file in /tmp defaults to world-readable mode. Add umask 077 and exec the Python process to make it PID1 and restrict the token file to owner-only (0600).

🛠️ Proposed fix
-    entrypoint: >
-      /bin/bash -c "
-      printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-      /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
-      "
+    entrypoint: >
+      /bin/bash -c "
+      umask 077
+      printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
+      exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+      "
🤖 Prompt for AI Agents
In `@docker-compose-library.yaml` around lines 69 - 73, The entrypoint currently
starts a bash shell that writes the token and launches Python as a child;
replace that with a shell sequence that tightens token permissions and then uses
exec to replace PID 1 with the Python process: set umask 077 before creating the
token (or explicitly chmod the token to 0600 after printf to
/tmp/lightspeed-mcp-test-token) and then exec /app-root/.venv/bin/python3.12
/app-root/src/lightspeed_stack.py so the Python process becomes PID 1 and
receives signals correctly; update the entrypoint block that uses /bin/bash -c
and printf accordingly.

Comment on lines +87 to +91
entrypoint: >
/bin/bash -c "
printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
/app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
"
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n docker-compose.yaml | sed -n '85,95p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 519


🌐 Web query:

bash PID1 signal handling swallow signals docker

💡 Result:

When you run a shell script as the container “main process”, bash becomes PID 1, and PID 1 has special signal semantics in Linux containers:

  • PID 1 doesn’t get the normal “default action” for many fatal signals (like SIGTERM): if PID 1 hasn’t installed a handler, the signal may effectively do nothing, so docker stop (SIGTERM) appears “swallowed”. This is why containers can linger until Docker eventually sends SIGKILL. [1], [2]
  • Even if bash traps/exits on SIGTERM, it typically won’t forward that signal to its child process(es) unless you explicitly do so—so your actual app keeps running. [3]

What to do instead (common fixes)

  1. Prefer making your app PID 1 (avoid bash as PID 1)
    Use Docker “exec form” so your process replaces the shell (no extra PID 1 shell layer). (This avoids the “shell doesn’t forward signals” problem.) [3]

  2. Add a minimal init as PID 1 (recommended for general robustness)
    Run with Docker’s built-in init (tini) so signals are forwarded and zombies are reaped: docker run --init … (or init: true in Compose). [4]

  3. If you must use bash as PID 1: trap + forward signals
    At minimum, install a SIGTERM trap (so PID 1 actually exits) and explicitly propagate termination to the child process group (otherwise children may ignore it). The need to trap TERM as PID 1 is the key point. [3]

Sources:
[1] https://stackoverflow.com/questions/71448484/why-does-foreground-job-ignore-job-control-signals-when-bash-is-running-as-pid-1
[2] https://semisignal.com/sigterm-and-pid1/
[3] https://cloud.theodo.com/en/blog/docker-processes-container
[4] https://raby.sh/sigterm-and-pid-1-why-does-a-container-linger-after-receiving-a-sigterm.html


🌐 Web query:

/tmp file permissions default linux umask security

💡 Result:

  • Default /tmp directory permissions on most Linux distros are drwxrwxrwt (octal 1777) owned by root:root: world-writable with the sticky bit so users can’t delete/rename other users’ files in /tmp even though everyone can create files there. This is commonly enforced at boot by systemd-tmpfiles rules (often D /tmp 1777 root root -). [1] [2]

  • umask controls the default permissions of new files/dirs created inside /tmp (it does not change /tmp’s own 1777 mode). When a program creates a file, the kernel applies:
    final_mode = requested_mode & ~umask (and similarly for directories). Typical default umask is 022, so a file created with requested mode 0666 becomes 0644. [3]

  • Security implications / best practice

    • /tmp is a shared namespace: predictable names like /tmp/foo can be pre-created by another user to cause failures or trick a program (DoS, symlink/hardlink-style attacks). Use unique, non-guessable names and safe creation patterns (e.g., “create-new” semantics), and prefer honoring $TMPDIR when set. [4]
    • The sticky bit does not prevent reads. If your program creates temp files with permissive modes (because it requests 0666 and relies on umask), other users may read them. Set sensitive temp files to owner-only permissions explicitly (and don’t rely on umask alone). [3]
    • For services, consider systemd PrivateTmp=yes to give the service its own private /tmp (and /var/tmp) namespace, reducing cross-process attacks via shared temp directories. [5]
  • Persistence note (/tmp vs /var/tmp)

    • /tmp is often cleared at boot and may be on tmpfs (RAM/swap), while /var/tmp is generally more persistent and intended to survive reboots. [6] [4]

Sources:
[1] https://www.baeldung.com/linux/sticky-bit
[2] https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html
[3] https://manpages.debian.org/testing/manpages-dev/umask.2.en.html
[4] https://systemd.io/TEMPORARY_DIRECTORIES/
[5] https://www.redhat.com/de/blog/new-red-hat-enterprise-linux-7-security-feature-privatetmp
[6] https://unix.stackexchange.com/questions/30489/what-is-the-difference-between-tmp-and-var-tmp


Use exec and tighten token file permissions in the entrypoint.

Bash as PID 1 in containers won't forward SIGTERM to child processes and may ignore the signal itself without an explicit trap, preventing graceful shutdown. Additionally, the token file is created with default umask (typically 022), making it world-readable in the container. Apply umask 077 to restrict the token to owner-only permissions and use exec to make the Python process PID 1:

🛠️ Proposed fix
-    entrypoint: >
-      /bin/bash -c "
-      printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-      /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
-      "
+    entrypoint: >
+      /bin/bash -c "
+      umask 077
+      printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
+      exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+      "
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
entrypoint: >
/bin/bash -c "
printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
/app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
"
entrypoint: >
/bin/bash -c "
umask 077
printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
"
🤖 Prompt for AI Agents
In `@docker-compose.yaml` around lines 87 - 91, The entrypoint should drop into
the Python process as PID 1 and create the token file with owner-only
permissions; update the entrypoint snippet that writes the token and launches
/app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py to first set
umask 077 (so the token file is created with owner-only perms) and then use exec
to replace the shell with the Python process (so the Python process becomes PID
1 and receives signals); locate the multiline entrypoint block that prints
'test-secret-token-123' into /tmp/lightspeed-mcp-test-token and modify it
accordingly.

@blublinsky blublinsky force-pushed the mcp-e2e-tests branch 2 times, most recently from 924666a to 490a94c Compare February 2, 2026 13:13
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 236-241: The current string check using scope["path"] against
self.app_routes_paths should be replaced with route template matching so
parameterized routes are recognized and metric labels use the template, not raw
paths; locate the middleware logic referencing scope["path"],
self.app_routes_paths and the call to await self.app(scope, receive, send) and
use each Starlette Route.matches(scope) to find the matching Route, skip metrics
only when no route matches, and derive the metric label from the Route.path
(template) rather than scope["path"] to avoid Prometheus cardinality explosion.
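
A minimal sketch of the template-matching idea, assuming the middleware keeps the Starlette Route objects instead of bare path strings (Route.matches returns a (Match, scope) pair):

    from starlette.routing import Match, Route


    def resolve_route_template(scope: dict, routes: list[Route]) -> str | None:
        """Return the matching route template (e.g. /conversations/{conversation_id}) or None."""
        for route in routes:
            match, _child_scope = route.matches(scope)
            if match == Match.FULL:
                return route.path  # label metrics with the template, not the raw path
        return None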
🧹 Nitpick comments (2)
dev-tools/mcp-mock-server/server.py (2)

63-67: Consider simplifying the function signature formatting.

The multiline return type annotation with None on its own line and trailing comment is unconventional. A single line suffices:

def do_POST(self) -> None:  # pylint: disable=invalid-name,too-many-locals,...

97-113: Consider using standard if-elif for token matching.

The match True: pattern with guard clauses is unconventional. A standard if-elif chain would be more readable:

♻️ Suggested refactor
-        match True:
-            case _ if "test-secret-token" in auth_header:
-                tool_name = "mock_tool_file"
-                tool_desc = "Mock tool with file-based auth"
-            case _ if "my-k8s-token" in auth_header:
-                tool_name = "mock_tool_k8s"
-                tool_desc = "Mock tool with Kubernetes token"
-            case _ if "my-client-token" in auth_header:
-                tool_name = "mock_tool_client"
-                tool_desc = "Mock tool with client-provided token"
-            case _ if "error-mode" in auth_header:
-                tool_name = "mock_tool_error"
-                tool_desc = "Mock tool configured to return errors"
-                error_mode = True
-            case _:
-                # Default case already set above
-                pass
+        if "test-secret-token" in auth_header:
+            tool_name = "mock_tool_file"
+            tool_desc = "Mock tool with file-based auth"
+        elif "my-k8s-token" in auth_header:
+            tool_name = "mock_tool_k8s"
+            tool_desc = "Mock tool with Kubernetes token"
+        elif "my-client-token" in auth_header:
+            tool_name = "mock_tool_client"
+            tool_desc = "Mock tool with client-provided token"
+        elif "error-mode" in auth_header:
+            tool_name = "mock_tool_error"
+            tool_desc = "Mock tool configured to return errors"
+            error_mode = True

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/e2e/features/environment.py`:
- Around line 263-273: Multiple auth-related tag blocks (e.g., the MCP block
using create_config_backup, switch_config, restart_container) can run for the
same feature and cause double swaps/restores; add a guard so config swapping
happens only once per feature by tracking the active swap on the context (for
example set context.feature_config and a boolean like context.config_swapped or
context.active_feature_config) and only call
create_config_backup/switch_config/restart_container when that flag is not set,
and ensure the corresponding restore logic checks the same flag and clears it
after restoring so the second tag won't attempt to restore a removed backup.
🧹 Nitpick comments (2)
tests/e2e/features/steps/conversation.py (1)

317-361: Reuse shared retry/timeout constants for consistency.
Keeps polling behavior aligned with the module-wide configuration and makes the logged “waited” time accurate under the cap.

♻️ Suggested refactor
-    max_retries = 10
-    delay = 0.2  # Start with 200ms
+    max_retries = MAX_RETRIES
+    delay = INITIAL_RETRY_DELAY  # Start with 200ms
@@
-                    f"(waited {sum(0.2 * (2 ** i) for i in range(attempt)):.2f}s)"
+                    f"(waited {sum(min(INITIAL_RETRY_DELAY * (2 ** i), MAX_RETRY_DELAY) for i in range(attempt)):.2f}s)"
@@
-            delay = min(delay * 2, 2.0)  # Exponential backoff, cap at 2s
+            delay = min(delay * 2, MAX_RETRY_DELAY)  # Exponential backoff, cap at 2s
@@
-            context.response = requests.get(url, headers=headers, timeout=10)
+            context.response = requests.get(url, headers=headers, timeout=DEFAULT_TIMEOUT)
tests/e2e/features/steps/llm_query_response.py (1)

3-4: Make the persistence wait configurable for environment-specific tuning (optional improvement)

The documented persistence timing includes topic summary generation (2-5s), and code comments note empirically observed persistence at 2-3s in CI. The 3s delay was deliberately chosen to account for this. While making the wait configurable via environment variable is reasonable for different test environments, this is an optional refactor rather than a flakiness fix, since the current value appears well-calibrated to observed CI behavior.

Suggested approach
+import os
 import time
@@
-    time.sleep(3)
+    wait_seconds = float(os.getenv("E2E_PERSIST_WAIT_SECONDS", "3"))
+    time.sleep(wait_seconds)

Comment on lines +263 to +278
if "MCP" in feature.tags:
# For MCP tests, we need noop-with-token auth to support k8s token forwarding
# Use mode-specific configs (server vs library)
mode_dir = "library-mode" if context.is_library_mode else "server-mode"
context.feature_config = (
f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-mcp.yaml"
)
context.default_config_backup = create_config_backup("lightspeed-stack.yaml")
switch_config(context.feature_config)
restart_container("lightspeed-stack")

Contributor

⚠️ Potential issue | 🟡 Minor

Guard against double config swaps if feature tags overlap.

Because each auth-related tag uses independent if blocks, a feature tagged with MCP and another auth tag would switch configs twice and restore twice; the second restore can fail once the backup is removed. Consider enforcing mutual exclusivity or tracking the active feature config and restoring once.

Also applies to: 301-305

🤖 Prompt for AI Agents
In `@tests/e2e/features/environment.py` around lines 263 - 273, Multiple
auth-related tag blocks (e.g., the MCP block using create_config_backup,
switch_config, restart_container) can run for the same feature and cause double
swaps/restores; add a guard so config swapping happens only once per feature by
tracking the active swap on the context (for example set context.feature_config
and a boolean like context.config_swapped or context.active_feature_config) and
only call create_config_backup/switch_config/restart_container when that flag is
not set, and ensure the corresponding restore logic checks the same flag and
clears it after restoring so the second tag won't attempt to restore a removed
backup.

@blublinsky blublinsky force-pushed the mcp-e2e-tests branch 2 times, most recently from b5c8a15 to 00e41fd Compare February 2, 2026 14:55
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/common.py`:
- Around line 32-34: The docstring is incorrect about Behave auto-resetting auth
headers; update the after_scenario hook in environment.py to explicitly clear
any leaked auth headers by checking for and deleting context.auth_headers (e.g.,
if hasattr(context, "auth_headers"): del context.auth_headers), or
alternatively, revise the docstring to state that Behave preserves custom
context attributes across scenarios and explain why clearing is intentionally
skipped; locate the after_scenario function and apply one of these two fixes
referencing context.auth_headers and after_scenario.
🧹 Nitpick comments (6)
tests/integration/endpoints/test_query_v2_integration.py (2)

36-62: Consider moving wait_for_background_tasks to conftest.py for reuse.

This helper is valuable for coordinating with background persistence. According to your project conventions, shared fixtures and test utilities should be in conftest.py. If other test modules need this pattern (e.g., streaming tests), centralizing it would reduce duplication.

The implementation itself is sound - it properly snapshots background_tasks_set to avoid iteration errors and provides adequate delays for MCP cleanup and thread pool operations.

Based on learnings: "Use conftest.py for shared fixtures in pytest"
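
A sketch of how the shared helper could look in conftest.py, assuming background_tasks_set is the module-level set of asyncio tasks the endpoints register; timings here are illustrative:

    # conftest.py (sketch)
    import asyncio


    async def wait_for_background_tasks(
        tasks: set[asyncio.Task], extra_delay: float = 0.1
    ) -> None:
        """Await a snapshot of the registered background tasks.

        Snapshotting before awaiting avoids 'set changed size during iteration'
        when finished tasks discard themselves; the small extra delay gives
        asyncio.to_thread work a moment to complete.
        """
        pending = list(tasks)  # snapshot; tasks remove themselves on completion
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
        await asyncio.sleep(extra_delay)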


163-170: Mock signature accepts **kwargs but asyncio.to_thread doesn't support keyword arguments.

asyncio.to_thread(func, /, *args) only accepts positional arguments for the target function. The mock accepts **kwargs, which could hide bugs if production code mistakenly passes keyword arguments (they'd work in tests but fail in production).

🔧 Suggested fix to match real signature
-    async def mock_to_thread(func: Any, *args: Any, **kwargs: Any) -> Any:
-        """Run function synchronously instead of in thread pool for tests."""
-        return func(*args, **kwargs)
+    async def mock_to_thread(func: Any, *args: Any) -> Any:
+        """Run function synchronously instead of in thread pool for tests."""
+        return func(*args)
dev-tools/mcp-mock-server/server.py (1)

97-113: Non-idiomatic match True pattern for conditional branching.

Using match True: with case _ if condition: works but is an unusual pattern that reduces readability. Since you're matching against string content, a series of if/elif statements would be more conventional and clearer.

♻️ Suggested refactor using if/elif
-        match True:
-            case _ if "test-secret-token" in auth_header:
-                tool_name = "mock_tool_file"
-                tool_desc = "Mock tool with file-based auth"
-            case _ if "my-k8s-token" in auth_header:
-                tool_name = "mock_tool_k8s"
-                tool_desc = "Mock tool with Kubernetes token"
-            case _ if "my-client-token" in auth_header:
-                tool_name = "mock_tool_client"
-                tool_desc = "Mock tool with client-provided token"
-            case _ if "error-mode" in auth_header:
-                tool_name = "mock_tool_error"
-                tool_desc = "Mock tool configured to return errors"
-                error_mode = True
-            case _:
-                # Default case already set above
-                pass
+        if "test-secret-token" in auth_header:
+            tool_name = "mock_tool_file"
+            tool_desc = "Mock tool with file-based auth"
+        elif "my-k8s-token" in auth_header:
+            tool_name = "mock_tool_k8s"
+            tool_desc = "Mock tool with Kubernetes token"
+        elif "my-client-token" in auth_header:
+            tool_name = "mock_tool_client"
+            tool_desc = "Mock tool with client-provided token"
+        elif "error-mode" in auth_header:
+            tool_name = "mock_tool_error"
+            tool_desc = "Mock tool configured to return errors"
+            error_mode = True
+        # else: defaults already set above
tests/e2e/features/steps/conversation.py (1)

310-371: Minor duplication of retry/backoff logic.

This function implements its own retry loop with exponential backoff, duplicating the pattern in poll_for_conversation. The use cases differ (searching a list vs. fetching by ID), so some divergence is acceptable. However, extracting a common backoff utility could reduce maintenance burden.

The implementation is correct and addresses the past review concern about validating response status and payload.

src/app/main.py (2)

96-113: Add a shutdown timeout to avoid hanging on stuck background tasks.
If a background task stalls, shutdown can block indefinitely. Consider a bounded wait and cancel-on-timeout.

🛠️ Possible refinement
-        all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
-        await asyncio.gather(*all_tasks, return_exceptions=True)
+        all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
+        try:
+            await asyncio.wait_for(
+                asyncio.gather(*all_tasks, return_exceptions=True),
+                timeout=30,
+            )
+        except asyncio.TimeoutError:
+            logger.warning("Background tasks did not finish within 30s; cancelling")
+            for task in all_tasks:
+                task.cancel()
+            await asyncio.gather(*all_tasks, return_exceptions=True)

205-257: Use Starlette's Route.matches() for more robust route matching.

The current manual segment matching doesn't handle path converters (e.g., {param:path}) or mounted sub-applications. While the codebase currently uses only simple parameterized routes, using Route.matches(scope) is the proper Starlette approach and future-proofs the metrics collection against additional route patterns.

♻️ Suggested approach
-from starlette.routing import Mount, Route, WebSocketRoute
+from starlette.routing import Match, Mount, Route, WebSocketRoute
@@
-    def __init__(  # pylint: disable=redefined-outer-name
-        self, app: Any, app_routes_paths: list[str]
-    ) -> None:
+    def __init__(  # pylint: disable=redefined-outer-name
+        self, app: Any, app_routes: list[Route]
+    ) -> None:
@@
-        self.app_routes_paths = app_routes_paths
+        self.app_routes = app_routes
@@
-        route_path = None
-        for route_template in self.app_routes_paths:
-            # Simple parameterized route matching: check if path segments match template
-            # This handles routes like /conversations/{conversation_id}
-            path_parts = path.strip("/").split("/")
-            template_parts = route_template.strip("/").split("/")
-
-            if len(path_parts) == len(template_parts):
-                match = all(
-                    tp.startswith("{")
-                    and tp.endswith("}")  # Template parameter
-                    or tp == pp  # Exact match
-                    for tp, pp in zip(template_parts, path_parts)
-                )
-                if match:
-                    route_path = route_template
-                    break
+        matched_route = next(
+            (r for r in self.app_routes if r.matches(scope)[0] == Match.FULL),
+            None,
+        )
+        route_path = matched_route.path if matched_route else None
-app_routes_paths = [
-    route.path
-    for route in app.routes
-    if isinstance(route, (Mount, Route, WebSocketRoute))
-]
+app_routes = [
+    route
+    for route in app.routes
+    if isinstance(route, (Mount, Route, WebSocketRoute))
+]
@@
-app.add_middleware(MetricsMiddleware, app_routes_paths=app_routes_paths)
+app.add_middleware(MetricsMiddleware, app_routes=app_routes)

Comment on lines 32 to 34
Note: Does NOT clear auth headers, as those may be set in Background section
and should persist for the entire scenario. Auth headers are automatically
reset by Behave between scenarios.
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for Behave hooks that might clear auth headers between scenarios
find . -path '*/e2e/*' -name 'environment.py' -type f -exec cat -n {} +

# Search for auth header related code in e2e tests
rg -n 'auth.*header|Authorization|\.auth|headers.*=.*\{\}' --glob '**/e2e/**/*.py' -A 2 -B 2

Repository: lightspeed-core/lightspeed-stack

Length of output: 46948


Update the docstring or add explicit auth header clearing to after_scenario hook in environment.py.

The docstring states auth headers are "automatically reset by Behave between scenarios," but this is incorrect. Behave's Context object persists custom attributes across scenarios unless explicitly cleared. Since there is no clearing of auth_headers in the after_scenario hook in environment.py, auth headers set in one scenario will leak into the next scenario, causing test pollution.

Either:

  1. Add explicit clearing in after_scenario hook: if hasattr(context, "auth_headers"): del context.auth_headers
  2. Update the docstring to reflect actual behavior and clarify why clearing is intentionally skipped
🤖 Prompt for AI Agents
In `@tests/e2e/features/steps/common.py` around lines 32 - 34, The docstring is
incorrect about Behave auto-resetting auth headers; update the after_scenario
hook in environment.py to explicitly clear any leaked auth headers by checking
for and deleting context.auth_headers (e.g., if hasattr(context,
"auth_headers"): del context.auth_headers), or alternatively, revise the
docstring to state that Behave preserves custom context attributes across
scenarios and explain why clearing is intentionally skipped; locate the
after_scenario function and apply one of these two fixes referencing
context.auth_headers and after_scenario.
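
If the first option is chosen, the hook change is small; a sketch against Behave's standard after_scenario signature:

    def after_scenario(context, scenario):
        """Drop any auth headers the scenario set so they cannot leak forward."""
        if hasattr(context, "auth_headers"):
            del context.auth_headers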

@blublinsky
Contributor Author

/retest

@openshift-ci

openshift-ci bot commented Feb 2, 2026

@blublinsky: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/app/endpoints/query.py`:
- Line 17: There's a syntax typo in the import statement for TextContentItem:
change the malformed "rom
llama_stack_client.types.shared.interleaved_content_item import TextContentItem"
to a proper "from ..." import so the module can load; locate the import that
references TextContentItem in the file (query.py) and correct the keyword to
"from" to fix the SyntaxError.
- Line 4: Remove the unused import of the ast module from the top of
src/app/endpoints/query.py by deleting the "import ast" line; update the import
block in that file (e.g., near other imports in the module) so only actually
used modules remain (no changes to functions/classes like any query handler
functions are required).
🧹 Nitpick comments (6)
tests/e2e/features/steps/conversation.py (1)

310-371: Reuse shared retry constants to avoid drift.
This block redefines retry values and timeout literals that already exist. Using the shared constants keeps backoff/timeout behavior consistent across helpers.

♻️ Suggested refactor
-    max_retries = 10
-    delay = 0.2  # Start with 200ms
+    max_retries = MAX_RETRIES
+    delay = INITIAL_RETRY_DELAY  # Start with configured delay
@@
-                print(
-                    f"✅ Conversation found in list after {attempt + 1} attempts "
-                    f"(waited {sum(0.2 * (2 ** i) for i in range(attempt)):.2f}s)"
-                )
+                print(
+                    f"✅ Conversation found in list after {attempt + 1} attempts "
+                    f"(waited {sum(min(INITIAL_RETRY_DELAY * (2 ** i), MAX_RETRY_DELAY) for i in range(attempt)):.2f}s)"
+                )
@@
-            delay = min(delay * 2, 2.0)  # Exponential backoff, cap at 2s
+            delay = min(delay * 2, MAX_RETRY_DELAY)  # Exponential backoff, cap at configured max
@@
-            context.response = requests.get(url, headers=headers, timeout=10)
+            context.response = requests.get(
+                url, headers=headers, timeout=DEFAULT_TIMEOUT
+            )
tests/e2e/configuration/server-mode/lightspeed-stack-mcp.yaml (1)

51-51: Add newline at end of file.

POSIX convention requires files to end with a newline. This avoids issues with tools that expect it.

src/app/endpoints/streaming_query_v2.py (1)

337-369: Background cleanup design is sound, but the 0.5s delay is a magic number.

The detached background task pattern correctly avoids MCP session cancellation issues. However, the hardcoded 0.5 second delay is undocumented beyond inline comments. Consider extracting this to a named constant for clarity and easier adjustment.

♻️ Optional: Extract delay to constant
+# Delay before background cleanup to allow MCP sessions to close
+MCP_CLEANUP_DELAY_SECONDS = 0.5
+
 async def cleanup_task() -> None:
     """Background cleanup after streaming response is sent."""
     # Small delay to ensure response is fully sent and MCP cleanup completes
-    await asyncio.sleep(0.5)
+    await asyncio.sleep(MCP_CLEANUP_DELAY_SECONDS)
src/app/endpoints/query.py (2)

101-102: Use % formatting instead of f-strings in logger calls.

F-strings are evaluated immediately, while % formatting with logger methods enables lazy evaluation—the string is only formatted if the log level is enabled. This is a minor performance optimization.

♻️ Suggested change
         logger.debug(
-            f"Background task created, active tasks: {len(background_tasks_set)}"
+            "Background task created, active tasks: %d", len(background_tasks_set)
         )
     except Exception as e:
-        logger.error(f"Failed to create background task: {e}", exc_info=True)
+        logger.error("Failed to create background task: %s", e, exc_info=True)

80-105: Consider extracting shared create_background_task to a utility module.

This function is duplicated in both query.py and streaming_query_v2.py. Extract to a shared utility (e.g., utils/background_tasks.py) to maintain a single source of truth.
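
A minimal sketch of such a shared utility, assuming the strong-reference task set described in the PR; the utils/background_tasks.py location is the reviewer's suggestion, not existing code:

    # utils/background_tasks.py (sketch)
    import asyncio
    import logging
    from collections.abc import Coroutine
    from typing import Any

    logger = logging.getLogger(__name__)

    # Strong references keep detached tasks alive until they finish
    background_tasks_set: set[asyncio.Task] = set()


    def create_background_task(coro: Coroutine[Any, Any, Any]) -> asyncio.Task | None:
        """Schedule a detached task and hold a reference until it completes."""
        try:
            task = asyncio.create_task(coro)
            background_tasks_set.add(task)
            task.add_done_callback(background_tasks_set.discard)
            logger.debug(
                "Background task created, active tasks: %d", len(background_tasks_set)
            )
            return task
        except Exception as exc:  # pylint: disable=broad-exception-caught
            logger.error("Failed to create background task: %s", exc, exc_info=True)
            return None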

src/app/main.py (1)

193-288: Consider using Route.matches for robust path matching.
The manual segment matcher won’t handle mounts or path converters (e.g., {path:path}), which can silently drop metrics for valid routes. Using Starlette’s route matching avoids these false negatives and keeps labels stable.

♻️ Suggested refactor using Starlette route matching
-from starlette.routing import Mount, Route, WebSocketRoute
+from starlette.routing import Mount, Route, WebSocketRoute, Match
@@
-    def __init__(  # pylint: disable=redefined-outer-name
-        self, app: Any, app_routes_paths: list[str]
-    ) -> None:
+    def __init__(  # pylint: disable=redefined-outer-name
+        self, app: Any, app_routes: list[Route]
+    ) -> None:
@@
-        self.app_routes_paths = app_routes_paths
+        self.app_routes = app_routes
@@
-        route_path = None
-        for route_template in self.app_routes_paths:
-            ...
-        if route_path is None:
+        matched_route = next(
+            (r for r in self.app_routes if r.matches(scope)[0] == Match.FULL),
+            None,
+        )
+        if matched_route is None:
             await self.app(scope, receive, send)
             return
+        route_path = matched_route.path
@@
-app_routes_paths = [
-    route.path
-    for route in app.routes
-    if isinstance(route, (Mount, Route, WebSocketRoute))
-]
+app_routes = [
+    route
+    for route in app.routes
+    if isinstance(route, (Mount, Route, WebSocketRoute))
+]
@@
-app.add_middleware(MetricsMiddleware, app_routes_paths=app_routes_paths)
+app.add_middleware(MetricsMiddleware, app_routes=app_routes)

"""Handler for REST API call to provide answer to query."""

import asyncio
import ast
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 32894


🏁 Script executed:

rg -n '\bast\.' src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

rg -n '\bast\b' src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 88


Remove unused ast import at line 4.

The ast module is imported but never used in this file. Remove it to keep imports clean.

🤖 Prompt for AI Agents
In `@src/app/endpoints/query.py` at line 4, Remove the unused import of the ast
module from the top of src/app/endpoints/query.py by deleting the "import ast"
line; update the import block in that file (e.g., near other imports in the
module) so only actually used modules remain (no changes to functions/classes
like any query handler functions are required).

RateLimitError, # type: ignore
)
from llama_stack_client.types.model_list_response import ModelListResponse
rom llama_stack_client.types.shared.interleaved_content_item import TextContentItem
Contributor

⚠️ Potential issue | 🔴 Critical

Critical syntax error: rom should be from.

This typo will cause a SyntaxError at module load time, preventing the application from starting.

🐛 Fix the import statement
-rom llama_stack_client.types.shared.interleaved_content_item import TextContentItem
+from llama_stack_client.types.shared.interleaved_content_item import TextContentItem
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rom llama_stack_client.types.shared.interleaved_content_item import TextContentItem
from llama_stack_client.types.shared.interleaved_content_item import TextContentItem
🤖 Prompt for AI Agents
In `@src/app/endpoints/query.py` at line 17, There's a syntax typo in the import
statement for TextContentItem: change the malformed "rom
llama_stack_client.types.shared.interleaved_content_item import TextContentItem"
to a proper "from ..." import so the module can load; locate the import that
references TextContentItem in the file (query.py) and correct the keyword to
"from" to fix the SyntaxError.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants