fixing MCP headers usage for Llama Stack 0.4.x, adding additional e2e tests for MCP servers #1080
base: main
Conversation
Walkthrough
Mock MCP server changed to JSON‑RPC (initialize, tools/list, tools/call) with error-mode and /debug/clear; app now records per-server …

Changes
Sequence Diagram(s)

sequenceDiagram
participant Client as Client
participant App as Query Endpoint
participant MCP as MCP Mock Server
participant LLM as Llama Stack API
Client->>App: POST query (may include MCP-HEADERS)
App->>MCP: JSON-RPC initialize (any POST path)
MCP-->>App: initialize response (protocolVersion, capabilities)
App->>MCP: JSON-RPC tools/list
MCP-->>App: list of tools (tool_name, inputSchema)
App->>LLM: provide tool definitions (include per-server authorization)
LLM-->>App: instruct tool call
App->>MCP: JSON-RPC tools/call (tool_name, args)
alt error_mode token present
MCP-->>App: error result
else
MCP-->>App: success result (mocked)
end
App->>LLM: supply tool result
LLM-->>App: final response
App-->>Client: return augmented response
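To make the tools/call leg of the diagram concrete, here is a minimal sketch of the JSON-RPC exchange against the mock server. It assumes the mock listens on http://localhost:9000 (as in the e2e steps) and MCP-style params named "name" and "arguments"; the argument payload and token value are illustrative, not taken from the PR diff.

import requests

MOCK_URL = "http://localhost:9000"  # mock MCP server used by the e2e tests

# JSON-RPC request the app would send once tools/list has returned a tool name.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "mock_tool_file", "arguments": {"query": "hello"}},
}

# Per-server Authorization header resolved by the app (file/k8s/client token).
headers = {"Authorization": "Bearer test-secret-token-123"}

response = requests.post(MOCK_URL, json=payload, headers=headers, timeout=5)
print(response.json())  # mocked success result, or an error result in error mode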
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
b30f848 to 4b358ab (Compare)
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/mcp.py`:
- Around line 699-713: The step function
check_streaming_response_has_tool_results only checks status_code but doesn't
parse SSE events; update it to decode context.response.content (or iterate
response.iter_lines() if streaming), extract lines starting with "data:",
collect/parse SSE payloads, then assert that at least one event contains
tool-related content (e.g., contains "tool" or expected tool result keys) and
print the count of matching events; keep using context.response and check for
None/status before parsing.
- Around line 268-281: The step function check_service_logs_contain currently
only prints a message and requires manual verification; replace the manual
prints with a deterministic assertion that programmatically fetches and searches
the service logs (e.g., read from a known log file or query the running
service/container logs via context or a helper like get_service_logs) and raise
an AssertionError if log_fragment is not present; update the
check_service_logs_contain implementation to obtain logs from the test context
(or a utility function), search for log_fragment, and assert presence with a
clear failure message so the test becomes automated and deterministic.
- Around line 428-445: The step function check_tools_list_calls currently only
confirms the mock server is reachable but does not assert any requests contained
the "tools/list" method; either implement a real inspection of request bodies by
fetching the debug request list from the mock server (use the requests.get to
retrieve the full payloads from mock_server_url+"/debug/requests", iterate the
returned requests and assert any body or JSON contains "tools/list") or
explicitly document this limitation in the feature file/tests rather than
silently passing; update check_tools_list_calls to perform the body/JSON check
or adjust the feature to note the verification is not possible without
request-body logging.
- Around line 253-266: The step function check_mcp_tool_in_log currently only
prints and must instead retrieve the MCP mock server request log from the test
context (e.g., context.mcp_request_log or via the MCP test client available in
context), scan the recorded requests for the expected tool_name by inspecting
the Authorization header or the response body (tools/list) entries, and assert
that at least one request contains the expected tool_name; update the function
to raise an assertion (or use behave's assert) with a clear failure message if
no matching entry is found so the test actually fails when the tool is missing.
- Around line 531-541: The configure_mock_server_errors step currently only sets
context.expect_tool_errors and prints a placeholder but the mock server and
subsequent assertions never consume that flag; implement real error-mode support
by adding an error injection mechanism in the MCP mock server (e.g., a query
param/header or a toggle handler) that, when enabled, causes tool execution
endpoints used by tests to return deterministic error responses; update the
configure_mock_server_errors function to flip that toggle on (and ensure it
resets after tests), ensure the flag/context value is actually passed to
requests sent by tests (or the mock reads the toggle) so the mock returns
errors, and implement the corresponding step definitions for "The response
should indicate tool execution failed" and "The service logs should contain tool
failure information" to assert the error payload and log entries are present
(reference configure_mock_server_errors and the failing test scenario names to
locate where to wire the toggle and assertions).
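As a companion to the last item, a minimal sketch of header-driven error injection for the mock server. It keys off an "error-mode" token in the Authorization header, matching the error-mode behavior described elsewhere in this PR; the helper name and exact JSON-RPC payload shapes are illustrative assumptions.

def build_tools_call_response(auth_header: str, tool_name: str, request_id: int) -> dict:
    """Return a JSON-RPC tools/call result, or an error when the error-mode token is present."""
    if "error-mode" in auth_header:
        # Deterministic failure the e2e assertions can look for.
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "error": {"code": -32000, "message": f"Mock tool '{tool_name}' failed (error mode)"},
        }
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": [{"type": "text", "text": f"Mock tool '{tool_name}' executed successfully"}]
        },
    }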
🧹 Nitpick comments (9)
dev-tools/mcp-mock-server/server.py (1)
149-154: Consider consistent truncation handling.

The auth preview truncation adds "..." unconditionally in the result text, even when the header is shorter than 50 characters (line 154 always appends "...").

Minor fix for truncation suffix

-            auth_preview = (
-                auth_header[:50] if len(auth_header) > 50 else auth_header
-            )
-            result_text = (
-                f"Mock tool '{tool_called}' executed successfully "
-                f"with arguments: {arguments}. Auth used: {auth_preview}..."
-            )
+            if len(auth_header) > 50:
+                auth_preview = f"{auth_header[:50]}..."
+            else:
+                auth_preview = auth_header
+            result_text = (
+                f"Mock tool '{tool_called}' executed successfully "
+                f"with arguments: {arguments}. Auth used: {auth_preview}"
+            )

src/app/endpoints/query_v2.py (2)
1040-1044: Consider using debug level for per-request tool preparation.

The logger.info for prepared MCP tools will emit a log line for every request. For high-traffic scenarios, this could be verbose. Consider using logger.debug for the per-request detail and keeping info for startup/configuration logging only.

Use debug level for per-request logging

-        logger.info(
+        logger.debug(
             "Prepared %d MCP tool(s) for request: %s",
             len(mcp_tools),
             ", ".join(mcp_server_names),
         )
1064-1070: Summary logging at info level may be too verbose for production.Similar to the above, the summary log at
infolevel will emit for every request. Considerdebuglevel or gating behind a more restrictive condition.Consider debug level for per-request summary
- if toolgroups: - logger.info( + if toolgroups: + logger.debug( "Prepared %d total tool(s) for Responses API: %d RAG + %d MCP", len(toolgroups), len(rag_tools) if rag_tools else 0, len(mcp_tools) if mcp_tools else 0, )tests/e2e/features/mcp_tools.feature (2)
20-25: Consider adding MCP-specific assertions to the info endpoint test.

This scenario verifies only that the info endpoint returns 200. To validate MCP configuration is reported correctly, consider adding a step that checks for MCP-related fields in the response body.
74-74: Magic number for request count may be fragile.

The assertion "at least 6 requests" assumes a specific number of MCP protocol exchanges. If the mock server or protocol changes, this could break. Consider documenting why 6 is expected (e.g., 2 requests × 3 servers = 6) or using a more resilient assertion.

tests/e2e/features/steps/mcp.py (4)
39-39: Extract mock server URL to a constant or context variable.The mock server URL
http://localhost:9000is hardcoded in multiple places throughout this file. Consider extracting it to a module-level constant or storing it in the context for easier configuration and maintenance.♻️ Suggested approach
+MCP_MOCK_SERVER_URL = "http://localhost:9000" + + `@given`("The MCP mock server request log is cleared") def clear_mcp_mock_server_log(context: Context) -> None: ... - mock_server_url = "http://localhost:9000" + mock_server_url = getattr(context, "mcp_mock_server_url", MCP_MOCK_SERVER_URL)Apply similarly to all other occurrences in this file.
88-98: Fixed sleep is fragile for synchronization.Using
time.sleep(2)for waiting on async operations can lead to flaky tests if the server is slow, or wasted time if it's fast. Consider polling the mock server's request count with a timeout instead.♻️ Suggested polling approach
`@when`("I wait for MCP server to receive requests") def wait_for_mcp_requests(context: Context) -> None: - # Wait for requests to be processed - time.sleep(2) - print("⏱️ Waited for MCP server to process requests") + """Poll MCP mock server until requests are received or timeout.""" + mock_server_url = "http://localhost:9000" + timeout = 10 + poll_interval = 0.5 + elapsed = 0 + + while elapsed < timeout: + try: + response = requests.get(f"{mock_server_url}/debug/requests", timeout=2) + if response.status_code == 200 and len(response.json()) > 0: + print(f"⏱️ MCP server received requests after {elapsed:.1f}s") + return + except requests.RequestException: + pass + time.sleep(poll_interval) + elapsed += poll_interval + + print("⚠️ Timeout waiting for MCP server requests")
100-151: Consider extracting a helper for fetching mock server requests.Both
check_mcp_server_received_requestsandcheck_mcp_server_request_countshare the same pattern for fetching and parsing the debug requests. A helper function would reduce duplication.♻️ Suggested helper
def _get_mock_server_requests(mock_server_url: str = "http://localhost:9000") -> list: """Fetch requests log from MCP mock server.""" response = requests.get(f"{mock_server_url}/debug/requests", timeout=5) if response.status_code != 200: raise AssertionError(f"Failed to get debug requests: {response.status_code}") return response.json()
153-251: Three auth header verification functions share nearly identical logic.
check_file_auth_header, check_k8s_auth_header, and check_client_auth_header have the same structure with minor variations (exact match vs. contains). Consider consolidating into a reusable helper to reduce duplication.
| @then("The streaming response should contain tool execution results") | ||
| def check_streaming_response_has_tool_results(context: Context) -> None: | ||
| """Verify streaming response contains tool execution results. | ||
|
|
||
| Parameters: | ||
| context (Context): Behave context. | ||
| """ | ||
| assert context.response is not None, "No response received" | ||
| assert ( | ||
| context.response.status_code == 200 | ||
| ), f"Bad status: {context.response.status_code}" | ||
|
|
||
| # For streaming responses, we'd need to parse SSE events | ||
| # For now, just verify we got a successful response | ||
| print("✅ Streaming response contains tool execution results") |
Streaming response verification doesn't parse SSE events.
The step acknowledges that SSE events should be parsed to verify tool execution results but only checks the status code. This means the assertion "streaming response contains tool execution results" is not actually verified.
💡 Consider implementing SSE parsing
`@then`("The streaming response should contain tool execution results")
def check_streaming_response_has_tool_results(context: Context) -> None:
assert context.response is not None, "No response received"
assert context.response.status_code == 200
# Parse SSE events from streaming response
content = context.response.content.decode("utf-8")
events = [line for line in content.split("\n") if line.startswith("data:")]
# Verify at least one event contains tool-related content
tool_events = [e for e in events if "tool" in e.lower()]
assert len(tool_events) > 0, "No tool execution events found in streaming response"
print(f"✅ Found {len(tool_events)} tool-related events in streaming response")🤖 Prompt for AI Agents
In `@tests/e2e/features/steps/mcp.py` around lines 699 - 713, The step function
check_streaming_response_has_tool_results only checks status_code but doesn't
parse SSE events; update it to decode context.response.content (or iterate
response.iter_lines() if streaming), extract lines starting with "data:",
collect/parse SSE payloads, then assert that at least one event contains
tool-related content (e.g., contains "tool" or expected tool result keys) and
print the count of matching events; keep using context.response and check for
None/status before parsing.
tisnik
left a comment
- please add Jira issue into title (see contribution guide)
- it changes the LCORE code too, so the title is improper and wrong people will review it
- e2e part will be reviewed by QA
jrobertboos
left a comment
Overall LGTM. I think there are some more test scenarios that can be added but for now really good job getting the e2e tests for MCP started :)
Great job adding all the logs!
Then The status code of the response is 200

# ============================================================================
# Authentication Methods
We should probably have more tests for incorrect tokens and make sure they return correct errors (e.g. 401). This can also be implemented later.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/main.py (1)
120-125: ⚠️ Potential issue | 🟠 Major: Fix CORS allow_methods (currently bound to headers).

Line 124 sets allow_methods=cors.allow_headers, which breaks CORS preflight checks by exposing the headers list instead of the methods list. This should be cors.allow_methods.

🔧 Suggested fix

     app.add_middleware(
         CORSMiddleware,
         allow_origins=cors.allow_origins,
         allow_credentials=cors.allow_credentials,
-        allow_methods=cors.allow_headers,
+        allow_methods=cors.allow_methods,
         allow_headers=cors.allow_headers,
     )
🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 269-308: The exception handler in __call__ should track whether
the application has already sent "http.response.start" to avoid sending a second
start and violating ASGI; wrap the send callable (e.g., create a local
wrapped_send that sets response_started = True when it sees message["type"] ==
"http.response.start" and forwards messages to the original send) and use that
wrapped_send when awaiting self.app(scope, receive, wrapped_send); then in the
except block check response_started and if True return early (skip sending
headers/body), otherwise proceed to send the InternalServerErrorResponse
start/body using the original send path (still using JSONResponse and
InternalServerErrorResponse.generic()).
In `@tests/e2e/features/steps/mcp.py`:
- Around line 31-48: The clear_mcp_mock_server_log step currently swallows
failures via print statements which can leave stale data; change function
clear_mcp_mock_server_log to fail fast by replacing the print warnings with
assertions or raised exceptions: after calling
requests.get(f"{MOCK_MCP_SERVER_URL}/debug/clear", timeout=2) assert
response.status_code == 200 (or call response.raise_for_status()) and include
the response content in the assertion message, and in the except
requests.RequestException block re-raise an AssertionError (or let the original
exception propagate) with a clear message including the caught exception so the
test setup fails immediately when the clear endpoint cannot be reached or
returns a non-200 result.
- Around line 533-548: The test check_tools_call_method currently only asserts
that MCP returned some requests; update it to assert that at least one logged
request corresponds to the tools/call endpoint by inspecting the parsed
requests_log (from response.json()) and searching entries for a path or method
field that equals or contains "tools/call"; add an assertion like "No tools/call
request received by MCP server" if none found and keep using mock_server_url and
the /debug/requests endpoint to fetch the log.
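For the tools/call item above, a minimal sketch of the body inspection, assuming /debug/requests returns the logged requests as JSON; the helper name and the loose string search are illustrative, and the real check_tools_call_method step would call something like this.

import requests

def assert_tools_call_received(mock_server_url: str = "http://localhost:9000") -> None:
    """Assert that the mock MCP server logged at least one JSON-RPC tools/call request."""
    response = requests.get(f"{mock_server_url}/debug/requests", timeout=5)
    assert response.status_code == 200, f"Failed to fetch debug requests: {response.status_code}"

    requests_log = response.json()
    # The /debug/requests payload shape is assumed; a loose string search keeps the check simple.
    matched = [entry for entry in requests_log if "tools/call" in str(entry)]
    assert matched, "No tools/call request received by MCP server"
    print(f"✅ Found {len(matched)} tools/call request(s) in the mock server log")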
🧹 Nitpick comments (8)
src/app/endpoints/query_v2.py (2)
1043-1048: Consider usinglogger.debuginstead oflogger.infofor MCP tool preparation.The log at line 1044 uses
logger.infofor routine MCP tool preparation details. Per coding guidelines,infois for general execution milestones whiledebugis for diagnostic info. Since this is per-request diagnostic information rather than a significant operational milestone,debugmay be more appropriate to avoid log noise in production.Proposed change
- logger.info( + logger.debug( "Prepared %d MCP tool(s) for request: %s", len(mcp_tools), ", ".join(mcp_server_names), )
1068-1074: Same consideration for summary log - usedebugfor per-request diagnostics.This summary log (line 1069) also uses
logger.infofor per-request tool counts. Considerdebuglevel for consistency with the diagnostic nature of this information.Proposed change
- logger.info( + logger.debug( "Prepared %d total tool(s) for Responses API: %d RAG + %d MCP", len(toolgroups), len(rag_tools) if rag_tools else 0, len(mcp_tools) if mcp_tools else 0, )src/app/endpoints/query.py (3)
81-106: Background task pattern is well-designed; minor logging format issue.

The pattern correctly:
- Maintains strong references to prevent GC
- Auto-removes completed tasks via done callback
- Handles task creation errors
However, lines 103 and 106 use f-strings instead of %-formatting for logging, which bypasses lazy evaluation:
Use %-style formatting for logger calls
- logger.debug( - f"Background task created, active tasks: {len(background_tasks_set)}" - ) + logger.debug( + "Background task created, active tasks: %d", len(background_tasks_set) + ) except Exception as e: - logger.error(f"Failed to create background task: {e}", exc_info=True) + logger.error("Failed to create background task: %s", e, exc_info=True)
415-432: Background persistence approach is sound; consider extracting the delay constant.

The 500ms delay for MCP session cleanup is a reasonable heuristic. Consider extracting it as a named constant for clarity and easier tuning:
Extract delay constant
+# Delay before background task runs to allow MCP session cleanup +MCP_CLEANUP_DELAY_SECONDS = 0.5 + async def persist_with_topic_summary() -> None: ... logger.debug("Background task: waiting for MCP cleanup") - await asyncio.sleep(0.5) # 500ms should be enough for cleanup + await asyncio.sleep(MCP_CLEANUP_DELAY_SECONDS) logger.debug("Background task: MCP cleanup complete")
467-469: Broad exception handling is acceptable here but loses stack trace.For background task resilience, catching all exceptions is reasonable. However, consider adding
exc_info=Trueto preserve the stack trace for debugging:Add exc_info for better debugging
except Exception as e: # pylint: disable=broad-exception-caught - logger.error("Failed to generate topic summary: %s", e) + logger.error("Failed to generate topic summary: %s", e, exc_info=True) topic_summary = Nonesrc/app/endpoints/streaming_query_v2.py (1)
80-93: Code duplication: create_background_task differs from the query.py version.

This implementation lacks error handling and logging present in src/app/endpoints/query.py (lines 87-106). The query.py version has:
- Try/except around task creation
- Debug logging for task count
- Error logging on failure
Consider extracting to a shared utility module for consistency, or at minimum align the implementations.
Align with query.py implementation
def create_background_task(coro: Any) -> None: """Create a detached background task with strong reference. Args: coro: Coroutine to run as background task """ - task = asyncio.create_task(coro) - background_tasks_set.add(task) - task.add_done_callback(background_tasks_set.discard) + try: + task = asyncio.create_task(coro) + background_tasks_set.add(task) + task.add_done_callback(background_tasks_set.discard) + logger.debug( + "Background task created, active tasks: %d", len(background_tasks_set) + ) + except Exception as e: + logger.error("Failed to create background task: %s", e, exc_info=True)tests/unit/app/endpoints/test_streaming_query_v2.py (1)
226-227: Timing-based test synchronization may be flaky.

The 0.7s sleep waits for the 0.5s background task delay plus 0.2s buffer. This could be flaky on slow CI systems. Consider:

- Increasing the buffer margin (e.g., 1.0s total)
- Mocking asyncio.sleep to make tests deterministic
- Using an event or condition for synchronization
Option 1: Increase buffer for CI reliability
# Wait for background cleanup task to complete (has 0.5s delay) - await asyncio.sleep(0.7) + await asyncio.sleep(1.0) # 0.5s delay + 0.5s buffer for slow CIsrc/app/main.py (1)
4-6: Introduce ASGI type aliases to reduce repeated complex signatures.

This improves readability and aligns with module-level alias guidance.
As per coding guidelines: Define type aliases at module level for clarity in Python files.♻️ Suggested refactor
from collections.abc import Awaitable, Callable from contextlib import asynccontextmanager from typing import Any, AsyncIterator +ASGIScope = dict[str, Any] +ASGIReceive = Callable[[], Awaitable[dict[str, Any]]] +ASGISend = Callable[[dict[str, Any]], Awaitable[None]] + @@ async def __call__( self, - scope: dict[str, Any], - receive: Callable[[], Awaitable[dict[str, Any]]], - send: Callable[[dict[str, Any]], Awaitable[None]], + scope: ASGIScope, + receive: ASGIReceive, + send: ASGISend, ) -> None: @@ async def __call__( self, - scope: dict[str, Any], - receive: Callable[[], Awaitable[dict[str, Any]]], - send: Callable[[dict[str, Any]], Awaitable[None]], + scope: ASGIScope, + receive: ASGIReceive, + send: ASGISend, ) -> None:Also applies to: 196-201, 269-273
c736daa to becb161 (Compare)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/main.py (1)
120-125: ⚠️ Potential issue | 🟠 Major: CORS allow_methods is wired to headers instead of methods.

Line 124 incorrectly assigns cors.allow_headers to allow_methods, which breaks CORS preflight behavior by sending header names instead of HTTP method names. The CORSConfiguration class defines both allow_methods (for HTTP methods like GET, POST) and allow_headers (for header names like Content-Type, Authorization).

🔧 Suggested fix

     app.add_middleware(
         CORSMiddleware,
         allow_origins=cors.allow_origins,
         allow_credentials=cors.allow_credentials,
-        allow_methods=cors.allow_headers,
+        allow_methods=cors.allow_methods,
         allow_headers=cors.allow_headers,
     )
🤖 Fix all issues with AI agents
In `@src/app/endpoints/query_v2.py`:
- Around line 905-960: The comparison that determines missing auth headers is
case-sensitive and will treat "authorization" vs "Authorization" as different;
update the resolution check to be case-insensitive by normalizing header names
(e.g., lowercasing) before comparing: when building required_headers from
mcp_server.authorization_headers.keys(), when creating resolved_headers from
headers.keys(), and when appending the special Authorization resolution (the
local variable authorization), use a consistent case (like .lower()) so
missing_headers is computed with a case-insensitive match for Authorization and
other headers; ensure the logging output still shows original names or a
readable form if needed (a sketch follows after this list).
In `@src/app/endpoints/query.py`:
- Around line 446-459: The conversation existence check uses conversation_id
directly, which may include a "conv_" prefix while the DB stores normalized IDs;
update the logic so check_conversation_exists normalizes conversation_id before
querying (e.g., strip a "conv_" prefix or run the same normalization used when
storing IDs) and then query UserConversation with the normalized id; ensure the
same normalization function or rules are applied (call or replicate
normalization used elsewhere) and keep the await
asyncio.to_thread(check_conversation_exists) usage intact so the DB query still
runs in the thread pool.
In `@tests/e2e/features/mcp_tools.feature`:
- Around line 110-117: Update the system_in_default_state step in
tests/e2e/features/steps/common.py to fully reset the MCP error mode by setting
context.expect_tool_errors = False (in addition to resetting
context.auth_headers). Locate the step implementation named
system_in_default_state and add the assignment to ensure the flag cleared
between scenarios so MCP mock error behavior does not leak across tests.
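For the first item in this list (case-insensitive header resolution in query_v2.py), a minimal sketch of the normalization; the function name and surrounding data shapes are assumptions, not the PR's code.

def find_missing_headers(
    authorization_headers: dict[str, str],
    resolved_headers: dict[str, str],
    authorization: str | None,
) -> list[str]:
    """Compare required vs. resolved MCP header names case-insensitively."""
    required = {name.lower(): name for name in authorization_headers}
    resolved = {name.lower() for name in resolved_headers}
    if authorization is not None:
        # The special Authorization resolution counts as resolved as well.
        resolved.add("authorization")
    # Report missing headers under their original, readable names.
    return [original for lower, original in required.items() if lower not in resolved]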
🧹 Nitpick comments (4)
src/app/endpoints/streaming_query_v2.py (1)
85-94: Add error handling/logging when creating background tasks.

This keeps parity with the non-streaming path and avoids silent failures if task creation fails.
♻️ Suggested update
def create_background_task(coro: Any) -> None: """Create a detached background task with strong reference. Args: coro: Coroutine to run as background task """ - task = asyncio.create_task(coro) - background_tasks_set.add(task) - task.add_done_callback(background_tasks_set.discard) + try: + task = asyncio.create_task(coro) + background_tasks_set.add(task) + task.add_done_callback(background_tasks_set.discard) + logger.debug("Background task created, active tasks: %d", len(background_tasks_set)) + except Exception as e: # pylint: disable=broad-exception-caught + logger.error("Failed to create background task: %s", e, exc_info=True)tests/integration/endpoints/test_query_v2_integration.py (1)
36-59: Avoid fixed sleeps; poll background tasks with a timeout to reduce flakiness.

This keeps tests stable under variable CI load.
♻️ Suggested update
- # Wait for the initial 500ms delay + buffer - await asyncio.sleep(0.6) - - # Wait for any remaining background tasks to complete - # pylint: disable=import-outside-toplevel - from app.endpoints.query import background_tasks_set - - if background_tasks_set: - await asyncio.gather(*background_tasks_set, return_exceptions=True) - - # Give thread pool operations extra time to complete - await asyncio.sleep(0.2) + # Wait for the initial 500ms delay + buffer + await asyncio.sleep(0.6) + + # Wait for any remaining background tasks to complete (bounded) + # pylint: disable=import-outside-toplevel + from app.endpoints.query import background_tasks_set + + if background_tasks_set: + timeout = 3.0 + loop = asyncio.get_running_loop() + start = loop.time() + while background_tasks_set and (loop.time() - start) < timeout: + await asyncio.sleep(0.05) + await asyncio.gather(*background_tasks_set, return_exceptions=True)tests/e2e/features/mcp_tools.feature (2)
20-24: Scenario assertions may be incomplete for its stated purpose.

The scenario is named "Service reports MCP configuration correctly" but only asserts the HTTP status code. Consider adding assertions to validate the actual MCP configuration content in the response (e.g., verifying MCP servers are listed, their auth types, etc.) to match what the scenario name implies.
💡 Suggested additional assertions
Scenario: Service reports MCP configuration correctly Given The system is in default state And I set the Authorization header to Bearer test-token When I access REST API endpoint "info" using HTTP GET method Then The status code of the response is 200 + And The response should contain MCP configuration + And The response should list configured MCP servers
48-56: Consider adding explicit token verification for client-provided auth.

For consistency with the file-based (line 37) and Kubernetes (line 46) scenarios, this scenario could benefit from an explicit assertion verifying the captured Authorization header from the client-auth server, rather than only checking which tools were called.
💡 Suggested additional assertion
Then The MCP mock server should have received requests And The MCP mock server request log should contain exactly tools mock_tool_file, mock_tool_k8s, mock_tool_client + And The MCP mock server should have captured Authorization header containing "client-test-token" from mock-client-auth server
def check_conversation_exists() -> bool:
    """Check if conversation exists in database (runs in thread pool)."""
    with get_session() as session:
        existing = (
            session.query(UserConversation)
            .filter_by(id=conversation_id)
            .first()
        )
        return existing is not None

# Run database check in thread pool to avoid connection issues
conversation_exists = await asyncio.to_thread(
    check_conversation_exists
)
Normalize conversation IDs before the existence check.
conversation_id may include a conv_ prefix, while the DB stores normalized IDs. This check can misclassify existing conversations and trigger unnecessary topic-summary generation.
🔧 Suggested fix
def check_conversation_exists() -> bool:
"""Check if conversation exists in database (runs in thread pool)."""
+ normalized_id = normalize_conversation_id(conversation_id)
with get_session() as session:
existing = (
session.query(UserConversation)
- .filter_by(id=conversation_id)
+ .filter_by(id=normalized_id)
.first()
)
    return existing is not None

🤖 Prompt for AI Agents
In `@src/app/endpoints/query.py` around lines 446 - 459, The conversation
existence check uses conversation_id directly, which may include a "conv_"
prefix while the DB stores normalized IDs; update the logic so
check_conversation_exists normalizes conversation_id before querying (e.g.,
strip a "conv_" prefix or run the same normalization used when storing IDs) and
then query UserConversation with the normalized id; ensure the same
normalization function or rules are applied (call or replicate normalization
used elsewhere) and keep the await asyncio.to_thread(check_conversation_exists)
usage intact so the DB query still runs in the thread pool.
3c4e8a9 to a362b01 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/main.py (1)
120-125: ⚠️ Potential issue | 🟠 Major: Fix CORS allow_methods configuration parameter.

The allow_methods parameter expects HTTP verbs (e.g., GET, POST, PUT), not header names. Currently it's receiving cors.allow_headers, which is semantically incorrect. Use cors.allow_methods instead to ensure proper CORS preflight handling, especially when custom methods are configured.

🛠 Suggested fix

-        allow_methods=cors.allow_headers,
+        allow_methods=cors.allow_methods,
🤖 Fix all issues with AI agents
In `@src/app/endpoints/query_v2.py`:
- Around line 932-938: The debug log in the failed header resolution block
currently prints the raw header value (variable value) which may leak secrets;
update the logger.debug call in query_v2.py (the block that references
mcp_server.name, name, value) to avoid emitting the raw value—either remove the
value from the message and log only mcp_server.name and name (header name/type),
or replace value with a redacted/masked placeholder (e.g. "<redacted>" or a
short hash) before logging; ensure you update only the logger.debug invocation
in that resolution failure path.
🧹 Nitpick comments (1)
docker-compose.yaml (1)
87-91: Consider parameterizing the test token value.The entrypoint correctly chains commands with
&&to ensure the Python script only executes after the token file is successfully created. Since this compose file is clearly for E2E testing, the hardcoded token is acceptable.For additional flexibility (e.g., testing different token values or scenarios), you could parameterize the token via an environment variable:
♻️ Optional: Parameterize the test token
environment: - OPENAI_API_KEY=${OPENAI_API_KEY} # Azure Entra ID credentials (AZURE_API_KEY is obtained dynamically) - TENANT_ID=${TENANT_ID:-} - CLIENT_ID=${CLIENT_ID:-} - CLIENT_SECRET=${CLIENT_SECRET:-} + - MCP_TEST_TOKEN=${MCP_TEST_TOKEN:-test-secret-token-123} entrypoint: > /bin/bash -c " - echo 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token && + echo \"$MCP_TEST_TOKEN\" > /tmp/lightspeed-mcp-test-token && python3.12 /app-root/src/lightspeed_stack.py "
# Log failed resolution
logger.debug(
    "MCP server '%s': Header '%s' -> FAILED to resolve (value was: %s)",
    mcp_server.name,
    name,
    value,
)
Avoid logging raw auth header values.
On failed resolution, the log prints value, which can be a static secret. This leaks credentials into debug logs; log only the header name/type or redact.
🛡 Suggested fix
- logger.debug(
- "MCP server '%s': Header '%s' -> FAILED to resolve (value was: %s)",
- mcp_server.name,
- name,
- value,
- )
+ logger.debug(
+ "MCP server '%s': Header '%s' -> FAILED to resolve",
+ mcp_server.name,
+ name,
+            )
🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 932 - 938, The debug log in the
failed header resolution block currently prints the raw header value (variable
value) which may leak secrets; update the logger.debug call in query_v2.py (the
block that references mcp_server.name, name, value) to avoid emitting the raw
value—either remove the value from the message and log only mcp_server.name and
name (header name/type), or replace value with a redacted/masked placeholder
(e.g. "<redacted>" or a short hash) before logging; ensure you update only the
logger.debug invocation in that resolution failure path.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/endpoints/query_v2.py (1)
179-270: ⚠️ Potential issue | 🟠 Major: Avoid logging raw MCP arguments/output content.
The new debug logs print tool arguments and output previews, which can include secrets or PII. Consider redacting values (log only metadata such as sizes/types) or gating this behind an explicit safe-debug flag.
🛡️ Suggested redaction
- logger.debug(" Arguments: %s", args) + logger.debug(" Arguments: <redacted>") ... - logger.debug(" Output preview: %s", output_preview) + logger.debug(" Output preview: <redacted>")src/app/endpoints/query.py (1)
372-379: ⚠️ Potential issue | 🟠 Major: Cache is not updated with generated topic_summary; conversation list queries will show empty summary.

Line 378 passes None for topic_summary to the cache immediately. The background task generates the summary and updates the database via persist_user_conversation_details(), but it never calls cache.set_topic_summary(). Since conversation list endpoints read from the cache (e.g., conversations_v2.py:102), users will see an empty topic_summary even after generation completes. Call configuration.conversation_cache.set_topic_summary() in the background task after generating the summary to keep cache and database in sync.
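A minimal sketch of the suggested cache/database sync; the callable names mirror the comment above (persist_user_conversation_details, set_topic_summary), but the exact signatures are assumptions.

def sync_topic_summary(cache, persist_fn, user_id: str, conversation_id: str, topic_summary: str) -> None:
    """Persist the generated topic summary, then mirror it into the conversation cache."""
    # Database write first, so the cache never points at data that was not stored.
    persist_fn(user_id=user_id, conversation_id=conversation_id, topic_summary=topic_summary)
    if cache is not None:
        cache.set_topic_summary(user_id, conversation_id, topic_summary)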
🤖 Fix all issues with AI agents
In `@dev-tools/mcp-mock-server/server.py`:
- Around line 235-238: The /debug/clear handler currently clears request_log but
leaves last_headers set, causing stale data to persist; update the clear branch
that handles "/debug/clear" to also reset the last_headers variable (and any
similar header-tracking state) when calling request_log.clear(), and ensure the
response still uses
self._send_json_response({"status":"cleared","request_count":0}) so
/debug/headers returns an empty/cleared state after the reset.
In `@src/app/main.py`:
- Line 124: The CORS configuration in main.py sets
allow_methods=cors.allow_headers which passes header names where HTTP methods
are expected; update the call that configures CORS (the parameter allow_methods)
to use cors.allow_methods instead of cors.allow_headers so the allowed HTTP
methods (GET, POST, PUT, DELETE, etc.) are provided to the CORS middleware.
In `@tests/e2e/features/steps/mcp.py`:
- Around line 626-639: The test currently sets
context.auth_headers["Authorization"] which the stack may not forward; update
the configure_mock_server_errors function to instead ensure context.mcp_headers
exists and propagate the error-mode Authorization through it (e.g., set
context.mcp_headers["Authorization"] = "Bearer error-mode"), keep setting
context.expect_tool_errors = True, and remove or avoid relying solely on
context.auth_headers so the mock server receives the MCP-HEADERS error token.
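For the last item, a minimal sketch of the step change, assuming behave's standard decorators and the context attributes named in the prompt (mcp_headers, expect_tool_errors):

from behave import given
from behave.runner import Context


@given("The MCP mock server is configured to return errors")
def configure_mock_server_errors(context: Context) -> None:
    """Send the error-mode token via MCP-HEADERS so the mock fails tool calls."""
    if not hasattr(context, "mcp_headers") or context.mcp_headers is None:
        context.mcp_headers = {}
    # The mock switches to error mode when it sees this token.
    context.mcp_headers["Authorization"] = "Bearer error-mode"
    context.expect_tool_errors = True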
src/app/main.py (Outdated)

         allow_origins=cors.allow_origins,
         allow_credentials=cors.allow_credentials,
-        allow_methods=cors.allow_methods,
+        allow_methods=cors.allow_headers,
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the file around line 124
find . -type f -name "main.py" | grep -E "src/app" | head -5Repository: lightspeed-core/lightspeed-stack
Length of output: 93
🏁 Script executed:
# Read the main.py file to see context around line 124
head -150 src/app/main.py | tail -50Repository: lightspeed-core/lightspeed-stack
Length of output: 1824
🏁 Script executed:
# Also search for CORS configuration to understand the cors object structure
rg -A 5 -B 5 "cors\s*=" src/app/main.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 365
🏁 Script executed:
# Search for CORS configuration class/model definition
rg -A 20 "class.*[Cc]ors" src/Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Look for cors configuration in the configuration module
find . -type f -name "*.py" -path "*/configuration/*" | head -10Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Search for allow_methods in the codebase to understand what's expected
rg "allow_methods" src/Repository: lightspeed-core/lightspeed-stack
Length of output: 322
🏁 Script executed:
# Examine the CORS configuration model
rg -B 5 -A 10 "allow_methods" src/models/config.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 1109
🏁 Script executed:
# Get the full CORS configuration class
rg -B 20 -A 30 "class.*Cors" src/models/config.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Let me see the full CORSConfiguration class definition
rg -B 40 "allow_methods.*Field" src/models/config.py | head -60Repository: lightspeed-core/lightspeed-stack
Length of output: 1538
🌐 Web query:
FastAPI CORSMiddleware allow_methods parameter default behavior HTTP methods
💡 Result:
In FastAPI, CORSMiddleware is Starlette’s CORS middleware, and its allow_methods default is only GET:
- FastAPI reference signature shows allow_methods=("GET",) by default. [1]
- Starlette docs confirm allow_methods defaults to ['GET'], and you can use ['*'] to allow all standard methods. [2]
Practical effect: if your browser sends a CORS preflight (OPTIONS with Access-Control-Request-Method: POST/PUT/...) and that method isn’t in allow_methods, the preflight will fail and the browser won’t send the actual request. To allow all methods, set allow_methods=["*"] (or list the exact methods you need). [3]
Sources:
[1] FastAPI middleware reference (CORSMiddleware signature) (fastapi.tiangolo.com)
[2] Starlette middleware docs (CORSMiddleware defaults) (starlette.io)
[3] FastAPI CORS tutorial (using allow_methods=["*"]) (fastapi.tiangolo.com)
Citations:
- 1: https://fastapi.tiangolo.com/reference/middleware/?utm_source=openai
- 2: https://www.starlette.io/middleware/?utm_source=openai
- 3: https://fastapi.tiangolo.com/bn/tutorial/cors/?utm_source=openai
Fix CORS allow_methods mapping.
Line 124 incorrectly passes cors.allow_headers to the allow_methods parameter. The allow_methods parameter expects HTTP method names (GET, POST, PUT, DELETE, etc.), while allow_headers contains header names (Content-Type, Authorization, etc.). This will cause CORS preflight failures because browsers request allowed methods but receive header names instead, blocking actual requests. Use cors.allow_methods instead.
🔧 Suggested fix
-        allow_methods=cors.allow_headers,
+        allow_methods=cors.allow_methods,
+ allow_methods=cors.allow_methods,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| allow_methods=cors.allow_headers, | |
| allow_methods=cors.allow_methods, |
🤖 Prompt for AI Agents
In `@src/app/main.py` at line 124, The CORS configuration in main.py sets
allow_methods=cors.allow_headers which passes header names where HTTP methods
are expected; update the call that configures CORS (the parameter allow_methods)
to use cors.allow_methods instead of cors.allow_headers so the allowed HTTP
methods (GET, POST, PUT, DELETE, etc.) are provided to the CORS middleware.
a362b01 to 4f35463 (Compare)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 184-244: The code compares scope["path"] (concrete request path)
to app_routes_paths strings, so parameterized routes (e.g. /conversations/{id})
are never matched and using the raw scope path as a metric label risks
cardinality explosion; replace the simple membership check and label usage with
a route match lookup and use the route's template for labels: iterate your
registered routes (instead of comparing path strings), call route.matches(scope)
(or the equivalent match method on your framework's Route object) to find the
matching Route, skip if no match, set a label_path = route.path or
route.template (the template string like "/conversations/{conversation_id}") and
use label_path in metrics.response_duration_seconds.labels(...) and
metrics.rest_api_calls_total.labels(label_path, status_code).inc(); keep
send_wrapper and status capture as-is. Ensure app_routes_paths is the collection
of Route objects (or convert strings to Route objects) so route.matches() and
route.path are available.
🧹 Nitpick comments (2)
tests/integration/endpoints/test_query_v2_integration.py (1)
36-59: Reduce fixed-sleep flakiness in background-task waits.
The hardcoded 0.6s/0.2s sleeps can be brittle under slower CI. Prefer a timeout-based wait on the tracked tasks so tests fail fast if tasks stall.♻️ Proposed refactor
async def wait_for_background_tasks() -> None: """Wait for background tasks to complete. @@ - # Wait for the initial 500ms delay + buffer - await asyncio.sleep(0.6) - - # Wait for any remaining background tasks to complete + # Wait for any remaining background tasks to complete (with timeout) # pylint: disable=import-outside-toplevel from app.endpoints.query import background_tasks_set - if background_tasks_set: - await asyncio.gather(*background_tasks_set, return_exceptions=True) - - # Give thread pool operations extra time to complete - await asyncio.sleep(0.2) + if background_tasks_set: + await asyncio.wait_for( + asyncio.gather(*background_tasks_set, return_exceptions=True), + timeout=2.0, + )tests/unit/app/endpoints/test_streaming_query_v2.py (1)
226-227: Avoid fixed sleep; wait on the tracked cleanup task instead.
This makes the test deterministic and less timing-sensitive.♻️ Proposed refactor
-from app.endpoints.streaming_query_v2 import ( +from app.endpoints.streaming_query_v2 import ( + background_tasks_set, retrieve_response, streaming_query_endpoint_handler_v2, ) @@ - # Wait for background cleanup task to complete (has 0.5s delay) - await asyncio.sleep(0.7) + # Wait for background cleanup task to complete + await asyncio.wait_for( + asyncio.gather(*background_tasks_set, return_exceptions=True), + timeout=2.0, + )
4f35463 to 09c1419 (Compare)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/app/endpoints/streaming_query_v2.py`:
- Around line 85-94: Wrap the asyncio.create_task call in create_background_task
with a try/except and log any exceptions so task creation failures aren't
silent: call asyncio.create_task(coro) inside a try block, add the resulting
task to background_tasks_set and attach
task.add_done_callback(background_tasks_set.discard) in that try, and in the
except Exception as e block log the error (use the module's logger, e.g.,
logger.exception(...) or process_logger.error(...) if present) including the
exception details; do not change the existing background_tasks_set or callback
names.
🧹 Nitpick comments (7)
tests/e2e/features/steps/common.py (1)
39-41: Consider also resetting the expect_tool_errors flag for complete test isolation.

The auth_headers reset is good, but per the past review comment on mcp_tools.feature, the context.expect_tool_errors flag set by "The MCP mock server is configured to return errors" step can also persist across scenarios. For complete test isolation, consider adding:

     # Clear auth headers to ensure clean state between scenarios
     if hasattr(context, "auth_headers"):
         context.auth_headers = {}
+    # Clear MCP error mode flag to ensure clean state between scenarios
+    if hasattr(context, "expect_tool_errors"):
+        context.expect_tool_errors = False

Based on learnings: "In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared."
src/app/endpoints/streaming_query_v2.py (1)
80-94: Consider extracting shared background task infrastructure to a utility module.

The background_tasks_set and create_background_task pattern is duplicated between this file and src/app/endpoints/query.py. Extracting this to a shared utility (e.g., utils/background_tasks.py) would reduce duplication and ensure consistent behavior.

dev-tools/mcp-mock-server/server.py (1)
97-113: Consider using if/elif instead ofmatch Truefor token matching.The
match Truewith case guards is an unconventional pattern. While it works, standardif/elifchains are more idiomatic for this type of string matching:♻️ Alternative with if/elif (optional)
- match True: - case _ if "test-secret-token" in auth_header: - tool_name = "mock_tool_file" - tool_desc = "Mock tool with file-based auth" - case _ if "my-k8s-token" in auth_header: - tool_name = "mock_tool_k8s" - tool_desc = "Mock tool with Kubernetes token" - case _ if "my-client-token" in auth_header: - tool_name = "mock_tool_client" - tool_desc = "Mock tool with client-provided token" - case _ if "error-mode" in auth_header: - tool_name = "mock_tool_error" - tool_desc = "Mock tool configured to return errors" - error_mode = True - case _: - # Default case already set above - pass + if "test-secret-token" in auth_header: + tool_name = "mock_tool_file" + tool_desc = "Mock tool with file-based auth" + elif "my-k8s-token" in auth_header: + tool_name = "mock_tool_k8s" + tool_desc = "Mock tool with Kubernetes token" + elif "my-client-token" in auth_header: + tool_name = "mock_tool_client" + tool_desc = "Mock tool with client-provided token" + elif "error-mode" in auth_header: + tool_name = "mock_tool_error" + tool_desc = "Mock tool configured to return errors" + error_mode = Truetests/integration/endpoints/test_query_v2_integration.py (1)
161-168: Good approach for test database isolation, but kwargs not forwarded.The
mock_to_threadfunction accepts**kwargsbut doesn't pass them tofunc(). Whileasyncio.to_threadtypically doesn't use kwargs for the target function (they're passed as positional args), the signature mismatch could cause subtle issues if kwargs are ever used.♻️ Minor fix to forward kwargs
async def mock_to_thread(func: Any, *args: Any, **kwargs: Any) -> Any: """Run function synchronously instead of in thread pool for tests.""" - return func(*args, **kwargs) + return func(*args)Or if kwargs support is intentional:
async def mock_to_thread(func: Any, *args: Any, **kwargs: Any) -> Any: """Run function synchronously instead of in thread pool for tests.""" return func(*args, **kwargs)src/app/main.py (1)
81-85: Remove commented-out code.

The commented code block for MCP server registration (lines 81-85) should be removed rather than left as comments. If this code path is no longer needed for the Responses API, removing it improves maintainability.
♻️ Remove commented code
- # NOTE: MCP server registration not needed for Responses API - # The Responses API takes inline tool definitions instead of pre-registered toolgroups - # logger.info("Registering MCP servers") - # await register_mcp_servers_async(logger, configuration.configuration) - # get_logger("app.endpoints.handlers") logger.info("App startup complete")tests/e2e/features/steps/mcp.py (2)
831-848: Streaming response verification doesn't parse SSE events.

The step acknowledges that SSE events should be parsed (line 843 comment) but only checks the status code. This means the assertion "streaming response contains tool execution results" isn't actually verified.
💡 Consider implementing SSE parsing
`@then`("The streaming response should contain tool execution results") def check_streaming_response_has_tool_results(context: Context) -> None: assert context.response is not None, "No response received" assert context.response.status_code == 200 # Parse SSE events from streaming response content = context.response.content.decode("utf-8") events = [line for line in content.split("\n") if line.startswith("data:")] # Verify at least one event contains tool-related content tool_events = [e for e in events if "tool" in e.lower()] assert len(tool_events) > 0, "No tool execution events found in streaming response" print(f"✅ Found {len(tool_events)} tool-related events in streaming response") time.sleep(0.5)
78-87: Consider polling instead of fixed sleep for MCP request waits.

The fixed time.sleep(2) at line 86 may be insufficient under load or too long for fast scenarios. Consider polling the /debug/requests endpoint until requests appear or a timeout is reached.

💡 Polling approach example
`@when`("I wait for MCP server to receive requests") def wait_for_mcp_requests(context: Context) -> None: """Wait for MCP server to receive and log requests with polling.""" max_wait = 5 # seconds poll_interval = 0.2 elapsed = 0 while elapsed < max_wait: try: response = requests.get(f"{MOCK_MCP_SERVER_URL}/debug/requests", timeout=2) if response.status_code == 200 and len(response.json()) > 0: print(f"⏱️ MCP server received requests after {elapsed:.1f}s") return except requests.RequestException: pass time.sleep(poll_interval) elapsed += poll_interval print(f"⏱️ Waited {max_wait}s for MCP server (may have no requests)")
09c1419 to ce1403f (Compare)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/app/endpoints/query_v2.py`:
- Around line 179-203: The current logging of MCP tool calls exposes raw
arguments and output (see uses of mcp_call_item, args, content and
logger.debug/logger.warning); update these logs to redact sensitive data by
replacing raw args with a summary (e.g., if args is a dict/loggable mapping, log
keys and per-key lengths or types; if list/tuple, log element counts and types)
and remove or replace detailed output previews with only content length and a
fixed masked indicator (no raw substrings). Apply the same change to the
error/result logging paths (including where mcp_call_item.error is logged) so
only non-sensitive metadata (tool name, server_label, call_id, error
type/message without user data, args keys/counts, output length) are recorded
instead of raw values.
In `@src/app/endpoints/query.py`:
- Around line 415-491: The app currently schedules background tasks via
create_background_task(persist_with_topic_summary()) but the lifespan shutdown
does not wait for background_tasks_set, causing tasks (like
persist_with_topic_summary) to be cancelled before completion; update the
application shutdown in main.py to import background_tasks_set from
app.endpoints.query and await asyncio.gather(*background_tasks_set,
return_exceptions=True) (then call A2AStorageFactory.cleanup()) so all pending
tasks are awaited during shutdown, ensuring persist_with_topic_summary finishes
persistence and topic summary work before process exit (a sketch follows after this list).
In `@tests/e2e/features/mcp_tools.feature`:
- Around line 110-117: In the step function that implements "The system is in
default state" in tests/e2e/features/steps/common.py (the Given step for default
state), reset the MCP mock flag by adding context.expect_tool_errors = False
alongside the existing cleanup of context.auth_headers so test isolation is
preserved; ensure you set this attribute in the same function (the default-state
step) so tests that previously set context.expect_tool_errors = True are not
leaking into subsequent scenarios.
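For the query.py shutdown item above, a minimal sketch of draining the task set; the helper is hypothetical and would be called from the lifespan shutdown in main.py before A2AStorageFactory.cleanup().

import asyncio


async def drain_background_tasks(background_tasks_set: set[asyncio.Task]) -> None:
    """Await any still-pending background tasks so persistence can finish on shutdown."""
    # Snapshot first: done-callbacks discard finished tasks while we iterate.
    pending = list(background_tasks_set)
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)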
🧹 Nitpick comments (6)
docker-compose-library.yaml (1)
69-73: Use exec so Python receives SIGTERM directly.

Without exec, bash stays PID 1 and may not forward signals, which can slow container shutdowns.

Suggested tweak
entrypoint: > /bin/bash -c " echo 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token && - /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py + exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py "tests/unit/app/endpoints/test_streaming_query_v2.py (1)
4-4: Prefer awaiting background tasks over a fixed sleep.
The 0.7s delay can be flaky on slower CI; waiting on the tracked tasks is deterministic.

Suggested tweak
- # Wait for background cleanup task to complete (has 0.5s delay) - await asyncio.sleep(0.7) + # Wait for background cleanup task(s) deterministically + from app.endpoints.streaming_query_v2 import background_tasks_set + + if background_tasks_set: + await asyncio.gather(*list(background_tasks_set), return_exceptions=True)Also applies to: 226-227
tests/integration/endpoints/test_query_v2_integration.py (1)
7-59: Snapshot the background task set before gathering.
background_tasks_set can mutate while unpacking; snapshotting avoids a potential RuntimeError.
Suggested tweak
-    if background_tasks_set:
-        await asyncio.gather(*background_tasks_set, return_exceptions=True)
+    tasks = list(background_tasks_set)
+    if tasks:
+        await asyncio.gather(*tasks, return_exceptions=True)
src/app/endpoints/query.py (2)
59-59: Logger name doesn't follow the __name__ pattern.
Per coding guidelines, modules should use logger = logging.getLogger(__name__). The hardcoded string "app.endpoints.handlers" may cause confusion if the module is renamed or moved.
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
102-106: Use %-style formatting in logger calls instead of f-strings.
Using f-strings in logging eagerly evaluates the format expression even when the log level is disabled. The %-style is more efficient and consistent with the rest of the codebase.
-        logger.debug(
-            f"Background task created, active tasks: {len(background_tasks_set)}"
-        )
+        logger.debug(
+            "Background task created, active tasks: %d", len(background_tasks_set)
+        )
     except Exception as e:
-        logger.error(f"Failed to create background task: {e}", exc_info=True)
+        logger.error("Failed to create background task: %s", e, exc_info=True)
tests/e2e/features/steps/mcp.py (1)
831-845: Streaming response verification doesn't parse SSE events.
The step acknowledges that SSE events should be parsed but only checks the status code. This means "streaming response contains tool execution results" isn't actually verified.
💡 Consider implementing SSE parsing
@then("The streaming response should contain tool execution results")
def check_streaming_response_has_tool_results(context: Context) -> None:
    assert context.response is not None, "No response received"
    assert context.response.status_code == 200

    # Parse SSE events from streaming response
    content = context.response.content.decode("utf-8")
    events = [line for line in content.split("\n") if line.startswith("data:")]

    # Verify at least one event contains tool-related content
    tool_events = [e for e in events if "tool" in e.lower()]
    assert len(tool_events) > 0, "No tool execution events found in streaming response"
    print(f"✅ Found {len(tool_events)} tool-related events in streaming response")
219ac4f to d2923bf (Compare)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/endpoints/test_query_v2_integration.py (1)
1350-1382: ⚠️ Potential issue | 🟡 Minor
Add a wait after the second request in transcript test.
The second DB lookup can race the background persistence task. Add another
wait_for_background_tasks() after response_disabled before querying conversation_disabled.
🔧 Suggested fix
     # Verify response succeeded with transcripts disabled
     assert response_disabled.conversation_id is not None
     assert response_disabled.response is not None

+    await wait_for_background_tasks()
+
     # Verify conversation was still persisted (transcripts are independent)
     conversation_disabled = (
         patch_db_session.query(UserConversation)
         .filter_by(id=response_disabled.conversation_id)
         .first()
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/conversation.py`:
- Around line 25-73: In poll_for_conversation, add an explicit guard at the top
to handle max_retries <= 0 (e.g., raise ValueError or set it to a safe default)
so that the variable response is never referenced before assignment; ensure the
function returns or raises immediately when max_retries is invalid to avoid
UnboundLocalError on the final return.
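For illustration, a short sketch of the guard this prompt asks for; the signature is paraphrased from the step helper it describes, not copied from the file:

import time
import requests


def poll_for_conversation(url, headers, max_retries=10, delay=0.2):
    """Poll a conversation URL until it returns 200 or retries run out."""
    if max_retries <= 0:
        # Guard: without this, `response` below could be referenced unassigned.
        raise ValueError(f"max_retries must be positive, got {max_retries}")
    response = requests.get(url, headers=headers, timeout=10)
    for _ in range(max_retries - 1):
        if response.status_code == 200:
            break
        time.sleep(delay)
        response = requests.get(url, headers=headers, timeout=10)
    return response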
In `@tests/e2e/features/steps/feedback.py`:
- Around line 150-154: The code builds conversation_url by concatenating base
and a path then calling .replace("//","/"), which breaks the URL scheme; instead
construct the path separately from base and join them safely: build the path
using context.api_prefix and context.conversation_id (e.g.,
f"{context.api_prefix}/conversations/{context.conversation_id}" with leading
slashes trimmed) and then prepend base after trimming trailing slashes (or use a
URL-joining utility) to form conversation_url so the scheme (http://) is
preserved and no .replace("//","/") is used.
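A hedged sketch of the safer URL construction; the attribute names follow the prompt, while base_url is an assumed stand-in for however the suite stores the service address:

def build_conversation_url(base_url: str, api_prefix: str, conversation_id: str) -> str:
    """Join base and path without collapsing the scheme's double slash."""
    path = f"{api_prefix.strip('/')}/conversations/{conversation_id}"
    return f"{base_url.rstrip('/')}/{path}"


# build_conversation_url("http://localhost:8080", "/v1", "abc-123")
# -> "http://localhost:8080/v1/conversations/abc-123"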
In `@tests/integration/endpoints/test_query_v2_integration.py`:
- Around line 51-56: background_tasks_set may mutate while being expanded into
asyncio.gather causing a "set changed size during iteration" error; snapshot it
first by converting background_tasks_set to a list (e.g., tasks =
list(background_tasks_set)) and then pass that list to await
asyncio.gather(*tasks, return_exceptions=True) so the set can't change during
iteration; update the code around background_tasks_set usage in
test_query_v2_integration.py accordingly.
🧹 Nitpick comments (5)
tests/e2e/features/steps/conversation.py (1)
251-289: Prefer shared retry/timeout constants for consistency.
Use MAX_RETRIES, INITIAL_RETRY_DELAY, MAX_RETRY_DELAY, and DEFAULT_TIMEOUT instead of literals to keep retry behavior aligned across the suite.
♻️ Proposed refactor
-    max_retries = 10
-    delay = 0.2  # Start with 200ms
+    max_retries = MAX_RETRIES
+    delay = INITIAL_RETRY_DELAY  # Start with configured delay
@@
-        delay = min(delay * 2, 2.0)  # Exponential backoff, cap at 2s
+        delay = min(delay * 2, MAX_RETRY_DELAY)  # Exponential backoff, cap
@@
-        context.response = requests.get(url, headers=headers, timeout=10)
+        context.response = requests.get(url, headers=headers, timeout=DEFAULT_TIMEOUT)
src/app/endpoints/streaming_query_v2.py (1)
80-100: Consider extracting a shared background task utility.
The background_tasks_set and create_background_task pattern is duplicated between this file and query.py. Consider extracting it to a shared utility module (e.g., utils/background_tasks.py) to avoid drift and ensure consistent behavior.
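A rough sketch of what such a shared module could look like; the module path utils/background_tasks.py comes from the suggestion above, and the exact wording of the log messages is an assumption:

"""Track fire-and-forget asyncio tasks so shutdown can await them (sketch)."""

import asyncio
import logging
from collections.abc import Coroutine
from typing import Any

logger = logging.getLogger(__name__)

# Strong references keep scheduled tasks from being garbage-collected mid-flight.
background_tasks_set: set[asyncio.Task] = set()


def create_background_task(coro: Coroutine[Any, Any, Any]) -> None:
    """Schedule a coroutine and track it until completion."""
    task = asyncio.create_task(coro)
    background_tasks_set.add(task)
    # Drop the reference once the task finishes, success or failure.
    task.add_done_callback(background_tasks_set.discard)
    logger.debug("Background task created, active tasks: %d", len(background_tasks_set))

Both endpoints could then import create_background_task from one place, and the lifespan shutdown could gather over the single background_tasks_set.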
102-104: Use % formatting for logger calls instead of f-strings.
Line 103 uses an f-string (f"Background task created...") while other logger calls in this file and streaming_query_v2.py use % formatting (e.g., line 96 in streaming_query_v2.py). Use consistent formatting for better performance (deferred evaluation) and style uniformity.
♻️ Suggested fix
-        logger.debug(
-            f"Background task created, active tasks: {len(background_tasks_set)}"
-        )
+        logger.debug(
+            "Background task created, active tasks: %d", len(background_tasks_set)
+        )
Same applies to line 106:
-        logger.error(f"Failed to create background task: {e}", exc_info=True)
+        logger.error("Failed to create background task: %s", e, exc_info=True)
dev-tools/mcp-mock-server/server.py (1)
97-113: Consider simplifying the token matching logic.
The match True: with case _ if ... guards is unconventional. A more idiomatic approach would be using if/elif chains or matching directly on auth_header:
♻️ Alternative using if/elif
if "test-secret-token" in auth_header:
    tool_name = "mock_tool_file"
    tool_desc = "Mock tool with file-based auth"
elif "my-k8s-token" in auth_header:
    tool_name = "mock_tool_k8s"
    tool_desc = "Mock tool with Kubernetes token"
elif "my-client-token" in auth_header:
    tool_name = "mock_tool_client"
    tool_desc = "Mock tool with client-provided token"
elif "error-mode" in auth_header:
    tool_name = "mock_tool_error"
    tool_desc = "Mock tool configured to return errors"
    error_mode = True
# else: defaults already set above
src/app/main.py (1)
104-112: Bound shutdown waits to avoid hanging deployments.
If any background task stalls, shutdown will block indefinitely. Consider adding a timeout and canceling lingering tasks.
🔧 Suggested fix
-    all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
-    await asyncio.gather(*all_tasks, return_exceptions=True)
-    logger.info("All background tasks completed")
+    all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
+    try:
+        await asyncio.wait_for(
+            asyncio.gather(*all_tasks, return_exceptions=True),
+            timeout=10,
+        )
+        logger.info("All background tasks completed")
+    except asyncio.TimeoutError:
+        logger.warning("Background tasks did not finish before shutdown; cancelling")
+        for task in all_tasks:
+            task.cancel()
+        await asyncio.gather(*all_tasks, return_exceptions=True)
d2923bf to 7a1e2dc (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/e2e/features/steps/feedback.py (1)
116-150: ⚠️ Potential issue | 🟡 Minor
Docstring is misleading — claims polling occurs but it doesn't.
The docstring (lines 118-120) states the function "polls to ensure it's persisted to the database before proceeding," but there is no polling code in the function. The NOTE comment (lines 146-149) explicitly says polling is disabled. This contradiction will confuse future maintainers.
Either remove the polling claim from the docstring or, if polling is intended, implement it.
📝 Proposed fix to align docstring with actual behavior
 def create_conversation_with_user_id(
     context: Context, user_id: Optional[str] = None
 ) -> None:
-    """Create a conversation, optionally with a specific user_id query parameter.
-
-    After creating the conversation, polls to ensure it's persisted to the database
-    before proceeding. This handles the asynchronous background persistence.
-    """
+    """Create a conversation, optionally with a specific user_id query parameter."""
     endpoint = "query"
Then simplify the NOTE comment:
-    # NOTE: Polling disabled for feedback scenarios to avoid errors
-    # Feedback tests typically don't need to wait for background persistence
-    # as they interact with conversations immediately after creation.
-    # If polling is needed, it can be added back with proper error handling.
+    # NOTE: Feedback tests interact with conversations immediately after creation,
+    # so no polling for background persistence is performed here.
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/common.py`:
- Around line 27-45: Fix the duplicated wording in the docstring at the top of
the shown block (the docstring that currently reads "Check the default state
system state."); update it to read "Check the default system state." (or "Check
the default state.") so the sentence is grammatically correct while keeping the
rest of the docstring intact; the change should be made in the docstring
immediately preceding the assert context is not None check in
tests/e2e/features/steps/common.py.
🧹 Nitpick comments (7)
docker-compose.yaml (1)
87-91: Use exec in the entrypoint for proper signal handling.
Running Python as a child of bash can swallow SIGTERM/SIGINT; exec makes the interpreter PID 1 for clean shutdowns.
🔧 Suggested change
   entrypoint: >
     /bin/bash -c "
     echo 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-    /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+    exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
     "
tests/unit/app/endpoints/test_streaming_query_v2.py (1)
226-227: Avoid fixed sleep; await background cleanup deterministically.
Fixed delays can be flaky and slow. Consider awaiting the tracked background tasks instead.
🔧 Suggested change
 from app.endpoints.streaming_query_v2 import (
+    background_tasks_set,
     retrieve_response,
     streaming_query_endpoint_handler_v2,
 )
@@
-    # Wait for background cleanup task to complete (has 0.5s delay)
-    await asyncio.sleep(0.7)
+    # Wait for background cleanup task to complete deterministically
+    tasks = list(background_tasks_set)
+    if tasks:
+        await asyncio.gather(*tasks, return_exceptions=True)
docker-compose-library.yaml (1)
69-73: Use exec in the entrypoint for proper signal handling.
Running Python as a child of bash can swallow SIGTERM/SIGINT; exec makes the interpreter PID 1 for clean shutdowns.
🔧 Suggested change
   entrypoint: >
     /bin/bash -c "
     echo 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-    /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+    exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
     "
src/app/endpoints/streaming_query_v2.py (1)
80-100: Consider centralizing the background-task helper.
It's duplicated with src/app/endpoints/query.py; a shared utility would prevent drift.
tests/e2e/features/steps/conversation.py (1)
257-305: Use module-level constants for retry configuration.
The retry logic here duplicates values from the module constants. Using MAX_RETRIES, INITIAL_RETRY_DELAY, and MAX_RETRY_DELAY would improve consistency and maintainability.
♻️ Suggested refactor
 def check_returned_conversation_id(context: Context) -> None:
     """Check the conversation id in response.

     If the conversation is not found in the list, retries the GET request
     with exponential backoff to handle asynchronous background persistence.
     """
-    max_retries = 10
-    delay = 0.2  # Start with 200ms
+    max_retries = MAX_RETRIES
+    delay = INITIAL_RETRY_DELAY
     for attempt in range(max_retries):
         # ... existing code ...
-        delay = min(delay * 2, 2.0)  # Exponential backoff, cap at 2s
+        delay = min(delay * 2, MAX_RETRY_DELAY)  # Exponential backoff with cap
tests/e2e/features/steps/mcp.py (2)
831-848: Streaming response verification doesn't parse SSE events.
The step only checks status code but doesn't verify tool execution results are actually present in the SSE stream. Consider parsing SSE events to make this assertion meaningful.
💡 Suggested enhancement
@then("The streaming response should contain tool execution results")
def check_streaming_response_has_tool_results(context: Context) -> None:
    assert context.response is not None, "No response received"
    assert context.response.status_code == 200

    # Parse SSE events from streaming response
    content = context.response.content.decode("utf-8")
    events = [line for line in content.split("\n") if line.startswith("data:")]

    # Verify at least one event contains tool-related content
    tool_events = [e for e in events if "tool" in e.lower()]
    assert len(tool_events) > 0, "No tool execution events found in streaming response"
    print(f"✅ Found {len(tool_events)} tool-related events in streaming response")

    # Cleanup delay
    time.sleep(0.5)
78-87: Consider making wait time configurable or using polling.
The fixed time.sleep(2) may be too short under load or too long for fast runs. Consider polling the debug endpoint until requests appear or making the timeout configurable.
tests/e2e/features/steps/common.py
Outdated
    """Check the default state system state.

    Ensure the Behave test context is present for steps that assume the system
    is in its default state.

    Note: Does NOT clear auth headers, as those may be set in Background section
    and should persist for the entire scenario. Auth headers are automatically
    reset by Behave between scenarios.

    Parameters:
        context (Context): Behave Context instance used to store and share test state.

    Raises:
        AssertionError: If `context` is None.
    """
    assert context is not None
    # Reset MCP error expectation flag for test isolation
    if hasattr(context, "expect_tool_errors"):
        context.expect_tool_errors = False
Fix the docstring typo (“default state system state”).
Minor wording duplication in the docstring.
🔧 Suggested change
- """Check the default state system state.
+ """Check the default system state.
🤖 Prompt for AI Agents
In `@tests/e2e/features/steps/common.py` around lines 27 - 45, Fix the duplicated
wording in the docstring at the top of the shown block (the docstring that
currently reads "Check the default state system state."); update it to read
"Check the default system state." (or "Check the default state.") so the
sentence is grammatically correct while keeping the rest of the docstring
intact; the change should be made in the docstring immediately preceding the
assert context is not None check in tests/e2e/features/steps/common.py.
ee1cec2 to d516d49 (Compare)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docker-compose-library.yaml`:
- Around line 69-73: In the entrypoint shell snippet, the test token is written
with echo which appends a trailing newline; replace the write command to use
printf without a newline (e.g., printf '%s' 'test-secret-token-123' >
/tmp/lightspeed-mcp-test-token) so the file contains the exact token, then
continue launching /app-root/.venv/bin/python3.12
/app-root/src/lightspeed_stack.py; update the entrypoint command that currently
uses echo to use printf '%s' instead.
🧹 Nitpick comments (2)
tests/e2e/features/steps/mcp.py (1)
646-661: Add an assertion for error indicators in the response.
The step name promises failure verification, but it currently doesn't validate any error signal.
🧪 Suggested improvement
     assert context.response is not None, "No response received"
     assert (
         context.response.status_code == 200
     ), f"Bad status: {context.response.status_code}"
-
-    # In error mode, the response should still be 200 but contain error information
-    # The LLM will handle the tool error gracefully
-    print("✅ Response received (tool errors are handled gracefully by LLM)")
+    response_data = context.response.json()
+    response_text = json.dumps(response_data).lower()
+    assert (
+        "error" in response_text or "iserror" in response_text
+    ), "Response did not surface tool error information"
+    print("✅ Response indicates tool execution failure")
src/app/endpoints/streaming_query_v2.py (1)
337-369: Include stack traces in background cleanup failures.
This keeps post-mortems actionable when async cleanup fails.
🔧 Suggested tweak
-        except Exception as e:  # pylint: disable=broad-exception-caught
-            logger.error("Background cleanup failed: %s", e)
+        except Exception as e:  # pylint: disable=broad-exception-caught
+            logger.error("Background cleanup failed: %s", e, exc_info=True)
d516d49 to 71c4a6a (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/e2e/features/steps/feedback.py (1)
116-162: ⚠️ Potential issue | 🟡 Minor
Fail fast when persistence polling fails.
Line 156–161 only logs a warning on non-200 poll results, then continues. This can mask the root cause and push failures into later steps. Prefer failing immediately after retries to keep tests deterministic and errors actionable.
🔧 Suggested fix
     poll_response = poll_for_conversation(conversation_url, headers)
-    if poll_response.status_code != 200:
-        print(
-            f"⚠️ Warning: Conversation {context.conversation_id} not found after creation "
-            f"(status: {poll_response.status_code})"
-        )
+    assert poll_response.status_code == 200, (
+        f"Conversation {context.conversation_id} not found after creation "
+        f"(status: {poll_response.status_code})"
+    )
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/conversation.py`:
- Around line 312-365: The loop assumes context.response contains a 200 JSON
with "conversations"; add an explicit check right after response_json =
context.response.json() (or better, before parsing) to validate
context.response.status_code == 200 and that "conversations" is in the parsed
body. If the status is non-200 or the key is missing, raise/assert with a clear
message that includes context.response.status_code and the raw response text (or
response_json) so the test fails with the real error; keep using the same retry
flow (re-fetch via the existing endpoint/url/headers logic) when status is
non-200 so transient errors are retried, but on final attempt surface the
detailed error mentioning context.response_data["conversation_id"] and the last
response payload.
🧹 Nitpick comments (2)
dev-tools/mcp-mock-server/server.py (1)
97-113: Consider using an if-elif chain instead of the match True pattern.
The match True with case _ if condition: is an unconventional pattern in Python. While functional, a standard if-elif chain would be more idiomatic and readable:
♻️ Suggested refactor
-    match True:
-        case _ if "test-secret-token" in auth_header:
-            tool_name = "mock_tool_file"
-            tool_desc = "Mock tool with file-based auth"
-        case _ if "my-k8s-token" in auth_header:
-            tool_name = "mock_tool_k8s"
-            tool_desc = "Mock tool with Kubernetes token"
-        case _ if "my-client-token" in auth_header:
-            tool_name = "mock_tool_client"
-            tool_desc = "Mock tool with client-provided token"
-        case _ if "error-mode" in auth_header:
-            tool_name = "mock_tool_error"
-            tool_desc = "Mock tool configured to return errors"
-            error_mode = True
-        case _:
-            # Default case already set above
-            pass
+    if "test-secret-token" in auth_header:
+        tool_name = "mock_tool_file"
+        tool_desc = "Mock tool with file-based auth"
+    elif "my-k8s-token" in auth_header:
+        tool_name = "mock_tool_k8s"
+        tool_desc = "Mock tool with Kubernetes token"
+    elif "my-client-token" in auth_header:
+        tool_name = "mock_tool_client"
+        tool_desc = "Mock tool with client-provided token"
+    elif "error-mode" in auth_header:
+        tool_name = "mock_tool_error"
+        tool_desc = "Mock tool configured to return errors"
+        error_mode = True
+    # else: defaults already set above
src/app/endpoints/streaming_query_v2.py (1)
337-369: Include stack traces for background cleanup failures.
Without exc_info=True (or logger.exception), failures in cleanup lose useful diagnostics.
🔧 Suggested tweak
-        except Exception as e:  # pylint: disable=broad-exception-caught
-            logger.error("Background cleanup failed: %s", e)
+        except Exception as e:  # pylint: disable=broad-exception-caught
+            logger.error("Background cleanup failed: %s", e, exc_info=True)
71c4a6a to 4321952 (Compare)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 319-334: The global ASGI exception middleware currently maps all
exceptions to a 500 InternalServerErrorResponse; update the try/except in
main.py to explicitly handle APIConnectionError first (before the broad
Exception catch) by logging the connection error and returning
ServiceUnavailableResponse.generic() when response_started is False (and still
logging the "response already started" case using response_started and logger),
otherwise keep the existing fallback behavior that uses
InternalServerErrorResponse.generic() for other exceptions; reference
APIConnectionError, ServiceUnavailableResponse, InternalServerErrorResponse,
response_started, and logger to locate and implement the change.
🧹 Nitpick comments (5)
tests/e2e/features/steps/llm_query_response.py (1)
60-66: Avoid hard-coded 1s sleep; make the wait configurable/adaptive.
A fixed delay can be too short on slow CI (flaky) or too long on fast runs (adds latency). Prefer polling with a timeout, or at least make the delay configurable via env/context so CI can tune it.
♻️ Suggested diff (configurable delay)
-import json
-import time
+import json
+import os
+import time
@@
 DEFAULT_LLM_TIMEOUT = 60
+PERSIST_WAIT_SECONDS = float(os.getenv("E2E_PERSIST_WAIT_SECONDS", "1"))
@@
-    time.sleep(1)
+    time.sleep(PERSIST_WAIT_SECONDS)
src/app/main.py (2)
104-112: Surface background task failures during shutdown.
asyncio.gather(..., return_exceptions=True) will swallow failures unless you log them.
🔧 Example
-    all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
-    await asyncio.gather(*all_tasks, return_exceptions=True)
+    all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks)
+    results = await asyncio.gather(*all_tasks, return_exceptions=True)
+    for result in results:
+        if isinstance(result, BaseException):
+            logger.error("Background task failed during shutdown", exc_info=result)
207-212: Align new docstrings with Google style ("Args/Returns").
Use Args: instead of Parameters: in the new docstrings for consistency with Google conventions.
🔧 Example
-    Parameters:
-        app: The ASGI application instance to wrap
-        app_routes_paths: List of route paths to monitor (others ignored)
+    Args:
+        app: The ASGI application instance to wrap
+        app_routes_paths: List of route paths to monitor (others ignored)
As per coding guidelines Follow Google Python docstring conventions including Parameters, Returns, Raises, and Attributes sections as needed.
tests/e2e/features/steps/conversation.py (1)
317-361: Reuse retry constants for consistency.
This block redefines backoff values and timeout; consider using MAX_RETRIES, INITIAL_RETRY_DELAY, and DEFAULT_TIMEOUT so tuning stays centralized.
🔧 Example
-    max_retries = 10
-    delay = 0.2  # Start with 200ms
+    max_retries = MAX_RETRIES
+    delay = INITIAL_RETRY_DELAY
@@
-    context.response = requests.get(url, headers=headers, timeout=10)
+    context.response = requests.get(url, headers=headers, timeout=DEFAULT_TIMEOUT)
docker-compose.yaml (1)
87-91: Use exec in entrypoint to preserve signal handling.
Without exec, bash stays PID1 and may not forward SIGTERM to the Python process.
🔧 Suggested tweak
   entrypoint: >
     /bin/bash -c "
     printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
-    /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+    exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
     "
58a07bd to 116f3b5 (Compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/e2e/features/steps/feedback.py (1)
116-162: ⚠️ Potential issue | 🟡 Minor
Fail fast if conversation persistence never completes. The current warning allows tests to proceed and fail later with less context. Consider asserting so the failure is immediate and actionable.
🔧 Suggested fix (fail fast)
-    poll_response = poll_for_conversation(conversation_url, headers)
-    if poll_response.status_code != 200:
-        print(
-            f"⚠️ Warning: Conversation {context.conversation_id} not found after creation "
-            f"(status: {poll_response.status_code})"
-        )
+    poll_response = poll_for_conversation(conversation_url, headers)
+    if poll_response.status_code != 200:
+        raise AssertionError(
+            "Conversation persistence failed after creation "
+            f"(status: {poll_response.status_code}, body: {poll_response.text})"
+        )
🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 322-349: The 503 handler currently sends the full
ServiceUnavailableResponse (including status_code) which differs from the 500
response schema; change the body sent to match the 500 format by sending only
the detail wrapper. Specifically, after creating error_response
(ServiceUnavailableResponse) in the except APIConnectionError block, build the
payload as {"detail": error_response.detail} (or equivalent from error_response)
and send that JSON via send(...) instead of error_response.model_dump_json(), so
the response schema matches the 500 error format.
In `@tests/e2e/features/info.feature`:
- Line 19: Update the test assertion string that currently reads "And The body
of the response has llama-stack version 0.4.2" to expect "llama-stack version
0.4.3" so it matches the pinned dependency in pyproject.toml; locate and modify
the step definition or feature line containing that exact phrase to use 0.4.3.
🧹 Nitpick comments (4)
tests/e2e/configuration/library-mode/lightspeed-stack-mcp.yaml (1)
54-54: Add a trailing newline at end of file.
The file is missing a final newline, which is a common convention for text files and helps with certain tools (e.g., cat, diff utilities).
src/app/endpoints/query.py (2)
87-97: Consider using a more specific type hint for the coroutine parameter.
The coro parameter is typed as Any, but could be more precisely typed as Coroutine[Any, Any, Any] for better type checking.
♻️ Proposed fix
+from typing import Annotated, Any, Coroutine, Optional
-from typing import Annotated, Any, Optional

-def create_background_task(coro: Any) -> None:
+def create_background_task(coro: Coroutine[Any, Any, Any]) -> None:
415-433: Consider making the MCP cleanup delay configurable or documenting the rationale.
The 500ms delay is a reasonable pragmatic solution, but it's a magic number. Consider either:
- Extracting it to a named constant with documentation
- Making it configurable for different environments
This is a minor suggestion since the current implementation works.
♻️ Proposed constant extraction
+# Delay before making LLM calls in background tasks to allow MCP session cleanup
+MCP_CLEANUP_DELAY_SECONDS = 0.5

 async def persist_with_topic_summary() -> None:
     ...
     logger.debug("Background task: waiting for MCP cleanup")
-    await asyncio.sleep(0.5)  # 500ms should be enough for cleanup
+    await asyncio.sleep(MCP_CLEANUP_DELAY_SECONDS)
     logger.debug("Background task: MCP cleanup complete")
tests/e2e/features/mcp_tools.feature (1)
66-76: Consider documenting the magic number in the assertion.
The assertion "should have received at least 6 requests" uses a specific count. While this works, if the test configuration changes (e.g., adding/removing MCP servers), this number may become incorrect.
Consider adding a comment or using a more descriptive step that explains the expected breakdown (e.g., 2 requests per server × 3 servers = 6 requests minimum).
    except APIConnectionError as exc:
        # Llama Stack connection failure - return 503 so clients know to retry
        logger.error("Llama Stack connection error in middleware: %s", exc)

        if response_started:
            logger.error(
                "Cannot send 503 response - response already started (likely streaming)"
            )
            return

        # Return 503 Service Unavailable for Llama Stack connection issues
        error_response = ServiceUnavailableResponse(
            backend_name="Llama Stack", cause=str(exc)
        )

    Returns:
        Response: The HTTP response produced by the next handler.
    """
    path = request.url.path
    logger.debug("Received request for path: %s", path)

    # ignore paths that are not part of the app routes
    if path not in app_routes_paths:
        return await call_next(request)

    logger.debug("Processing API request for path: %s", path)

    # measure time to handle duration + update histogram
    with metrics.response_duration_seconds.labels(path).time():
        response = await call_next(request)

    # ignore /metrics endpoint that will be called periodically
    if not path.endswith("/metrics"):
        # just update metrics
        metrics.rest_api_calls_total.labels(path, response.status_code).inc()
    return response


@app.middleware("http")
async def global_exception_middleware(
    request: Request, call_next: Callable[[Request], Awaitable[Response]]
) -> Response:
    """Middleware to handle uncaught exceptions from all endpoints."""
    try:
        response = await call_next(request)
        return response
    except HTTPException:
        raise
    except Exception as exc:  # pylint: disable=broad-exception-caught
        logger.exception("Uncaught exception in endpoint: %s", exc)
        error_response = InternalServerErrorResponse.generic()
        return JSONResponse(
            status_code=error_response.status_code,
            content={"detail": error_response.detail.model_dump()},
        )
    await send(
        {
            "type": "http.response.start",
            "status": error_response.status_code,
            "headers": [[b"content-type", b"application/json"]],
        }
    )
    await send(
        {
            "type": "http.response.body",
            "body": error_response.model_dump_json().encode("utf-8"),
        }
    )
🧩 Analysis chain
🏁 Script executed:
cd /tmp && find . -name "main.py" -path "*/app/*" 2>/dev/null | head -5Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# First, let's look at the file structure
git ls-files | grep -E "(main\.py|response|error)" | head -20Repository: lightspeed-core/lightspeed-stack
Length of output: 771
🏁 Script executed:
# Check if src/app/main.py exists and get its size
wc -l src/app/main.py 2>/dev/null || echo "File not found at src/app/main.py"Repository: lightspeed-core/lightspeed-stack
Length of output: 95
🏁 Script executed:
# Let's search for error handling patterns in the codebase
rg "APIConnectionError|500|ServiceUnavailableResponse" -A 5 -B 2 --no-heading | head -100Repository: lightspeed-core/lightspeed-stack
Length of output: 19631
🏁 Script executed:
# Read the main.py file focusing on the error handling section
sed -n '300,370p' src/app/main.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 3089
🏁 Script executed:
# Search for 500 error handling and other error responses in main.py
rg "500|APIConnectionError|except.*Error" src/app/main.py -B 2 -A 8Repository: lightspeed-core/lightspeed-stack
Length of output: 2035
🏁 Script executed:
# Read the responses.py file to understand the response structures
head -150 src/models/responses.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 5278
🏁 Script executed:
# Continue reading main.py to see the 500 error handling
sed -n '350,380p' src/app/main.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 1465
🏁 Script executed:
# Search for InternalServerErrorResponse and ServiceUnavailableResponse definitions
rg "class.*ServerError|class.*ServiceUnavailable" src/models/responses.py -A 15Repository: lightspeed-core/lightspeed-stack
Length of output: 1186
🏁 Script executed:
# Look for any ErrorResponse or base error response models
rg "class.*Error.*Response|class.*Response.*Error" src/models/responses.py -A 10Repository: lightspeed-core/lightspeed-stack
Length of output: 4027
🏁 Script executed:
# Find the DetailModel class definition
rg "class DetailModel" src/models/responses.py -A 10Repository: lightspeed-core/lightspeed-stack
Length of output: 406
🏁 Script executed:
# Check how ServiceUnavailableResponse is instantiated to understand its constructor
rg "ServiceUnavailableResponse" src/app/main.py -B 2 -A 5Repository: lightspeed-core/lightspeed-stack
Length of output: 667
🏁 Script executed:
# Check if there are any other tests or examples showing expected response formats
rg "ServiceUnavailableResponse|InternalServerErrorResponse" tests/ -A 3 | head -40Repository: lightspeed-core/lightspeed-stack
Length of output: 3753
🏁 Script executed:
# Check the ServiceUnavailableResponse test cases more fully
rg "ServiceUnavailableResponse" tests/unit/models/responses/test_error_responses.py -A 8Repository: lightspeed-core/lightspeed-stack
Length of output: 2580
🏁 Script executed:
# Look at the ServiceUnavailableResponse class definition more completely
rg "class ServiceUnavailableResponse" src/models/responses.py -A 30Repository: lightspeed-core/lightspeed-stack
Length of output: 1214
🏁 Script executed:
# Check if there's a factory method or __init__ for ServiceUnavailableResponse
ast-grep --pattern 'class ServiceUnavailableResponse {
$$$
}'Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Look for any factory methods in error response classes
rg "@classmethod|def generic|def " src/models/responses.py | grep -A 2 -B 2 "ServiceUnavailable\|InternalServerError"Repository: lightspeed-core/lightspeed-stack
Length of output: 765
🏁 Script executed:
# Read the complete ServiceUnavailableResponse __init__ to see the full constructor
sed -n '1,500p' src/models/responses.py | grep -A 30 "class ServiceUnavailableResponse"Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check AbstractErrorResponse's __init__ to understand how detail is constructed
rg "class AbstractErrorResponse" src/models/responses.py -A 25 | head -40Repository: lightspeed-core/lightspeed-stack
Length of output: 962
🏁 Script executed:
# Read the actual AbstractErrorResponse __init__ method
rg "class AbstractErrorResponse" src/models/responses.py -B 5 -A 50 | grep -A 20 "def __init__"Repository: lightspeed-core/lightspeed-stack
Length of output: 913
🏁 Script executed:
# Verify what the actual serialized outputs look like by examining model_dump behavior
python3 << 'EOF'
# Simulate the response models
class DetailModel:
def __init__(self, response: str, cause: str):
self.response = response
self.cause = cause
def model_dump(self):
return {"response": self.response, "cause": self.cause}
class ServiceUnavailableResponse:
def __init__(self, backend_name: str, cause: str):
self.status_code = 503
self.detail = DetailModel(f"Unable to connect to {backend_name}", cause)
def model_dump_json(self):
import json
return json.dumps({
"status_code": self.status_code,
"detail": self.detail.model_dump()
})
# Test current 503 response
resp_503 = ServiceUnavailableResponse("Llama Stack", "Connection timeout")
print("Current 503 response body:")
print(resp_503.model_dump_json())
# Test what the 500 response format would be (from code snippet)
print("\n500 response format (from main.py):")
print('{"detail": {"response": "Internal server error", "cause": "..."}}')
# Test proposed fix format
print("\nProposed 503 fix format:")
import json
content = {"detail": resp_503.detail.model_dump()}
print(json.dumps(content))
EOFRepository: lightspeed-core/lightspeed-stack
Length of output: 435
🏁 Script executed:
# Double-check what JSONResponse returns
python3 << 'EOF'
from fastapi.responses import JSONResponse
import json
# Simulate response detail
response_detail = {"response": "Unable to connect to Llama Stack", "cause": "Connection timeout"}
# Test the suggested fix approach
json_response = JSONResponse(content={"detail": response_detail})
print("JSONResponse body type:", type(json_response.body))
print("JSONResponse body value:", json_response.body)
# Verify it matches direct JSON encoding
direct_encoding = json.dumps({"detail": response_detail}).encode("utf-8")
print("\nDirect encoding matches:", json_response.body == direct_encoding)
EOFRepository: lightspeed-core/lightspeed-stack
Length of output: 195
🏁 Script executed:
# Let's verify the actual current output one more time by reading the exact code
sed -n '322,349p' src/app/main.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 1147
Align 503 response schema with 500 response structure. The 503 response currently sends {"status_code": 503, "detail": {...}} while the 500 response sends only {"detail": {...}}, creating an inconsistent error schema. Fix by wrapping the detail object:
Suggested fix
- await send(
- {
- "type": "http.response.body",
- "body": error_response.model_dump_json().encode("utf-8"),
- }
- )
+ await send(
+ {
+ "type": "http.response.body",
+ "body": JSONResponse(
+ content={"detail": error_response.detail.model_dump()}
+ ).body,
+ }
+    )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
        except APIConnectionError as exc:
            # Llama Stack connection failure - return 503 so clients know to retry
            logger.error("Llama Stack connection error in middleware: %s", exc)
            if response_started:
                logger.error(
                    "Cannot send 503 response - response already started (likely streaming)"
                )
                return
            # Return 503 Service Unavailable for Llama Stack connection issues
            error_response = ServiceUnavailableResponse(
                backend_name="Llama Stack", cause=str(exc)
            )
    Returns:
        Response: The HTTP response produced by the next handler.
    """
    path = request.url.path
    logger.debug("Received request for path: %s", path)
    # ignore paths that are not part of the app routes
    if path not in app_routes_paths:
        return await call_next(request)
    logger.debug("Processing API request for path: %s", path)
    # measure time to handle duration + update histogram
    with metrics.response_duration_seconds.labels(path).time():
        response = await call_next(request)
    # ignore /metrics endpoint that will be called periodically
    if not path.endswith("/metrics"):
        # just update metrics
        metrics.rest_api_calls_total.labels(path, response.status_code).inc()
    return response


@app.middleware("http")
async def global_exception_middleware(
    request: Request, call_next: Callable[[Request], Awaitable[Response]]
) -> Response:
    """Middleware to handle uncaught exceptions from all endpoints."""
    try:
        response = await call_next(request)
        return response
    except HTTPException:
        raise
    except Exception as exc:  # pylint: disable=broad-exception-caught
        logger.exception("Uncaught exception in endpoint: %s", exc)
        error_response = InternalServerErrorResponse.generic()
        return JSONResponse(
            status_code=error_response.status_code,
            content={"detail": error_response.detail.model_dump()},
        )
    await send(
        {
            "type": "http.response.start",
            "status": error_response.status_code,
            "headers": [[b"content-type", b"application/json"]],
        }
    )
    await send(
        {
            "type": "http.response.body",
            "body": error_response.model_dump_json().encode("utf-8"),
        }
    )
        except APIConnectionError as exc:
            # Llama Stack connection failure - return 503 so clients know to retry
            logger.error("Llama Stack connection error in middleware: %s", exc)
            if response_started:
                logger.error(
                    "Cannot send 503 response - response already started (likely streaming)"
                )
                return
            # Return 503 Service Unavailable for Llama Stack connection issues
            error_response = ServiceUnavailableResponse(
                backend_name="Llama Stack", cause=str(exc)
            )
            await send(
                {
                    "type": "http.response.start",
                    "status": error_response.status_code,
                    "headers": [[b"content-type", b"application/json"]],
                }
            )
            await send(
                {
                    "type": "http.response.body",
                    "body": json.dumps({"detail": error_response.detail.model_dump()}).encode("utf-8"),
                }
            )
🤖 Prompt for AI Agents
In `@src/app/main.py` around lines 322 - 349, The 503 handler currently sends the
full ServiceUnavailableResponse (including status_code) which differs from the
500 response schema; change the body sent to match the 500 format by sending
only the detail wrapper. Specifically, after creating error_response
(ServiceUnavailableResponse) in the except APIConnectionError block, build the
payload as {"detail": error_response.detail} (or equivalent from error_response)
and send that JSON via send(...) instead of error_response.model_dump_json(), so
the response schema matches the 500 error format.
tests/e2e/features/info.feature
Outdated
    Then The status code of the response is 200
    And The body of the response has proper name Lightspeed Core Service (LCS) and version 0.4.0
    And The body of the response has llama-stack version 0.4.3
    And The body of the response has llama-stack version 0.4.2
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for llama-stack version references in config, dependency, and Docker files
echo "=== Searching for version references (0.4.x) ==="
rg "0\.4\.[0-9]" --type yaml --type toml --type txt -g '!*.lock' -g '!node_modules' 2>/dev/null || true
echo -e "\n=== Searching for llama-stack in requirements/dependencies ==="
fd -e "requirements.txt" -e "setup.py" -e "setup.cfg" -e "pyproject.toml" -e "poetry.lock" | head -20
echo -e "\n=== Searching for Dockerfile or docker-compose ==="
fd -e "Dockerfile*" -e "docker-compose*" | head -10
echo -e "\n=== Searching for all occurrences of 'llama-stack' in feature files ==="
rg "llama-stack" tests/e2e/features/ 2>/dev/null || trueRepository: lightspeed-core/lightspeed-stack
Length of output: 4675
🏁 Script executed:
#!/bin/bash
# Check the feature file context around line 19
echo "=== Content of info.feature around the changed line ==="
cat -n tests/e2e/features/info.feature | head -30
echo -e "\n=== Search for version expectations in all feature files ==="
rg -n "version" tests/e2e/features/ --type txt 2>/dev/null || trueRepository: lightspeed-core/lightspeed-stack
Length of output: 1650
🏁 Script executed:
#!/bin/bash
# Check git history to understand the context of this change
echo "=== Recent commits touching this file ==="
git log --oneline -n 10 -- tests/e2e/features/info.feature 2>/dev/null || echo "Git history not available"
echo -e "\n=== Check if there's a version file or environment configuration ==="
fd "version" -e "txt" -e "py" -e "yaml" -e "env" | grep -E "(version|config|env)" | head -20Repository: lightspeed-core/lightspeed-stack
Length of output: 371
🏁 Script executed:
#!/bin/bash
echo "=== Check llama_stack_version.py utility ==="
cat -n src/utils/llama_stack_version.py | head -50
echo -e "\n=== Check version.py ==="
cat -n src/version.py
echo -e "\n=== Check the test step implementation for version checking ==="
cat -n tests/e2e/features/steps/info.py | head -40Repository: lightspeed-core/lightspeed-stack
Length of output: 4347
Fix the llama-stack version mismatch in the test assertion.
The test expects llama-stack version 0.4.2, but pyproject.toml pins llama-stack==0.4.3. This mismatch will cause the e2e test to fail when run. The assertion on line 19 should expect 0.4.3 to match the pinned dependency.
🤖 Prompt for AI Agents
In `@tests/e2e/features/info.feature` at line 19, Update the test assertion string
that currently reads "And The body of the response has llama-stack version
0.4.2" to expect "llama-stack version 0.4.3" so it matches the pinned dependency
in pyproject.toml; locate and modify the step definition or feature line
containing that exact phrase to use 0.4.3.
116f3b5 to e77d05d (Compare)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docker-compose-library.yaml`:
- Around line 69-73: The entrypoint currently starts a bash shell that writes
the token and launches Python as a child; replace that with a shell sequence
that tightens token permissions and then uses exec to replace PID 1 with the
Python process: set umask 077 before creating the token (or explicitly chmod the
token to 0600 after printf to /tmp/lightspeed-mcp-test-token) and then exec
/app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py so the Python
process becomes PID 1 and receives signals correctly; update the entrypoint
block that uses /bin/bash -c and printf accordingly.
In `@docker-compose.yaml`:
- Around line 87-91: The entrypoint should drop into the Python process as PID 1
and create the token file with owner-only permissions; update the entrypoint
snippet that writes the token and launches /app-root/.venv/bin/python3.12
/app-root/src/lightspeed_stack.py to first set umask 077 (so the token file is
created with owner-only perms) and then use exec to replace the shell with the
Python process (so the Python process becomes PID 1 and receives signals);
locate the multline entrypoint block that prints 'test-secret-token-123' into
/tmp/lightspeed-mcp-test-token and modify it accordingly.
In `@src/app/endpoints/query_v2.py`:
- Around line 261-268: The log prints the full args dict for MCP approval
requests (logger.debug(" Arguments: %s", args)), which may leak sensitive data;
change this to log a redacted version of args using the same redaction used for
mcp_call (apply the existing redaction helper used around mcp_call at line ~186)
and log that redacted payload instead, referencing
approval_request_item.name/server_label and the redacted args so raw user
data/secrets are never emitted.
🧹 Nitpick comments (4)
tests/e2e/features/steps/llm_query_response.py (1)
63-68: Avoid a hardcoded 3s sleep; make the wait configurable (or poll with a deadline).
A fixed sleep can be too short in slower CI (flaky) and too long when persistence is fast. Consider a configurable wait (via behave userdata) or polling for readiness with a max timeout.
💡 Example tweak (configurable wait)
-    time.sleep(3)
+    wait_s = float(context.config.userdata.get("persistence_wait_s", 3))
+    time.sleep(wait_s)
src/app/endpoints/query.py (1)
430-433: Hardcoded delay for MCP cleanup is fragile.
The 500ms sleep is a best-effort heuristic. Under high load or slow network conditions, MCP cleanup might not complete in time. Consider making this configurable or implementing a more robust synchronization mechanism if issues arise in production.
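If the fixed delay ever becomes a problem, an explicit signal is one option; this is only a sketch under the assumption that the MCP cleanup path could set an event it owns (neither name exists in the project today):

import asyncio

# Hypothetical: set by whatever code performs MCP session cleanup.
mcp_cleanup_done = asyncio.Event()


async def wait_for_mcp_cleanup(timeout: float = 2.0) -> None:
    """Wait for cleanup to signal completion, bounded by a timeout."""
    try:
        await asyncio.wait_for(mcp_cleanup_done.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        # Fall back instead of blocking the background task forever.
        pass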
tests/e2e/features/steps/mcp.py (2)
553-613: Multiple verification steps only check status code.
Steps like check_response_has_tool_results, check_successful_tool_execution, check_response_has_tool_info, and check_tool_results_in_response all make claims about tool execution but only verify that a response key exists. This makes tests pass even when tools weren't actually used.
Consider strengthening these assertions to check for tool-specific fields (tool_calls, tool_results) when they mature beyond initial coverage.
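When that strengthening happens, the check could look roughly like this; the tool_calls/tool_results field names are assumptions about the response payload shape, not the current API contract:

def assert_tool_activity(response_data: dict) -> None:
    """Fail unless the payload actually records tool usage."""
    tool_calls = response_data.get("tool_calls") or []
    tool_results = response_data.get("tool_results") or []
    assert tool_calls or tool_results, (
        f"Expected tool activity in response, got keys: {sorted(response_data.keys())}"
    )


# Usage inside a step: assert_tool_activity(context.response.json())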
10-11: Consider making mock server URL configurable.
MOCK_MCP_SERVER_URL is hardcoded. If the mock server runs on a different port or host in CI environments, tests will fail. Consider using an environment variable with a default:
MOCK_MCP_SERVER_URL = os.environ.get("MCP_MOCK_SERVER_URL", "http://localhost:9000")
  entrypoint: >
    /bin/bash -c "
    printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
    /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
    "
🧩 Analysis chain
🏁 Script executed:
find . -name "docker-compose-library.yaml" -o -name "*compose*.yaml" | head -20Repository: lightspeed-core/lightspeed-stack
Length of output: 127
🏁 Script executed:
sed -n '60,80p' ./docker-compose-library.yamlRepository: lightspeed-core/lightspeed-stack
Length of output: 1078
🌐 Web query:
bash PID1 signal handling Docker entrypoint
💡 Result:
When your Docker container’s PID 1 is bash (or /bin/sh -c), signal handling often surprises you:
- PID 1 is special in Linux namespaces: if PID 1 doesn’t install handlers, some signals (notably
SIGTERM/SIGINT) won’t have their normal “default action” of terminating the process. Sodocker stopmay hang until Docker sendsSIGKILL. [1] - If you use shell-form
CMD/ENTRYPOINT(string form), Docker effectively runs your command under/bin/sh -c, and signals may not reach your real app (your app isn’t PID 1). Docker explicitly warns that shell-formENTRYPOINT“does not pass signals”. [2]
Best practices for Docker entrypoints
- Prefer exec-form
ENTRYPOINT/CMD(JSON array)
ENTRYPOINT ["/usr/local/bin/my-app"]
# or
CMD ["/usr/local/bin/my-app"]This avoids /bin/sh -c wrappers and lets your app receive signals directly. [2]
- If you must use a shell script entrypoint,
execthe long-running process
#!/bin/sh
set -e
# setup...
exec my-app "$@"This replaces the shell with your app, so the app becomes PID 1 and receives SIGTERM. Docker’s own docs call out using exec for clean docker stop behavior. [2]
- Use a minimal init as PID 1 (recommended):
--init(tini) or bake it in
- Docker CLI:
docker run --init …makes an init process run as PID 1 (backed by tini) and handle duties like signal forwarding and zombie reaping. [3] - Or explicitly:
ENTRYPOINT ["tini","--","/your/entrypoint.sh"](tini docs). [4]
Most common fix: switch to exec-form + ensure your entrypoint script ends with exec …, or run with --init.
Sources:
[1] Docker run reference (PID 1 signal behavior) https://www.docs4dev.com/docs/docker/19.03/engine/reference/run/
[2] Dockerfile reference (shell vs exec form; shell form doesn’t pass signals; use exec) https://docs.docker.com/reference/builder
[3] docker container run docs (--init, docker-init backed by tini) https://docs.docker.com/reference/cli/docker/container/run
[4] tini README (--init and usage) https://github.com/krallin/tini
🌐 Web query:
umask 077 file permissions /tmp default permissions bash
💡 Result:
-
umask 077affects new files/directories your process creates anywhere, including under/tmp. It removes all group/other permissions from the requested mode. (sans.org)- Typical result:
- New files:
0666 & ~0077 = 0600(rw-------) - New directories:
0777 & ~0077 = 0700(rwx------) (sans.org)
- New files:
- Typical result:
-
/tmpitself is normally world-writable with the sticky bit: mode1777(drwxrwxrwt). Yourumaskdoes not change/tmp’s existing permissions; it only affects what you create inside it. (linuxconfig.org) -
Bash is not special here: if you set
umask 077in your shell (or in~/.bashrc, etc.), then commands you run from that shell inherit it and will create/tmpfiles accordingly (unless the program explicitlychmods after creation or requests a more restrictive mode). (bs2manuals.ts.fujitsu.com) -
Many systems ensure
/tmphas mode1777at boot via systemd-tmpfiles rules (example entries show1777directories under/tmp). (freedesktop.org)
Citations:
- 1: https://www.sans.org/blog/understanding-user-permissions-your-first-line-of-defense-part-3-of-5/?utm_source=openai
- 2: https://www.sans.org/blog/understanding-user-permissions-your-first-line-of-defense-part-3-of-5/?utm_source=openai
- 3: https://linuxconfig.org/explaining-the-sticky-bit-what-the-t-in-linux-directory-permissions-means?utm_source=openai
- 4: https://bs2manuals.ts.fujitsu.com/psPOSIXV210en/umask-get-or-set-the-file-mode-creation-mask-posix-commands-2025-09-215-13638739.html?utm_source=openai
- 5: https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html?utm_source=openai
🌐 Web query:
docker-compose entrypoint exec umask syntax YAML multiline string
💡 Result:
Use exec-form (entrypoint: as a YAML list) and run a shell explicitly so you can set umask and then exec the real process (so it becomes PID 1 and receives signals). Compose allows entrypoint as either a string or a list; the list form is the “Dockerfile-like”/exec style. [1]
services:
app:
image: your-image
entrypoint:
- /bin/sh
- -c
- |
umask 002
exec "$@"
- --
command: ["your-binary", "arg1", "arg2"]Notes:
|is just YAML “multiline string”; it becomes the single script argument tosh -c(newlines included).- The
--is used as the$0placeholder forsh -c; thencommand:becomes$@, whichexec "$@"runs. - Using
exec ...ensures the final process is PID 1 (important for signal handling), as recommended in Docker’s ENTRYPOINT docs. [2]
Sources: Docker Compose entrypoint reference [1]; Docker ENTRYPOINT shell vs exec form + exec for proper PID 1 behavior [2].
[1] (docs.docker.com)
[2] (docs.docker.com)
Citations:
- 1: https://docs.docker.com/reference/compose-file/services/?utm_source=openai
- 2: https://docs.docker.com/reference/dockerfile/?utm_source=openai
Use exec and tighten token file permissions in the entrypoint.
Bash as PID1 prevents proper signal forwarding—docker stop may hang until SIGKILL. The token file in /tmp defaults to world-readable mode. Add umask 077 and exec the Python process to make it PID1 and restrict the token file to owner-only (0600).
🛠️ Proposed fix
- entrypoint: >
- /bin/bash -c "
- printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
- /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
- "
+ entrypoint: >
+ /bin/bash -c "
+ umask 077
+ printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
+ exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+ "🤖 Prompt for AI Agents
In `@docker-compose-library.yaml` around lines 69 - 73, The entrypoint currently
starts a bash shell that writes the token and launches Python as a child;
replace that with a shell sequence that tightens token permissions and then uses
exec to replace PID 1 with the Python process: set umask 077 before creating the
token (or explicitly chmod the token to 0600 after printf to
/tmp/lightspeed-mcp-test-token) and then exec /app-root/.venv/bin/python3.12
/app-root/src/lightspeed_stack.py so the Python process becomes PID 1 and
receives signals correctly; update the entrypoint block that uses /bin/bash -c
and printf accordingly.
    entrypoint: >
      /bin/bash -c "
      printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
      /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
      "
🧩 Analysis chain
🏁 Script executed:
cat -n docker-compose.yaml | sed -n '85,95p'

Repository: lightspeed-core/lightspeed-stack
Length of output: 519
🌐 Web query:
bash PID1 signal handling swallow signals docker
💡 Result:
When you run a shell script as the container “main process”, bash becomes PID 1, and PID 1 has special signal semantics in Linux containers:
- PID 1 doesn't get the normal "default action" for many fatal signals (like `SIGTERM`): if PID 1 hasn't installed a handler, the signal may effectively do nothing, so `docker stop` (SIGTERM) appears "swallowed". This is why containers can linger until Docker eventually sends `SIGKILL`. [1], [2]
- Even if `bash` traps/exits on `SIGTERM`, it typically won't forward that signal to its child process(es) unless you explicitly do so, so your actual app keeps running. [3]
What to do instead (common fixes)
- Prefer making your app PID 1 (avoid `bash` as PID 1): use Docker "exec form" so your process replaces the shell (no extra PID 1 shell layer). This avoids the "shell doesn't forward signals" problem. [3]
- Add a minimal init as PID 1 (recommended for general robustness): run with Docker's built-in init (tini) so signals are forwarded and zombies are reaped: `docker run --init …` (or `init: true` in Compose). [4]
- If you must use `bash` as PID 1, trap + forward signals: at minimum, install a `SIGTERM` trap (so PID 1 actually exits) and explicitly propagate termination to the child process group (otherwise children may ignore it). The need to trap `TERM` as PID 1 is the key point. [3]
Sources:
[1] https://stackoverflow.com/questions/71448484/why-does-foreground-job-ignore-job-control-signals-when-bash-is-running-as-pid-1
[2] https://semisignal.com/sigterm-and-pid1/
[3] https://cloud.theodo.com/en/blog/docker-processes-container
[4] https://raby.sh/sigterm-and-pid-1-why-does-a-container-linger-after-receiving-a-sigterm.html
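For illustration only (this is not code from the PR, and servers such as uvicorn typically install their own handlers), a minimal Python sketch of what a process that ends up as PID 1 after `exec` needs to do to exit cleanly on `SIGTERM`:

```python
import signal
import sys
import time

def handle_sigterm(signum: int, frame) -> None:
    # PID 1 gets no default action for SIGTERM, so the process must exit explicitly.
    print("SIGTERM received, shutting down", flush=True)
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

while True:  # stand-in for the real server loop
    time.sleep(1)
```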
🌐 Web query:
/tmp file permissions default linux umask security
💡 Result:
- Default `/tmp` directory permissions on most Linux distros are `drwxrwxrwt` (octal `1777`) owned by `root:root`: world-writable with the sticky bit so users can't delete/rename other users' files in `/tmp` even though everyone can create files there. This is commonly enforced at boot by `systemd-tmpfiles` rules (often `D /tmp 1777 root root -`). [1] [2]
- `umask` controls the default permissions of new files/dirs created inside `/tmp` (it does not change `/tmp`'s own `1777` mode). When a program creates a file, the kernel applies `final_mode = requested_mode & ~umask` (and similarly for directories). Typical default `umask` is `022`, so a file created with requested mode `0666` becomes `0644`. [3]
- Security implications / best practice
  - `/tmp` is a shared namespace: predictable names like `/tmp/foo` can be pre-created by another user to cause failures or trick a program (DoS, symlink/hardlink-style attacks). Use unique, non-guessable names and safe creation patterns (e.g., "create-new" semantics), and prefer honoring `$TMPDIR` when set. [4]
  - The sticky bit does not prevent reads. If your program creates temp files with permissive modes (because it requests `0666` and relies on `umask`), other users may read them. Set sensitive temp files to owner-only permissions explicitly (and don't rely on `umask` alone). [3]
  - For services, consider systemd `PrivateTmp=yes` to give the service its own private `/tmp` (and `/var/tmp`) namespace, reducing cross-process attacks via shared temp directories. [5]
- Persistence note (`/tmp` vs `/var/tmp`): `/tmp` is often cleared at boot and may be on `tmpfs` (RAM/swap), while `/var/tmp` is generally more persistent and intended to survive reboots. [6] [4]
Sources:
[1] https://www.baeldung.com/linux/sticky-bit
[2] https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html
[3] https://manpages.debian.org/testing/manpages-dev/umask.2.en.html
[4] https://systemd.io/TEMPORARY_DIRECTORIES/
[5] https://www.redhat.com/de/blog/new-red-hat-enterprise-linux-7-security-feature-privatetmp
[6] https://unix.stackexchange.com/questions/30489/what-is-the-difference-between-tmp-and-var-tmp
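As a belt-and-braces alternative to relying on the shell's umask, here is a hypothetical Python sketch (path and token are placeholders, not the compose file's values) that requests owner-only permissions explicitly at creation time:

```python
import os

TOKEN_PATH = "/tmp/example-test-token"  # placeholder path
TOKEN = b"dummy-secret"                 # placeholder value

# The mode argument applies only when the file is newly created, so remove
# any stale file first to avoid inheriting looser permissions.
if os.path.exists(TOKEN_PATH):
    os.remove(TOKEN_PATH)

fd = os.open(TOKEN_PATH, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
try:
    os.write(fd, TOKEN)
finally:
    os.close(fd)

print(oct(os.stat(TOKEN_PATH).st_mode & 0o777))  # 0o600
```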
Use exec and tighten token file permissions in the entrypoint.
Bash as PID 1 in containers won't forward SIGTERM to child processes and may ignore the signal itself without an explicit trap, preventing graceful shutdown. Additionally, the token file is created with default umask (typically 022), making it world-readable in the container. Apply umask 077 to restrict the token to owner-only permissions and use exec to make the Python process PID 1:
🛠️ Proposed fix
- entrypoint: >
- /bin/bash -c "
- printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
- /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
- "
+ entrypoint: >
+ /bin/bash -c "
+ umask 077
+ printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
+ exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
+ "📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    - entrypoint: >
    -   /bin/bash -c "
    -   printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
    -   /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
    -   "
    + entrypoint: >
    +   /bin/bash -c "
    +   umask 077
    +   printf %s 'test-secret-token-123' > /tmp/lightspeed-mcp-test-token &&
    +   exec /app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py
    +   "
🤖 Prompt for AI Agents
In `@docker-compose.yaml` around lines 87 - 91, The entrypoint should drop into
the Python process as PID 1 and create the token file with owner-only
permissions; update the entrypoint snippet that writes the token and launches
/app-root/.venv/bin/python3.12 /app-root/src/lightspeed_stack.py to first set
umask 077 (so the token file is created with owner-only perms) and then use exec
to replace the shell with the Python process (so the Python process becomes PID
1 and receives signals); locate the multiline entrypoint block that prints
'test-secret-token-123' into /tmp/lightspeed-mcp-test-token and modify it
accordingly.
924666a to
490a94c
Compare
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/app/main.py`:
- Around line 236-241: The current string check using scope["path"] against
self.app_routes_paths should be replaced with route template matching so
parameterized routes are recognized and metric labels use the template, not raw
paths; locate the middleware logic referencing scope["path"],
self.app_routes_paths and the call to await self.app(scope, receive, send) and
use each Starlette Route.matches(scope) to find the matching Route, skip metrics
only when no route matches, and derive the metric label from the Route.path
(template) rather than scope["path"] to avoid Prometheus cardinality explosion.
🧹 Nitpick comments (2)
dev-tools/mcp-mock-server/server.py (2)
63-67: Consider simplifying the function signature formatting.

The multiline return type annotation with `None` on its own line and trailing comment is unconventional. A single line suffices:

    def do_POST(self) -> None:  # pylint: disable=invalid-name,too-many-locals,...
97-113: Consider using standard if-elif for token matching.

The `match True:` pattern with guard clauses is unconventional. A standard if-elif chain would be more readable:

♻️ Suggested refactor
- match True: - case _ if "test-secret-token" in auth_header: - tool_name = "mock_tool_file" - tool_desc = "Mock tool with file-based auth" - case _ if "my-k8s-token" in auth_header: - tool_name = "mock_tool_k8s" - tool_desc = "Mock tool with Kubernetes token" - case _ if "my-client-token" in auth_header: - tool_name = "mock_tool_client" - tool_desc = "Mock tool with client-provided token" - case _ if "error-mode" in auth_header: - tool_name = "mock_tool_error" - tool_desc = "Mock tool configured to return errors" - error_mode = True - case _: - # Default case already set above - pass + if "test-secret-token" in auth_header: + tool_name = "mock_tool_file" + tool_desc = "Mock tool with file-based auth" + elif "my-k8s-token" in auth_header: + tool_name = "mock_tool_k8s" + tool_desc = "Mock tool with Kubernetes token" + elif "my-client-token" in auth_header: + tool_name = "mock_tool_client" + tool_desc = "Mock tool with client-provided token" + elif "error-mode" in auth_header: + tool_name = "mock_tool_error" + tool_desc = "Mock tool configured to return errors" + error_mode = True
490a94c to
0799d40
Compare
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/e2e/features/environment.py`:
- Around line 263-273: Multiple auth-related tag blocks (e.g., the MCP block
using create_config_backup, switch_config, restart_container) can run for the
same feature and cause double swaps/restores; add a guard so config swapping
happens only once per feature by tracking the active swap on the context (for
example set context.feature_config and a boolean like context.config_swapped or
context.active_feature_config) and only call
create_config_backup/switch_config/restart_container when that flag is not set,
and ensure the corresponding restore logic checks the same flag and clears it
after restoring so the second tag won't attempt to restore a removed backup.
🧹 Nitpick comments (2)
tests/e2e/features/steps/conversation.py (1)
317-361: Reuse shared retry/timeout constants for consistency.
Keeps polling behavior aligned with the module-wide configuration and makes the logged "waited" time accurate under the cap.

♻️ Suggested refactor
    -    max_retries = 10
    -    delay = 0.2  # Start with 200ms
    +    max_retries = MAX_RETRIES
    +    delay = INITIAL_RETRY_DELAY  # Start with 200ms
    @@
    -        f"(waited {sum(0.2 * (2 ** i) for i in range(attempt)):.2f}s)"
    +        f"(waited {sum(min(INITIAL_RETRY_DELAY * (2 ** i), MAX_RETRY_DELAY) for i in range(attempt)):.2f}s)"
    @@
    -        delay = min(delay * 2, 2.0)  # Exponential backoff, cap at 2s
    +        delay = min(delay * 2, MAX_RETRY_DELAY)  # Exponential backoff, cap at 2s
    @@
    -    context.response = requests.get(url, headers=headers, timeout=10)
    +    context.response = requests.get(url, headers=headers, timeout=DEFAULT_TIMEOUT)

tests/e2e/features/steps/llm_query_response.py (1)
3-4: Make the persistence wait configurable for environment-specific tuning (optional improvement)

The documented persistence timing includes topic summary generation (2-5s), and code comments note empirically observed persistence at 2-3s in CI. The 3s delay was deliberately chosen to account for this. While making the wait configurable via environment variable is reasonable for different test environments, this is an optional refactor rather than a flakiness fix, since the current value appears well-calibrated to observed CI behavior.

Suggested approach

    +import os
     import time
    @@
    -    time.sleep(3)
    +    wait_seconds = float(os.getenv("E2E_PERSIST_WAIT_SECONDS", "3"))
    +    time.sleep(wait_seconds)
| if "MCP" in feature.tags: | ||
| # For MCP tests, we need noop-with-token auth to support k8s token forwarding | ||
| # Use mode-specific configs (server vs library) | ||
| mode_dir = "library-mode" if context.is_library_mode else "server-mode" | ||
| context.feature_config = ( | ||
| f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-mcp.yaml" | ||
| ) | ||
| context.default_config_backup = create_config_backup("lightspeed-stack.yaml") | ||
| switch_config(context.feature_config) | ||
| restart_container("lightspeed-stack") | ||
|
|
Guard against double config swaps if feature tags overlap.
Because each auth-related tag uses independent if blocks, a feature tagged with MCP and another auth tag would switch configs twice and restore twice; the second restore can fail once the backup is removed. Consider enforcing mutual exclusivity or tracking the active feature config and restoring once.
Also applies to: 301-305
🤖 Prompt for AI Agents
In `@tests/e2e/features/environment.py` around lines 263 - 273, Multiple
auth-related tag blocks (e.g., the MCP block using create_config_backup,
switch_config, restart_container) can run for the same feature and cause double
swaps/restores; add a guard so config swapping happens only once per feature by
tracking the active swap on the context (for example set context.feature_config
and a boolean like context.config_swapped or context.active_feature_config) and
only call create_config_backup/switch_config/restart_container when that flag is
not set, and ensure the corresponding restore logic checks the same flag and
clears it after restoring so the second tag won't attempt to restore a removed
backup.
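A minimal sketch of the guard described in this comment, assuming Behave hooks in `environment.py`; `create_config_backup`, `switch_config`, and `restart_container` are the helpers named above, while `restore_config_backup` stands in for whatever restore logic the suite already has:

```python
def before_feature(context, feature):
    """Swap configs at most once per feature, even if several tags match."""
    context.config_swapped = getattr(context, "config_swapped", False)

    if "MCP" in feature.tags and not context.config_swapped:
        mode_dir = "library-mode" if context.is_library_mode else "server-mode"
        context.feature_config = (
            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-mcp.yaml"
        )
        context.default_config_backup = create_config_backup("lightspeed-stack.yaml")
        switch_config(context.feature_config)
        restart_container("lightspeed-stack")
        context.config_swapped = True


def after_feature(context, feature):
    """Restore exactly once, and only if a swap actually happened."""
    if getattr(context, "config_swapped", False):
        restore_config_backup(context.default_config_backup)
        restart_container("lightspeed-stack")
        context.config_swapped = False
```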
b5c8a15 to
00e41fd
Compare
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/e2e/features/steps/common.py`:
- Around line 32-34: The docstring is incorrect about Behave auto-resetting auth
headers; update the after_scenario hook in environment.py to explicitly clear
any leaked auth headers by checking for and deleting context.auth_headers (e.g.,
if hasattr(context, "auth_headers"): del context.auth_headers), or
alternatively, revise the docstring to state that Behave preserves custom
context attributes across scenarios and explain why clearing is intentionally
skipped; locate the after_scenario function and apply one of these two fixes
referencing context.auth_headers and after_scenario.
🧹 Nitpick comments (6)
tests/integration/endpoints/test_query_v2_integration.py (2)
36-62: Consider moving `wait_for_background_tasks` to `conftest.py` for reuse.

This helper is valuable for coordinating with background persistence. According to your project conventions, shared fixtures and test utilities should be in `conftest.py`. If other test modules need this pattern (e.g., streaming tests), centralizing it would reduce duplication.

The implementation itself is sound - it properly snapshots `background_tasks_set` to avoid iteration errors and provides adequate delays for MCP cleanup and thread pool operations.

Based on learnings: "Use `conftest.py` for shared fixtures in pytest"
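One possible shape for that shared helper, sketched here as a `conftest.py` fixture; the names and settle delay are assumptions rather than the PR's actual implementation:

```python
import asyncio

import pytest


@pytest.fixture
def wait_for_background_tasks():
    """Return an awaitable helper that drains outstanding background tasks."""

    async def _wait(task_set: set, settle_delay: float = 0.1) -> None:
        pending = list(task_set)  # snapshot so done-callbacks can't mutate it mid-iteration
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
        await asyncio.sleep(settle_delay)  # give MCP cleanup / thread-pool work time to settle

    return _wait
```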
163-170: Mock signature accepts `**kwargs` but `asyncio.to_thread` doesn't support keyword arguments.

`asyncio.to_thread(func, /, *args)` only accepts positional arguments for the target function. The mock accepts `**kwargs`, which could hide bugs if production code mistakenly passes keyword arguments (they'd work in tests but fail in production).

🔧 Suggested fix to match real signature

    -    async def mock_to_thread(func: Any, *args: Any, **kwargs: Any) -> Any:
    -        """Run function synchronously instead of in thread pool for tests."""
    -        return func(*args, **kwargs)
    +    async def mock_to_thread(func: Any, *args: Any) -> Any:
    +        """Run function synchronously instead of in thread pool for tests."""
    +        return func(*args)
97-113: Non-idiomatic `match True` pattern for conditional branching.

Using `match True:` with `case _ if condition:` works but is an unusual pattern that reduces readability. Since you're matching against string content, a series of if/elif statements would be more conventional and clearer.

♻️ Suggested refactor using if/elif
- match True: - case _ if "test-secret-token" in auth_header: - tool_name = "mock_tool_file" - tool_desc = "Mock tool with file-based auth" - case _ if "my-k8s-token" in auth_header: - tool_name = "mock_tool_k8s" - tool_desc = "Mock tool with Kubernetes token" - case _ if "my-client-token" in auth_header: - tool_name = "mock_tool_client" - tool_desc = "Mock tool with client-provided token" - case _ if "error-mode" in auth_header: - tool_name = "mock_tool_error" - tool_desc = "Mock tool configured to return errors" - error_mode = True - case _: - # Default case already set above - pass + if "test-secret-token" in auth_header: + tool_name = "mock_tool_file" + tool_desc = "Mock tool with file-based auth" + elif "my-k8s-token" in auth_header: + tool_name = "mock_tool_k8s" + tool_desc = "Mock tool with Kubernetes token" + elif "my-client-token" in auth_header: + tool_name = "mock_tool_client" + tool_desc = "Mock tool with client-provided token" + elif "error-mode" in auth_header: + tool_name = "mock_tool_error" + tool_desc = "Mock tool configured to return errors" + error_mode = True + # else: defaults already set abovetests/e2e/features/steps/conversation.py (1)
310-371: Minor duplication of retry/backoff logic.

This function implements its own retry loop with exponential backoff, duplicating the pattern in `poll_for_conversation`. The use cases differ (searching a list vs. fetching by ID), so some divergence is acceptable. However, extracting a common backoff utility could reduce maintenance burden; a sketch follows below.

The implementation is correct and addresses the past review concern about validating response status and payload.
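One possible shape for such a utility (the name and module location are assumptions, not part of the PR):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_with_backoff(
    fetch: Callable[[], Optional[T]],
    max_retries: int = 10,
    initial_delay: float = 0.2,
    max_delay: float = 2.0,
) -> Optional[T]:
    """Call `fetch` until it returns a truthy value, backing off exponentially."""
    delay = initial_delay
    for attempt in range(max_retries):
        result = fetch()
        if result:
            return result
        if attempt < max_retries - 1:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```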
src/app/main.py (2)
96-113: Add a shutdown timeout to avoid hanging on stuck background tasks.
If a background task stalls, shutdown can block indefinitely. Consider a bounded wait and cancel-on-timeout.

🛠️ Possible refinement
- all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks) - await asyncio.gather(*all_tasks, return_exceptions=True) + all_tasks = list(query_bg_tasks) + list(streaming_bg_tasks) + try: + await asyncio.wait_for( + asyncio.gather(*all_tasks, return_exceptions=True), + timeout=30, + ) + except asyncio.TimeoutError: + logger.warning("Background tasks did not finish within 30s; cancelling") + for task in all_tasks: + task.cancel() + await asyncio.gather(*all_tasks, return_exceptions=True)
205-257: Use Starlette's `Route.matches()` for more robust route matching.

The current manual segment matching doesn't handle path converters (e.g., `{param:path}`) or mounted sub-applications. While the codebase currently uses only simple parameterized routes, using `Route.matches(scope)` is the proper Starlette approach and future-proofs the metrics collection against additional route patterns.

♻️ Suggested approach
-from starlette.routing import Mount, Route, WebSocketRoute +from starlette.routing import Match, Mount, Route, WebSocketRoute @@ - def __init__( # pylint: disable=redefined-outer-name - self, app: Any, app_routes_paths: list[str] - ) -> None: + def __init__( # pylint: disable=redefined-outer-name + self, app: Any, app_routes: list[Route] + ) -> None: @@ - self.app_routes_paths = app_routes_paths + self.app_routes = app_routes @@ - route_path = None - for route_template in self.app_routes_paths: - # Simple parameterized route matching: check if path segments match template - # This handles routes like /conversations/{conversation_id} - path_parts = path.strip("/").split("/") - template_parts = route_template.strip("/").split("/") - - if len(path_parts) == len(template_parts): - match = all( - tp.startswith("{") - and tp.endswith("}") # Template parameter - or tp == pp # Exact match - for tp, pp in zip(template_parts, path_parts) - ) - if match: - route_path = route_template - break + matched_route = next( + (r for r in self.app_routes if r.matches(scope)[0] == Match.FULL), + None, + ) + route_path = matched_route.path if matched_route else None-app_routes_paths = [ - route.path - for route in app.routes - if isinstance(route, (Mount, Route, WebSocketRoute)) -] +app_routes = [ + route + for route in app.routes + if isinstance(route, (Mount, Route, WebSocketRoute)) +] @@ -app.add_middleware(MetricsMiddleware, app_routes_paths=app_routes_paths) +app.add_middleware(MetricsMiddleware, app_routes=app_routes)
tests/e2e/features/steps/common.py
Outdated
    Note: Does NOT clear auth headers, as those may be set in Background section
    and should persist for the entire scenario. Auth headers are automatically
    reset by Behave between scenarios.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for Behave hooks that might clear auth headers between scenarios
find . -path '*/e2e/*' -name 'environment.py' -type f -exec cat -n {} +
# Search for auth header related code in e2e tests
rg -n 'auth.*header|Authorization|\.auth|headers.*=.*\{\}' --glob '**/e2e/**/*.py' -A 2 -B 2

Repository: lightspeed-core/lightspeed-stack
Length of output: 46948
Update the docstring or add explicit auth header clearing to after_scenario hook in environment.py.
The docstring states auth headers are "automatically reset by Behave between scenarios," but this is incorrect. Behave's Context object persists custom attributes across scenarios unless explicitly cleared. Since there is no clearing of auth_headers in the after_scenario hook in environment.py, auth headers set in one scenario will leak into the next scenario, causing test pollution.
Either:
- Add explicit clearing in the `after_scenario` hook: `if hasattr(context, "auth_headers"): del context.auth_headers`
- Update the docstring to reflect actual behavior and clarify why clearing is intentionally skipped
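A sketch of the first option, assuming the existing `after_scenario` hook in `tests/e2e/features/environment.py`:

```python
def after_scenario(context, scenario):
    """Drop any auth headers a scenario (or its Background) may have set.

    Behave keeps custom context attributes alive between scenarios, so clear
    them explicitly to avoid cross-scenario pollution.
    """
    if hasattr(context, "auth_headers"):
        del context.auth_headers
```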
🤖 Prompt for AI Agents
In `@tests/e2e/features/steps/common.py` around lines 32 - 34, The docstring is
incorrect about Behave auto-resetting auth headers; update the after_scenario
hook in environment.py to explicitly clear any leaked auth headers by checking
for and deleting context.auth_headers (e.g., if hasattr(context,
"auth_headers"): del context.auth_headers), or alternatively, revise the
docstring to state that Behave preserves custom context attributes across
scenarios and explain why clearing is intentionally skipped; locate the
after_scenario function and apply one of these two fixes referencing
context.auth_headers and after_scenario.
00e41fd to
8daec95
Compare
/retest

@blublinsky: Cannot trigger testing until a trusted user reviews the PR and leaves an `/ok-to-test` message.
Details
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
8daec95 to
e5aa9a7
Compare
e5aa9a7 to
d8224fe
Compare
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/app/endpoints/query.py`:
- Line 17: There's a syntax typo in the import statement for TextContentItem:
change the malformed "rom
llama_stack_client.types.shared.interleaved_content_item import TextContentItem"
to a proper "from ..." import so the module can load; locate the import that
references TextContentItem in the file (query.py) and correct the keyword to
"from" to fix the SyntaxError.
- Line 4: Remove the unused import of the ast module from the top of
src/app/endpoints/query.py by deleting the "import ast" line; update the import
block in that file (e.g., near other imports in the module) so only actually
used modules remain (no changes to functions/classes like any query handler
functions are required).
🧹 Nitpick comments (6)
tests/e2e/features/steps/conversation.py (1)
310-371: Reuse shared retry constants to avoid drift.
This block redefines retry values and timeout literals that already exist. Using the shared constants keeps backoff/timeout behavior consistent across helpers.

♻️ Suggested refactor
    -    max_retries = 10
    -    delay = 0.2  # Start with 200ms
    +    max_retries = MAX_RETRIES
    +    delay = INITIAL_RETRY_DELAY  # Start with configured delay
    @@
    -    print(
    -        f"✅ Conversation found in list after {attempt + 1} attempts "
    -        f"(waited {sum(0.2 * (2 ** i) for i in range(attempt)):.2f}s)"
    -    )
    +    print(
    +        f"✅ Conversation found in list after {attempt + 1} attempts "
    +        f"(waited {sum(min(INITIAL_RETRY_DELAY * (2 ** i), MAX_RETRY_DELAY) for i in range(attempt)):.2f}s)"
    +    )
    @@
    -    delay = min(delay * 2, 2.0)  # Exponential backoff, cap at 2s
    +    delay = min(delay * 2, MAX_RETRY_DELAY)  # Exponential backoff, cap at configured max
    @@
    -    context.response = requests.get(url, headers=headers, timeout=10)
    +    context.response = requests.get(
    +        url, headers=headers, timeout=DEFAULT_TIMEOUT
    +    )

tests/e2e/configuration/server-mode/lightspeed-stack-mcp.yaml (1)
51-51: Add newline at end of file.

POSIX convention requires files to end with a newline. This avoids issues with tools that expect it.
src/app/endpoints/streaming_query_v2.py (1)
337-369: Background cleanup design is sound, but the 0.5s delay is a magic number.

The detached background task pattern correctly avoids MCP session cancellation issues. However, the hardcoded `0.5` second delay is undocumented beyond inline comments. Consider extracting this to a named constant for clarity and easier adjustment.

♻️ Optional: Extract delay to constant
+# Delay before background cleanup to allow MCP sessions to close +MCP_CLEANUP_DELAY_SECONDS = 0.5 + async def cleanup_task() -> None: """Background cleanup after streaming response is sent.""" # Small delay to ensure response is fully sent and MCP cleanup completes - await asyncio.sleep(0.5) + await asyncio.sleep(MCP_CLEANUP_DELAY_SECONDS)src/app/endpoints/query.py (2)
101-102: Use `%` formatting instead of f-strings in logger calls.

F-strings are evaluated immediately, while `%` formatting with logger methods enables lazy evaluation: the string is only formatted if the log level is enabled. This is a minor performance optimization.

♻️ Suggested change
logger.debug( - f"Background task created, active tasks: {len(background_tasks_set)}" + "Background task created, active tasks: %d", len(background_tasks_set) ) except Exception as e: - logger.error(f"Failed to create background task: {e}", exc_info=True) + logger.error("Failed to create background task: %s", e, exc_info=True)
80-105: Consider extracting shared `create_background_task` to a utility module.

This function is duplicated in both `query.py` and `streaming_query_v2.py`. Extract to a shared utility (e.g., `utils/background_tasks.py`) to maintain a single source of truth.

src/app/main.py (1)
193-288: Consider using Route.matches for robust path matching.
The manual segment matcher won't handle mounts or path converters (e.g., `{path:path}`), which can silently drop metrics for valid routes. Using Starlette's route matching avoids these false negatives and keeps labels stable.

♻️ Suggested refactor using Starlette route matching
-from starlette.routing import Mount, Route, WebSocketRoute +from starlette.routing import Mount, Route, WebSocketRoute, Match @@ - def __init__( # pylint: disable=redefined-outer-name - self, app: Any, app_routes_paths: list[str] - ) -> None: + def __init__( # pylint: disable=redefined-outer-name + self, app: Any, app_routes: list[Route] + ) -> None: @@ - self.app_routes_paths = app_routes_paths + self.app_routes = app_routes @@ - route_path = None - for route_template in self.app_routes_paths: - ... - if route_path is None: + matched_route = next( + (r for r in self.app_routes if r.matches(scope)[0] == Match.FULL), + None, + ) + if matched_route is None: await self.app(scope, receive, send) return + route_path = matched_route.path @@ -app_routes_paths = [ - route.path - for route in app.routes - if isinstance(route, (Mount, Route, WebSocketRoute)) -] +app_routes = [ + route + for route in app.routes + if isinstance(route, (Mount, Route, WebSocketRoute)) +] @@ -app.add_middleware(MetricsMiddleware, app_routes_paths=app_routes_paths) +app.add_middleware(MetricsMiddleware, app_routes=app_routes)
src/app/endpoints/query.py
Outdated
| """Handler for REST API call to provide answer to query.""" | ||
|
|
||
| import asyncio | ||
| import ast |
🧩 Analysis chain
🏁 Script executed:
cat -n src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 32894
🏁 Script executed:
rg -n '\bast\.' src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
rg -n '\bast\b' src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 88
Remove unused ast import at line 4.
The ast module is imported but never used in this file. Remove it to keep imports clean.
🤖 Prompt for AI Agents
In `@src/app/endpoints/query.py` at line 4, Remove the unused import of the ast
module from the top of src/app/endpoints/query.py by deleting the "import ast"
line; update the import block in that file (e.g., near other imports in the
module) so only actually used modules remain (no changes to functions/classes
like any query handler functions are required).
src/app/endpoints/query.py
Outdated
        RateLimitError,  # type: ignore
    )
    from llama_stack_client.types.model_list_response import ModelListResponse
    rom llama_stack_client.types.shared.interleaved_content_item import TextContentItem
Critical syntax error: rom should be from.
This typo will cause a SyntaxError at module load time, preventing the application from starting.
🐛 Fix the import statement
-rom llama_stack_client.types.shared.interleaved_content_item import TextContentItem
+from llama_stack_client.types.shared.interleaved_content_item import TextContentItem

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    rom llama_stack_client.types.shared.interleaved_content_item import TextContentItem
    from llama_stack_client.types.shared.interleaved_content_item import TextContentItem
🤖 Prompt for AI Agents
In `@src/app/endpoints/query.py` at line 17, There's a syntax typo in the import
statement for TextContentItem: change the malformed "rom
llama_stack_client.types.shared.interleaved_content_item import TextContentItem"
to a proper "from ..." import so the module can load; locate the import that
references TextContentItem in the file (query.py) and correct the keyword to
"from" to fix the SyntaxError.
d8224fe to
7307578
Compare
Description
This PR achieves 3 main things:
This PR turned out much larger than expected because of two issues.
Root Cause 1 - Streaming Response Bug: Llama Stack's MCP tool execution uses streaming responses (Server-Sent Events), which exposed critical bugs in FastAPI's BaseHTTPMiddleware - specifically the RuntimeError: "No response returned" error that occurs when middleware tries to handle streaming endpoints.
Root Cause 2 - MCP Cleanup & Connection Management: MCP server connections and LLM streaming calls need to be properly closed AFTER the response is fully streamed, but we also need to persist conversation data to the database without blocking the stream or delaying the client.
The Fix: Required a complete architectural change:
In short: MCP streaming responses + required cleanup + database persistence = complete architectural change from decorator-based to ASGI middleware + async background tasks for DB writes and MCP cleanup.
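For readers unfamiliar with the two patterns named here, a minimal, hypothetical sketch (not the PR's actual code): a pure-ASGI middleware that wraps `send` instead of buffering the response body, so streaming keeps working, plus a helper that keeps a strong reference to fire-and-forget background tasks for DB writes and MCP cleanup:

```python
import asyncio
from typing import Any, Callable


class PassThroughASGIMiddleware:
    """Pure ASGI middleware: observes messages on the way out instead of
    buffering the whole response, so SSE/streaming endpoints keep streaming."""

    def __init__(self, app: Any) -> None:
        self.app = app

    async def __call__(self, scope: dict, receive: Callable, send: Callable) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message: dict) -> None:
            if message["type"] == "http.response.start":
                _status = message["status"]  # e.g. record metrics here, without touching the body
            await send(message)

        await self.app(scope, receive, send_wrapper)


# Keep strong references so fire-and-forget tasks (DB persistence, MCP
# cleanup) are not garbage-collected before they finish.
background_tasks: set[asyncio.Task] = set()


def schedule_background(coro) -> None:
    task = asyncio.get_running_loop().create_task(coro)
    background_tasks.add(task)                        # strong reference
    task.add_done_callback(background_tasks.discard)  # drop it once done
```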
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Improvements
Tests