Safety shield config #1098
Conversation
Signed-off-by: Lucas <lyoon@redhat.com>
Walkthrough
An optional shield_ids parameter is accepted by the query endpoints and passed through to shield moderation, which uses it to select which shields to run.
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Endpoint
participant ShieldsUtil as run_shield_moderation
participant ShieldsAPI as Shields Service / List
Client->>Endpoint: Send query (may include shield_ids)
Endpoint->>Endpoint: Build QueryRequest (includes shield_ids)
Endpoint->>ShieldsUtil: run_shield_moderation(input_text, shield_ids)
ShieldsUtil->>ShieldsAPI: list available shields
ShieldsAPI-->>ShieldsUtil: return shields
ShieldsUtil->>ShieldsUtil: filter shields by shield_ids (None/all, []/skip, list/filter)
ShieldsUtil->>ShieldsAPI: run moderation on selected shield(s)
ShieldsAPI-->>ShieldsUtil: moderation result
ShieldsUtil-->>Endpoint: moderation result (blocked / unblocked / error)
Endpoint-->>Client: return response (respecting moderation outcome)
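The shield_ids handling described in the diagram (None runs all shields, an empty list skips moderation, a non-empty list filters) can be sketched as follows. This is only an illustration: the helper name select_shields and the plain string IDs are made-up stand-ins, not the actual implementation in src/utils/shields.py, which works on Llama Stack shield objects.

```python
from typing import Optional


def select_shields(available: list[str], shield_ids: Optional[list[str]]) -> list[str]:
    """Illustrative filter mirroring the behaviour described above."""
    if shield_ids is None:
        # No parameter supplied: keep the previous behaviour and run every shield.
        return available
    if not shield_ids:
        # Explicit empty list: skip shield moderation entirely.
        return []
    # Explicit list: run only the shields whose IDs were requested.
    return [shield for shield in available if shield in shield_ids]


# Example usage with made-up shield IDs.
print(select_shields(["shield-a", "shield-b"], None))          # ['shield-a', 'shield-b']
print(select_shields(["shield-a", "shield-b"], []))            # []
print(select_shields(["shield-a", "shield-b"], ["shield-b"]))  # ['shield-b']
```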
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/utils/shields.py`:
- Around line 63-104: In run_shield_moderation, when a non-None shield_ids list
is provided but no shields match (shields_to_run is empty), fail fast instead of
silently skipping: after computing shields_to_run and the missing set, if
shields_to_run is empty raise an HTTPException (or appropriate client error)
indicating the requested shield IDs were not found so callers cannot bypass
moderation due to typos/misconfiguration; keep the existing logger.warning but
follow it with the exception throw using the function name run_shield_moderation
and the shield_ids/shields_to_run identifiers to locate the change.
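A minimal sketch of that suggested fail-fast check, under assumptions: the wrapper function name and the exact detail message below are made up, FastAPI's HTTPException is assumed for the client error, and the 422 status follows the docstring comment further down; only the shield_ids and shields_to_run identifiers come from the review comment itself.

```python
import logging

from fastapi import HTTPException

logger = logging.getLogger(__name__)


def ensure_requested_shields_exist(shield_ids, shields_to_run):
    """Fail fast when an explicit shield_ids list matched no available shields."""
    if shield_ids is not None and not shields_to_run:
        # Without this, a typo in shield_ids would silently disable moderation.
        logger.warning("No shields matched the requested IDs: %s", shield_ids)
        raise HTTPException(
            status_code=422,
            detail=f"Requested shield IDs not found: {shield_ids}",
        )
```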
Signed-off-by: Lucas <lyoon@redhat.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/shields.py (1)
115-158: ⚠️ Potential issue | 🟠 Major
Add APIConnectionError handling to all Llama Stack API calls.
Network failures from shields.list(), models.list(), or moderations.create() currently bubble up as 500s. Import APIConnectionError and handle connection errors consistently (map to HTTP 503) following the established pattern:
Recommended fix
```diff
-from llama_stack_client import AsyncLlamaStackClient, BadRequestError
+from llama_stack_client import APIConnectionError, AsyncLlamaStackClient, BadRequestError
+from models.responses import ServiceUnavailableResponse
```
Then add this exception handler after the existing BadRequestError/ValueError handler:
```diff
         except (BadRequestError, ValueError):
             logger.warning(
                 "Shield '%s' violation detected, treating as blocked",
                 shield.identifier,
             )
             metrics.llm_calls_validation_errors_total.inc()
             return ShieldModerationResult(
                 blocked=True,
                 message=DEFAULT_VIOLATION_MESSAGE,
                 shield_model=shield.provider_resource_id,
             )
+        except APIConnectionError as e:
+            logger.error("Unable to connect to Llama Stack: %s", e)
+            response = ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(e))
+            raise HTTPException(**response.model_dump()) from e
```
This matches the error handling pattern established in src/app/endpoints/shields.py and other endpoint modules.
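For context, a self-contained sketch of that 503 mapping, assumptions noted: the wrapper name list_shields_or_503 and the plain detail dict are illustrative, and the project's own ServiceUnavailableResponse model from the diff is replaced here by a bare HTTPException so the snippet stands alone.

```python
import logging

from fastapi import HTTPException
from llama_stack_client import APIConnectionError, AsyncLlamaStackClient

logger = logging.getLogger(__name__)


async def list_shields_or_503(client: AsyncLlamaStackClient):
    """List shields, mapping Llama Stack connection failures to HTTP 503."""
    try:
        return await client.shields.list()
    except APIConnectionError as e:
        logger.error("Unable to connect to Llama Stack: %s", e)
        raise HTTPException(
            status_code=503,
            detail={"response": "Unable to connect to Llama Stack", "cause": str(e)},
        ) from e
```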
🤖 Fix all issues with AI agents
In `@src/utils/shields.py`:
- Around line 68-85: The docstring for run_shield_moderation is missing
documentation of the new HTTP 422 error path for invalid shield_ids; update the
Google-style docstring for run_shield_moderation to include in the Raises
section that an HTTPException with status 422 is raised when shield_ids is
invalid (e.g., contains unknown IDs or is malformed), and ensure the Parameters,
Returns, and Raises sections are present and concise per Google Python docstring
conventions, referencing the ShieldModerationResult return type and the existing
500/404 or other HTTPException cases for missing provider_resource_id or
model-not-found.
| """ | ||
| Run shield moderation on input text. | ||
|
|
||
| Iterates through all configured shields and runs moderation checks. | ||
| Iterates through configured shields and runs moderation checks. | ||
| Raises HTTPException if shield model is not found. | ||
|
|
||
| Parameters: | ||
| client: The Llama Stack client. | ||
| input_text: The text to moderate. | ||
| shield_ids: Optional list of shield IDs to use. If None, uses all shields. | ||
| If empty list, skips all shields. | ||
|
|
||
| Returns: | ||
| ShieldModerationResult: Result indicating if content was blocked and the message. | ||
|
|
||
| Raises: | ||
| HTTPException: If shield's provider_resource_id is not configured or model not found. | ||
| """ |
Document the new 422 raise path in the docstring.
The function now raises HTTP 422 for invalid shield_ids, but the Raises section only mentions missing models.
✏️ Proposed docstring update
```diff
     Raises:
-        HTTPException: If shield's provider_resource_id is not configured or model not found.
+        HTTPException: If shield's provider_resource_id is not configured, model not found,
+            or the requested shield_ids do not match any available shields.
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """ | |
| Run shield moderation on input text. | |
| Iterates through all configured shields and runs moderation checks. | |
| Iterates through configured shields and runs moderation checks. | |
| Raises HTTPException if shield model is not found. | |
| Parameters: | |
| client: The Llama Stack client. | |
| input_text: The text to moderate. | |
| shield_ids: Optional list of shield IDs to use. If None, uses all shields. | |
| If empty list, skips all shields. | |
| Returns: | |
| ShieldModerationResult: Result indicating if content was blocked and the message. | |
| Raises: | |
| HTTPException: If shield's provider_resource_id is not configured or model not found. | |
| """ | |
| """ | |
| Run shield moderation on input text. | |
| Iterates through configured shields and runs moderation checks. | |
| Raises HTTPException if shield model is not found. | |
| Parameters: | |
| client: The Llama Stack client. | |
| input_text: The text to moderate. | |
| shield_ids: Optional list of shield IDs to use. If None, uses all shields. | |
| If empty list, skips all shields. | |
| Returns: | |
| ShieldModerationResult: Result indicating if content was blocked and the message. | |
| Raises: | |
| HTTPException: If shield's provider_resource_id is not configured, model not found, | |
| or the requested shield_ids do not match any available shields. | |
| """ |
🤖 Prompt for AI Agents
In `@src/utils/shields.py` around lines 68 - 85, The docstring for
run_shield_moderation is missing documentation of the new HTTP 422 error path
for invalid shield_ids; update the Google-style docstring for
run_shield_moderation to include in the Raises section that an HTTPException
with status 422 is raised when shield_ids is invalid (e.g., contains unknown IDs
or is malformed), and ensure the Parameters, Returns, and Raises sections are
present and concise per Google Python docstring conventions, referencing the
ShieldModerationResult return type and the existing 500/404 or other
HTTPException cases for missing provider_resource_id or model-not-found.
Description
As a developer using Lightspeed Core Stack, I want to specify which safety shields to apply per query, so that I can use different shield configurations for different use cases without restarting the service.
Updated query v1, query v2, and streaming query to take in an extra parameter to identify which shields to apply. If the request parameter is not provided, it will apply all safety shields (same as before).
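For illustration, a request might look like the sketch below; the /v1/query path, port, JSON field names, and shield IDs are assumptions based on the description above, not a confirmed request schema.

```python
import requests

# Hypothetical call against a running Lightspeed Core Stack instance.
payload = {
    "query": "How do I scale my deployment?",
    "shield_ids": ["my-input-shield"],  # apply only this shield to the query
    # "shield_ids": []                  -> would skip safety shields entirely
    # omitting "shield_ids"             -> applies all configured shields (previous behaviour)
}
response = requests.post("http://localhost:8080/v1/query", json=payload, timeout=60)
print(response.status_code, response.json())
```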
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit