Safety shield config #1098
Conversation
Signed-off-by: Lucas <lyoon@redhat.com>
Walkthrough
An optional shield_ids parameter is accepted by the query endpoints and passed through to shield moderation, which uses it to select which shields to run.
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Endpoint
participant ShieldsUtil as run_shield_moderation
participant ShieldsAPI as Shields Service / List
Client->>Endpoint: Send query (may include shield_ids)
Endpoint->>Endpoint: Build QueryRequest (includes shield_ids)
Endpoint->>ShieldsUtil: run_shield_moderation(input_text, shield_ids)
ShieldsUtil->>ShieldsAPI: list available shields
ShieldsAPI-->>ShieldsUtil: return shields
ShieldsUtil->>ShieldsUtil: filter shields by shield_ids (None/all, []/skip, list/filter)
ShieldsUtil->>ShieldsAPI: run moderation on selected shield(s)
ShieldsAPI-->>ShieldsUtil: moderation result
ShieldsUtil-->>Endpoint: moderation result (blocked / unblocked / error)
Endpoint-->>Client: return response (respecting moderation outcome)
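The shield_ids handling described in the diagram (None runs all shields, an empty list skips moderation, a non-empty list filters) can be sketched as follows. This is only an illustration: the helper name select_shields and the plain string IDs are made-up stand-ins, not the actual implementation in src/utils/shields.py, which works on Llama Stack shield objects.

```python
from typing import Optional


def select_shields(available: list[str], shield_ids: Optional[list[str]]) -> list[str]:
    """Illustrative filter mirroring the behaviour described above."""
    if shield_ids is None:
        # No parameter supplied: keep the previous behaviour and run every shield.
        return available
    if not shield_ids:
        # Explicit empty list: skip shield moderation entirely.
        return []
    # Explicit list: run only the shields whose IDs were requested.
    return [shield for shield in available if shield in shield_ids]


# Example usage with made-up shield IDs.
print(select_shields(["shield-a", "shield-b"], None))          # ['shield-a', 'shield-b']
print(select_shields(["shield-a", "shield-b"], []))            # []
print(select_shields(["shield-a", "shield-b"], ["shield-b"]))  # ['shield-b']
```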
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/utils/shields.py`:
- Around line 63-104: In run_shield_moderation, when a non-None shield_ids list
is provided but no shields match (shields_to_run is empty), fail fast instead of
silently skipping: after computing shields_to_run and the missing set, if
shields_to_run is empty raise an HTTPException (or appropriate client error)
indicating the requested shield IDs were not found so callers cannot bypass
moderation due to typos/misconfiguration; keep the existing logger.warning but
follow it with the exception throw using the function name run_shield_moderation
and the shield_ids/shields_to_run identifiers to locate the change.
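A minimal sketch of that suggested fail-fast check, under assumptions: the wrapper function name and the exact detail message below are made up, FastAPI's HTTPException is assumed for the client error, and the 422 status follows the docstring comment further down; only the shield_ids and shields_to_run identifiers come from the review comment itself.

```python
import logging

from fastapi import HTTPException

logger = logging.getLogger(__name__)


def ensure_requested_shields_exist(shield_ids, shields_to_run):
    """Fail fast when an explicit shield_ids list matched no available shields."""
    if shield_ids is not None and not shields_to_run:
        # Without this, a typo in shield_ids would silently disable moderation.
        logger.warning("No shields matched the requested IDs: %s", shield_ids)
        raise HTTPException(
            status_code=422,
            detail=f"Requested shield IDs not found: {shield_ids}",
        )
```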
Signed-off-by: Lucas <lyoon@redhat.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/shields.py (1)
115-158: ⚠️ Potential issue | 🟠 Major
Add APIConnectionError handling to all Llama Stack API calls.
Network failures from shields.list(), models.list(), or moderations.create() currently bubble up as 500s. Import APIConnectionError and handle connection errors consistently (map to HTTP 503) following the established pattern:
Recommended fix
```diff
-from llama_stack_client import AsyncLlamaStackClient, BadRequestError
+from llama_stack_client import APIConnectionError, AsyncLlamaStackClient, BadRequestError
+from models.responses import ServiceUnavailableResponse
```
Then add this exception handler after the existing BadRequestError/ValueError handler:
```diff
         except (BadRequestError, ValueError):
             logger.warning(
                 "Shield '%s' violation detected, treating as blocked",
                 shield.identifier,
             )
             metrics.llm_calls_validation_errors_total.inc()
             return ShieldModerationResult(
                 blocked=True,
                 message=DEFAULT_VIOLATION_MESSAGE,
                 shield_model=shield.provider_resource_id,
             )
+        except APIConnectionError as e:
+            logger.error("Unable to connect to Llama Stack: %s", e)
+            response = ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(e))
+            raise HTTPException(**response.model_dump()) from e
```
This matches the error handling pattern established in src/app/endpoints/shields.py and other endpoint modules.
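For context, a self-contained sketch of that 503 mapping, assumptions noted: the wrapper name list_shields_or_503 and the plain detail dict are illustrative, and the project's own ServiceUnavailableResponse model from the diff is replaced here by a bare HTTPException so the snippet stands alone.

```python
import logging

from fastapi import HTTPException
from llama_stack_client import APIConnectionError, AsyncLlamaStackClient

logger = logging.getLogger(__name__)


async def list_shields_or_503(client: AsyncLlamaStackClient):
    """List shields, mapping Llama Stack connection failures to HTTP 503."""
    try:
        return await client.shields.list()
    except APIConnectionError as e:
        logger.error("Unable to connect to Llama Stack: %s", e)
        raise HTTPException(
            status_code=503,
            detail={"response": "Unable to connect to Llama Stack", "cause": str(e)},
        ) from e
```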
🤖 Fix all issues with AI agents
In `@src/utils/shields.py`:
- Around line 68-85: The docstring for run_shield_moderation is missing
documentation of the new HTTP 422 error path for invalid shield_ids; update the
Google-style docstring for run_shield_moderation to include in the Raises
section that an HTTPException with status 422 is raised when shield_ids is
invalid (e.g., contains unknown IDs or is malformed), and ensure the Parameters,
Returns, and Raises sections are present and concise per Google Python docstring
conventions, referencing the ShieldModerationResult return type and the existing
500/404 or other HTTPException cases for missing provider_resource_id or
model-not-found.
| """ | ||
| Run shield moderation on input text. | ||
|
|
||
| Iterates through all configured shields and runs moderation checks. | ||
| Iterates through configured shields and runs moderation checks. | ||
| Raises HTTPException if shield model is not found. | ||
|
|
||
| Parameters: | ||
| client: The Llama Stack client. | ||
| input_text: The text to moderate. | ||
| shield_ids: Optional list of shield IDs to use. If None, uses all shields. | ||
| If empty list, skips all shields. | ||
|
|
||
| Returns: | ||
| ShieldModerationResult: Result indicating if content was blocked and the message. | ||
|
|
||
| Raises: | ||
| HTTPException: If shield's provider_resource_id is not configured or model not found. | ||
| """ |
Document the new 422 raise path in the docstring.
The function now raises HTTP 422 for invalid shield_ids, but the Raises section only mentions missing models.
✏️ Proposed docstring update
```diff
     Raises:
-        HTTPException: If shield's provider_resource_id is not configured or model not found.
+        HTTPException: If shield's provider_resource_id is not configured, model not found,
+            or the requested shield_ids do not match any available shields.
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """ | |
| Run shield moderation on input text. | |
| Iterates through all configured shields and runs moderation checks. | |
| Iterates through configured shields and runs moderation checks. | |
| Raises HTTPException if shield model is not found. | |
| Parameters: | |
| client: The Llama Stack client. | |
| input_text: The text to moderate. | |
| shield_ids: Optional list of shield IDs to use. If None, uses all shields. | |
| If empty list, skips all shields. | |
| Returns: | |
| ShieldModerationResult: Result indicating if content was blocked and the message. | |
| Raises: | |
| HTTPException: If shield's provider_resource_id is not configured or model not found. | |
| """ | |
| """ | |
| Run shield moderation on input text. | |
| Iterates through configured shields and runs moderation checks. | |
| Raises HTTPException if shield model is not found. | |
| Parameters: | |
| client: The Llama Stack client. | |
| input_text: The text to moderate. | |
| shield_ids: Optional list of shield IDs to use. If None, uses all shields. | |
| If empty list, skips all shields. | |
| Returns: | |
| ShieldModerationResult: Result indicating if content was blocked and the message. | |
| Raises: | |
| HTTPException: If shield's provider_resource_id is not configured, model not found, | |
| or the requested shield_ids do not match any available shields. | |
| """ |
🤖 Prompt for AI Agents
In `@src/utils/shields.py` around lines 68 - 85, The docstring for
run_shield_moderation is missing documentation of the new HTTP 422 error path
for invalid shield_ids; update the Google-style docstring for
run_shield_moderation to include in the Raises section that an HTTPException
with status 422 is raised when shield_ids is invalid (e.g., contains unknown IDs
or is malformed), and ensure the Parameters, Returns, and Raises sections are
present and concise per Google Python docstring conventions, referencing the
ShieldModerationResult return type and the existing 500/404 or other
HTTPException cases for missing provider_resource_id or model-not-found.
Description
As a developer using Lightspeed Core Stack, I want to specify which safety shields to apply per query, so that I can use different shield configurations for different use cases without restarting the service.
Updated query v1, query v2, and streaming query to take in an extra parameter to identify which shields to apply. If the request parameter is not provided, it will apply all safety shields (same as before).
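For illustration, a request might look like the sketch below; the /v1/query path, port, JSON field names, and shield IDs are assumptions based on the description above, not a confirmed request schema.

```python
import requests

# Hypothetical call against a running Lightspeed Core Stack instance.
payload = {
    "query": "How do I scale my deployment?",
    "shield_ids": ["my-input-shield"],  # apply only this shield to the query
    # "shield_ids": []                  -> would skip safety shields entirely
    # omitting "shield_ids"             -> applies all configured shields (previous behaviour)
}
response = requests.post("http://localhost:8080/v1/query", json=payload, timeout=60)
print(response.status_code, response.json())
```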
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit