
Conversation

@are-ces (Contributor) commented Jan 28, 2026

Description

  • now extracting doc_title and doc_url from the "attributes" field in the response
  • removed chunk metadata extraction from results that are not of "type": "file_search_call"
  • deleted deprecated method in old query endpoint
  • updated the BYOK config generation
  • updated documentation for BYOK and RAG

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue: LCORE-1179
  • Closes: LCORE-1179

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

I registered a vector store, called the /v1/query and /v1/streaming_query endpoints, and verified that the referenced_docs field is well structured.

E.g.

"referenced_documents": [
    {
      "doc_url": "https://www.redhat.com/openshift_docs/virt/about_virt/about-virt.txt",
      "doc_title": "About OpenShift Virtualization"
    },

Summary by CodeRabbit

  • New Features

    • Referenced-document extraction now prefers document titles and improved URL resolution (doc_url, docs_url, url, link); deduplication uses title+URL instead of filenames.
    • Centralized configuration: added storage.backends and registered_resources (vector_stores, models); BYOK RAG wiring now populates storage, providers, vector stores, and model registrations.
    • Inference endpoint accepts additional request context inputs for richer integrations.
  • Bug Fixes

    • A2A endpoint mapping corrected to use POST.
  • Documentation

    • Docs/OpenAPI updated: vector_db_id renamed to vector_store_id and new storage/registered_resources schema documented.

@are-ces are-ces requested review from asimurka and tisnik January 28, 2026 15:57
@coderabbitai (Contributor)

coderabbitai bot commented Jan 28, 2026

Warning: rate limit exceeded. @are-ces has exceeded the number of commits that can be reviewed per hour; after the wait time elapses, a review can be re-triggered with the @coderabbitai review command or by pushing new commits.

Walkthrough

Removed text-item metadata parsing from query endpoints; changed referenced-document extraction in query_v2 to prefer doc_url/docs_url/url/link, extract doc_title, and dedupe by (doc_url, doc_title); renamed and expanded BYOK RAG config wiring to centralized storage/backends and registered_resources with new helper constructors; tests and docs updated accordingly.
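The extraction rules described above can be sketched as a standalone function. This is a simplified approximation of the described query_v2 behavior, not the actual implementation; plain dicts stand in for the file_search_call results and for ReferencedDocument:

```python
from typing import Any, Optional


def extract_referenced_documents(
    results: list[dict[str, Any]],
) -> list[dict[str, Optional[str]]]:
    """Sketch: pick a URL by precedence, require a title or URL, dedupe."""
    documents: list[dict[str, Optional[str]]] = []
    seen: set[tuple[Optional[str], Optional[str]]] = set()
    for result in results:
        # "or {}" also guards against an explicit None attributes value
        attributes = result.get("attributes") or {}
        doc_url = (
            attributes.get("doc_url")
            or attributes.get("docs_url")
            or attributes.get("url")
            or attributes.get("link")
        )
        doc_title = attributes.get("title")
        if doc_title or doc_url:  # doc_id-only entries are skipped
            final_url = doc_url or None  # treat "" as None
            if (final_url, doc_title) not in seen:
                documents.append({"doc_url": final_url, "doc_title": doc_title})
                seen.add((final_url, doc_title))
    return documents
```

For example, a result carrying both doc_url and docs_url resolves to doc_url, a repeated (url, title) pair is emitted once, and a result with only document_id is dropped.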

Changes

  • Query endpoints (src/app/endpoints/query.py, src/app/endpoints/query_v2.py): Removed parse_metadata_from_text_item and all TextContentItem usage from query.py. In query_v2.py, updated referenced-document parsing: URL precedence doc_url → docs_url → url → link, extract doc_title from attributes, eligibility based on doc_title or doc_url, dedupe by (final_url, doc_title), and construct ReferencedDocument with doc_url/doc_title. Removed filename/annotation branches.
  • Models & responses (src/models/responses.py, src/models/config.py): Minor import reorder in responses.py. Updated the ByokRag.vector_db_id description wording to "Vector database identification." in config.py.
  • BYOK / Llama Stack config (src/llama_stack_configuration.py): Renamed construct_vector_dbs_section → construct_storage_backends_section; added construct_models_section and construct_vector_stores_section, and expanded construct_vector_io_providers_section. enrich_byok_rag now populates storage.backends, providers.vector_io, registered_resources.vector_stores, and registered_resources.models.
  • Tests, endpoints (tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_query_v2.py): Removed tests/imports referencing parse_metadata_from_text_item and TextContentItem. Updated test_query_v2 fixtures/assertions to use docs_url, title, document_id attributes and new referenced-doc expectations (reduced referenced docs and added RAG chunk content/source/score assertions).
  • Tests, config & cache (tests/unit/cache/test_postgres_cache.py, tests/unit/test_llama_stack_configuration.py): Import adjustments in the Postgres cache test. Config tests updated to validate the new vector_stores, storage.backends, providers.vector_io, and registered_resources.models wiring and ID/name mappings.
  • YAML examples & CI configs (tests/configuration/run.yaml, examples/run.yaml, run.yaml, tests/e2e/configs/run-ci.yaml): Removed top-level conversations_store, inference_store, metadata_store. Replaced vector_dbs with vector_stores under registered_resources. Introduced storage.backends/stores and adjusted provider persistence and vector_io persistence/backends.
  • Docs / OpenAPI / Guides (docs/openapi.json, docs/openapi.md, docs/byok_guide.md, docs/rag_guide.md): Renamed vector_db_id → vector_store_id and updated descriptions; extended /v1/infer to include request and background_tasks; updated A2A operationIds to POST variants; reworked the BYOK/RAG docs to the central storage/backends and registered_resources model.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • tisnik
  • radofuchs
🚥 Pre-merge checks | ✅ 3 passed

  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The title accurately summarizes the primary change: updating metadata extraction logic to extract doc_title and doc_url from chunk attributes instead of filenames, which is reflected in the substantial refactoring across query endpoints and configuration files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


coderabbitai bot left a comment (Contributor)

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/query_v2.py (1)

547-588: Include doc_id in de-duplication to avoid dropping distinct documents.
If two documents share title/URL (or URL is missing) but have different doc_id values, the current (final_url, doc_title) key will collapse them, which undermines the new doc_id linkage.

✅ Suggested fix
-    # Use a set to track unique documents by (doc_url, doc_title) tuple
-    seen_docs: set[tuple[Optional[str], Optional[str]]] = set()
+    # Use a set to track unique documents by (doc_id, doc_url, doc_title) tuple
+    seen_docs: set[tuple[Optional[str], Optional[str], Optional[str]]] = set()
@@
-                if (final_url, doc_title) not in seen_docs:
+                unique_key = (doc_id, final_url, doc_title)
+                if unique_key not in seen_docs:
                     documents.append(
                         ReferencedDocument(
                             doc_url=final_url, doc_title=doc_title, doc_id=doc_id
                         )
                     )
-                    seen_docs.add((final_url, doc_title))
+                    seen_docs.add(unique_key)

@are-ces are-ces requested a review from onmete January 28, 2026 16:15
Context (src/app/endpoints/query_v2.py):

            )
            seen_docs.add((final_url, filename))

    # 2. Parse from message content annotations
@asimurka (Contributor) commented Jan 29, 2026

What is the reason for dropping this section?
Also check the OpenAIResponseOutputMessageContent type, which can be part of OpenAIResponseMessage content; isn't that relevant too? More specifically, the OpenAIResponseAnnotations object.

@are-ces (Contributor, Author) replied:

We already get the references to the chunks used for the response.

Annotations are useful if we want to determine exactly which files the model used to produce the answer (as determined by the model) and at which position in the text.

Personally I think we can use this data, but right now it is not needed: the scope of referenced_docs is to keep data for all chunks retrieved and used as input for reference. The data we get from citations is redundant, since it is a small subset of these docs.

I think we can make use of citations in a separate feature, after agreement across our customers, since it adds new functionality.

Contributor:
+1 for handling "annotations" separately

@asimurka (Contributor) left a comment:
LGTM

@are-ces are-ces marked this pull request as draft January 30, 2026 09:31
@are-ces are-ces force-pushed the fix-metadata branch 3 times, most recently from 88ed424 to a9ccff5 Compare January 30, 2026 12:09
@are-ces are-ces marked this pull request as ready for review January 30, 2026 12:10
coderabbitai bot left a comment (Contributor)

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@docs/openapi.md`:
- Around line 5324-5328: The ReferencedDocument markdown table pipes are
misaligned with the header row causing MD060; fix by aligning the pipe
separators vertically and ensuring the header separator row matches the number
of columns and header styles—update the table containing the fields doc_url,
doc_title, and doc_id so the header row "| Field | Type | Description |" and the
separator row (the row of dashes) have matching columns and properly placed
pipes, and then ensure each data row (the lines with doc_url, doc_title, doc_id)
uses the same pipe alignment and column count.

In `@src/app/endpoints/query_v2.py`:
- Around line 561-566: The code sets attributes = result.get("attributes", {})
or uses getattr(result, "attributes", {}), but if the key/attribute exists with
value None the default isn't applied and attributes becomes None causing
attributes.get(...) to raise AttributeError; change the assignment so attributes
is always a dict by applying the "or {}" pattern after both retrievals (i.e.,
ensure attributes = result.get("attributes", {}) or {} and attributes =
getattr(result, "attributes", {}) or {}), referencing the variables result and
attributes and the subsequent attributes.get(...) usage to locate where to apply
the fix.
- Around line 575-586: The current filter only appends ReferencedDocument when
doc_title or doc_url exists, which drops entries that only have doc_id; change
the if-condition to check any of doc_title, doc_url, or doc_id so documents with
only a document_id are preserved. Keep the existing final_url normalization
(treat empty string as None), use the same dedupe tuple (final_url, doc_title,
doc_id) against seen_docs, and append ReferencedDocument(doc_url=final_url,
doc_title=doc_title, doc_id=doc_id) when that tuple is not in seen_docs.
🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)

503-507: Consider using doc_id as RAGChunk.source (with filename fallback).

If clients are expected to join rag_chunks with referenced_documents via doc_id, source should carry that ID.

🔄 Suggested linkage-friendly source
-            rag_chunk = RAGChunk(
-                content=result.text, source=result.filename, score=result.score
-            )
+            attributes = getattr(result, "attributes", None) or {}
+            doc_id = None
+            if isinstance(attributes, dict):
+                doc_id = attributes.get("document_id") or attributes.get("doc_id")
+            source = doc_id or result.filename
+            rag_chunk = RAGChunk(content=result.text, source=source, score=result.score)

Comment on lines 575 to 583

+                doc_title = attributes.get("title")
+                doc_id = attributes.get("document_id") or attributes.get("doc_id")
+
-                # If we have at least a filename or url
-                if filename or doc_url:
+                if doc_title or doc_url:
                     # Treat empty string as None for URL to satisfy Optional[AnyUrl]
                     final_url = doc_url if doc_url else None
-                    if (final_url, filename) not in seen_docs:
+                    if (final_url, doc_title, doc_id) not in seen_docs:
                         documents.append(
-                            ReferencedDocument(doc_url=final_url, doc_title=filename)
+                            ReferencedDocument(
+                                doc_url=final_url, doc_title=doc_title, doc_id=doc_id
+                            )
                         )
-                        seen_docs.add((final_url, filename))
+                        seen_docs.add((final_url, doc_title, doc_id))
-
-                # 2. Parse from message content annotations
-                elif item_type == "message":
-                    content = getattr(output_item, "content", None)
-                    if isinstance(content, list):
-                        for part in content:
-                            # Skip if part is a string or doesn't have annotations
-                            if isinstance(part, str):
-                                continue
-
-                            annotations = getattr(part, "annotations", []) or []
-                            for annotation in annotations:
-                                # Handle both object and dict access for annotations
-                                if isinstance(annotation, dict):
-                                    anno_type = annotation.get("type")
-                                    anno_url = annotation.get("url")
-                                    anno_title = annotation.get("title") or annotation.get(
-                                        "filename"
-                                    )
-                                else:
-                                    anno_type = getattr(annotation, "type", None)
-                                    anno_url = getattr(annotation, "url", None)
-                                    anno_title = getattr(annotation, "title", None) or getattr(
-                                        annotation, "filename", None
-                                    )
-
-                                if anno_type == "url_citation":
-                                    # Treat empty string as None
-                                    final_url = anno_url if anno_url else None
-                                    if (final_url, anno_title) not in seen_docs:
-                                        documents.append(
-                                            ReferencedDocument(
-                                                doc_url=final_url, doc_title=anno_title
-                                            )
-                                        )
-                                        seen_docs.add((final_url, anno_title))
-
-                                elif anno_type == "file_citation":
-                                    if (None, anno_title) not in seen_docs:
-                                        documents.append(
-                                            ReferencedDocument(
-                                                doc_url=None, doc_title=anno_title
-                                            )
-                                        )
-                                        seen_docs.add((None, anno_title))
coderabbitai bot commented Jan 30, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed against lightspeed-core/lightspeed-stack (outputs omitted):

# Look at the code around lines 575-586 in query_v2.py
cat -n src/app/endpoints/query_v2.py | sed -n '570,590p'

# Check the ReferencedDocument class definition and its structure
rg -A 10 'class ReferencedDocument' src/

# More context around this code section to understand the full logic
cat -n src/app/endpoints/query_v2.py | sed -n '550,600p'

# Get the full ReferencedDocument class definition
cat -n src/models/responses.py | grep -A 20 'class ReferencedDocument'

# Search for where ReferencedDocument is used to understand the intent
rg 'ReferencedDocument' src/ --type py -B 2 -A 2

# Check for tests that show expected behavior with doc_id-only documents
fd 'test.*query' src/ --type py | head -5

Include doc_id-only documents to prevent data loss.

Documents with only document_id but no title or url are currently filtered out. Since doc_id is explicitly extracted and used in deduplication, it should be preserved. Update the condition to include doc_id:

Suggested fix
-                if doc_title or doc_url:
+                if doc_title or doc_url or doc_id:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (resulting code):

                doc_title = attributes.get("title")
                doc_id = attributes.get("document_id") or attributes.get("doc_id")

                if doc_title or doc_url or doc_id:
                    # Treat empty string as None for URL to satisfy Optional[AnyUrl]
                    final_url = doc_url if doc_url else None
                    if (final_url, doc_title, doc_id) not in seen_docs:
                        documents.append(
                            ReferencedDocument(
                                doc_url=final_url, doc_title=doc_title, doc_id=doc_id
                            )
                        )
                        seen_docs.add((final_url, doc_title, doc_id))
🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 575 - 586, The current filter
only appends ReferencedDocument when doc_title or doc_url exists, which drops
entries that only have doc_id; change the if-condition to check any of
doc_title, doc_url, or doc_id so documents with only a document_id are
preserved. Keep the existing final_url normalization (treat empty string as
None), use the same dedupe tuple (final_url, doc_title, doc_id) against
seen_docs, and append ReferencedDocument(doc_url=final_url, doc_title=doc_title,
doc_id=doc_id) when that tuple is not in seen_docs.

@are-ces (Contributor, Author) replied:

referenced_docs are used because we need to show in the frontend which documents have been referenced; IMO it does not make sense to return empty documents without a title / URL.

coderabbitai bot replied:

@are-ces, that makes perfect sense! You're right that a document reference with only an ID wouldn't provide meaningful information for the frontend to display. The current condition ensures that only documents with at least a title or URL—something actionable and human-readable—are returned to the UI. Thank you for the clarification!


✏️ Learnings added
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:34.749Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@are-ces are-ces marked this pull request as draft January 30, 2026 13:55
@are-ces are-ces marked this pull request as ready for review January 30, 2026 16:10
coderabbitai bot left a comment (Contributor)

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/openapi.json (1)

4287-4312: ⚠️ Potential issue | 🟠 Major

Ensure unique operationId values for GET vs POST.
Line 4293 and Line 4311 both use handle_a2a_jsonrpc_a2a_post, which can break client codegen (operationId must be unique). Consider distinct IDs per method.

🔧 Suggested fix
-                "operationId": "handle_a2a_jsonrpc_a2a_post",
+                "operationId": "handle_a2a_jsonrpc_a2a_get",
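Duplicate operationIds like this can also be caught mechanically. A minimal sketch (not part of the PR; the spec shape is the standard OpenAPI paths/methods layout):

```python
from typing import Any


def find_duplicate_operation_ids(spec: dict[str, Any]) -> list[str]:
    """Return operationIds that appear more than once across all paths/methods."""
    counts: dict[str, int] = {}
    for methods in spec.get("paths", {}).values():
        for operation in methods.values():
            # Skip non-operation entries such as path-level "parameters"
            if isinstance(operation, dict) and "operationId" in operation:
                op_id = operation["operationId"]
                counts[op_id] = counts.get(op_id, 0) + 1
    return [op_id for op_id, count in counts.items() if count > 1]
```

Running this over docs/openapi.json (e.g. via json.load) in a unit test would flag the shared handle_a2a_jsonrpc_a2a_post before it reaches client codegen.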
🧹 Nitpick comments (2)
src/llama_stack_configuration.py (1)

310-347: Avoid in-place mutation of ls_config in enrich_byok_rag.
Returning a new mapping reduces side‑effects and aligns with the project guideline.

♻️ Proposed refactor to avoid in-place mutation
-def enrich_byok_rag(ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]) -> None:
+def enrich_byok_rag(ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]) -> dict[str, Any]:
     """Enrich Llama Stack config with BYOK RAG settings.

     Args:
-        ls_config: Llama Stack configuration dict (modified in place)
+        ls_config: Llama Stack configuration dict
         byok_rag: List of BYOK RAG configurations
+    Returns:
+        dict[str, Any]: Enriched configuration mapping.
     """
-    if len(byok_rag) == 0:
-        logger.info("BYOK RAG is not configured: skipping")
-        return
+    if len(byok_rag) == 0:
+        logger.info("BYOK RAG is not configured: skipping")
+        return ls_config
+
+    output = {**ls_config}

     logger.info("Enriching Llama Stack config with BYOK RAG")

     # Add storage backends
-    if "storage" not in ls_config:
-        ls_config["storage"] = {}
-    ls_config["storage"]["backends"] = construct_storage_backends_section(
-        ls_config, byok_rag
-    )
+    storage = {**output.get("storage", {})}
+    storage["backends"] = construct_storage_backends_section(output, byok_rag)
+    output["storage"] = storage

     # Add vector_io providers
-    if "providers" not in ls_config:
-        ls_config["providers"] = {}
-    ls_config["providers"]["vector_io"] = construct_vector_io_providers_section(
-        ls_config, byok_rag
-    )
+    providers = {**output.get("providers", {})}
+    providers["vector_io"] = construct_vector_io_providers_section(output, byok_rag)
+    output["providers"] = providers

     # Add registered vector stores
-    if "registered_resources" not in ls_config:
-        ls_config["registered_resources"] = {}
-    ls_config["registered_resources"]["vector_stores"] = (
-        construct_vector_stores_section(ls_config, byok_rag)
-    )
-
-    # Add embedding models
-    ls_config["registered_resources"]["models"] = construct_models_section(
-        ls_config, byok_rag
-    )
+    registered_resources = {**output.get("registered_resources", {})}
+    registered_resources["vector_stores"] = construct_vector_stores_section(
+        output, byok_rag
+    )
+    registered_resources["models"] = construct_models_section(output, byok_rag)
+    output["registered_resources"] = registered_resources
+
+    return output
🔁 Caller update
-    enrich_byok_rag(ls_config, config.get("byok_rag", []))
+    ls_config = enrich_byok_rag(ls_config, config.get("byok_rag", []))

As per coding guidelines: Avoid in-place parameter modification anti-patterns; instead return new data structures.
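A minimal sketch of the non-mutating pattern the guideline calls for (names are illustrative, not the actual helpers in src/llama_stack_configuration.py):

```python
from typing import Any


def enrich(config: dict[str, Any], extra_backends: dict[str, Any]) -> dict[str, Any]:
    """Return a new config with backends merged in; the input is left untouched."""
    storage = {**config.get("storage", {})}
    storage["backends"] = {**storage.get("backends", {}), **extra_backends}
    return {**config, "storage": storage}
```

The caller then rebinds the result (config = enrich(config, ...)), which keeps the original mapping available for comparison or logging and makes the data flow explicit.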

docs/openapi.json (1)

8434-8451: Clarify when doc_id will actually appear.
The schema allows doc_id alone, but the API filters out entries lacking doc_title/doc_url. A short note will prevent consumer confusion.

📝 Suggested doc clarification
-                "description": "Model representing a document referenced in generating a response.\n\nAttributes:\n    doc_url: Url to the referenced doc.\n    doc_title: Title of the referenced doc.\n    doc_id: ID of the referenced doc."
+                "description": "Model representing a document referenced in generating a response.\n\nAttributes:\n    doc_url: Url to the referenced doc.\n    doc_title: Title of the referenced doc.\n    doc_id: ID of the referenced doc.\n\nNote: referenced documents are only emitted when doc_title or doc_url is present; doc_id-only entries are filtered."

Based on learnings: In src/app/endpoints/query_v2.py, the parse_referenced_documents_from_responses_api function filters referenced documents to include only those with doc_title or doc_url because these documents are displayed in the frontend. Documents with only doc_id are intentionally excluded as they wouldn't provide useful information to end users.

@asimurka (Contributor) left a comment:
The update LGTM

@are-ces are-ces requested review from onmete and removed request for onmete February 2, 2026 08:08
Context (src/app/endpoints/query_v2.py):

                or attributes.get("url")
                or attributes.get("doc_url")
                or attributes.get("link")
            )
Contributor:
This also implements attribute key priority, worth documenting it?

@are-ces (Contributor, Author) replied:

The rag-content tool creates a docs_url field. Users should not need to care about this; they should just create the vector store via the tool and register the BYOK store in LCORE.

Contributor:

So you can't connect an arbitrary index to LCORE? If you have one, you need to recreate it via the rag-content tool?

@are-ces (Contributor, Author) replied:

You can, but we don't support that. AFAIK we only support creating RAG DBs with rag-content.


@are-ces are-ces force-pushed the fix-metadata branch 2 times, most recently from f715457 to c259964 Compare February 2, 2026 09:39
coderabbitai bot left a comment (Contributor)

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
examples/run.yaml (1)

111-112: ⚠️ Potential issue | 🟡 Minor

Duplicate telemetry: key in YAML configuration.

The file contains two telemetry: sections at lines 111-112 and 167-168. In YAML, duplicate keys cause the second occurrence to silently override the first. This results in telemetry.enabled: true (line 168) overwriting the earlier definition (line 112).

If both are intentional, consolidate them into a single telemetry: block.

🔧 Proposed fix: remove duplicate telemetry block
 safety:
   default_shield_id: llama-guard
-telemetry:
-  enabled: true

Also applies to: 167-168
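The silent-override behavior can be illustrated with plain Python: building a dict from key/value pairs keeps only the last value for a duplicated key, which mirrors what most YAML loaders do with duplicate mapping keys (the enabled: false value for the first block is illustrative; only the second block's enabled: true is stated above):

```python
# Two "telemetry" entries, as in the duplicated YAML sections
pairs = [
    ("safety", {"default_shield_id": "llama-guard"}),
    ("telemetry", {"enabled": False}),  # first occurrence (illustrative value)
    ("telemetry", {"enabled": True}),   # second occurrence silently wins
]
config = dict(pairs)  # later duplicates override earlier ones
```

This is why consolidating into a single telemetry: block is the safe fix: the loader gives no warning that the first definition was discarded.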

docs/openapi.json (1)

4287-4312: ⚠️ Potential issue | 🟠 Major

OperationId must be unique across GET/POST.
Line 4293 and Line 4311 now share the same operationId, which breaks OpenAPI uniqueness guarantees and can confuse client generators.

🛠️ Suggested fix (keep GET unique)
-                "operationId": "handle_a2a_jsonrpc_a2a_post",
+                "operationId": "handle_a2a_jsonrpc_a2a_get",
🤖 Fix all issues with AI agents
In `@docs/openapi.json`:
- Around line 8434-8451: Update the OpenAPI docs to clarify that doc_id alone is
supplemental and that runtime only emits referenced_documents that include
doc_title or doc_url: edit the ReferencedDocument description (and/or add a
note) to state that entries with only doc_id are not emitted by
parse_referenced_documents_from_responses_api and that consumers should expect
either doc_title or doc_url to be present for displayed documents;
alternatively, if you prefer a schema-level guarantee, change the
ReferencedDocument schema to require at least one of doc_title or doc_url (e.g.,
document-level note or a oneOf/anyOf constraint) so the API contract matches
parse_referenced_documents_from_responses_api behavior.

coderabbitai bot left a comment (Contributor)

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/openapi.json (1)

4287-4312: ⚠️ Potential issue | 🟠 Major

Ensure operationId values are unique across methods.

Both GET and POST for /a2a currently use the same operationId, which breaks OpenAPI uniqueness guarantees and can confuse client generators.

🔧 Suggested fix
 /a2a:
   get:
     ...
-    "operationId": "handle_a2a_jsonrpc_a2a_post"
+    "operationId": "handle_a2a_jsonrpc_a2a_get"
   post:
     ...
     "operationId": "handle_a2a_jsonrpc_a2a_post"
tests/configuration/run.yaml (1)

84-90: ⚠️ Potential issue | 🟠 Major

Remove unsupported telemetry provider block from providers section.

The telemetry provider block at lines 84–90 is not supported in Llama Stack 0.3.x. Telemetry should be configured only via the top-level telemetry.enabled: true setting (already present at line 146), without requiring a provider entry. Remove this block to prevent configuration warnings or failures.

🛠️ Proposed change
-  telemetry:
-  - config:
-      service_name: ''
-      sinks: sqlite
-      sqlite_db_path: .llama/distributions/ollama/trace_store.db
-    provider_id: meta-reference
-    provider_type: inline::meta-reference
🤖 Fix all issues with AI agents
In `@src/llama_stack_configuration.py`:
- Around line 138-145: The loop that builds BYOK backends uses vector_store_id
from each byok_rag entry and currently allows empty or missing values (producing
names like byok__storage and silent overwrites); update the code in this loop
(the byok_rag iteration where backend_name = f"byok_{vector_store_id}_storage"
and output[...] is set) to validate that vector_store_id is present and
non-empty and raise/throw a clear exception (fail fast) if not, and apply the
same validation/early-fail pattern in the other helpers mentioned
(construct_vector_stores_section, construct_vector_io_providers_section,
construct_models_section) so each function checks its vector_store_id (or
equivalent ID fields) before using them to build keys or provider/model entries.
🧹 Nitpick comments (3)
src/llama_stack_configuration.py (1)

323-347: Consider returning a new config instead of mutating ls_config in place.

The function currently rewrites nested sections on the passed-in dict, which makes side effects harder to reason about. Returning a new dict would better isolate changes.

As per coding guidelines, "Avoid in-place parameter modification anti-patterns; instead return new data structures".
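A pure-function version of that pattern, sketched below; the real `enrich_byok_rag` signature and section layout in the repo may differ, so treat the parameter shapes as assumptions:

```python
import copy


def enrich_byok_rag(ls_config: dict, byok_sections: dict) -> dict:
    """Return an enriched copy of the config instead of mutating the input.

    byok_sections maps a top-level section name (e.g. "providers") to the
    entries that should be merged into it. (Sketch: shapes are assumptions.)
    """
    enriched = copy.deepcopy(ls_config)
    for section, entries in byok_sections.items():
        # setdefault keeps any existing entries; update merges the new ones
        enriched.setdefault(section, {}).update(entries)
    return enriched
```

Callers then reassign (`ls_config = enrich_byok_rag(ls_config, sections)`), which keeps the original dict untouched and makes the data flow explicit.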

docs/openapi.json (2)

5338-5356: Consider noting the rename from vector_db_id → vector_store_id.

This is a public schema change; adding a brief migration note in the description can help clients discover the rename without hunting release notes.


8434-8451: Clarify when doc_id is emitted.

The API filters out referenced documents that lack doc_title or doc_url, so doc_id effectively appears only alongside a title or URL. Consider adding a short note to avoid clients expecting doc_id-only entries.

Based on learnings: “In src/app/endpoints/query_v2.py, the parse_referenced_documents_from_responses_api function filters referenced documents to include only those with doc_title or doc_url because these documents are displayed in the frontend. Documents with only doc_id are intentionally excluded as they wouldn't provide useful information to end users.”
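Mirroring that filter at the model level would look roughly like this; a dataclass sketch only — the project's actual model is a pydantic class, so the field names follow the review but the validation hook is an assumption:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReferencedDocument:
    """A referenced document; requires a title or a URL, matching the runtime filter."""

    doc_url: Optional[str] = None
    doc_title: Optional[str] = None
    doc_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirror parse_referenced_documents_from_responses_api:
        # doc_id-only entries are never emitted, so reject them here too.
        if not (self.doc_title or self.doc_url):
            raise ValueError("doc_title or doc_url must be present")
```

With a guarantee like this in the schema, the OpenAPI description and the runtime behavior can't drift apart.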

Comment on lines 138 to 145
# add new backends for each BYOK RAG
for brag in byok_rag:
    vector_store_id = brag.get("vector_store_id", "")
    backend_name = f"byok_{vector_store_id}_storage"
    output[backend_name] = {
        "type": "kv_sqlite",
        "db_path": brag.get("db_path", f".llama/{vector_store_id}.db"),
    }
Contributor

⚠️ Potential issue | 🟠 Major

Guard against missing vector_store_id to avoid invalid backend/provider IDs.

If a BYOK entry is missing vector_store_id, the code generates names like byok__storage and can silently overwrite entries. Please validate and fail fast (and mirror the same guard in construct_vector_stores_section, construct_vector_io_providers_section, and construct_models_section).

🛠️ Proposed fix (apply similarly in the other construct_* helpers)
-    for brag in byok_rag:
-        vector_store_id = brag.get("vector_store_id", "")
+    for brag in byok_rag:
+        vector_store_id = brag.get("vector_store_id")
+        if not vector_store_id:
+            raise ValueError("byok_rag entry missing vector_store_id")
         backend_name = f"byok_{vector_store_id}_storage"
         output[backend_name] = {
             "type": "kv_sqlite",
             "db_path": brag.get("db_path", f".llama/{vector_store_id}.db"),
         }
🤖 Prompt for AI Agents
In `@src/llama_stack_configuration.py` around lines 138 - 145, The loop that
builds BYOK backends uses vector_store_id from each byok_rag entry and currently
allows empty or missing values (producing names like byok__storage and silent
overwrites); update the code in this loop (the byok_rag iteration where
backend_name = f"byok_{vector_store_id}_storage" and output[...] is set) to
validate that vector_store_id is present and non-empty and raise/throw a clear
exception (fail fast) if not, and apply the same validation/early-fail pattern
in the other helpers mentioned (construct_vector_stores_section,
construct_vector_io_providers_section, construct_models_section) so each
function checks its vector_store_id (or equivalent ID fields) before using them
to build keys or provider/model entries.
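Standalone, the fail-fast guard that prompt describes could be sketched as follows; the helper name and the extra duplicate check are assumptions, not the repo's actual code:

```python
def build_byok_backends(byok_rag: list[dict]) -> dict:
    """Build storage backend entries, failing fast on missing or duplicate IDs."""
    output: dict[str, dict] = {}
    for brag in byok_rag:
        vector_store_id = brag.get("vector_store_id")
        if not vector_store_id:
            # Fail fast instead of generating names like "byok__storage"
            raise ValueError("byok_rag entry missing vector_store_id")
        backend_name = f"byok_{vector_store_id}_storage"
        if backend_name in output:
            # Prevent one entry silently overwriting another
            raise ValueError(f"duplicate BYOK backend: {backend_name}")
        output[backend_name] = {
            "type": "kv_sqlite",
            "db_path": brag.get("db_path", f".llama/{vector_store_id}.db"),
        }
    return output
```

The same check would then be repeated (or factored into a shared validator) in the other `construct_*` helpers.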

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/openapi.json (1)

4287-4312: ⚠️ Potential issue | 🟠 Major

Ensure unique operationId values for /a2a GET and POST.

Both methods now share the same operationId, which can break OpenAPI tooling/codegen due to collisions. Assign a distinct ID to the GET operation (or remove/deprecate the GET route if it is no longer supported).

🔧 Suggested fix (unique GET operationId)
-                "operationId": "handle_a2a_jsonrpc_a2a_post",
+                "operationId": "handle_a2a_jsonrpc_a2a_get",
🧹 Nitpick comments (4)
docs/rag_guide.md (1)

242-242: Minor inconsistency: documentation still references vector_dbs.

Line 242 mentions vector_dbs while the rest of the document has been updated to use vector_stores. Consider updating for consistency:

-> [!NOTE]
-> When experimenting with different `models`, `providers` and `vector_dbs`, you might need to manually unregister the old ones with the Llama Stack client CLI (e.g. `llama-stack-client vector_dbs list`)
+> [!NOTE]
+> When experimenting with different `models`, `providers` and `vector_stores`, you might need to manually unregister the old ones with the Llama Stack client CLI (e.g. `llama-stack-client vector_stores list`)
tests/unit/test_llama_stack_configuration.py (1)

283-283: Consider using tmp_path fixture for the db_path.

Using /tmp/store1.db as a hardcoded path works but could be improved by leveraging pytest's tmp_path fixture for better cross-platform compatibility and automatic cleanup.

tests/unit/cache/test_postgres_cache.py (2)

518-543: Assert topic summary removal after delete.

The test name/docstring says the summary is removed, but there’s no assertion verifying it. Add a post-delete list check to prevent regressions.

🧪 Suggested assertion
     deleted = cache.delete(USER_ID_1, CONVERSATION_ID_1, False)
     assert deleted is True
+    mock_cursor.fetchall.return_value = []
+    conversations = cache.list(USER_ID_1, False)
+    assert conversations == []

573-626: Add doc_id coverage for referenced_documents persistence.

PR objectives highlight doc_id linkage, but this test only verifies doc_title/doc_url. Adding doc_id to the test would prevent regressions in cache serialization/deserialization.

🧪 Add doc_id to the persisted payload check
-    docs = [
-        ReferencedDocument(doc_title="Test Doc", doc_url=AnyUrl("http://example.com/"))
-    ]
+    docs = [
+        ReferencedDocument(
+            doc_title="Test Doc",
+            doc_url=AnyUrl("http://example.com/"),
+            doc_id="doc_1",
+        )
+    ]
@@
-    assert json.loads(inserted_json_str) == [
-        {"doc_url": "http://example.com/", "doc_title": "Test Doc"}
-    ]
+    assert json.loads(inserted_json_str) == [
+        {"doc_url": "http://example.com/", "doc_title": "Test Doc", "doc_id": "doc_1"}
+    ]
@@
-        [{"doc_url": "http://example.com/", "doc_title": "Test Doc"}],
+        [{"doc_url": "http://example.com/", "doc_title": "Test Doc", "doc_id": "doc_1"}],
@@
     assert retrieved_entries[0] == entry_with_docs
     assert retrieved_entries[0].referenced_documents[0].doc_title == "Test Doc"
+    assert retrieved_entries[0].referenced_documents[0].doc_id == "doc_1"

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/app/endpoints/query_v2.py (1)

546-584: ⚠️ Potential issue | 🟠 Major

Include document_id in referenced documents for chunk linkage.
attributes.document_id is currently ignored, so referenced_documents can’t be linked back to the originating RAG chunks. Extract it, include it in the dedupe key, and pass it through (while keeping the title/url filter). Also extend ReferencedDocument to store doc_id.

🔧 Suggested update (plus model change in models/responses.py)
-    seen_docs: set[tuple[Optional[str], Optional[str]]] = set()
+    seen_docs: set[tuple[Optional[str], Optional[str], Optional[str]]] = set()
...
                 doc_title = attributes.get("title")
+                doc_id = attributes.get("doc_id") or attributes.get("document_id")

                 if doc_title or doc_url:
                     # Treat empty string as None for URL to satisfy Optional[AnyUrl]
                     final_url = doc_url if doc_url else None
-                    if (final_url, doc_title) not in seen_docs:
+                    dedupe_key = (final_url, doc_title, doc_id)
+                    if dedupe_key not in seen_docs:
                         documents.append(
-                            ReferencedDocument(doc_url=final_url, doc_title=doc_title)
+                            ReferencedDocument(
+                                doc_url=final_url, doc_title=doc_title, doc_id=doc_id
+                            )
                         )
-                        seen_docs.add((final_url, doc_title))
+                        seen_docs.add(dedupe_key)
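Taken on its own, the extraction-and-dedupe logic in that diff behaves like this sketch; the attribute keys and the plain-dict return are assumptions — the real code builds ReferencedDocument models from file_search_call results:

```python
from typing import Optional


def dedupe_referenced_docs(attribute_sets: list[dict]) -> list[dict]:
    """Collect unique (url, title, id) referenced documents, keeping order."""
    seen: set[tuple[Optional[str], Optional[str], Optional[str]]] = set()
    documents: list[dict] = []
    for attributes in attribute_sets:
        doc_url = attributes.get("doc_url") or attributes.get("url")
        doc_title = attributes.get("title")
        doc_id = attributes.get("doc_id") or attributes.get("document_id")
        # Keep the existing filter: doc_id-only entries are not emitted
        if doc_title or doc_url:
            final_url = doc_url if doc_url else None
            key = (final_url, doc_title, doc_id)
            if key not in seen:
                documents.append(
                    {"doc_url": final_url, "doc_title": doc_title, "doc_id": doc_id}
                )
                seen.add(key)
    return documents
```

Including `doc_id` in the dedupe key means two chunks from distinct source documents that happen to share a title and URL are both preserved.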
docs/openapi.json (2)

4287-4312: ⚠️ Potential issue | 🟠 Major

Avoid duplicate operationId values for /a2a GET/POST.

OpenAPI operationIds must be unique; duplicating them can break client generation and tooling.

🔧 Proposed fix
             "get": {
                 "tags": [
                     "a2a"
                 ],
                 "summary": "Handle A2A Jsonrpc",
@@
-                "operationId": "handle_a2a_jsonrpc_a2a_post",
+                "operationId": "handle_a2a_jsonrpc_a2a_get",

8407-8439: ⚠️ Potential issue | 🟠 Major

ReferencedDocument schema is missing doc_id.

PR objectives indicate doc_id is now part of referenced_documents; the OpenAPI contract should expose it.

🔧 Proposed fix
             "ReferencedDocument": {
                 "properties": {
+                    "doc_id": {
+                        "anyOf": [
+                            {
+                                "type": "string",
+                                "minLength": 1
+                            },
+                            {
+                                "type": "null"
+                            }
+                        ],
+                        "title": "Doc Id",
+                        "description": "ID of the referenced document"
+                    },
                     "doc_url": {
                         "anyOf": [
                             {
                                 "type": "string",
@@
                 "type": "object",
                 "title": "ReferencedDocument",
                 "description": "Model representing a document referenced in generating a response.\n\nAttributes:\n    doc_url: Url to the referenced doc.\n    doc_title: Title of the referenced doc.\n+    doc_id: ID of the referenced doc."
             },

Also update the referenced_documents examples (e.g., /v1/query and QueryResponse) to include doc_id when available.

🤖 Fix all issues with AI agents
In `@docs/byok_guide.md`:
- Around line 480-485: Example 2 still defines top-level keys (vector_stores,
models, tool_groups) which conflict with the new registered_resources schema;
move those blocks under the registered_resources map in Example 2 so the entries
become registered_resources:<resource_name>: { vector_stores / models /
tool_groups ... } and ensure the vector_store_id (enterprise-docs) and
provider_id (enterprise-knowledge) remain referenced inside the corresponding
registered_resources entry rather than at top level.

In `@docs/rag_guide.md`:
- Around line 83-101: The snippets place the vector_stores mapping at the top
level but the schema expects registered_resources.vector_stores; move the entire
vector_stores block under registered_resources.vector_stores to match the rest
of the config (keep keys like embedding_dimension, embedding_model, provider_id:
custom-index, and vector_store_id unchanged) and update any comments or
references that point to the top-level vector_stores so they now refer to
registered_resources.vector_stores (apply the same change for both FAISS and
pgvector examples and ensure storage.backends.rag_backend and vector_io
provider_id references remain consistent).

In `@tests/configuration/run.yaml`:
- Around line 145-146: The config currently defines both a providers.telemetry
block and telemetry.enabled: true, which is incompatible with Llama Stack 0.3.x;
remove the providers.telemetry block (the entire providers.telemetry
configuration) and keep only telemetry.enabled: true so the file uses the
supported telemetry flag; locate the providers object in the YAML (the
providers.telemetry section) and delete that subsection, leaving the top-level
telemetry.enabled setting intact.
🧹 Nitpick comments (3)
tests/unit/app/endpoints/test_query_v2.py (1)

912-951: Add doc_id assertions to lock in the new linkage.
The fixtures now carry document_id, but the test doesn’t verify it on ReferencedDocument. If doc_id is part of the contract, add assertions so regressions are caught.

✅ Suggested test additions (once ReferencedDocument.doc_id exists)
     assert doc1.doc_title == "Title 1"
     assert str(doc1.doc_url) == "http://example.com/doc2"
+    assert doc1.doc_id == "doc-123"
...
     assert doc2.doc_title == "Title 2"
     assert str(doc2.doc_url) == "http://example.com/doc3"
+    assert doc2.doc_id == "doc-456"

Also applies to: 994-1007

tests/unit/test_llama_stack_configuration.py (1)

9-222: Solid coverage for the new `construct_*` helpers.

src/llama_stack_configuration.py (1)

323-347: Avoid in-place mutation of ls_config in enrich_byok_rag.

Consider returning a new enriched config instead of mutating the input dict. It will reduce side effects and align with the project’s style expectations.

As per coding guidelines: "Avoid in-place parameter modification anti-patterns; instead return new data structures".

Contributor

onmete commented Feb 2, 2026

I hope it works ;)

/lgtm

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/openapi.json (1)

4287-4312: ⚠️ Potential issue | 🟠 Major

operationId must be unique across GET/POST on /a2a.

Line 4293 and Line 4311 use the same operationId, which breaks OpenAPI uniqueness and can collide in code generators.

🔧 Suggested fix
-                "operationId": "handle_a2a_jsonrpc_a2a_post",
+                "operationId": "handle_a2a_jsonrpc_a2a_get",
🧹 Nitpick comments (1)
docs/openapi.json (1)

3667-3677: Clarify server-only parameters in the /v1/infer description.

Line 3667 lists request, background_tasks, and auth under “Args”, which reads like client-supplied inputs in OpenAPI. Consider labeling them as server-side context or removing to avoid client confusion.

💡 Suggested doc tweak
-                "description": "Handle rlsapi v1 /infer requests for stateless inference.\n\nThis endpoint serves requests from the RHEL Lightspeed Command Line Assistant (CLA).\n\nAccepts a question with optional context (stdin, attachments, terminal output,\nsystem info) and returns an LLM-generated response.\n\nArgs:\n    infer_request: The inference request containing question and context.\n    request: The FastAPI request object for accessing headers and state.\n    background_tasks: FastAPI background tasks for async Splunk event sending.\n    auth: Authentication tuple from the configured auth provider.\n\nReturns:\n    RlsapiV1InferResponse containing the generated response text and request ID.\n\nRaises:\n    HTTPException: 503 if the LLM service is unavailable.",
+                "description": "Handle rlsapi v1 /infer requests for stateless inference.\n\nThis endpoint serves requests from the RHEL Lightspeed Command Line Assistant (CLA).\n\nAccepts a question with optional context (stdin, attachments, terminal output,\nsystem info) and returns an LLM-generated response.\n\nArgs:\n    infer_request: The inference request containing question and context.\n\nServer-side context (not provided by clients):\n    request: The FastAPI request object for accessing headers and state.\n    background_tasks: FastAPI background tasks for async Splunk event sending.\n    auth: Authentication tuple from the configured auth provider.\n\nReturns:\n    RlsapiV1InferResponse containing the generated response text and request ID.\n\nRaises:\n    HTTPException: 503 if the LLM service is unavailable.",

@are-ces are-ces marked this pull request as draft February 2, 2026 15:16
- now extracting doc_title, doc_url from the "attributes" in response
- removed chunk metadata extraction from results which are not "type": "file_search_call"
- citations could be added in future, needs discussion with other teams
- Updated BYOK config generation
@are-ces are-ces marked this pull request as ready for review February 2, 2026 15:57
Contributor

@tisnik tisnik left a comment

LGTM

@tisnik tisnik merged commit 73ad33a into lightspeed-core:main Feb 2, 2026
21 of 22 checks passed
