refactor: llguidance#288
Conversation
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
jakelorocco
left a comment
Looks like there are some mypy errors as well.
I don't know why there is a mypy error. Mypy passes locally for me.
Force-pushed from 3ebb8bd to 8dce1f2
We've seen this happen recently and it came down to mypy versions. Could be the cause. I re-ran the checks on the latest branch. If those still fail and you still pass with mypy, try checking your mypy version.
With an updated lockfile, mypy passes. The CI test run was terminated, but it passes locally. If you approve, I will merge this.
…API, use llguidance
…lm, V1 API, use llguidance
Force-pushed from 938d919 to 62923fe
@guicho271828, can you remind me if we ever got the local vllm backend running on mac (or if you remember a discussion about that)?
I tried installing this version of the mellea package on my mac and was getting versioning errors.
Details
(mellea) ➜ mellea git:(pr/guicho271828/288) uv pip install -e '.[all]' --all-extras --group dev -r pyproject.toml
Using Python 3.12.0 environment at: /opt/homebrew/Caskroom/miniforge/base/envs/mellea
× No solution found when resolving dependencies:
╰─▶ Because only the following versions of nvidia-cudnn-frontend are available:
nvidia-cudnn-frontend<=1.13.0
nvidia-cudnn-frontend==1.14.0
nvidia-cudnn-frontend==1.14.1
nvidia-cudnn-frontend==1.15.0
nvidia-cudnn-frontend==1.16.0
nvidia-cudnn-frontend==1.17.0
and nvidia-cudnn-frontend>=1.13.0 has no wheels with a matching platform tag (e.g., `macosx_15_0_arm64`), we can conclude that
nvidia-cudnn-frontend>=1.13.0 cannot be used.
And because flashinfer-python==0.5.3 depends on nvidia-cudnn-frontend>=1.13.0 and vllm==0.13.0 depends on flashinfer-python==0.5.3, we
can conclude that vllm==0.13.0 cannot be used.
And because only vllm<=0.13.0 is available and mellea depends on vllm>=0.13.0, we can conclude that your requirements are
unsatisfiable.
It looks like the newest vllm version I can install on a Mac is 0.11.0.
The vllm engine can't seem to get enough resources to run locally on my Mac anyway, so maybe we can just add a note somewhere that "mellea[vllm]" doesn't work on Macs?
To run vllm locally, vllm assumes a CUDA GPU (its dependency chain pulls in the nvidia packages shown in the resolution error above). So yes.
29588fb disables installing vllm on darwin.
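For anyone hitting this on a Mac, the practical consequence is that `import vllm` fails because the extra is skipped at install time. A minimal guarded-import sketch (illustrative only, not necessarily what 29588fb or the backend actually does):

```python
# Illustrative sketch: fail fast with a helpful message on platforms where
# the vllm extra is not installed (e.g. macOS, where it is skipped on darwin).
try:
    import vllm  # noqa: F401
except ImportError as e:
    raise ImportError(
        "The local vLLM backend requires the `vllm` package, which is not "
        "installed on this platform (the mellea[vllm] extra is skipped on "
        "darwin); use a different backend on macOS."
    ) from e
```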
vllm.sampling_params.StructuredOutputsParams(
    json=format.model_json_schema()
)
FWIW, when I was trying this locally (uv run pytest test/backends/test_vllm.py on Fedora 43, Python 3.12.8, CUDA 13.0), the test_generate_from_raw_with_format test failed. Here's a snippet of the output:
session = <mellea.stdlib.session.MelleaSession object at 0x7ff08f129a90>

    @pytest.mark.qualitative
    async def test_generate_from_raw_with_format(session):
        prompts = ["what is 1+1?", "what is 2+2?", "what is 3+3?", "what is 4+4?"]
        class Answer(pydantic.BaseModel):
            name: str
            value: int
        results = await session.backend.generate_from_raw(
            actions=[CBlock(value=prompt) for prompt in prompts],
            ctx=session.ctx,
            format=Answer,
        )
        assert len(results) == len(prompts)
        random_result = results[0]
        try:
            answer = Answer.model_validate_json(random_result.value)
        except pydantic.ValidationError as e:
>           assert False, (
                f"formatting directive failed for {random_result.value}: {e.json()}"
            )
E           AssertionError: formatting directive failed for {
E
E
E             "name": "binary",
E             "value": 1
E            : [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: EOF while parsing an object at line 6 column 1","input":"{\n\n\n \"name\": \"binary\",\n \"value\": 1\n ","ctx":{"error":"EOF while parsing an object at line 6 column 1"},"url":"https://errors.pydantic.dev/2.12/v/json_invalid"}]
E           assert False

test/backends/test_vllm.py:133: AssertionError
Seems like what's happening is that the model is following the grammar but not producing valid JSON (which may just be a fact of life with a tiny model; I see the vllm test is using qwen3 0.6b).
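To make that concrete, here is a minimal standalone sketch of the same structured-output path outside of mellea (assuming vLLM's `structured_outputs` field on `SamplingParams` and the Qwen/Qwen3-0.6B checkpoint the test uses; the script is illustrative, not part of this PR):

```python
import pydantic
import vllm
from vllm.sampling_params import StructuredOutputsParams


class Answer(pydantic.BaseModel):
    name: str
    value: int


llm = vllm.LLM(model="Qwen/Qwen3-0.6B")

params = vllm.SamplingParams(
    max_tokens=64,  # a small budget makes the truncation easy to reproduce
    structured_outputs=StructuredOutputsParams(json=Answer.model_json_schema()),
)

out = llm.generate(["what is 1+1?"], params)[0].outputs[0]
# With a tiny model the text can be grammar-consistent so far yet still
# incomplete, e.g. '{\n\n\n  "name": "binary",\n  "value": 1\n ' with no
# closing brace when generation stops on the token limit.
print(repr(out.text), out.finish_reason)
```

The constrained decoder only guarantees that each emitted token keeps the output inside the JSON grammar; it does not guarantee the object gets closed before max_tokens runs out.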
A quick hack I tried locally that got the test to pass was to prepend instructions about the expected JSON content to each prompt (and to cap max tokens when a format is requested):
diff --git a/mellea/backends/vllm.py b/mellea/backends/vllm.py
index a59ced7..6e3ee5a 100644
--- a/mellea/backends/vllm.py
+++ b/mellea/backends/vllm.py
@@ -447,8 +447,21 @@ class LocalVLLMBackend(FormatterBackend):
         model_options = self._simplify_and_merge(model_options)

+        # When structured output is requested, ensure there's a reasonable max_tokens limit
+        # to prevent excessive whitespace generation and ensure output completion.
+        if format is not None and ModelOption.MAX_NEW_TOKENS not in model_options:
+            model_options[ModelOption.MAX_NEW_TOKENS] = 512
+
         prompts = [self.formatter.print(action) for action in actions]

+        # When structured output is requested, prepend format instructions to help the model
+        # understand what JSON content to generate. Without this, models may produce valid JSON
+        # structure (due to constrained decoding) but with meaningless content like whitespace.
+        if format is not None:
+            schema_str = json.dumps(format.model_json_schema(), indent=2)
+            format_prefix = f"Output a JSON object matching this schema:\n{schema_str}\n\nQuery: "
+            prompts = [format_prefix + p for p in prompts]
+
         sampling_params = vllm.SamplingParams(
             **self._make_backend_specific_and_remove(
                 model_options, vllm.SamplingParams
We could make this error message better, but I think what's happening is that the model runs out of tokens before the json completes (at least I see this error with hf sometimes):
"{\n\n\n \"name\": \"binary\",\n \"value\": 1\n " <-- missing closing bracket
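One way to make the message better (a sketch only; the helper name is made up and this isn't wired into the backend) would be to check vLLM's finish reason before handing the text to pydantic:

```python
import vllm


def text_or_truncation_error(request_output: vllm.RequestOutput) -> str:
    """Return the generated text, raising a clearer error if it was cut off."""
    completion = request_output.outputs[0]
    if completion.finish_reason == "length":
        # The constrained decoder was still mid-object when max_tokens ran out,
        # which is why pydantic later reports "EOF while parsing an object".
        raise ValueError(
            "structured output hit the token limit before the JSON object was "
            f"closed: {completion.text!r}; raise ModelOption.MAX_NEW_TOKENS or "
            "simplify the schema"
        )
    return completion.text
```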
yeah, that seems to be it. just re-ran the test and I see:
E Invalid JSON: EOF while parsing an object at line 130 column 0 [type=json_invalid, input_value='{ \n\n\n\n\n\n\n\n\n\n\...n\n\n\n\n\n\n\n\n\n\n\n', input_type=str]
and also
E : [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: EOF while parsing an object at line 130 column 0","input":"{ \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n","ctx":{"error":"EOF while parsing an object at line 130 column 0"},"url":"https://errors.pydantic.dev/2.12/v/json_invalid"}]
though it's interesting, on my run it looks like the string is an opening brace and then a LOT of newlines... (presumably because the JSON grammar allows arbitrary whitespace between tokens, so the model can burn its whole token budget on newlines)
if os.environ.get("VLLM_USE_V1", -1) != "0":
    pytest.skip("skipping vllm tests; tests require `export VLLM_USE_V1=0`")
My laptop GPU wasn't big enough to run this test 😢
I'm going to try running on a remote environment where I can get a bigger GPU...