bug: Correct categorization and detection for heavier tests #396

@planetf1

#372 and #326 optimized the tests to run cleanly, to the best of the machine's capability, on a MacBook M1 with 32 GB of RAM.

However, other tests still do not run cleanly in a better-resourced environment (for example, >64 GB RAM/GPU).

Example from the Hugging Face tests:

The whole set fails if Ollama is not running, despite selecting not to run the Ollama tests:

==================================== ERRORS ====================================
__________ ERROR collecting test/stdlib/sampling/test_sampling_ctx.py __________
test/stdlib/sampling/test_sampling_ctx.py:10: in <module>
    class TestSamplingCtxCase:
test/stdlib/sampling/test_sampling_ctx.py:11: in TestSamplingCtxCase
    m = start_session(
mellea/stdlib/session.py:163: in start_session
    backend = backend_class(model_id, model_options=model_options, **backend_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mellea/backends/ollama.py:81: in __init__
    raise Exception(err)
E   Exception: could not create OllamaModelBackend: ollama server not running at None
------------------------------- Captured stdout --------------------------------
=== 14:32:34-ERROR ======
could not create OllamaModelBackend: ollama server not running at None
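
The traceback shows `start_session(` being called in the class body of `TestSamplingCtxCase`, so the backend is built while pytest collects the file, before any deselection can apply. One possible way to detect the server up front is a plain TCP probe. This is only a sketch: the helper name `ollama_is_running` is hypothetical, and it assumes Ollama's default port 11434 on localhost.

```python
import socket


def ollama_is_running(host: str = "127.0.0.1", port: int = 11434,
                      timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections at host:port.

    Hypothetical helper: it only checks that the port is open and does
    not verify that the listener is actually an Ollama server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

A test module could then use `pytest.mark.skipif(not ollama_is_running(), reason="ollama server not running")`, and the session could be created inside a fixture rather than in the class body so that collection does not touch the backend.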

Some fail with GPU out-of-memory errors (more memory management is needed):

E       torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU 0 has a total capacity of 79.19 GiB of which 9.06 MiB is free. Including non-PyTorch memory, this process has 79.16 GiB memory in use. Of the allocated memory 78.05 GiB is allocated by PyTorch, with 2.00 MiB allocated in private pools (e.g., CUDA Graphs), and 112.10 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
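
If the memory is being held by models loaded in earlier tests (an assumption, not something the log confirms), a cleanup step between tests might help. The sketch below uses a hypothetical helper `release_gpu_memory`; it relies only on `gc.collect()`, `torch.cuda.is_available()` and `torch.cuda.empty_cache()`, and does nothing when PyTorch or CUDA is absent.

```python
import gc
import importlib.util


def release_gpu_memory() -> bool:
    """Collect garbage, then release PyTorch's cached CUDA blocks.

    Returns True only if the CUDA cache was actually cleared.
    Hypothetical helper, not part of the project.
    """
    # Drop unreachable Python objects first so their tensors can be freed.
    gc.collect()
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed
    import torch
    if not torch.cuda.is_available():
        return False  # no usable CUDA device
    # Return cached, unused blocks to the driver.
    torch.cuda.empty_cache()
    return True
```

Note that `empty_cache()` cannot free memory still referenced by live tensors, so a fixture calling this helper in its teardown would first need to drop its own references to the model or backend.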

Overall summary on Hugging Face:

=========================== short test summary info ============================
FAILED test/backends/test_vllm.py::test_generate_from_raw_with_format - Asser...
FAILED test/stdlib/requirements/test_requirement.py::test_llmaj_validation_req_output_field
FAILED test/stdlib/requirements/test_requirement.py::test_llmaj_requirement_uses_requirement_template
FAILED test/stdlib/test_spans.py::test_kv - AssertionError: Expected correct ...
FAILED docs/examples/aLora/101_example.py::101_example.py - Example failed wi...
FAILED docs/examples/image_text_models/vision_litellm_backend.py::vision_litellm_backend.py
FAILED docs/examples/intrinsics/answer_relevance.py::answer_relevance.py - Ex...
FAILED docs/examples/intrinsics/answerability.py::answerability.py - Example ...
FAILED docs/examples/intrinsics/citations.py::citations.py - Example failed w...
FAILED docs/examples/intrinsics/context_relevance.py::context_relevance.py - ...
FAILED docs/examples/intrinsics/hallucination_detection.py::hallucination_detection.py
FAILED docs/examples/intrinsics/intrinsics.py::intrinsics.py - Example failed...
FAILED docs/examples/intrinsics/query_rewrite.py::query_rewrite.py - Example ...
FAILED docs/examples/mify/rich_document_advanced.py::rich_document_advanced.py
FAILED docs/examples/safety/guardian.py::guardian.py - Example failed with ex...
FAILED docs/examples/safety/guardian_huggingface.py::guardian_huggingface.py
FAILED docs/examples/safety/repair_with_guardian.py::repair_with_guardian.py
ERROR test/backends/test_tool_calls.py::test_tool_called_from_context_action
ERROR test/backends/test_tool_calls.py::test_tool_called - Exception: could n...
ERROR test/backends/test_tool_calls.py::test_tool_not_called - Exception: cou...
ERROR test/backends/test_vllm_tools.py::test_tool - RuntimeError: no matter h...
ERROR test/core/test_model_output_thunk.py::test_model_output_thunk_copy - Ex...
ERROR test/core/test_model_output_thunk.py::test_model_output_thunk_deepcopy
ERROR test/stdlib/components/docs/test_richdocument.py::test_richdocument_basics
ERROR test/stdlib/components/docs/test_richdocument.py::test_richdocument_markdown
ERROR test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
ERROR test/stdlib/components/docs/test_richdocument.py::test_table - docling....
ERROR test/stdlib/sampling/test_majority_voting.py::test_majority_voting_for_math
ERROR test/stdlib/sampling/test_majority_voting.py::test_MBRDRougeL - Excepti...
ERROR test/stdlib/test_chat_view.py::test_chat_view_linear_ctx - Exception: c...
ERROR test/stdlib/test_chat_view.py::test_chat_view_simple_ctx - Exception: c...
ERROR test/stdlib/test_functional.py::test_func_context - Exception: could no...
ERROR test/stdlib/test_functional.py::test_aact - Exception: could not create...
ERROR test/stdlib/test_functional.py::test_ainstruct - Exception: could not c...
ERROR test/stdlib/test_functional.py::test_avalidate - Exception: could not c...
ERROR test/stdlib/test_session.py::test_start_session_openai_with_kwargs - Ex...
ERROR test/stdlib/test_session.py::test_aact - Exception: could not create Ol...
ERROR test/stdlib/test_session.py::test_ainstruct - Exception: could not crea...
ERROR test/stdlib/test_session.py::test_async_await_with_chat_context - Excep...
ERROR test/stdlib/test_session.py::test_async_without_waiting_with_chat_context
ERROR test/stdlib/test_session.py::test_session_copy_with_context_ops - Excep...
ERROR test/stdlib/test_session.py::test_powerup - Exception: could not create...
= 17 failed, 165 passed, 19 skipped, 68 deselected, 13 warnings, 25 errors in 407.42s (0:06:47) =
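
Many of the errors above come from tests that need a resource the environment does not have. As a rough sketch of categorization, each test could declare what it requires and be skipped when the detected capabilities fall short. The capability names used here (`ollama`, `gpu`) and the function `skip_reason` are hypothetical and not taken from the project.

```python
from typing import Optional, Set


def skip_reason(required: Set[str], available: Set[str]) -> Optional[str]:
    """Return a skip message if a required capability is missing, else None."""
    missing = sorted(required - available)
    if missing:
        return "missing capabilities: " + ", ".join(missing)
    return None


# Example: an environment with a GPU but no Ollama server.
print(skip_reason({"ollama"}, {"gpu"}))  # missing capabilities: ollama
print(skip_reason({"gpu"}, {"gpu"}))     # None
```

In a `pytest_collection_modifyitems` hook, a non-`None` result could be applied with `item.add_marker(pytest.mark.skip(reason=...))`, so unmet requirements show up as skips rather than errors.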
