feat: add agentic AI red teaming support with semantic security scoring #310
Add agentic AI red teaming feature with semantic scoring via LLM judges.
Add agentic red teaming notebook with full attack coverage:
Add semantic security scorers for agentic vulnerabilities:
Enhance llm_judge to support configurable rubric library
Remove brittle pattern-based scorers, replaced with semantic understanding
Simplify code patterns and improve type safety
Key Changes:
Added:
  rce.yaml - Remote code execution detection
  data_exfiltration.yaml - Data exfiltration via tool calls
  memory_poisoning.yaml - Memory/context poisoning
  privilege_escalation.yaml - Privilege escalation attempts
  goal_hijacking.yaml - Agent goal hijacking
  tool_chaining.yaml - Malicious tool composition
  scope_creep.yaml - Unbounded agency detection
  examples/airt/agentic_red_teaming.ipynb - Comprehensive notebook:
  dreadnode/scorers/tool_invocation.py - Objective tool metrics:
    tool_invoked() - Check if specific tool was called
    any_tool_invoked() - Check if any tool from list was called
    tool_count() - Count tools invoked
  dreadnode/constants.py
Changed:
  llm_judge() to load rubrics from YAML:
    "rce") or Path
    dreadnode/data/rubrics/
Removed:
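For reviewers who want a concrete picture of the objective tool metrics listed under Added, here is a minimal standalone sketch. It is not the code in dreadnode/scorers/tool_invocation.py: the `ToolCall` record, the plain-function signatures, and the 0.0/1.0 return convention are all assumptions made for illustration, and the real scorers may be built quite differently.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Sequence


@dataclass
class ToolCall:
    """Hypothetical record of one tool call made by an agent (not a dreadnode type)."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def tool_invoked(calls: Sequence[ToolCall], tool_name: str) -> float:
    """Return 1.0 if the named tool appears in the call log, else 0.0."""
    return 1.0 if any(call.name == tool_name for call in calls) else 0.0


def any_tool_invoked(calls: Sequence[ToolCall], tool_names: Iterable[str]) -> float:
    """Return 1.0 if at least one of the listed tools was called, else 0.0."""
    wanted = set(tool_names)
    return 1.0 if any(call.name in wanted for call in calls) else 0.0


def tool_count(calls: Sequence[ToolCall]) -> int:
    """Return the total number of tool calls in the log."""
    return len(calls)
```

Because these checks read the call log directly, they give a deterministic signal that can sit alongside the LLM-judged rubrics rather than replace them.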
Generated Summary:
This PR introduces significant enhancements to Dreadnode's scoring capabilities by adding new rubrics and functionalities.
Added multiple new YAML-based rubrics for detecting security vulnerabilities including:
Refactored the scoring system to allow rubrics to be passed as either direct strings or paths to YAML files, enhancing flexibility for testing.
Improved the internal mechanism to load rubrics from YAML, ensuring that it handles both string and path inputs effectively.
Updated the llm_judge function to support loading YAML-configured rubrics seamlessly, allowing for configurable and research-backed tests.

These changes significantly enhance the functionality of the agents in evaluating security vulnerabilities, providing a more robust framework for assessment. The new rubrics can help in identifying malicious behaviors effectively, thus contributing to the overall security posture of systems utilizing Dreadnode.
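The string-or-path rubric handling described in this summary could be resolved along the following lines. This is a sketch under stated assumptions, not the PR's implementation: `resolve_rubric` is a hypothetical helper, the rule that a short name maps to `<rubric_dir>/<name>.yaml` is a guess based on the `"rce"` example, and YAML parsing is left out so the snippet stays standard-library only.

```python
import re
from pathlib import Path
from typing import Union

# Assumption: bundled rubric names look like simple identifiers such as "rce".
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def resolve_rubric(rubric: Union[str, Path], rubric_dir: Path) -> str:
    """Return rubric text from an explicit path, a bundled name, or an inline string.

    Hypothetical helper; the real llm_judge() may resolve rubrics differently.
    """
    # An explicit Path is always read from disk.
    if isinstance(rubric, Path):
        return rubric.read_text(encoding="utf-8")

    # A short name is looked up as <rubric_dir>/<name>.yaml.
    if _NAME_PATTERN.fullmatch(rubric):
        candidate = rubric_dir / f"{rubric}.yaml"
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")

    # Anything else is treated as the rubric text itself.
    return rubric
```

Checking the name pattern before touching the filesystem keeps long free-form rubric strings from ever being interpreted as file names.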
This summary was generated with ❤️ by rigging