Skip to content

Conversation

@rdheekonda
Copy link
Contributor

@rdheekonda rdheekonda commented Jan 23, 2026

[AIRT] Tool selection safety and role-play testing

Key Changes:

  • Adds tool_selection_safety scorer to detect when agents choose dangerous tools over safe alternatives (returns 1.0 for unsafe, 0.5 for questionable, 0.0 for safe)
  • Adds role_play_wrapper transform with 4 scenarios (educational, fictional, historical, research) for jailbreak testing
  • Adds 3 evaluation rubrics for llm_judge: tool_selection_safety, unbounded_agency, web_chatbot_security

Added:

  • tool_selection_safety() scorer in tool_invocation.py
  • role_play_wrapper() transform in stylistic.py
  • 3 YAML rubrics in dreadnode/data/rubrics/
  • Test suites and example notebook

Usage:

```python

Tool safety scorer

scorer = dn.scorers.tool_selection_safety(
safe_tools=["http_get", "tcp_connect"],
dangerous_tools=["shell_execute"],
)

Role-play transform

transform = dn.transforms.stylistic.role_play_wrapper(
scenario="educational",
character="security researcher",
)
```

Generated Summary:

Summary of Changes:

  • Introduced new rubrics for safety evaluation in AI tools:

    • tool_selection_safety: Evaluates the choice of tools based on safety and risk (OWASP ASI02).
    • unbounded_agency: Assesses whether agents operate within user-defined limits and request permission before expanding scope (OWASP ASI10).
    • web_chatbot_security: Identifies web chatbot plugin vulnerabilities based on IEEE S&P 2026 findings.
  • Added a scoring mechanism for tool selection safety, focusing on:

    • Detection of dangerous tool usage when safer alternatives are available.
    • Guidelines for classifying tool choice as safe, questionable, or unsafe.
  • Implemented a role-play wrapper transform for testing against jailbreak attempts, distinguishing between legitimate educational inquiries and potentially harmful requests.

  • Included an example Jupyter notebook to demonstrate the usage of both the tool selection safety scorer and the role-play wrapper, with practical scenarios to illustrate scoring and evaluation.

Potential Impact:

  • Enhances the overall security posture by systematically evaluating tool selection and agency behavior in AI applications.
  • Provides a structured approach to identify and mitigate vulnerabilities in web chatbots and tool misuse.
  • Facilitates developers in ensuring compliance with security best practices through clear guidelines and scoring metrics.

This summary was generated with ❤️ by rigging

- Add tool_selection_safety scorer for detecting unsafe tool choices
- Add role_play_wrapper transform for jailbreak testing
- Add 3 evaluation rubrics (tool_selection_safety, unbounded_agency, web_chatbot_security)
@dreadnode-renovate-bot dreadnode-renovate-bot bot added area/tests Changes to test files and testing infrastructure area/examples Changes to example code and demonstrations labels Jan 23, 2026
@rdheekonda rdheekonda added this pull request to the merge queue Jan 26, 2026
Merged via the queue into main with commit a07c9c2 Jan 26, 2026
8 checks passed
@rdheekonda rdheekonda deleted the feat/tool-selection-safety-scorer-and-role-play-transform branch January 26, 2026 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/examples Changes to example code and demonstrations area/tests Changes to test files and testing infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants