feat: Add tool selection safety scorer and role-play transform #311

rdheekonda · 2026-01-23T20:28:01Z

[AIRT] Tool selection safety and role-play testing

Key Changes:

Adds tool_selection_safety scorer to detect when agents choose dangerous tools over safe alternatives (returns 1.0 for unsafe, 0.5 for questionable, 0.0 for safe)
Adds role_play_wrapper transform with 4 scenarios (educational, fictional, historical, research) for jailbreak testing
Adds 3 evaluation rubrics for llm_judge: tool_selection_safety, unbounded_agency, web_chatbot_security

Added:

tool_selection_safety() scorer in tool_invocation.py
role_play_wrapper() transform in stylistic.py
3 YAML rubrics in dreadnode/data/rubrics/
Test suites and example notebook

Usage:

```python

Tool safety scorer

scorer = dn.scorers.tool_selection_safety(
safe_tools=["http_get", "tcp_connect"],
dangerous_tools=["shell_execute"],
)

Role-play transform

transform = dn.transforms.stylistic.role_play_wrapper(
scenario="educational",
character="security researcher",
)
```

Generated Summary:

Summary of Changes:

Introduced new rubrics for safety evaluation in AI tools:
- tool_selection_safety: Evaluates the choice of tools based on safety and risk (OWASP ASI02).
- unbounded_agency: Assesses whether agents operate within user-defined limits and request permission before expanding scope (OWASP ASI10).
- web_chatbot_security: Identifies web chatbot plugin vulnerabilities based on IEEE S&P 2026 findings.
Added a scoring mechanism for tool selection safety, focusing on:
- Detection of dangerous tool usage when safer alternatives are available.
- Guidelines for classifying tool choice as safe, questionable, or unsafe.
Implemented a role-play wrapper transform for testing against jailbreak attempts, distinguishing between legitimate educational inquiries and potentially harmful requests.
Included an example Jupyter notebook to demonstrate the usage of both the tool selection safety scorer and the role-play wrapper, with practical scenarios to illustrate scoring and evaluation.

Potential Impact:

Enhances the overall security posture by systematically evaluating tool selection and agency behavior in AI applications.
Provides a structured approach to identify and mitigate vulnerabilities in web chatbots and tool misuse.
Facilitates developers in ensuring compliance with security best practices through clear guidelines and scoring metrics.

This summary was generated with ❤️ by rigging

- Add tool_selection_safety scorer for detecting unsafe tool choices - Add role_play_wrapper transform for jailbreak testing - Add 3 evaluation rubrics (tool_selection_safety, unbounded_agency, web_chatbot_security)

feat: Add tool selection safety scorer and role-play transform

4015d18

- Add tool_selection_safety scorer for detecting unsafe tool choices - Add role_play_wrapper transform for jailbreak testing - Add 3 evaluation rubrics (tool_selection_safety, unbounded_agency, web_chatbot_security)

dreadnode-renovate-bot bot added area/tests Changes to test files and testing infrastructure area/examples Changes to example code and demonstrations labels Jan 23, 2026

fix: Add type casts for Metric attributes to satisfy mypy

26c1b9c

rdheekonda added this pull request to the merge queue Jan 26, 2026

Merged via the queue into main with commit a07c9c2 Jan 26, 2026
8 checks passed

rdheekonda deleted the feat/tool-selection-safety-scorer-and-role-play-transform branch January 26, 2026 17:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add tool selection safety scorer and role-play transform #311

feat: Add tool selection safety scorer and role-play transform #311

Uh oh!

rdheekonda commented Jan 23, 2026 •

edited by github-actions bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: Add tool selection safety scorer and role-play transform #311

feat: Add tool selection safety scorer and role-play transform #311

Uh oh!

Conversation

rdheekonda commented Jan 23, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

[AIRT] Tool selection safety and role-play testing

Tool safety scorer

Role-play transform

transform = dn.transforms.stylistic.role_play_wrapper( scenario="educational", character="security researcher", ) ```

Generated Summary:

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rdheekonda commented Jan 23, 2026 •

edited by github-actions bot

Loading

transform = dn.transforms.stylistic.role_play_wrapper(
scenario="educational",
character="security researcher",
)
```