Is your feature request related to a problem? Please describe.
#511 tracks LLM-scanner feature parity with Garak/Giskard/CyberSecEval, and PyRIT currently has no converter for the structured-policy-injection family. I'd like to close that gap with a converter for Policy Puppetry, disclosed by HiddenLayer researchers (Conor McCauley, Kenneth Yeung, Jason Martin, Kasimir Schulz) on April 24, 2025: https://www.hiddenlayer.com/research/novel-universal-bypass-for-all-major-llms
The technique reformulates a request as a fabricated policy/config block (XML, JSON, or INI) that many models treat as trusted developer instructions rather than untrusted input — combining a policy envelope (e.g. <interaction-config>), a fictional roleplay scene, and optional leetspeak encoding. HiddenLayer reported it as near-universal: a single template transferring across ChatGPT, Claude, Gemini, Llama, DeepSeek, Qwen, Mistral, and Copilot, with only minor adjustments needed for advanced reasoning models.
Describe the solution you'd like
A PolicyPuppetryConverter implemented generically, per doc/contributing/2_incorporating_research.md (line 6: "Attacks should always favor being generic… any attack should incorporate generic converters, generic scorers, multi-modal functionality"):
- Pure-template, no-LLM converter (deterministic, no target dependency).
- A policy_format parameter ("xml" | "json" | "ini", default "xml"), since the technique is demonstrated in all three formats.
- The user's prompt is injected via a SeedPrompt YAML template ({{ prompt }}), so the wrapper is data, not hardcoded. The shipped template uses a benign placeholder and a generalized persona, not a weaponized payload.
- Parameterized roleplay persona/scene so it generalizes beyond the paper's specific example.
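As a rough illustration of the format dispatch described in the list above, here is a minimal, standard-library-only sketch. It is not the proposed implementation and does not use any PyRIT API: the function name wrap_in_policy, the scene and request field names, and the placeholder scene text are all hypothetical, and the real converter would load its wrapper from a SeedPrompt YAML template instead of building it in Python. Only the interaction-config envelope name and the three formats come from the description above.

```python
import configparser
import io
import json
from xml.sax.saxutils import escape

# Hypothetical names throughout; this is a sketch, not PyRIT's API.
SUPPORTED_FORMATS = ("xml", "json", "ini")


def wrap_in_policy(
    prompt: str,
    policy_format: str = "xml",
    scene: str = "placeholder scene",
) -> str:
    """Deterministically wrap `prompt` in a benign config-style envelope."""
    if policy_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"policy_format must be one of {SUPPORTED_FORMATS}, got {policy_format!r}"
        )

    if policy_format == "xml":
        # Escape so that '<', '>' and '&' in the prompt cannot break the envelope.
        return (
            "<interaction-config>\n"
            f"  <scene>{escape(scene)}</scene>\n"
            f"  <request>{escape(prompt)}</request>\n"
            "</interaction-config>"
        )

    if policy_format == "json":
        # json.dumps handles quoting and escaping of the prompt.
        return json.dumps(
            {"interaction-config": {"scene": scene, "request": prompt}},
            indent=2,
        )

    # INI: interpolation is disabled so '%' in the prompt is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser["interaction-config"] = {"scene": scene, "request": prompt}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue().strip()
```

Because the output depends only on the arguments, each format can be unit-tested by parsing the result back and comparing the request field with the input prompt.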
Describe alternatives you've considered, if relevant
- Leetspeak: compose with the existing LeetspeakConverter in a chain (keeps each converter single-responsibility) vs. an optional leetspeak flag on the converter for single-call ergonomics. I lean toward composition but can add the flag.
- Template packaging: one YAML with format branches vs. three per-format YAMLs vs. selecting the block in Python (relevant because SeedPrompt.from_yaml_file eagerly pre-renders trusted templates).
- LLM-backed rephrasing vs. pure-template — I chose pure-template for determinism and testability.
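To make the composition-versus-flag trade-off concrete, the sketch below chains two plain string functions. Everything in it is a stand-in: leetspeak, envelope, chain, and the substitution table are hypothetical and are not PyRIT's LeetspeakConverter or its converter-chaining mechanism. One thing the sketch suggests is that ordering matters under composition: applying the character substitution after the wrapper also rewrites the wrapper's own markup, which a built-in flag could avoid by only touching the prompt.

```python
from typing import Callable

# Hypothetical stand-ins for real converters; plain str -> str functions.
Converter = Callable[[str], str]

# Illustrative substitution table, not the one used by any real converter.
LEET_MAP = str.maketrans(
    {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}
)


def leetspeak(text: str) -> str:
    """Replace a few lowercase letters with look-alike digits."""
    return text.translate(LEET_MAP)


def envelope(text: str) -> str:
    """Stand-in for the policy wrapper: surrounds the text with markers."""
    return f"[wrapped]{text}[/wrapped]"


def chain(*converters: Converter) -> Converter:
    """Compose converters left to right into a single callable."""

    def run(text: str) -> str:
        for convert in converters:
            text = convert(text)
        return text

    return run


# Substitution first, wrapper second: only the prompt is rewritten.
prompt_then_wrap = chain(leetspeak, envelope)
print(prompt_then_wrap("test"))  # [wrapped]7357[/wrapped]

# Wrapper first, substitution second: the markers are rewritten too.
wrap_then_prompt = chain(envelope, leetspeak)
print(wrap_then_prompt("test"))  # [wr4pp3d]7357[/wr4pp3d]
```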
Additional context
This advances the #511 parity goal with a reusable, chainable building block for the structured-policy-injection class. A few questions before I open the PR:
- Does a pure-template, no-LLM PolicyPuppetryConverter with policy_format (xml/json/ini) match how you'd want this scoped?
- Leetspeak: compose with LeetspeakConverter (my default) or an optional flag?
- Preferred home/packaging for the wrapper template (single vs. per-format YAML)?
- Persona/scene: fully parameterized vs. a sensible default?
(Happy to sign the Microsoft CLA at PR time.)