FEAT Add Policy Puppetry converter (parent #511) #2080

Description

@kenlacroix

Is your feature request related to a problem? Please describe.

#511 tracks LLM-scanner feature parity with Garak/Giskard/CyberSecEval, and PyRIT currently has no converter for the structured-policy-injection family. I'd like to close that gap with a converter for Policy Puppetry, disclosed by HiddenLayer researchers (Conor McCauley, Kenneth Yeung, Jason Martin, Kasimir Schulz) on April 24, 2025: https://www.hiddenlayer.com/research/novel-universal-bypass-for-all-major-llms

The technique reformulates a request as a fabricated policy/config block (XML, JSON, or INI) that many models treat as trusted developer instructions rather than untrusted input. It combines three elements: a policy envelope (e.g. <interaction-config>), a fictional roleplay scene, and optional leetspeak encoding. HiddenLayer reported it as near-universal: a single template transferring across ChatGPT, Claude, Gemini, Llama, DeepSeek, Qwen, Mistral, and Copilot, with only minor adjustments needed for advanced reasoning models.

Describe the solution you'd like

A PolicyPuppetryConverter implemented generically, per doc/contributing/2_incorporating_research.md (line 6: "Attacks should always favor being generic… any attack should incorporate generic converters, generic scorers, multi-modal functionality"):

  • Pure-template, no-LLM converter (deterministic, no target dependency).
  • A policy_format parameter ("xml" | "json" | "ini", default "xml"), since the technique is demonstrated in all three formats.
  • The user's prompt is injected via a SeedPrompt YAML template ({{ prompt }}), so the wrapper is data, not hardcoded. The shipped template uses a benign placeholder and a generalized persona, not a weaponized payload.
  • Parameterized roleplay persona/scene so it generalizes beyond the paper's specific example.
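
As a minimal sketch of the shape described in the bullets above (not the proposed implementation), the following standalone function shows deterministic, no-LLM rendering with a policy_format switch. It deliberately avoids PyRIT's base classes and SeedPrompt loading. The function name `render_policy_wrapper`, the `policy-config` envelope, and the `persona` / `scene` / `request` fields are hypothetical placeholders, not the published template and not PyRIT API.

```python
import configparser
import io
import json
from xml.sax.saxutils import escape

SUPPORTED_FORMATS = ("xml", "json", "ini")


def render_policy_wrapper(
    prompt: str,
    policy_format: str = "xml",
    persona: str = "assistant",          # hypothetical default persona
    scene: str = "placeholder scene",    # hypothetical benign default
) -> str:
    """Wrap `prompt` in a config-style envelope (illustrative field names only)."""
    if policy_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"policy_format must be one of {SUPPORTED_FORMATS}, got {policy_format!r}"
        )

    # Same data for every format; only the serialization differs.
    fields = {"persona": persona, "scene": scene, "request": prompt}

    if policy_format == "xml":
        # Escape &, <, > so the prompt cannot break the envelope's structure.
        body = "\n".join(f"  <{k}>{escape(v)}</{k}>" for k, v in fields.items())
        return f"<policy-config>\n{body}\n</policy-config>"

    if policy_format == "json":
        return json.dumps({"policy-config": fields}, indent=2)

    # INI: disable interpolation so '%' in a prompt is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser["policy-config"] = fields
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue().strip()
```

Because the function has no model or network dependency, the same input always yields the same output, which is what makes the pure-template option straightforward to unit test. In the actual converter the envelope text would live in the YAML template rather than in Python.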

Describe alternatives you've considered, if relevant

  • Leetspeak: compose with the existing LeetspeakConverter in a chain (keeps each converter single-responsibility) vs. an optional leetspeak flag on the converter for single-call ergonomics. I lean toward composition but can add the flag.
  • Template packaging: one YAML with format branches vs. three per-format YAMLs vs. selecting the block in Python (relevant because SeedPrompt.from_yaml_file eagerly pre-renders trusted templates).
  • LLM-backed rephrasing vs. pure-template — I chose pure-template for determinism and testability.
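
The leetspeak trade-off above can be illustrated with plain functions. This is a sketch under stated assumptions: `toy_leetspeak`, `toy_wrap`, `chain`, and `wrap_with_flag` are hypothetical stand-ins, and the substitution table is invented for the example and may differ from what the existing LeetspeakConverter does.

```python
from typing import Callable, Iterable

Converter = Callable[[str], str]

# Illustrative substitution table only; not taken from PyRIT.
_LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def toy_leetspeak(text: str) -> str:
    """Stand-in for a leetspeak converter."""
    return text.translate(_LEET_MAP)


def toy_wrap(text: str) -> str:
    """Stand-in for the policy wrapper (hypothetical envelope)."""
    return f"<policy-config>\n  <request>{text}</request>\n</policy-config>"


def chain(converters: Iterable[Converter]) -> Converter:
    """Option A: compose single-responsibility converters, applied left to right."""
    steps = list(converters)

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


def wrap_with_flag(text: str, leetspeak: bool = False) -> str:
    """Option B: one call, with encoding applied to the prompt before wrapping."""
    if leetspeak:
        text = toy_leetspeak(text)
    return toy_wrap(text)
```

One consequence worth noting: with composition, ordering is the caller's responsibility. `chain([toy_leetspeak, toy_wrap])` encodes only the prompt and matches `wrap_with_flag(..., leetspeak=True)`, whereas `chain([toy_wrap, toy_leetspeak])` also rewrites the envelope's tag names. A flag fixes the order internally at the cost of duplicating a responsibility the chain already covers.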

Additional context

This advances the #511 parity goal with a reusable, chainable building block for the structured-policy-injection class. A few questions before I open the PR:

  1. Does a pure-template, no-LLM PolicyPuppetryConverter with policy_format (xml/json/ini) match how you'd want this scoped?
  2. Leetspeak: compose with LeetspeakConverter (my default) or an optional flag?
  3. Preferred home/packaging for the wrapper template (single vs. per-format YAML)?
  4. Persona/scene: fully parameterized vs. a sensible default?

(Happy to sign the Microsoft CLA at PR time.)
