[DRAFT] FEAT add Policy Puppetry converter (#2080) by kenlacroix · Pull Request #2081 · microsoft/PyRIT

kenlacroix · 2026-06-24T21:42:23Z

Description

Implements the PolicyPuppetryConverter proposed in #2080 — a converter for HiddenLayer's Policy Puppetry technique (Apr 2025), which wraps a request in a fabricated policy/config block (XML/JSON/INI) that many models treat as trusted developer instructions.

This is a [DRAFT] opened alongside #2080 so the working implementation is visible while the design is still under discussion. I'd like maintainer steer on the open questions in #2080 before finalizing:

Pure-template, no-LLM converter with a policy_format param (xml/json/ini) — does this match how you'd want it scoped?
Leetspeak: compose with the existing LeetspeakConverter in a chain, or keep the optional leetspeak flag this draft currently exposes?
Template packaging: the wrapper ships as a SeedPrompt YAML. Because SeedPrompt.from_yaml_file eagerly pre-renders trusted templates (collapsing the policy_format branch), this draft loads the YAML and constructs SeedPrompt(**data) directly. Preference between that, selecting the format block in Python, or three per-format YAMLs?
Roleplay persona/scene is parameterized rather than hardcoded — keep generic, or ship a sensible default?

The design intentionally favors a generic implementation, per doc/contributing/2_incorporating_research.md. The shipped template uses a benign {{ prompt }} placeholder and a generalized persona, not a weaponized payload.
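For readers skimming the thread, here is a minimal stdlib-only sketch of what "pure-template, no-LLM format wrapper" means in practice. It is not the code in this PR: the function name wrap_in_policy and the field names (policy, mode, request) are hypothetical, and the block content is deliberately benign.

```python
import configparser
import io
import json
from xml.sax.saxutils import escape


def wrap_in_policy(prompt: str, policy_format: str = "xml") -> str:
    """Wrap a prompt in a benign config-style block (field names are made up)."""
    if policy_format == "json":
        return json.dumps({"policy": {"mode": "demo", "request": prompt}}, indent=2)
    if policy_format == "ini":
        # interpolation=None so a literal '%' in the prompt is not treated specially
        parser = configparser.ConfigParser(interpolation=None)
        parser["policy"] = {"mode": "demo", "request": prompt}
        buffer = io.StringIO()
        parser.write(buffer)
        return buffer.getvalue().strip()
    if policy_format == "xml":
        # escape() keeps '<', '>' and '&' in the prompt from breaking the markup
        return f"<policy><mode>demo</mode><request>{escape(prompt)}</request></policy>"
    raise ValueError(f"unsupported policy_format: {policy_format!r}")
```

The same prompt yields three structurally different strings, which is roughly what the "formats differ" unit test is meant to check.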

Tests and Documentation

Tests: tests/unit/prompt_converter/test_policy_puppetry_converter.py — 8 unit tests (placeholder substitution, each policy_format, formats differ, leetspeak toggle, input/output support). All pass locally (8 passed).
Documentation / JupyText: not yet added — I held the doc/code/converters/1_text_to_text_converters.py demo cell until the design (esp. the leetspeak + template-packaging questions) is settled, since those change the example. Will add the JupyText-paired cell before marking ready for review.
CLA: happy to sign the Microsoft CLA.

kenlacroix · 2026-06-24T21:46:54Z

@microsoft-github-policy-service agree

romanlutz · 2026-06-26T21:44:31Z

+  - HiddenLayer
+groups:
+  - HiddenLayer
+source: https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/


I'd like to cite that in the references.bib file, too.

Done — added a hiddenlayer2025policypuppetry entry to doc/references.bib pointing at the HiddenLayer disclosure (in the latest push), and kept the source: URL on the template itself. Let me know if you'd prefer a different citation key.

romanlutz · 2026-06-26T21:47:24Z

+        wrapped = self._prompt_template.render_template_value(
+            prompt=prompt,
+            policy_format=self._policy_format,
+        )
+
+        if self._leetspeak_converter is not None:
+            wrapped = (await self._leetspeak_converter.convert_async(prompt=wrapped, input_type="text")).output_text


I'm not convinced this needs a dedicated converter. We could just use TextJailbreakConverter with the given template and optionally add LeetspeakConverter on top. Am I missing anything?

Good push — and for a single fixed format you're right: this would just be a SeedPrompt/TextJailbreakConverter template with LeetspeakConverter chained on, the same as the DAN/AIM-style jailbreaks. If that's all this were, a dedicated converter wouldn't earn its keep.

The reason it's a converter is the policy_format parameter (xml/json/ini), and it runs into a concrete lifecycle problem with the template route:

TextJailbreakConverter can only source its template via TextJailBreak, and every TextJailBreak path constructs the SeedPrompt with is_jinja_template=True (from_yaml_file for template_file_name/template_path; explicitly for string_template). That trips the eager render in SeedPrompt._render_and_infer_data_type — self.value = render_template_value_silent(**PATHS_DICT) at construction. Since PATHS_DICT carries no policy_format, the {% if policy_format == "json" %} … {% elif "ini" %} … {% else %} block resolves with policy_format undefined and collapses to the XML else-branch before any format kwarg is applied. JSON and INI are unreachable through TextJailbreakConverter.

The converter avoids that by building the SeedPrompt straight from the parsed YAML (no is_jinja_template=True) and deferring the Jinja render to convert_async, where both prompt and policy_format are known. The "each policy_format" / "formats differ" unit tests exercise exactly that path.
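To make the lifecycle problem concrete, here is a small stdlib-only model of eager versus deferred rendering. It does not use Jinja or PyRIT; policy_template, EagerTemplate and DeferredTemplate are hypothetical names, and the model assumes the {{ prompt }} placeholder survives the early render while the format branch does not.

```python
from typing import Callable, Optional


def policy_template(prompt: str = "{{ prompt }}", policy_format: Optional[str] = None) -> str:
    """Stand-in for a template with an if/elif/else on policy_format."""
    if policy_format == "json":
        return '{"request": "' + prompt + '"}'
    if policy_format == "ini":
        return "[policy]\nrequest = " + prompt
    # else-branch: also taken when policy_format is undefined (None here)
    return "<request>" + prompt + "</request>"


class EagerTemplate:
    """Renders once at construction, before policy_format is known."""

    def __init__(self, template: Callable[..., str]) -> None:
        self.value = template()  # the branch is chosen and frozen here

    def render(self, prompt: str, policy_format: str) -> str:
        # Only the surviving placeholder can still be filled in;
        # policy_format arrives too late to change the branch.
        return self.value.replace("{{ prompt }}", prompt)


class DeferredTemplate:
    """Keeps the template unrendered until both values are available."""

    def __init__(self, template: Callable[..., str]) -> None:
        self._template = template

    def render(self, prompt: str, policy_format: str) -> str:
        return self._template(prompt=prompt, policy_format=policy_format)
```

With the eager variant, asking for "json" still returns the XML shape; the deferred variant returns the JSON shape. That is the difference the "each policy_format" tests are intended to pin down.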

The alternatives I see:

1. Three per-format YAML templates + TextJailbreakConverter. Works, but it's three near-duplicate files, format selection isn't validated (vs. the Literal["xml","json","ini"] here), and leetspeak stays a manual second chain step.

2. This converter. One component, validated format enum, optional integrated leetspeak, mapping 1:1 to the HiddenLayer technique per 2_incorporating_research.md.

I went with (2) because Policy Puppetry isn't "another DAN" — the structural format variants are intrinsic to the technique, and the single-conditional template can't survive the trusted-template pre-render. That said, if you'd still prefer the template route for corpus consistency, I'm happy to split into three per-format templates and document the LeetspeakConverter chain — just flagging the conditional-collapse tradeoff so it's a deliberate call rather than a surprise. Your repo, your taste here.
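As an illustration of the "validated format enum" point, here is a sketch of pairing a Literal annotation with a runtime check. FormatWrapper is a hypothetical stand-in, not the converter in this PR, and the runtime check is an assumption on my part: Literal by itself is only enforced by static type checkers.

```python
from typing import Literal, get_args

PolicyFormat = Literal["xml", "json", "ini"]


class FormatWrapper:
    """Hypothetical stand-in for a converter that takes a validated format."""

    def __init__(self, policy_format: PolicyFormat = "xml") -> None:
        allowed = get_args(PolicyFormat)  # the literal values as a tuple
        if policy_format not in allowed:
            raise ValueError(
                f"policy_format must be one of {allowed}, got {policy_format!r}"
            )
        self.policy_format = policy_format
```

A typo such as "yaml" then fails at construction instead of silently selecting some default template, which is the gap the three-file alternative leaves open.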

This is what I meant:

# pyrit/setup/initializers/components/scenario_techniques.py
from pyrit.executor.attack import AttackConverterConfig, PromptSendingAttack
from pyrit.prompt_converter import PolicyPuppetryConverter
from pyrit.prompt_normalizer import PromptConverterConfiguration

AttackTechniqueFactory(
    name="policy_puppetry",
    attack_class=PromptSendingAttack,
    strategy_tags=["core", "single_turn", "default", "light"],
    attack_kwargs={
        "attack_converter_config": AttackConverterConfig(
            request_converters=PromptConverterConfiguration.from_converters(
                converters=[PolicyPuppetryConverter(policy_format="xml")]
            )
        )
    },
)

You can pass the arg for the converter there. Or stack Leetspeak on top.

Got it — thanks, that snippet clears it up. Keeping the dedicated converter then, with policy_format passed at construction exactly as you show.

On leetspeak: agreed, it shouldn't live inside this converter. I've removed the leetspeak flag (and its internal LeetspeakConverter wiring) so this stays a single-responsibility format wrapper; obfuscation is now composed in the chain, e.g. converters=[PolicyPuppetryConverter(policy_format="xml"), LeetspeakConverter()] passed to PromptConverterConfiguration.from_converters, matching your example. Dropped the corresponding unit test (now 7). Pushed.

Want me to add the scenario_techniques.py AttackTechniqueFactory registration (tags core/single_turn/default/light, as in your snippet) in this PR, or land it as a follow-up once the converter merges? Happy either way — just say which and I'll wire it up along with the JupyText doc cell before marking ready for review.
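For reference, the chain composition described above (wrap first, obfuscate second) can be modelled with plain async functions. This is a sketch only: wrap_xml, leetspeak and run_chain are hypothetical helpers, and the fixed character map is an assumption, not how the real LeetspeakConverter substitutes characters.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Converter = Callable[[str], Awaitable[str]]

# Fixed, deterministic substitutions chosen for the example only.
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0"})


async def wrap_xml(prompt: str) -> str:
    """Format-wrapper step with a benign placeholder block."""
    return f"<policy><request>{prompt}</request></policy>"


async def leetspeak(text: str) -> str:
    """Obfuscation step applied to whatever text it receives."""
    return text.translate(LEET_MAP)


async def run_chain(prompt: str, converters: Sequence[Converter]) -> str:
    """Apply each converter in order, feeding each output to the next."""
    result = prompt
    for convert in converters:
        result = await convert(result)
    return result
```

Order matters in this model: [wrap_xml, leetspeak] obfuscates the wrapper markup as well as the prompt, while [leetspeak, wrap_xml] leaves the wrapper intact. Keeping the two steps separate lets the caller pick either order.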

Add PolicyPuppetryConverter, a pure-template (no-LLM) converter implementing HiddenLayer's Policy Puppetry technique: wraps a prompt in a fabricated policy/config block (xml/json/ini, selectable via policy_format) so models treat it as trusted developer instructions. Optional leetspeak composition. Template ships as a SeedPrompt YAML with a benign {{ prompt }} placeholder. Includes unit tests (8) and registration in prompt_converter/__init__.py. Opened as a draft pending maintainer feedback on the design questions in microsoft#2080. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

romanlutz reviewed Jun 26, 2026

kenlacroix force-pushed the feat/policy-puppetry-converter branch from 5c2caf4 to cc17163 on June 27, 2026 04:07

kenlacroix force-pushed the feat/policy-puppetry-converter branch from cc17163 to 9921407 on June 27, 2026 06:41

kenlacroix wants to merge 1 commit into microsoft:main from kenlacroix:feat/policy-puppetry-converter
