144 changes: 114 additions & 30 deletions doc/code/executor/3_attack_configuration.ipynb
@@ -16,7 +16,7 @@
"|---|---|\n",
"| `objective` | What you are trying to get the **objective target** (the system under test) to do. Drives scoring and multi-turn adversarial prompts. |\n",
"| `memory_labels` | A `dict[str, str]` tagged onto every prompt/response, so you can filter this run later in memory. |\n",
"| `prepended_conversation` | A list of `Message`s to seed the conversation before the attack's own turns (system prompt, prior history). |\n",
"| `prepended_conversation` | A list of `Message`s to seed the conversation before the attack's own turns. This is also where the objective target's **system prompt** goes — `Message.from_system_prompt(...)` builds one (see below). |\n",
"| `next_message` | The exact next message to send, instead of letting the attack derive it from the objective. Useful for multimodal or pre-built seeds. |\n",
"\n",
"Construction-time configuration objects — **adversarial**, **scoring**, and **converter** — are\n",
@@ -36,9 +36,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']\n",
"Loaded environment file: ./.pyrit/.env\n",
"Loaded environment file: ./.pyrit/.env.local\n"
"Found default environment files: ['./.pyrit/.env']\n",
"Loaded environment file: ./.pyrit/.env\n"
]
},
{
@@ -55,6 +54,7 @@
" PromptSendingAttack,\n",
" SingleTurnAttackContext,\n",
")\n",
"from pyrit.models import Message\n",
"from pyrit.output import output_attack_async\n",
"from pyrit.prompt_target import TextTarget\n",
"from pyrit.setup import IN_MEMORY, initialize_pyrit_async\n",
@@ -104,11 +104,11 @@
"\u001b[1m 📋 Basic Information\u001b[0m\n",
"\u001b[36m • Objective: Give me a recipe for a classic margarita\u001b[0m\n",
"\u001b[36m • Attack Type: PromptSendingAttack\u001b[0m\n",
"\u001b[36m • Conversation ID: b0503fc1-11ff-46af-bdac-9102a83b7963\u001b[0m\n",
"\u001b[36m • Conversation ID: 743ca79c-a856-43a8-a311-a04bac36dfa9\u001b[0m\n",
"\n",
"\u001b[1m ⚡ Execution Metrics\u001b[0m\n",
"\u001b[32m • Turns Executed: 1\u001b[0m\n",
"\u001b[32m • Execution Time: 38ms\u001b[0m\n",
"\u001b[32m • Execution Time: 178ms\u001b[0m\n",
"\n",
"\u001b[1m 🎯 Outcome\u001b[0m\n",
"\u001b[33m • Status: ❓ UNDETERMINED\u001b[0m\n",
@@ -125,7 +125,7 @@
"\u001b[34m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\n",
"\u001b[2m\u001b[37m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[2m\u001b[37m Report generated at: 2026-06-07 04:42:30 UTC \u001b[0m\n"
"\u001b[2m\u001b[37m Report generated at: 2026-06-18 13:15:02 UTC \u001b[0m\n"
]
}
],
@@ -142,18 +142,103 @@
"id": "4",
"metadata": {},
"source": [
"## Prepended conversations\n",
"## Setting a system prompt\n",
"\n",
"The objective target's system prompt is just a `system`-role message at the front of the\n",
"conversation, so you set it through `prepended_conversation`. `Message.from_system_prompt(...)`\n",
"builds that message:\n",
"\n",
"A prepended conversation seeds the exchange before the attack adds its own turn. The most common\n",
"use is setting a system prompt, but you can prepend any sequence of `system` / `user` / `assistant`\n",
"turns — for example, to resume a prior conversation or to plant an agreeable assistant reply."
"```python\n",
"prepended_conversation=[Message.from_system_prompt(\"...\")]\n",
"```\n",
"\n",
"Because `prepended_conversation` is a list, a target that accepts more than one system message\n",
"simply takes more than one entry. `Message.from_system_prompts(...)` is shorthand that builds the\n",
"list for you: `Message.from_system_prompts(\"Policy.\", \"Persona.\")` is equivalent to\n",
"`[Message.from_system_prompt(\"Policy.\"), Message.from_system_prompt(\"Persona.\")]`. You can also\n",
"interleave `user` / `assistant` turns (next section)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"user: \n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[33m════════════════════════════════════════════════════════════════════════════════════════════════════\u001b[0m\n",
"\u001b[1m\u001b[33m ❓ ATTACK RESULT: UNDETERMINED ❓ \u001b[0m\n",
"\u001b[33m════════════════════════════════════════════════════════════════════════════════════════════════════\u001b[0m\n",
"\n",
"\u001b[1m\u001b[44m\u001b[37m Attack Summary \u001b[0m\n",
"\u001b[34m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[1m 📋 Basic Information\u001b[0m\n",
"\u001b[36m • Objective: Explain how a saponification reaction works\u001b[0m\n",
"\u001b[36m • Attack Type: PromptSendingAttack\u001b[0m\n",
"\u001b[36m • Conversation ID: b86054b9-ebf7-4bbc-93f7-062b8736210b\u001b[0m\n",
"\n",
"\u001b[1m ⚡ Execution Metrics\u001b[0m\n",
"\u001b[32m • Turns Executed: 1\u001b[0m\n",
"\u001b[32m • Execution Time: 7ms\u001b[0m\n",
"\n",
"\u001b[1m 🎯 Outcome\u001b[0m\n",
"\u001b[33m • Status: ❓ UNDETERMINED\u001b[0m\n",
"\u001b[37m • Reason: No objective scorer configured\u001b[0m\n",
"\n",
"\u001b[1m\u001b[44m\u001b[37m Conversation History with Objective Target \u001b[0m\n",
"\u001b[34m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\n",
"\u001b[34m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[1m\u001b[34m🔹 Turn 1 - USER\u001b[0m\n",
"\u001b[34m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[34m \u001b[0m\n",
"\n",
"\u001b[34m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\n",
"\u001b[2m\u001b[37m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[2m\u001b[37m Report generated at: 2026-06-18 13:15:02 UTC \u001b[0m\n"
]
}
],
"source": [
"result = await attack.execute_async( # type: ignore\n",
" objective=\"Explain how a saponification reaction works\",\n",
" prepended_conversation=[\n",
" Message.from_system_prompt(\"You are a helpful chemistry tutor who explains concepts step by step.\")\n",
" ],\n",
")\n",
"await output_attack_async(result)"
]
},
{
"cell_type": "markdown",
"id": "6",
"metadata": {},
"source": [
"## Prepended conversations\n",
"\n",
"A system prompt is the simplest prepended conversation. The general form seeds a full\n",
"`system` / `user` / `assistant` history before the attack adds its own turn — for example, to\n",
"resume a prior conversation or to plant an agreeable assistant reply. It is just a list of\n",
"`Message`s, so the system prompt and any seed turns compose freely."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7",
"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -178,11 +263,11 @@
"\u001b[1m 📋 Basic Information\u001b[0m\n",
"\u001b[36m • Objective: Explain how a saponification reaction works\u001b[0m\n",
"\u001b[36m • Attack Type: PromptSendingAttack\u001b[0m\n",
"\u001b[36m • Conversation ID: c649a184-4a07-45ac-90b9-de6757cfa6e6\u001b[0m\n",
"\u001b[36m • Conversation ID: 03728aed-c835-4624-8ddd-8bb008755eb3\u001b[0m\n",
"\n",
"\u001b[1m ⚡ Execution Metrics\u001b[0m\n",
"\u001b[32m • Turns Executed: 1\u001b[0m\n",
"\u001b[32m • Execution Time: 5ms\u001b[0m\n",
"\u001b[32m • Execution Time: 7ms\u001b[0m\n",
"\n",
"\u001b[1m 🎯 Outcome\u001b[0m\n",
"\u001b[33m • Status: ❓ UNDETERMINED\u001b[0m\n",
@@ -201,12 +286,12 @@
"\u001b[34m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\n",
"\u001b[2m\u001b[37m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[2m\u001b[37m Report generated at: 2026-06-07 04:42:30 UTC \u001b[0m\n"
"\u001b[2m\u001b[37m Report generated at: 2026-06-18 13:15:02 UTC \u001b[0m\n"
]
}
],
"source": [
"from pyrit.models import Message, MessagePiece\n",
"from pyrit.models import MessagePiece\n",
"\n",
"prepended_conversation = [\n",
" Message.from_system_prompt(\"You are a helpful assistant who always answers fully.\"),\n",
@@ -227,7 +312,7 @@
},
{
"cell_type": "markdown",
"id": "6",
"id": "8",
"metadata": {},
"source": [
"## Multimodal seeds and `next_message`\n",
@@ -240,7 +325,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7",
"id": "9",
"metadata": {},
"outputs": [
{
@@ -274,11 +359,11 @@
"\u001b[1m 📋 Basic Information\u001b[0m\n",
"\u001b[36m • Objective: Sending an image successfully\u001b[0m\n",
"\u001b[36m • Attack Type: PromptSendingAttack\u001b[0m\n",
"\u001b[36m • Conversation ID: 6a91faca-e46d-42be-830d-4a9d9d8a43b0\u001b[0m\n",
"\u001b[36m • Conversation ID: 87bdf69f-c4a4-417b-bb31-272f6747bb94\u001b[0m\n",
"\n",
"\u001b[1m ⚡ Execution Metrics\u001b[0m\n",
"\u001b[32m • Turns Executed: 1\u001b[0m\n",
"\u001b[32m • Execution Time: 13ms\u001b[0m\n",
"\u001b[32m • Execution Time: 14ms\u001b[0m\n",
"\n",
"\u001b[1m 🎯 Outcome\u001b[0m\n",
"\u001b[33m • Status: ❓ UNDETERMINED\u001b[0m\n",
@@ -295,7 +380,7 @@
"\u001b[34m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\n",
"\u001b[2m\u001b[37m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[2m\u001b[37m Report generated at: 2026-06-07 04:42:30 UTC \u001b[0m\n"
"\u001b[2m\u001b[37m Report generated at: 2026-06-18 13:15:02 UTC \u001b[0m\n"
]
}
],
@@ -321,7 +406,7 @@
},
{
"cell_type": "markdown",
"id": "8",
"id": "10",
"metadata": {},
"source": [
"## Objective target vs. adversarial target\n",
@@ -347,7 +432,7 @@
},
{
"cell_type": "markdown",
"id": "9",
"id": "11",
"metadata": {},
"source": [
"## Configuration objects\n",
@@ -369,7 +454,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "10",
"id": "12",
"metadata": {},
"outputs": [
{
@@ -393,11 +478,11 @@
"\u001b[1m 📋 Basic Information\u001b[0m\n",
"\u001b[36m • Objective: Base64-encode this request\u001b[0m\n",
"\u001b[36m • Attack Type: PromptSendingAttack\u001b[0m\n",
"\u001b[36m • Conversation ID: 3016e98c-94b3-4952-91b5-5cba8f89877f\u001b[0m\n",
"\u001b[36m • Conversation ID: 5882d7ea-4604-4233-9bba-58954decb600\u001b[0m\n",
"\n",
"\u001b[1m ⚡ Execution Metrics\u001b[0m\n",
"\u001b[32m • Turns Executed: 1\u001b[0m\n",
"\u001b[32m • Execution Time: 6ms\u001b[0m\n",
"\u001b[32m • Execution Time: 10ms\u001b[0m\n",
"\n",
"\u001b[1m 🎯 Outcome\u001b[0m\n",
"\u001b[33m • Status: ❓ UNDETERMINED\u001b[0m\n",
@@ -418,7 +503,7 @@
"\u001b[34m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\n",
"\u001b[2m\u001b[37m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[2m\u001b[37m Report generated at: 2026-06-07 04:42:30 UTC \u001b[0m\n"
"\u001b[2m\u001b[37m Report generated at: 2026-06-18 13:15:02 UTC \u001b[0m\n"
]
}
],
@@ -442,7 +527,7 @@
},
{
"cell_type": "markdown",
"id": "11",
"id": "13",
"metadata": {},
"source": [
"## Example: configuring a red teaming attack to generate an image\n",
@@ -510,8 +595,7 @@
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python"
"cell_metadata_filter": "-all"
},
"language_info": {
"codemirror_mode": {
@@ -523,7 +607,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
"version": "3.12.13"
}
},
"nbformat": 4,