[FEATURE] Add adversarial agent mode with default rules and project override editor #40

Description

@Joncallim

Problem Statement

FORGE needs an optional safety-review layer for risky agent actions. Without one, coding agents can run shell commands, access sensitive files, perform destructive filesystem operations, or touch network and remote systems with no separate, project-specific reviewer checking each action against user-defined rules.

This should be implemented as a project-level adversarial agent mode, inspired by Goose adversary mode, but adapted for FORGE's coding dashboard and project settings model.

Reference:

https://goose-docs.ai/docs/guides/security/adversary-mode/

Desired Outcome

FORGE supports an optional adversarial agent mode for each project. The mode is off by default. When enabled, it reviews risky tool actions before execution using default rules that can be overridden per project.

The user can enable the mode from Project Settings, review/edit the adversarial rules in a modal, restore default rules, and see adversarial mode status from the Projects dashboard.

User Story

As a FORGE user,
I want to enable an adversarial reviewer for risky agent actions on selected projects,
So that I can give coding agents more autonomy while retaining a safety layer for dangerous, destructive, or privacy-invasive actions.

Requirements

  • Adversarial mode is off by default for every project.
  • New projects get default adversarial rules.
  • Project Settings includes an “Enable adversarial agent” checkbox.
  • Enabling the checkbox opens a modal for reviewing and editing rules.
  • Users can save project-specific adversarial rules.
  • Users can restore default adversarial rules.
  • The Projects dashboard shows adversarial mode status.
  • The reviewer runs before high-risk tool actions when adversarial mode is enabled.
  • The reviewer blocks clearly dangerous actions while avoiding unnecessary interruption to normal coding work.
  • Blocked actions include a specific explanation and a safer alternative where possible.
  • The reviewer returns a structured allow/block decision.
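
A rough sketch of how the review gate described in these requirements could sit in front of tool execution. Everything here is an assumption for illustration: `runWithGate`, `GateResult`, and the injected `review`/`execute` callbacks are hypothetical names, and the reviewer is modelled as a synchronous function to keep the example short, although a real reviewer call would likely be asynchronous.

```typescript
// Hypothetical decision type, reduced to the fields the gate needs.
type Decision = { decision: "allow" | "block"; reason: string; saferAlternative?: string };

// Result of passing an action through the gate.
type GateResult<T> =
  | { status: "executed"; result: T }
  | { status: "blocked"; reason: string; saferAlternative?: string };

interface GateOptions<T> {
  enabled: boolean;        // project-level adversarial mode flag (off by default)
  highRisk: boolean;       // whether the action was classified as high-risk
  review: () => Decision;  // reviewer callback, only consulted when needed
  execute: () => T;        // the actual tool action
}

// Runs the reviewer only when the mode is on AND the action is high-risk,
// so normal coding actions are never interrupted.
function runWithGate<T>(opts: GateOptions<T>): GateResult<T> {
  if (opts.enabled && opts.highRisk) {
    const d = opts.review();
    if (d.decision === "block") {
      return { status: "blocked", reason: d.reason, saferAlternative: d.saferAlternative };
    }
  }
  return { status: "executed", result: opts.execute() };
}
```

In this sketch a blocked action never reaches `execute`, and the reviewer is skipped entirely when the mode is off or the action is low-risk.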

Acceptance Criteria

  • New projects get default adversary.md rules.
  • Adversarial mode is off by default.
  • Project Settings includes an “Enable adversarial agent” checkbox.
  • Enabling the checkbox opens a modal for reviewing/editing rules.
  • User can save project-specific adversarial rules.
  • User can restore default rules.
  • Projects dashboard shows adversarial mode status.
  • Risky tool actions are reviewed when adversarial mode is enabled.
  • Normal coding actions are not blocked unnecessarily.
  • Blocked actions include a clear explanation and safer alternative where possible.
  • Reviewer decisions use a structured allow/block response.

Out of Scope

  • Do not enable adversarial mode by default.
  • Do not block normal coding work unnecessarily.
  • Do not make this a general chat-only feature; it should be integrated into FORGE project settings and project dashboard behaviour.
  • Do not implement unrelated MCP management work in this issue.

Implementation Scope

Large - architecture or workflow change

Technical Notes

Project Settings should include:

Security
[ ] Enable adversarial agent

Suggested default global rules path:

~/Documents/Forge/templates/default-project/adversary.md

Suggested project override path:

~/Documents/Forge/projects/<project>/adversary.md
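
One possible way to choose between the two suggested paths, sketched under the assumption that a project falls back to the global default file when it has no override of its own. The issue does not specify whether defaults are copied into each project or read from the template, so the fallback behaviour, the `resolveRulesPath` name, and the injected `exists` check are all hypothetical. The `~` is left unexpanded here.

```typescript
// Suggested global default path, taken from the issue text.
const DEFAULT_RULES_PATH = "~/Documents/Forge/templates/default-project/adversary.md";

// Suggested per-project override path for a given project name.
function projectRulesPath(project: string): string {
  return `~/Documents/Forge/projects/${project}/adversary.md`;
}

// Picks the project override when it exists, otherwise the global default.
// `exists` is injected so the sketch stays independent of any filesystem API.
function resolveRulesPath(
  project: string,
  exists: (path: string) => boolean
): { path: string; usesDefault: boolean } {
  const override = projectRulesPath(project);
  return exists(override)
    ? { path: override, usesDefault: false }
    : { path: DEFAULT_RULES_PATH, usesDefault: true };
}
```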

Suggested project config shape:

{
  "security": {
    "adversarialModeEnabled": false,
    "adversaryRulesPath": "~/Documents/Forge/projects/<project>/adversary.md",
    "usesDefaultAdversaryRules": true
  }
}
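
For illustration, the config shape above could be typed as follows. The field names come from the suggested JSON; the interface names and the `defaultProjectConfig` helper are hypothetical, and the initial values simply reflect the "off by default, default rules" requirements.

```typescript
// Mirrors the "security" object in the suggested project config.
interface SecurityConfig {
  adversarialModeEnabled: boolean;
  adversaryRulesPath: string;
  usesDefaultAdversaryRules: boolean;
}

interface ProjectConfig {
  security: SecurityConfig;
}

// Builds the config a new project would start with:
// adversarial mode off, default rules in use.
function defaultProjectConfig(projectName: string): ProjectConfig {
  return {
    security: {
      adversarialModeEnabled: false,
      adversaryRulesPath: `~/Documents/Forge/projects/${projectName}/adversary.md`,
      usesDefaultAdversaryRules: true,
    },
  };
}
```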

Suggested dashboard states, followed by an example "last blocked action" line:

Adversarial Mode: Off
Adversarial Mode: On · Default rules
Adversarial Mode: On · Custom rules
Last blocked action: npm script attempted to read .env
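
The three status labels can be derived from the two boolean config flags. A minimal sketch, assuming the flag names from the suggested config shape; the `adversarialStatusLabel` function is a hypothetical helper.

```typescript
// Only the two flags needed to pick a label.
interface StatusFlags {
  adversarialModeEnabled: boolean;
  usesDefaultAdversaryRules: boolean;
}

// Maps the flags to one of the three suggested dashboard labels.
function adversarialStatusLabel(flags: StatusFlags): string {
  if (!flags.adversarialModeEnabled) {
    return "Adversarial Mode: Off";
  }
  return flags.usesDefaultAdversaryRules
    ? "Adversarial Mode: On · Default rules"
    : "Adversarial Mode: On · Custom rules";
}
```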

Suggested modal behaviour:

  • Toggle: Enable adversarial agent
  • Text area for rules
  • Button: Restore default rules
  • Button: Save rules
  • Indicator: Default rules / Custom rules
  • Warning that this mode may block risky agent actions
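
The modal behaviour above could be modelled as a small state reducer. This is a sketch only: the state and event names are hypothetical, and deriving the Default/Custom indicator by comparing the text area contents with the default rules is an assumption, not something the issue specifies.

```typescript
// Hypothetical modal state: the toggle plus the text area contents.
interface RulesModalState {
  enabled: boolean;
  rulesText: string;
}

// Hypothetical events for the toggle, the text area, and the restore button.
type RulesModalEvent =
  | { type: "toggleEnabled" }
  | { type: "editRules"; text: string }
  | { type: "restoreDefaults" };

// Pure state transition; saving would persist the returned state elsewhere.
function reduceRulesModal(
  state: RulesModalState,
  event: RulesModalEvent,
  defaultRules: string
): RulesModalState {
  switch (event.type) {
    case "toggleEnabled":
      return { ...state, enabled: !state.enabled };
    case "editRules":
      return { ...state, rulesText: event.text };
    case "restoreDefaults":
      return { ...state, rulesText: defaultRules };
  }
}

// Indicator text, assuming "custom" means "differs from the default rules".
function rulesIndicator(state: RulesModalState, defaultRules: string): "Default rules" | "Custom rules" {
  return state.rulesText === defaultRules ? "Default rules" : "Custom rules";
}
```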

Suggested high-risk actions requiring review:

  • Shell commands
  • Filesystem writes/deletes outside expected project paths
  • Network calls
  • Package install scripts
  • Git operations that rewrite history or touch remotes
  • Access to secrets or credential-like files
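
A rough sketch of how the list above might translate into a pre-filter that decides which actions go to the reviewer. The `ToolAction` shape, the secret filename patterns, and the git subcommand lists are all illustrative assumptions, not an agreed design, and paths are assumed to be absolute and already normalized.

```typescript
// Hypothetical description of an agent tool action.
type ToolAction =
  | { kind: "shell"; command: string }
  | { kind: "fs-read"; path: string }
  | { kind: "fs-write"; path: string }
  | { kind: "fs-delete"; path: string }
  | { kind: "network"; url: string }
  | { kind: "package-install"; runsScripts: boolean }
  | { kind: "git"; args: string[] };

// Illustrative patterns for secret or credential-like files.
const SECRET_PATTERNS: RegExp[] = [/(^|\/)\.env(\.|$)/, /(^|\/)id_(rsa|ed25519)$/, /\.pem$/];
// Illustrative git subcommands that touch remotes or rewrite history.
const REMOTE_SUBCOMMANDS = new Set(["push", "pull", "fetch", "clone", "remote"]);
const HISTORY_SUBCOMMANDS = new Set(["rebase", "filter-branch"]);

function looksLikeSecret(path: string): boolean {
  return SECRET_PATTERNS.some((p) => p.test(path));
}

// Prefix check; assumes both paths are absolute and contain no "..".
function isInside(root: string, path: string): boolean {
  const prefix = root.endsWith("/") ? root : root + "/";
  return path === root || path.startsWith(prefix);
}

// Returns true when the action falls into one of the listed categories.
function requiresReview(action: ToolAction, projectRoot: string): boolean {
  switch (action.kind) {
    case "shell":
    case "network":
      return true;
    case "package-install":
      return action.runsScripts;
    case "fs-read":
      return looksLikeSecret(action.path);
    case "fs-write":
    case "fs-delete":
      return !isInside(projectRoot, action.path) || looksLikeSecret(action.path);
    case "git": {
      const [sub, ...rest] = action.args;
      if (sub === undefined) return false;
      if (REMOTE_SUBCOMMANDS.has(sub) || HISTORY_SUBCOMMANDS.has(sub)) return true;
      return (sub === "reset" && rest.includes("--hard")) ||
        (sub === "commit" && rest.includes("--amend"));
    }
  }
}
```

Ordinary edits inside the project and local git commands such as `git status` skip review in this sketch, which keeps the reviewer out of the way of normal coding work.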

Suggested reviewer decision shape:

{
  "decision": "allow | block",
  "reason": "Short explanation",
  "triggeredRules": ["rule-id-or-summary"],
  "saferAlternative": "Optional safer action"
}
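
Because the reviewer's output is likely to come from a model, it may be worth validating it before acting on it. A minimal sketch, assuming the reviewer returns the JSON shape above as a string; `parseReviewerDecision` is a hypothetical helper, and returning `null` for malformed output (leaving the allow-or-block fallback to the caller) is a design assumption.

```typescript
// Typed version of the suggested reviewer decision shape.
interface ReviewerDecision {
  decision: "allow" | "block";
  reason: string;
  triggeredRules: string[];
  saferAlternative?: string;
}

// Parses and validates raw reviewer output; returns null if it does not match the shape.
function parseReviewerDecision(raw: string): ReviewerDecision | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;

  const obj = value as Record<string, unknown>;
  const decision = obj["decision"];
  const reason = obj["reason"];
  const rules = obj["triggeredRules"];
  const alternative = obj["saferAlternative"];

  if (decision !== "allow" && decision !== "block") return null;
  if (typeof reason !== "string") return null;
  if (!Array.isArray(rules) || !rules.every((r) => typeof r === "string")) return null;
  if (alternative !== undefined && typeof alternative !== "string") return null;

  const parsed: ReviewerDecision = { decision, reason, triggeredRules: rules as string[] };
  if (typeof alternative === "string") parsed.saferAlternative = alternative;
  return parsed;
}
```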

Suggested default adversary.md:

# FORGE Adversarial Agent Rules

You are the adversarial safety reviewer for this FORGE project.

Your job is to review risky agent actions before they execute. You should block actions that are clearly dangerous, destructive, privacy-invasive, or outside the user's likely intent.

Default policy:

1. Block exfiltration of secrets, `.env` files, SSH keys, API tokens, cookies, credentials, or private keys.
2. Block destructive actions outside the project folder unless the user explicitly approves them.
3. Block privilege escalation, sudo usage, chmod/chown of sensitive paths, or system-level mutation unless explicitly approved.
4. Block downloading and executing unknown remote scripts.
5. Block network calls to unknown domains unless the project has a clear reason to contact them.
6. Block attempts to disable security checks, remove logs, hide activity, or bypass this reviewer.
7. Allow normal coding work, including editing project files, running tests, installing ordinary dependencies, local builds, and git operations inside the project.
8. Err on the side of allowing normal development unless the action is clearly risky.
9. When blocking, explain the specific rule triggered and suggest a safer alternative.

Depends on the workspace structure from issue #38. Complements the project dashboard and MCP health work in issue #39.
