Add benchmark-integrity audit issue pack with evidence-backed remediation criteria by Copilot · Pull Request #6 · VoDaiLocz/Lock-ART.

Copilot · 2026-03-21T06:30:55Z

This PR adds a strict benchmark audit artifact focused on anti-mimicry evaluation quality. It translates the identified risk areas into ready-to-file GitHub issues with concrete repo evidence and measurable upgrade criteria.

Scope: benchmark audit issue set
- Added `docs/BENCHMARK_AUDIT_ISSUES.md` with 7 issue drafts covering:
  - proxy metric validity
  - benchmark framing accuracy
  - missing ground-truth LoRA/DreamBooth evaluation
  - robustness transform gaps (incl. JPEG/crop)
  - dataset split leakage risk
  - reproducibility metadata gaps
  - weak comparative methodology

Issue format standardization
- Each draft includes:
  - Title
  - What is wrong
  - Why it misleads
  - Evidence from repo (file/line references)
  - Proposed benchmark upgrade
  - Acceptance criteria
  - Labels

Traceability note
- Added a header note to re-verify line references at filing time to account for line drift across commits.
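
The re-verification step in the traceability note could be partly automated. The sketch below is not part of this PR: it assumes references are written in backticks as `path:start-end` (as in the excerpt further down), and the function name `find_stale_references` is hypothetical. It only checks that each referenced file exists and that the line range still fits inside it; it cannot tell whether the lines still contain the cited code.

```python
import re
from pathlib import Path

# Matches backticked references such as `src/pkg/module.py:217-231` or `file.py:42`.
# The backtick-delimited "path:start-end" shape is an assumption about the audit doc.
REF_PATTERN = re.compile(r"`([^`\s:]+):(\d+)(?:-(\d+))?`")


def find_stale_references(markdown_text: str, repo_root: Path) -> list[str]:
    """Return one message per file/line reference that no longer resolves."""
    problems = []
    for match in REF_PATTERN.finditer(markdown_text):
        rel_path = match.group(1)
        start = int(match.group(2))
        end = int(match.group(3)) if match.group(3) else start
        target = repo_root / rel_path
        if not target.is_file():
            problems.append(f"{rel_path}: file not found")
            continue
        text = target.read_text(encoding="utf-8", errors="replace")
        line_count = len(text.splitlines())
        # A range is stale if it is malformed or runs past the end of the file.
        if start < 1 or end < start or end > line_count:
            problems.append(
                f"{rel_path}:{start}-{end}: outside file (has {line_count} lines)"
            )
    return problems
```

Running this against the audit document from the repository root at filing time would flag references broken by file moves or large deletions; smaller drift within a file still needs a manual look.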

Example excerpt format used in the new document:

## 4) Robustness suite misses common purification transforms

**Title**
`Robustness testing gap: no JPEG compression or crop transforms in protection-readability benchmark`

**Evidence from the repo**
- Current transform suite: identity, blur, resize(0.75), resize(0.5): `src/auralock/core/style.py:217-231`

**Acceptance criteria**
- [ ] Transform suite includes JPEG and crop variants by name.
- [ ] Protection report contains per-transform metrics for all added transforms.
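
The two acceptance criteria in this excerpt can be read as a mechanical check. The following is a minimal sketch under stated assumptions, not code from the repository: the function name, the transform names, and the report shape (a mapping from transform name to a dict of metrics) are all hypothetical, since the PR does not show the actual report format.

```python
from typing import Mapping, Sequence

# Transform families the excerpt says must appear in the suite by name.
REQUIRED_FAMILIES = ("jpeg", "crop")


def check_robustness_report(
    transform_names: Sequence[str],
    report: Mapping[str, Mapping[str, float]],
) -> list[str]:
    """Return the unmet acceptance criteria; an empty list means both pass."""
    failures = []
    lowered = [name.lower() for name in transform_names]
    # Criterion 1: the suite includes JPEG and crop variants by name.
    for family in REQUIRED_FAMILIES:
        if not any(family in name for name in lowered):
            failures.append(f"no transform named for '{family}'")
    # Criterion 2: the report has metrics for every transform in the suite
    # (a superset of "all added transforms").
    for name in transform_names:
        if not report.get(name):
            failures.append(f"no metrics reported for '{name}'")
    return failures
```

For example, with hypothetical names `["identity", "blur", "jpeg_q75", "center_crop_0.9"]` and a report missing the crop entry, the function returns a single failure naming `center_crop_0.9`.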

Original prompt

This section details the original issue you should resolve.

<issue_title>audit benchmark</issue_title>
<issue_description>Analyze this repository specifically for benchmark integrity and anti-mimicry evaluation quality.

Create GitHub issues for:

- proxy metrics that may not reflect real anti-mimicry strength
- misleading benchmark framing
- missing ground-truth LoRA/DreamBooth evaluation
- weak robustness testing (resize, JPEG, blur, crop)
- dataset split risks
- missing reproducibility details
- weak comparison methodology

For each issue, include:

- Title
- What is wrong with the benchmark
- Why it can mislead users
- Evidence from the repo
- Proposed benchmark upgrade
- Acceptance criteria
- Labels

Be strict and technical.
</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes audit benchmark #5

Co-authored-by: VoDaiLocz <88762074+VoDaiLocz@users.noreply.github.com> Agent-Logs-Url: https://github.com/VoDaiLocz/Lock-ART./sessions/f7b6d45f-46ec-4c09-af7e-7db2610496d6

Initial plan

e09b630

Copilot AI assigned Copilot and VoDaiLocz Mar 21, 2026

Copilot started work on behalf of VoDaiLocz March 21, 2026 06:30

Copilot AI and others added 2 commits March 21, 2026 06:34

docs: add strict benchmark integrity issue drafts

cfd89ab

docs: add line-reference verification note to benchmark audit draft

19037a3

Copilot AI changed the title ~~[WIP] Create GitHub issues for benchmark integrity evaluation~~ Add benchmark-integrity audit issue pack with evidence-backed remediation criteria Mar 21, 2026

Copilot AI requested a review from VoDaiLocz March 21, 2026 06:40

Copilot finished work on behalf of VoDaiLocz March 21, 2026 06:40

Copilot wants to merge 3 commits into main from copilot/audit-benchmark-issues
