docs: add GRA-374 experiment artifact #255
Open
Gradata wants to merge 1 commit into main from docs/gra-374-experiment-artifact
96 changes: 96 additions & 0 deletions
Gradata/docs/research/gra-374-multi-cli-install-success-rate.md
@@ -0,0 +1,96 @@
# GRA-374: multi_cli_install_success_rate experiment

Status: evaluated — DISCARD
Window: 2026-05-12 to 2026-05-19
Primary owner: gradata-eng
Related issues: GRA-374, GRA-55, GRA-1161, GRA-1163

## Objective

Define and evaluate one quick 7-day experiment for Gradata's multi-agent installer:
can the installer reach a useful cross-CLI success rate after adding bidirectional
hook capture for Codex, Hermes, and OpenCode?

## One measure

`multi_cli_install_success_rate`

Definition:

- Numerator: CLI targets whose install plus correction-capture smoke test passes.
- Denominator: CLI targets tested.
- Target set: Claude Code, Codex, Cursor, Hermes, OpenCode.
- Success threshold: at least 4 of 5 CLIs pass, i.e. >=80%.

Formula:

```text
multi_cli_install_success_rate = passing_cli_targets / tested_cli_targets
```

## Baseline

Before GRA-55, only Claude Code had confirmed bidirectional support.

| CLI | Baseline install/capture state |
| --- | --- |
| Claude Code | PASS |
| Codex | FAIL — capture broken |
| Cursor | FAIL conservative — unverified |
| Hermes | FAIL — capture broken |
| OpenCode | FAIL — capture broken |

Baseline: 1/5 = 20%.

## Measurement runbook

For each target CLI:

```bash
gradata install --agent <cli>
# Record exit code.

# Make one test correction in that CLI.

gradata list-corrections --last 1h --format json
# PASS if at least one fresh correction from that CLI appears.
```

Decision rule:

| Result | Decision |
| --- | --- |
| >=4/5 pass | KEEP — continue multi-CLI installer investment |
| 2-3/5 pass | PARTIAL — file per-CLI debug issues |
| <2/5 pass | DISCARD — re-scope before more onboarding investment |

## Evaluation result

Verdict: DISCARD.

Reason: the prerequisite implementation, GRA-55, was not actually merged. Code audit
showed Codex, Hermes, and OpenCode still had injection-only hook registration and no
post-tool/session-end capture path. Cursor remained unverified, so the conservative
cross-CLI result stayed at the baseline.

| CLI | Injection | Capture | Evaluation status |
| --- | --- | --- | --- |
| Claude Code | yes | yes | PASS |
| Codex | yes | no | FAIL |
| Cursor | yes, via MCP | unverified | FAIL conservative |
| Hermes | yes | no | FAIL |
| OpenCode | yes | no | FAIL |

Final result: 1/5 = 20%.

## Follow-up

The hypothesis was not disproven; the prerequisite implementation did not ship.
Follow-up issue GRA-1163 was filed to re-implement post_tool and session_end hooks
for Codex, Hermes, and OpenCode with an explicit verification gate before completion.

## Paperclip artifact note

This document exists because the original GRA-374 completion was reverted by the
artifact monitor: it had a useful in-thread experiment spec but no durable merged
artifact URL. This file is the durable artifact for the experiment spec and evaluation.
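
The measure and decision rule defined in the document above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the Gradata CLI: the names `EVALUATION`, `success_rate`, and `decide` are hypothetical, the statuses are copied from the evaluation table, and counting "FAIL conservative" as a failure is an assumption that follows the document's conservative scoring.

```python
from fractions import Fraction

# Evaluation statuses copied from the evaluation table in the document.
# Assumption: any status other than "PASS" counts as a failure.
EVALUATION = {
    "Claude Code": "PASS",
    "Codex": "FAIL",
    "Cursor": "FAIL conservative",
    "Hermes": "FAIL",
    "OpenCode": "FAIL",
}


def success_rate(statuses):
    """Return (passing, tested, rate) for a mapping of CLI name to status."""
    tested = len(statuses)
    if tested == 0:
        raise ValueError("no CLI targets tested")
    passing = sum(1 for status in statuses.values() if status == "PASS")
    return passing, tested, Fraction(passing, tested)


def decide(passing, tested=5):
    """Apply the decision rule, which the document states for 5 targets only."""
    if tested != 5:
        raise ValueError("decision rule is only defined for 5 targets")
    if passing >= 4:
        return "KEEP"
    if passing >= 2:
        return "PARTIAL"
    return "DISCARD"


passing, tested, rate = success_rate(EVALUATION)
print(f"{passing}/{tested} = {float(rate):.0%} -> {decide(passing, tested)}")
```

With the statuses from the evaluation table this reproduces the document's final result of 1/5 = 20% and the DISCARD verdict. The rule is expressed on pass counts rather than percentages because the document's decision table is written that way.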
Clarify the experiment window to match the stated 7-day duration.
`2026-05-12 to 2026-05-19` reads as 8 days (inclusive), which conflicts with “quick 7-day experiment.” Please either adjust one date (e.g., end at `2026-05-18`) or explicitly state exclusive-end semantics.
Also applies to: 10-10
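
The day count behind this comment can be checked with Python's standard `datetime` module. This is a small sketch; the variable names are illustrative and the two dates are taken from the `Window:` line in the diff.

```python
from datetime import date

# Window endpoints as written in the document under review.
start = date(2026, 5, 12)
end = date(2026, 5, 19)

# Exclusive-end reading: the end date itself is not an experiment day.
exclusive_days = (end - start).days

# Inclusive reading: both endpoints count as experiment days.
inclusive_days = exclusive_days + 1

print(f"exclusive end: {exclusive_days} days, inclusive: {inclusive_days} days")
```

The window is 7 days only under exclusive-end semantics. Read inclusively it is 8 days, which is why the comment asks for either an end date of `2026-05-18` or an explicit statement that the end is exclusive.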