fix(core): parse non-GitHub SSH remote URLs correctly in registerRepo…#1267
fix(core): parse non-GitHub SSH remote URLs correctly in registerRepo…#1267LeonLiuY wants to merge 1 commit intocoleam00:devfrom
Conversation
…sitory When registering a repo whose remote uses a self-hosted or GitLab SSH URL (git@host:org/repo), the previous code only normalised git@github.com: URLs. All other SSH hosts fell through to a generic split-on-slash, leaving the full SSH host (e.g. git@gitlab.example.com:org) as the owner component. This caused two downstream failures: 1. The codebase name stored in the DB contained a colon (e.g. git@gitlab.example.com:org/repo), which made parseOwnerRepo() return null due to its SAFE_NAME regex. resolveProjectPaths() then fell back to writing artifacts inside the worktree directory instead of the canonical ~/.archon/workspaces/ tree, so the server could never serve them and the Web UI always showed "Artifact file not found". 2. The colon in the worktree path acts as a classpath separator on Unix, breaking Java/Maven/Gradle test compilation for any project run inside that worktree. Fix: match any git@host:owner/repo SSH URL with a single regex and extract just the owner/repo portion, discarding the host. HTTPS and other URL forms continue to use the existing path-split fallback unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/core/src/handlers/clone.ts (1)
331-345: Fix looks correct — optional note on GitLab subgroup URLs.The regex and fallback cleanly address the reported bug:
nameandownerNameno longer carry the host/colon for non-GitHub SSH remotes, which is exactly what fixes theparseOwnerRepo-returns-null and Java-classpath issues called out in the PR description.One edge case worth considering (optional): for GitLab-style subgroup SSH remotes such as
git@gitlab.example.com:group/subgroup/project, the(.+)capture greedily matchessubgroup/project, producingname = "group/subgroup/project". Downstream effects:
parseOwnerRepo(name)(at line 349) returnsnullfor a 3-segment name, soprojRepofalls back tobasename(localPath)while the DBnameretains all three segments. The two representations diverge.findCodebaseByNamededup inregisterRepoAtPathkeys off this exact multi-slash string, so the same repo registered via an HTTPS URL (which goes through the split-on-slash path and yields a 2-segmentowner/repo) won't match.If you want symmetry with the HTTPS fallback (which only keeps the last two path segments), restricting the repo capture to a single segment would do it:
♻️ Optional: mirror HTTPS behaviour for subgroup URLs
- const sshMatch = /^git@[^:]+:([^/]+)\/(.+)$/.exec(cleaned); + // Match any git@host:owner/repo SSH URL. For subgroup-style paths + // (e.g. gitlab `group/subgroup/project`), keep only the final segment + // as repo — mirrors the HTTPS split-on-slash fallback below. + const sshMatch = /^git@[^:]+:(?:.+\/)?([^/]+)\/([^/]+)$/.exec(cleaned); if (sshMatch) { name = `${sshMatch[1]}/${sshMatch[2]}`; ownerName = sshMatch[1]; }Happy to defer if you'd prefer to keep the full path in
namefor subgroup awareness — just wanted to flag the dedup asymmetry.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/handlers/clone.ts` around lines 331 - 345, The SSH regex and fallback currently allow multi-segment repo paths (e.g., git@gitlab:group/subgroup/project) which produces name with three segments and diverges from the HTTPS fallback (which keeps only last two segments); update the SSH handling so name/ownerName mirror the HTTPS behavior by restricting the second capture to a single segment (or otherwise reduce the captured path to the last two path segments) — modify the regex used in the sshMatch (currently /^git@[^:]+:([^/]+)\/(.+)$/) to only allow two segments (e.g., change the second capture to [^/]+) or post-process sshMatch[2] to keep only its basename, and ensure parseOwnerRepo, findCodebaseByName, registerRepoAtPath behavior will then see consistent ownerName and name values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/core/src/handlers/clone.ts`:
- Around line 331-345: The SSH regex and fallback currently allow multi-segment
repo paths (e.g., git@gitlab:group/subgroup/project) which produces name with
three segments and diverges from the HTTPS fallback (which keeps only last two
segments); update the SSH handling so name/ownerName mirror the HTTPS behavior
by restricting the second capture to a single segment (or otherwise reduce the
captured path to the last two path segments) — modify the regex used in the
sshMatch (currently /^git@[^:]+:([^/]+)\/(.+)$/) to only allow two segments
(e.g., change the second capture to [^/]+) or post-process sshMatch[2] to keep
only its basename, and ensure parseOwnerRepo, findCodebaseByName,
registerRepoAtPath behavior will then see consistent ownerName and name values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 142b7522-5216-49dc-ab9c-e5e27a7f8acd
📒 Files selected for processing (2)
packages/core/src/handlers/clone.test.tspackages/core/src/handlers/clone.ts
Summary
split, embedding the full SSH host (e.g. git@gitlab.example.com:org) as the codebase owner.
fall back to writing artifacts inside the worktree. The server then couldn't serve them → Web UI always showed "Artifact file not found". The same
colon in the worktree path also breaks Java/Maven/Gradle classpath resolution on Unix.
host. HTTPS and other URL formats use the existing path-split fallback unchanged.
UPDATE (documented in migration steps below).
UX Journey
Before
User CLI / Server DB / Filesystem
──── ──────────── ───────────────
registers repo ───────▶ registerRepository()
(SSH URL) splits on '/' only
owner = "git@host:org" ──▶ name stored as
"git@host:org/repo"
runs workflow ────────▶ resolveProjectPaths()
parseOwnerRepo() ──────▶ returns null (SAFE_NAME fails)
uses fallback path ─────▶ artifacts written to
{worktree}/.archon/artifacts/
clicks artifact ──────▶ GET /api/artifacts/…
parseOwnerRepo() ──────▶ returns null
404 response ◀──────────
sees "Artifact ◀───────
file not found"
(Java builds also fail: colon in worktree path acts as classpath separator)
After
User CLI / Server DB / Filesystem
──── ──────────── ───────────────
registers repo ───────▶ registerRepository()
(SSH URL) [matches git@host:o/r]
owner = "org" ──▶ name stored as
"org/repo" ✓
runs workflow ────────▶ resolveProjectPaths()
parseOwnerRepo() ──────▶ returns {owner, repo} ✓
uses canonical path ────▶ artifacts written to
~/.archon/workspaces/
org/repo/artifacts/ ✓
clicks artifact ──────▶ GET /api/artifacts/…
parseOwnerRepo() ──────▶ returns {owner, repo} ✓
reads file ────────────▶ 200 OK
sees artifact ◀────────
(Java builds work: no colon in worktree path)
Architecture Diagram
Before
registerRepository() ──▶ split('/') only
owner = "git@host:org" ──▶ DB: name = "git@host:org/repo"
│
┌───────────────┘
▼
resolveProjectPaths() ──▶ parseOwnerRepo("git@host:org/repo") ──▶ null
fallback: {worktree}/.archon/artifacts/
GET /api/artifacts/ ──▶ parseOwnerRepo("git@host:org/repo") ──▶ null ──▶ 404
After
registerRepository() ──▶ [~] regex: /^git@[^:]+:([^/]+)/(.+)$/
owner = "org" ──▶ DB: name = "org/repo"
│
┌───────────────┘
▼
resolveProjectPaths() ──▶ parseOwnerRepo("org/repo") ──▶ {owner, repo}
canonical: ~/.archon/workspaces/org/repo/artifacts/ ✓
GET /api/artifacts/ ──▶ parseOwnerRepo("org/repo") ──▶ {owner, repo} ──▶ 200 ✓
Connection inventory:
┌─────────────────────┬────────────────┬───────────┬───────────────────────────────────────┐
│ From │ To │ Status │ Notes │
├─────────────────────┼────────────────┼───────────┼───────────────────────────────────────┤
│ registerRepository │ parseOwnerRepo │ unchanged │ now receives valid input │
├─────────────────────┼────────────────┼───────────┼───────────────────────────────────────┤
│ registerRepository │ SSH URL regex │ new │ replaces GitHub-only special case │
├─────────────────────┼────────────────┼───────────┼───────────────────────────────────────┤
│ resolveProjectPaths │ parseOwnerRepo │ unchanged │ no longer hits fallback for SSH repos │
├─────────────────────┼────────────────┼───────────┼───────────────────────────────────────┤
│ GET /api/artifacts │ parseOwnerRepo │ unchanged │ no longer returns 404 for SSH repos │
└─────────────────────┴────────────────┴───────────┴───────────────────────────────────────┘
Label Snapshot
Change Metadata
Linked Issue
N/A
Validation Evidence
bun run type-check ✓ all packages pass
bun run lint ✓ 0 warnings
bun run format:check ✓ no formatting issues
bun run test ✓ new test added and passes (45/45 in clone.test.ts)
3 pre-existing failures in core:connection (mock
pollution in full suite; pass in isolation — unrelated
to this change)
Evidence: New regression test registerRepository > builds owner/repo name from non-GitHub SSH remote URL without colon in name — written to fail
first, then fixed.
Security Impact
Compatibility / Migration
Upgrade steps for affected repos:
-- Find affected records (name contains '@' or ':')
SELECT id, name FROM remote_agent_codebases WHERE name LIKE '%@%' OR name LIKE '%:%';
-- Fix each one (replace with correct owner/repo):
UPDATE remote_agent_codebases
SET name = '/'
WHERE id = '';
Human Verification
Verified scenarios:
Not verified: Re-registration flow after applying the DB migration on a live instance.
Side Effects / Blast Radius
re-registration. No known case of this.
Rollback Plan
failures with classpath errors mentioning .archon/workspaces/
Risks and Mitigations