Skip to content

fix(core): parse non-GitHub SSH remote URLs correctly in registerRepo…#1267

Open
LeonLiuY wants to merge 1 commit intocoleam00:devfrom
LeonLiuY:dev
Open

fix(core): parse non-GitHub SSH remote URLs correctly in registerRepo…#1267
LeonLiuY wants to merge 1 commit intocoleam00:devfrom
LeonLiuY:dev

Conversation

@LeonLiuY
Copy link
Copy Markdown

@LeonLiuY LeonLiuY commented Apr 17, 2026


Summary

  • Problem: registerRepository only normalized git@github.com: SSH URLs. Any other SSH host (GitLab, self-hosted) fell through to a generic path
    split, embedding the full SSH host (e.g. git@gitlab.example.com:org) as the codebase owner.
  • Why it matters: The colon in the stored name caused parseOwnerRepo() to return null (SAFE_NAME regex rejects @/:), making resolveProjectPaths()
    fall back to writing artifacts inside the worktree. The server then couldn't serve them → Web UI always showed "Artifact file not found". The same
    colon in the worktree path also breaks Java/Maven/Gradle classpath resolution on Unix.
  • What changed: registerRepository now matches any git@host:owner/repo SSH URL with a single regex, extracting only owner/repo and discarding the
    host. HTTPS and other URL formats use the existing path-split fallback unchanged.
  • What did not change: Existing codebase records in the DB are not migrated. Repos already registered with the broken name need a one-time manual
    UPDATE (documented in migration steps below).

UX Journey

Before

User CLI / Server DB / Filesystem
──── ──────────── ───────────────
registers repo ───────▶ registerRepository()
(SSH URL) splits on '/' only
owner = "git@host:org" ──▶ name stored as
"git@host:org/repo"
runs workflow ────────▶ resolveProjectPaths()
parseOwnerRepo() ──────▶ returns null (SAFE_NAME fails)
uses fallback path ─────▶ artifacts written to
{worktree}/.archon/artifacts/
clicks artifact ──────▶ GET /api/artifacts/…
parseOwnerRepo() ──────▶ returns null
404 response ◀──────────
sees "Artifact ◀───────
file not found"

(Java builds also fail: colon in worktree path acts as classpath separator)

After

User CLI / Server DB / Filesystem
──── ──────────── ───────────────
registers repo ───────▶ registerRepository()
(SSH URL) [matches git@host:o/r]
owner = "org" ──▶ name stored as
"org/repo" ✓
runs workflow ────────▶ resolveProjectPaths()
parseOwnerRepo() ──────▶ returns {owner, repo} ✓
uses canonical path ────▶ artifacts written to
~/.archon/workspaces/
org/repo/artifacts/ ✓
clicks artifact ──────▶ GET /api/artifacts/…
parseOwnerRepo() ──────▶ returns {owner, repo} ✓
reads file ────────────▶ 200 OK
sees artifact ◀────────

(Java builds work: no colon in worktree path)


Architecture Diagram

Before

registerRepository() ──▶ split('/') only
owner = "git@host:org" ──▶ DB: name = "git@host:org/repo"

┌───────────────┘

resolveProjectPaths() ──▶ parseOwnerRepo("git@host:org/repo") ──▶ null
fallback: {worktree}/.archon/artifacts/

GET /api/artifacts/ ──▶ parseOwnerRepo("git@host:org/repo") ──▶ null ──▶ 404

After

registerRepository() ──▶ [~] regex: /^git@[^:]+:([^/]+)/(.+)$/
owner = "org" ──▶ DB: name = "org/repo"

┌───────────────┘

resolveProjectPaths() ──▶ parseOwnerRepo("org/repo") ──▶ {owner, repo}
canonical: ~/.archon/workspaces/org/repo/artifacts/ ✓

GET /api/artifacts/ ──▶ parseOwnerRepo("org/repo") ──▶ {owner, repo} ──▶ 200 ✓

Connection inventory:

┌─────────────────────┬────────────────┬───────────┬───────────────────────────────────────┐
│ From │ To │ Status │ Notes │
├─────────────────────┼────────────────┼───────────┼───────────────────────────────────────┤
│ registerRepository │ parseOwnerRepo │ unchanged │ now receives valid input │
├─────────────────────┼────────────────┼───────────┼───────────────────────────────────────┤
│ registerRepository │ SSH URL regex │ new │ replaces GitHub-only special case │
├─────────────────────┼────────────────┼───────────┼───────────────────────────────────────┤
│ resolveProjectPaths │ parseOwnerRepo │ unchanged │ no longer hits fallback for SSH repos │
├─────────────────────┼────────────────┼───────────┼───────────────────────────────────────┤
│ GET /api/artifacts │ parseOwnerRepo │ unchanged │ no longer returns 404 for SSH repos │
└─────────────────────┴────────────────┴───────────┴───────────────────────────────────────┘


Label Snapshot

  • Risk: risk: low
  • Size: size: XS
  • Scope: core
  • Module: core:clone

Change Metadata

  • Change type: bug
  • Primary scope: core

Linked Issue

N/A


Validation Evidence

bun run type-check ✓ all packages pass
bun run lint ✓ 0 warnings
bun run format:check ✓ no formatting issues
bun run test ✓ new test added and passes (45/45 in clone.test.ts)
3 pre-existing failures in core:connection (mock
pollution in full suite; pass in isolation — unrelated
to this change)

Evidence: New regression test registerRepository > builds owner/repo name from non-GitHub SSH remote URL without colon in name — written to fail
first, then fixed.


Security Impact

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No

Compatibility / Migration

  • Backward compatible? Yes — HTTPS URLs and git@github.com: SSH URLs are unaffected.
  • Config/env changes? No
  • Database migration needed? Yes, for repos already registered with the broken name.

Upgrade steps for affected repos:

-- Find affected records (name contains '@' or ':')
SELECT id, name FROM remote_agent_codebases WHERE name LIKE '%@%' OR name LIKE '%:%';

-- Fix each one (replace with correct owner/repo):
UPDATE remote_agent_codebases
SET name = '/'
WHERE id = '';


Human Verification

  • Confirmed file at {worktree}/.archon/artifacts/runs/{id}/issues/file.md was the fallback path for an SSH-registered repo
  • Confirmed parseOwnerRepo returned null for the stored name by tracing the SAFE_NAME regex
  • Confirmed the 3 connection test failures are pre-existing by running connection.test.ts in isolation (5 pass, 0 fail)

Verified scenarios:

  • git@gitlab.example.com:myorg/myproject.git → name myorg/myproject ✓
  • git@github.com:acme/backend.git → name acme/backend (existing behaviour preserved) ✓
  • HTTPS URLs unchanged ✓

Not verified: Re-registration flow after applying the DB migration on a live instance.


Side Effects / Blast Radius

  • Affected subsystems: registerRepository only — called on first registration and auto-registration from CLI.
  • Potential unintended effects: Repos with a custom SSH host that intentionally used the full host as a namespace would get a different name on
    re-registration. No known case of this.
  • Guardrails: parseOwnerRepo continues to enforce SAFE_NAME — any name that can't be parsed falls back gracefully (no throw).

Rollback Plan

  • Fast rollback: revert the single commit (git revert 4e4982e)
  • Feature flags: none
  • Observable failure symptoms: codebase names containing @ or : in DB; "Artifact file not found" in Web UI for SSH-registered repos; Java build
    failures with classpath errors mentioning .archon/workspaces/

Risks and Mitigations

  • Risk: Existing DB records with the broken name are not auto-migrated — affected users still see the bug until they run the SQL update.
  • Mitigation: Manual migration is simple and surgical (one UPDATE per affected codebase); documented above.

…sitory

When registering a repo whose remote uses a self-hosted or GitLab SSH URL
(git@host:org/repo), the previous code only normalised git@github.com: URLs.
All other SSH hosts fell through to a generic split-on-slash, leaving the
full SSH host (e.g. git@gitlab.example.com:org) as the owner component.

This caused two downstream failures:

1. The codebase name stored in the DB contained a colon
   (e.g. git@gitlab.example.com:org/repo), which made parseOwnerRepo()
   return null due to its SAFE_NAME regex. resolveProjectPaths() then fell
   back to writing artifacts inside the worktree directory instead of the
   canonical ~/.archon/workspaces/ tree, so the server could never serve
   them and the Web UI always showed "Artifact file not found".

2. The colon in the worktree path acts as a classpath separator on Unix,
   breaking Java/Maven/Gradle test compilation for any project run inside
   that worktree.

Fix: match any git@host:owner/repo SSH URL with a single regex and extract
just the owner/repo portion, discarding the host. HTTPS and other URL forms
continue to use the existing path-split fallback unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 17, 2026

📝 Walkthrough

Walkthrough

The registerRepository function's remote URL parsing logic is enhanced to support SSH URLs from non-GitHub git hosts (e.g., git@gitlab.example.com:org/repo) using regex pattern matching. Previously, it only handled GitHub SSH URLs by converting them to HTTPS. A corresponding unit test validates the new behavior.

Changes

Cohort / File(s) Summary
Clone Handler Enhancement
packages/core/src/handlers/clone.ts, packages/core/src/handlers/clone.test.ts
Added regex-based parsing for generic SSH URLs (git@<host>:<owner>/<repo>) to extract owner and repo names. Fallback behavior for slash-delimited URLs remains unchanged. New test case validates normalization of non-GitHub SSH URLs without host or special characters.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A Rabbit's Ode to Remotes

From GitHub halls to GitLab's gates,
We parse the SSH URLs' fates,
With regex patterns, clean and neat,
Our clone handler's now complete! 🌿✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically identifies the main fix: parsing non-GitHub SSH remote URLs correctly in registerRepository.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The pull request description is exceptionally comprehensive, covering all required template sections with detailed diagrams, clear problem context, validated solutions, and migration steps.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/core/src/handlers/clone.ts (1)

331-345: Fix looks correct — optional note on GitLab subgroup URLs.

The regex and fallback cleanly address the reported bug: name and ownerName no longer carry the host/colon for non-GitHub SSH remotes, which is exactly what fixes the parseOwnerRepo-returns-null and Java-classpath issues called out in the PR description.

One edge case worth considering (optional): for GitLab-style subgroup SSH remotes such as git@gitlab.example.com:group/subgroup/project, the (.+) capture greedily matches subgroup/project, producing name = "group/subgroup/project". Downstream effects:

  • parseOwnerRepo(name) (at line 349) returns null for a 3-segment name, so projRepo falls back to basename(localPath) while the DB name retains all three segments. The two representations diverge.
  • findCodebaseByName dedup in registerRepoAtPath keys off this exact multi-slash string, so the same repo registered via an HTTPS URL (which goes through the split-on-slash path and yields a 2-segment owner/repo) won't match.

If you want symmetry with the HTTPS fallback (which only keeps the last two path segments), restricting the repo capture to a single segment would do it:

♻️ Optional: mirror HTTPS behaviour for subgroup URLs
-    const sshMatch = /^git@[^:]+:([^/]+)\/(.+)$/.exec(cleaned);
+    // Match any git@host:owner/repo SSH URL. For subgroup-style paths
+    // (e.g. gitlab `group/subgroup/project`), keep only the final segment
+    // as repo — mirrors the HTTPS split-on-slash fallback below.
+    const sshMatch = /^git@[^:]+:(?:.+\/)?([^/]+)\/([^/]+)$/.exec(cleaned);
     if (sshMatch) {
       name = `${sshMatch[1]}/${sshMatch[2]}`;
       ownerName = sshMatch[1];
     }

Happy to defer if you'd prefer to keep the full path in name for subgroup awareness — just wanted to flag the dedup asymmetry.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/handlers/clone.ts` around lines 331 - 345, The SSH regex
and fallback currently allow multi-segment repo paths (e.g.,
git@gitlab:group/subgroup/project) which produces name with three segments and
diverges from the HTTPS fallback (which keeps only last two segments); update
the SSH handling so name/ownerName mirror the HTTPS behavior by restricting the
second capture to a single segment (or otherwise reduce the captured path to the
last two path segments) — modify the regex used in the sshMatch (currently
/^git@[^:]+:([^/]+)\/(.+)$/) to only allow two segments (e.g., change the second
capture to [^/]+) or post-process sshMatch[2] to keep only its basename, and
ensure parseOwnerRepo, findCodebaseByName, registerRepoAtPath behavior will then
see consistent ownerName and name values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/core/src/handlers/clone.ts`:
- Around line 331-345: The SSH regex and fallback currently allow multi-segment
repo paths (e.g., git@gitlab:group/subgroup/project) which produces name with
three segments and diverges from the HTTPS fallback (which keeps only last two
segments); update the SSH handling so name/ownerName mirror the HTTPS behavior
by restricting the second capture to a single segment (or otherwise reduce the
captured path to the last two path segments) — modify the regex used in the
sshMatch (currently /^git@[^:]+:([^/]+)\/(.+)$/) to only allow two segments
(e.g., change the second capture to [^/]+) or post-process sshMatch[2] to keep
only its basename, and ensure parseOwnerRepo, findCodebaseByName,
registerRepoAtPath behavior will then see consistent ownerName and name values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 142b7522-5216-49dc-ab9c-e5e27a7f8acd

📥 Commits

Reviewing files that changed from the base of the PR and between bed36ca and 4e4982e.

📒 Files selected for processing (2)
  • packages/core/src/handlers/clone.test.ts
  • packages/core/src/handlers/clone.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant