Gitcolombo

OSINT tool that extracts identities — names, emails, and links between seemingly unrelated accounts — from git repositories and GitHub.

  • Python CLI (gitcolombo.py) — clones repos, walks git log, and can call the GitHub API for richer signals.
  • Web version (gitcolombo.html) — a single static HTML file; open it in a browser and query the GitHub API directly, no install.

For the full breakdown of where each email/name comes from (PGP keys, public events, commit search, commit-message trailers, etc.) see docs.md.
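As an illustration of the commit-message trailer source mentioned above, here is a minimal sketch of pulling identities out of a commit message. It is not gitcolombo's own parser: the function name `identities_from_message` and the set of trailer keys are assumptions made for the example.

```python
import re

# Trailer keys that commonly carry an identity (an assumed, non-exhaustive set).
IDENTITY_TRAILERS = {"co-authored-by", "signed-off-by", "reviewed-by"}

# Matches lines such as "Co-authored-by: Jane Doe <jane@example.com>".
TRAILER_RE = re.compile(
    r"^(?P<key>[A-Za-z-]+):\s*(?P<name>.*?)\s*<(?P<email>[^<>\s]+)>\s*$"
)


def identities_from_message(message: str) -> list[tuple[str, str, str]]:
    """Return (trailer key, name, email) for each identity trailer found."""
    found = []
    for line in message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if match and match["key"].lower() in IDENTITY_TRAILERS:
            found.append((match["key"], match["name"], match["email"]))
    return found


message = """Fix pagination bug

Co-authored-by: Jane Doe <jane@example.com>
Signed-off-by: J. Doe <jdoe@corp.example>
"""
print(identities_from_message(message))
```

Trailers are useful because they can name people who appear as neither author nor committer of the commit.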

Web version

Hosted at https://gitcolombo.soxoj.com, or open gitcolombo.html locally. The page queries the GitHub API straight from your browser; no install, no backend.

Install

Requires Python 3.10+ and a working git binary. No third-party Python dependencies.

pip install gitcolombo

Or from source:

git clone https://github.com/Soxoj/gitcolombo
cd gitcolombo
pip install -e .

Usage

# from any git URL
gitcolombo -u https://github.com/Soxoj/maigret

# from a local directory, recursively
gitcolombo -d ./maigret -r

# clone and scan every public repo of a GitHub user/org
gitcolombo --nickname octocat

# API-only: find emails for a GitHub username without cloning
gitcolombo --search Soxoj

# change where remote repos get cloned (default: ./repos)
gitcolombo -u https://github.com/Soxoj/maigret --repos-dir ./clones

python -m gitcolombo works equivalently if you'd rather not put the script on $PATH.

Remote repositories are cloned into ./repos/ by default; override with --repos-dir. For batch cloning from GitLab and Bitbucket groups, use ghorg.

Output

  • Per-person details: name, email, author/committer counts, and other identities that may belong to the same person.
  • Emails that share a name.
  • Different names tied to the same email.
  • General statistics across the scanned repos.
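The two cross-reference views in the list above (emails that share a name, names tied to the same email) can be sketched as a pair of indexes. This is an illustrative sketch, not gitcolombo's internal data model; the function name `correlate` and the sample identities are hypothetical.

```python
from collections import defaultdict


def correlate(identities):
    """Index (name, email) pairs both ways to surface shared names and emails."""
    emails_by_name = defaultdict(set)
    names_by_email = defaultdict(set)
    for name, email in identities:
        email = email.lower()  # compare emails case-insensitively
        emails_by_name[name].add(email)
        names_by_email[email].add(name)
    # Keep only entries that link more than one identity.
    shared_names = {n: sorted(e) for n, e in emails_by_name.items() if len(e) > 1}
    shared_emails = {e: sorted(n) for e, n in names_by_email.items() if len(n) > 1}
    return shared_names, shared_emails


# Hypothetical identities, as they might be collected from author/committer fields.
identities = [
    ("Jane Doe", "jane@example.com"),
    ("Jane Doe", "JDoe@corp.example"),
    ("jd", "jane@example.com"),
    ("Bob", "bob@example.com"),
]
shared_names, shared_emails = correlate(identities)
print(shared_names)
print(shared_emails)
```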

Why it works

Developers often commit with one identity (e.g. work account), then switch to another (e.g. personal account) and run git commit --amend, forgetting that this rewrites the committer but leaves the original author in place. The two roles drift apart, and that mismatch is exactly what gitcolombo correlates.

Short explainer on author vs. committer: https://stackoverflow.com/questions/18750808/difference-between-author-and-committer-in-git
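To make the author/committer mismatch concrete, the sketch below lists commits whose author and committer differ. It relies on git's standard `%an`, `%ae`, `%cn`, and `%ce` log placeholders; the helper names `parse_mismatches` and `scan_repo` are hypothetical, and this is not necessarily how gitcolombo itself walks the log.

```python
import subprocess

# One commit per line: hash, author name/email, committer name/email,
# separated by the ASCII unit separator (0x1f), which is unlikely in names.
LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%cn%x1f%ce"


def parse_mismatches(log_output):
    """Return (sha, author, committer) for commits where the two roles differ."""
    mismatches = []
    for line in log_output.split("\n"):
        fields = line.split("\x1f")
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        sha, a_name, a_email, c_name, c_email = fields
        author = (a_name, a_email.lower())
        committer = (c_name, c_email.lower())
        if author != committer:
            mismatches.append((sha, author, committer))
    return mismatches


def scan_repo(path):
    """Run git log in the given repository and report mismatched commits."""
    result = subprocess.run(
        ["git", "-C", path, "log", f"--format={LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return parse_mismatches(result.stdout)
```

Calling `scan_repo("./repos/maigret")` on a cloned repository would return the mismatched commits. Commits created through the GitHub web interface carry a `GitHub <noreply@github.com>` committer, so such pairs are usually noise rather than a second identity.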

Testing

Stdlib-only test suite — no third-party dependencies. From the repo root (after pip install -e .):

python3 -m unittest test_gitcolombo -v

The end-to-end test creates a real git repository in a temp directory, so a working git binary is required (the test is skipped if git is missing).
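The pattern described above (a throwaway repository in a temp directory, skipped when git is absent) might look like the following. This is a sketch of the pattern only, not the project's actual test; the class name, identities, and assertion are made up for illustration.

```python
import os
import shutil
import subprocess
import tempfile
import unittest


@unittest.skipUnless(shutil.which("git"), "git binary not found")
class TempRepoTest(unittest.TestCase):
    def test_author_and_committer_are_recorded_separately(self):
        # Distinct author and committer identities, set via git's env variables.
        env = dict(
            os.environ,
            GIT_AUTHOR_NAME="Alice", GIT_AUTHOR_EMAIL="alice@example.com",
            GIT_COMMITTER_NAME="Bob", GIT_COMMITTER_EMAIL="bob@example.com",
        )
        with tempfile.TemporaryDirectory() as repo:
            def git(*args):
                # Disable commit signing so a user's global config cannot interfere.
                cmd = ["git", "-C", repo, "-c", "commit.gpgsign=false", *args]
                return subprocess.run(
                    cmd, env=env, capture_output=True, text=True, check=True
                ).stdout

            git("init", "-q")
            git("commit", "-q", "--allow-empty", "-m", "initial")
            self.assertEqual(
                git("log", "--format=%ae %ce").strip(),
                "alice@example.com bob@example.com",
            )
```

Saved as a module, it can be run with `python3 -m unittest <module name> -v`; without git on the machine the test is reported as skipped rather than failed.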

Tests run on every push and pull request via GitHub Actions (.github/workflows/tests.yml) across Python 3.10–3.13.

Roadmap

  • Total statistics for repos in a directory
  • GitHub support: clone all repos from account/group
  • GitHub support: extract links to accounts from commit info
  • GitHub support: API pagination
  • Exclude "system" accounts (e.g. noreply@github.com, @users.noreply.github.com)
  • Reverse mapping email → names (currently only name → emails)
  • Probabilistic graph links based on shared names/emails and Levenshtein distance
  • Other popular git platforms: GitLab, Bitbucket
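For the "system" accounts item in the list above, one possible filter is sketched below. The helper name `is_system_email` is hypothetical, and the rule covers only the two address forms named in that item.

```python
def is_system_email(email: str) -> bool:
    """Return True for GitHub placeholder addresses that identify no real mailbox."""
    email = email.strip().lower()
    return email == "noreply@github.com" or email.endswith("@users.noreply.github.com")


emails = [
    "noreply@github.com",
    "12345+octocat@users.noreply.github.com",
    "jane@example.com",
]
print([e for e in emails if not is_system_email(e)])
```

Note that a `users.noreply.github.com` address still embeds the account's username, so a tool may prefer to flag such addresses rather than discard them outright.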