Design: Managing forked WASM dependencies #850
Draft
pgrayy wants to merge 3 commits into strands-agents:main from pgrayy:designs/fork-management
+395 −0
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,174 @@ | ||
| # Managing Forked Dependencies | ||
|
|
||
| ## Overview | ||
|
|
||
| The WASM integration in sdk-typescript depends on modified versions of several upstream repositories: | ||
|
|
||
| - **wasmtime** (Rust) — the WASM runtime | ||
| - **wasmtime-py** (Python) — Python bindings for wasmtime | ||
| - **componentize-js** (JavaScript) — JS component model tooling | ||
| - **jco** (JavaScript/Rust) — JS toolchain for WebAssembly Components | ||
|
|
||
| This document proposes an approach for housing these forks within the sdk-typescript repo to streamline development and CI/CD. | ||
|
|
||
| ## Problem | ||
|
|
||
| Upstream WASM tooling does not yet support all the features we need for the Python-via-WASM bindings. Certain capabilities (e.g., async streaming) require workarounds that live in forked versions of these dependencies. Today, those forks exist as separate repositories, which creates friction: | ||
|
|
||
| - Changes that span sdk-typescript and a fork require coordinating across multiple repos, branches, and PRs. | ||
| - CI/CD pipelines must clone and build from multiple sources, adding complexity and fragility to the build. | ||
| - Contributors need to discover and set up the correct fork versions manually. | ||
| - There is no single place to see the full picture of what's modified and why. | ||
|
|
||
| Bringing the forks into sdk-typescript simplifies the development loop: one clone, one set of commits, one CI pipeline. | ||
|
|
||
| ## Solution | ||
|
|
||
| Use **git subtrees** to import each fork's source directly into sdk-typescript. | ||
|
|
||
| Subtrees copy a remote repo's file tree into a subdirectory of the host repo. After import, the files are regular tracked content. No special tooling is required beyond the `git subtree` command, which is bundled with most git distributions (it is maintained in git's `contrib/` directory). | ||
|
|
||
| Benefits of this approach: | ||
|
|
||
| - **Single clone** — contributors get everything with a standard `git clone`. No extra steps. | ||
| - **Atomic commits** — a change to sdk-typescript and a fork workaround land in one commit/PR. | ||
| - **Upstream sync** — `git subtree pull` merges new upstream changes; `git subtree push` extracts patches back when ready to upstream. | ||
| - **Simpler CI** — one checkout, one pipeline. No submodule initialization, no cross-repo coordination. | ||
|
|
||
| The resulting layout: | ||
|
|
||
| ``` | ||
| sdk-typescript/ | ||
| ├── strands-ts/ # existing SDK package | ||
| ├── strands-wasm/ # existing WASM build tooling | ||
| ├── strands-py-wasm/ # existing Python bindings | ||
| ├── forks/ | ||
| │ ├── wasmtime/ # subtree | ||
| │ ├── wasmtime-py/ # subtree | ||
| │ ├── componentize-js/ # subtree | ||
| │ └── jco/ # subtree | ||
| ├── package.json # workspace config | ||
| ``` | ||
|
|
||
| ## Metrics | ||
|
|
||
| sdk-typescript today clones at about 4 MB over the network (8.6 MB total on disk with full history, 408 files). After `npm ci`, disk usage grows to ~591 MB, almost entirely from `node_modules/` (~548 MB). The source code itself is small. | ||
|
|
||
| Each fork was measured via shallow clone (`--depth=1`) against the current main branch: | ||
|
|
||
| | Fork | Network transfer | Disk | Files | Notes | | ||
| |------|-----------------|------|-------|-------| | ||
| | wasmtime | 26 MB | 86 MB | 6,700 | Largest: `crates/` 41 MB, `cranelift/` 21 MB, `tests/` 17 MB | | ||
| | wasmtime-py | 304 KB | 1 MB | 91 | Negligible | | ||
| | componentize-js | 352 KB | 1.5 MB | 265 | Negligible | | ||
| | jco | 80 MB | 413 MB | 1,060 | 401 MB is `.wasm` test fixtures; source is ~12 MB | | ||
|
|
||
| wasmtime and jco carry significant weight in directories we don't need (test fixtures, docs, benchmarks). Maintaining a **slim branch** in each fork strips these before import: | ||
|
|
||
| Fork | Disk after stripping | Files | Size reduction | | ||
| |------|----------------|-------|-----------| | ||
| | wasmtime (drop tests, docs, benches, examples) | ~60 MB | ~4,000 | ~30% | | ||
| | jco (drop test fixtures) | ~12 MB | ~650 | ~97% | | ||
| | wasmtime-py | No change needed | 91 | Already tiny | | ||
| | componentize-js | No change needed | 265 | Already tiny | | ||
|
|
||
| A slim branch is regenerated from main whenever upstream is updated: | ||
|
|
||
| ```bash | ||
| # In the fork repo: | ||
| git checkout main | ||
| git pull upstream main | ||
| git checkout -B slim | ||
| git rm -rf tests/ docs/ benches/ examples/ | ||
| git commit -m "chore: strip non-essential dirs for slim branch" | ||
| git push origin slim --force | ||
| ``` | ||
|
|
||
| The slim branch is then pulled into sdk-typescript with `--squash`, which collapses the upstream history into a single squashed commit that is merged into sdk-typescript: | ||
|
|
||
| ```bash | ||
| git subtree add --prefix=forks/wasmtime <fork-url> slim --squash # initial | ||
| git subtree pull --prefix=forks/wasmtime <fork-url> slim --squash # updates | ||
| ``` | ||
|
|
||
| With all four forks imported (slim where applicable, squashed), the projected clone size rises from ~4 MB to roughly 30-35 MB. This is still modest compared to the ~548 MB that `npm ci` adds. | ||
|
|
||
| ## Usage | ||
|
|
||
| ### Making Changes to Fork Code | ||
|
|
||
| Edit files directly and commit normally: | ||
|
|
||
| ```bash | ||
| vim forks/wasmtime/crates/something/src/lib.rs | ||
| git add forks/wasmtime/ | ||
| git commit -m "fix: workaround for async streaming in wasmtime" | ||
| ``` | ||
|
|
||
| ### Pulling Upstream Changes | ||
|
|
||
| Update the slim branch in the fork repo first (scripted), then pull into sdk-typescript: | ||
|
|
||
| ```bash | ||
| git subtree pull --prefix=forks/wasmtime <fork-url> slim --squash | ||
| ``` | ||
|
|
||
| Conflicts with local patches are resolved the same way as in a normal merge. | ||
|
|
||
| ### Pushing Changes Back Upstream | ||
|
|
||
| When ready to contribute fixes back: | ||
|
|
||
| ```bash | ||
| git subtree push --prefix=forks/wasmtime <fork-url> my-upstream-pr-branch | ||
| ``` | ||
|
|
||
| This extracts only the commits that touched `forks/wasmtime/`, rewrites their paths (strips the prefix), and pushes them to the fork repo. From there, open a PR on the upstream project. | ||
|
|
||
| ## Alternatives | ||
|
|
||
| ### Submodules | ||
|
|
||
| Each fork lives in its own GitHub repo. sdk-typescript references them at pinned commits via `.gitmodules`. Each submodule is a pointer (SHA) to a specific commit in the fork repo. Contributors must use `git clone --recurse-submodules` or run `git submodule update --init` after cloning. | ||
|
|
||
| **Pros:** | ||
|
|
||
| - Each fork retains full independence (own history, branches, tags, CI). | ||
| - Pinned to exact commits for reproducible builds. | ||
| - No increase to sdk-typescript's pack size. | ||
| - Natural for upstreaming (fork repos are first-class). | ||
|
|
||
| **Cons:** | ||
|
|
||
| - Known developer experience pain: forgetting `--recurse-submodules` is common. | ||
| - PRs that span sdk-typescript + a fork require coordinating across repos. | ||
| - Detached HEAD state inside submodules is confusing. | ||
| - Nested submodules (wasmtime has its own) compound complexity. | ||
| - CI must be configured to initialize submodules. | ||
|
|
||
| ### Git LFS | ||
|
|
||
| Replace large binary files (e.g., `.wasm` test fixtures) with small pointer files (~130 bytes each). Actual content lives on a separate LFS server and is fetched lazily on checkout. Supported on our GitHub Enterprise plan (5 GB storage, 5 GB/month bandwidth included). | ||
|
|
||
| **Pros:** | ||
|
|
||
| - Dramatically reduces clone size for repos with large binaries. | ||
| - Transparent to most workflows once set up. | ||
|
|
||
| **Cons:** | ||
|
|
||
| - Only helps with binary files; doesn't reduce file count from source directories (tests, docs, etc.). | ||
| - Adds infrastructure dependency (LFS server must be available). | ||
| - Some tooling (mirrors, forks, CI runners) doesn't handle LFS transparently. | ||
| - Switching files in/out of LFS after the fact is messy. | ||
| - Doesn't address the core problem of cross-repo coordination. | ||
|
|
||
| ## References | ||
|
|
||
| - [Pro Git: Advanced Merging (Subtree Merging)](https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging#_subtree_merge) | ||
| - [Git Submodule Documentation](https://git-scm.com/book/en/v2/Git-Tools-Submodules) | ||
| - [Git LFS](https://git-lfs.com/) | ||
| - [bytecodealliance/wasmtime](https://github.com/bytecodealliance/wasmtime) | ||
| - [bytecodealliance/wasmtime-py](https://github.com/bytecodealliance/wasmtime-py) | ||
| - [bytecodealliance/ComponentizeJS](https://github.com/bytecodealliance/ComponentizeJS) | ||
| - [bytecodealliance/jco](https://github.com/bytecodealliance/jco) | ||
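The reduction percentages in the Metrics tables of the design above can be re-derived from the measured disk sizes. This is a minimal sketch that assumes the rounded figures from those tables (86 MB to ~60 MB for wasmtime, 413 MB to ~12 MB for jco); the `reduction` helper is hypothetical and not part of the proposal.

```bash
# Hypothetical helper: percentage reduction from a "before" size to an "after" size (both in MB).
reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f%%\n", (before - after) / before * 100 }'
}

# Sizes copied from the Metrics tables (disk usage of a shallow clone).
wasmtime_reduction=$(reduction 86 60)
jco_reduction=$(reduction 413 12)

echo "wasmtime: $wasmtime_reduction"   # prints "wasmtime: 30%"
echo "jco: $jco_reduction"             # prints "jco: 97%"
```

Both results agree with the ~30% and ~97% figures quoted in the second table.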
The biggest con to me is that this source code ends up in the history forever; i.e., there's no way to strip it later without rewriting history. A submodule, by contrast, is [effectively] a pointer, so the overhead is trivial once we remove it.
That seems like a feature to me though, no? At one point we vended these dependencies, and getting old builds working means simply checking out the old history. Hermetic builds.
That is a good callout. It would permanently alter the size of sdk-typescript. Based on the metrics, it would end up being roughly 30 MB (calculated from the WASM repo .git sizes) of dead weight once removed.
To mitigate, we could try slimming down wasmtime and jco even more than what was proposed.
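The "dead weight" effect can be reproduced on a small scale. The sketch below is an illustration in a throwaway repository, not a measurement of sdk-typescript: it commits a 200 KB incompressible file, deletes it in a second commit, and then reads the pack size with `git count-objects -v`. The `g` wrapper and the file name `blob.bin` are assumptions made for the demo.

```bash
set -eu
repo=$(mktemp -d)
cd "$repo"

# Wrapper that supplies an identity so the demo does not depend on global git config.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q .

# Stand-in for vendored fork content: ~200 KB of random (incompressible) bytes.
head -c 200000 /dev/urandom > blob.bin
g add blob.bin
g commit -q -m "add vendored content"

# Remove the content again, as would happen once the forks are no longer needed.
g rm -q blob.bin
g commit -q -m "remove vendored content"

# Repack, then read the pack size (reported in KiB by count-objects).
g gc -q
pack_kib=$(g count-objects -v | awk '/^size-pack:/ {print $2}')
echo "pack size after removal: ${pack_kib} KiB"
```

The working tree is empty after the second commit, but the blob is still reachable through history, so the pack stays at roughly the size of the removed file.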
Yeah, what Patrick said; it's a permanent size bump for what is/should be a temporary solution.
So no subtree? Commit your fork into the repo without history, and we can override from your fork if we need to downstream anything?
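One possible reading of "commit your fork into the repo without history" is a plain snapshot export. The sketch below assumes that interpretation and uses throwaway local repos; the `fork` and `host` names, the `g` wrapper, and the choice to export only `src/` are illustrative, not something the thread specifies.

```bash
set -eu
work=$(mktemp -d)
cd "$work"

# Wrapper that supplies an identity so the demo does not depend on global git config.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Throwaway stand-in for a fork repo, with source plus a test fixture.
g init -q fork
(
  cd fork
  mkdir -p src tests
  echo 'pub fn run() {}' > src/lib.rs
  echo 'fixture' > tests/big.wasm
  g add .
  g commit -q -m "fork state"
)

# Throwaway stand-in for the host repo.
g init -q host
cd host
mkdir -p forks/wasmtime

# Export only src/ from the fork's HEAD as a tarball and unpack it; no fork history comes along.
g -C ../fork archive --format=tar HEAD src | tar -xf - -C forks/wasmtime

g add forks/wasmtime
g commit -q -m "chore: vendor wasmtime snapshot (no history)"
commit_count=$(g rev-list --count HEAD)
echo "host commits: $commit_count"
```

The host ends up with a single commit containing the snapshot. Unlike a subtree import, nothing records which fork commit the files came from, so that would need to be tracked separately (for example in the commit message).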
In my experience submodules are just incredibly clunky and make it difficult to use git tools to split history or something. Memory is fuzzy here. My understanding is that they'd poison our repo and be hard to remove but don't take my word for it.
Yeah effectively
If you're doing active development and changing things often, this is true - it's why I don't suggest it as an alternative to a monorepository. But for infrequent changes or "pinning" at a specific commit, it works pretty well.
AFAIK there's no poisoning, as it's more like a file/pointer than anything. Deleting them takes a couple of config changes, whereas with subtrees you have the entire history as part of your repo forever.
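Removing a submodule later might look roughly like the sketch below. It is self-contained and uses throwaway local repos; the `fork` and `host` names, the `g` wrapper, and the `forks/wasmtime` path are illustrative assumptions, and `protocol.file.allow=always` is only needed because the demo clones from a local path.

```bash
set -eu
work=$(mktemp -d)
cd "$work"

# Wrapper: demo identity, plus permission to clone a submodule from a local path.
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c protocol.file.allow=always "$@"
}

# Throwaway stand-in for a fork repo.
g init -q fork
(
  cd fork
  echo 'pub fn run() {}' > lib.rs
  g add lib.rs
  g commit -q -m "fork state"
)

# Host repo that pins the fork as a submodule.
g init -q host
cd host
echo '{}' > package.json
g add package.json
g commit -q -m "init host"
g submodule --quiet add "$work/fork" forks/wasmtime
g commit -q -m "add wasmtime submodule"

# Removal: unregister the submodule, drop the gitlink and its .gitmodules entry,
# then delete the cached clone under .git/modules.
g submodule --quiet deinit -f forks/wasmtime
g rm -q -f forks/wasmtime
rm -rf .git/modules/forks/wasmtime
g commit -q -m "remove wasmtime submodule"
```

After the last commit, the host's history contains only the pinned SHA and the `.gitmodules` entry from the earlier commits, not the fork's file contents.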
Ignore me, conflating LFS and submodules for some reason. Need a nap.