Design: Managing forked WASM dependencies #850
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,174 @@ | ||
| # Managing Forked Dependencies | ||
|
|
||
| ## Overview | ||
|
|
||
| The WASM integration in sdk-typescript depends on modified versions of several upstream repositories: | ||
|
|
||
| - **wasmtime** (Rust) — the WASM runtime | ||
| - **wasmtime-py** (Python) — Python bindings for wasmtime | ||
| - **componentize-js** (JavaScript) — JS component model tooling | ||
| - **jco** (JavaScript/Rust) — JS toolchain for WebAssembly Components | ||
|
|
||
| This document proposes an approach for housing these forks within the sdk-typescript repo to streamline development and CI/CD. | ||
|
|
||
| ## Problem | ||
|
|
||
| Upstream WASM tooling does not yet support all the features we need for the Python-via-WASM bindings. Certain capabilities (e.g., async streaming) require workarounds that live in forked versions of these dependencies. Today, those forks exist as separate repositories, which creates friction: | ||
|
|
||
| - Changes that span sdk-typescript and a fork require coordinating across multiple repos, branches, and PRs. | ||
| - CI/CD pipelines must clone and build from multiple sources, adding complexity and fragility to the build. | ||
| - Contributors need to discover and set up the correct fork versions manually. | ||
| - There is no single place to see the full picture of what's modified and why. | ||
|
|
||
| Bringing the forks into sdk-typescript simplifies the development loop: one clone, one set of commits, one CI pipeline. | ||
|
|
||
| ## Solution | ||
|
|
||
| Use **git subtrees** to import each fork's source directly into sdk-typescript. | ||
|
|
||
| Subtrees copy a remote repo's file tree into a subdirectory of the host repo. After import, the files are regular tracked content. No special tooling is required beyond the `git subtree` command, which ships with most git distributions (it is maintained in git's `contrib/` directory). | ||
|
|
||
| Benefits of this approach: | ||
|
|
||
| - **Single clone** — contributors get everything with a standard `git clone`. No extra steps. | ||
| - **Atomic commits** — a change to sdk-typescript and a fork workaround land in one commit/PR. | ||
| - **Upstream sync** — `git subtree pull` merges new upstream changes; `git subtree push` extracts patches back when ready to upstream. | ||
| - **Simpler CI** — one checkout, one pipeline. No submodule initialization, no cross-repo coordination. | ||
|
|
||
| The resulting layout: | ||
|
|
||
| ``` | ||
| sdk-typescript/ | ||
| ├── strands-ts/ # existing SDK package | ||
| ├── strands-wasm/ # existing WASM build tooling | ||
| ├── strands-py-wasm/ # existing Python bindings | ||
| ├── forks/ | ||
| │ ├── wasmtime/ # subtree | ||
| │ ├── wasmtime-py/ # subtree | ||
| │ ├── componentize-js/ # subtree | ||
| │ └── jco/ # subtree | ||
| └── package.json # workspace config | ||
| ``` | ||
|
|
||
| ## Metrics | ||
|
|
||
| sdk-typescript today clones at about 4 MB over the network (8.6 MB total on disk with full history, 408 files). After `npm ci`, disk usage grows to ~591 MB, almost entirely from `node_modules/` (~548 MB). The source code itself is small. | ||
|
|
||
| Each fork was measured via shallow clone (`--depth=1`) against the current main branch: | ||
|
|
||
| | Fork | Network transfer | Disk | Files | Notes | | ||
| |------|-----------------|------|-------|-------| | ||
| | wasmtime | 26 MB | 86 MB | 6,700 | Largest: `crates/` 41 MB, `cranelift/` 21 MB, `tests/` 17 MB | | ||
| | wasmtime-py | 304 KB | 1 MB | 91 | Negligible | | ||
| | componentize-js | 352 KB | 1.5 MB | 265 | Negligible | | ||
| | jco | 80 MB | 413 MB | 1,060 | 401 MB is `.wasm` test fixtures; source is ~12 MB | | ||
|
|
||
| wasmtime and jco carry significant weight in directories we don't need (test fixtures, docs, benchmarks). Maintaining a **slim branch** in each fork strips these before import: | ||
|
|
||
| | Fork | After stripping | Files | Reduction | | ||
| |------|----------------|-------|-----------| | ||
| | wasmtime (drop tests, docs, benches, examples) | ~60 MB | ~4,000 | ~30% | | ||
| | jco (drop test fixtures) | ~12 MB | ~650 | ~97% | | ||
| | wasmtime-py | No change needed | 91 | Already tiny | | ||
| | componentize-js | No change needed | 265 | Already tiny | | ||
|
|
||
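The reduction column can be re-derived from the disk sizes quoted in the two tables. The sketch below is illustrative only: `reduction_pct` is a hypothetical helper name, and its inputs are the approximate before/after sizes in MB taken from this document.

```bash
# reduction_pct <before_mb> <after_mb>
# Prints the percentage saved, rounded to a whole number.
reduction_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (before - after) / before * 100 }'
}

reduction_pct 86 60    # wasmtime: 86 MB -> ~60 MB
reduction_pct 413 12   # jco: 413 MB -> ~12 MB
```

The two calls print `30` and `97`, which matches the ~30% and ~97% figures in the table.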
| A slim branch is regenerated from main whenever upstream is updated: | ||
|
|
||
| ```bash | ||
| # In the fork repo: | ||
| git checkout main | ||
| git pull upstream main | ||
| git checkout -B slim | ||
| git rm -rf tests/ docs/ benches/ examples/ | ||
| git commit -m "chore: strip non-essential dirs for slim branch" | ||
| git push origin slim --force | ||
| ``` | ||
|
|
||
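The directories to strip differ per fork, and `git rm` fails when a listed path does not exist. A reusable variant of the local steps might look like the sketch below. `regenerate_slim` is a hypothetical helper, not part of any existing tooling; it assumes it is run inside the fork repo and leaves `git pull upstream main` and the force-push to the caller.

```bash
# regenerate_slim <base-branch> <path>...
# Recreates the local "slim" branch from <base-branch> and removes the given paths.
regenerate_slim() {
  local base="$1"
  shift
  git checkout --quiet "$base" &&
    git checkout --quiet -B slim &&
    # --ignore-unmatch: do not fail when a listed path is absent in this fork.
    git rm -r --quiet --ignore-unmatch -- "$@" || return 1
  # Commit only if something was actually removed.
  if ! git diff --cached --quiet; then
    git commit --quiet -m "chore: strip non-essential dirs for slim branch"
  fi
}

# Example (run inside the fork repo, after pulling upstream):
#   regenerate_slim main tests docs benches examples
#   git push origin slim --force
```

Passing the path list as arguments keeps one script usable for both wasmtime and jco, whose strippable directories differ.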
| The slim branch is then pulled into sdk-typescript with `--squash`, which collapses the imported history into a single squash commit that is merged into the host repo: | ||
|
|
||
| ```bash | ||
| git subtree add --prefix=forks/wasmtime <fork-url> slim --squash # initial | ||
| git subtree pull --prefix=forks/wasmtime <fork-url> slim --squash # updates | ||
| ``` | ||
|
|
||
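Since the first import uses `add` and every later sync uses `pull`, a small wrapper could pick the right subcommand. This is only a sketch: `subtree_action` and `subtree_sync` are hypothetical names, and the check assumes the prefix directory exists only once the subtree has been imported.

```bash
# subtree_action <prefix>
# Prints "add" for a first import, "pull" once the prefix directory exists.
subtree_action() {
  if [ -d "$1" ]; then
    echo pull
  else
    echo add
  fi
}

# subtree_sync <prefix> <fork-url> [branch]
# Runs the matching squashed subtree command; branch defaults to "slim".
subtree_sync() {
  local prefix="$1" url="$2" branch="${3:-slim}"
  git subtree "$(subtree_action "$prefix")" --prefix="$prefix" --squash "$url" "$branch"
}

# Example (from the sdk-typescript root):
#   subtree_sync forks/wasmtime <fork-url>
```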
| With all four forks imported (slim where applicable, squashed), the projected clone size rises from ~4 MB to roughly 30-35 MB. This is still modest compared to the ~548 MB that `npm ci` adds. | ||
|
|
||
| ## Usage | ||
|
|
||
| ### Making Changes to Fork Code | ||
|
|
||
| Edit files directly and commit normally: | ||
|
|
||
| ```bash | ||
| vim forks/wasmtime/crates/something/src/lib.rs | ||
| git add forks/wasmtime/ | ||
| git commit -m "fix: workaround for async streaming in wasmtime" | ||
| ``` | ||
|
|
||
| ### Pulling Upstream Changes | ||
|
|
||
| Update the slim branch in the fork repo first (scripted), then pull into sdk-typescript: | ||
|
|
||
| ```bash | ||
| git subtree pull --prefix=forks/wasmtime <fork-url> slim --squash | ||
| ``` | ||
|
|
||
| Conflicts with local patches are resolved as a normal merge. | ||
|
|
||
| ### Pushing Changes Back Upstream | ||
|
|
||
| When ready to contribute fixes back: | ||
|
|
||
| ```bash | ||
| git subtree push --prefix=forks/wasmtime <fork-url> my-upstream-pr-branch | ||
| ``` | ||
|
|
||
| This extracts only the commits that touched `forks/wasmtime/`, rewrites their paths (strips the prefix), and pushes them to the fork repo. From there, open a PR on the upstream project. | ||
|
|
||
| ## Alternatives | ||
|
|
||
| ### Submodules | ||
|
|
||
| Each fork lives in its own GitHub repo. sdk-typescript references them at pinned commits via `.gitmodules`. Each submodule is a pointer (SHA) to a specific commit in the fork repo. Contributors must use `git clone --recurse-submodules` or run `git submodule update --init` after cloning. | ||
|
|
||
| **Pros:** | ||
|
|
||
| - Each fork retains full independence (own history, branches, tags, CI). | ||
| - Pinned to exact commits for reproducible builds. | ||
| - No increase to sdk-typescript's pack size. | ||
| - Natural for upstreaming (fork repos are first-class). | ||
|
|
||
| **Cons:** | ||
|
|
||
| - Known developer experience pain: forgetting `--recurse-submodules` is common. | ||
| - PRs that span sdk-typescript + a fork require coordinating across repos. | ||
| - Detached HEAD state inside submodules is confusing. | ||
| - Nested submodules (wasmtime has its own) compound complexity. | ||
| - CI must be configured to initialize submodules. | ||
|
|
||
| ### Git LFS | ||
|
|
||
| Replace large binary files (e.g., `.wasm` test fixtures) with small pointer files (~130 bytes each). Actual content lives on a separate LFS server and is fetched lazily on checkout. Supported on our GitHub Enterprise plan (5 GB storage, 5 GB/month bandwidth included). | ||
|
|
||
| **Pros:** | ||
|
|
||
| - Dramatically reduces clone size for repos with large binaries. | ||
| - Transparent to most workflows once set up. | ||
|
|
||
| **Cons:** | ||
|
|
||
| - Only helps with binary files; doesn't reduce file count from source directories (tests, docs, etc.). | ||
| - Adds infrastructure dependency (LFS server must be available). | ||
| - Some tooling (mirrors, forks, CI runners) doesn't handle LFS transparently. | ||
| - Switching files in/out of LFS after the fact is messy. | ||
| - Doesn't address the core problem of cross-repo coordination. | ||
|
|
||
| ## References | ||
|
|
||
| - [Git Subtree Documentation](https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging#_subtree_merge) | ||
| - [Git Submodule Documentation](https://git-scm.com/book/en/v2/Git-Tools-Submodules) | ||
| - [Git LFS](https://git-lfs.com/) | ||
| - [bytecodealliance/wasmtime](https://github.com/bytecodealliance/wasmtime) | ||
| - [bytecodealliance/wasmtime-py](https://github.com/bytecodealliance/wasmtime-py) | ||
| - [bytecodealliance/ComponentizeJS](https://github.com/bytecodealliance/ComponentizeJS) | ||
| - [bytecodealliance/jco](https://github.com/bytecodealliance/jco) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,221 @@ | ||
| # WASM Dependency Distribution | ||
|
|
||
| ## Overview | ||
|
|
||
| The WASM integration in sdk-typescript depends on forked versions of: | ||
|
|
||
| - **wasmtime** (Rust) — the WASM runtime, built as part of wasmtime-py wheels | ||
| - **wasmtime-py** (Python) — Python bindings that load and execute the WASM component | ||
| - **componentize-js** (JavaScript) — compiles bundled JS into a WASM component | ||
| - **jco** (JavaScript/Rust) — generates TypeScript types from WIT and transpiles WASM for testing | ||
|
|
||
| These forks contain workarounds for features not yet available upstream (e.g., async streaming). This document proposes how to distribute these forks and how developers interact with them locally. | ||
|
|
||
| ## Problem | ||
|
|
||
| The forks need to be consumable by both sdk-typescript's build process and end users of strands-py-wasm. Today, this is handled ad hoc (e.g., `@chaynabors/componentize-js` published manually to npm). There's no unified approach for: | ||
|
|
||
| - Publishing fork artifacts reliably. | ||
| - Testing fork changes against sdk-typescript before shipping. | ||
| - Giving developers a fast local iteration loop when modifying forks. | ||
| - Keeping sdk-typescript's repo and CI simple. | ||
|
|
||
| ## Solution | ||
|
|
||
| Publish fork artifacts to **npm** under the `@strands-agents` scope and to **PyPI** under a matching `strands-agents-` name prefix (PyPI has no scopes). Use a dedicated fork repo with CI that tests against sdk-typescript before publishing. Use **git submodules** in sdk-typescript as an optional local development convenience. | ||
|
|
||
| ### Distribution | ||
|
Member
The biggest pro is that this streamlines development for those who don't need changes to these packages. The biggest con is that it is a much higher bar for those that are making the changes to the packages. How confident are we that changes will be infrequent? |
||
|
|
||
| Fork artifacts are published to public package registries: | ||
|
|
||
| | Fork | Registry | Package name | Notes | | ||
| |------|----------|--------------|-------| | ||
| | componentize-js | npm | `@strands-agents/componentize-js` | JS package | | ||
| | jco | npm | `@strands-agents/jco` | JS package | | ||
| | wasmtime + wasmtime-py | PyPI | `strands-agents-wasmtime` | wasmtime Rust source compiled into wasmtime-py wheels | | ||
|
|
||
| sdk-typescript consumes them as normal dependencies: | ||
|
|
||
| ```jsonc | ||
| // strands-wasm/package.json | ||
| { | ||
| "devDependencies": { | ||
| "@strands-agents/componentize-js": "^0.19.3", | ||
| "@strands-agents/jco": "^1.16.1" | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ```toml | ||
| # strands-py-wasm/pyproject.toml | ||
| dependencies = [ | ||
| "strands-agents-wasmtime>=37.0.0,<38.0.0", | ||
| ] | ||
| ``` | ||
|
|
||
| ### Fork Repo Structure | ||
|
|
||
| A single repo contains all fork source and CI. Each upstream dependency is imported via **git subtree**, enabling periodic sync with upstream while keeping everything in one place: | ||
|
|
||
| ``` | ||
| strands-agents/wasm-deps/ | ||
| ├── wasmtime/ # subtree from bytecodealliance/wasmtime | ||
| ├── wasmtime-py/ # subtree from bytecodealliance/wasmtime-py | ||
| ├── componentize-js/ # subtree from bytecodealliance/ComponentizeJS | ||
| ├── jco/ # subtree from bytecodealliance/jco | ||
| └── .github/workflows/ | ||
| ├── test.yml # test against sdk-typescript on PR | ||
| └── publish.yml # build and publish on release tag | ||
| ``` | ||
|
|
||
| Day-to-day, developers just edit files and commit normally. The subtree commands are only used for syncing with upstream: | ||
|
|
||
| ```bash | ||
| # Pull latest upstream changes into a specific dependency: | ||
| git subtree pull --prefix=wasmtime https://github.com/bytecodealliance/wasmtime.git main --squash | ||
|
|
||
| # Push a fix back upstream (needs push access to the target repo; otherwise push to a fork you control and open a PR from there): | ||
| git subtree push --prefix=wasmtime https://github.com/bytecodealliance/wasmtime.git fix/async-streaming | ||
| ``` | ||
|
|
||
| ### CI Pipeline (Fork Repo) | ||
|
|
||
| On push/PR, the fork repo's CI validates changes against sdk-typescript: | ||
|
|
||
| ```yaml | ||
| # test.yml | ||
| - run: npm run build # build fork artifacts | ||
| - run: git clone https://github.com/strands-agents/sdk-typescript.git /tmp/sdk-ts | ||
| - run: cd /tmp/sdk-ts && npm ci | ||
| - run: cd /tmp/sdk-ts && npm link ${{ github.workspace }}/componentize-js | ||
| - run: cd /tmp/sdk-ts && npm test # validate compatibility | ||
| ``` | ||
|
|
||
| On release tag, publish: | ||
|
|
||
| ```yaml | ||
| # publish.yml | ||
| - run: npm publish --access public | ||
| env: | ||
| NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }} | ||
| ``` | ||
|
|
||
| sdk-typescript's own CI has no awareness of forks. It runs `npm ci`, which pulls published packages from the registry. | ||
|
|
||
| ### Local Development (Submodules) | ||
|
|
||
| Submodules give developers a consistent local checkout of the fork source for fast iteration: | ||
|
|
||
| ``` | ||
| sdk-typescript/ | ||
| ├── strands-ts/ | ||
| ├── strands-wasm/ | ||
| ├── strands-py-wasm/ | ||
| └── forks/ # submodules (optional) | ||
|     ├── wasmtime/ # → github.com/strands-agents/wasmtime | ||
|     ├── wasmtime-py/ # → github.com/strands-agents/wasmtime-py | ||
|     ├── componentize-js/ # → github.com/strands-agents/componentize-js | ||
|     └── jco/ # → github.com/strands-agents/jco | ||
| ``` | ||
|
|
||
| Submodules are optional. Contributors who don't work on the WASM layer never initialize them: | ||
|
|
||
| ```bash | ||
| # Normal contributor: | ||
| git clone git@github.com:strands-agents/sdk-typescript.git | ||
| npm ci # pulls published packages from registry | ||
| npm test # works fine | ||
|
|
||
| # WASM developer: | ||
| git clone --recurse-submodules git@github.com:strands-agents/sdk-typescript.git | ||
| npm ci | ||
| npm run dev:link-forks # overrides registry packages with local submodule source | ||
| ``` | ||
|
|
||
| The `dev:link-forks` script in the root `package.json`: | ||
|
|
||
| ```json | ||
| { | ||
| "scripts": { | ||
| "dev:link-forks": "npm link ./forks/componentize-js ./forks/jco" | ||
| } | ||
| } | ||
|
Member
Author
Could continue to add to these commands over time to offer more conveniences if need be. But I feel something like this is a reasonable start.
Member
It balances the development flow (with friction) with the consumer flow quite well IMHO |
||
| ``` | ||
|
|
||
| This creates symlinks in `node_modules` pointing at the submodule directories instead of the published registry versions. | ||
|
|
||
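Because the submodules are optional, `dev:link-forks` can end up running against empty `forks/` directories. A guard along the following lines could fail early with a hint. `forks_ready` is a hypothetical helper, and checking for `package.json` assumes each JS fork keeps one at its root.

```bash
# forks_ready <repo-root>
# Succeeds only if every JS fork submodule looks checked out.
forks_ready() {
  local root="$1" fork
  for fork in componentize-js jco; do
    # An uninitialized submodule is an empty directory, so look for a real file.
    if [ ! -f "$root/forks/$fork/package.json" ]; then
      echo "forks/$fork is not checked out (run: git submodule update --init)" >&2
      return 1
    fi
  done
}

# Example (from the sdk-typescript root):
#   forks_ready . && npm run dev:link-forks
```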
| ## Usage | ||
|
|
||
| ### Normal Development (No Fork Changes) | ||
|
|
||
| No special steps. `npm ci` and `pip install` pull from registries as usual. | ||
|
|
||
| ### Modifying Fork Code Locally | ||
|
|
||
| ```bash | ||
| # Initialize submodules (one time) | ||
| git submodule update --init | ||
|
|
||
| # Get on a feature branch in the fork | ||
| cd forks/componentize-js | ||
| git checkout -b fix/async-streaming | ||
|
|
||
| # Make changes and rebuild | ||
| vim src/something.ts | ||
| npm run build | ||
|
|
||
| # Link into sdk-typescript | ||
| cd ../.. | ||
| npm run dev:link-forks | ||
|
|
||
| # Test | ||
| cd strands-wasm && node build.js | ||
| ``` | ||
|
|
||
| For Python (wasmtime-py): | ||
|
|
||
| ```bash | ||
| cd forks/wasmtime-py | ||
| pip install -e . | ||
|
|
||
| # Now strands-py-wasm uses the local wasmtime-py | ||
| # Changes to Python files are reflected immediately | ||
| ``` | ||
|
|
||
| ### Publishing Fork Changes | ||
|
|
||
| ```bash | ||
| # Push the feature branch in the fork submodule | ||
| cd forks/componentize-js | ||
| git push origin fix/async-streaming | ||
|
|
||
| # Open a PR in the wasm-deps repo | ||
| # Fork repo CI runs: | ||
| # 1. Builds package | ||
| # 2. Tests against sdk-typescript | ||
| # 3. Publishes to npm on merge/tag | ||
|
|
||
| # After merge, update submodule pointer in sdk-typescript | ||
| cd ../.. | ||
| git add forks/componentize-js | ||
| git commit -m "chore: bump componentize-js submodule" | ||
|
|
||
| # Bump version in package.json (or let Dependabot handle it) | ||
| ``` | ||
|
|
||
| ### Unlinking (Back to Published Packages) | ||
|
|
||
| ```bash | ||
| npm ci # reinstalls everything from registry, links are gone | ||
| ``` | ||
|
|
||
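To confirm which mode a checkout is in, one option is to test whether the installed package is a symlink, since `npm link` places a symlink in `node_modules`. `install_source` is a hypothetical helper, and the example path assumes the packages resolve under the root `node_modules/@strands-agents/`.

```bash
# install_source <package-dir>
# Prints "linked" for a symlink, "installed" for a regular directory,
# and "missing" if the path does not exist.
install_source() {
  local pkg_dir="$1"
  if [ -L "$pkg_dir" ]; then
    echo linked
  elif [ -d "$pkg_dir" ]; then
    echo installed
  else
    echo missing
  fi
}

# Example (from the sdk-typescript root):
#   install_source node_modules/@strands-agents/componentize-js
```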
| ## References | ||
|
|
||
| - [npm link documentation](https://docs.npmjs.com/cli/v10/commands/npm-link) | ||
| - [pip editable installs](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs) | ||
| - [Git Submodule Documentation](https://git-scm.com/book/en/v2/Git-Tools-Submodules) | ||
| - [GitHub Packages](https://docs.github.com/en/packages) | ||
| - [npm publishing](https://docs.npmjs.com/cli/v10/commands/npm-publish) | ||
| - [bytecodealliance/wasmtime-py](https://github.com/bytecodealliance/wasmtime-py) | ||
| - [bytecodealliance/ComponentizeJS](https://github.com/bytecodealliance/ComponentizeJS) | ||
| - [bytecodealliance/jco](https://github.com/bytecodealliance/jco) | ||
The biggest con to me is that this source code ends up in the history forever; e.g. there's no way to strip it later without rewriting history. Whereas a submodule being a pointer [effectively] means the overhead is trivial once we remove them
That seems like a feature to me though no? At one point we vended these dependencies and getting old builds working means simply checking out the old history. Hermetic builds.
That is a good call out. It would permanently alter the size of sdk-typescript. Based on the metrics, it would end up being roughly 30MB (calculated from the WASM repo .git sizes) of dead weight once removed.
To mitigate, we could try slimming down wasmtime and jco even more than what was proposed.
Yeah, what Patrick said; it's a permanent size bump for what is/should be a temporary solution
So no subtree? Commit your fork into the repo without history, and we can override from your fork if we need to downstream anything?
In my experience submodules are just incredibly clunky and make it difficult to use git tools to split history or something. Memory is fuzzy here. My understanding is that they'd poison our repo and be hard to remove but don't take my word for it.
Yeah effectively
If you're doing active development and changing things often, this is true - it's why I don't suggest it as an alternative to a monorepository. But for infrequent changes or "pinning" at a specific commit, it works pretty well.
AFAIK there's no poisoning as it's more like a file/pointer more than anything. Deleting them is a couple of configs, whereas subtrees you have the entire history as part of your repo forever.
Ignore me, conflating LFS and submodules for some reason. Need a nap.