diff --git a/designs/0011-fork-management.md b/designs/0011-fork-management.md
new file mode 100644
index 000000000..80c3f776c
--- /dev/null
+++ b/designs/0011-fork-management.md
@@ -0,0 +1,174 @@
+# Managing Forked Dependencies
+
+## Overview
+
+The WASM integration in sdk-typescript depends on modified versions of several upstream repositories:
+
+- **wasmtime** (Rust): the WASM runtime
+- **wasmtime-py** (Python): Python bindings for wasmtime
+- **componentize-js** (JavaScript): JS component model tooling
+- **jco** (JavaScript/Rust): JS toolchain for WebAssembly Components
+
+This document proposes an approach for housing these forks within the sdk-typescript repo to streamline development and CI/CD.
+
+## Problem
+
+Upstream WASM tooling does not yet support all the features we need for the Python-via-WASM bindings. Certain capabilities (e.g., async streaming) require workarounds that live in forked versions of these dependencies. Today, those forks exist as separate repositories, which creates friction:
+
+- Changes that span sdk-typescript and a fork require coordinating across multiple repos, branches, and PRs.
+- CI/CD pipelines must clone and build from multiple sources, adding complexity and fragility to the build.
+- Contributors need to discover and set up the correct fork versions manually.
+- There is no single place to see the full picture of what's modified and why.
+
+Bringing the forks into sdk-typescript simplifies the development loop: one clone, one set of commits, one CI pipeline.
+
+## Solution
+
+Use **git subtrees** to import each fork's source directly into sdk-typescript.
+
+Subtrees copy a remote repo's file tree into a subdirectory of the host repo. After import, the files are regular tracked content. No special tooling is required beyond the `git subtree` command, which ships with most git installations (it is maintained in git's `contrib/` directory, and some distributions package it separately).
+
+Benefits of this approach:
+
+- **Single clone**: contributors get everything with a standard `git clone`. No extra steps.
+- **Atomic commits**: a change to sdk-typescript and a fork workaround land in one commit/PR.
+- **Upstream sync**: `git subtree pull` merges new upstream changes; `git subtree push` extracts patches back when ready to upstream.
+- **Simpler CI**: one checkout, one pipeline. No submodule initialization, no cross-repo coordination.
+
+The resulting layout:
+
+```
+sdk-typescript/
+├── strands-ts/              # existing SDK package
+├── strands-wasm/            # existing WASM build tooling
+├── strands-py-wasm/         # existing Python bindings
+├── forks/
+│   ├── wasmtime/            # subtree
+│   ├── wasmtime-py/         # subtree
+│   ├── componentize-js/     # subtree
+│   └── jco/                 # subtree
+└── package.json             # workspace config
+```
+
+## Metrics
+
+sdk-typescript today clones at about 4 MB over the network (8.6 MB total on disk with full history, 408 files). After `npm ci`, disk usage grows to ~591 MB, almost entirely from `node_modules/` (~548 MB). The source code itself is small.
+
+Each fork was measured via shallow clone (`--depth=1`) against the current main branch:
+
+| Fork | Network transfer | Disk | Files | Notes |
+|------|-----------------|------|-------|-------|
+| wasmtime | 26 MB | 86 MB | 6,700 | Largest: `crates/` 41 MB, `cranelift/` 21 MB, `tests/` 17 MB |
+| wasmtime-py | 304 KB | 1 MB | 91 | Negligible |
+| componentize-js | 352 KB | 1.5 MB | 265 | Negligible |
+| jco | 80 MB | 413 MB | 1,060 | 401 MB is `.wasm` test fixtures; source is ~12 MB |
+
+wasmtime and jco carry significant weight in directories we don't need (test fixtures, docs, benchmarks).
+Maintaining a **slim branch** in each fork strips these before import:
+
+| Fork | After stripping | Files | Reduction |
+|------|----------------|-------|-----------|
+| wasmtime (drop tests, docs, benches, examples) | ~60 MB | ~4,000 | ~30% |
+| jco (drop test fixtures) | ~12 MB | ~650 | ~97% |
+| wasmtime-py | No change needed | 91 | Already tiny |
+| componentize-js | No change needed | 265 | Already tiny |
+
+A slim branch is regenerated from main whenever upstream is updated:
+
+```bash
+# In the fork repo:
+git checkout main
+git pull upstream main
+git checkout -B slim
+git rm -rf tests/ docs/ benches/ examples/
+git commit -m "chore: strip non-essential dirs for slim branch"
+git push origin slim --force
+```
+
+The slim branch is then pulled into sdk-typescript with `--squash`, which collapses upstream history into a single squashed commit. `git subtree pull` requires the repository as well as the ref, so `<fork-url>` below is a placeholder for the fork repo's URL or a configured remote name:
+
+```bash
+git subtree add --prefix=forks/wasmtime <fork-url> slim --squash    # initial
+git subtree pull --prefix=forks/wasmtime <fork-url> slim --squash   # updates
+```
+
+With all four forks imported (slim where applicable, squashed), the projected clone size rises from ~4 MB to roughly 30-35 MB. This is still modest compared to the ~548 MB that `npm ci` adds.
+
+## Usage
+
+### Making Changes to Fork Code
+
+Edit files directly and commit normally:
+
+```bash
+vim forks/wasmtime/crates/something/src/lib.rs
+git add forks/wasmtime/
+git commit -m "fix: workaround for async streaming in wasmtime"
+```
+
+### Pulling Upstream Changes
+
+Update the slim branch in the fork repo first (scripted), then pull into sdk-typescript (`<fork-url>` is a placeholder for the fork repo's URL or remote name):
+
+```bash
+git subtree pull --prefix=forks/wasmtime <fork-url> slim --squash
+```
+
+Conflicts with local patches are resolved as a normal merge.
+
+### Pushing Changes Back Upstream
+
+When ready to contribute fixes back (`git subtree push` also requires the target repository; `<fork-url>` is a placeholder):
+
+```bash
+git subtree push --prefix=forks/wasmtime <fork-url> my-upstream-pr-branch
+```
+
+This extracts only the commits that touched `forks/wasmtime/`, rewrites their paths (strips the prefix), and pushes them to the fork repo.
+From there, open a PR on the upstream project.
+
+## Alternatives
+
+### Submodules
+
+Each fork lives in its own GitHub repo. sdk-typescript registers them in `.gitmodules` and pins each one to a specific commit. Each submodule is a pointer (SHA) to a specific commit in the fork repo. Contributors must use `git clone --recurse-submodules` or run `git submodule update --init` after cloning.
+
+**Pros:**
+
+- Each fork retains full independence (own history, branches, tags, CI).
+- Pinned to exact commits for reproducible builds.
+- No increase to sdk-typescript's pack size.
+- Natural for upstreaming (fork repos are first-class).
+
+**Cons:**
+
+- Known developer experience pain: forgetting `--recurse-submodules` is common.
+- PRs that span sdk-typescript and a fork require coordinating across repos.
+- Detached HEAD state inside submodules is confusing.
+- Nested submodules (wasmtime has its own) compound complexity.
+- CI must be configured to initialize submodules.
+
+### Git LFS
+
+Replace large binary files (e.g., `.wasm` test fixtures) with small pointer files (~130 bytes each). Actual content lives on a separate LFS server and is fetched lazily on checkout. Supported on our GitHub Enterprise plan (5 GB storage, 5 GB/month bandwidth included).
+
+**Pros:**
+
+- Dramatically reduces clone size for repos with large binaries.
+- Transparent to most workflows once set up.
+
+**Cons:**
+
+- Only helps with binary files; doesn't reduce file count from source directories (tests, docs, etc.).
+- Adds infrastructure dependency (LFS server must be available).
+- Some tooling (mirrors, forks, CI runners) doesn't handle LFS transparently.
+- Switching files in/out of LFS after the fact is messy.
+- Doesn't address the core problem of cross-repo coordination.
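
Returning to the subtree workflow proposed in the Solution and Usage sections: the per-fork `git subtree pull` could be wrapped in a small helper so that every fork is synced the same way. The following is a minimal sketch under stated assumptions, not part of the proposal. The function names `sync_fork` and `sync_all_forks`, the `FORK_BASE_URL` and `DRY_RUN` variables, and the `https://github.com/strands-agents/<name>.git` fork URLs are all hypothetical. It defaults to a dry run that only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only. FORK_BASE_URL and the fork repo names are illustrative assumptions.
FORK_BASE_URL="${FORK_BASE_URL:-https://github.com/strands-agents}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = execute it

# sync_fork <name> [branch]
# Builds the subtree pull command for forks/<name>, passing the repository
# and the ref explicitly, then prints or runs it depending on DRY_RUN.
sync_fork() {
  name="$1"
  branch="${2:-slim}"
  set -- git subtree pull "--prefix=forks/${name}" \
    "${FORK_BASE_URL}/${name}.git" "${branch}" --squash
  if [ "${DRY_RUN}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Slim branches exist only for the two large forks; the small forks are
# assumed here to be pulled from main unchanged.
sync_all_forks() {
  sync_fork wasmtime slim
  sync_fork jco slim
  sync_fork wasmtime-py main
  sync_fork componentize-js main
}
```

Calling `sync_all_forks` with the default `DRY_RUN=1` prints four commands for review; setting `DRY_RUN=0` from the root of sdk-typescript would execute them. Keeping the repository URL and ref explicit in one place avoids relying on a locally configured remote.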
+
+## References
+
+- [Git Subtree Documentation](https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging#_subtree_merge)
+- [Git Submodule Documentation](https://git-scm.com/book/en/v2/Git-Tools-Submodules)
+- [Git LFS](https://git-lfs.com/)
+- [bytecodealliance/wasmtime](https://github.com/bytecodealliance/wasmtime)
+- [bytecodealliance/wasmtime-py](https://github.com/bytecodealliance/wasmtime-py)
+- [bytecodealliance/ComponentizeJS](https://github.com/bytecodealliance/ComponentizeJS)
+- [bytecodealliance/jco](https://github.com/bytecodealliance/jco)
diff --git a/designs/0012-wasm-dependency-distribution.md b/designs/0012-wasm-dependency-distribution.md
new file mode 100644
index 000000000..b799ddd53
--- /dev/null
+++ b/designs/0012-wasm-dependency-distribution.md
@@ -0,0 +1,221 @@
+# WASM Dependency Distribution
+
+## Overview
+
+The WASM integration in sdk-typescript depends on forked versions of:
+
+- **wasmtime** (Rust): the WASM runtime, built as part of wasmtime-py wheels
+- **wasmtime-py** (Python): Python bindings that load and execute the WASM component
+- **componentize-js** (JavaScript): compiles bundled JS into a WASM component
+- **jco** (JavaScript/Rust): generates TypeScript types from WIT and transpiles WASM for testing
+
+These forks contain workarounds for features not yet available upstream (e.g., async streaming). This document proposes how to distribute these forks and how developers interact with them locally.
+
+## Problem
+
+The forks need to be consumable by both sdk-typescript's build process and end users of strands-py-wasm. Today, this is handled ad hoc (e.g., `@chaynabors/componentize-js` published manually to npm). There's no unified approach for:
+
+- Publishing fork artifacts reliably.
+- Testing fork changes against sdk-typescript before shipping.
+- Giving developers a fast local iteration loop when modifying forks.
+- Keeping sdk-typescript's repo and CI simple.
+
+## Solution
+
+Publish fork artifacts to **npm** (under the `@strands-agents` scope) and **PyPI** (under a `strands-agents-` name prefix, since PyPI has no scopes). Use a dedicated fork repo with CI that tests against sdk-typescript before publishing. Use **git submodules** in sdk-typescript as an optional local development convenience.
+
+### Distribution
+
+Fork artifacts are published to public package registries:
+
+| Fork | Registry | Package name | Notes |
+|------|----------|--------------|-------|
+| componentize-js | npm | `@strands-agents/componentize-js` | JS package |
+| jco | npm | `@strands-agents/jco` | JS package |
+| wasmtime + wasmtime-py | PyPI | `strands-agents-wasmtime` | wasmtime Rust source compiled into wasmtime-py wheels |
+
+sdk-typescript consumes them as normal dependencies:
+
+```json
+// strands-wasm/package.json
+{
+  "devDependencies": {
+    "@strands-agents/componentize-js": "^0.19.3",
+    "@strands-agents/jco": "^1.16.1"
+  }
+}
+```
+
+```toml
+# strands-py-wasm/pyproject.toml
+dependencies = [
+    "strands-agents-wasmtime>=37.0.0,<38.0.0",
+]
+```
+
+### Fork Repo Structure
+
+A single repo contains all fork source and CI. Each upstream dependency is imported via **git subtree**, enabling periodic sync with upstream while keeping everything in one place:
+
+```
+strands-agents/wasm-deps/
+├── wasmtime/                # subtree from bytecodealliance/wasmtime
+├── wasmtime-py/             # subtree from bytecodealliance/wasmtime-py
+├── componentize-js/         # subtree from bytecodealliance/ComponentizeJS
+├── jco/                     # subtree from bytecodealliance/jco
+└── .github/workflows/
+    ├── test.yml             # test against sdk-typescript on PR
+    └── publish.yml          # build and publish on release tag
+```
+
+Day-to-day, developers just edit files and commit normally.
+The subtree commands are only used for syncing with upstream:
+
+```bash
+# Pull latest upstream changes into a specific dependency:
+git subtree pull --prefix=wasmtime https://github.com/bytecodealliance/wasmtime.git main --squash
+
+# Push a fix back upstream (when ready to contribute):
+git subtree push --prefix=wasmtime https://github.com/bytecodealliance/wasmtime.git fix/async-streaming
+```
+
+### CI Pipeline (Fork Repo)
+
+On push/PR, the fork repo's CI validates changes against sdk-typescript:
+
+```yaml
+# test.yml
+- run: npm run build # build fork artifacts
+- run: git clone https://github.com/strands-agents/sdk-typescript.git /tmp/sdk-ts
+- run: cd /tmp/sdk-ts && npm ci
+- run: cd /tmp/sdk-ts && npm link ${{ github.workspace }}/componentize-js
+- run: cd /tmp/sdk-ts && npm test # validate compatibility
+```
+
+On release tag, publish:
+
+```yaml
+# publish.yml
+- run: npm publish --access public
+  env:
+    NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
+```
+
+sdk-typescript's own CI has no awareness of forks. It runs `npm ci`, which pulls published packages from the registry.
+
+### Local Development (Submodules)
+
+Submodules give developers a consistent local checkout of the fork source for fast iteration:
+
+```
+sdk-typescript/
+├── strands-ts/
+├── strands-wasm/
+├── strands-py-wasm/
+└── forks/                   # submodules (optional)
+    ├── wasmtime/            # → github.com/strands-agents/wasmtime
+    ├── wasmtime-py/         # → github.com/strands-agents/wasmtime-py
+    ├── componentize-js/     # → github.com/strands-agents/componentize-js
+    └── jco/                 # → github.com/strands-agents/jco
+```
+
+Submodules are optional.
+Contributors who don't work on the WASM layer never initialize them:
+
+```bash
+# Normal contributor:
+git clone git@github.com:strands-agents/sdk-typescript.git
+npm ci # pulls published packages from registry
+npm test # works fine
+
+# WASM developer:
+git clone --recurse-submodules git@github.com:strands-agents/sdk-typescript.git
+npm ci
+npm run dev:link-forks # overrides registry packages with local submodule source
+```
+
+The `dev:link-forks` script in the root `package.json`:
+
+```json
+{
+  "scripts": {
+    "dev:link-forks": "npm link ./forks/componentize-js ./forks/jco"
+  }
+}
+```
+
+This creates symlinks in `node_modules` pointing at the submodule directories instead of the published registry versions.
+
+## Usage
+
+### Normal Development (No Fork Changes)
+
+No special steps. `npm ci` and `pip install` pull from registries as usual.
+
+### Modifying Fork Code Locally
+
+```bash
+# Initialize submodules (one time)
+git submodule update --init
+
+# Get on a feature branch in the fork
+cd forks/componentize-js
+git checkout -b fix/async-streaming
+
+# Make changes and rebuild
+vim src/something.ts
+npm run build
+
+# Link into sdk-typescript
+cd ../..
+npm run dev:link-forks
+
+# Test
+cd strands-wasm && node build.js
+```
+
+For Python (wasmtime-py):
+
+```bash
+cd forks/wasmtime-py
+pip install -e .
+
+# Now strands-py-wasm uses the local wasmtime-py
+# Changes to Python files are reflected immediately
+```
+
+### Publishing Fork Changes
+
+```bash
+# Push the feature branch in the fork submodule
+cd forks/componentize-js
+git push origin fix/async-streaming
+
+# Open a PR in the wasm-deps repo
+# Fork repo CI runs:
+# 1. Builds package
+# 2. Tests against sdk-typescript
+# 3. Publishes to npm on merge/tag
+
+# After merge, update submodule pointer in sdk-typescript
+cd ../..
+git add forks/componentize-js
+git commit -m "chore: bump componentize-js submodule"
+
+# Bump version in package.json (or let Dependabot handle it)
+```
+
+### Unlinking (Back to Published Packages)
+
+```bash
+npm ci # reinstalls everything from registry, links are gone
+```
+
+## References
+
+- [npm link documentation](https://docs.npmjs.com/cli/v10/commands/npm-link)
+- [pip editable installs](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs)
+- [Git Submodule Documentation](https://git-scm.com/book/en/v2/Git-Tools-Submodules)
+- [GitHub Packages](https://docs.github.com/en/packages)
+- [npm publishing](https://docs.npmjs.com/cli/v10/commands/npm-publish)
+- [bytecodealliance/wasmtime-py](https://github.com/bytecodealliance/wasmtime-py)
+- [bytecodealliance/ComponentizeJS](https://github.com/bytecodealliance/ComponentizeJS)
+- [bytecodealliance/jco](https://github.com/bytecodealliance/jco)