174 changes: 174 additions & 0 deletions designs/0011-fork-management.md
@@ -0,0 +1,174 @@
# Managing Forked Dependencies

## Overview

The WASM integration in sdk-typescript depends on modified versions of several upstream repositories:

- **wasmtime** (Rust) — the WASM runtime
- **wasmtime-py** (Python) — Python bindings for wasmtime
- **componentize-js** (JavaScript) — JS component model tooling
- **jco** (JavaScript/Rust) — JS toolchain for WebAssembly Components

This document proposes an approach for housing these forks within the sdk-typescript repo to streamline development and CI/CD.

## Problem

Upstream WASM tooling does not yet support all the features we need for the Python-via-WASM bindings. Certain capabilities (e.g., async streaming) require workarounds that live in forked versions of these dependencies. Today, those forks exist as separate repositories, which creates friction:

- Changes that span sdk-typescript and a fork require coordinating across multiple repos, branches, and PRs.
- CI/CD pipelines must clone and build from multiple sources, adding complexity and fragility to the build.
- Contributors need to discover and set up the correct fork versions manually.
- There is no single place to see the full picture of what's modified and why.

Bringing the forks into sdk-typescript simplifies the development loop: one clone, one set of commits, one CI pipeline.

## Solution

Use **git subtrees** to import each fork's source directly into sdk-typescript.
Member

The biggest con to me is that this source code ends up in the history forever; e.g. there's no way to strip it later without rewriting history. Whereas a submodule being a pointer [effectively] means the overhead is trivial once we remove them

Member

That seems like a feature to me though no? At one point we vended these dependencies and getting old builds working means simply checking out the old history. Hermetic builds.

Member Author (@pgrayy, May 20, 2026)

That is a good call out. It would permanently alter the size of sdk-typescript. Based on the metrics, it would end up being roughly 30MB (calculated from the WASM repo .git sizes) of dead weight once removed.

To mitigate, we could try slimming down wasmtime and jco even more than what was proposed.

Member

Yeah, what Patrick said; it's a permanent size bump for what is/should be a temporary solution

Member

So no subtree? Commit your fork into the repo without history, and we can override from your fork if we need to downstream anything?

Member

In my experience submodules are just incredibly clunky and make it difficult to use git tools to split history or something. Memory is fuzzy here. My understanding is that they'd poison our repo and be hard to remove but don't take my word for it.

Member

> Commit your fork into the repo without history, and we can override from your fork if we need to downstream anything?

Yeah effectively

> In my experience submodules are just incredibly clunky

If you're doing active development and changing things often, this is true - it's why I don't suggest it as an alternative to a monorepository. But for infrequent changes or "pinning" at a specific commit, it works pretty well.

AFAIK there's no poisoning as it's more like a file/pointer more than anything. Deleting them is a couple of configs, whereas subtrees you have the entire history as part of your repo forever.

Member (@chaynabors, May 20, 2026)

Ignore me, conflating LFS and submodules for some reason. Need a nap.


Subtrees copy a remote repo's file tree into a subdirectory of the host repo. After import, the files are regular tracked content. No special tooling is required beyond the `git subtree` command, which ships with git.

Benefits of this approach:

- **Single clone** — contributors get everything with a standard `git clone`. No extra steps.
- **Atomic commits** — a change to sdk-typescript and a fork workaround land in one commit/PR.
- **Upstream sync** — `git subtree pull` merges new upstream changes; `git subtree push` extracts patches back when ready to upstream.
- **Simpler CI** — one checkout, one pipeline. No submodule initialization, no cross-repo coordination.

The resulting layout:

```
sdk-typescript/
├── strands-ts/ # existing SDK package
├── strands-wasm/ # existing WASM build tooling
├── strands-py-wasm/ # existing Python bindings
├── forks/
│ ├── wasmtime/ # subtree
│ ├── wasmtime-py/ # subtree
│ ├── componentize-js/ # subtree
│ └── jco/ # subtree
└── package.json # workspace config
```

## Metrics

sdk-typescript today clones at about 4 MB over the network (8.6 MB total on disk with full history, 408 files). After `npm ci`, disk usage grows to ~591 MB, almost entirely from `node_modules/` (~548 MB). The source code itself is small.

Each fork was measured via shallow clone (`--depth=1`) against the current main branch:

| Fork | Network transfer | Disk | Files | Notes |
|------|-----------------|------|-------|-------|
| wasmtime | 26 MB | 86 MB | 6,700 | Largest: `crates/` 41 MB, `cranelift/` 21 MB, `tests/` 17 MB |
| wasmtime-py | 304 KB | 1 MB | 91 | Negligible |
| componentize-js | 352 KB | 1.5 MB | 265 | Negligible |
| jco | 80 MB | 413 MB | 1,060 | 401 MB is `.wasm` test fixtures; source is ~12 MB |

wasmtime and jco carry significant weight in directories we don't need (test fixtures, docs, benchmarks). Maintaining a **slim branch** in each fork strips these before import:

| Fork | After stripping | Files | Reduction |
|------|----------------|-------|-----------|
| wasmtime (drop tests, docs, benches, examples) | ~60 MB | ~4,000 | ~30% |
| jco (drop test fixtures) | ~12 MB | ~650 | ~97% |
| wasmtime-py | No change needed | 91 | Already tiny |
| componentize-js | No change needed | 265 | Already tiny |
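
The reduction column can be sanity-checked against the disk figures in the two tables. The sketch below is a hypothetical helper, not part of the repo. It recomputes the percentages from the measured and stripped sizes quoted in this document, and the dictionary names are assumptions made for illustration.

```python
# Disk sizes in MB, copied from the measurement and slim-branch tables.
MEASURED_MB = {"wasmtime": 86, "jco": 413}
STRIPPED_MB = {"wasmtime": 60, "jco": 12}


def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100


for fork, before in MEASURED_MB.items():
    after = STRIPPED_MB[fork]
    pct = reduction_percent(before, after)
    print(f"{fork}: {before} MB -> {after} MB ({pct:.0f}% smaller)")
```

Because the stripped sizes are themselves approximate, the result is only useful as a rough consistency check.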

A slim branch is regenerated from main whenever upstream is updated:

```bash
# In the fork repo:
git checkout main
git pull upstream main
git checkout -B slim
git rm -rf --ignore-unmatch tests/ docs/ benches/ examples/  # skip dirs a fork lacks
git commit -m "chore: strip non-essential dirs for slim branch"
git push origin slim --force
```

Then pulled into sdk-typescript with `--squash`, so each import lands as one squashed commit (joined by a merge commit) instead of the fork's full history:

```bash
git subtree add --prefix=forks/wasmtime <fork-url> slim --squash # initial
git subtree pull --prefix=forks/wasmtime <fork-url> slim --squash # updates
```

With all four forks imported (slim where applicable, squashed), the projected clone size rises from ~4 MB to roughly 30-35 MB. This is still modest compared to the ~548 MB that `npm ci` adds.

## Usage

### Making Changes to Fork Code

Edit files directly and commit normally:

```bash
vim forks/wasmtime/crates/something/src/lib.rs
git add forks/wasmtime/
git commit -m "fix: workaround for async streaming in wasmtime"
```

### Pulling Upstream Changes

Update the slim branch in the fork repo first (scripted), then pull into sdk-typescript:

```bash
git subtree pull --prefix=forks/wasmtime <fork-url> slim --squash
```

Conflicts with local patches are resolved as a normal merge.

### Pushing Changes Back Upstream

When ready to contribute fixes back:

```bash
git subtree push --prefix=forks/wasmtime <fork-url> my-upstream-pr-branch
```

This extracts only the commits that touched `forks/wasmtime/`, rewrites their paths (strips the prefix), and pushes them to the fork repo. From there, open a PR on the upstream project.
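
The filtering and path rewriting described above can be pictured with a small conceptual model. This is only a sketch: git works on commit and tree objects rather than path lists, and `split_for_prefix` and the sample `history` are hypothetical names invented for illustration.

```python
from typing import Dict, List

PREFIX = "forks/wasmtime/"


def split_for_prefix(commits: List[Dict], prefix: str) -> List[Dict]:
    """Keep commits that touch `prefix` and strip the prefix from their paths."""
    result = []
    for commit in commits:
        kept = [p[len(prefix):] for p in commit["paths"] if p.startswith(prefix)]
        if kept:  # commits that never touched the subtree are dropped
            result.append({"message": commit["message"], "paths": kept})
    return result


# Hypothetical sdk-typescript history: one mixed commit, one unrelated commit.
history = [
    {
        "message": "fix: workaround for async streaming in wasmtime",
        "paths": [
            "forks/wasmtime/crates/something/src/lib.rs",
            "strands-wasm/build.js",
        ],
    },
    {"message": "docs: update README", "paths": ["README.md"]},
]

for commit in split_for_prefix(history, PREFIX):
    print(commit["message"], commit["paths"])
```

In this model, a commit that mixes SDK and fork changes still reaches the fork repo, but only with the files under the prefix.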

## Alternatives

### Submodules

Each fork lives in its own GitHub repo. sdk-typescript lists each one in `.gitmodules` (path and URL) and records a pinned commit SHA for it in the tree, so a submodule is effectively a pointer to a specific commit in the fork repo. Contributors must use `git clone --recurse-submodules` or run `git submodule update --init` after cloning.
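
To make the "pointer" idea concrete, here is a minimal sketch that reads submodule entries out of a `.gitmodules` file using Python's standard `configparser`. The file holds only each submodule's path and URL; the pinned SHA is stored in the repository tree. The sample content and URLs are assumptions for illustration, not the actual configuration.

```python
import configparser
from typing import Dict

# Hypothetical .gitmodules content; git writes tab-indented keys like these.
SAMPLE_GITMODULES = (
    '[submodule "forks/wasmtime"]\n'
    "\tpath = forks/wasmtime\n"
    "\turl = https://github.com/strands-agents/wasmtime.git\n"
    '[submodule "forks/jco"]\n'
    "\tpath = forks/jco\n"
    "\turl = https://github.com/strands-agents/jco.git\n"
)


def list_submodules(gitmodules_text: str) -> Dict[str, str]:
    """Map each submodule path to its remote URL."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    return {parser[name]["path"]: parser[name]["url"] for name in parser.sections()}


for path, url in list_submodules(SAMPLE_GITMODULES).items():
    print(f"{path} -> {url}")
```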

**Pros:**

- Each fork retains full independence (own history, branches, tags, CI).
- Pinned to exact commits for reproducible builds.
- No increase to sdk-typescript's pack size.
- Natural for upstreaming (fork repos are first-class).

**Cons:**

- Known developer experience pain: forgetting `--recurse-submodules` is common.
- PRs that span sdk-typescript + a fork require coordinating across repos.
- Detached HEAD state inside submodules is confusing.
- Nested submodules (wasmtime has its own) compound complexity.
- CI must be configured to initialize submodules.

### Git LFS

Replace large binary files (e.g., `.wasm` test fixtures) with small pointer files (~130 bytes each). Actual content lives on a separate LFS server and is fetched lazily on checkout. Supported on our GitHub Enterprise plan (5 GB storage, 5 GB/month bandwidth included).
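
For a sense of what a pointer file holds, the sketch below builds one in the Git LFS v1 pointer format (a `version` line, a SHA-256 `oid`, and a `size`). The `lfs_pointer` helper and the fake fixture bytes are hypothetical. In practice the `git lfs` client generates these files.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS v1 pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in for a binary fixture: WASM magic bytes plus zero padding.
fixture = b"\x00asm" + bytes(1024)
pointer = lfs_pointer(fixture)
print(pointer, end="")
print(f"pointer: {len(pointer.encode())} bytes, original: {len(fixture)} bytes")
```

The pointer length depends only on the number of digits in the size, which is why it stays near 130 bytes however large the fixture is.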

**Pros:**

- Dramatically reduces clone size for repos with large binaries.
- Transparent to most workflows once set up.

**Cons:**

- Only helps with binary files; doesn't reduce file count from source directories (tests, docs, etc.).
- Adds infrastructure dependency (LFS server must be available).
- Some tooling (mirrors, forks, CI runners) doesn't handle LFS transparently.
- Switching files in/out of LFS after the fact is messy.
- Doesn't address the core problem of cross-repo coordination.

## References

- [Git Subtree Documentation](https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging#_subtree_merge)
- [Git Submodule Documentation](https://git-scm.com/book/en/v2/Git-Tools-Submodules)
- [Git LFS](https://git-lfs.com/)
- [bytecodealliance/wasmtime](https://github.com/bytecodealliance/wasmtime)
- [bytecodealliance/wasmtime-py](https://github.com/bytecodealliance/wasmtime-py)
- [bytecodealliance/ComponentizeJS](https://github.com/bytecodealliance/ComponentizeJS)
- [bytecodealliance/jco](https://github.com/bytecodealliance/jco)
211 changes: 211 additions & 0 deletions designs/0012-wasm-dependency-distribution.md
@@ -0,0 +1,211 @@
# WASM Dependency Distribution

## Overview

The WASM integration in sdk-typescript depends on forked versions of:

- **wasmtime** (Rust) — the WASM runtime, built as part of wasmtime-py wheels
- **wasmtime-py** (Python) — Python bindings that load and execute the WASM component
- **componentize-js** (JavaScript) — compiles bundled JS into a WASM component
- **jco** (JavaScript/Rust) — generates TypeScript types from WIT and transpiles WASM for testing

These forks contain workarounds for features not yet available upstream (e.g., async streaming). This document proposes how to distribute these forks and how developers interact with them locally.

## Problem

The forks need to be consumable by both sdk-typescript's build process and end users of strands-py-wasm. Today, this is handled ad hoc (e.g., `@chaynabors/componentize-js` published manually to npm). There's no unified approach for:

- Publishing fork artifacts reliably.
- Testing fork changes against sdk-typescript before shipping.
- Giving developers a fast local iteration loop when modifying forks.
- Keeping sdk-typescript's repo and CI simple.

## Solution

Publish fork artifacts to **npm** under the `@strands-agents` scope and to **PyPI** under a matching `strands-agents-` prefix (PyPI has no scopes). Use a dedicated fork repo with CI that tests against sdk-typescript before publishing. Use **git submodules** in sdk-typescript as an optional local development convenience.

### Distribution
Member

The biggest pro is that this streamlines development for those who don't need changes to these packages. The biggest con is that it is a much higher bar for those that are making the changes to the packages.

How confident are we that changes will be infrequent?


Fork artifacts are published to public package registries:

| Fork | Registry | Package name | Notes |
|------|----------|--------------|-------|
| componentize-js | npm | `@strands-agents/componentize-js` | JS package |
| jco | npm | `@strands-agents/jco` | JS package |
| wasmtime + wasmtime-py | PyPI | `strands-agents-wasmtime` | wasmtime Rust source compiled into wasmtime-py wheels |

sdk-typescript consumes them as normal dependencies:

```json
// strands-wasm/package.json
{
  "devDependencies": {
    "@strands-agents/componentize-js": "^0.19.3",
    "@strands-agents/jco": "^1.16.1"
  }
}
```

```toml
# strands-py-wasm/pyproject.toml
dependencies = [
    "strands-agents-wasmtime>=37.0.0,<38.0.0",
]
```
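
The caret ranges in the `package.json` snippet behave differently for `0.x` versions than for `1.x` and later, which affects when a new fork release is picked up automatically. The sketch below models npm's caret rule for plain `MAJOR.MINOR.PATCH` strings only (no prerelease tags). The function names are hypothetical, and npm's own `semver` package remains the authority.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no prerelease or build tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_bounds(spec: str) -> Tuple[Version, Version]:
    """Return the inclusive lower and exclusive upper bound of ^spec."""
    major, minor, patch = parse(spec)
    if major > 0:
        upper = (major + 1, 0, 0)  # ^1.16.1 allows any 1.x >= 1.16.1
    elif minor > 0:
        upper = (0, minor + 1, 0)  # ^0.19.3 allows only 0.19.x >= 0.19.3
    else:
        upper = (0, 0, patch + 1)  # ^0.0.z pins the exact patch
    return (major, minor, patch), upper


def satisfies_caret(candidate: str, spec: str) -> bool:
    """Check whether `candidate` falls inside the caret range of `spec`."""
    lower, upper = caret_bounds(spec)
    return lower <= parse(candidate) < upper


for spec in ("0.19.3", "1.16.1"):
    lower, upper = caret_bounds(spec)
    print(f"^{spec}: >= {'.'.join(map(str, lower))}, < {'.'.join(map(str, upper))}")
```

Under this rule, a fork release of componentize-js at `0.20.0` would not satisfy `^0.19.3`, so sdk-typescript would need an explicit version bump to consume it.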

### Fork Repo Structure

A single repo contains all fork source and CI:

```
strands-agents/wasm-deps/
├── wasmtime/
├── wasmtime-py/
├── componentize-js/
├── jco/
└── .github/workflows/
    ├── test.yml # test against sdk-typescript on PR
    └── publish.yml # build and publish on release tag
```

### CI Pipeline (Fork Repo)

On push/PR, the fork repo's CI validates changes against sdk-typescript:

```yaml
# test.yml
- run: npm run build # build fork artifacts
- run: git clone https://github.com/strands-agents/sdk-typescript.git /tmp/sdk-ts
- run: cd /tmp/sdk-ts && npm ci
- run: cd /tmp/sdk-ts && npm link ${{ github.workspace }}/componentize-js
- run: cd /tmp/sdk-ts && npm test # validate compatibility
```

On release tag, publish:

```yaml
# publish.yml
- run: npm publish --access public
  env:
    NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
```

sdk-typescript's own CI has no awareness of forks. It runs `npm ci`, which pulls published packages from the registry.

### Local Development (Submodules)

Submodules give developers a consistent local checkout of the fork source for fast iteration:

```
sdk-typescript/
├── strands-ts/
├── strands-wasm/
├── strands-py-wasm/
└── forks/ # submodules (optional)
    ├── wasmtime/ # → github.com/strands-agents/wasmtime
    ├── wasmtime-py/ # → github.com/strands-agents/wasmtime-py
    ├── componentize-js/ # → github.com/strands-agents/componentize-js
    └── jco/ # → github.com/strands-agents/jco
```

Submodules are optional. Contributors who don't work on the WASM layer never initialize them:

```bash
# Normal contributor:
git clone git@github.com:strands-agents/sdk-typescript.git
npm ci # pulls published packages from registry
npm test # works fine

# WASM developer:
git clone --recurse-submodules git@github.com:strands-agents/sdk-typescript.git
npm ci
npm run dev:link-forks # overrides registry packages with local submodule source
```

The `dev:link-forks` script in the root `package.json`:

```json
{
  "scripts": {
    "dev:link-forks": "npm link ./forks/componentize-js ./forks/jco"
  }
}
```

Member Author

Could continue to add to these commands over time to offer more conveniences if need be. But I feel something like this is a reasonable start.

Member

It balances the development flow (with friction) with the consumer flow quite well IMHO

This creates symlinks in `node_modules` pointing at the submodule directories instead of the published registry versions.
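
When debugging, it can help to confirm which packages are currently linked rather than installed from the registry. The following is a minimal sketch, assuming the forks are installed under the `@strands-agents` scope as in the distribution table. `linked_packages` is a hypothetical helper, not an existing script in the repo.

```python
from pathlib import Path
from typing import Dict


def linked_packages(node_modules: Path, scope: str = "@strands-agents") -> Dict[str, Path]:
    """Map package name to resolved target for each symlinked package in the scope."""
    scope_dir = node_modules / scope
    if not scope_dir.is_dir():
        return {}
    return {
        f"{scope}/{entry.name}": entry.resolve()
        for entry in sorted(scope_dir.iterdir())
        if entry.is_symlink()  # registry installs are real directories
    }


if __name__ == "__main__":
    # Run from the sdk-typescript root after `npm run dev:link-forks`.
    for name, target in linked_packages(Path("node_modules")).items():
        print(f"{name} -> {target}")
```

An empty result after `npm ci` would be consistent with all packages coming from the registry again.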

## Usage

### Normal Development (No Fork Changes)

No special steps. `npm ci` and `pip install` pull from registries as usual.

### Modifying Fork Code Locally

```bash
# Initialize submodules (one time)
git submodule update --init

# Get on a feature branch in the fork
cd forks/componentize-js
git checkout -b fix/async-streaming

# Make changes and rebuild
vim src/something.ts
npm run build

# Link into sdk-typescript
cd ../..
npm run dev:link-forks

# Test
cd strands-wasm && node build.js
```

For Python (wasmtime-py):

```bash
cd forks/wasmtime-py
pip install -e .

# Now strands-py-wasm uses the local wasmtime-py
# Changes to Python files are reflected immediately
```

### Publishing Fork Changes

```bash
# Push the feature branch in the fork submodule
cd forks/componentize-js
git push origin fix/async-streaming

# Open a PR in the wasm-deps repo
# Fork repo CI runs:
# 1. Builds package
# 2. Tests against sdk-typescript
# 3. Publishes to npm on merge/tag

# After merge, update submodule pointer in sdk-typescript
cd ../..
git add forks/componentize-js
git commit -m "chore: bump componentize-js submodule"

# Bump version in package.json (or let Dependabot handle it)
```

### Unlinking (Back to Published Packages)

```bash
npm ci # reinstalls everything from registry, links are gone
```

## References

- [npm link documentation](https://docs.npmjs.com/cli/v10/commands/npm-link)
- [pip editable installs](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs)
- [Git Submodule Documentation](https://git-scm.com/book/en/v2/Git-Tools-Submodules)
- [GitHub Packages](https://docs.github.com/en/packages)
- [npm publishing](https://docs.npmjs.com/cli/v10/commands/npm-publish)
- [bytecodealliance/wasmtime-py](https://github.com/bytecodealliance/wasmtime-py)
- [bytecodealliance/ComponentizeJS](https://github.com/bytecodealliance/ComponentizeJS)
- [bytecodealliance/jco](https://github.com/bytecodealliance/jco)