Regexped (pronounced reg-exped, short for REGexp EXPEDited) compiles regular expression patterns into standalone WebAssembly modules. It analyses your patterns, picks the best engine (DFA, TDFA, or Backtracking/BitState), emits WASM bytecode, and generates ready-to-use stubs for Rust, Go, C, JavaScript, TypeScript, and AssemblyScript.
Embed high-performance regexp matchers directly into WASM applications — no full regexp engine needed at runtime.
Supports RE2/Perl (leftmost-first) semantics. Unicode not yet supported.
- DFA engine — O(n) anchored matching and non-anchored find, word boundary assertions (\b, \B), byte class compression, SIMD prefix scan (Teddy algorithm)
- TDFA engine — O(n) capture group tracking via Laurikari’s tagged DFA; register-based slot updates on DFA transitions
- Backtracking engine — capture group tracking for non-TDFA-eligible patterns, BitState memoization for O(n) worst-case on zero-matchable loops
- Pattern sets — compile multiple patterns into a single merged DFA; a single find_all/find_any/match call scans for all patterns simultaneously and returns (pattern_id, start, length) tuples; SIMD Teddy (≤16 literals) or Aho-Corasick (≤32 literals) literal frontends keep per-byte cost near-constant in set size, with a scalar DFA fallback for sets without mandatory literals
- Stub generation for Rust, Go (wasip1), C, JavaScript, TypeScript, and AssemblyScript — with iterator/generator support (match, find, groups, named groups)
- WASM module merging via wasm-merge — WASM Component Model support coming soon
- Configurable via YAML
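To make "byte class compression" in the DFA bullet above concrete, here is a minimal Go sketch of a table-driven anchored matcher for the pattern [0-9]+. It is an illustration only: the table layout, state numbering, and the names buildClassMap and matchDigits are assumptions made for this example and do not reflect the tables Regexped actually emits.

```go
package main

import "fmt"

// Byte classes: every one of the 256 byte values maps to a small class
// number, so the transition table holds numStates*numClasses entries
// instead of numStates*256. (Hypothetical layout, for illustration.)
const (
	classOther = 0 // any byte that is not an ASCII digit
	classDigit = 1 // '0'..'9'
	numClasses = 2
)

const (
	stateDead  = 0 // no match is possible any more
	stateStart = 1 // nothing consumed yet
	stateDigit = 2 // one or more digits consumed (accepting)
)

// buildClassMap returns the 256-entry byte-to-class table.
func buildClassMap() [256]uint8 {
	var m [256]uint8 // zero value is classOther
	for b := '0'; b <= '9'; b++ {
		m[b] = classDigit
	}
	return m
}

// transitions[state*numClasses+class] is the next state.
var transitions = [...]uint8{
	stateDead, stateDead, // from stateDead
	stateDead, stateDigit, // from stateStart
	stateDead, stateDigit, // from stateDigit
}

// matchDigits reports whether the whole input matches [0-9]+.
// It performs one table lookup per input byte, hence O(n) time.
func matchDigits(classMap *[256]uint8, input []byte) bool {
	state := uint8(stateStart)
	for _, b := range input {
		state = transitions[int(state)*numClasses+int(classMap[b])]
		if state == stateDead {
			return false
		}
	}
	return state == stateDigit
}

func main() {
	cm := buildClassMap()
	fmt.Println(matchDigits(&cm, []byte("12345")))
	fmt.Println(matchDigits(&cm, []byte("12a45")))
}
```

With two classes and three states the table has six entries; an uncompressed table for the same automaton would need 768.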
go install github.com/qrdl/regexped@latest
Or build from source:
git clone https://github.com/qrdl/regexped
cd regexped
go build -o regexped .
External dependency: wasm-merge (Binaryen toolkit) — required for the merge command.
Or use the official Docker image — no local install needed:
docker pull qrdl/regexped
docker run --rm -v $(pwd):/work -w /work qrdl/regexped <command> [flags]
See docker.md for full Docker usage and workflow examples.
- CLI — see cli.md for all commands, flags, and config schema.
- Docker — see docker.md; official image qrdl/regexped includes wasm-merge.
- CLI reference — commands, flags, config schema, pattern support
Languages
- Rust API — generated Rust stubs
- Go API — generated Go stubs
- JavaScript API — generated JS ES module and generator functions
- TypeScript API — generated TS ES module with typed generator functions
- AssemblyScript API — generated AS module with typed iterator classes
- C API — generated C header with static iterator functions
Environments
- Browser embedding — standalone WASM, JS/TS stub, no merge needed
- Node.js — standalone WASM, TypeScript stub, readFileSync + init()
- wasmtime — embedded WASM merged with a Rust/Go/C/AssemblyScript host, run via the wasmtime CLI or any wasmtime embedding
- Cloudflare Workers — standalone WASM, JS module import, isolate-level init
- Gcore FastEdge — embedded WASM, Rust stubs, merge workflow
Sets
- Pattern sets — multi-pattern composition, YAML schema, output format, frontend selection
Internals
- Engines — DFA, TDFA, Backtracking, engine selection
- RE2 test coverage — pass/skip counts per engine and skip reasons
- WASM internals — WASM interface, memory layout, table formats
Examples are available for the following environments: wasmtime, Node.js, Cloudflare Workers, FastEdge, browser.
Languages: Rust, Go, C, JavaScript, TypeScript, AssemblyScript.
See examples/README.md for more details.
DFA/TDFA matching: O(n) time, O(1) runtime stack — no worst-case blowup.
Backtracking: LeftmostFirst (RE2/Perl) semantics for non-deterministic capture patterns. BitState memoization bounds runtime to O(n × numStates) for patterns with zero-matchable loops; stack overflow guard prevents memory corruption on deeply nested patterns.
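The bound described above can be illustrated with a small Go sketch of a backtracking matcher that keeps one visited bit per (instruction, input position) pair. Everything here (the instruction set, the inst and bitState types, the hand-assembled program for (a*)*b) is hypothetical and is not Regexped's internal representation; the sketch also omits capture tracking and uses plain recursion without a stack guard.

```go
package main

import "fmt"

type opcode int

const (
	opChar  opcode = iota // consume one byte equal to c
	opSplit               // try x first, then y (leftmost-first preference)
	opJmp                 // continue at x
	opMatch               // succeed if the whole input was consumed
)

type inst struct {
	op   opcode
	c    byte
	x, y int
}

type bitState struct {
	prog    []inst
	input   []byte
	visited []uint64 // one bit per (pc, pos) pair
	steps   int      // number of distinct (pc, pos) pairs explored
}

// run explores (pc, pos) at most once; a repeated visit cannot succeed,
// so it is cut off. This caps the work at len(prog) * (len(input)+1) steps.
func (b *bitState) run(pc, pos int) bool {
	idx := pc*(len(b.input)+1) + pos
	mask := uint64(1) << uint(idx%64)
	if b.visited[idx/64]&mask != 0 {
		return false
	}
	b.visited[idx/64] |= mask
	b.steps++
	in := b.prog[pc]
	switch in.op {
	case opChar:
		return pos < len(b.input) && b.input[pos] == in.c && b.run(pc+1, pos+1)
	case opSplit:
		return b.run(in.x, pos) || b.run(in.y, pos)
	case opJmp:
		return b.run(in.x, pos)
	default: // opMatch
		return pos == len(b.input)
	}
}

// match runs an anchored full match and returns the result and step count.
func match(prog []inst, input []byte) (bool, int) {
	nbits := len(prog) * (len(input) + 1)
	b := &bitState{prog: prog, input: input, visited: make([]uint64, (nbits+63)/64)}
	ok := b.run(0, 0)
	return ok, b.steps
}

// nestedStar is a hand-assembled program for (a*)*b, whose outer loop
// can iterate without consuming input (a zero-matchable loop).
var nestedStar = []inst{
	{op: opSplit, x: 1, y: 5}, // 0: enter the group or leave the outer loop
	{op: opSplit, x: 2, y: 4}, // 1: take another 'a' or leave the inner loop
	{op: opChar, c: 'a'},      // 2
	{op: opJmp, x: 1},         // 3
	{op: opJmp, x: 0},         // 4: end of one group iteration
	{op: opChar, c: 'b'},      // 5
	{op: opMatch},             // 6
}

func main() {
	ok, steps := match(nestedStar, []byte("aaab"))
	fmt.Println(ok, steps)
	ok, steps = match(nestedStar, []byte("aaaaaaaaaaaaaaaaaaaaaaaa"))
	fmt.Println(ok, steps)
}
```

Without the visited bitmap, this program would loop forever on the empty group iteration (0 → 1 → 4 → 0 at the same position); with it, the step count can never exceed the number of instructions times the number of input positions.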
SIMD prefix scan: First-byte and two-byte Teddy algorithm skips non-matching positions in bulk using WASM SIMD instructions, reducing DFA transitions on typical inputs.
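As a rough scalar analogue of the prefix scan described above, the following Go sketch jumps straight to candidate positions with bytes.IndexByte and runs the (more expensive) matcher only there. It is not the Teddy algorithm and uses no WASM SIMD; the function names and the stand-in pattern k[0-9]+ are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

// findWithPrefilter returns the start and length of the first match, or
// (-1, 0) if there is none. verify is only called at offsets whose byte
// equals first, so positions that cannot start a match are skipped in bulk.
func findWithPrefilter(input []byte, first byte, verify func(rest []byte) int) (start, length int) {
	pos := 0
	for pos < len(input) {
		i := bytes.IndexByte(input[pos:], first)
		if i < 0 {
			return -1, 0 // no further candidate positions
		}
		pos += i
		if n := verify(input[pos:]); n >= 0 {
			return pos, n
		}
		pos++ // candidate rejected, resume scanning after it
	}
	return -1, 0
}

// matchKDigits is a hand-written stand-in for a compiled matcher for the
// hypothetical pattern k[0-9]+. It returns the match length at the start
// of s, or -1 if s does not begin with a match.
func matchKDigits(s []byte) int {
	if len(s) == 0 || s[0] != 'k' {
		return -1
	}
	n := 1
	for n < len(s) && s[n] >= '0' && s[n] <= '9' {
		n++
	}
	if n == 1 {
		return -1 // 'k' without any digit
	}
	return n
}

func main() {
	start, length := findWithPrefilter([]byte("look: k42 ok"), 'k', matchKDigits)
	fmt.Println(start, length)
}
```

The design point is the same as in the SIMD version: the cheap scan handles the bulk of the input, and the automaton only runs where a match could plausibly begin.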
Comparison against the regex crate (benchmarked via wasmtime; figures show Regexped's fuel consumption and median execution time relative to regex):
| Scenario | Fuel consumed | Median latency |
|---|---|---|
| Anchored match (email, URL) | 1.1–2.2× less | 1.0–1.6× faster |
| Non-anchored find (secrets, SQL injection) | 1.7–7.8× less | 1.6–7.2× faster |
| Multi-pattern find (combined secrets, 100 KB) | 8.2–8.4× less | 12.9–13.9× faster |
| TDFA capture groups (URL parse) | 2.3–6.9× less | 3.0–5.1× faster |
| Backtracking capture groups | 1.9–12.3× less | 1.7–21.4× faster |
| No-match fast-reject | up to 21.9× less | up to 12.7× faster |
| Pattern sets vs RegexSet+rescan (8–20 patterns, 100 KB) | — | 2.0–18.5× faster |
- No Unicode support — patterns and input are treated as raw bytes (Latin-1/ASCII). Unicode character classes (\p{L}, \p{N}, etc.), Unicode case folding, and multi-byte Unicode literals are not supported.
- No WASM Component Model — generated modules use the core WASM ABI (linear memory + exported functions). WASM Component Model support is planned.
- Not thread-safe — the C, JS, TS, and AS stubs are not safe for concurrent use. Only the Rust and Go stubs are thread-safe.
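One practical consequence of the raw-byte treatment in the first limitation above is that a non-ASCII character in UTF-8 input occupies several bytes, so a byte-oriented matcher sees several units where a reader sees one character. The short Go sketch below only demonstrates the byte and code point counts; countUnits is a helper invented for this example and is not part of any generated stub.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countUnits returns the number of bytes and the number of Unicode code
// points in s. A byte-oriented matcher operates on the former.
func countUnits(s string) (numBytes, numRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "caf\u00e9" is "café"; U+00E9 is encoded as two bytes in UTF-8.
	b, r := countUnits("caf\u00e9")
	fmt.Println(b, r)
}
```

Offsets and lengths reported for such input should therefore be read as byte offsets, not character offsets.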
Regexped is almost dependency-free. The only compile-time dependency is github.com/goccy/go-yaml for YAML config parsing. All regexp compilation, WASM emission, and stub generation are implemented from scratch with no external libraries.
The wasmtime-go binding is used only in re2test/ and perftest/ testing tools and is not a part of the main tool.
wasm-merge (from the Binaryen toolkit) is an external binary required only for the merge command. It must be installed separately, using the get_wasm_merge.sh shell script.
See LICENSE.