Blackcat-Informatics/gmeow-gts

GTS — Graph Transport Substrate

A single-file, content-addressed, append-only transport for RDF 1.2 graphs and the binaries they reference.

A whole graph in a single, verifiable file.

CI · crates.io · PyPI · npm · DOI: 10.67342/umcdg7675h/v1 · License: MIT OR Apache-2.0

Localized documentation: index (fr-CA, zh-Hans)


Why does this exist?

Most “portable data” is not actually one thing.

It is a database export, a directory of files, a manifest, some checksums, a signature, and a README explaining how the pieces fit together. Updates add another layer of conventions. Copy the wrong subset, lose a sidecar, or encounter an unsupported tool, and the package becomes incomplete or unverifiable.

GTS exists because the package itself should be the unit of trust.

A .gts file can carry structured data, the files that data refers to, provenance, integrity information, and an append-only history of change. It can be copied, streamed, verified, extended, concatenated, and partially read without depending on the system that created it.

Think of GTS as the transport layer between:

  • a pile of files;
  • a database export;
  • an event log;
  • and a signed data package.

It does not replace databases, archives, or query engines. It gives them a common artifact they can exchange without losing meaning, history, or integrity.

Under the hood, GTS uses RDF 1.2 for structured data and CBOR Sequences for the container. You do not need to adopt a particular ontology, database, or application architecture to use it.

Use GTS when you need to hand another person, service, or system one verifiable file containing data, attachments, provenance, and history.


What does this look like in practice?

GTS is most useful when the thing being moved is more than a single file or database export.

In each example below, the .gts file becomes the artifact: data, files, machine-readable context, provenance, signatures, and history travel together.

1. A document reviewed by several AIs—and a human

A research team asks three AI systems to analyze the same report:

  • one extracts factual claims;
  • one checks citations and supporting evidence;
  • one identifies risks, omissions, and contradictions;
  • a human reviewer accepts, rejects, or qualifies the results.

Without a shared artifact, the review quickly becomes a collection of PDFs, JSON responses, spreadsheets, chat logs, and comments. It becomes difficult to determine which model produced a claim, which document passage supported it, and whether a human later overruled it.

A GTS package can carry the complete review:

document-review.gts
├── original report and attachments
├── extracted claims
├── source-page and passage references
├── model identities and versions
├── confidence and uncertainty annotations
├── citation checks
├── contradiction and risk analysis
├── human review decisions
└── signatures and review history

Each participant can append a separate segment.

The claim-extraction model records what it found and where it found it. The citation-checking model adds evidence or challenges. The risk model contributes a different analysis without overwriting the first two. The human reviewer then appends a signed decision explaining which claims are accepted, rejected, disputed, or still unresolved.

If a claim is superseded, it can be suppressed from the current view without erasing the original claim or the analysis that produced it.

The resulting file can answer questions that are otherwise surprisingly difficult:

  • Which version of the document was analyzed?
  • Which model produced each claim?
  • What source passage supports it?
  • Did another model disagree?
  • What did the human reviewer decide?
  • Has any part of the review changed since it was signed?

The recipient can verify the artifact, inspect the disagreements, reproduce the current folded view, and project the result into a database, review interface, or knowledge system.

GTS does not decide which AI is correct. It preserves what each participant said, the evidence they cited, and the history of the review.


2. A digital artwork that keeps its license and history

A museum publishes a high-resolution reproduction of a painting.

The image may be accompanied by:

  • a catalog record;
  • a public-domain or licensed-use statement;
  • required attribution text;
  • provenance and historical context;
  • accessibility descriptions;
  • conservation notes;
  • capture and color-calibration information;
  • smaller web derivatives.

Normally, these pieces live in different systems. The image is downloaded without its catalog page. The license is copied separately. A thumbnail circulates without attribution. A later correction to the artist or date does not travel with earlier copies.

A GTS artifact can bind that context to the exact image bytes:

artwork-reproduction.gts
├── archival TIFF
├── web-resolution JPEG
├── thumbnail
├── color profile
├── title, artist, date, and dimensions
├── collection and accession information
├── provenance and historical context
├── multilingual description and alt text
├── capture and restoration metadata
├── signed license and attribution terms
└── correction and relicensing history

Each image derivative is content-addressed and linked to the master reproduction. The license can identify the exact digest to which it applies, rather than referring ambiguously to “the image.”

A curator or rights holder can sign the license and catalog record. A later correction, new scan, or revised license can be appended without silently replacing the earlier record.

Someone receiving the file can determine:

  • whether the image bytes are the published reproduction;
  • who issued the license;
  • which attribution text is required;
  • how a derivative relates to the master;
  • whether metadata has been corrected;
  • what historical and accessibility context belongs with the image.

This makes the reproduction portable without stripping it of identity or context.

GTS is not DRM and does not enforce copyright law. It makes the license, provenance, and exact licensed content explicit, verifiable, and difficult to separate accidentally.


3. A portable AI tool capsule

An AI system creates a small WASM module that exposes MCP-compatible tools when loaded by a suitable host.

The tool might answer questions about a specialized dataset, validate documents against a schema, transform records, or perform a domain-specific calculation. Alongside the code, it needs a knowledge graph, tool definitions, examples, permissions, and build information.

Normally, distribution means assembling a .wasm file, README, schemas, package metadata, model documentation, test fixtures, license, and perhaps a container image. The receiving system must determine which pieces belong together and whether they can be trusted.

A GTS package can distribute the tool as one self-describing artifact:

portable-tool.gts
├── tool.wasm
├── MCP tool and resource declarations
├── input and output schemas
├── selected domain knowledge graph
├── usage examples
├── test vectors and expected results
├── required host capabilities
├── network and filesystem permission declarations
├── build provenance and source revision
├── SBOM and license
└── publisher and reviewer signatures

The package can describe:

  • which tools the module provides;
  • what each tool accepts and returns;
  • what concepts and datasets it understands;
  • which host ABI it requires;
  • whether it expects network, filesystem, or clock access;
  • which version supersedes an earlier module;
  • what tests establish expected behavior.

A source AI can export a selected portion of its domain knowledge together with the tool that operates on it. A receiving AI or application can then:

  1. verify the package and publisher;
  2. inspect the tool’s declared capabilities before execution;
  3. review its knowledge and examples;
  4. decide whether its requested permissions are acceptable;
  5. load the WASM module in an appropriate sandbox;
  6. import or query the accompanying knowledge graph;
  7. run the included tests before trusting the tool.

An update can append a new module, changed schemas, migration notes, and review signatures while preserving the exact package that preceded it.

GTS does not execute the WASM module or grant it permissions. The host remains responsible for sandboxing, authorization, and runtime policy. GTS provides the portable unit that binds the code to its interface, knowledge, provenance, and verification evidence.


These examples show three different reasons for GTS to exist:

| Example | What GTS keeps together |
| --- | --- |
| Document review | Layered claims, disagreement, evidence, model identity, and human decisions |
| Artwork reproduction | Binary content, license, attribution, accessibility, provenance, and corrections |
| Portable AI tooling | Executable code, machine-readable interfaces, knowledge, permissions, tests, and trust evidence |

In every case, GTS remains the exchanged artifact—not the database, AI model, policy engine, or execution environment.


GTS encodes a graph as an append-only log of CBOR frames. The logical graph is the fold (replay) of the log. Growth is an append; "deletion" is suppression, never a physical removal; optimisation is a separate, explicitly lossy compaction. Concatenating two valid GTS files (cat) yields a valid GTS file whose fold is the value-union of the inputs.
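
As a rough illustration of fold, suppression, and concatenation, here is a toy model in plain Python. It is not the GTS wire format or any engine's API: the `Frame` tuple, the `"add"`/`"suppress"` operation names, and the `fold` function are hypothetical, and quads are plain strings.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    """Hypothetical log entry; real GTS frames are CBOR maps."""
    op: str    # "add" or "suppress" (illustrative names only)
    quad: str  # stand-in for an RDF quad


def fold(log: list[Frame]) -> set[str]:
    """Replay the log in order to obtain the current logical graph."""
    visible: set[str] = set()
    for frame in log:
        if frame.op == "add":
            visible.add(frame.quad)
        elif frame.op == "suppress":
            # Suppression hides the quad from the folded view;
            # the earlier "add" frame stays in the log untouched.
            visible.discard(frame.quad)
    return visible


log_a = [
    Frame("add", "<cat> <label> 'Cat'"),
    Frame("add", "<cat> <legs> 5"),
    Frame("suppress", "<cat> <legs> 5"),  # a correction, not an erasure
    Frame("add", "<cat> <legs> 4"),
]
log_b = [Frame("add", "<dog> <label> 'Dog'")]

# In this toy, concatenating two logs is plain list concatenation.
combined = log_a + log_b

print(sorted(fold(log_a)))
print(len(log_a))  # the history still holds all four frames
print(fold(combined) == fold(log_a) | fold(log_b))
```

In this toy the union property holds because `log_b` carries no suppressions aimed at `log_a`. How real GTS scopes suppression across segments is defined by the specification, not by this sketch.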

GTS is ontology-independent. GTS is the primary distribution method for GMEOW, but GTS does not depend on GMEOW. A conformant reader does not need GMEOW vocabulary, OWL reasoning, domain-specific rules, or agent-memory conventions to parse, verify, fold, or transport a GTS file.

The package family is gmeow-gts; the format is GTS. The package name is intentionally distinctive across ecosystems, while the CLI, import surface, and file extension remain the short gts and .gts forms where ecosystem rules permit.

This repository holds the specification and six interoperable full engines (Rust, Python, Go, TypeScript, Smalltalk/Pharo, Kotlin/JVM), all gated against one frozen, byte-exact conformance corpus. It also publishes a Rust-backed C ABI (libgts) and thin derived wrappers for C-compatible ecosystems; those wrappers consume the Rust engine through rust/capi/include/gts.h and are not new wire-format engines or CLI parity columns.

Why GTS?

Four properties define the format (full spec):

  1. CBOR all the way down (RFC 8949). One IETF-standardised binary encoding with native byte strings (no base64 tax), deterministic encoding (clean content hashes), and CBOR Sequences — concatenated items with no enclosing length, so append is cheap. A reader needs only a CBOR library.
  2. A durable transform catalog. Each frame's payload carries a stackable chain of codecs from an open, long-lived catalog (identity, gzip, zstd, zstd-rsyncable, cose-encrypt, …) — separating structure durability (CBOR + this spec, forever) from density and confidentiality (swappable codecs).
  3. Integrity by construction. Every frame carries an independent BLAKE3 self-hash and names its predecessor — a git-style content-addressed chain. Verification is parallel, a damaged frame is independently detectable, and the head id transitively commits to all history. Signatures and encryption (COSE, RFC 9052) are optional, layered, and algorithm-agile.
  4. Recursive composition (matryoshka). A payload, once its transforms are reversed, is just bytes — and a GTS file is just bytes. So a payload MAY itself be a complete signed GTS, wrapped in any transform, riding inside an encrypted field with its own header and chain.
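
The stackable transform chain in property 2 can be pictured with a small standard-library sketch. The `CATALOG` dictionary and the `apply_chain`/`reverse_chain` helpers are hypothetical, and `demo-deflate` is added only to show stacking; it is not claimed to be a GTS catalog entry. Only the names `identity` and `gzip` come from the text above.

```python
import gzip
import zlib
from typing import Callable

# Toy catalog: codec name -> (encode, decode).
Codec = tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]
CATALOG: dict[str, Codec] = {
    "identity": (lambda b: b, lambda b: b),
    "gzip": (gzip.compress, gzip.decompress),
    # Assumption for illustration only, not a real catalog name.
    "demo-deflate": (zlib.compress, zlib.decompress),
}


def apply_chain(payload: bytes, chain: list[str]) -> bytes:
    """Encode by applying each named codec in order."""
    for name in chain:
        payload = CATALOG[name][0](payload)
    return payload


def reverse_chain(encoded: bytes, chain: list[str]) -> bytes:
    """Decode by undoing the codecs in reverse order."""
    for name in reversed(chain):
        encoded = CATALOG[name][1](encoded)
    return encoded


chain = ["identity", "gzip", "demo-deflate"]
wire = apply_chain(b"<cat> <label> 'Cat' .", chain)
print(reverse_chain(wire, chain))
```

Because the chain is just a list of names, the surrounding structure stays the same when a codec is swapped; only the catalog lookup changes.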

Non-goals. GTS is explicitly not a database, query engine, reasoner, or mutation protocol. Random-access query, deep traversal, and SPARQL are the job of a transform target (.ttl, .nq, DuckDB, SQLite, …), not of GTS. It is a durable, self-describing interchange container — the narrow waist through which graphs and their referenced data travel.

Use GTS without GMEOW

GMEOW is a primary downstream consumer and reference profile family for GTS artifacts. The dependency direction is one-way: GMEOW rides on GTS; GTS does not require GMEOW.

A baseline reader needs the GTS wire-format rules, the codec catalog, and RDF term/fold semantics. It does not need a GMEOW ontology checkout, GMEOW-specific vocabulary, music-domain profile knowledge, or agent-memory conventions. Domain profiles can add validation rules above the transport layer, but they do not change the core parse, verification, or fold path.

Narrow-waist architecture

Applications and profiles
    generic graphs | files | evidence | images | media packages | GMEOW | agent memory
        |
        v
GTS narrow waist
    CBOR Sequence segments
    deterministic-CBOR headers and frames
    BLAKE3 id/prev chains
    transform catalog
    deterministic fold
    opaque-node degradation
        |
        v
Storage and transport
    filesystem | HTTP range | object storage | artifact registries | message buses

GTS is the small stable waist. Profiles and applications sit above it; storage and distribution systems sit below it. See docs/positioning.md for the full framing.

Applications

GTS supports several use cases without being defined by any one of them:

  • Dataset and ontology distribution: publish a verifiable graph package with the binary assets it names.
  • GMEOW distribution: ship GMEOW ontology packages and profiles as GTS artifacts.
  • Archives and file manifests: package directory trees with graph-native metadata and content-addressed blobs.
  • Evidence and custody chains: append observations, signatures, and sealed payloads without rewriting prior history.
  • Local-first graph synchronization: concatenate independently produced segments and fold the value-union.
  • Agent memory: model belief revision with suppression frames while preserving the original signed history. See Python gts.examples.agent_memory and Rust gmeow_gts::examples::agent_memory.
  • Graph database interchange: hand the folded graph state to N-Quads, SQLite, DuckDB, Parquet, or other transform targets.

Install

Full parity engines:

| Language | Package | Install |
| --- | --- | --- |
| Rust | gmeow-gts (binary gts; source dir) | cargo install gmeow-gts |
| Python | gmeow-gts (module gts; source dir) | pip install gmeow-gts |
| Go | go.blackcatinformatics.ca/gts (source dir) | go install go.blackcatinformatics.ca/gts/cmd/gts@latest |
| TypeScript | @blackcatinformatics/gmeow-gts (source dir) | npm i @blackcatinformatics/gmeow-gts |
| Smalltalk/Pharo | Tonel + Metacello source package | docker build -t gmeow-gts-smalltalk smalltalk |
| Kotlin/JVM | Gradle source project | cd kotlin && gradle installDist |

Rust-backed C ABI and derived wrappers:

| Surface | Package index / remote | Source directory | Entry point |
| --- | --- | --- | --- |
| C ABI | gmeow-gts-capi on crates.io; capi-v* releases | rust/capi/ | cargo build --manifest-path rust/capi/Cargo.toml |
| C++ | Source-only wrapper | cpp/ | header-only RAII wrapper over libgts |
| .NET | NuGet Gmeow.Gts deferred for the first publication wave | dotnet/ | Gmeow.Gts P/Invoke wrapper |
| PHP | blackcatinformatics/gmeow-gts Packagist search, pending generated-root tag metadata | php/ | PHP FFI Composer package |
| Lua | gmeow-gts LuaRocks search, pending first upload | lua/ | gmeow-gts LuaRocks LuaJIT FFI module |
| Swift | Blackcat-Informatics/gmeow-gts Swift Package Index target | swift/ | Swift Package Manager wrapper via root Package.swift |
| Ruby | gmeow-gts RubyGems search, pending first gem push | ruby/ | gmeow-gts FFI gem |
| R | blackcat-informatics.r-universe.dev source index; universe config repo | r/ | gmeowgts package |
| Julia | JuliaRegistries/General#158733, pending General registry merge | julia/ | GmeowGTS.jl package |

The package family consistently uses the gmeow-gts distribution identity where ecosystem naming permits. Ecosystem-specific module/package names are shown above; the CLI binary stays gts, and GTS files keep the .gts extension. The Rust engine crate is gmeow-gts; the Rust-backed C ABI source crate is gmeow-gts-capi. Use the former for Rust library/CLI consumers and the latter when the desired artifact is libgts plus the stable gts.h ABI surface.

Quick start

Every engine exposes the same shape: read bytes into a Graph, verify the chain, fold to a value, and project to N-Quads — plus a writer for producing files.

Python

import gts
from pathlib import Path

# Read + verify + fold, then project to N-Quads or TriG
graph = gts.read(Path("package.gts").read_bytes())
print(gts.to_nquads(graph))
print(gts.to_trig(graph))

# Write a minimal graph
w = gts.Writer(profile="dist")
w.add_terms([
    gts.Term(gts.TermKind.IRI, "https://example.org/Cat"),
    gts.Term(gts.TermKind.IRI, "http://www.w3.org/2000/01/rdf-schema#label"),
    gts.Term(gts.TermKind.LITERAL, "Cat", lang="en"),
])
w.add_quads([(0, 1, 2, None)])
Path("cat.gts").write_bytes(w.to_bytes())

pip install 'gmeow-gts[rdf]' adds optional rdflib interop.

Rust

Add gmeow-gts = "0.9.9" to Cargo.toml. Optional feature builds use the standard Cargo shape gmeow-gts = { version = "0.9.9", default-features = false, features = [...] }.

use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = fs::read("package.gts")?;
    // read is total: (data, allow_segments, expected_head) -> Graph (never errors;
    // undecodable frames degrade to opaque nodes surfaced as diagnostics).
    let graph = gmeow_gts::reader::read(&bytes, false, None);
    println!("{}", gmeow_gts::nquads::to_nquads(&graph));
    println!("{}", gmeow_gts::trig::to_trig(&graph));
    Ok(())
}

cargo install gmeow-gts installs the gts binary. The Rust crate uses native RDF dataset, native RDF text-codec, native RDF/XML, and native in-memory store features rather than in-crate Oxigraph/OxRDF/Sophia adapters; CI keeps the wasm32-unknown-unknown all-features library build locked and audits that dependency tree. Advanced Rust feature flags, evented projection APIs, encryption/signing, proof generation, RDF/store adapters, and database export details live in rust/README.md.

Go

package main

import (
    "fmt"
    "os"

    "go.blackcatinformatics.ca/gts/nquads"
    "go.blackcatinformatics.ca/gts/reader"
)

func main() {
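    data, err := os.ReadFile("package.gts")
    if err != nil {
        // Report the I/O failure instead of folding an empty input.
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }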
    g := reader.Read(data, false, nil) // (bytes, allowSegments, expectedHead)
    fmt.Print(nquads.ToNQuads(g))
}

TypeScript

import { Read, toNQuads } from "@blackcatinformatics/gmeow-gts";
import { readFileSync } from "node:fs";

const graph = Read(readFileSync("package.gts"), false);
console.log(toNQuads(graph));

Requires Node.js ≥ 22.16.0; ships as ES modules with type declarations. Browser bundle details live in ts/README.md.

Smalltalk/Pharo

The Smalltalk engine is a Pharo source engine delivered as Tonel packages with a Metacello baseline and a pinned Docker runtime. It participates in the top-level conformance corpus and the six-engine interop gate, including native BLAKE3/zstd/libsodium support, deterministic CBOR read/write, COSE Sign1 and Encrypt0 helpers, MMR proof verification, OpenPGP key extraction, the files profile, streamable compaction, from-nq, and the common gts CLI verbs. Extension verbs outside the common surface, such as tar, dump, OKF, TriG, and relational exports, remain explicit parity deferrals.

docker build -t gmeow-gts-smalltalk smalltalk
docker run --rm -v "$PWD:/workspace" --entrypoint /bin/sh gmeow-gts-smalltalk -lc \
  'sh /workspace/smalltalk/scripts/run-tests.sh'

Kotlin/JVM

The Kotlin engine is a native JVM implementation with Java-callable library APIs, Gradle build, deterministic CBOR/BLAKE3 primitives, zstd/gzip codecs, COSE Sign1/Encrypt0 helpers, OpenPGP key extraction, MMR proof verification, the files profile, streamable compaction, from-nq, and the common gts CLI verbs.

import ca.blackcatinformatics.gts.read
import ca.blackcatinformatics.gts.toNQuads
import java.nio.file.Files
import java.nio.file.Path

fun main() {
    val graph = read(Files.readAllBytes(Path.of("package.gts")), allowSegments = false)
    print(toNQuads(graph))
}

cd kotlin && gradle test detekt installDist
./build/install/gmeow-gts-kotlin/bin/gmeow-gts-kotlin fold ../vectors/01-minimal.gts

Runtime support policy: Python >=3.13, Node.js >=22.16.0, and Go 1.26.4 are intentional manifest floors. Older runtimes are unsupported so the engines can share one current CI and release matrix and use current standard-library/toolchain behavior without compatibility shims.

C ABI and ecosystem wrappers

rust/capi/ builds libgts from the Rust engine and exposes a stable C-compatible ABI for runtimes that can load native libraries. The ABI returns JSON reports or owned byte buffers for:

  • ABI/version/build metadata and capability discovery;
  • read/fold and verify reports;
  • registry-driven RDF text conversion for N-Quads, N-Triples, Turtle, TriG, RDF/XML, and the deterministic JSON-LD-star profile;
  • files-profile pack, unpack, and diff helpers;
  • structured error status, code, and detail fields.

Files-profile path helpers inherit the C ABI path contract: paths are NUL-terminated UTF-8 char * values. On Windows this does not cover every native wide-character path; future wide-character entry points would be additive ABI symbols under the compatibility policy.

The native compatibility policy is documented in rust/capi/README.md#compatibility-policy. GTS_ABI_VERSION is separate from package versions and from JSON report schema versions: package releases can advance without an ABI bump, and JSON report schemas can evolve without changing the native function boundary.

Every wrapper copies returned gts_buffer values into ecosystem-owned strings or byte containers and releases native memory with gts_buffer_free; structured errors are copied before gts_error_free. Wrappers are thin bindings over the Rust engine, not independent parsers, writers, or CLI parity engines. Wrappers must reject unsupported GTS_ABI_VERSION values clearly when loading a system-provided libgts; they should not continue silently against an unknown native contract.

The wrapper smoke tests use the shared GTS-WRAPPER-SMOKE-MATRIX: clean read, damaged diagnostic read, empty/malformed refusal, ABI/build metadata, structured parse errors, and package dry-run linkage. The matrix deliberately remains separate from the six full-engine parity columns.

Run the C ABI and wrapper smoke tests from the repository root:

bash rust/capi/scripts/smoke.sh
bash cpp/scripts/smoke.sh
bash dotnet/scripts/smoke.sh
bash php/scripts/smoke.sh
bash lua/scripts/smoke.sh
bash swift/scripts/smoke.sh
bash ruby/scripts/smoke.sh
bash r/scripts/smoke.sh
bash julia/scripts/smoke.sh

Run the credential-free wrapper package dry-runs from the repository root:

bash scripts/package_dry_run_wrappers.sh

The dry-run builds local package artifacts or package metadata for the C ABI, C++, Conan, vcpkg, .NET, PHP, Lua, Swift, Ruby, R, and Julia wrapper family without registry credentials, then links package consumers back to the shared wrapper smoke matrix. CI uploads the resulting dist/package-dry-runs/ evidence from the wrapper-package-dry-runs job. The PHP portion also generates the Packagist package root, validates it with Composer, installs it into a temporary path-repository consumer, and runs a PHP FFI smoke test against libgts.

Each wrapper README documents local toolchain requirements, libgts discovery (GTS_LIBGTS, GTS_LIB_DIR, or platform loader defaults where supported), ownership rules, threading expectations, and fallback container behavior where practical.
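
A wrapper's discovery step might look roughly like the sketch below. The environment variable names `GTS_LIBGTS` and `GTS_LIB_DIR` come from the text above; the precedence order, the per-platform file names, and both helper functions are assumptions made for illustration, so check each wrapper README for the actual behaviour.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def default_library_name(platform: str = sys.platform) -> str:
    """Guess the shared-library file name (assumed names, not confirmed)."""
    if platform.startswith("win"):
        return "gts.dll"
    if platform == "darwin":
        return "libgts.dylib"
    return "libgts.so"


def resolve_libgts(env: Mapping[str, str] = os.environ,
                   platform: str = sys.platform) -> Optional[Path]:
    """Return an explicit library path, or None to defer to the platform loader."""
    explicit = env.get("GTS_LIBGTS")
    if explicit:
        # Assumed precedence: a full path wins over a directory hint.
        return Path(explicit)
    lib_dir = env.get("GTS_LIB_DIR")
    if lib_dir:
        return Path(lib_dir) / default_library_name(platform)
    return None  # fall back to platform loader defaults


print(resolve_libgts({"GTS_LIB_DIR": "/opt/gts"}, platform="linux"))
print(resolve_libgts({}))
```

Passing the environment in as a mapping keeps the lookup testable without touching the real process environment.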

Installable libgts archives are built and checked by the C ABI packaging scripts:

archive="$(bash rust/capi/scripts/package.sh)"
bash rust/capi/scripts/verify-archive.sh "${archive}"

Release archives include include/gts.h, include/gts/gts.hpp, shared/static native libraries, pkg-config and CMake metadata, license files, checksums, SBOM evidence, and provenance attestations. Wrapper packages remain source-only and resolve a locally built or separately installed libgts.

Local C/C++ package-manager dry-runs use the first-party package name gmeow-gts:

bash scripts/package_dry_run_native_managers.sh

The Conan recipe builds the Rust-backed C ABI from the source tree and packages the same install layout as the release archive. The vcpkg overlay port validates the same layout from a local checkout by setting GMEOW_GTS_SOURCE_PATH; an upstream vcpkg PR should replace that local source hook with the tagged release source and its checksum. Both package-manager checks build the shared packaging/native-consumer CMake fixture and link Gts::gts.

Command-line interface

cargo install gmeow-gts, pip install gmeow-gts, npm i -g @blackcatinformatics/gmeow-gts, go install ..., or cd kotlin && gradle installDist each install a GTS command-line engine. The common verb surface is the cross-engine contract; engine-specific extras are listed after it when present. The C ABI wrappers above are library surfaces and intentionally do not add columns to this CLI parity contract. The full API/CLI parity contract lives in docs/GTS-API-CLI-PARITY.md.

gts info <file>...            per-segment composition ledger
gts fold <file>               fold to N-Quads on stdout
gts verify <file>... [--key KID:HEXPUB]   verify chains + COSE signatures
gts verify-proof <proof.json>  verify detached MMR proof JSON without the GTS file
gts heads <file>                 emit JSON segment heads and aggregate comparison digest
gts segments <file>              emit JSON segment byte ranges and layout inventory
gts missing --from-head <head> <file>   emit JSON byte ranges needed after a peer head
gts resume --after <frame-id> <file>    emit bytes after a verified frame boundary
gts extract-key <file>        print the embedded transport/verification key
gts ls <file>...              list segment digests, sizes, and media types
gts extract <file> <digest> [-o out] [--mt TYPE] [--include-suppressed]
gts cat -o <out> <file>...    validating composer: refuse degenerate inputs, then concatenate
gts compact <file> -o <out> --streamable [--seal-original] [--timestamp ISO]
gts pack <dir|file>... -o <out>   package files/directories into a GTS files profile
gts unpack <file> [-C <dir>] [--include-suppressed]   extract a files profile
gts diff <file> <directory>       compare a files profile to a directory
gts from-nq <in.nq> [-o <out>]  build a GTS from N-Quads (inverse of fold; '-' = stdin)

Python/Rust extensions:

gts to-sqlite <file> <out>      export the folded graph to a SQLite database
gts to-duckdb <file> <out>      export to DuckDB (Rust: --features duckdb)
gts to-parquet <file> <dir>     export to Parquet (Rust: --features duckdb)

Rust-only proof creation extension:

gts prove <file> <frame-id>      emit detached JSON proof from an index.mmr root

Rust-only OKF profile extension:

gts to-okf <file> --directory <dir> [--inline-body] [--base-iri <iri>]
                                  export an OKF-profile graph to a Markdown bundle
gts from-okf <dir> [-o out] [--inline-body] [--strict-links] [--base-iri <iri>]
                                  build a GTS from an OKF Markdown bundle

Rust-only inspection export extension:

gts dump <file> --directory <dir> [--include-suppressed] [--force] [--metadata-only]
                                  expand an archive into a directory dump

Rust/Go RDF 1.2 text-codec extension:

gts to-nt <file>                fold the default graph to N-Triples
gts from-nt <in.nt> [-o out]    build a GTS from N-Triples
gts to-rdfxml <file>            fold the default graph to RDF/XML
gts from-rdfxml <in.rdf> [-o out]   build a GTS from RDF/XML
gts to-turtle <file>            fold the default graph to Turtle
gts from-turtle <in.ttl> [-o out]   build a GTS from Turtle

Rust builds expose these verbs behind --features rdf-codecs; the Go module binary ships them by default.

Rust-only tar-compatible extension:

gts tar -c[z|--zstd]f <archive.gts|archive.tar[.gz|.zst]> <dir|file>...
                                  create a GTS or tar archive by extension
gts tar -xf <archive.gts|archive.tar[.gz|.zst]> [-C <dir>]
                                  extract with refuse-dangerous defaults
gts tar -tf <archive.gts|archive.tar[.gz|.zst]>
                                  list files-profile entries
gts tar -df <archive.gts|archive.tar[.gz|.zst]> <dir>
                                  compare archive entries to a directory

Exit codes: 0 clean · 1 diagnostics or input refused · 2 usage/IO error.
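
A calling script can branch on these exit codes. The sketch below assumes a `gts` binary on `PATH` and uses only the `gts verify <file>` form listed above; the `GtsExit` enum and both helper functions are hypothetical names, not part of any GTS package.

```python
import subprocess
from enum import IntEnum


class GtsExit(IntEnum):
    """The three documented exit codes."""
    CLEAN = 0
    DIAGNOSTICS_OR_REFUSED = 1
    USAGE_OR_IO_ERROR = 2


def classify(returncode: int) -> str:
    """Map a process return code to a readable label."""
    try:
        return GtsExit(returncode).name
    except ValueError:
        return "UNKNOWN"


def verify(path: str) -> str:
    """Run `gts verify <file>` and classify the outcome.

    Raises FileNotFoundError if no `gts` binary is installed.
    """
    result = subprocess.run(["gts", "verify", path],
                            capture_output=True, text=True)
    return classify(result.returncode)


print(classify(0))
print(classify(1))
print(classify(2))
```

Treating 1 and 2 separately lets a pipeline distinguish "the file has problems" from "the command was invoked incorrectly or could not read its input".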

verify --key and extract-key are cross-engine: all six command-line engines parse the embedded OpenPGP transport key to the same fingerprint and emojihash, and verify COSE signatures identically. For example, gts extract-key prints a key's identity three ways: the key id (kid) and the grouped hex fingerprint for machines, and an emojihash for humans to compare at a glance:

$ gts extract-key signed.gts
kid:         93F32F9F1439F0FBA266331B6F4732092D747581
fingerprint: 93F3 2F9F 1439 F0FB A266 331B 6F47 3209 2D74 7581
emojihash:   🐷 🦆 🐵 🦋 🍎 🍐 🦊 🐸 🐟 🍒 🍎
-----BEGIN PGP PUBLIC KEY BLOCK-----

The emojihash (and OpenSSH-style randomart) are also published standalone as the visual-hashing crate, which the Rust engine depends on from crates.io and re-exports as gmeow_gts::emojihash.
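
In the sample output above, the `fingerprint` line is the `kid` value split into four-character groups. A short sketch of that relationship follows; both helper functions are hypothetical and only operate on the printed strings, not on key material.

```python
def group_fingerprint(kid: str, width: int = 4) -> str:
    """Split a hex key id into space-separated groups for display."""
    return " ".join(kid[i:i + width] for i in range(0, len(kid), width))


def same_key(kid: str, fingerprint: str) -> bool:
    """Compare a compact kid with a grouped fingerprint, ignoring spaces and case."""
    return fingerprint.replace(" ", "").upper() == kid.upper()


kid = "93F32F9F1439F0FBA266331B6F4732092D747581"
printed = "93F3 2F9F 1439 F0FB A266 331B 6F47 3209 2D74 7581"

print(group_fingerprint(kid))
print(same_key(kid, printed))
```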

from-nq is common across all six engines. Python, Rust, and Go also expose to-trig/from-trig for readable TriG graph-block interchange over the same folded RDF content. Rust and Go additionally expose to-nt/from-nt, to-rdfxml/from-rdfxml, and to-turtle/from-turtle for default-graph RDF text interchange through the same RDF 1.2 codec stack. Rust builds gate these RDF text-codec verbs behind --features rdf-codecs; the Go module binary ships them by default.

The Rust OKF profile extension maps Markdown bundles to verifiable GTS package bytes and back behind --features okf; see docs/GTS-OKF.md. The Rust tar extension provides tar-style -c/-x/-t/-d commands over .gts and .tar files behind --features tar, with explicit --allow-* extraction opt-ins. Tar input import and gts tar -cf out.gts ... stream regular-file payloads through bounded chunks; folded to-tar export and zstd tar output still inherit the current folded-graph/backend buffering limits. The Rust dump extension writes a versioned inspection directory with folded N-Quads, JSONL tables, unfolded frame views, blob indexes, and files-profile content without duplicating large payload bytes by default; see docs/GTS-DUMP-DIR.md.

The to-* relational exports are available in Python and Rust. Python DuckDB/Parquet exports need pip install 'gmeow-gts[db]'; Rust SQLite export shells out to sqlite3 by default. Rust DuckDB/Parquet exports are behind the no-dependency Cargo feature duckdb and shell out to the duckdb binary. Rust emits relational SQL rows directly to the runtime tool instead of building a complete SQL script in memory; transformed inline blobs are decoded only while writing the blobs row required by the stable schema.

The CLI parity matrix is checked in CI against the six implemented command dispatch surfaces.

cat is raw byte concatenation with validation added and no transformation: it refuses dirty inputs, segments that contribute nothing, and compositions whose suppressions hide every folded quad.

Engine feature matrix

| Capability | Python | Rust | Go | TypeScript | Smalltalk/Pharo | Kotlin/JVM |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline read/fold/verify | yes | yes | yes | yes | yes | yes |
| Writer | yes | yes | yes | yes | yes | yes |
| Shared conformance corpus | yes | yes | yes | yes | yes | yes |
| Deterministic-CBOR primitive/vector tests | yes | yes | yes | yes | yes | yes |
| zstd native codec | yes | yes | yes | yes | yes | yes |
| COSE signing and verification | yes | yes | yes | yes | yes | yes |
| COSE Encrypt0 helpers | yes | yes | yes | yes | yes | yes |
| Files profile pack/unpack/diff | yes | yes | yes | yes | yes | yes |
| Streamable compaction CLI | yes | yes | yes | yes | yes | yes |
| from-nq inverse | yes | yes | yes | yes | yes | yes |
| TriG transform | yes | yes | yes | no | no | no |
| Native RDF/store adapter | rdflib extra | rdf feature (native dataset model); native-store feature (native in-memory store) | no | no | no | no |
| SQLite/DuckDB/Parquet exports | yes | SQLite default; DuckDB/Parquet with duckdb feature | no | no | no | no |
| Package registry | PyPI | crates.io | Go module | npm | Tonel/Metacello source | Gradle source |

The frozen vector corpus remains the compatibility oracle. The matrix summarizes public package surfaces for the six full engines; it is not a replacement for conformance tests. The C ABI and derived wrappers reuse the Rust engine through libgts and are validated by their smoke tests rather than by adding new full-engine columns here. The API declaration and command-level contract are maintained in docs/GTS-API-CLI-PARITY.md and docs/api-parity.json.

The file format in one minute

A GTS file is a CBOR Sequence (application/cbor-seq) of one or more segments. Published GTS artifacts use application/vnd.blackcat.gts+cbor-seq; the +cbor-seq suffix records that the file is a CBOR Sequence, not a single CBOR item. Each segment is a header map followed by zero or more frame maps. The header identifies the segment version, profile set, codec catalog, optional layout, dictionary, metadata, and header id; it does not carry frame type or predecessor state. Frames carry their type (t), optional transform/public/recipient/payload fields, predecessor link (prev), frame id (id), and optional signature (sig).

Frame ids are id fields computed as BLAKE3-256 over deterministic CBOR frame content with id and sig excluded. Each prev names the previous frame id within the segment, producing a content-addressed chain whose segment head transitively commits to its history.

GTS file (CBOR Sequence)
├── segment 0
│   ├── header {gts, v, prof, cat, layout?, dct?, meta?, id}
│   ├── frame  {t, x?, pub?, to?, d?, prev, id, sig?}
│   ├── frame  {t, x?, pub?, to?, d?, prev, id, sig?}
│   └── ...
└── segment 1 (appended via `cat`)
    ├── header {gts, v, prof, cat, layout?, dct?, meta?, id}
    └── frame  {t, x?, pub?, to?, d?, prev, id, sig?}

fold(file) = value-union of all segment graphs
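The chaining rule above can be illustrated with a minimal Python sketch. It is not the reference implementation. The standard library has no BLAKE3, so `hashlib.blake2b` with a 32-byte digest stands in for BLAKE3-256, and the tiny encoder covers only the unsigned-integer, byte-string, text-string, and map cases of CBOR, using RFC 8949 core deterministic rules as an assumed profile. The `start` value that the first frame links to, the frame type `"example"`, and every function name are assumptions made for illustration; docs/GTS-SPEC.md remains the normative source.

```python
import hashlib


def cbor_head(major: int, n: int) -> bytes:
    """Encode a CBOR head using the shortest possible argument width."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * width):
            return bytes([(major << 5) | info]) + n.to_bytes(width, "big")
    raise ValueError("integer too large for CBOR")


def encode(value) -> bytes:
    """Tiny deterministic encoder: unsigned ints, bytes, text, and maps only."""
    if isinstance(value, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(value, int) and value >= 0:
        return cbor_head(0, value)
    if isinstance(value, bytes):
        return cbor_head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return cbor_head(3, len(raw)) + raw
    if isinstance(value, dict):
        # Deterministic maps sort entries by the bytes of the encoded key.
        pairs = sorted((encode(k), encode(v)) for k, v in value.items())
        return cbor_head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def frame_id(frame: dict) -> bytes:
    """Hash the frame content with `id` and `sig` excluded."""
    content = {k: v for k, v in frame.items() if k not in ("id", "sig")}
    # ASSUMPTION: BLAKE2b-256 is a stand-in; GTS specifies BLAKE3-256.
    return hashlib.blake2b(encode(content), digest_size=32).digest()


def append_frame(frames: list, start: bytes, t: str, d: bytes) -> dict:
    """Append a frame whose `prev` names the previous frame id."""
    # ASSUMPTION: the first frame links to a caller-supplied `start` value.
    prev = frames[-1]["id"] if frames else start
    frame = {"t": t, "d": d, "prev": prev}
    frame["id"] = frame_id(frame)
    frames.append(frame)
    return frame


def verify_chain(frames: list, start: bytes) -> bool:
    """Check every `prev` link and recompute every frame id."""
    expected_prev = start
    for frame in frames:
        if frame["prev"] != expected_prev or frame["id"] != frame_id(frame):
            return False
        expected_prev = frame["id"]
    return True


if __name__ == "__main__":
    start = bytes(32)
    frames = []
    append_frame(frames, start, "example", b"first payload")
    append_frame(frames, start, "example", b"second payload")
    print("chain verifies:", verify_chain(frames, start))
```

Because each id covers the `prev` field, changing any earlier frame changes its id and breaks every later link, which is why the last frame id commits to the whole segment history.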

Payloads carry a stackable codec chain; unknown codecs or held-back keys degrade a frame to an opaque node rather than failing the read. The full normative format is in docs/GTS-SPEC.md, with testable tier and vector-claim rules in docs/GTS-CONFORMANCE.md.
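As a rough illustration of the degrade-to-opaque behaviour, the sketch below undoes a codec chain and falls back to the untouched bytes when it meets a codec it does not know. The registry, the codec names `"deflate"` and `"x-unknown"`, the `("decoded" | "opaque", bytes)` result shape, and the assumption that the chain lists codecs in the order the writer applied them are all hypothetical; the actual codec catalog and opaque-node representation are defined in docs/GTS-SPEC.md.

```python
import gzip
import zlib

# Hypothetical registry of codecs this reader understands.
DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
}


def decode_payload(data: bytes, chain: list) -> tuple:
    """Undo a codec chain, or return the original bytes as opaque.

    ASSUMPTION: `chain` lists codec names in the order the writer applied
    them, so the reader undoes them in reverse.
    """
    out = data
    for name in reversed(chain):
        decoder = DECODERS.get(name)
        if decoder is None:
            # Unknown codec: keep the frame readable as an opaque blob
            # instead of failing the whole read.
            return ("opaque", data)
        out = decoder(out)
    return ("decoded", out)


if __name__ == "__main__":
    raw = b"example payload"
    wire = gzip.compress(zlib.compress(raw))
    print(decode_payload(wire, ["deflate", "gzip"])[0])
    print(decode_payload(wire, ["x-unknown", "gzip"])[0])
```

The sketch does not handle corrupt input or held-back decryption keys; a fuller reader would treat a missing key the same way as a missing decoder.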

Conformance corpus

vectors/ holds the frozen, language-neutral conformance corpus — one <name>.gts (canonical bytes) and one <name>.expected.json (oracle-folded expectation) per case (minimal files, zstd/gzip frames, unknown-codec fallback, damaged frames, torn appends, suppression, multi-segment unions, streamable compaction, …). Every engine must fold identical bytes to identical expectations — that is what makes the six implementations interchangeable.
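A conformance run over this corpus can be sketched as a loop that pairs each `<name>.gts` with its `<name>.expected.json` and compares an engine's fold against the expectation. This is a hedged illustration, not the project's test harness: `corpus_cases`, `check_engine`, and the `fold` callable (bytes in, JSON-compatible value out) are hypothetical names, and the sketch assumes both files of a case sit side by side in one directory.

```python
import json
from pathlib import Path


def corpus_cases(vectors_dir):
    """Pair each <name>.gts with <name>.expected.json; report unpaired files."""
    root = Path(vectors_dir)
    cases, orphans = [], []
    for gts_path in sorted(root.glob("*.gts")):
        expected_path = gts_path.with_name(gts_path.stem + ".expected.json")
        if expected_path.is_file():
            cases.append((gts_path, expected_path))
        else:
            orphans.append(gts_path)
    return cases, orphans


def check_engine(fold, vectors_dir):
    """Return the names of cases where fold(bytes) differs from the oracle.

    ASSUMPTION: `fold` is a hypothetical callable supplied by the engine
    under test; it takes the file bytes and returns a JSON-compatible value.
    """
    failures = []
    cases, _orphans = corpus_cases(vectors_dir)
    for gts_path, expected_path in cases:
        expected = json.loads(expected_path.read_text(encoding="utf-8"))
        if fold(gts_path.read_bytes()) != expected:
            failures.append(gts_path.name)
    return failures
```

An empty failure list from every engine over the same bytes is the property the corpus is meant to establish.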

The Python reference implementation (gts.vectors) is the single source of truth. Regenerate the committed corpus and prove it's reproducible byte-for-byte:

cd python && uv run python scripts/gen_vectors.py
git diff --exit-code vectors        # no changes ⇒ reproducible

Validate the committed aggregate/scoped manifest metadata and validator guards without stamping a release revision:

just check-vector-manifest

Conformance tiers, named vector subsets, expected-result fields, diagnostics, and read/verify modes are defined in docs/GTS-CONFORMANCE.md.

Current CI-gated conformance status:

| Engine | Baseline Reader | Streaming / Prefix Evidence | Writer | Validating Tool | Profile-Aware Tool |
| --- | --- | --- | --- | --- | --- |
| Rust | wire-core, total-reader, graph-fold | read_to_sink_from_reader non-materializing sink API plus corpus equivalence and memory-helper gate | deterministic compact oracle 25b | CLI verify diagnostics | files profile pack/unpack/diff in interop |
| Python | corpus oracle and regenerated expected JSON | prefix-fold Python tests | source generator and compact oracle 25b | CLI verify diagnostics | files profile pack/unpack/diff in interop |
| Go | wire-core, total-reader, graph-fold | reader.ReadToSink non-materializing sink API plus corpus equivalence gate; fuzz seeded from vectors | writer and compact tests | CLI verify diagnostics | files profile pack/unpack/diff in interop |
| TypeScript | wire-core, total-reader, graph-fold | browser foldStreamToSink non-materializing sink API plus corpus equivalence and memory-helper gate; foldStream remains graph-returning | writer and compact tests | CLI verify diagnostics | files profile pack/unpack/diff in interop |
| Smalltalk/Pharo | wire-core, total-reader, graph-fold via SUnit top-level corpus | streamable layout checks and interop evidence; no non-materializing Streaming Reader claim | deterministic writer, from-nq, compact oracle 25b, and files pack byte identity | CLI verify diagnostics plus COSE/MMR/OpenPGP vector tests | files profile pack/unpack/diff in interop |
| Kotlin/JVM | wire-core, total-reader, graph-fold via Gradle tests | streamable layout checks and interop evidence; no non-materializing Streaming Reader claim | deterministic writer, from-nq, compact oracle 25b, and files pack byte identity | CLI verify diagnostics plus COSE/MMR/OpenPGP vector tests | files profile pack/unpack/diff in interop |

Repository layout

gmeow-gts/
├── rust/        # Rust crate `gmeow-gts` + `gts` binary (pure Rust, wasm-friendly)
├── rust/capi/   # Rust-backed C ABI (`libgts`, gts.h, pkg-config/CMake metadata)
├── python/      # Python package `gmeow-gts` (module `gts`) + reference corpus generator
├── go/          # Go module go.blackcatinformatics.ca/gts
├── ts/          # TypeScript/npm package @blackcatinformatics/gmeow-gts
├── smalltalk/   # Pharo Tonel/Metacello engine + Docker CLI runtime
├── kotlin/      # Kotlin/JVM Gradle engine + CLI runtime
├── cpp/         # Header-only C++ RAII wrapper over the C ABI
├── dotnet/      # .NET P/Invoke wrapper over the C ABI
├── php/         # PHP FFI wrapper over the C ABI
├── lua/         # LuaJIT FFI wrapper over the C ABI
├── swift/       # Swift Package wrapper over the C ABI
├── ruby/        # Ruby FFI gem wrapper over the C ABI
├── r/           # R package wrapper over the C ABI
├── julia/       # Julia package wrapper over the C ABI
├── vectors/     # Frozen conformance corpus (*.gts + *.expected.json)
├── docs/        # GTS-SPEC.md (normative) + gts-reference.md
└── .github/     # CI (six parity engines, C ABI wrapper smoke tests, release workflows)

Building from source

Each implementation builds and tests independently from its own directory:

cd rust   && cargo test                              # unit + CLI + conformance
cd go     && go test ./...                            # unit + conformance
cd ts     && npm ci && npm test                       # compiles, runs against vectors/
cd python && uv sync --extra rdf && uv run pytest     # reference + conformance
cd kotlin && gradle test detekt                       # JVM parity tests + static analysis
docker build -t gmeow-gts-smalltalk smalltalk && \
  docker run --rm -v "$PWD:/workspace" --entrypoint /bin/sh gmeow-gts-smalltalk -lc \
  'sh /workspace/smalltalk/scripts/run-tests.sh'      # Pharo parity tests
bash rust/capi/scripts/smoke.sh                       # C ABI
bash cpp/scripts/smoke.sh                             # C++ wrapper
bash dotnet/scripts/smoke.sh                          # .NET wrapper
bash php/scripts/smoke.sh                             # PHP wrapper
bash lua/scripts/smoke.sh                             # Lua wrapper
bash swift/scripts/smoke.sh                           # Swift wrapper
bash ruby/scripts/smoke.sh                            # Ruby wrapper
bash r/scripts/smoke.sh                               # R wrapper
bash julia/scripts/smoke.sh                           # Julia wrapper

Or use the justfile: just test (all engines), just lint, just fmt, just gen-vectors, just check-vector-manifest, just interop, just fuzz-rust / just fuzz-go, just property-py, just property-ts, just audit, just wasm.

The property jobs use bounded defaults on pull requests and larger budgets on scheduled runs. Replay the local equivalents with:

cd python && GTS_PROPERTY_EXAMPLES=300 uv run pytest tests/test_properties.py
cd ts && GTS_PROPERTY_RUNS=500 GTS_PROPERTY_SEED=20260623 npm run test:property

Repo-wide hygiene (formatting, SPDX/REUSE headers, YAML/Markdown/shell, secrets) runs through pre-commit run --all-files. CI runs Rust, Python, Go, and TypeScript on Linux, macOS, and Windows; the Smalltalk/Pharo and Kotlin/JVM parity jobs on Linux; and the C ABI and wrapper smoke tests where their toolchains are practical. It also runs a live six-engine interop check (each parity engine reads every other's output), reader fuzzing, and per-ecosystem supply-chain scanning. Changes are tracked in CHANGELOG.md.

Versioning & releases

Each engine publishes to its native registry from this repo via a tag-triggered workflow:

| Engine | Registry | Release tag | Workflow |
| --- | --- | --- | --- |
| Rust | gmeow-gts on crates.io (trusted publishing) | `rust-v*` | release-cargo.yaml |
| Python | gmeow-gts on PyPI (trusted publishing) | `py-v*` | release-pypi.yml |
| Go | go.blackcatinformatics.ca/gts plus GitHub Releases | `go/v*` | release-go.yaml |
| TypeScript | @blackcatinformatics/gmeow-gts on npm (provenance) | `npm-v*` | release-npm.yaml |
| C ABI source crate | gmeow-gts-capi on crates.io (bootstrap token first publish) | `capi-v*` | release-cargo-capi.yaml |
| C ABI native assets | `capi-v*` GitHub Releases (immutable archives) | `capi-v*` | release-capi.yaml |
| Lua wrapper | gmeow-gts LuaRocks search; pending first upload | `lua-v*` | release-luarocks.yaml |
| Ruby | gmeow-gts RubyGems search; pending first gem push | `ruby-v*` | release-rubygems.yaml |

Rust crate publication uses crates.io Trusted Publishing through GitHub Actions OIDC. Configure the gmeow-gts Trusted Publisher entry with owner/repo Blackcat-Informatics/gmeow-gts, workflow release-cargo.yaml, and environment (none). The normal Rust release path does not require a CARGO_REGISTRY_TOKEN repository secret.

The visual-hashing crate now publishes from its standalone repository: https://github.com/Blackcat-Informatics/visual-hashing. Its Trusted Publisher entry should use owner/repo Blackcat-Informatics/visual-hashing, workflow release.yml, and environment (none). The historical monorepo visual-hashing-v* release lane is retired.

The first gmeow-gts-capi crates.io publish uses the temporary CARGO_REGISTRY_TOKEN bootstrap secret in release-cargo-capi.yaml because crates.io Trusted Publishing can be configured only after the crate exists. File and complete the follow-on Trusted Publishing migration after that first version is visible on crates.io, then remove the bootstrap-token path.

Each release workflow verifies the tag matches the manifest version before publishing. The C ABI archive lane publishes installable libgts archives for wrapper ecosystems. Registry release automation for the source-only wrapper packages is intentionally separate unless a wrapper README or future release workflow says otherwise.

Release artifacts carry GitHub SLSA provenance attestations. Go archives and C ABI archives plus registry-hosted Rust, Python, and TypeScript package files also carry SPDX SBOM attestations. Go and C ABI releases are immutable GitHub Releases that attach archives, checksums, and SPDX SBOMs as durable assets. Registry-hosted package files keep their durable provenance and SBOM evidence in GitHub's attestation store. Verify provenance with gh attestation verify <file> --repo Blackcat-Informatics/gmeow-gts; verify the SBOM predicate with --predicate-type https://spdx.dev/Document/v2.3.

Swift Package Manager publication uses the repository root Package.swift, the plain semantic-version tag lane such as 0.9.4, and manual Swift Package Index submission of https://github.com/Blackcat-Informatics/gmeow-gts.git after the tag exists.

RubyGems publication uses RubyGems Trusted Publishing through GitHub Actions OIDC. Configure the gmeow-gts pending Trusted Publisher entry with owner/repo Blackcat-Informatics/gmeow-gts, workflow release-rubygems.yaml, and environment (none). The Ruby gem is source-only and expects libgts to be provided by the host at runtime.
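The tag-versus-manifest check that each release workflow performs can be pictured with a small Python sketch. The tag prefixes come from the release table in this section; the function names, the engine keys, and the idea of passing the manifest version in as a plain string are assumptions for illustration, and the real workflows may implement the check differently.

```python
# Tag prefixes taken from the release table; the dictionary keys are
# hypothetical labels chosen for this sketch.
TAG_PREFIXES = {
    "rust": "rust-v",
    "python": "py-v",
    "go": "go/v",
    "typescript": "npm-v",
}


def version_from_tag(engine: str, tag: str) -> str:
    """Strip the engine's tag prefix and return the version part."""
    prefix = TAG_PREFIXES[engine]
    if not tag.startswith(prefix) or len(tag) == len(prefix):
        raise ValueError(f"tag {tag!r} does not match {prefix}<version>")
    return tag[len(prefix):]


def tag_matches_manifest(engine: str, tag: str, manifest_version: str) -> bool:
    """True when the tag's version equals the version in the manifest."""
    return version_from_tag(engine, tag) == manifest_version


if __name__ == "__main__":
    # A mismatch like this is the case that should stop a publish.
    print(tag_matches_manifest("rust", "rust-v0.9.4", "0.9.4"))
    print(tag_matches_manifest("go", "go/v0.9.4", "0.9.5"))
```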

Wrapper package registry names and public release surfaces are:

| Wrapper surface | Registry/package | Version or tag shape | Verification status |
| --- | --- | --- | --- |
| C ABI source crate | gmeow-gts-capi on crates.io | `capi-v<version>` | Downloaded and attestation-checked by wrapper verifier |
| C ABI native assets | `capi-v<version>` GitHub Release | `capi-v<version>` | Downloaded, checksum-checked, release-verified, and attestation-checked |
| .NET | NuGet Gmeow.Gts deferred; source directory | Not in the first wrapper publication wave | Source-tree and CI-smoke support only |
| PHP | blackcatinformatics/gmeow-gts Packagist search; generated package-root commit | `<version>` or `v<version>` on the generated package-root commit | Metadata and source reference checked after generated-root tag metadata appears |
| LuaJIT | gmeow-gts LuaRocks search; source directory | `<version>-1` from `lua-v<version>` | Root manifest and rockspec download checked after first upload |
| Swift | Blackcat-Informatics/gmeow-gts Swift Package Index target | Plain semantic version tag, such as `<version>` | Git tag checked and canonical SPI package URL recorded |
| Ruby | gmeow-gts RubyGems search; source directory | `ruby-v<version>` | Metadata, .gem download, provenance, and SBOM attestations checked after first gem push |
| R | blackcat-informatics.r-universe.dev source index; gmeowgts source dir | `<version>` | PACKAGES index and source tarball checked after r-universe build pickup |
| Julia | JuliaRegistries/General#158733, then General registry GmeowGTS after merge | `<version>` | General registry package identity and version checked after registry PR merge |
| Conan/vcpkg | First-party package name gmeow-gts; native packaging docs and vcpkg overlay port | tagged source archive when upstreamed | Local dry-runs only until upstream recipes land |

At first-wave tracker closeout on 2026-06-22, the C ABI distribution, wrapper dry-run, registry-prep, release-verification, and per-ecosystem child issues for the source-only wrapper wave were closed. The .NET/NuGet lane was intentionally skipped for this wave; the .NET wrapper remains supported in the source tree and smoke tests. Julia General, R-universe, Swift Package Index, Packagist, LuaRocks, RubyGems, crates.io, Conan, and vcpkg follow their documented registry or upstream-review paths after the repository-side package surfaces and dry-runs are in place.

All wrapper packages are source-only bindings over the Rust C ABI. They do not bundle libgts and they are not independent GTS engines; users must install or build the matching libgts archive from the C ABI release lane.

The current SLSA posture is documented in GTS-RELEASE-SLSA.md: artifact attestations are treated as SLSA v1.0 Build Level 2 evidence, and Build Level 3 is not claimed until release builds move behind hardened reusable workflows and artifacts verify against the intended signer workflow identity.

Maintainers can run the public release smoke verifier after all tag workflows finish:

just verify-release <version> <visual-hashing-version>

Before publication or while registries are still propagating, run the deterministic planned-check report:

just verify-release-dry-run <version> <visual-hashing-version>

After wrapper package publication, use the wrapper-aware verifier:

just verify-wrapper-release-dry-run <version> <visual-hashing-version>
just verify-wrapper-release <version> <visual-hashing-version>

The same check is available as the manual verify-release.yml workflow. Enable the include_wrapper_packages input for wrapper releases. The verifier downloads the PyPI wheel/sdist, npm tarball, crates.io packages, wrapper artifacts where a downloadable registry artifact exists, and Go/C ABI release assets; verifies registry hashes/signatures/provenance; checks package metadata repository, homepage, and source-directory links where registries expose them; checks GitHub SLSA and SPDX SBOM attestations where release lanes generate them; and writes Markdown/JSON summaries under dist/release-verification/<version>/. The structured report keeps severity separate from release status so registry lag is reported as pending, bad metadata as metadata-mismatch, absent artifacts as missing, and visible release surfaces as published. For historical releases that predate SBOM and immutable-release hardening, pass --allow-legacy-release-gaps explicitly and treat warnings as release-record caveats.
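One way to picture a report that keeps severity separate from release status is the Python sketch below. Only the four status strings come from the text above; the `Severity` levels, the `Check` record, and both helper functions are hypothetical and do not describe the verifier's actual JSON schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Release status values named in the verifier description."""
    PENDING = "pending"                      # registry lag
    METADATA_MISMATCH = "metadata-mismatch"  # bad metadata
    MISSING = "missing"                      # absent artifact
    PUBLISHED = "published"                  # visible release surface


class Severity(Enum):
    """ASSUMPTION: illustrative severity levels, not the verifier's own."""
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class Check:
    surface: str        # for example "PyPI wheel"
    status: Status      # what was observed
    severity: Severity  # how much it matters for this run


def status_counts(checks) -> Counter:
    """Count checks per status string, independent of severity."""
    return Counter(check.status.value for check in checks)


def has_errors(checks) -> bool:
    """Severity alone decides whether the run should be treated as failed."""
    return any(check.severity is Severity.ERROR for check in checks)
```

Keeping the two fields apart lets the same `pending` status be informational during propagation and an error once a deadline has passed, without changing what was observed.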

Specification & docs

GTS is the primary distribution method for GMEOW, but GTS does not depend on GMEOW. The format and these engines stand on their own.

Contributing

Issues and pull requests are welcome. See CONTRIBUTING.md for the workflow; before opening a PR, run the relevant engine's tests and pre-commit run --all-files. Please also read the CODE_OF_CONDUCT.md. To report a vulnerability, follow SECURITY.md (do not open a public issue).

Contributions are accepted under the project's open licenses (Apache-2.0 OR MIT); see LICENSING.md and CONTRIBUTING.md for the terms.

License

Triple-licensed: MIT OR Apache-2.0 OR proprietary. Use this software under the terms of MIT or Apache-2.0, at your option. A separate commercial/proprietary license is also available — see LICENSING.md.

Every source file carries an SPDX MIT OR Apache-2.0 license header.

Copyright © 2026 Blackcat Informatics® Inc.
