docs: add draft threat model + SECURITY.md/AGENTS.md discoverability#3734
docs: add draft threat model + SECURITY.md/AGENTS.md discoverability#3734potiuk wants to merge 7 commits into
Conversation
| | --- | --- | --- | --- | | ||
| | Object-graph serialization (native, per language) | `fory.serialize` / `deserialize` | the core; instantiates registered types from bytes | **In** *(documented)* | | ||
| | Cross-language (xlang) serialization | xlang `serialize`/`deserialize` | type mapping across languages | **In** *(documented)* | | ||
| | Row format / zero-copy | row encoders | reads fields in place from a buffer | **In** *(documented)* | |
There was a problem hiding this comment.
Row Format is out of scope of untrusted data, it must be used with trusted data. In https://github.com/apache/fory/blob/main/docs/security/deserialization.md, we clarify for usage for trusted data only.
|
|
||
| 7. Confirm Fory disclaims payload integrity/authenticity/confidentiality (no MAC/sig/encryption) and is not a sandbox for registered types' own logic. → §9. | ||
| 8. Any other recurring scanner/fuzzer false positives the PMC already knows about, to seed §11a (e.g. reflection/Unsafe usage, codegen)? → §11a. | ||
| 9. **Meta:** Fory has no in-repo `SECURITY.md` and an `AGENTS.md` that is a developer/agent guide. This engagement adds `SECURITY.md` + `THREAT_MODEL.md` and wires `AGENTS.md → SECURITY.md → THREAT_MODEL.md`. Confirm the model should live in-repo (as proposed) vs. on the website, and who owns revisions. The existing config-guide "Security" section becomes a pointer to this model. → §1. |
There was a problem hiding this comment.
We have https://github.com/apache/fory/blob/main/docs/security/deserialization.md now, could the THREAD_MODEL.md be moved into docs/security/threat-model.md ?
| ## §3 Out of scope (explicit non-goals) | ||
|
|
||
| - **The integrity / authenticity / confidentiality of the serialized bytes.** Fory deserializes what it is given; it does not authenticate, MAC, or encrypt payloads. If bytes can be tampered with in transit/at rest, that is the application's problem to solve (sign/encrypt before handing to Fory). *(inferred)* | ||
| - **Anything when the caller disables class registration on an untrusted payload source.** `requireClassRegistration(false)` is a documented, deliberately-available footgun; using it against attacker-controlled bytes is out of the model's protection (see §5a/§9). *(documented — config: "Disabling may allow unknown classes to be deserialized, potentially causing security risks")* |
There was a problem hiding this comment.
Disabling class registration should take arbitrary class/gadget materialization out of the model, but not every security outcome. The current deserialization model still treats runtime safety, disproportionate allocation, cleanup/retained state, and explicit policy bypasses as security boundaries for untrusted bytes. A crash, OOM, retained-state bug, or native memory-safety issue should not become out-of-model only because the reproducer uses registration disabled.
| - **Registered-type-only instantiation (default).** With `requireClassRegistration(true)` (the default), deserialization instantiates only types the application registered, so attacker bytes cannot drive Fory to construct an arbitrary class. *Violation symptom:* an unregistered/unexpected type is instantiated from input under the default config. *Severity:* security-critical (this is the deserialization-RCE defense). *(documented that registration is required by default + that disabling causes risk; the unbypassability guarantee is the claim to confirm)* | ||
| - **Bounded recursion depth.** Deserialization beyond `maxDepth` (default 50) throws rather than recursing unbounded. *Violation symptom:* stack overflow / unbounded recursion from crafted nesting under the default. *Severity:* security-critical (DoS). *(documented — config table)* | ||
| - **Memory safety on malformed input — language-conditional.** In managed-runtime implementations, malformed bytes yield an exception, not memory corruption. For the **C++** implementation this is the load-bearing property to confirm (malformed-input fuzzing of the C++ decoder). *Violation symptom:* OOB read/write, crash. *Severity:* security-critical. *(inferred — confirm per language)* | ||
| - **Resource bounds beyond depth — UNSPECIFIED.** Whether a crafted payload can force large allocation / CPU blowup within the depth limit (e.g. huge declared collection sizes) is a bug or expected is open; the model needs a line (§14). *(inferred; maxDepth documented)* |
There was a problem hiding this comment.
This is stale against the current security doc. docs/security/deserialization.md now draws the resource line: no disproportionate allocation before bytes are supplied or proven readable, no stream buffer growth to attacker-declared sizes before exact read/skip, and proportional checks before collection preallocation. I would replace this open question with a link to that model, otherwise future triage will treat an already-settled rule as undefined.
|
|
||
| What the project treats as in scope and out of scope, the security | ||
| properties it provides and disclaims, the adversary model, and how | ||
| findings are triaged are documented in [THREAT_MODEL.md](./THREAT_MODEL.md). |
There was a problem hiding this comment.
Since main now has docs/security/, I would avoid making a new top-level threat model the only policy link from SECURITY.md. Root SECURITY.md is the right place for the private reporting channel and GitHub discoverability; the model entry point should be docs/security/index.md, with the detailed deserialization rules under docs/security/deserialization.md. That keeps scanners and humans on one security-doc hierarchy instead of splitting authority between root and docs/security.
|
@potiuk I think we can remove duplciate/stale content with deserialization.md, then move THREAT_MODEL.md into docs/security//thread-model.md? Will it still work for security scan and discoverability? |
|
Both changes are fine — go for it. The scan agent doesn't care where the file physically lives; what it follows is the discoverability chain: AGENTS.md -> SECURITY.md -> the threat model. So moving the file under docs/security/ works as long as the pointer in SECURITY.md is updated to the new path. Since AGENTS.md already lists docs/security/deserialization.md as loadable guidance, adding the threat-model path there too is the natural home for it. On the dedup: removing content that just restates deserialization.md is good hygiene — that doc can stay the authoritative source for the untrusted-deserialization boundary, and the threat model can reference it rather than duplicate it. The thing to keep in the threat model is the project-level framing the scan needs: what Fory considers in vs. out of scope, the trust boundaries and user roles, and the known false-positives — i.e. the parts deserialization.md doesn't already cover. One tiny thing: the path you wrote is "thread-model.md" — I'm assuming that's a typo for "threat-model.md". Whatever filename you settle on is fine, just keep SECURITY.md pointing at it. Happy to update the PR to match once you've decided on the final layout, or you can push directly — either works. |
Generated-by: Claude Code
Generated-by: Claude Code
a4b40e5 to
78ea160
Compare
What this is
A draft threat model for Apache Fory, proposed by the ASF Security team for the Fory PMC to review, correct, or reject. It is a starting point for discussion, not a finished document.
This PR:
THREAT_MODEL.md— the draft model, following the ASF Security threat-model rubric;SECURITY.md— a short security policy that links the threat model;## Securitysection toAGENTS.md, so the chainAGENTS.md → SECURITY.md → THREAT_MODEL.mdis mechanically discoverable by automated security scanners.How to read it
Every claim is provenance-tagged: (documented) (from Fory's own docs/repo), (inferred) (reasoned from the architecture, not yet confirmed), (maintainer) (confirmed by the PMC). This v0 is ~20 documented / ~26 inferred. The §14 Open questions section collects every inferred claim into waves for the PMC to confirm or correct — that is where review time is best spent. The highest-impact ones:
requireClassRegistration(true), only registered types are instantiated from untrusted bytes" is a committed property, and whether findings that requirerequireClassRegistration(false)are out-of-model / non-default (wave 1);maxDepth(waves 2).Nothing here is a requirement — the model is for the PMC to own. Comment inline, edit the branch, or reply on the email thread.
AI Usage Disclosure
THREAT_MODEL.mdwas drafted by the ASF Security team's threat-model tooling (Claude) from Apache Fory's public documentation and repository, following the Scovetta rubric.SECURITY.mdand theAGENTS.mdSecurity section are templated scaffolding.