feat(vex): synthesize matches from SBOM for affected VEX statements#3464

Open
xnox wants to merge 2 commits into anchore:main from xnox:vex-affected

xnox commented May 26, 2026

Contributor

Summary

Lets VEX affected / under_investigation statements add findings for
packages present in the SBOM but absent from grype's vulnerability
database. This closes the gap that currently keeps govulncheck-style
VEX docs from round-tripping into grype output.

  • OpenVEX: after the existing ignored-match promotion loop, walk
    the package catalog and synthesize a match.Match for each statement
    whose product (or product+subcomponent) purl names a package. Version
    matching is exact, or wildcard when the statement omits a version —
    no implicit ranges, matching the OpenVEX spec.
  • CSAF: same synthesis, with status-aware version semantics from
    the spec:
    • last_affected → pkg.version <= stmt.version (ceiling)
    • first_affected → pkg.version >= stmt.version (floor)
    • known_affected / recommended / under_investigation → exact
    • fixed / known_not_affected → never synthesize
      Comparisons use grype/version via pkg.VersionFormat, so they are
      ecosystem-aware (semver, deb, rpm, apk, go-module, …). Statement
      qualifiers must be a subset of the package's qualifiers; type,
      namespace and name must match exactly.
  • The vexProcessorImplementation interface and ApplyVEX now take
    []pkg.Package, plumbed through findVEXMatches.
  • Synthesis keys on (vuln ID, package purl) and skips pairs already
    present in remaining or ignored matches, so the path never duplicates
    a DB-backed finding.
  • Behavior is gated by existing config (vex-add: [affected, under_investigation] + a matching ignore rule), so default scans are
    unchanged.
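
The purl matching rules in the list above (exact type/namespace/name, statement qualifiers as a subset, OpenVEX exact-or-wildcard versions) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `purl` struct and the functions `identityMatches` and `openVEXVersionMatches` are hypothetical names, and the purls are assumed to be parsed already.

```go
package main

import "fmt"

// purl is a hypothetical, already-parsed package URL used only for this sketch.
type purl struct {
	Type       string
	Namespace  string
	Name       string
	Version    string
	Qualifiers map[string]string
}

// identityMatches reports whether a statement purl can name a package:
// type, namespace and name must be equal, and every statement qualifier
// must appear on the package with the same value (subset rule).
func identityMatches(stmt, pkg purl) bool {
	if stmt.Type != pkg.Type || stmt.Namespace != pkg.Namespace || stmt.Name != pkg.Name {
		return false
	}
	for k, want := range stmt.Qualifiers {
		if got, ok := pkg.Qualifiers[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// openVEXVersionMatches applies the OpenVEX rule from the summary:
// exact version, or wildcard when the statement omits a version.
func openVEXVersionMatches(stmt, pkg purl) bool {
	return stmt.Version == "" || stmt.Version == pkg.Version
}

func main() {
	pkg := purl{
		Type: "golang", Namespace: "golang.org/x", Name: "net", Version: "v0.53.0",
		Qualifiers: map[string]string{"goos": "linux", "goarch": "amd64"},
	}
	stmt := purl{
		Type: "golang", Namespace: "golang.org/x", Name: "net",
		Qualifiers: map[string]string{"goos": "linux"},
	}
	// The statement omits a version and its qualifiers are a subset: match.
	fmt.Println(identityMatches(stmt, pkg) && openVEXVersionMatches(stmt, pkg)) // true
}
```

Keeping the identity check separate from the version check mirrors the split in the description, where CSAF reuses the same identity rule with different version semantics.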

Closes #3145, completes the augment phase from #1365.

Motivation

Stock grype:

$ grype ./step
No vulnerabilities found

govulncheck against the same binary:

$ govulncheck -mode=binary ./step
Vulnerability #1: GO-2026-5030
    Found in: golang.org/x/net@v0.53.0
    Fixed in: golang.org/x/net@v0.55.0
… (and 5 more in x/net)

With this change plus an OpenVEX doc declaring those vulns affected:

$ grype ./step --vex affected.vex.json -c grype.yaml
NAME              INSTALLED  TYPE       VULNERABILITY  SEVERITY
golang.org/x/net  v0.53.0    go-module  GO-2026-5025   Unknown
golang.org/x/net  v0.53.0    go-module  GO-2026-5026   Unknown
… (one row per VEX statement)

The same machinery works for CSAF documents, where last_affected
produces a ceiling match (e.g. SBOM v0.50.0 matches last_affected v0.99.0 but is excluded by last_affected v0.10.0).
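
As a rough sketch of the status-aware comparison described above: the code below assumes a simplified dotted-numeric comparison (`compareVersions`, a hypothetical stand-in for grype/version, which is ecosystem-aware) and a hypothetical `versionMatches` helper. Treating an empty statement version as a wildcard is also an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareVersions compares dotted numeric versions such as "v0.53.0".
// It is a simplified stand-in for grype/version and ignores pre-release
// tags and ecosystem-specific rules.
func compareVersions(a, b string) int {
	as := strings.Split(strings.TrimPrefix(a, "v"), ".")
	bs := strings.Split(strings.TrimPrefix(b, "v"), ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// versionMatches applies the CSAF status semantics listed in the summary.
func versionMatches(status, pkgVersion, stmtVersion string) bool {
	switch status {
	case "fixed", "known_not_affected":
		return false // never synthesize
	}
	if stmtVersion == "" {
		return true // assumed wildcard when the statement omits a version
	}
	c := compareVersions(pkgVersion, stmtVersion)
	switch status {
	case "last_affected":
		return c <= 0 // ceiling
	case "first_affected":
		return c >= 0 // floor
	case "known_affected", "recommended", "under_investigation":
		return c == 0 // exact
	}
	return false
}

func main() {
	fmt.Println(versionMatches("last_affected", "v0.50.0", "v0.99.0")) // true
	fmt.Println(versionMatches("last_affected", "v0.50.0", "v0.10.0")) // false
}
```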

Test plan

  • go test ./grype/vex/... ./grype/ — passes, including the new
    unit tests
  • go build ./... and go vet ./grype/... clean
  • End-to-end: built grype with these changes, ran against a real
    Go binary (step) with both OpenVEX and CSAF documents, confirmed
    synthesized findings appear in table and json output formats
  • Test_UnaffectedFiltering still passes (verified independently;
    any local breakage was caused by a stale auto-generated
    listing.xxh64 fixture, not by this change)
  • Coverage on touched packages: grype/vex/csaf 46.3% → 62.8%,
    grype/vex/openvex 63.4% → 77.1%; no regression elsewhere

New tests

grype/vex/openvex/implementation_test.go:

  • TestAugmentMatches_SynthesizesFromPackageCatalog — 7 cases:
    affected synthesizes; under_investigation synthesizes; not_affected /
    fixed do not; purl mismatch does not; empty catalog does not;
    non-matching vulnerability in ignore rule does not.
  • TestAugmentMatches_DoesNotDuplicateExistingMatches — dedup against
    an existing DB-backed match.

grype/vex/csaf/implementation_test.go:

  • TestPackageMatchesStatement — 16 cases covering ceiling/floor/
    exact/wildcard plus name/namespace/type mismatches across go-module
    versions.
  • TestAugmentMatches_SynthesizesFromPackageCatalog — 9 cases covering
    each affected-like status (last_affected, first_affected,
    known_affected) against lower/equal/higher SBOM versions, plus
    fixed and known_not_affected negatives.
  • TestAugmentMatches_DoesNotDuplicateExistingMatches_CSAF — dedup.

🤖 Generated with Claude Code

Previously, AugmentMatches could only promote a vulnerability that grype's
DB had already found and that another rule had filtered into the ignored
list. When the DB had no record of a (vulnerability, package) pair, an
"affected" VEX statement naming that package was silently ignored, even
though the statement is the strongest possible claim that the package is
vulnerable. This left a visible gap versus tools like govulncheck, which
report findings the grype DB simply does not carry.

This change lets VEX `affected` / `under_investigation` statements
synthesize a finding directly from the SBOM:

  * The vexProcessorImplementation interface and ApplyVEX now receive the
    package catalog, plumbed through findVEXMatches in the vulnerability
    matcher.
  * OpenVEX: after the existing ignored-match loop, walk the catalog and
    add a match for each statement that names a package by purl. Version
    matching is exact (or wildcard when the statement omits a version),
    matching the OpenVEX spec — no implicit ranges.
  * CSAF: same synthesis loop, but with status-aware version semantics
    that follow the CSAF spec:
      - last_affected   → pkg.version <= stmt.version  (ceiling)
      - first_affected  → pkg.version >= stmt.version  (floor)
      - known_affected / recommended / under_investigation → exact
      - fixed / known_not_affected → never synthesize
    Comparisons use grype/version with pkg.VersionFormat, so they are
    ecosystem-aware (semver, deb, rpm, apk, go-module, etc.). Statement
    qualifiers must be a subset of the package's qualifiers; type,
    namespace and name must match exactly.

Dedup: synthesis keys on (vulnerability ID, package purl) and skips any
pair already present in the remaining or ignored match sets, so the new
path never duplicates a DB-backed finding.
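
A minimal sketch of that dedup step, assuming hypothetical `finding` and `vulnPackageKey` types (the real code works on grype's match types):

```go
package main

import "fmt"

// vulnPackageKey is a hypothetical comparable key: (vulnerability ID, package purl).
type vulnPackageKey struct {
	vulnID string
	purl   string
}

// finding is a hypothetical, flattened stand-in for a grype match.
type finding struct {
	VulnID string
	PURL   string
}

// dedupCandidates drops every candidate whose (vuln ID, purl) pair is already
// present in the remaining or ignored sets, or was already emitted.
func dedupCandidates(remaining, ignored, candidates []finding) []finding {
	seen := make(map[vulnPackageKey]struct{}, len(remaining)+len(ignored))
	for _, f := range remaining {
		seen[vulnPackageKey{f.VulnID, f.PURL}] = struct{}{}
	}
	for _, f := range ignored {
		seen[vulnPackageKey{f.VulnID, f.PURL}] = struct{}{}
	}
	var out []finding
	for _, c := range candidates {
		k := vulnPackageKey{c.VulnID, c.PURL}
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	net := "pkg:golang/golang.org/x/net@v0.53.0"
	remaining := []finding{{"GO-2026-5025", net}}
	candidates := []finding{{"GO-2026-5025", net}, {"GO-2026-5026", net}}
	// Only the pair that is not DB-backed survives.
	fmt.Println(dedupCandidates(remaining, nil, candidates))
}
```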

Behavior is gated by the existing VEX configuration: users still need
`vex-add: [affected, under_investigation]` plus a matching ignore rule
for the synthesized matches to surface, so default scans are unchanged.

Tests:
  * grype/vex/openvex/implementation_test.go covers exact-match
    synthesis, status filtering, purl mismatch, empty catalog,
    ignore-rule vulnerability filtering, and dedup against existing
    matches.
  * grype/vex/csaf/implementation_test.go adds TestPackageMatchesStatement
    (16 cases for ceiling/floor/exact/wildcard + identity mismatches),
    TestAugmentMatches_SynthesizesFromPackageCatalog (9 cases exercising
    the statuses against lower/equal/higher SBOM versions), and a dedup
    test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@surgut.co.uk>
Comment thread grype/vex/csaf/csaf.go Outdated
The affected/under_investigation synthesis added in the previous commit
walked the whole package catalog for every VEX statement, and the
innermost comparison re-parsed PURLs on each iteration (OpenVEX via
go-vex PurlMatches -> 2x packageurl.FromString; CSAF via
packageMatchesStatement -> 2x packageurl.FromString). The result was an
O(statements x packages) loop with O(statements x packages) PURL parses.
When the number of statements grows with the catalog (as in the
benchmarks below, one statement per package), cost grows quadratically
with catalog size. At ~1000 packages this added over a second of pure VEX
work, and it roughly quadrupled each time the catalog doubled.

This commit keeps the matching semantics identical but stops re-scanning
and re-parsing:

  * Package PURLs are parsed once and bucketed by (type, namespace, name)
    identity. PurlMatches / packageMatchesStatement both require those
    three to be equal, so a statement can only ever match packages sharing
    that key. Each statement now consults just the relevant bucket(s)
    instead of the entire catalog, turning the hot path from
    O(S x P) into roughly O(S + P + matches).

  * OpenVEX: candidate packages are gathered from the statement's product
    and subcomponent purls via the index. Image-wide statements (an
    image/context product with no subcomponents), which by definition
    match every package, are detected and still fall back to the full
    catalog so behavior is unchanged.

  * CSAF: per-advisory product purls are cached so
    CollectProductIdentificationHelpers (which walks the whole product
    tree) runs once per product ID instead of once per package, the
    per-vulnerability status map allocation is replaced with a fixed
    slice, and packageMatchesStatement is split so the parsed-purl form
    (packageMatchesParsed) is reused without re-parsing.

  * existingVulnPackageKeys uses Matches.Enumerate() instead of Sorted()
    since ordering is irrelevant there.

No behavior change: all existing grype/vex unit tests pass unchanged,
including the synthesis, status-filtering, dedup, and image-wide cases.

Benchmarks
----------
Measured with throwaway benchmarks (one affected statement per package,
package-as-product for OpenVEX / known_affected for CSAF), driving
AugmentMatches over catalogs of 577/1000/2000 packages:

OpenVEX (grype/vex/openvex):
  pkgs   before ns/op    after ns/op   speedup   before allocs  after allocs
   577   432,312,762      3,870,233      ~112x     3,004,023       18,556
  1000 1,310,622,527      7,120,679      ~184x     9,013,321       32,117
  2000 5,797,006,526     14,058,239      ~412x    36,026,849       64,164

CSAF (grype/vex/csaf):
  pkgs   before ns/op    after ns/op   speedup   before allocs  after allocs
   577   388,074,243      6,079,186       ~64x     2,011,000       18,003
  1000 1,086,835,592     14,001,203       ~78x     6,023,286       31,147
  2000 5,279,739,731     35,716,893      ~148x    24,046,765       62,204

Before, doubling the catalog (1000 -> 2000) multiplied time by ~4.4x
(OpenVEX) / ~4.9x (CSAF) -- quadratic. After, it is ~2x / ~2.5x -- linear.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@surgut.co.uk>
xnox commented Jun 2, 2026

Contributor Author

Pushed a follow-up (2c9f9d52) addressing the performance concerns on the affected/under_investigation synthesis path.

The original synthesis walked the whole package catalog for every VEX statement and re-parsed PURLs on each comparison — O(statements × packages) with as many PURL parses, i.e. quadratic in catalog size. Packages are now parsed once and bucketed by (type, namespace, name) identity (which PurlMatches/packageMatchesStatement already require to be equal), so each statement only looks at the handful of packages that share its identity. CSAF additionally caches per-advisory product purls so the product tree is walked once per product instead of once per package. Semantics are unchanged (image-wide statements still fall back to the full catalog) and all existing grype/vex tests pass unchanged.
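
The bucketing idea can be sketched roughly like this. The types and function names (`purl`, `identityKey`, `buildIndex`, `candidates`) are hypothetical simplifications for illustration, not the functions in the PR:

```go
package main

import "fmt"

// purl is a hypothetical, already-parsed package URL.
type purl struct {
	Type, Namespace, Name, Version string
}

// identityKey builds a string key from (type, namespace, name). A NUL
// separator keeps different field splits from producing the same key.
func identityKey(p purl) string {
	return p.Type + "\x00" + p.Namespace + "\x00" + p.Name
}

// buildIndex visits each package once and buckets it by identity.
func buildIndex(pkgs []purl) map[string][]purl {
	idx := make(map[string][]purl, len(pkgs))
	for _, p := range pkgs {
		k := identityKey(p)
		idx[k] = append(idx[k], p)
	}
	return idx
}

// candidates returns only the packages that share the statement's identity,
// so a statement never scans the whole catalog.
func candidates(idx map[string][]purl, stmt purl) []purl {
	return idx[identityKey(stmt)]
}

func main() {
	idx := buildIndex([]purl{
		{"golang", "golang.org/x", "net", "v0.53.0"},
		{"golang", "golang.org/x", "crypto", "v0.30.0"},
		{"golang", "golang.org/x", "net", "v0.40.0"},
	})
	stmt := purl{Type: "golang", Namespace: "golang.org/x", Name: "net"}
	fmt.Println(len(candidates(idx, stmt))) // 2
}
```

Building the index costs one pass over the packages, and each statement then touches only its own bucket, which is where the roughly O(S + P + matches) behaviour comes from.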

Benchmark (one affected statement per package, driving AugmentMatches):

| catalog | before | after | speedup |
|---|---|---|---|
| OpenVEX 1000 pkgs | 1.31 s | 7.1 ms | ~184× |
| OpenVEX 2000 pkgs | 5.80 s | 14 ms | ~412× |
| CSAF 1000 pkgs | 1.09 s | 14 ms | ~78× |
| CSAF 2000 pkgs | 5.28 s | 36 ms | ~148× |

Scaling is now linear instead of quadratic (doubling the catalog ~doubles the time rather than ~quadrupling it). At ~1000 packages the synthesis step is single-digit-to-low-double-digit milliseconds, so the affected-VEX supplement is now effectively negligible on top of a normal scan (which is dominated by SBOM cataloging and DB matching that take seconds).

kzantow left a comment

Contributor

Sorry, some of this feedback might be a bit abstract; ping me if you have questions.

I think we can probably move forward with this if we could at least flip the package indexes to vulnerability indexes and avoid passing a package slice to the implementations. It would be great to move some of the duplicated logic into the processor: e.g. add a new interface function for each implementation that returns a list of Vulnerability objects (the affected records) instead of passing []pkg.Package into each implementation; the processor then handles iterating the packages in a single spot and matching against indexed vulnerabilities. I think this will fit better into a future state. If you take away the augmentation, each VEX file is effectively a VulnerabilityProvider and the VEX processor is effectively a Matcher that handles all package types. We already want to merge vulnerabilities, so I think the aforementioned changes will require less refactoring later.
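
One possible reading of this suggestion, as a hedged sketch: each VEX implementation exposes its affected records through an interface, and the processor indexes those records and matches one package at a time. Every name here (`affectedRecord`, `affectedProvider`, `indexRecords`, `matchPackage`, `scannedPackage`) is hypothetical, and version matching is reduced to exact-or-wildcard for brevity.

```go
package main

import "fmt"

// affectedRecord is a hypothetical stand-in for the "Vulnerability objects
// (the affected records)" mentioned in the review.
type affectedRecord struct {
	VulnID    string
	Type      string
	Namespace string
	Name      string
	Version   string // empty means "any version" in this sketch
}

// affectedProvider is a hypothetical interface each VEX implementation
// could satisfy instead of receiving []pkg.Package.
type affectedProvider interface {
	AffectedRecords() []affectedRecord
}

// staticProvider is a toy implementation backed by a fixed slice.
type staticProvider []affectedRecord

func (s staticProvider) AffectedRecords() []affectedRecord { return s }

// identity is a comparable map key for (type, namespace, name).
type identity struct{ typ, namespace, name string }

type scannedPackage struct {
	Type, Namespace, Name, Version string
}

// indexRecords builds the vulnerability-side index once, in the processor.
func indexRecords(providers ...affectedProvider) map[identity][]affectedRecord {
	index := make(map[identity][]affectedRecord)
	for _, p := range providers {
		for _, r := range p.AffectedRecords() {
			k := identity{r.Type, r.Namespace, r.Name}
			index[k] = append(index[k], r)
		}
	}
	return index
}

// matchPackage handles a single package, so it fits per-package streaming.
func matchPackage(index map[identity][]affectedRecord, p scannedPackage) []string {
	var vulnIDs []string
	for _, r := range index[identity{p.Type, p.Namespace, p.Name}] {
		if r.Version == "" || r.Version == p.Version {
			vulnIDs = append(vulnIDs, r.VulnID)
		}
	}
	return vulnIDs
}

func main() {
	openvex := staticProvider{{VulnID: "GO-2026-5025", Type: "golang", Namespace: "golang.org/x", Name: "net", Version: "v0.53.0"}}
	index := indexRecords(openvex)
	for _, p := range []scannedPackage{
		{"golang", "golang.org/x", "net", "v0.53.0"},
		{"golang", "golang.org/x", "crypto", "v0.30.0"},
	} {
		fmt.Println(p.Name, matchPackage(index, p))
	}
}
```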

Comment thread grype/vex/csaf/csaf.go
// only ever match a statement that shares this key. Indexing packages by it
// lets synthesis compare each statement against the handful of packages with a
// matching identity instead of the whole catalog.
func purlIdentityKey(p packageurl.PackageURL) string {

Contributor

we should probably use comparable types instead of concat'd strings as map keys -- these can definitely add up to noticeable time with the scale we have sometimes, e.g.:

type purlKey struct {
  typ, namespace, name string
}

(same comment everywhere we are making strings as map keys)
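
To illustrate the suggestion, here is a small sketch using the `purlKey` struct from the comment as a map key. A struct of strings is comparable in Go, so lookups compare the fields directly and typically avoid building a new key string each time. The `naiveStringKey` helper is a deliberately simplistic concatenation written for contrast only; it is not the PR's purlIdentityKey, which may well use a separator.

```go
package main

import "fmt"

// purlKey is the comparable struct key suggested in the review.
type purlKey struct {
	typ, namespace, name string
}

// naiveStringKey concatenates without a separator, for contrast only.
func naiveStringKey(typ, namespace, name string) string {
	return typ + namespace + name
}

func main() {
	index := map[purlKey][]string{}
	k := purlKey{typ: "golang", namespace: "golang.org/x", name: "net"}
	index[k] = append(index[k], "v0.53.0")

	// Lookup with an equal struct value finds the same bucket.
	fmt.Println(index[purlKey{"golang", "golang.org/x", "net"}]) // [v0.53.0]

	// Distinct identities stay distinct as struct keys...
	other := purlKey{typ: "golang", namespace: "golang.org/", name: "xnet"}
	fmt.Println(k == other) // false

	// ...while a separator-less concatenation would conflate them.
	fmt.Println(naiveStringKey(k.typ, k.namespace, k.name) ==
		naiveStringKey(other.typ, other.namespace, other.name)) // true
}
```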

want bool
}{
// last_affected: ceiling
{"last_affected matches lower pkg version", "pkg:golang/golang.org/x/net@v0.54.0", "pkg:golang/golang.org/x/net@v0.53.0", lastAffected, true},

Contributor

these would be more clear if they used the property name form, e.g.:

{
  name: ...
  stmtPURL: ...
  ...
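
Spelled out for the one case quoted above, the named-field form might look like this. Only `name` and `stmtPURL` come from the comment; `pkgPURL`, `status`, the `matchCase` type, and the mapping of the two positional purls to statement and package are assumptions made for illustration, and the status is written as a plain string rather than the test's `lastAffected` identifier.

```go
package main

import "fmt"

// matchCase is a hypothetical table-test row with named fields.
type matchCase struct {
	name     string
	stmtPURL string
	pkgPURL  string // assumed field name
	status   string // assumed field name
	want     bool
}

// cases returns the table; each row documents itself through its field names.
func cases() []matchCase {
	return []matchCase{
		{
			name:     "last_affected matches lower pkg version",
			stmtPURL: "pkg:golang/golang.org/x/net@v0.54.0",
			pkgPURL:  "pkg:golang/golang.org/x/net@v0.53.0",
			status:   "last_affected",
			want:     true,
		},
	}
}

func main() {
	for _, c := range cases() {
		fmt.Printf("%s: stmt=%s pkg=%s status=%s want=%v\n",
			c.name, c.stmtPURL, c.pkgPURL, c.status, c.want)
	}
}
```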

Comment thread grype/vex/csaf/csaf.go
// synthesisCandidate describes a (vulnerability, package) pair that should be
// added to grype's results based on a CSAF advisory, when no DB-backed match
// already exists.
type synthesisCandidate struct {

Contributor

it looks like we now have a synthesisCandidate which is converted to an advisoryMatch which is converted to a match.Match... could we avoid the middlemen on these and just directly create Vulnerability, IgnoreRule/IgnoreFilter, and Match objects or similar? We could move the IgnoreFilter indexing to some shared location

Comment thread grype/vex/csaf/csaf.go

// buildPackageIndex parses every package purl once and buckets the packages by
// their (type, namespace, name) identity.
func buildPackageIndex(pkgs []pkg.Package) map[string][]indexedPackage {

Contributor

It looks like both of these implementations have similar buildPackageIndex functions that operate on the full set of packages. I think we should flip this to instead build indexes for VEX rules. It's hard to say which would be the smaller set (definitely VEX rules when there are no VEX files), but we operate on a single package at a time in the matcher world and already have indexes for IgnoreRules and other IgnoreFilters. I see a lot of similarity to matchers here, and I think a future refactoring is likely to introduce per-package streaming, which this would be incompatible with.

Development

Successfully merging this pull request may close these issues.

vex document of golang subcomponents failed to be matched

2 participants