Skip to content

Exploit GLL collocation in spectral-mode sum factorisation#5159

Draft
miguelcoolchips wants to merge 2 commits into
firedrakeproject:mainfrom
Corintis:gll-collocation-sumfact
Draft

Exploit GLL collocation in spectral-mode sum factorisation#5159
miguelcoolchips wants to merge 2 commits into
firedrakeproject:mainfrom
Corintis:gll-collocation-sumfact

Conversation

@miguelcoolchips

Copy link
Copy Markdown
Contributor

Disclosure: This PR was prepared by an AI agent (Anthropic's Claude, "Claude Code"), working from a benchmark investigation and supervised by a human (@miguelcoolchips). Please review accordingly — the diagnosis, fix and tests are described in full below so they can be checked independently.

Summary

The matrix-free action (matvec) of a collocated Gauss–Lobatto–Legendre (GLL) spectral operator on a genuine d-way tensor-product cell (e.g. a hexahedron) was not sum-factorising. Instead of the expected O(p^{d+1}) work it scaled like O(p^{2d}), which in 3D made high-order operator application several times slower than it should be (and made throughput fall with order instead of rising).

This was found while benchmarking Firedrake's matrix-free Poisson operator against MFEM. After the fix the 3D high-order matvec is ~3–5× faster and essentially tracks MFEM's templated HPC kernels.

before/after

3D matvec throughput at ~5M DOFs, single thread, hexahedra, firedrake_gll config (variant="spectral" + collocated GLL quadrature), best path vs MFEM HPC:

p before (MDOF/s) after (MDOF/s) gap vs MFEM before gap vs MFEM after
2 13.1 20.0 2.86× 1.87×
3 12.9 27.9 3.12× 1.44×
4 8.1 26.5 5.48× 1.67×
5 6.9 36.2 5.34× 1.02×

3D matrix assembly of the same form also improves (it shares the spectral code path): the p=4/p=5 gap vs MFEM goes from 3.13×/4.33× to 1.16×/1.27×.

The bug

On a tensor-product cell with a collocated quadrature rule (a variant="spectral" element integrated at its own GLL nodes), the value tabulation is the identity. But FInAT/TSFC materialise the tensor-product tabulation as a dense multi-dimensional GEM Literal that, for a 3D cell, factors exactly as

T[i, q_own, q_a, q_b] = factor2d[i, q_own] · const

i.e. a genuine 1D factor (factor2d — either the value identity or the 1D derivative matrix) spuriously broadcast as a constant over the other two quadrature directions q_a, q_b.

That broadcast hides two things from the optimiser:

  1. the low-rank / sum-factorisable structure (the contraction can no longer be hoisted out of the quadrature loops it doesn't depend on), and
  2. the collocation identity (the value tabulation is Identity, which delta_elimination could cancel — but only if it is expressed as a Delta, not a dense Literal).

The result is an O(p^{2d}) contraction. Measured flops/cell for the 3D collocated GLL Laplacian action scaled as (p+1)^5.36 before the fix; per DOF that is (p+1)^2.36, growing with order, instead of the ~O(1) per DOF that proper sum factorisation gives.

Interestingly the existing test_underintegration.py::test_laplace did not catch this: it exercises quadrilateral and TensorProductCell(quadrilateral, interval) (a 2-way nesting), where the pathology does not appear. It only shows up on a genuine d-way product (hexahedron / interval×interval×interval).

The fix

Two small, purely local and mathematically exact GEM rewrites in tsfc/spectral.py, applied to the integrand before sum factorisation:

  • drop_constant_literal_axes — for Indexed(Literal(arr), multiindex), drop any axis along which arr is constant and the index is a running Index. The literal genuinely does not depend on that index, so indexing it there is redundant; the dropped quadrature index still occurs in sibling factors (weights, Jacobian, the test-function tabulation), so the surrounding IndexSum is unaffected. This uncovers the genuine 2D (1D) factors.
  • convert_identity_literals — rewrite a resulting identity tabulation Indexed(Literal(I), (i, j)) as Delta(i, j), so the existing delta elimination cancels the redundant interpolation contraction.

Surviving Deltas (e.g. a collocation delta on a test-function index that cannot be cancelled against a sum index) are lowered to identity indexing by setting finalise_options = dict(replace_delta=True) (was False).

After the fix the same flop count scales as (p+1)^3.48 (per DOF (p+1)^0.48, ~flat like MFEM).

Both passes only fire on genuinely constant / identity tabulations, so non-collocated paths are untouched.

A note on where this lives

The two passes are generic GEM transformations and arguably belong in gem.optimise. Because gem is a separate repository (firedrakeproject/fiat), I kept the change self-contained inside tsfc/spectral.py (using only gem's public API) so it is a single, testable Firedrake PR. Happy to move them into gem and split into two PRs if maintainers prefer.

Test

Adds test_laplace_action to tests/tsfc/test_underintegration.py (the natural home — it already has test_laplace for the bilinear form but no action variant). It asserts the collocated GLL Laplacian action flop rate grows no faster than O(p^{d+1}):

  • quadrilateral (2D): rate < 3 — unchanged, a regression guard
  • hexahedron (3D): rate < 4

Without this fix the hexahedron case fails with action flop rates ~5.5–5.8 (i.e. O(p^{2d})); with the fix it is ~3.7.

Correctness

Verified the matrix-free action (assemble(a, mat_type="matfree").petscmat.mult) matches the independently-assembled sparse matrix to ~1e-16 for the Laplacian, mass, Helmholtz and higher-degree forms, on simplex and tensor-product cells, scalar and vector elements, full-Gauss and collocated-GLL quadrature, p=1..5. The full existing tests/tsfc suite passes.

Checklist

  • flake8 clean (tsfc/spectral.py, tests/tsfc/test_underintegration.py)
  • New regression test that fails without the change
  • Existing tests/tsfc suite passes
  • (draft) maintainer feedback on placement (tsfc vs gem) and whether to broaden coverage

Targeting main per the contributing guidelines (this is a general improvement, not a release-critical fix).

The matrix-free action of a collocated GLL spectral operator on a genuine
d-way tensor-product cell (e.g. a hexahedron) failed to sum factorise. The
identity value tabulation was materialised as a dense tabulation broadcast
over the *other* quadrature directions, so the operator application scaled as
O(p^{2d}) instead of O(p^{d+1}); in 3D this made the high-order matvec several
times slower than necessary.

Recover the structure with two local, exact GEM rewrites in spectral mode,
applied before sum factorisation:
  * drop_constant_literal_axes - drop running indices on tabulation-literal
    axes along which the literal is constant (the spurious broadcast),
    uncovering the genuine 1D factors;
  * convert_identity_literals - rewrite a resulting identity tabulation as a
    Kronecker Delta so delta elimination cancels the redundant interpolation
    contraction.
Lower any surviving Delta via finalise_options(replace_delta=True).

drop_constant_literal_axes only drops an index that is *anchored* elsewhere in
the expression (occurs outside a constant literal axis), so it never orphans an
index that an enclosing ComponentTensor/IndexSum still binds or sums (this would
otherwise break forms such as the rigid-body near-nullspace).

Add test_laplace_action to tests/tsfc/test_underintegration.py: on a
hexahedron the action flop rate was ~5.8 (O(p^{2d})) and is now < 4
(O(p^{d+1})); the test fails without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@miguelcoolchips miguelcoolchips force-pushed the gll-collocation-sumfact branch from 282aa8b to 3ccf0bf Compare June 9, 2026 16:01
@miguelcoolchips

Copy link
Copy Markdown
Contributor Author

Pushed a fix for the CI failure (tests/firedrake/regression/test_nullspace.py::test_near_nullspace).

Cause: drop_constant_literal_axes removed a running index from a constant
tabulation literal even when that index's only occurrence was that literal. For
most forms the dropped quadrature index survives in sibling factors (test
tabulation, weights, Jacobian), but in the rigid-body near-nullspace form it did
not, leaving an enclosing ComponentTensor/IndexSum referencing a vanished
index (assert set(multiindex) <= set(expression.free_indices) in gem.py).

Fix: only drop an axis whose index is anchored — i.e. it also occurs
somewhere other than a constant literal axis, guaranteeing it remains present.
The optimisation is unchanged for the collocated Laplacian (every quadrature
index is anchored by the test tabulation), and test_near_nullspace plus the
full tests/tsfc suite now pass.

@pbrubeck

pbrubeck commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

So this fails with interval x interval x interval, but not with quadrilateral x interval?

…forms

drop_constant_literal_axes could drop a running index that an enclosing
ComponentTensor/IndexSum still binds, leaving a dangling multiindex. With
assertions enabled this is the AssertionError at gem.py
`assert set(multiindex) <= set(expression.free_indices)`; with assertions
compiled out (as in CI) it surfaces later as a KeyError in the gem Memoizer.

The anchoring analysis in _anchored_indices is global, but index binding is
scoped and GEM is a shared DAG: the same Index object can occur non-constantly
under one binder yet appear only on a constant literal axis within the scope of
another binder, and the Memoizer dedups subtrees so it cannot tell the scopes
apart. RTCF mixed-space hybridisation (Concatenate of broken/trace spaces)
triggers this, e.g. test_slate_hybridization[1-RTCF-True].

Never anchor an index bound by a ComponentTensor/IndexSum (return
`anchored - bound`). The broadcast quadrature directions the pass targets are
free at this stage (the quadrature index_sum is applied after the drop in
Integrals), so the collocation optimisation is preserved: test_laplace_action
still passes and the full test_slate_hybridization suite is green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread tsfc/spectral.py
Comment on lines +190 to +191
"""Rewrite ``Indexed(Literal(I), (i, j))`` as ``Delta(i, j)`` for identity
tabulation matrices, exposing collocation structure to sum factorisation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to be adding a translator from an identity Literal into Delta, since by design we should not have generated the Literal in the first place.

GLL elements should tabulate to a gem.Delta. If this is not the case, then either the GLL rule is not being properly constructed/detected.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix should go in finat (under the FIAT repo). Ask Claude to attempt to reproduce an MFE by tabulating a hexahedral GLL element on a GLL quadrature, this should give a gem.Delta.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants