Exploit GLL collocation in spectral-mode sum factorisation by miguelcoolchips · Pull Request #5159 · firedrakeproject/firedrake

miguelcoolchips · 2026-06-09T14:21:22Z

Disclosure: This PR was prepared by an AI agent (Anthropic's Claude, "Claude Code"), working from a benchmark investigation and supervised by a human (@miguelcoolchips). Please review accordingly — the diagnosis, fix and tests are described in full below so they can be checked independently.

Summary

The matrix-free action (matvec) of a collocated Gauss–Lobatto–Legendre (GLL) spectral operator on a genuine d-way tensor-product cell (e.g. a hexahedron) was not sum-factorising. Instead of the expected O(p^{d+1}) work it scaled like O(p^{2d}), which in 3D made high-order operator application several times slower than it should be (and made throughput fall with order instead of rising).

This was found while benchmarking Firedrake's matrix-free Poisson operator against MFEM. After the fix the 3D high-order matvec is ~3–5× faster and essentially tracks MFEM's templated HPC kernels.

3D matvec throughput at ~5M DOFs, single thread, hexahedra, firedrake_gll config (variant="spectral" + collocated GLL quadrature), best path vs MFEM HPC:

p	before (MDOF/s)	after (MDOF/s)	gap vs MFEM before	gap vs MFEM after
2	13.1	20.0	2.86×	1.87×
3	12.9	27.9	3.12×	1.44×
4	8.1	26.5	5.48×	1.67×
5	6.9	36.2	5.34×	1.02×

3D matrix assembly of the same form also improves (it shares the spectral code path): the p=4/p=5 gap vs MFEM goes from 3.13×/4.33× to 1.16×/1.27×.

The bug

On a tensor-product cell with a collocated quadrature rule (a variant="spectral" element integrated at its own GLL nodes), the value tabulation is the identity. But FInAT/TSFC materialise the tensor-product tabulation as a dense multi-dimensional GEM Literal that, for a 3D cell, factors exactly as

T[i, q_own, q_a, q_b] = factor2d[i, q_own] · const

i.e. a genuine 1D factor (factor2d — either the value identity or the 1D derivative matrix) spuriously broadcast as a constant over the other two quadrature directions q_a, q_b.

That broadcast hides two things from the optimiser:

the low-rank / sum-factorisable structure (the contraction can no longer be hoisted out of the quadrature loops it doesn't depend on), and
the collocation identity (the value tabulation is Identity, which delta_elimination could cancel — but only if it is expressed as a Delta, not a dense Literal).

The result is an O(p^{2d}) contraction. Measured flops/cell for the 3D collocated GLL Laplacian action scaled as (p+1)^5.36 before the fix; per DOF that is (p+1)^2.36, growing with order, instead of the ~O(1) per DOF that proper sum factorisation gives.

Interestingly the existing test_underintegration.py::test_laplace did not catch this: it exercises quadrilateral and TensorProductCell(quadrilateral, interval) (a 2-way nesting), where the pathology does not appear. It only shows up on a genuine d-way product (hexahedron / interval×interval×interval).

The fix

Two small, purely local and mathematically exact GEM rewrites in tsfc/spectral.py, applied to the integrand before sum factorisation:

drop_constant_literal_axes — for Indexed(Literal(arr), multiindex), drop any axis along which arr is constant and the index is a running Index. The literal genuinely does not depend on that index, so indexing it there is redundant; the dropped quadrature index still occurs in sibling factors (weights, Jacobian, the test-function tabulation), so the surrounding IndexSum is unaffected. This uncovers the genuine 2D (1D) factors.
convert_identity_literals — rewrite a resulting identity tabulation Indexed(Literal(I), (i, j)) as Delta(i, j), so the existing delta elimination cancels the redundant interpolation contraction.

Surviving Deltas (e.g. a collocation delta on a test-function index that cannot be cancelled against a sum index) are lowered to identity indexing by setting finalise_options = dict(replace_delta=True) (was False).

After the fix the same flop count scales as (p+1)^3.48 (per DOF (p+1)^0.48, ~flat like MFEM).

Both passes only fire on genuinely constant / identity tabulations, so non-collocated paths are untouched.

A note on where this lives

The two passes are generic GEM transformations and arguably belong in gem.optimise. Because gem is a separate repository (firedrakeproject/fiat), I kept the change self-contained inside tsfc/spectral.py (using only gem's public API) so it is a single, testable Firedrake PR. Happy to move them into gem and split into two PRs if maintainers prefer.

Test

Adds test_laplace_action to tests/tsfc/test_underintegration.py (the natural home — it already has test_laplace for the bilinear form but no action variant). It asserts the collocated GLL Laplacian action flop rate grows no faster than O(p^{d+1}):

quadrilateral (2D): rate < 3 — unchanged, a regression guard
hexahedron (3D): rate < 4

Without this fix the hexahedron case fails with action flop rates ~5.5–5.8 (i.e. O(p^{2d})); with the fix it is ~3.7.

Correctness

Verified the matrix-free action (assemble(a, mat_type="matfree").petscmat.mult) matches the independently-assembled sparse matrix to ~1e-16 for the Laplacian, mass, Helmholtz and higher-degree forms, on simplex and tensor-product cells, scalar and vector elements, full-Gauss and collocated-GLL quadrature, p=1..5. The full existing tests/tsfc suite passes.

Checklist

flake8 clean (tsfc/spectral.py, tests/tsfc/test_underintegration.py)
New regression test that fails without the change
Existing tests/tsfc suite passes
(draft) maintainer feedback on placement (tsfc vs gem) and whether to broaden coverage

Targeting main per the contributing guidelines (this is a general improvement, not a release-critical fix).

The matrix-free action of a collocated GLL spectral operator on a genuine d-way tensor-product cell (e.g. a hexahedron) failed to sum factorise. The identity value tabulation was materialised as a dense tabulation broadcast over the *other* quadrature directions, so the operator application scaled as O(p^{2d}) instead of O(p^{d+1}); in 3D this made the high-order matvec several times slower than necessary. Recover the structure with two local, exact GEM rewrites in spectral mode, applied before sum factorisation: * drop_constant_literal_axes - drop running indices on tabulation-literal axes along which the literal is constant (the spurious broadcast), uncovering the genuine 1D factors; * convert_identity_literals - rewrite a resulting identity tabulation as a Kronecker Delta so delta elimination cancels the redundant interpolation contraction. Lower any surviving Delta via finalise_options(replace_delta=True). drop_constant_literal_axes only drops an index that is *anchored* elsewhere in the expression (occurs outside a constant literal axis), so it never orphans an index that an enclosing ComponentTensor/IndexSum still binds or sums (this would otherwise break forms such as the rigid-body near-nullspace). Add test_laplace_action to tests/tsfc/test_underintegration.py: on a hexahedron the action flop rate was ~5.8 (O(p^{2d})) and is now < 4 (O(p^{d+1})); the test fails without the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

miguelcoolchips · 2026-06-09T16:02:19Z

Pushed a fix for the CI failure (tests/firedrake/regression/test_nullspace.py::test_near_nullspace).

Cause: drop_constant_literal_axes removed a running index from a constant
tabulation literal even when that index's only occurrence was that literal. For
most forms the dropped quadrature index survives in sibling factors (test
tabulation, weights, Jacobian), but in the rigid-body near-nullspace form it did
not, leaving an enclosing ComponentTensor/IndexSum referencing a vanished
index (assert set(multiindex) <= set(expression.free_indices) in gem.py).

Fix: only drop an axis whose index is anchored — i.e. it also occurs
somewhere other than a constant literal axis, guaranteeing it remains present.
The optimisation is unchanged for the collocated Laplacian (every quadrature
index is anchored by the test tabulation), and test_near_nullspace plus the
full tests/tsfc suite now pass.

pbrubeck · 2026-06-09T20:44:57Z

So this fails with interval x interval x interval, but not with quadrilateral x interval?

…forms drop_constant_literal_axes could drop a running index that an enclosing ComponentTensor/IndexSum still binds, leaving a dangling multiindex. With assertions enabled this is the AssertionError at gem.py `assert set(multiindex) <= set(expression.free_indices)`; with assertions compiled out (as in CI) it surfaces later as a KeyError in the gem Memoizer. The anchoring analysis in _anchored_indices is global, but index binding is scoped and GEM is a shared DAG: the same Index object can occur non-constantly under one binder yet appear only on a constant literal axis within the scope of another binder, and the Memoizer dedups subtrees so it cannot tell the scopes apart. RTCF mixed-space hybridisation (Concatenate of broken/trace spaces) triggers this, e.g. test_slate_hybridization[1-RTCF-True]. Never anchor an index bound by a ComponentTensor/IndexSum (return `anchored - bound`). The broadcast quadrature directions the pass targets are free at this stage (the quadrature index_sum is applied after the drop in Integrals), so the collocation optimisation is preserved: test_laplace_action still passes and the full test_slate_hybridization suite is green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pbrubeck · 2026-06-13T16:34:10Z

+    """Rewrite ``Indexed(Literal(I), (i, j))`` as ``Delta(i, j)`` for identity
+    tabulation matrices, exposing collocation structure to sum factorisation.


I don't think we need to be adding a translator from an identity Literal into Delta, since by design we should not have generated the Literal in the first place.

GLL elements should tabulate to a gem.Delta. If this is not the case, then either the GLL rule is not being properly constructed/detected.

This fix should go in finat (under the FIAT repo). Ask Claude to attempt to reproduce an MFE by tabulating a hexahedral GLL element on a GLL quadrature, this should give a gem.Delta.

miguelcoolchips force-pushed the gll-collocation-sumfact branch from 282aa8b to 3ccf0bf Compare June 9, 2026 16:01

pbrubeck reviewed Jun 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exploit GLL collocation in spectral-mode sum factorisation#5159

Exploit GLL collocation in spectral-mode sum factorisation#5159
miguelcoolchips wants to merge 2 commits into
firedrakeproject:mainfrom
Corintis:gll-collocation-sumfact

miguelcoolchips commented Jun 9, 2026

Uh oh!

miguelcoolchips commented Jun 9, 2026

Uh oh!

pbrubeck commented Jun 9, 2026

Uh oh!

pbrubeck Jun 13, 2026

Uh oh!

pbrubeck Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		"""Rewrite ``Indexed(Literal(I), (i, j))`` as ``Delta(i, j)`` for identity
		tabulation matrices, exposing collocation structure to sum factorisation.

Conversation

miguelcoolchips commented Jun 9, 2026

Summary

The bug

The fix

A note on where this lives

Test

Correctness

Checklist

Uh oh!

miguelcoolchips commented Jun 9, 2026

Uh oh!

pbrubeck commented Jun 9, 2026

Uh oh!

pbrubeck Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

pbrubeck Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants