Exploit GLL collocation in spectral-mode sum factorisation #5159
miguelcoolchips wants to merge 2 commits into
The matrix-free action of a collocated GLL spectral operator on a genuine
d-way tensor-product cell (e.g. a hexahedron) failed to sum factorise. The
identity value tabulation was materialised as a dense tabulation broadcast
over the *other* quadrature directions, so the operator application scaled as
O(p^{2d}) instead of O(p^{d+1}); in 3D this made the high-order matvec several
times slower than necessary.
Recover the structure with two local, exact GEM rewrites in spectral mode,
applied before sum factorisation:
* drop_constant_literal_axes - drop running indices on tabulation-literal
axes along which the literal is constant (the spurious broadcast),
uncovering the genuine 1D factors;
* convert_identity_literals - rewrite a resulting identity tabulation as a
Kronecker Delta so delta elimination cancels the redundant interpolation
contraction.
Lower any surviving Delta via finalise_options(replace_delta=True).
drop_constant_literal_axes only drops an index that is *anchored* elsewhere in
the expression (occurs outside a constant literal axis), so it never orphans an
index that an enclosing ComponentTensor/IndexSum still binds or sums (this would
otherwise break forms such as the rigid-body near-nullspace).
Add test_laplace_action to tests/tsfc/test_underintegration.py: on a
hexahedron the action flop rate was ~5.8 (O(p^{2d})) and is now < 4
(O(p^{d+1})); the test fails without the fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
282aa8b to 3ccf0bf
Pushed a fix for the CI failure ( Cause: Fix: only drop an axis whose index is anchored, i.e. it also occurs

So this fails with interval x interval x interval, but not with quadrilateral x interval?
…forms

drop_constant_literal_axes could drop a running index that an enclosing ComponentTensor/IndexSum still binds, leaving a dangling multiindex. With assertions enabled this is the AssertionError at gem.py `assert set(multiindex) <= set(expression.free_indices)`; with assertions compiled out (as in CI) it surfaces later as a KeyError in the gem Memoizer.

The anchoring analysis in _anchored_indices is global, but index binding is scoped and GEM is a shared DAG: the same Index object can occur non-constantly under one binder yet appear only on a constant literal axis within the scope of another binder, and the Memoizer dedups subtrees so it cannot tell the scopes apart. RTCF mixed-space hybridisation (Concatenate of broken/trace spaces) triggers this, e.g. test_slate_hybridization[1-RTCF-True].

Never anchor an index bound by a ComponentTensor/IndexSum (return `anchored - bound`). The broadcast quadrature directions the pass targets are free at this stage (the quadrature index_sum is applied after the drop in Integrals), so the collocation optimisation is preserved: test_laplace_action still passes and the full test_slate_hybridization suite is green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
"""Rewrite ``Indexed(Literal(I), (i, j))`` as ``Delta(i, j)`` for identity
tabulation matrices, exposing collocation structure to sum factorisation.
I don't think we need to be adding a translator from an identity Literal into Delta, since by design we should not have generated the Literal in the first place.
GLL elements should tabulate to a gem.Delta. If this is not the case, then the GLL rule is not being properly constructed or detected.
This fix should go in finat (under the FIAT repo). Ask Claude to attempt to reproduce an MFE by tabulating a hexahedral GLL element on a GLL quadrature; this should give a gem.Delta.
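The collocation property under discussion can be checked without FInAT. The following is a minimal pure-Python sketch, not the FInAT tabulation path and not the requested MFE: it evaluates a degree-3 Lagrange basis at its own GLL nodes in 1D and then forms the tensor-product tabulation on a cube. The helper names (`lagrange_basis`, `tabulate_1d`, `tabulate_3d`) are hypothetical. The identity holds for any nodal basis evaluated at its own nodes; the GLL points matter because they also serve as the quadrature points.

```python
import math
from itertools import product

# GLL nodes on [-1, 1] for degree 3 (four points): -1, -1/sqrt(5), 1/sqrt(5), 1.
NODES = [-1.0, -1.0 / math.sqrt(5.0), 1.0 / math.sqrt(5.0), 1.0]


def lagrange_basis(nodes, j, x):
    """Evaluate the j-th Lagrange polynomial on `nodes` at the point x."""
    value = 1.0
    for m, xm in enumerate(nodes):
        if m != j:
            value *= (x - xm) / (nodes[j] - xm)
    return value


def tabulate_1d(nodes):
    """Tabulation T[q][j] = phi_j(x_q), reusing the nodes as quadrature points."""
    return [[lagrange_basis(nodes, j, xq) for j in range(len(nodes))]
            for xq in nodes]


def tabulate_3d(t1d):
    """Tensor-product tabulation keyed by ((q1, q2, q3), (j1, j2, j3))."""
    n = len(t1d)
    triples = list(product(range(n), repeat=3))
    return {(q, j): t1d[q[0]][j[0]] * t1d[q[1]][j[1]] * t1d[q[2]][j[2]]
            for q in triples for j in triples}


T1 = tabulate_1d(NODES)
T3 = tabulate_3d(T1)

# Both tabulations should be identity matrices (Kronecker deltas).
is_identity_1d = all(T1[q][j] == (1.0 if q == j else 0.0)
                     for q in range(4) for j in range(4))
is_identity_3d = all(v == (1.0 if q == j else 0.0) for (q, j), v in T3.items())
print(is_identity_1d, is_identity_3d)
```

In this sketch the 3D tabulation is the Kronecker product of three 1D identities, which is the structure a `Delta` per direction would express symbolically.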
Summary
The matrix-free action (matvec) of a collocated Gauss–Lobatto–Legendre (GLL) spectral operator on a genuine d-way tensor-product cell (e.g. a hexahedron) was not sum-factorising. Instead of the expected `O(p^{d+1})` work it scaled like `O(p^{2d})`, which in 3D made high-order operator application several times slower than it should be (and made throughput fall with order instead of rising).

This was found while benchmarking Firedrake's matrix-free Poisson operator against MFEM. After the fix the 3D high-order matvec is ~3–5× faster and essentially tracks MFEM's templated HPC kernels.
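To make the two scalings concrete, here is a minimal pure-Python sketch (not TSFC's generated kernel) that applies a Kronecker-product operator A⊗A⊗A to a vector on an n×n×n grid, with n = p+1, in two ways: as one dense contraction and as three successive 1D contractions. It counts inner-loop terms and estimates the growth exponent from a log-log slope. The function names and the test data are hypothetical.

```python
import math
from itertools import product


def apply_dense(A, u, n):
    """Apply A (x) A (x) A to u, contracting all three directions at once."""
    v, terms = {}, 0
    for i, j, k in product(range(n), repeat=3):
        acc = 0.0
        for a, b, c in product(range(n), repeat=3):
            acc += A[i][a] * A[j][b] * A[k][c] * u[a, b, c]
            terms += 1
        v[i, j, k] = acc
    return v, terms


def apply_sum_factorised(A, u, n):
    """Apply the same operator as three successive 1D contractions."""
    w, terms = u, 0
    for axis in range(3):
        out = {}
        for idx in product(range(n), repeat=3):
            acc = 0.0
            for a in range(n):
                src = idx[:axis] + (a,) + idx[axis + 1:]
                acc += A[idx[axis]][a] * w[src]
                terms += 1
            out[idx] = acc
        w = out
    return w, terms


def slope(n0, c0, n1, c1):
    """Growth exponent between two (size, cost) samples on a log-log scale."""
    return math.log(c1 / c0) / math.log(n1 / n0)


counts = {}
for n in (4, 5):
    A = [[1.0 / (1 + i + a) for a in range(n)] for i in range(n)]
    u = {idx: float(idx[0] + 2 * idx[1] + 3 * idx[2])
         for idx in product(range(n), repeat=3)}
    v_dense, dense_terms = apply_dense(A, u, n)
    v_fact, fact_terms = apply_sum_factorised(A, u, n)
    # Both evaluation orders must give the same result up to rounding.
    assert all(abs(v_dense[idx] - v_fact[idx]) < 1e-9 for idx in v_dense)
    counts[n] = (dense_terms, fact_terms)

dense_rate = slope(4, counts[4][0], 5, counts[5][0])
fact_rate = slope(4, counts[4][1], 5, counts[5][1])
print(counts, round(dense_rate, 6), round(fact_rate, 6))
```

The dense count is n^6 and the factorised count is 3·n^4, so the slopes are the idealised exponents 2d and d+1 for d = 3. They are exponents of this toy operation count, not the measured flop rates quoted in this PR.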
3D matvec throughput at ~5M DOFs, single thread, hexahedra, `firedrake_gll` config (`variant="spectral"` + collocated GLL quadrature), best path vs MFEM HPC:

3D matrix assembly of the same form also improves (it shares the spectral code path): the p=4/p=5 gap vs MFEM goes from 3.13×/4.33× to 1.16×/1.27×.
The bug
On a tensor-product cell with a collocated quadrature rule (a `variant="spectral"` element integrated at its own GLL nodes), the value tabulation is the identity. But FInAT/TSFC materialise the tensor-product tabulation as a dense multi-dimensional GEM `Literal` that, for a 3D cell, factors exactly as a genuine 1D factor (`factor2d`, either the value identity or the 1D derivative matrix) spuriously broadcast as a constant over the other two quadrature directions `q_a`, `q_b`.

That broadcast hides two things from the optimiser:

* … Identity, which `delta_elimination` could cancel (but only if it is expressed as a `Delta`, not a dense `Literal`).

The result is an `O(p^{2d})` contraction. Measured flops/cell for the 3D collocated GLL Laplacian action scaled as `(p+1)^5.36` before the fix; per DOF that is `(p+1)^2.36`, growing with order, instead of the `~O(1)` per DOF that proper sum factorisation gives.

Interestingly, the existing `test_underintegration.py::test_laplace` did not catch this: it exercises `quadrilateral` and `TensorProductCell(quadrilateral, interval)` (a 2-way nesting), where the pathology does not appear. It only shows up on a genuine d-way product (`hexahedron` / `interval×interval×interval`).

The fix
Two small, purely local and mathematically exact GEM rewrites in `tsfc/spectral.py`, applied to the integrand before sum factorisation:

* `drop_constant_literal_axes`: for `Indexed(Literal(arr), multiindex)`, drop any axis along which `arr` is constant and the index is a running `Index`. The literal genuinely does not depend on that index, so indexing it there is redundant; the dropped quadrature index still occurs in sibling factors (weights, Jacobian, the test-function tabulation), so the surrounding `IndexSum` is unaffected. This uncovers the genuine 2D (1D) factors.
* `convert_identity_literals`: rewrite a resulting identity tabulation `Indexed(Literal(I), (i, j))` as `Delta(i, j)`, so the existing delta elimination cancels the redundant interpolation contraction.

Surviving `Delta`s (e.g. a collocation delta on a test-function index that cannot be cancelled against a sum index) are lowered to identity indexing by setting `finalise_options = dict(replace_delta=True)` (was `False`).

After the fix the same flop count scales as `(p+1)^3.48` (per DOF `(p+1)^0.48`, ~flat like MFEM).

Both passes only fire on genuinely constant / identity tabulations, so non-collocated paths are untouched.
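The idea behind the two rewrites can be sketched on plain Python data rather than GEM nodes. This is an illustration under stated assumptions, not the code in `tsfc/spectral.py`: the functions `constant_axes`, `drop_axes` and `is_identity` are hypothetical, the literal layout `L[q_a, q_b, q_c, j]` is assumed for the example, and the anchoring check and index scoping are not modelled.

```python
from itertools import product


def constant_axes(table, shape):
    """Return the axes along which `table` (index tuple -> value) does not vary."""
    axes = []
    for ax in range(len(shape)):
        if all(table[idx] == table[idx[:ax] + (0,) + idx[ax + 1:]]
               for idx in product(*map(range, shape))):
            axes.append(ax)
    return axes


def drop_axes(table, shape, axes):
    """Drop the given axes, keeping the slice at position 0 along each of them."""
    keep = [ax for ax in range(len(shape)) if ax not in axes]
    new_shape = tuple(shape[ax] for ax in keep)
    new_table = {}
    for idx in product(*map(range, new_shape)):
        full = [0] * len(shape)
        for ax, k in zip(keep, idx):
            full[ax] = k
        new_table[idx] = table[tuple(full)]
    return new_table, new_shape


def is_identity(table, shape):
    """True if `table` is a square 2D identity, i.e. replaceable by a delta."""
    if len(shape) != 2 or shape[0] != shape[1]:
        return False
    return all(table[i, j] == (1.0 if i == j else 0.0)
               for i, j in product(range(shape[0]), repeat=2))


n = 3
shape = (n, n, n, n)
# Assumed dense literal: L[q_a, q_b, q_c, j] = I[q_c, j], broadcast over q_a, q_b.
literal = {(qa, qb, qc, j): (1.0 if qc == j else 0.0)
           for qa, qb, qc, j in product(range(n), repeat=4)}

dropped = constant_axes(literal, shape)
factor2d, factor_shape = drop_axes(literal, shape, dropped)
print(dropped, factor_shape, is_identity(factor2d, factor_shape))
```

Here the first step finds that the literal is constant along the two broadcast axes and reduces it to a 2D factor; the second step recognises that factor as an identity, which is the case the PR rewrites as `Delta(i, j)`.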
A note on where this lives
The two passes are generic GEM transformations and arguably belong in `gem.optimise`. Because `gem` is a separate repository (`firedrakeproject/fiat`), I kept the change self-contained inside `tsfc/spectral.py` (using only `gem`'s public API) so it is a single, testable Firedrake PR. Happy to move them into `gem` and split into two PRs if maintainers prefer.

Test
Adds `test_laplace_action` to `tests/tsfc/test_underintegration.py` (the natural home: it already has `test_laplace` for the bilinear form but no action variant). It asserts the collocated GLL Laplacian action flop rate grows no faster than `O(p^{d+1})`:

* `quadrilateral` (2D): rate < 3 (unchanged, a regression guard)
* `hexahedron` (3D): rate < 4

Without this fix the hexahedron case fails with action flop rates ~5.5–5.8 (i.e. `O(p^{2d})`); with the fix it is ~3.7.

Correctness
Verified the matrix-free action (`assemble(a, mat_type="matfree").petscmat.mult`) matches the independently-assembled sparse matrix to ~1e-16 for the Laplacian, mass, Helmholtz and higher-degree forms, on simplex and tensor-product cells, scalar and vector elements, full-Gauss and collocated-GLL quadrature, p=1..5. The full existing `tests/tsfc` suite passes.

Checklist
* `flake8` clean (`tsfc/spectral.py`, `tests/tsfc/test_underintegration.py`)
* `tests/tsfc` suite passes
* … `tsfc` vs `gem`) and whether to broaden coverage

Targeting `main` per the contributing guidelines (this is a general improvement, not a release-critical fix).