Skip to content

Clone appctx for the cached adjoint replay solvers#5170

Open
sghelichkhani wants to merge 6 commits into
JHopeCollins/nlvs-hessian-fixfrom
sghelichkhani/appctx-on-cached-solver-block
Open

Clone appctx for the cached adjoint replay solvers#5170
sghelichkhani wants to merge 6 commits into
JHopeCollins/nlvs-hessian-fixfrom
sghelichkhani/appctx-on-cached-solver-block

Conversation

@sghelichkhani

@sghelichkhani sghelichkhani commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

When an annotated NonlinearVariationalSolver is created with an appctx, the replay machinery deep-copies the form coefficients but passes the appctx through untouched (forward solver) or drops it entirely (adjoint and TLM solvers). Any preconditioner that reads UFL expressions out of the appctx, MassInvPC being the obvious one for Schur complement Stokes, ends up looking at the user's original coefficients rather than the clones the tape keeps in sync, so during replay it builds the preconditioner from stale end-of-forward-run values.

This applies the forward coefficient replace map to UFL entries in the appctx when the cached solvers are built. Since the tangent and adjoint solvers derive their forms from the cloned forward problem, all three share the one map and so point at the same cloned coefficients that update_dependencies updates at replay time. The adjoint and TLM solvers fall back to the forward appctx when their own kwargs don't carry one; David confirmed this is always safe since the adjoint equation uses J^T (https://firedrakeproject.slack.com/archives/C1Q0Y6H8A/p1776171359074139). The appctx pop in solve_init_params stays, because the legacy GenericSolveBlock adjoint path solves with an assembled matrix and can't accept the kwarg; there's a regression test pinning that path too.

The new tests in tests/firedrake/adjoint/test_appctx.py check object identity between the appctx and the cached forms, and use a recording preconditioner to verify the values seen during forward and adjoint replay match the forward trajectory.

Note this targets the #4638 refactor branch, so it only reaches main when that merges. The same bug exists in the current release through the old _ad_problem_clone machinery, so we should probably do a separate PR basing the release branch to fix released Firedrake as well; I have that version of the fix ready on a separate branch.

Disclosure: parts of this change were prepared with the help of Claude Code (Opus 4.8).

angus-g and others added 5 commits June 8, 2026 17:25
A bit of a doubling up on the already-present adj_sol on
solver blocks.
* reduce nlvs adjoint test size

* nlvs adjoint caching - bundle up recompute objects
When a NonlinearVariationalSolver is created with an appctx dict, the
cached solvers used for tape replay deep-copy the form coefficients,
but the appctx was passed through unchanged (forward solver) or
dropped outright (adjoint and TLM solvers). Preconditioners that read
from appctx (e.g. MassInvPC for Schur complement Stokes) would see
stale end-of-forward-run values instead of the tape-replayed values.

The forward cache now exposes its coefficient replace map, and a new
_ad_clone_kwargs helper applies ufl.replace to every UFL expression in
appctx using that map. Since the tangent and adjoint solvers build
their forms from the cloned forward problem, all three cached solvers
share the single forward replace map, so their appctx entries point at
the same cloned coefficients that update_dependencies keeps in sync
with the tape. The tangent and adjoint solvers fall back to the
forward kwargs' appctx when their own kwargs do not carry one: their
operators are linearisations of the forward operator, so a
preconditioner that needs appctx for the forward solve needs the same
for them.

The appctx pop in solve_init_params is kept: the legacy
GenericSolveBlock adjoint path solves with an assembled matrix through
firedrake.solve, which rejects an appctx kwarg, so appctx must still
be dropped there.
@sghelichkhani

Copy link
Copy Markdown
Contributor Author

Since this only reaches main when #4638 merges and the bug also affects released Firedrake, I've opened #5171 with the same fix expressed against the current release machinery.

@connorjward

Copy link
Copy Markdown
Contributor

Why are there two PRs for this? If this goes into release then it will make its way into main soon enough (potentially immediately after if needed). Can't this one just be closed?

@sghelichkhani

Copy link
Copy Markdown
Contributor Author

The two are not the same patch landing twice, so closing this one would drop the fix rather than defer it. #5171 against release repairs the legacy cloning path, namely _ad_problem_clone and _ad_adj_lvs_problem, threading the replace maps back out so that the appctx can be cloned alongside the form coefficients.

This PR is the same fix expressed against the world that @JHopeCollins's nlvs-hessian-fix branch introduces, where those methods are superseded by the _ad_forward_cache, _ad_tangent_cache and _ad_adjoint_cache machinery. Once release reaches main and that branch rebases on top, the #5171 edits would land on code that has already been rewritten there, so the appctx fix does not carry over of its own accord and has to be reinstated against the cache, which is what this PR does. It additionally covers the tangent path, which has no counterpart in the legacy structure. I thought of it less as a duplicate of #5171 and more as a contribution that keeps the fix alive once that refactor supersedes the old cloning code, and I am happy to fold it into the branch in whatever form suits.

The previous comment claimed the assembled adjoint solve does not accept
appctx, which is misleading now that LinearSolver is a LinearVariationalSolver
subclass and forwards appctx. The actual rejection is one layer up: the
solve(A, x, b) dispatch validates kwargs in _extract_linear_solver_args and
refuses a top-level appctx, reading it only from solver_parameters.
@sghelichkhani sghelichkhani force-pushed the sghelichkhani/appctx-on-cached-solver-block branch from aea71cb to e9af276 Compare June 15, 2026 13:57
@angus-g angus-g force-pushed the JHopeCollins/nlvs-hessian-fix branch from 6f6fd15 to 440f386 Compare June 16, 2026 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants