Update to OrdinaryDiffEq.jl v7 and related SciML packages#2910
Update to OrdinaryDiffEq.jl v7 and related SciML packages#2910github-actions[bot] wants to merge 45 commits intomainfrom
Conversation
5d38e62 to
4a12427
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2910 +/- ##
==========================================
- Coverage 97.13% 96.87% -0.26%
==========================================
Files 625 626 +1
Lines 48512 48542 +30
==========================================
- Hits 47121 47025 -96
- Misses 1391 1517 +126
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
JoshuaLampert
left a comment
There was a problem hiding this comment.
Needs dependencies like SummationByPartsOperators.jl to be updated first.
|
We need jlchan/StartUpDG.jl#219 to be merged and released before. |
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
|
@termi-official we are trying to adapt to OrdinaryDiff.jl v7 etc. in a backward-compatible way if possible. Could you help with the controller changes? I ported the changes from your PR #2773 with some small changes, but using Trixi, OrdinaryDiffEqSSPRK
trixi_include("examples/tree_2d_dgsem/elixir_advection_restart.jl", alg = SSPRK43(), base_elixir = "elixir_advection_timeintegration_adaptive.jl");gives [...]
ERROR: LoadError: FieldError: type OrdinaryDiffEqCore.ODEIntegrator has no field `qold`, available fields: `sol`, `u`, `du`, `k`, `t`, `dt`, `f`, `p`, `uprev`, `uprev2`, `duprev`, `tprev`, `alg`, `dtcache`, `dtchangeable`, `dtpropose`, `tdir`, `eigen_est`, `controller_cache`, `success_iter`, `iter`, `saveiter`, `saveiter_dense`, `cache`, `callback_cache`, `kshortsize`, `force_stepfail`, `last_stepfail`, `just_hit_tstop`, `next_step_tstop`, `tstop_target`, `do_error_check`, `event_last_time`, `vector_event_last_time`, `last_event_error`, `accept_step`, `isout`, `reeval_fsal`, `derivative_discontinuity`, `reinitialize`, `isdae`, `opts`, `stats`, `initializealg`, `differential_vars`, `fsalfirst`, `fsallast`, `rng`, `W`, `P`, `sqdt`, `noise`, `c`, `rate_constants`
Stacktrace:
[1] setproperty!(x::OrdinaryDiffEqCore.ODEIntegrator{…}, f::Symbol, v::Float64)
@ Base ./Base_compiler.jl:56
[2] load_controller!(integrator::OrdinaryDiffEqCore.ODEIntegrator{…}, controller::OrdinaryDiffEqCore.PIControllerCache{…}, file::HDF5.File)
@ TrixiOrdinaryDiffEqCoreExt ~/.julia/dev/Trixi/ext/TrixiOrdinaryDiffEqCoreExt.jl:33
[...]What are the fields we need to set in the new controller interface? Edit: I think I figured it out. But is it expected that the results change slightly (with a relative difference of the order of ~1e-6), see the first point of #2910 (comment). |
|
Can you first please test if the solution is already different before storing it? The controller API change was a refactor and no observable changes should be expected. If there are any, then we need to investigate and fux these, as it essentially means I messed up copying the code around in some place. All dynamic stuff should be contained in the cache now, so only these have to be filled during load. Have you compared the integrator state after loading it vs before saving it (as I could not find tests related to this)? Related to the observed difference here might be that for the first time step iteration is now allowed to change dt more drastically. Can you also store/load
I probably misunderstand this one. Is the reference analytical and are you controller tolerances (atol and rtol) small enough to be close enough to the real solution in these integration tests given the test tolerances? |
|
Sorry, I should have been clearer. The main changes in the result (the "first two elixirs" from point 1 in the list above) don't have anything to do with restarting, and saving/loading of the controller. What makes them different compared to almost all other tests is that we explicitly pass a julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_astro_jet_amr_scO2.jl", tspan = (0.0, 1e-7));
[...]
julia> analysis_callback(sol)
(l2 = [0.011443084580409251, 10.24145497473291, 0.0036170868356439675, 4089.506746048727], linf = [3.4543954534547026, 3142.418224193642, 7.33755561607338, 1.2527944643119895e6])
(run) pkg> st
Status `~/.julia/dev/Trixi/run/Project.toml`
[bbf590c4] OrdinaryDiffEqCore v4.0.0
[669c94d9] OrdinaryDiffEqSSPRK v2.0.0
[a7f1ee26] Trixi v0.16.7-DEV `..`
Info Packages marked with ⌃ have new versions available and may be upgradable.With old OrdinaryDiffEq*.jl versions: julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_astro_jet_amr_scO2.jl", tspan = (0.0, 1e-7));
[...]
julia> analysis_callback(sol)
(l2 = [0.011443079784345214, 10.241451663574523, 0.003617158399146704, 4089.5052326255754], linf = [3.4543899636994606, 3142.4152494780465, 7.33772972834869, 1.2527931359048043e6])
julia> analysis_callback(sol).l2 .- [0.011443084580409251, 10.24145497473291, 0.0036170868356439675, 4089.506746048727]
4-element StaticArraysCore.SizedVector{4, Float64, Vector{Float64}} with indices SOneTo(4):
-4.796064037235204e-9
-3.3111583874756434e-6
7.156350273658518e-8
-0.0015134231516640284
(run_v6) pkg> st
Status `~/.julia/dev/Trixi/run_v6/Project.toml`
[7d9f7c33] Accessors v0.1.44
⌅ [bbf590c4] OrdinaryDiffEqCore v3.33.1
⌃ [669c94d9] OrdinaryDiffEqSSPRK v1.14.0
[a7f1ee26] Trixi v0.16.7-DEV `..`
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`Regarding the reference values for the L2 and Linf errors we use in the tests: In our tests we compute these errors with respect to the analytical solution if available and with respect to the initial condition otherwise. In this case, there is no known analytical solution. So these are not analytical values, but come from manually validated previous runs. I.e., these are only regression tests to catch cases like this where changes, which are not expected to change the numerical result, actually do change them. However, it is not easy to say, which of the two versions gives a "more correct" solution. In this example we use default tolerances for the adaptive time stepping. |
|
I see, thanks for elaborating. Note that the test is not really safe then, because even minor changes in the implementation, like changing some intermediate computation from 10 // 1 to 10.0 can tank the test. I am a bit puzzled that it only happens if you pass the PIDController manually. This probably means that one of the parameters is not picked up correctly anymore. Can you check if the other parameters (like qsteady_min/max) differ now? Also, try to put To bisect this, the commit on master right before the controller interface update is 1dfa9ba8b248720ddd741b3d92bc76aba17235d1 and the one with update is 80097def27da339f712a28d7a7bb10d54ec5a250 . Just to confirm that this is really the interface update being faulty or something else happened. |
|
Also note that the results differ by you tolerance (about 1e-6), which tells me that the controllers are on both versions intact. |
|
First of all, thanks for your help!
Yes, but we allow for small deviations, which come from different architectures or order of floating point operations. Whenever a version update of downstream packages exceeds this small tolerance, it is very likely due to a true change in the downstream package, which yields different numerical solutions. Even though the new version still might satisfy the tolerances and might be a valid solution, we want to first know there are changes affecting the numerical result (that's what these regression tests tell us), then understand what exactly caused this to finally approve it after validating the changes are fine (or file a bug report/fix the underlying issue if we do not approve the changes). If you say numerical results should not change at all since it is only refactoring (up to maybe floating point differences), but we still see differences (within or not within time stepping tolerances) and do not understand where they come from, we should investigate. I will do that with the hints you have given. In addition, the jobs, where MPI gets stuck (maybe because the controllers on different ranks have a different state?) definitely need to be debugged. Since we talked about that @ranocha, I want to note that currently, the restarted elixir gives exactly the same results as if we would have run the simulation for the complete time span: julia> trixi_include(joinpath("examples", "tree_2d_dgsem", "elixir_advection_timeintegration_adaptive.jl"),
alg = SSPRK43(), tspan = (0.0, 10.0));
[...]
julia> l2_expected, linf_expected = analysis_callback(sol)
(l2 = [0.00010686315659397209], linf = [0.001133742851433972])
julia> trixi_include(joinpath("examples", "tree_2d_dgsem", "elixir_advection_restart.jl"),
alg = SSPRK43(), base_elixir = "elixir_advection_timeintegration_adaptive.jl");
[...]
julia> l2_actual, linf_actual = analysis_callback(sol)
(l2 = [0.00010686315659397209], linf = [0.001133742851433972])
julia> linf_expected == linf_actual
true
julia> l2_expected == l2_actual
trueSo to me it looks like saving and loading of the controller is actually fine. |
|
Looking at the MPI situations, it looks like we have a barrier mismatch together with t8code. One process is in: The other two processes are in: The backtraces are lacking Julia details. |
Yes, that was it. On the old OrdinaryDiffEq.jl version, the default for Edit: Ah, digging a bit deeper I see there is |
Please note that I can only speak for the controller refactoring here. The v7 update has a lot more changes and fixes.
I had to decide for either 1 or 1.2 (see https://github.com/SciML/OrdinaryDiffEq.jl/blob/cc0a47bb3418e4f4ce7ff875c4a05c88853fc167/lib/OrdinaryDiffEqCore/src/alg_utils.jl#L468) and arbitrarily picked the 1.2 to have some play in the controller. If you have some suggestion how that part of the decision process for the parameters should be incorporated, then lets discuss. I guess the only sane way to recover the v6 behavior is to use this constructor instead: https://github.com/SciML/OrdinaryDiffEq.jl/blob/cc0a47bb3418e4f4ce7ff875c4a05c88853fc167/lib/OrdinaryDiffEqCore/src/integrators/controllers.jl#L574 |
As far as I can tell this is the line where we are hitting the 1.2.
Looks like we cannot pass different values for the betas as we do in |
|
I am fine with passing |
Yes, but since we do not have the
|
Ah, true. I misread that. Thanks! |
|
@vchuravy, what did you run to debug the MPI issues? When I execute example/tree_2d_dgsem/elixir_advection_basic.jl or examples/tree_2d_dgsem/elixir_advection_restart.jl locally on Ubuntu with MPI it works just fine. From the CI logs it also looks like the tests get already stuck when trying to run the first elixir, which is example/tree_2d_dgsem/elixir_advection_basic.jl. |
|
|
They shouldn't change the default values? What's a case where the default is changed? |
E.g. in this constructor https://github.com/SciML/OrdinaryDiffEq.jl/blob/cc0a47bb3418e4f4ce7ff875c4a05c88853fc167/lib/OrdinaryDiffEqCore/src/integrators/controllers.jl#L559 we do not have access to the algorithm and hence for some algorithms the default changes. We could default here to nothing and reresolve the missing pieces in a second step inside init. |
|
It should default to |
|
So the new getters in SciML/OrdinaryDiffEq.jl#3570 should resolve these or the cache setup step? |
|
Yes, that would be the direction. |
This pull request changes the compat entry for the
RecursiveArrayToolspackage from3.37to3.37, 4.This keeps the compat entries for earlier versions.
Note: I have not tested your package with this new compat entry.
It is your responsibility to make sure that your package tests pass before you merge this pull request.
Closes #2918, closes #2919, closes #2949, closes #2950, closes #2960, closes #2961, closes #2962, closes #2963, closes #2965, closes #2966, closes #2967, closes #2968, closes #2969, closes #2970, closes #2971, closes #2972, closes #2973, closes #2981, closes #2983, closes #2984.
The required changes are:
u_modified!byderivative_discontinuity!VectorOfArrayofSVectorsnow returns the underlying floating values, not theSVector, fixed by replacingubyparent(u)threadis now passed asFastBroadcast.Serial()/FastBroadcast.Threaded()instead ofTrue()/False().