Skip to content

DAG stage split: skip downstream-only blocks during RK stages #3

Description

@milanofthe

Summary

Optimization idea: during RK solver stages, only evaluate blocks that are upstream of dynamic blocks (blocks with solver engines). Downstream-only blocks (Scopes, output chains) would be evaluated once after the step is accepted, not at every intermediate stage.

Expected Gain

  • 5-15% for typical models (10-50 blocks)
  • 20-30% for models with many observation points or post-processing chains
  • 0% for models where all blocks are in feedback loops

Approach

Reverse reachability analysis from engine blocks (Integrator, ODE, StateSpace, etc.) through the DAG and algebraic loops. Blocks that are NOT reachable are classified as "post" and skipped during RK stages.

Classification table

Block type In loop? Upstream of engine? Per RK stage After step
Engine block - (is engine) step() sample()
Algebraic No Yes update() -
Algebraic No No - update()
Algebraic Yes (loop upstream of engine) Yes FPI solve per stage -
Algebraic Yes (loop not upstream) No - FPI solve once

Key distinction

The Graph currently classifies blocks as alg_blocks (len > 0) and dyn_blocks (len == 0). But dyn_blocks includes Constants, Sources, Scopes — not just engine blocks. The stage split needs a third category: engine_blocks (blocks with actual solver state).

Learnings from prototype

A prototype was attempted on branch feature/dag-stage-split (now deleted). Key findings:

  1. Graph::new needs has_engine flag(NodeId, bool) must become (NodeId, bool, bool) to distinguish algebraic vs dynamic vs engine blocks.

  2. Reverse BFS must start from engine blocks only — starting from all dyn_blocks (which includes Constants/Scopes) makes everything reachable and defeats the optimization.

  3. Loop propagation is tricky — if any block in a loop is upstream of an engine, the ENTIRE loop must be solved per stage, AND all blocks upstream of the loop become stage-critical too.

  4. Connection classification — a connection is "dyn" if its target is dyn-reachable. Need to iterate connections per edge to classify.

  5. Silent correctness bugs — if a block is wrongly classified as "post", the simulation produces subtly wrong results (wrong step count, wrong trajectory) without errors. This makes the optimization risky.

  6. Subsystem interaction — Subsystems are treated as blocks by the outer simulation. If a Subsystem contains engine blocks, it has an engine itself and is automatically classified correctly. Internal optimization within the Subsystem would be Phase 2.

Decision

Deferred. The 10-15% gain does not justify the added complexity given that fastsim already achieves ~30ns/block/eval (competitive with compiled Simulink). The risk of silent correctness bugs from misclassification outweighs the performance benefit.

If revisited

  • Add has_engine to Graph::new signature
  • Implement classify_dyn_post() with reverse BFS from engine blocks
  • Add dag_dyn_iter() / dag_post_iter() to Graph
  • Add _update_dyn() to Simulation (used in RK stage loop and implicit solve only)
  • Keep _update() for post-step, event handling, and init/cleanup
  • Add comprehensive tests that verify trajectory match for models with downstream-only blocks

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions