feat!(cdt): add dimensional observables#119
Conversation
- Add volume-profile, Hausdorff-dimension, and spectral-dimension observables for CDT analysis. - Record per-slice volume profiles in simulation measurements and expose aggregate observable summaries on simulation results. - Add a runnable observables example, academic references, and updated code-organization and roadmap documentation. - Apply saturating simplex-count conversions consistently while keeping conversion helpers crate-internal. BREAKING CHANGE: The broad prelude now keeps only common quick-start imports; specialized foliation, move, proposal, and analysis APIs should be imported from scoped preludes or root exports. The `saturating_usize_to_i32` helper is no longer public API.
|
Caution Review failedFailed to post review comments WalkthroughAdds per-slice volume-profile recording, user-facing Hausdorff and spectral dimension estimators, SimulationResultsBackend and Measurement extensions, triangulation foliation/volume APIs, iterator and safety refactors, prelude/public API reorganization, examples/benchmarks/tests for observables, and broad documentation/tooling updates. ChangesVolume profile, observables, results, and API wiring
Sequence DiagramsequenceDiagram
participant Client as CLI / Example
participant Sim as Metropolis Simulation
participant Tri as Triangulation (CdtTriangulation2D)
participant Res as SimulationResultsBackend
participant Obs as Observable Estimators
Client->>Sim: run_simulation(config)
Sim->>Tri: build initial triangulation / validate foliation
loop each measurement
Sim->>Tri: propose/mutate (moves)
Tri-->>Sim: updated triangulation
Sim->>Tri: triangulation.volume_profile()
Tri-->>Sim: Vec<u32> per-slice volumes
Sim->>Res: record Measurement{..., volume_profile}
end
Sim->>Res: collect equilibrium measurements
Sim->>Obs: estimate_hausdorff_dimension(&final_triangulation)
Obs->>Tri: dual_adjacency(), BFS ball growth
Obs-->>Sim: Option<f64> d_H
Sim->>Obs: estimate_spectral_dimension(&final_triangulation)
Obs->>Tri: dual_adjacency(), random-walk return probs
Obs-->>Sim: Option<f64> d_S
Sim->>Res: Res.average_volume_profile()
Res-->>Client: results + estimates
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #119 +/- ##
==========================================
- Coverage 92.53% 92.50% -0.03%
==========================================
Files 13 19 +6
Lines 7794 8557 +763
==========================================
+ Hits 7212 7916 +704
- Misses 582 641 +59
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CONTRIBUTING.md (1)
202-203:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign branch naming example with repository convention.
Line 202 currently uses
feature/your-feature-name, which conflicts with the repo’s preferred{type}/{issue}-descriptor-or-twopattern. Updating this example avoids mixed guidance for contributors.Based on learnings: “When suggesting branch names, prefer
{type}/{issue}-descriptor-or-twoformat (e.g.,fix/307-topology-validation,perf/315-bench-profile).”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CONTRIBUTING.md` around lines 202 - 203, Replace the branch naming example "git checkout -b feature/your-feature-name" with the repository convention using the {type}/{issue}-descriptor-or-two pattern; update the example line to show the new format (referencing the literal string "git checkout -b feature/your-feature-name" to locate it) and provide example branch names such as "fix/307-topology-validation" and "perf/315-bench-profile" to demonstrate the preferred convention.
🧹 Nitpick comments (8)
src/cdt/triangulation.rs (1)
2005-2017: ⚡ Quick winAvoid O(T) slab scan per face in toroidal
face_time_sliceLine 2009 introduces a full
0..time_slicesscan for every face. Sincevolume_profile()calls this for all faces, this becomes O(F·T). You can derive the slab in O(1) from the two distinct labels aftercell_type()succeeds.Suggested direction
match self.metadata.topology { CdtTopology::OpenBoundary => Some(labels[0].min(labels[1]).min(labels[2])), CdtTopology::Toroidal => { let total = self.metadata.time_slices; - for slice in 0..total { - let next = (slice + 1) % total; - let spans_slab = labels.iter().all(|&label| label == slice || label == next) - && labels.contains(&slice) - && labels.contains(&next); - if spans_slab { - return Some(slice); - } - } - None + let mut uniq = [labels[0], labels[1], labels[2]]; + uniq.sort_unstable(); + uniq.dedup(); + let (a, b) = match uniq { + [a, b, ..] if a != b => (a, b), + _ => return None, + }; + if (a + 1) % total == b { + Some(a) + } else if (b + 1) % total == a { + Some(b) + } else { + None + } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cdt/triangulation.rs` around lines 2005 - 2017, The toroidal branch currently scans all time_slices per face (O(T)), make it O(1) by deriving the slab from the two distinct labels instead of iterating: in the toroidal arm of face_time_slice (called from volume_profile() after cell_type() succeeds), collect the distinct values from labels (e.g., via a small set or compare values), ensure there are exactly two distinct labels l0 and l1, then check adjacency modulo total = self.metadata.time_slices (if l1 == (l0+1)%total return Some(l0), else if l0 == (l1+1)%total return Some(l1), else return None); leave other branches unchanged.src/cdt/metropolis.rs (3)
192-198: ⚡ Quick winAdding a public field is a breaking change for direct struct construction.
Measurementispubwith all public fields, so addingvolume_profilebreaks any downstream code that constructsMeasurement { … }via struct literal (no#[non_exhaustive]). This is acceptable here because the PR isfeat!and the change is captured in the breaking-change footer, but consider markingMeasurement#[non_exhaustive]going forward to absorb similar future additions without a major bump.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cdt/metropolis.rs` around lines 192 - 198, The Measurement struct was made extensible by adding the new public field volume_profile which is a breaking change for direct struct literal construction; to avoid future breaking changes, mark the Measurement struct with #[non_exhaustive] so downstream crates cannot exhaustively construct it and will not break when new fields like volume_profile are added — locate the Measurement declaration in src/cdt/metropolis.rs and add the #[non_exhaustive] attribute above the struct definition (ensuring any existing public visibility remains unchanged) and update any internal construction sites to use the struct literal within the same crate or provide a constructor if needed.
1267-1304: 💤 Low valueDimension estimates only use the final triangulation, not the equilibrium ensemble.
hausdorff_dimension_estimateandspectral_dimension_estimateevaluate the estimator solely onself.triangulation(the final post-run state). For CDT observables an ensemble average over equilibrium measurements is the more physically meaningful quantity, but reproducing intermediate triangulations would require keeping snapshots inMeasurement(or rerunning the chain). The current single-state approach is reasonable as a first cut and is consistent with issue#58's "simple log-log fit", but consider clarifying in the doc that the estimate is not averaged over the equilibrium ensemble so users don't conflate it withaverage_volume_profile.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cdt/metropolis.rs` around lines 1267 - 1304, Update the docstrings for hausdorff_dimension_estimate and spectral_dimension_estimate to explicitly state that these methods compute the estimator only on the final triangulation (self.triangulation) and do not perform an ensemble average over equilibrium measurements; mention that reproducing an equilibrium average would require storing snapshots in Measurement or rerunning the chain and optionally point users to average_volume_profile for ensemble-style information so they don't conflate the single-state estimate with an equilibrium ensemble average.
1207-1234: 💤 Low valuePopulation standard deviation used for
volume_fluctuations.
volume_fluctuationsdivides variance bymeasurements.len()rather thanlen() - 1, i.e. it returns the population (biased) standard deviation rather than the sample (unbiased) one. For CDT-style equilibrium observables this is a defensible convention, but it's worth either documenting the convention explicitly or switching to the sample form (and guarding againstlen() < 2) so consumers don't misinterpret error bars.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cdt/metropolis.rs` around lines 1207 - 1234, The function volume_fluctuations currently computes the population stddev by dividing by measurements.len(); change it to compute the sample (unbiased) stddev: obtain n = measurements.len(), if n < 2 return an empty Vec (or Vec of zeros if preferred), otherwise use denominator (n - 1) when normalizing the accumulated variances (convert n-1 to f64 via NumCast like NumCast::from(n - 1).unwrap_or(1.0)); update the normalization in the map to use (variance / denom).sqrt(). Reference equilibrium_measurements() and average_volume_profile() when locating the accumulation and the count logic to add the guard.src/cdt/observables.rs (2)
268-317: ⚖️ Poor tradeoffReasonable random-walk implementation; bipartite oscillation may limit usable samples in practice.
For nearly bipartite dual graphs (e.g. small toroidal CDTs at low refinement) the return probability oscillates between ~1 and ~0 across consecutive steps.
fit_spectral_dimensionfilters values outside(0, 1), so usable points can drop below the 2-sample threshold and the estimator will returnNoneeven though there is real diffusion data. This is acceptable for a first implementation (and is consistent with the stretch-goal status of spectral dimension in#58), but it's worth either noting the limitation in the doc or considering a lazy-walk variant (e.g. self-loop probability1/2) in a follow-up to break the parity.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cdt/observables.rs` around lines 268 - 317, average_return_probabilities currently does a simple random walk that causes parity oscillation on nearly-bipartite graphs; change it to a lazy random walk by giving each node a self-loop probability (e.g. self_loop = 0.5) so a fraction of probability stays at the node each step and the rest is distributed to neighbors. Concretely, inside average_return_probabilities (the inner loop over current probabilities) compute stay = probability * self_loop, add stay to next[index], then distribute probability * (1.0 - self_loop) equally among live neighbors (using the same neighbor_count/NumCast logic); keep sums and swapping logic the same so return probabilities are averaged as before. Optionally make self_loop a constant or parameter for future tuning.
235-250: 💤 Low valueDocument the
volume > 1.0filter rationale.The filter drops samples where ball volume is exactly
1.0(i.e. only the root face is reachable at that radius), which produces aln(volume) = 0point that biases the slope toward zero. Worth a one-line comment explaining that intent — otherwise it reads as an arbitrary threshold and could be "fixed" later to> 0.0or>= 1.0, silently regressing the fit on small/disconnected components.Suggested clarifying comment
fn fit_log_log_slope(ball_volumes: &[f64]) -> Option<f64> { fit_linear_slope( ball_volumes .iter() .enumerate() .skip(1) .filter_map(|(radius, &volume)| { + // Drop volume == 1.0 (root only): ln(1) = 0 anchors the + // slope at the origin and biases the fit on disconnected + // or trivially small dual graphs. if volume > 1.0 && volume.is_finite() {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cdt/observables.rs` around lines 235 - 250, Add a one-line comment next to the volume filter in fit_log_log_slope explaining why we use volume > 1.0 (i.e., to exclude samples where volume == 1.0 because ln(1.0) == 0 and such “root-only” reachable points would bias the log-log slope toward zero); place the comment adjacent to the filter_map closure (referencing the volume > 1.0 check) so future readers won’t change it to > 0.0 or >= 1.0 by mistake.src/lib.rs (1)
244-269: 💤 Low valueDocument why
prelude::observablesre-exportsCdtTriangulationtypes.
CdtTriangulationandCdtTriangulation2Dare already available inprelude::triangulationandprelude::geometry. Re-exporting them here is reasonable for ergonomics (the doctest needsfrom_cdt_stripin scope), but the duplication should be called out in the module doc so it doesn't drift over time. As per coding guidelines/learnings: "Keep scoped preludes minimal and orthogonal; do not duplicate specialized APIs across scoped preludes unless the overlap is intentionally documented."Suggested doc clarification
/// Focused exports for CDT observables and post-simulation analysis. /// /// This prelude is intended for measuring triangulations without importing /// simulation runner, telemetry, proposal, or move APIs. + /// + /// `CdtTriangulation` and `CdtTriangulation2D` are intentionally re-exported + /// from this prelude (also available via `prelude::triangulation` / + /// `prelude::geometry`) so a single `use prelude::observables::*;` is + /// sufficient for typical analysis workflows.Based on learnings: "Keep scoped preludes minimal and orthogonal; do not duplicate specialized APIs across scoped preludes unless the overlap is intentionally documented."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib.rs` around lines 244 - 269, Add a short module-level doc comment to prelude::observables explaining why it re-exports CdtTriangulation and CdtTriangulation2D (even though those types are exported by prelude::triangulation and prelude::geometry) to document the intentional overlap for ergonomics (e.g., to make methods like CdtTriangulation::from_cdt_strip available to the doctest) and to signal this duplication is deliberate and should be maintained; update the docstring at the top of the observables mod (referencing prelude::observables, CdtTriangulation, CdtTriangulation2D, and the doctest use of from_cdt_strip) with one or two sentences describing this rationale.benches/cdt_benchmarks.rs (1)
341-358: 💤 Low valueConsider benchmarking the new observable APIs.
The
bench_simulation_analysisgroup already exercisesacceptance_rate,average_action, andequilibrium_measurements. Adding bench points foraverage_volume_profile,volume_fluctuations, and the dimension estimators would catch performance regressions in the new analysis surface —hausdorff_dimension_estimatein particular isO(F·(F+E_d))and worth keeping an eye on.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benches/cdt_benchmarks.rs` around lines 341 - 358, The benchmark group bench_simulation_analysis currently measures acceptance_rate, average_action, and equilibrium_measurements but omits the new observable APIs; update benches/cdt_benchmarks.rs to add benchmark points that call average_volume_profile, volume_fluctuations, and the dimension estimators (e.g., hausdorff_dimension_estimate and any other estimator functions) inside the same bench_simulation_analysis harness using the existing Measurement/test dataset so they run alongside acceptance_rate/average_action; when adding hausdorff_dimension_estimate ensure the benchmark input size is reasonable (it is O(F*(F+E_d))) to avoid excessive runtime and label the benches clearly so regressions for those specific functions are tracked.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/util.rs`:
- Around line 146-148: The test test_saturating_usize_to_u32_overflow uses
`u32::MAX as usize + 1` which can overflow on 32-bit targets; update the test to
be pointer-width safe by guarding the overflow assertion with a cfg so it only
runs on platforms where usize is wider than u32 (e.g. add
#[cfg(target_pointer_width = "64")] around the
`assert_eq!(saturating_usize_to_u32(u32::MAX as usize + 1), u32::MAX);` line or
otherwise compute the overflow value in a way that doesn't overflow on 32-bit,
keeping the second assertion for usize::MAX intact and referencing the test
function name `test_saturating_usize_to_u32_overflow` and the helper
`saturating_usize_to_u32`.
---
Outside diff comments:
In `@CONTRIBUTING.md`:
- Around line 202-203: Replace the branch naming example "git checkout -b
feature/your-feature-name" with the repository convention using the
{type}/{issue}-descriptor-or-two pattern; update the example line to show the
new format (referencing the literal string "git checkout -b
feature/your-feature-name" to locate it) and provide example branch names such
as "fix/307-topology-validation" and "perf/315-bench-profile" to demonstrate the
preferred convention.
---
Nitpick comments:
In `@benches/cdt_benchmarks.rs`:
- Around line 341-358: The benchmark group bench_simulation_analysis currently
measures acceptance_rate, average_action, and equilibrium_measurements but omits
the new observable APIs; update benches/cdt_benchmarks.rs to add benchmark
points that call average_volume_profile, volume_fluctuations, and the dimension
estimators (e.g., hausdorff_dimension_estimate and any other estimator
functions) inside the same bench_simulation_analysis harness using the existing
Measurement/test dataset so they run alongside acceptance_rate/average_action;
when adding hausdorff_dimension_estimate ensure the benchmark input size is
reasonable (it is O(F*(F+E_d))) to avoid excessive runtime and label the benches
clearly so regressions for those specific functions are tracked.
In `@src/cdt/metropolis.rs`:
- Around line 192-198: The Measurement struct was made extensible by adding the
new public field volume_profile which is a breaking change for direct struct
literal construction; to avoid future breaking changes, mark the Measurement
struct with #[non_exhaustive] so downstream crates cannot exhaustively construct
it and will not break when new fields like volume_profile are added — locate the
Measurement declaration in src/cdt/metropolis.rs and add the #[non_exhaustive]
attribute above the struct definition (ensuring any existing public visibility
remains unchanged) and update any internal construction sites to use the struct
literal within the same crate or provide a constructor if needed.
- Around line 1267-1304: Update the docstrings for hausdorff_dimension_estimate
and spectral_dimension_estimate to explicitly state that these methods compute
the estimator only on the final triangulation (self.triangulation) and do not
perform an ensemble average over equilibrium measurements; mention that
reproducing an equilibrium average would require storing snapshots in
Measurement or rerunning the chain and optionally point users to
average_volume_profile for ensemble-style information so they don't conflate the
single-state estimate with an equilibrium ensemble average.
- Around line 1207-1234: The function volume_fluctuations currently computes the
population stddev by dividing by measurements.len(); change it to compute the
sample (unbiased) stddev: obtain n = measurements.len(), if n < 2 return an
empty Vec (or Vec of zeros if preferred), otherwise use denominator (n - 1) when
normalizing the accumulated variances (convert n-1 to f64 via NumCast like
NumCast::from(n - 1).unwrap_or(1.0)); update the normalization in the map to use
(variance / denom).sqrt(). Reference equilibrium_measurements() and
average_volume_profile() when locating the accumulation and the count logic to
add the guard.
In `@src/cdt/observables.rs`:
- Around line 268-317: average_return_probabilities currently does a simple
random walk that causes parity oscillation on nearly-bipartite graphs; change it
to a lazy random walk by giving each node a self-loop probability (e.g.
self_loop = 0.5) so a fraction of probability stays at the node each step and
the rest is distributed to neighbors. Concretely, inside
average_return_probabilities (the inner loop over current probabilities) compute
stay = probability * self_loop, add stay to next[index], then distribute
probability * (1.0 - self_loop) equally among live neighbors (using the same
neighbor_count/NumCast logic); keep sums and swapping logic the same so return
probabilities are averaged as before. Optionally make self_loop a constant or
parameter for future tuning.
- Around line 235-250: Add a one-line comment next to the volume filter in
fit_log_log_slope explaining why we use volume > 1.0 (i.e., to exclude samples
where volume == 1.0 because ln(1.0) == 0 and such “root-only” reachable points
would bias the log-log slope toward zero); place the comment adjacent to the
filter_map closure (referencing the volume > 1.0 check) so future readers won’t
change it to > 0.0 or >= 1.0 by mistake.
In `@src/cdt/triangulation.rs`:
- Around line 2005-2017: The toroidal branch currently scans all time_slices per
face (O(T)), make it O(1) by deriving the slab from the two distinct labels
instead of iterating: in the toroidal arm of face_time_slice (called from
volume_profile() after cell_type() succeeds), collect the distinct values from
labels (e.g., via a small set or compare values), ensure there are exactly two
distinct labels l0 and l1, then check adjacency modulo total =
self.metadata.time_slices (if l1 == (l0+1)%total return Some(l0), else if l0 ==
(l1+1)%total return Some(l1), else return None); leave other branches unchanged.
In `@src/lib.rs`:
- Around line 244-269: Add a short module-level doc comment to
prelude::observables explaining why it re-exports CdtTriangulation and
CdtTriangulation2D (even though those types are exported by
prelude::triangulation and prelude::geometry) to document the intentional
overlap for ergonomics (e.g., to make methods like
CdtTriangulation::from_cdt_strip available to the doctest) and to signal this
duplication is deliberate and should be maintained; update the docstring at the
top of the observables mod (referencing prelude::observables, CdtTriangulation,
CdtTriangulation2D, and the doctest use of from_cdt_strip) with one or two
sentences describing this rationale.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 50ffda73-171a-4d8f-9a3a-24ec003d058c
📒 Files selected for processing (21)
AGENTS.mdCONTRIBUTING.mdCargo.tomlREADME.mdREFERENCES.mdbenches/cdt_benchmarks.rsdocs/PERFORMANCE_TESTING.mddocs/code_organization.mddocs/dev/rust.mddocs/project.mddocs/roadmap.mddocs/testing.mdexamples/observables.rssrc/cdt/action.rssrc/cdt/metropolis.rssrc/cdt/observables.rssrc/cdt/triangulation.rssrc/config.rssrc/geometry/operations.rssrc/lib.rssrc/util.rs
💤 Files with no reviewable changes (1)
- docs/project.md
🔴 Performance Regression DetectedPerformance Analysis ReportCDT Performance Analysis ReportGenerated: 2026-05-06T05:21:16+00:00 Summary
🔴 Performance Regressions
🆕 New Benchmarks
Performance analysis powered by Criterion.rs |
- Move CDT triangulation builders, foliation logic, mutation hooks, validation, and simulation result summaries into focused modules. - Validate Metropolis target and proposal parameters before constructing reusable MCMC components. - Reject non-finite geometry generator inputs and invalid toroidal domains before they reach Delaunay predicates. - Make Delaunay backend mutations restore prior state on upstream errors or malformed flip outputs. - Return concrete topology iterators from TriangulationQuery to avoid boxed iterator dispatch in repeated scans. - Add validated example output markers and production Rust Semgrep rules for panic-prone error paths. BREAKING CHANGE: CdtTarget::new, CdtProposal::new, and CdtProposal::with_seed now return CdtResult, and TriangulationQuery iterator methods now return concrete impl Iterator values instead of boxed trait objects.
🔴 Performance Regression DetectedPerformance Analysis ReportCDT Performance Analysis ReportGenerated: 2026-05-06T18:23:28+00:00 Summary
🔴 Performance Regressions
🟢 Performance Improvements
🆕 New Benchmarks
✅ Stable BenchmarksNo significant changes detected in 10 benchmarks. Performance analysis powered by Criterion.rs |
BREAKING CHANGE: The broad prelude now keeps only common quick-start imports; specialized foliation, move, proposal, and analysis APIs should be imported from scoped preludes or root exports. The
saturating_usize_to_i32helper is no longer public API.Closes #58