Proof of concept: task eviction after snapshot for turbo-tasks-backend#91790
lukesandberg wants to merge 4 commits into racy_snapshot_encoding
    /// snapshot → evict → restore cycle works correctly.
    ///
    /// Returns `(snapshot_had_new_data, eviction_counts)`.
    pub fn snapshot_and_evict(

apply `cfg(test)`?

Suggested change:

    - pub fn snapshot_and_evict(
    + pub fn snapshot_and_evict_for_testing(
## What
Tightens the value-type persistence API and sets the table for a future eviction policy. Two user-visible changes on the `#[turbo_tasks::value(...)]` macro:
- **`serialization = "none"` → `serialization = "skip"`**: imperative ("skip persisting") instead of descriptive. This makes it clear that we are choosing not to persist, rather than missing a feature (persisting may occasionally be impossible, but that is rare).
- **New `evict = "always" | "last" | "never"` attribute**: replaces the old overloaded `"none"` semantic. Only valid with `serialization = "skip"`. Defaults to `"always"`.
Internally this collapses `persistent_cell_data` + `transient_cell_data` into one `cell_data` map and replaces the old `bincode: Option<(enc, dec)>` field with a four-variant `ValueTypePersistence` enum. Eviction machinery itself is a follow-up PR; this PR just gives each value type a precise, queryable persistence/eviction descriptor.
## Why break out a new parameter?
`serialization = "none"` on canary conflated three different intents:
1. **Cheap recomputable outputs** (SWC ASTs, codegen `Rope`s) — fine to evict, recompute is cheap.
2. **Expensive recomputable outputs** (WASM modules, Node process pools) — re-derivable but costly.
3. **Session-scoped state** (`State<>` cells, `Arc<Mutex<_>>` dedup histories) — can't be recomputed without losing accumulated mutations.
They all produced identical runtime behavior (stored in `transient_cell_data`), so eviction can't tell them apart. The fix is two orthogonal attributes:
```rust
// A cheap skip — default evict = "always"
#[turbo_tasks::value(serialization = "skip")]
// Expensive recompute — evict last under pressure
#[turbo_tasks::value(serialization = "skip", evict = "last")]
// Session-scoped state — never evict
#[turbo_tasks::value(serialization = "skip", evict = "never")]
```
The macro rejects `evict` on any other `serialization` mode.
## `ValueTypePersistence` enum
Replaces `ValueType.bincode: Option<(enc, dec)>`:
```rust
pub enum ValueTypePersistence {
    Persistable(AnyEncodeFn, AnyDecodeFn<SharedReference>), // auto, custom
    SkipPersist { expensive: bool },                        // skip (+ evict = last)
    HashOnly,                                               // hash
    SessionStateful,                                        // skip + evict = never
}
```
The existing `"hash"` mode gets its own `HashOnly` variant rather than being folded into `SkipPersist`, which lets the backend gate its hash-writing and hash-comparison paths precisely.
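As a rough illustration of how the two attributes could resolve to a variant, here is a minimal sketch. The `Persistence` enum is a simplified stand-in for `ValueTypePersistence` (the encode/decode functions are omitted), and `resolve` is a hypothetical helper, not the macro's actual implementation.

```rust
/// Simplified stand-in for `ValueTypePersistence`; the real `Persistable`
/// variant carries encode/decode functions, which are omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Persistence {
    Persistable,
    SkipPersist { expensive: bool },
    HashOnly,
    SessionStateful,
}

/// Hypothetical helper: maps the `serialization` and `evict` attribute
/// values to a variant, rejecting `evict` on any mode other than "skip".
pub fn resolve(serialization: &str, evict: Option<&str>) -> Result<Persistence, String> {
    match (serialization, evict) {
        ("auto" | "custom", None) => Ok(Persistence::Persistable),
        ("hash", None) => Ok(Persistence::HashOnly),
        // `evict` defaults to "always" when omitted.
        ("skip", None | Some("always")) => Ok(Persistence::SkipPersist { expensive: false }),
        ("skip", Some("last")) => Ok(Persistence::SkipPersist { expensive: true }),
        ("skip", Some("never")) => Ok(Persistence::SessionStateful),
        ("skip", Some(other)) => Err(format!("unknown evict mode: {other}")),
        ("auto" | "custom" | "hash", Some(_)) => {
            Err("`evict` is only valid with serialization = \"skip\"".to_string())
        }
        (other, _) => Err(format!("unknown serialization mode: {other}")),
    }
}
```

Under these assumptions, `resolve("skip", Some("last"))` yields `SkipPersist { expensive: true }`, while `resolve("auto", Some("last"))` is an error.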
## Unified `cell_data` storage
`persistent_cell_data: AutoMap<CellId, TypedSharedReference>` + `transient_cell_data: AutoMap<CellId, SharedReference>` collapse into `cell_data: CellData`. `CellData` is a newtype over `AutoMap<CellId, SharedReference>` with a custom bincode impl that filters non-`Persistable` entries at encode time. This removes the `is_serializable_cell_content: bool` parameter that was threaded through ~14 read/write call sites.
Uses `SharedReference` instead of `TypedSharedReference` — `CellId` already carries the `ValueTypeId`.
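The encode-time filter can be pictured with a small sketch. Everything here is an assumption made for illustration: a `BTreeMap` stands in for `AutoMap`, a `String` stands in for `SharedReference`, the field layout of `CellId` is invented, and `persistable_entries` is a hypothetical method that replaces the real bincode impl.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ValueTypeId(pub u32);

/// Assumed layout: the cell id carries its value type, so the map
/// does not need a typed reference to know what it stores.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct CellId {
    pub type_id: ValueTypeId,
    pub index: u32,
}

/// Stand-in for the `CellData` newtype; `String` plays the role of `SharedReference`.
pub struct CellData(pub BTreeMap<CellId, String>);

impl CellData {
    /// Returns only the entries an encoder would write: those whose value
    /// type is reported as persistable. In-memory contents are left untouched.
    pub fn persistable_entries(
        &self,
        is_persistable: impl Fn(ValueTypeId) -> bool,
    ) -> Vec<(CellId, &str)> {
        self.0
            .iter()
            .filter(|(id, _)| is_persistable(id.type_id))
            .map(|(id, value)| (*id, value.as_str()))
            .collect()
    }
}
```

Because the decision is made per entry from the `CellId` alone, read and write paths in this sketch never need a separate "is this serializable" flag.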
## Annotation sweep
All prior `serialization = "none"` sites move to either `serialization = "skip", evict = "never"` or `serialization = "skip", evict = "last"` based on a per-site audit. Summary:
**`evict = "last"` (6 sites)** — re-derivable but expensive:
- `SwcPluginModule`, `EvaluatePool`, `ChildProcessPool`, `WorkerThreadPool`, `EffectInstance`, `Effects`
**`evict = "never"` (2 sites)** — interior-mutable state accumulated across the session:
- `ConsoleUi` (`Arc<Mutex<SeenIssues>>`), `VersionState` (`State<VersionRef>` with HMR invalidators)
The distinguishing rule: `evict = "never"` only when the value holds interior mutability accumulated across the session. Everything else can be re-derived (possibly expensively) by re-running the producing task.
## Follow-ups (separate PRs)
- Wire an eviction policy that consumes `ValueTypePersistence`: it respects `SessionStateful` (never evict) and prefers cheap `SkipPersist` values over `expensive: true` ones. This will happen either as part of #91790 or afterwards, depending on when things land.
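One possible shape for that follow-up ordering, as a sketch only: the enum below keeps just the two skip-related variants, and `eviction_order` is a hypothetical function. How `Persistable` and `HashOnly` values would rank is not stated in this PR, so they are left out.

```rust
/// Only the skip-related variants are modeled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Persistence {
    SkipPersist { expensive: bool },
    SessionStateful,
}

/// Hypothetical policy: returns candidate names in the order they would be
/// evicted under memory pressure.
pub fn eviction_order(mut cells: Vec<(&'static str, Persistence)>) -> Vec<&'static str> {
    // Session-scoped state is never a candidate.
    cells.retain(|(_, p)| *p != Persistence::SessionStateful);
    // `false` sorts before `true`, so cheap values go first; the sort is
    // stable, so ties keep their input order.
    cells.sort_by_key(|(_, p)| matches!(p, Persistence::SkipPersist { expensive: true }));
    cells.into_iter().map(|(name, _)| name).collect()
}
```

The names passed in are labels for illustration; a real policy would operate on tasks and cells rather than strings.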
<!-- NEXT_JS_LLM_PR -->
Failing test suites (commit 46b10ce):
- evict-after-snapshot › should serve correct content after eviction and HMR
- evict-after-snapshot › should handle client component HMR after eviction
    /// - `SkipPersist { expensive: false }` — cheap to re-derive by re-running the task.
    /// - `HashOnly` — the hash lives in `cell_data_hash`; value is re-derived.

I think the split is a bit weird here:
I would make it `SkipPersist { expensive: bool, hash: bool }` instead of `SkipPersist` and `HashOnly`.
Currently you are limiting `HashOnly` to never be `expensive: true`, but that might not be true. We could also have an expensive-to-recompute value that is stored with a hash.
    if self.len() < len_start {
        self.shrink_to_fit();
    }

Suggested change:

    - if self.len() < len_start {
    -     self.shrink_to_fit();
    - }
    + self.shrink_to_fit();

Why do we want to do that conditionally? I guess it's always good to shrink at this point.
    /// Configurable idle timeout for snapshot persistence.
    - /// Defaults to 2 seconds if not set or if the value is invalid.
    + /// Defaults to 10 seconds if not set or if the value is invalid.

Suggested change:

    - /// Defaults to 10 seconds if not set or if the value is invalid.
    + /// Defaults to 2 seconds if not set or if the value is invalid.

That was a reviewer honeypot, wasn't it?
    /// then re-check the specific task's `restoring`/`restored` bits after waking.
    pub(crate) restored: Event,
    /// Maps `CachedTaskType` → `TaskId` for deduplication of persistent task creation.
    /// This is backed by the TaskCache table in the backend.

Suggested change:

    - /// This is backed by the TaskCache table in the backend.
    + /// This is backed by the TaskCache table in the database.

## Summary
Implements memory eviction for the turbo-tasks engine. After a persistence snapshot completes, tasks that are safe to remove are evicted from in-memory storage and transparently restored from disk on next access.
## Eviction levels
… `current_session_clean`, aggregated session-clean counts).
Data and meta evictability are computed independently: if one category is modified but the other is clean, the clean category can still be dropped.
Eviction is gated behind `BackendOptions::evict_after_snapshot` (off by default), and can be enabled in Next.js via the `TURBO_ENGINE_EVICT_AFTER_SNAPSHOT=1` env var for testing.
## Key changes
- **Orthogonal eviction decision tree** (`storage_schema.rs`): Data and meta evictability are computed independently. Full eviction additionally requires no meaningful transient state (session-clean flags, aggregated session-clean counts). Replaces the previous sequential bail-out approach, which was too aggressive on full eviction (losing transient session state on leaf tasks) and not aggressive enough on partial eviction (blocking all eviction when only one category was modified).
- **Transient-dependent scanning** (`storage_schema.rs`): Evictability checks whether `output_dependent`, `cell_dependents`, `collectibles_dependents`, and `upper` contain transient task references via O(n) scans. This is correct but not optimal; a future improvement would maintain `data_contains_transient_state` / `meta_contains_transient_state` bits to make these O(1). TODOs are left in the code.
- **`drop_data()`, `drop_meta()` and `drop_data_and_meta()` codegen** (`task_storage_macro.rs`): New generated methods. `drop_data_and_meta()` is a combined single-pass version that scans `lazy` once instead of twice.
- **Session-stateful values** (`#[turbo_tasks::value(session_stateful)]`): Attribute marking value types whose cells accumulate non-serializable runtime state (e.g. `DiskFileSystem`, which holds file-watcher handles). Tasks writing these cells are blocked from data eviction mid-session. The `session_stateful` property is encoded as a high bit in `ValueTypeId`, so eviction checks are a single integer test with no extra memory per task.
- **`task_cache` moved into `Storage`** (`storage.rs`): The `CachedTaskType → TaskId` deduplication map was previously a separate field on `TurboTasksBackendInner`. It is now owned by `Storage` so eviction can remove entries when a task is fully evicted. Because `task_cache` is a pure performance cache (entries are re-populated by `task_by_type()` on miss once the task type is persisted to backing storage), evicting entries is safe. After bulk eviction the map is shrunk when it is less than half full.
- **Parallel shard eviction** (`storage.rs`): Eviction iterates all storage shards in parallel after snapshot, applying the appropriate eviction level per task. Each shard is shrunk after bulk eviction to reclaim slack capacity.
## Design notes
- When `current_session_clean` is set, we prevent full eviction to avoid rechecking. Within a session the file-watchers are responsible for invalidations after setting `current_session_clean`.
## Known limitations (proof of concept)
- … `serialization=none` value cells, see [turbopack] Unify Cell Storage #92974
- … `last_successful_parse` dev mode optimization, which means parse errors become more costly, see [turbopack] Persistent last_successful_parse #92852 for a fix
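The session-stateful item under Key changes describes encoding the property as a high bit in `ValueTypeId`. A minimal sketch of that idea follows; the `u32` width, the choice of bit 31, and every method name are assumptions for illustration and are not taken from the PR's code.

```rust
/// Assumed representation: a 32-bit id whose top bit is the flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ValueTypeId(u32);

/// Assumed flag position; the real bit and integer width may differ.
const SESSION_STATEFUL_BIT: u32 = 1 << 31;

impl ValueTypeId {
    pub fn new(index: u32, session_stateful: bool) -> Self {
        assert!(index & SESSION_STATEFUL_BIT == 0, "index overlaps the flag bit");
        Self(if session_stateful { index | SESSION_STATEFUL_BIT } else { index })
    }

    /// A single mask test, with no lookup into a type registry.
    pub fn is_session_stateful(self) -> bool {
        self.0 & SESSION_STATEFUL_BIT != 0
    }

    /// The id with the flag bit cleared.
    pub fn index(self) -> u32 {
        self.0 & !SESSION_STATEFUL_BIT
    }
}

/// Hypothetical check: a task's data may be dropped only if none of the
/// cells it wrote belongs to a session-stateful value type.
pub fn data_evictable(cell_types: &[ValueTypeId]) -> bool {
    !cell_types.iter().any(|ty| ty.is_session_stateful())
}
```

Since the flag travels inside the id that each cell already stores, the check in this sketch needs no per-task bookkeeping.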