rust-lang · traviscross · Jun 18, 2024 · Aug 1, 2023 · Aug 1, 2023 · Aug 1, 2023
diff --git a/text/0000-unsafe-aliased.md b/text/0000-unsafe-aliased.md
@@ -0,0 +1,337 @@
+# `unsafe_aliased`
+
+- Feature Name: `unsafe_aliased`
+- Start Date: 2022-11-05
+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
+
+# Summary
+[summary]: #summary
+
+Add a type `UnsafeAliased` that acts on `&mut` in a similar way to how `UnsafeCell` acts on `&`: it opts-out of the aliasing guarantees.
+However, `&mut UnsafeAliased` can still be `mem::swap`ed, so this is not a free ticket for arbitrary aliasing of mutable references.
+Abstractions built on top of `UnsafeAliased` must ensure proper encapsulation when handing such aliases references to clients (e.g. via pinning).
+
+This type is then used in generator lowering, finally fixing [#63818](https://github.com/rust-lang/rust/issues/63818).
+
+# Motivation
+[motivation]: #motivation
+
+Let's say you want to write a type with a self-referential pointer:
+
+```rust
+#![feature(pin_macro)]
+#![feature(negative_impls)]
+use std::ptr;
+use std::pin::{pin, Pin};
+
+pub struct S {
+    data: i32,
+    ptr_to_data: *mut i32,
+}
+
+impl !Unpin for S {}
+
+impl S {
+    pub fn new() -> Self {
+        S { data: 42, ptr_to_data: ptr::null_mut() }
+    }
+
+    pub fn get_data(self: Pin<&mut Self>) -> i32 {
+        // SAFETY: We're not moving anything.
+        let this = unsafe { Pin::get_unchecked_mut(self) };
+        if this.ptr_to_data.is_null() {
+            this.ptr_to_data = ptr::addr_of_mut!(this.data);
+        }
+        // SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
+        unsafe { this.ptr_to_data.read() }
+    }
+
+    pub fn set_data(self: Pin<&mut Self>, data: i32) {
+        // SAFETY: We're not moving anything.
+        let this = unsafe { Pin::get_unchecked_mut(self) };
+        if this.ptr_to_data.is_null() {
+            this.ptr_to_data = ptr::addr_of_mut!(this.data);
+        }
+        // SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
+        unsafe { this.ptr_to_data.write(data) }
+    }
+}
+
+fn main() {
+    let mut s = pin!(S::new());
+    s.as_mut().set_data(42);
+    println!("{}", s.as_mut().get_data());
+}
+```
+
+This kind of code is implicitly generated by rustc all the time when an `async fn` has a local variable of reference type that is live across a yield point.
+The problem is that this code has UB under our aliasing rules: the `&mut S` inside the `self` argument of `get_data` aliases with `ptr_to_data`!
+
+This simple code only has UB under Stacked Borrows but not under the LLVM aliasing rules; more complex variants of this -- still in the realm of what `async fn` generates -- also have UB under the LLVM aliasing rules.
+
+<details>
+
+<summary>A more complex variant</summary>
+
+```rust
+#![feature(pin_macro)]
+#![feature(negative_impls)]
+use std::ptr;
+use std::pin::{pin, Pin};
+use std::task::Poll;
+
+pub struct S {
+    state: i32,
+    data: i32,
+    ptr_to_data: *mut i32,
+}
+
+impl !Unpin for S {}
+
+impl S {
+    pub fn new() -> Self {
+        S { state: 0, data: 42, ptr_to_data: ptr::null_mut() }
+    }
+
+    fn poll(self: Pin<&mut Self>) -> Poll<()> {
+        // SAFETY: We're not moving anything.
+        let this = unsafe { Pin::get_unchecked_mut(self) };
+        match this.state {
+            0 => {
+                // The first time, set up the pointer.
+                this.ptr_to_data = ptr::addr_of_mut!(this.data);
+                // Now yield.
+                this.state += 1;
+                Poll::Pending
+            }
+            1 => {
+                // After coming back from the yield, write to the pointer.
+                unsafe { this.ptr_to_data.write(42) };
+                // And read our local variable `data`.
+                // THIS IS UB! `this` is derived from the `noalias` pointer
+                // `self` but we did a write to `this.data` in the previous
+                // line when writing to `ptr_to_data`. The compiler is allowed
+                // to reorder this and the previous line and then the output
+                // would change.
+                println!("{}", this.data);
+                // Done!
+                Poll::Ready(())
+            }
+            _ => unreachable!(),
+        }
+    }
+}
+
+fn main() {
+    let mut s = pin!(S::new());
+    while let Poll::Pending = s.as_mut().poll() {}
+}
+```
+
+</details>
+
+<br>
+
+Beyond self-referential types, a similar problem also comes up with intrusive linked lists: the nodes of such a list often live on the stack frames of the functions participating in the list, but also have incoming pointers from other list elements.
+When a function takes a mutable reference to its stack-allocated node, that will alias the pointers from the neighboring elements.
+`Pin` is used to ensure that the list elements don't just move elsewhere (which would invalidate those incoming pointers), but there still is the problem that an `&mut Node` is actually not a unique pointer due to these aliases -- so we need a way for the to opt-out of the aliasing rules.
+
+<br>
+
+The goal of this RFC is to offer a way of writing such self-referential types and intrusive collections without UB.
+We don't want to change the rules for mutable references in general (that would also affect all the code that doesn't do anything self-referential), instad we want to be able to tell the compiler that this code is doing funky aliasing and that should be taken into account for optimizations.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+To write this code in a UB-free way, wrap the fields that are *targets* of self-referential pointers in an `UnsafeAliased`:
+
+```rust
+pub struct S {
+    data: UnsafeAliased<i32>, // <!---- here
+    ptr_to_data: *mut i32,
+}
+```
+
+This lets the compiler know that mutable references to `data` might still have aliases, and so optimizations cannot assume that no aliases exist. That's entirely analogous to how `UnsafeCell` lets the compiler know that *shared* references to this field might undergo mutation.
+
+However, what is *not* analogous is that `&mut S`, when handed to safe code *you do not control*, must still be unique pointers!
+This is because of methods like `mem::swap` that can still assume that their two arguments are non-overlapping.
+(`mem::swap` on two `&mut UnsafeAliased<i32>` may soundly assume that they do not alias.)
+In other words, the safety invariant of `&mut S` still requires full ownership of the entire memory range `S` is stored at; for the duration that a function holds on to the borrow, nobody else may read and write this memory.
+But again, this is a *safety invariant*; it only applies to safe code you do not control. You can write your own code handling `&mut S` and as long as that code is careful not to make use of this memory in the wrong way, potential aliasing is fine.
+You can also hand out a `Pin<&mut S>` to external safe code; the `Pin` wrapper imposes exactly the restrictions needed to ensure that this remains sound (e.g., it prevents `mem::swap`).
+
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+API sketch:
+
+```rust
+/// The type `UnsafeAliased<T>` lets unsafe code violate
+/// the rule that `&mut UnsafeAliased<T>` may never alias anything else.
+///
+/// However, it is still very risky to have two `&mut UnsafeAliased<T>` that alias
+/// each other. For instance, passing those two references to `mem::swap`
+/// would cause UB. The general advice is to still have a single mutable
+/// reference, and then a bunch of raw pointers that alias it.
+/// If you want to pass such a reference to external safe code,
+/// you need to use techniques such as pinning (`Pin<&mut TypeWithUnsafeAliased>`)
+/// which ensure that both the mutable reference and its aliases remain valid.
+///
+/// Further note that this does *not* lift the requirement that shared references
+/// must be read-only! Use `UnsafeCell` for that.
+#[lang = "unsafe_cell_mut"]
+#[repr(transparent)]
+struct UnsafeAliased<T: ?Sized> {
+    value: T,
+}
+
+impl<T: ?Sized> !Send for UnsafeAliased<T> {}
+
+impl<T> UnsafeCell<T> {
+    /// Constructs a new instance of `UnsafeCell` which will wrap the specified
+    /// value.
+    pub fn new(value: T) -> UnsafeAliased<T> {
+        UnsafeAliased { value }
+    }
+
+    pub fn into_inner(self) -> T {
+        self.value
+    }
+
+    /// Get read-write access to the contents of an `UnsafeAliased`.
+    pub fn get_mut(&mut self) -> *mut T {
+        self as *mut _ as *mut T
+    }
+
+    /// Get read-only access to the contents of a shared `UnsafeAliased`.
+    /// Note that `&UnsafeAliased<T>` is read-only if `&T` is read-only.
+    /// This means that if there is mutation of the `T`, future reads from the
+    /// `*const T` returned here are UB!
+    ///
+    /// ```rust
+    /// unsafe {
+    ///     let mut x = UnsafeAliased::new(0);
+    ///     let ref1 = &mut *addr_of_mut!(x);
+    ///     let ref2 = &mut *addr_of_mut!(x);
+    ///     let ptr = ref1.get(); // read-only pointer, assumes immutability
+    ///     ref2.get_mut.write(1);
+    ///     ptr.read(); // UB!
+    /// }
+    /// ```
+    pub fn get(&self) -> *const T {
+        self as *const _ as *const T
+    }
+
+    pub fn raw_get_mut(this: *mut Self) -> *mut T {
+        this as *mut T
+    }
+
+    pub fn raw_get(this: *const Self) -> *const T {
+        this as *const T
+    }
+}
+```
+
+The comment about aliasing `&mut` being "risky" refers to the fact that their safety invariant still asserts exclusive ownership.
+This implies that `duplicate` in the following example, while not causing immediate UB, is still unsound:
+
+```rust
+pub struct S {
+    data: UnsafeAliased<i32>,
+}
+
+impl S {
+    fn new(x: i32) -> Self {
+        S { data: UnsafeAliased::new(x) }
+    }
+
+    fn duplicate<'a>(s: &'a mut S) -> (&'a mut S, &'a mut S) {
+        let s1 = unsafe { (&s).transmute_copy() };
+        let s2 = s;
+        (s1, s2)
+    }
+}
+```
+
+The unsoundness is easily demonstrated by using safe code to cause UB:
+
+```rust
+let mut s = S::new(42);
+let (s1, s2) = s.duplicate(); // no UB
+mem::swap(s1, s2); // UB
+```
+
+We could even soundly make `get_mut` return an `&mut T`, given that the safety invariant is not affected by `UnsafeAliased`.
+But that would probably not be useful and only cause confusion.
+
+Reference diff:
+
+```diff
+  * Breaking the [pointer aliasing rules]. `&mut T` and `&T` follow LLVM’s scoped
+-   [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`].
++   [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`] or
++   the `&mut T` contains an [`UnsafeAliased<U>`].
+```
+
+Async generator lowering changes:
+- Fields that represent local variables whose address is taken across a yield point must be wrapped in `UnsafeAliased`.
+
+Codegen and Miri changes:
+
+- We have a `Unique` auto trait similar to `Freeze` that is implemented if the type does not contain any by-val `UnsafeAliased`.
+- `noalias` on mutable references is only emitted for `Unique` types. (This replaces the current hack where it is only emitted for `Unpin` types.)
+- Miri will do `SharedReadWrite` retagging inside `UnsafeAliased` similar to what it does inside `UnsafeCell` already. (This replaces the current `Unpin`-based hack.)
+
+### Comparison with some other types that affect aliasing
+
+- `UnsafeCell`: disables aliasing (and affects but does not fully disable dereferenceable) behind shared refs, i.e. `&UnsafeCell<T>` is special. `UnsafeCell<&T>` (by-val, fully owned) is not special at all and basically like `&T`; `&mut UnsafeCell<T>` is also not special.
+- `UnsafeAliased`: disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. `&mut UnsafeAliased<T>` is special. `UnsafeAliased<&mut T>` (by-val, fully owned) is not special at all and basically like `&mut T`; `&UnsafeAliased<T>` is also not special.
+- [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336): disables aliasing and dereferencable *of all references (and boxes) directly inside it*, i.e. `MaybeDanling<&[mut] T>` is special. `&[mut] MaybeDangling<T>` is not special at all and basically like `&[mut] T`.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+It's yet another wrapper type adjusting our aliasing rules and very easy to mix up with `UnsafeCell` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
+Furthermore, it is an extremely subtle wrapper type, as the `duplicate` example shows.
+`Unique` is a very strange trait, since it does not come with much of a safety requirement -- unlike `Freeze`, which quite clearly expresses some statement about the invariant of a type when shared, it is much less clear what the corresponding statement would be for `Unique`. (More work on RustBelt is needed to determine the details here.)
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+The proposal in this RFC matches [what was discussed with the lang team a long time ago](https://github.com/rust-lang/rust/issues/63818#issuecomment-526579977).
+
+However, of course one could imagine alternatives:
+
+- Keep the status quo. The current sitaution is that we only make aliasing requirements on mutable references if the type they point to is `Unpin`. This is unsatisfying: `Unpin` was never meant to have this job. A consequence is that a stray `impl Unpin` on a `Wrapper<T>`-style type can [lead to subtle miscompilations](https://internals.rust-lang.org/t/surprising-soundness-trouble-around-pollfn/17484/15) since it re-adds aliasing requirements for the inner `T`. Contrast this with the `UnsafeCell` situation, where it is not possible for (stable) code to just `impl Freeze for T` in the wrong way -- `UnsafeCell` is *always* recognized by the compiler.
+
+   On the other hand, `UnsafeAliased` is rather quirky in its behavior and having two marker traits (`Unique` and `Unpin`) might be too confusing, so sticking with `Unpin` might not be too bad in comparison.
+
+  If we do that, however, it seems preferrable to transition `Unpin` to an unsafe trait. There *is* a clear statement about the types' invariants associated with `Unpin`, so an `impl Unpin` already comes with a proof obligation. It just happens to be the case that in a module without unsafe, one can always arrange all the pieces such that the proof obligation is satisifed. This is mostly a coincidence  and related to the fact that we don't have safe field projections on `Pin`. That said, solving this also requires solving the trouble around `Drop` and `Pin`, where effectively an `impl Drop` does an implicit `get_mut_unchecked`, i.e., implicitly assumes the type is `Unpin`.
+- `UnsafeAliased` could affect aliasing guarantees *both* on mutable and shared references. This would avoid the currently rather subtle situation that arises when one of many aliasing `&mut UnsafeAliased<T>` is cast or coerced to `&UnsafeAliased<T>`: that is a read-only shared reference and all aliases must stop writing! It would make this type strictly more 'powerful' than `UnsafeCell` in the sense that replacing `UnsafeCell` by `UnsafeAliased` would always be correct. (Under the RFC as written, `UnsafeCell` and `UnsafeAliased` can be nested to remove aliasing requirements from both shared and mutable references.)
+
+  If we don't do this, we could consider removing `get` since since it seems too much like a foot-gun. But that makes shared references to `UnsafeAliased` fairly pointless. Shared references to generators/futures are basically useless so it is unclear what the potential use-cases here are.
+- Instead of introducing a new type, we could say that `UnsafeCell` affects *both* shared and mutable references. That would lose some optimization potential on types like `&mut Cell<T>`, but would avoid the footgun of coercing an `&mut UnsafeAliased<T>` to `&UnsafeAliased<T>`. That said, so far the author is not aware of Miri detecting code that would run into this footgun (and Miri is [able to do detect such issues](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=aab417b535f7dbd266fbfe470ea208c7)).
+
+
+# Prior art
+[prior-art]: #prior-art
+
+This is somewhat like `UnsafeCell`, but for mutable instead of shared references.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+- How do we transition code that relies on `Unpin` opting-out of aliasing guarantees, to the new type? Futures and generators just need a compiler patch, but there is probably other code that needs adjusting (e.g., Rust-for-Linux uses pinning to handle all sorts of self-referntial things in the Linux Kernel). Note that all such code is explicitly unsupported right now; the `Unpin` loophole has always explicitly been declared as temporary, unstable, and not something that we promise will actually work.
+- The name of the type needs to be bikeshed. `UnsafeAliased` might be too close to `UnsafeCell`, but that is a deliberate choice to indicate that this type has an effect when it appears in the *pointee*, unlike types like `MaybeUninit` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336) that have an effect when wrapped around the *pointer*. Like `UnsafeCell`, the aliasing allowed here is "interior". Other possible names: `UnsafeSelfReferential`, `UnsafePinned`, ...
+- Relatedly, in which module should this type live?
+- Should this type `derive(Copy)`? `UnsafeCell` does not, which is unfortunate because it now means some people might use `T: Copy` as indication that there is no `UnsafeCell` in `T`.
+- `Unpin` [also affects the `dereferenceable` attribute](https://github.com/rust-lang/rust/pull/106180), so the same would happen for this type. Is that something we want to guarantee, or do we hope to get back `dereferenceable` when better semantics for it materialize on the LLVM side?
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+- Similar to how we might want the ability to project from `&UnsafeCell<Struct>` to `&UnsafeCell<Field>`, the same kind of projection could also be interesting for `UnsafeAliased`.