Fix dead surface hook#3404
Conversation
6b25a7f to
704fabe
Compare
|
With this PR, I'm consistently getting pairs of: by running gtk4-demo in a terminal then Ctrl-C-ing it. Suggesting that in this case, WlSurface::destroyed() runs before toplevel_destroyed(). Is this intended on the Smithay side? It seems that when I wrote the code I definitely did not intend for this ordering to be possible. If this is intended, then I'll need to think if there's anything else that needs adjusting then. |
|
Note that I also found some destruction order weirdness when working on wlr-screencopy: https://github.com/YaLTeR/niri/blob/549148d27779d024255a84535b42b947f1c2a113/src/protocols/screencopy.rs#L445-L472 I don't know if this is relevant here though, or if toplevel_destroyed() just generally is expected to sometimes run after WlSurface::destroyed() |
I just checked with kitty and the |
|
I wonder if it could be caused by this code here: https://github.com/Smithay/wayland-rs/blob/9c17fa20ef275e214e97020a46d2ec3bda1c2f77/wayland-backend/src/sys/server_impl/mod.rs#L1624 |
|
Hm, |
|
Okay, after a lot of debugging it looks like some clients will destroy the resources in the correct order, but not wait for destruction. On client disconnect libwayland then seems to call destroy in id order, so surface destroy will be called before toplevel destroy. The same seems to be true in case a client is disconnected because of a protocol error. |
|
Thanks. If this is the case, then let's clearly document it in Smithay/wayland-rs (e.g. in toplevel_destroyed() docs say that it will sometimes be called after WlSurface::destroyed(), etc.) |
I fear there is not much we can do and it makes sense that there can't be any guarantee of destruction order in case the client just exits. (I mean technically we could try to delay destruction in a way that keeps protocol order, but that does sound like a nightmare to do right). Anyway, I opened a PR to at least warn about this in the handlers where smithay forwards the destruction callbacks: Smithay/smithay#1924 |
|
@YaLTeR I looked through the code and it looks like surface/toplevel destruction should be fine with these changes. |
|
What was actually causing the VRAM leak? So we have |
Yes, afaict the strong reference to the surface is what keeps the gpu resource alive. The surface still has a reference to the texture according to my valgrind lokgs. Not using the dmabuf hooks (actually not inserting the surface in the hook list) or using weak surface references also fixes the issue (at least the issue I was able to reproduce). |
|
I have also found that it might not be consistent, sometimes the destruction handler run in the correct order. |
|
Thanks, I'll merge it a bit later. I'll rearrange some conditionals |
this re-uses the already existing remove_default_dmabuf_pre_commit_hook during surface destruction instead of just removing the hook from the stored list of hooks.
this prevents surfaces getting stored indefinitely in case some logic tries to add the hook for an already destroyed surface.
resource destruction has undefined order in case the client does not explicitly destroy the resourced and wait for destruction to complete. the same applies for clients exiting unexpectedly.
f457499 to
11134b9
Compare
|
Thanks |
* remove pre-commit hook when surface is destroyed this re-uses the already existing remove_default_dmabuf_pre_commit_hook during surface destruction instead of just removing the hook from the stored list of hooks. * do not add dmabuf pre-commit hook for destroyed surfaces this prevents surfaces getting stored indefinitely in case some logic tries to add the hook for an already destroyed surface. * align surface/toplevel destruction order for client destruction resource destruction has undefined order in case the client does not explicitly destroy the resourced and wait for destruction to complete. the same applies for clients exiting unexpectedly. * rearrange some things --------- Co-authored-by: Ivan Molodetskikh <yalterz@gmail.com>
So, it looks like niri ends up adding dmabuf hooks for already destroyed surfaces. I haven't looked too much into why it does this, but my guess is that somehow
toplevel_destroyedgets called after the surface is destroyed. Thetoplevel_destoryedhandler does try to install the hook, which will insert the surface in the list for tracking, but mightnever get removed again keeping a reference to the last buffer/dmabuf.
I will also open a PR in smithay to prevent this, but that will be a breaking change and might take a bit longer.
So I am opening this PR here first to see if it really fixes the issue and maybe get a short term fix.
fixes #1869
(Draft PR in smithay: Smithay/smithay#1921)