8382338: Various serviceability agent tests fail on Linux x86_64 with LTO enabled#30771
8382338: Various serviceability agent tests fail on Linux x86_64 with LTO enabled#30771MBaesken wants to merge 2 commits intoopenjdk:masterfrom
Conversation
|
👋 Welcome back mbaesken! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
The linker option setting is similar to what I did here https://bugs.openjdk.org/browse/JDK-8378838 . |
Webrevs
|
|
The total number of required reviews for this PR has been set to 2 based on the presence of this label: |
|
@plummercj , could you maybe have a look at the metadata change ?
|
| Metadata::Metadata() { | ||
| NOT_PRODUCT(_valid = 0;) | ||
| } | ||
|
|
There was a problem hiding this comment.
We need for this to work with PRODUCT builds also. It's unclear to me why _valid was initially introduced, and why only for NOT_PRODUCT builds.
There was a problem hiding this comment.
okay so should I remove the NOT_PRODUCT ?
There was a problem hiding this comment.
We need for this to work with PRODUCT builds also. It's unclear to me why _valid was initially introduced, and why only for NOT_PRODUCT builds.
Seems it is used in some debug code, at least removing it breaks the fastdebug build.
|
Hmm, why do we need both the linker flag and C++ code changes here? Didn't the other fix work with just the linker flag? |
|
/label add serviceability |
|
@plummercj |
We need only one approach. The linker flag coding is in '#' so I can remove it (if we go for the Metaspace class change). |
I'm not particularly fond of the Metaspace change, it's not very clear from just looking at it what it does, and it looks a bit messy. There's a risk someone will just end up reverting it without realizing. I do remember that there was a different way someone tried to keep the vtable back when an attempt was made to disable RTTI for Windows, but it never went through. I'll go find it and see what that change was and whether it'll help with this issue. |
|
https://github.com/openjdk/jdk/pull/12743/changes Seems like that one went with introducing a new virtual method and overriding it in the subclass. Also not great, but it's a starting point to see what is needed to stop the vtable from being removed. |
So we derive from class MetaspaceObj jdk/src/hotspot/share/memory/allocation.hpp Line 246 in abd3d46 Do we really want such a dummy (like virtual bool is_metadata() const) there ? |
|
I don't understand this change. Could you explain why the vtable is removed, and how this change helps retain it? |
Why it is removed - in standard compile+link mode it is not removed. But with more aggressive linking (lto/ltgc , enabled by additional configure flags) the linker seems to notice the vtable of this class as 'not needed' and just kicks it out. This can be worked around with adding '-Wl,--undefined=_ZTV8Metadata' to the libjvm linking step but the setting looks a bit like a hack so something different might be prefered. Usual recommended adjustments I found (additionally to the one in this PR) use typeid in the class to keep the vtable alive(but we cannot do it because we set -fno-rtti in the build) or modify the destructor or constructor. Making the destructor virtual fails in the HS build because it seems not to work with derived classes. |
|
It's a bit unfortunate that the virtual destructor trick doesn't work, I was just about to suggest it since that's what Kim proposed in the Windows RTTI Pull Request. There is technically another way, to use the optimize pragma with the string no-devirtualize, but besides the fact that I didn't test it, it's also a bit too heavy since that disables all optimizations related to vtables and doesn't just keep the vtable, while we still want the optimizations, we only want the vtable to be kept without disabling optimization, since I think Metadata is a performance critical class. In the meantime I'll keep testing to see if there are any clean ways to keep the vtable. I think it's a shame that none of the compilers have something like [[gnu::vtable(true)]] that you can use to force the vtable to be kept though. |
|
IIUC, the vtable is removed because all virtual calls on that class all removed either because they are dead, or they can be devirtualized to a virtual call on a subtype. As a result, is there an annotation that tells the compiler that a virtual method should not be removed? It should preserve the vtable, right?
That sounds confusing to me, normally it should be the opposite, are we even sure the destructors are called properly? |
|
vtables can be shared if identical. I think the issue with #12743 is that NotificationThread is an override of JavaThread, but provides no additional virtual methods, so the NotificationThread vtable was stripped and instances of NotificationThread just point to the JavaThread vtable. This confuses SA, which needs to know about all JavaThread subtypes. Adding a virtual method to NotificationThread fixed this. The issue with Metadata seems to be a bit different. I think the problem is that it is an abstract class. Since you can't have an instance of an abstract class, you don't need its vtable, and it gets stripped. However, SA does need it. The references comes from the following SA code: It's not clear to me why it needs this mapping. You might want to try to remove it and see if SA still works. Regarding the proposed fix, it's not clear to me how adding a constructor forces the vtable to be retained. Regarding enabling the proposed fix for PRODUCT builds, that means adding the _valid field (and maybe the is_valid() method) to PRODUCT builds. This field will take up space in every instance of every Metadata subtype (every class, field, method, and constant pool currently loaded). |
There are some attributes like attribute 'retain' , e.g. here is some discussion But from what I saw it did not really help with linker-step based elimination. |
|
@MBaesken I think I was probably incorrect, the real reason seems to be what @plummercj suggested instead. Anyway, if what I said turns out to be actually true, I guess you can add If what @plummercj said is true, but you can't make it so that we don't need a vtable for |
|
The docs on [[gnu::retain]] seem promising and specifically mentions linker garbage collection. It can only be applied to methods, though, not the Metadata class itself. |
With the NOINLINE, the constructor of Metadata is kept in the libjvm.so binary (even with lto).
So the remaining constructor touches (inits?) the vtable. The lto 'sees' this, and so it better keeps it (the vtable of Metadata) in the binary. nm -C images/jdk/lib/server/libjvm.so | grep "vtable for Metadata" |
I could put it additionally on the constructor 'Metadata::Metadata()' , just in case .... |
|
Yeah, unfortunately just gcc and clang. But we do not allow any compiler besides VC on Windows, and VC doesn't seem to have this issue. What error is shown when the virtual destructor approach is used, out of curiosity? |
|
metadata.cpp has something related to the vtable as well, jdk/src/hotspot/share/oops/metadata.cpp Line 31 in a1dee68 |
gcc reports this |
|
I could add the retain e.g. like this Does that bring a benefit? If so, do we need to check for gcc / clang versions too ? |
When building hotspot on linuxx86_64/gcc with LTO enabled (--enable-jvm-feature-link-time-opt), we get various test errors in the serviceability/sa area.
Example serviceability/sa/CDSJMapClstats.java
Seems we have to avoid elimination of the Metadata vtable ; this can be achieved by linker flags or by modifying class Metadata.
Progress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30771/head:pull/30771$ git checkout pull/30771Update a local copy of the PR:
$ git checkout pull/30771$ git pull https://git.openjdk.org/jdk.git pull/30771/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 30771View PR using the GUI difftool:
$ git pr show -t 30771Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30771.diff
Using Webrev
Link to Webrev Comment