8382338: Various serviceability agent tests fail on Linux x86_64 with LTO enabled #30771

Open
MBaesken wants to merge 2 commits into openjdk:master from MBaesken:JDK-8382338

Conversation

@MBaesken
Member

@MBaesken MBaesken commented Apr 16, 2026

When building HotSpot on Linux x86_64 with gcc and LTO enabled (--enable-jvm-feature-link-time-opt), we get various test errors in the serviceability/sa area.
Example: serviceability/sa/CDSJMapClstats.java

finding class loader instances ..java.lang.InternalError: Metadata does not appear to be polymorphic
at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:223)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:104)
at jdk.hotspot.agent/sun.jvm.hotspot.oops.Metadata.instantiateWrapperFor(Metadata.java:77)
at jdk.hotspot.agent/sun.jvm.hotspot.memory.SystemDictionary.getClassLoaderKlass(SystemDictionary.java:102)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.ClassLoaderStats.printClassLoaderStatistics(ClassLoaderStats.java:93)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.ClassLoaderStats.run(ClassLoaderStats.java:78)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.run(JMap.java:121)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134)
at jdk.hotspot.agent/sun.jvm.hotspot.tools.JMap.main(JMap.java:202)
at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJMAP(SALauncher.java:344)
at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:507)

It seems we have to prevent elimination of the Metadata vtable; this can be achieved by linker flags or by modifying class Metadata.



Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8382338: Various serviceability agent tests fail on Linux x86_64 with LTO enabled (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30771/head:pull/30771
$ git checkout pull/30771

Update a local copy of the PR:
$ git checkout pull/30771
$ git pull https://git.openjdk.org/jdk.git pull/30771/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30771

View PR using the GUI difftool:
$ git pr show -t 30771

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30771.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper Bot commented Apr 16, 2026

👋 Welcome back mbaesken! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk Bot commented Apr 16, 2026

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk Bot changed the title JDK-8382338: Various serviceability agent tests fail on Linux x86_64 with LTO enabled 8382338: Various serviceability agent tests fail on Linux x86_64 with LTO enabled Apr 16, 2026
@MBaesken
Member Author

The linker option setting is similar to what I did in JDK-8378838 (https://bugs.openjdk.org/browse/JDK-8378838).
Changing Metadata might be 'cleaner' (the linker flag setting is toolchain-dependent).

@openjdk openjdk Bot added build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Apr 16, 2026
@openjdk

openjdk Bot commented Apr 16, 2026

@MBaesken The following labels will be automatically applied to this pull request:

  • build
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk Bot added the rfr Pull request is ready for review label Apr 16, 2026
@mlbridge

mlbridge Bot commented Apr 16, 2026

Webrevs

@openjdk

openjdk Bot commented Apr 16, 2026

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot. This can be overridden with the /reviewers command.

@MBaesken
Member Author

MBaesken commented Apr 17, 2026

@plummercj, could you maybe have a look at the Metadata change?
Other C++ 'tricks' to keep the vtable do not seem to work:

  • typeid usage is not compiling because we set -fno-rtti in the hotspot compile flags
  • some 'virtual destructor tricks' fail because of the class hierarchy class Metadata is in

Metadata::Metadata() {
NOT_PRODUCT(_valid = 0;)
}

Contributor

We need for this to work with PRODUCT builds also. It's unclear to me why _valid was initially introduced, and why only for NOT_PRODUCT builds.

Member Author

Okay, so should I remove the NOT_PRODUCT?

Member Author

> We need for this to work with PRODUCT builds also. It's unclear to me why _valid was initially introduced, and why only for NOT_PRODUCT builds.

It seems to be used in some debug code; at least, removing it breaks the fastdebug build.

@TheShermanTanker
Contributor

Hmm, why do we need both the linker flag and C++ code changes here? Didn't the other fix work with just the linker flag?

@plummercj
Contributor

/label add serviceability

@openjdk openjdk Bot added the serviceability serviceability-dev@openjdk.org label Apr 19, 2026
@openjdk

openjdk Bot commented Apr 19, 2026

@plummercj
The serviceability label was successfully added.

@MBaesken
Member Author

MBaesken commented Apr 20, 2026

> Hmm, why do we need both the linker flag and C++ code changes here? Didn't the other fix work with just the linker flag?

We need only one approach. The linker flag code is commented out with '#', so I can remove it (if we go for the Metadata class change).

@TheShermanTanker
Contributor

>> Hmm, why do we need both the linker flag and C++ code changes here? Didn't the other fix work with just the linker flag?

> We need only one approach. The linker flag code is commented out with '#', so I can remove it (if we go for the Metadata class change).

I'm not particularly fond of the Metadata change: it's not very clear from just looking at it what it does, and it looks a bit messy. There's a risk someone will just end up reverting it without realizing. I do remember that someone tried a different way to keep the vtable back when an attempt was made to disable RTTI for Windows, but it never went through. I'll go find it and see what that change was and whether it'll help with this issue.

@TheShermanTanker
Contributor

https://github.com/openjdk/jdk/pull/12743/changes Seems like that one went with introducing a new virtual method and overriding it in the subclass. Also not great, but it's a starting point to see what is needed to stop the vtable from being removed.

@MBaesken
Member Author

MBaesken commented Apr 20, 2026

> https://github.com/openjdk/jdk/pull/12743/changes Seems like that one went with introducing a new virtual method and overriding it in the subclass. Also not great, but it's a starting point to see what is needed to stop the vtable from being removed.

So we derive from class MetaspaceObj

class MetaspaceObj {
.
Do we really want such a dummy (like virtual bool is_metadata() const) there?

@merykitty
Member

I don't understand this change. Could you explain why the vtable is removed, and how this change helps retain it?

@MBaesken
Member Author

MBaesken commented Apr 20, 2026

> I don't understand this change. Could you explain why the vtable is removed, and how this change helps retain it?

Why it is removed: in the standard compile+link mode it is not removed. But with more aggressive linking (LTO/LTGC, enabled by additional configure flags), the linker seems to treat the vtable of this class as 'not needed' and drops it. This can be worked around by adding '-Wl,--undefined=_ZTV8Metadata' to the libjvm link step, but that setting looks a bit like a hack, so something different might be preferred.
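
For readers unfamiliar with the symbol in that flag: `_ZTV8Metadata` is the Itanium C++ ABI mangled name of the vtable for class Metadata, which is the symbol `--undefined` forces the linker to treat as referenced. A minimal sketch that checks the name with `abi::__cxa_demangle`, assuming GCC or Clang; the helper names `demangle_symbol` and `check_vtable_symbol` are made up for illustration and are not part of the PR:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns an empty string if the name cannot be demangled.
std::string demangle_symbol(const char* mangled) {
  int status = 0;
  // __cxa_demangle allocates the result with malloc; the caller frees it.
  char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || text == nullptr) {
    return std::string();
  }
  std::string result(text);
  std::free(text);
  return result;
}

// Example use: the symbol from the linker flag names Metadata's vtable.
void check_vtable_symbol() {
  assert(demangle_symbol("_ZTV8Metadata") == "vtable for Metadata");
}
```

The same demangled form shows up in the `nm -C` and `objdump --demangle` output later in this thread.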

The usual recommended adjustments I found (in addition to the one in this PR) either use typeid in the class to keep the vtable alive (which we cannot do because we set -fno-rtti in the build) or modify the destructor or constructor. Making the destructor virtual fails in the HS build because it seems not to work with derived classes.
It seems the changed constructor references the vtable and is not so easily eliminated/inlined by the tools.
But the adjustment Julian pointed to might indeed be better: it looks more reliable (while the constructor adjustment worked for me, I am not 100% sure it will work with all compilers/compiler versions) and has already been used in the HS codebase.

@TheShermanTanker
Contributor

It's a bit unfortunate that the virtual destructor trick doesn't work; I was just about to suggest it, since that's what Kim proposed in the Windows RTTI pull request. There is technically another way: the optimize pragma with the string no-devirtualize. But besides the fact that I didn't test it, it's also too heavy, since it disables all vtable-related optimizations rather than just keeping the vtable. We still want those optimizations, since I think Metadata is a performance-critical class. In the meantime I'll keep testing to see if there are any clean ways to keep the vtable. It's a shame that none of the compilers have something like [[gnu::vtable(true)]] that you could use to force the vtable to be kept.

@merykitty
Member

IIUC, the vtable is removed because all virtual calls on that class are removed, either because they are dead or because they can be devirtualized to a virtual call on a subtype. As a result, is there an annotation that tells the compiler that a virtual method should not be removed? It should preserve the vtable, right?

> Making the destructor virtual fails in the HS build because it seems not to work with derived classes.

That sounds confusing to me, normally it should be the opposite, are we even sure the destructors are called properly?

@plummercj
Contributor

vtables can be shared if identical. I think the issue with #12743 is that NotificationThread is a subclass of JavaThread but provides no additional virtual methods, so the NotificationThread vtable was stripped and instances of NotificationThread just point to the JavaThread vtable. This confuses SA, which needs to know about all JavaThread subtypes. Adding a virtual method to NotificationThread fixed this.
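
To illustrate the distinction described here, a minimal sketch with hypothetical stand-in classes (`BaseThread`, `PlainSubThread`, `NamedSubThread` are not HotSpot types). A subclass that overrides nothing has the same virtual function entries as its base, while a subclass that overrides a method gets a vtable with different contents:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a polymorphic base such as JavaThread.
class BaseThread {
 public:
  virtual const char* type_name() const { return "BaseThread"; }
};

// Adds and overrides nothing: its vtable holds the same function pointers
// as BaseThread's, which is the situation described for NotificationThread.
class PlainSubThread : public BaseThread {};

// Overrides a virtual method, so its vtable differs from BaseThread's.
class NamedSubThread : public BaseThread {
 public:
  const char* type_name() const override { return "NamedSubThread"; }
};

const char* type_name_of(const BaseThread& thread) {
  return thread.type_name();
}

void check_type_names() {
  assert(std::strcmp(type_name_of(PlainSubThread()), "BaseThread") == 0);
  assert(std::strcmp(type_name_of(NamedSubThread()), "NamedSubThread") == 0);
}
```

Whether a given toolchain actually folds or strips the identical vtable depends on its optimization settings; the sketch only shows why the two subclasses differ from the vtable's point of view.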

The issue with Metadata seems to be a bit different. I think the problem is that it is an abstract class. Since you can't have an instance of an abstract class, you don't need its vtable, and it gets stripped. However, SA does need it. The reference comes from the following SA code:

metadataConstructor.addMapping("Metadata", Metadata.class);

It's not clear to me why it needs this mapping. You might want to try to remove it and see if SA still works.

Regarding the proposed fix, it's not clear to me how adding a constructor forces the vtable to be retained.

Regarding enabling the proposed fix for PRODUCT builds, that means adding the _valid field (and maybe the is_valid() method) to PRODUCT builds. This field will take up space in every instance of every Metadata subtype (every class, field, method, and constant pool currently loaded).
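
A rough sketch of that space cost, using hypothetical classes rather than the real Metadata (the field and method names mirror the ones mentioned above, but the layout of actual HotSpot classes may differ, and in a subclass the extra field can sometimes be absorbed by existing padding):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical polymorphic base without the debug-only field.
class BaseWithoutValid {
 public:
  virtual int kind() const { return 0; }
};

// The same class with an int field, roughly what a PRODUCT build would see
// if the field were no longer restricted to NOT_PRODUCT builds.
class BaseWithValid {
 public:
  virtual int kind() const { return 0; }
  bool is_valid() const { return _valid == 0; }

 private:
  int _valid = 0;
};

// Extra bytes each instance of the base class occupies because of the field.
constexpr std::size_t extra_bytes_per_instance() {
  return sizeof(BaseWithValid) - sizeof(BaseWithoutValid);
}

static_assert(extra_bytes_per_instance() > 0,
              "the extra field enlarges the base class");
```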

@MBaesken
Member Author

> As a result, is there an annotation that tells the compiler that a virtual method should not be removed?

There are some attributes, such as 'retain'; see for example this discussion:
https://stackoverflow.com/questions/73887908/prevent-gcc-from-optimization-removal-of-variables-when-using-wl-gc-sections

But from what I saw, it did not really help with linker-step elimination.
And how would I use it with the vtable, given that I have to write the attribute next to some variable, method, etc.?

@merykitty
Member

@MBaesken I think I was probably incorrect, the real reason seems to be what @plummercj suggested instead.

Anyway, if what I said turns out to be actually true, I guess you can add __attribute__((retain)) on a random virtual method.

If what @plummercj said is true, but you can't make it so that we don't need a vtable for Metadata, then I think you can de-abstract Metadata by implementing the pure virtual methods. If even then LTO notices that no instance of Metadata is actually created and does not create the vtable, then you can try adding __attribute__((retain)) to Metadata, or actually creating a static Metadata instance.
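
A minimal sketch of the "de-abstract and instantiate" idea, with hypothetical stand-in classes (`BaseMeta`, `KlassLike`) and a made-up accessor `base_meta_instance`. It only shows the C++ shape of the suggestion; whether LTO would still discard the vtable in the real build was not tested in this thread:

```cpp
#include <cassert>

// Hypothetical stand-in for Metadata after "de-abstracting" it: the formerly
// pure virtual method gets a default body, so the class can be instantiated.
class BaseMeta {
 public:
  virtual bool is_klass() const { return false; }
};

class KlassLike : public BaseMeta {
 public:
  bool is_klass() const override { return true; }
};

// A real instance of the base class. Constructing it requires BaseMeta's
// vtable, because the object's vtable pointer must point at it.
// Assumption: it is unverified whether LTO keeps an otherwise unused instance.
const BaseMeta& base_meta_instance() {
  static BaseMeta instance;
  return instance;
}
```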

@TheShermanTanker
Contributor

The docs on [[gnu::retain]] seem promising and specifically mention linker garbage collection. It can only be applied to methods, though, not the Metadata class itself.

@MBaesken
Member Author

MBaesken commented Apr 21, 2026

> Regarding the proposed fix, it's not clear to me how adding a constructor forces the vtable to be retained.

With the NOINLINE, the constructor of Metadata is kept in the libjvm.so binary (even with LTO).
If I disassemble it, I see the following:

objdump -d --demangle --disassemble="Metadata::Metadata()" images/jdk/lib/server/libjvm.so

images/jdk/lib/server/libjvm.so:     file format elf64-x86-64


Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000bca790 <Metadata::Metadata()>:
  bca790:	48 8d 05 c1 81 94 00 	lea    0x9481c1(%rip),%rax        # 1512958 <vtable for Metadata+0x10>
  bca797:	48 89 07             	mov    %rax,(%rdi)
  bca79a:	c3                   	ret

So the remaining constructor references the vtable: it stores the address 'vtable for Metadata+0x10' into the new object. LTO 'sees' this reference and therefore keeps the vtable of Metadata in the binary.
See:

nm -C images/jdk/lib/server/libjvm.so | grep "vtable for Metadata"
0000000001512948 d vtable for Metadata
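
The pattern behind that disassembly can be sketched as follows, using hypothetical names (`AbstractBase`, `Concrete`, `SKETCH_NOINLINE`) instead of the real HotSpot class and its NOINLINE macro. The assumption, based on the output above, is that a non-inlined, out-of-line base constructor typically stores the base class's vtable address into the object and so keeps a reference to the vtable symbol:

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's NOINLINE macro.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical stand-in for an abstract base such as Metadata.
class AbstractBase {
 public:
  SKETCH_NOINLINE AbstractBase();
  virtual int kind() const = 0;
};

// Out-of-line constructor. While the base subobject is being constructed,
// its vtable pointer is set to AbstractBase's vtable, so the emitted code
// for this function normally refers to that vtable.
AbstractBase::AbstractBase() {}

class Concrete : public AbstractBase {
 public:
  int kind() const override { return 1; }
};

int concrete_kind() {
  Concrete object;
  const AbstractBase& base = object;
  return base.kind();  // dispatches through Concrete's vtable
}
```

As noted earlier in the thread, this relies on toolchain behavior, so it may not hold for every compiler or compiler version.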

@MBaesken
Member Author

MBaesken commented Apr 21, 2026

> The docs on [[gnu::retain]] seem promising and specifically mention linker garbage collection. It can only be applied to methods, though, not the Metadata class itself.

I could additionally put it on the constructor 'Metadata::Metadata()', just in case.
But I guess it is not cross-platform (just gcc/clang), right?

@TheShermanTanker
Contributor

Yeah, unfortunately just gcc and clang. But we do not allow any compiler besides VC on Windows, and VC doesn't seem to have this issue. What error is shown when the virtual destructor approach is used, out of curiosity?

@TheShermanTanker
Contributor

metadata.cpp has something related to the vtable as well:

// Can't inline because this materializes the vtable on some C++ compilers.

@MBaesken
Member Author

> What error is shown when the virtual destructor approach is used, out of curiosity?

gcc reports this:


/jdk/src/hotspot/share/oops/klass.hpp:62:7: error: deleted function 'virtual Klass::~Klass()' overriding non-deleted function
   62 | class Klass : public Metadata {
      |       ^~~~~

@MBaesken
Member Author

I could add the retain attribute, e.g. like this:

  // Keep the vtable alive under LTGC dead-section removal / LTO
#if defined(__GNUC__) || defined(__clang__)
  [[gnu::retain]]
#endif
  NOINLINE Metadata();

Does that bring a benefit? If so, do we need to check for gcc/clang versions too?
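
On the version question, one possible alternative to explicit compiler and version checks is a feature test with `__has_cpp_attribute`. The following is only a sketch under that assumption: the macro `SKETCH_RETAIN` and the classes `RetainedBase` and `RetainedImpl` are hypothetical, and it has not been checked against the HotSpot build or its supported compiler versions:

```cpp
#include <cassert>

// Feature test instead of compiler/version checks: use [[gnu::retain]] only
// where the compiler reports that it knows the attribute.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(gnu::retain)
#    define SKETCH_RETAIN [[gnu::retain]]
#  endif
#endif
#ifndef SKETCH_RETAIN
#  define SKETCH_RETAIN  // expands to nothing on other compilers
#endif

// Hypothetical stand-in for an abstract base such as Metadata.
class RetainedBase {
 public:
  SKETCH_RETAIN RetainedBase();
  virtual int kind() const = 0;
};

// Out-of-line constructor carrying the (optional) retain attribute.
RetainedBase::RetainedBase() {}

class RetainedImpl : public RetainedBase {
 public:
  int kind() const override { return 7; }
};

int retained_kind() {
  RetainedImpl impl;
  const RetainedBase& base = impl;
  return base.kind();
}
```

On compilers that do not report the attribute, the macro expands to nothing, so the declaration stays valid everywhere.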

Labels

build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review serviceability serviceability-dev@openjdk.org

4 participants