Skip to content

8380060: C2: Wrong execution with COH and arraycopy#30775

Open
rkennke wants to merge 8 commits intoopenjdk:masterfrom
rkennke:JDK-8380060
Open

8380060: C2: Wrong execution with COH and arraycopy#30775
rkennke wants to merge 8 commits intoopenjdk:masterfrom
rkennke:JDK-8380060

Conversation

@rkennke
Copy link
Copy Markdown
Contributor

@rkennke rkennke commented Apr 16, 2026

C2's generate_block_arraycopy miscompiles when compact object headers are enabled. With compact headers, arrayOopDesc::base_offset_in_bytes(T_BYTE) returns 12 instead of 16. Since 12 is not 8-byte aligned, the block arraycopy optimization picks off an initial 32-bit word via a typed LoadI/StoreI pair before switching to a long-word copy.

The LoadI reads from the source array using the typed memory slice. When the source array was initialized by a prior arraycopy (e.g. Arrays.copyOfRange optimized to a clone), that arraycopy wrote to raw memory. The typed memory slice does not reflect these raw writes. The IGVN folds the typed load to zero, corrupting the first 4 bytes of the copied array.

The initial-word pick-off path is only reached when the array data offset is not 8-byte aligned. Without compact headers the offset is 16 (aligned), so this path is never taken through standard library code. With compact headers it is always taken for byte/boolean/short/char/int/float arrays copied via tightly-coupled allocations.

The proposed fix is to make the clone AC update all memory slices, not just raw. This is already done for all other places where we create an ArrayCopyNode.

I also turned the Test.java test into a jtreg test (with some improvements to cover more scenarios).

Special thanks to Luca Ardueser, BSI Software AG who found and reported the issue, and helped with initial investigation and came up with a test-case.

Testing:

  • the originally submitted test-case
  • the new derived jtreg test
  • tier1 (default)
  • tier1 (+UseCompactObjectHeaders)


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8380060: C2: Wrong execution with COH and arraycopy (Bug - P3)

Reviewers

Contributors

  • Christian Hagedorn <chagedorn@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30775/head:pull/30775
$ git checkout pull/30775

Update a local copy of the PR:
$ git checkout pull/30775
$ git pull https://git.openjdk.org/jdk.git pull/30775/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30775

View PR using the GUI difftool:
$ git pr show -t 30775

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30775.diff

Using Webrev

Link to Webrev Comment

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Apr 16, 2026

/contributor add chagedorn

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Apr 16, 2026

/contributor add Luca Ardueser, BSI Software AG

@bridgekeeper
Copy link
Copy Markdown

bridgekeeper Bot commented Apr 16, 2026

👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link
Copy Markdown

openjdk Bot commented Apr 16, 2026

@rkennke This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8380060: C2: Wrong execution with COH and arraycopy

Co-authored-by: Christian Hagedorn <chagedorn@openjdk.org>
Reviewed-by: kvn, qamai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 111 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk
Copy link
Copy Markdown

openjdk Bot commented Apr 16, 2026

@rkennke
Contributor Christian Hagedorn <chagedorn@openjdk.org> successfully added.

@openjdk
Copy link
Copy Markdown

openjdk Bot commented Apr 16, 2026

@rkennke Luca Ardueser, BSI Software AG was not found in the census.

Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address.

@openjdk openjdk Bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 16, 2026
@openjdk
Copy link
Copy Markdown

openjdk Bot commented Apr 16, 2026

@rkennke The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk
Copy link
Copy Markdown

openjdk Bot commented Apr 16, 2026

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot-compiler. This can be overridden with the /reviewers command.

@merykitty
Copy link
Copy Markdown
Member

Just a drive-by comment: I don't really understand. The comment talks about the possibility that src is unintialized, but the code checks dest_uninitialized. Besides, if generating a Load from the result of an arraycopy leads to incorrect folding, how do normal loads work? It seems that this load we generate here reads the incorrect memory slice.

A better fix would have been to turn the picked-off LoadI to load from raw slice, but this doesn't easily work, because we can't do raw accesses from oops (or I could not figure out how to do it).

An AllocateNode has a raw pointer output, which is CheckCastPPed into a typed oop. If you want to do a raw access, you need to look for that raw pointer, you need to ensure that those accesses lie between the AllocateNode and the InitializeNode, too.

@rwestrel recently work with this, I believe he should take a look and/or have a better idea.

@openjdk openjdk Bot added the hotspot hotspot-dev@openjdk.org label Apr 17, 2026
@openjdk
Copy link
Copy Markdown

openjdk Bot commented Apr 17, 2026

@rkennke hotspot has been added to this pull request based on files touched in new commit(s).

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Apr 17, 2026

Just a drive-by comment: I don't really understand. The comment talks about the possibility that src is unintialized, but the code checks dest_uninitialized. Besides, if generating a Load from the result of an arraycopy leads to incorrect folding, how do normal loads work? It seems that this load we generate here reads the incorrect memory slice.

Right. The dest_initialized check did nothing. That path is only ever taken when dest_initialized=true to begin with, meaning the change disabled the pick-off load/store wholesale. And it is not the right fix.

The real problem (as far as my limited understanding of C2 goes) is that the preceding clone/arraycopy only feeds into the 'raw' memory slice. The pick-off load/store would (via complicated ways) not see this, but instead see the initialize node, and get turned into constant 0.

I am not sure why regular Java-level loads are not affected by this. Maybe a lucky coincidence? The trailing membar after clone might also have helped?

Anyhow, the real fix seems to be that the AC in clone needs to update all memory slices, not just raw. This is also what all other occurances of ArrayCopyNode do (e.g. copyOf/copyOfRange/arraycopy). See updated proposed change. It fixes the test-case for me. I'm currently running tier1 tests...

Comment thread src/hotspot/share/opto/arraycopynode.cpp Outdated
@rkennke rkennke marked this pull request as ready for review April 20, 2026 08:49
@openjdk openjdk Bot added the rfr Pull request is ready for review label Apr 20, 2026
@mlbridge
Copy link
Copy Markdown

mlbridge Bot commented Apr 20, 2026

Webrevs

// Expected pattern: out_mem → MergeMem → MemBar, or
// out_mem → MemBar (when MergeMem was folded away).
Node* mem_use = out_mem->raw_out(0);
if (out_mem->outcnt() != 1 ||
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This De Morgan is quite hard to grasp, do you think !(mem_use->is_MemBar() || (mem_use->is_MergeMem() && mem_use->outcnt() == 1 && mem_use->raw_out(0)->is_MemBar()) would be a little easier to read? Not so sure tbh.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it is easier to read when it's mostly positive tests, with a single leading negation. Also, I am pulling out the outcnt == 1 test to before we actually fetch out_mem->raw_out(0), while that should be safe, it looks brittle and is definitely safer if checked before.

return false;
}

igvn->replace_node(out_mem->raw_out(0), mem);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow this piece, why do we try to replace only the first output of out_mem, what happens if out_mem has multiple outputs? If out_mem should only have 1 output, then an assert seems more suitable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the assumption is that out_mem only has one use. We check that in the big expression above and it was pre-existing. I also think this should be an assertion, but I'm a little hesitant tbh.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah wait, I think the assertion there already answers this: the known case where the expectation outcnt == 1 might not hold true is with card-marking, and we bail out and not decompose the clone to begin with.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I misparsed that.

return phase->C->top();
}

bool ArrayCopyNode::finish_transform(PhaseGVN *phase, bool can_reshape,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful if there is a description about what is mem, and what memory states it captures.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a comment.

Copy link
Copy Markdown
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwestrel recently work with this, I believe he should take a look and/or have a better idea.

I completely agree with this. Lets Roland to look on this - he is expert in this part.

Current fix does not look right based on the issue, @rkennke, you described.

@merykitty
Copy link
Copy Markdown
Member

@vnkozlov Can you clarify? The fix looks pretty correct to me.

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Apr 20, 2026

@rwestrel recently work with this, I believe he should take a look and/or have a better idea.

I completely agree with this. Lets Roland to look on this - he is expert in this part.

Current fix does not look right based on the issue, @rkennke, you described.

The original fix was not right, as @merykitty pointed out last week, but I think this new fix should be mostly correct. It'd be good if @rwestrel would look at it anyway.

Comment on lines +745 to +753
const TypePtr* raw_adr_type = TypeRawPtr::BOTTOM;
ac->set_adr_type(TypeRawPtr::BOTTOM);
kit->set_predefined_output_for_runtime_call(ac, ac->in(TypeFunc::Memory), raw_adr_type);
// The clone copies the entire object content. Set all memory slices
// to the clone's output so that subsequent typed loads see the
// clone's writes. Using set_predefined_output_for_runtime_call with
// TypeRawPtr::BOTTOM would only update the raw slice, leaving typed
// per-instance slices (from escape analysis) pointing to the
// pre-clone memory state via the Initialize's NarrowMemProj.
kit->set_predefined_output_for_runtime_call(ac);
kit->set_all_memory_call(ac);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we can't use ac->connect_outputs(kit); here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connect_outputs() also handles IO and exception paths, which are not needed/used here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking on set_predefined_output_for_runtime_call(ac) and it is only do set_control() and set_all_memory_call(call) for your case. So you don't actually need call memory set again.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, that is redundant. I'll remove it.

Comment thread src/hotspot/share/opto/arraycopynode.cpp Outdated
Copy link
Copy Markdown
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. I will submit testing

Copy link
Copy Markdown
Member

@merykitty merykitty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks for making the changes and answering my questions.

return false;
}

igvn->replace_node(out_mem->raw_out(0), mem);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I misparsed that.

*
* @run main/othervm -Xbatch -XX:+UseCompactObjectHeaders
* -XX:CompileCommand=compileonly,compiler.arraycopy.TestBlockArrayCopyMisaligned::test*
* compiler.arraycopy.TestBlockArrayCopyMisaligned
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* @run main/othervm -Xbatch -XX:+UseCompactObjectHeaders
*                   -XX:CompileCommand=compileonly,${test.main.class}::test*
*                   ${test.main.class}

This should be better.

@openjdk openjdk Bot added the ready Pull request is ready to be integrated label Apr 21, 2026
@rwestrel
Copy link
Copy Markdown
Contributor

Does this only reproduce with product builds?
I could only reproduce it with a product build and was puzzled that compilation would be different with a debug build I found that this:

diff --git a/src/hotspot/share/opto/escape.cpp b/src/hotspot/share/opto/escape.cpp
index a05ad0ef99a..19bdb3bc389 100644
--- a/src/hotspot/share/opto/escape.cpp
+++ b/src/hotspot/share/opto/escape.cpp
@@ -4539,7 +4539,7 @@ void ConnectionGraph::split_unique_types(GrowableArray<Node *>  &alloc_worklist,
           const TypePtr* adr_type = proj->adr_type();
           const TypePtr* new_adr_type = tinst->add_offset(adr_type->offset());
           if (adr_type != new_adr_type && !init->already_has_narrow_mem_proj_with_adr_type(new_adr_type)) {
-            DEBUG_ONLY( uint alias_idx = _compile->get_alias_index(new_adr_type); )
+            uint alias_idx = _compile->get_alias_index(new_adr_type);
             assert(_compile->get_general_index(alias_idx) == _compile->get_alias_index(adr_type), "new adr type should be narrowed down from existing adr type");
             NarrowMemProjNode* new_proj = new NarrowMemProjNode(init, new_adr_type);
             igvn->set_type(new_proj, new_proj->bottom_type());

explains why (get_alias_index() has the side effect of allocating a new alias so if we don't perform the call then code that iterates over all aliases until num_alias_types() to fix the memory graph during EA skips the newly added NarrowMemProjNode. With this fixed, the test that's attached to the bug now passes.

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Apr 21, 2026

Does this only reproduce with product builds? I could only reproduce it with a product build and was puzzled that compilation would be different with a debug build I found that this:

diff --git a/src/hotspot/share/opto/escape.cpp b/src/hotspot/share/opto/escape.cpp
index a05ad0ef99a..19bdb3bc389 100644
--- a/src/hotspot/share/opto/escape.cpp
+++ b/src/hotspot/share/opto/escape.cpp
@@ -4539,7 +4539,7 @@ void ConnectionGraph::split_unique_types(GrowableArray<Node *>  &alloc_worklist,
           const TypePtr* adr_type = proj->adr_type();
           const TypePtr* new_adr_type = tinst->add_offset(adr_type->offset());
           if (adr_type != new_adr_type && !init->already_has_narrow_mem_proj_with_adr_type(new_adr_type)) {
-            DEBUG_ONLY( uint alias_idx = _compile->get_alias_index(new_adr_type); )
+            uint alias_idx = _compile->get_alias_index(new_adr_type);
             assert(_compile->get_general_index(alias_idx) == _compile->get_alias_index(adr_type), "new adr type should be narrowed down from existing adr type");
             NarrowMemProjNode* new_proj = new NarrowMemProjNode(init, new_adr_type);
             igvn->set_type(new_proj, new_proj->bottom_type());

explains why (get_alias_index() has the side effect of allocating a new alias so if we don't perform the call then code that iterates over all aliases until num_alias_types() to fix the memory graph during EA skips the newly added NarrowMemProjNode. With this fixed, the test that's attached to the bug now passes.

I just checked and I could not reproduce with debug build. Likewise, if I comment that line in debug, then it reproduces there, too. And yes, your change fixes the problem for me, too. That's a nasty side-effect...

Do you think the way that clone is currently wired up, feeding into the raw memory, instead of all memory, is correct, though? Because other arraycopy cases seems to set all memory. Or does it not matter?

@rwestrel
Copy link
Copy Markdown
Contributor

Do you think the way that clone is currently wired up, feeding into the raw memory, instead of all memory, is correct, though? Because other arraycopy cases seems to set all memory. Or does it not matter?

The wiring is correct AFAIU. There's a MemBar node following the ArrayCopy node. That MemBar should guarantee correct ordering.

But, the graph here after array copy expansion is also broken, because even with the patch above, the memory input of the LoadI ends up above that MemBar which is not correct even though, it appears to be harmless. The reason the memory input of the LoadI is allowed to be above the MemBar is that C2 treats loads from non escaping allocations specially wrt to MemBars. There used to be code ArrayCopyNode::may_modify() that would pattern match the expanded ArrayCopy shapes so a load from an array would not bypass an ArrayCopy that writes to that array but I reworked it with : https://github.com/openjdk/jdk/pull/23465/changes (and likely broke this particular case). The expectation now is that we would run into a MemBar that's flagged TrailingExpandedArrayCopy when we walk the memory graph after expansion. But we don't add one in this particular case MemBar. So I would suggest that as a fix:

diff --git a/src/hotspot/share/opto/macroArrayCopy.cpp b/src/hotspot/share/opto/macroArrayCopy.cpp
index 139775506db..6c8cb467d53 100644
--- a/src/hotspot/share/opto/macroArrayCopy.cpp
+++ b/src/hotspot/share/opto/macroArrayCopy.cpp
@@ -1248,6 +1248,21 @@ void PhaseMacroExpand::expand_arraycopy_node(ArrayCopyNode *ac) {
   MergeMemNode* merge_mem = nullptr;
 
   if (ac->is_clonebasic()) {
+    if (ac->_dest_type != TypeOopPtr::BOTTOM) {
+      MemBarNode* membar = ac->proj_out(TypeFunc::Control)->unique_ctrl_out()->as_MemBar();
+      MergeMemNode* mm = membar->in(TypeFunc::Memory)->as_MergeMem();
+      assert(mm->memory_at(Compile::AliasIdxRaw)->as_Proj()->in(0) == ac, "");
+      int mem_bar_alias_idx = C->get_alias_index(ac->_dest_type->add_offset(Type::OffsetBot)->is_ptr());
+      Node* ctl = membar->in(0);
+      Node* mem = mm;
+      insert_mem_bar(&ctl, &mem, Op_MemBarCPUOrder, mem_bar_alias_idx);
+      _igvn.replace_input_of(membar, 0, ctl);
+      _igvn.replace_input_of(membar, TypeFunc::Memory, mem);
+      assert(ctl->is_Proj(), "MemBar control projection");
+      assert(ctl->in(0)->isa_MemBar(), "MemBar node");
+      ctl->in(0)->isa_MemBar()->set_trailing_expanded_array_copy();
+    }
+
     BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2();
     bs->clone_at_expansion(this, ac);
     return;

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Apr 21, 2026

Do you think the way that clone is currently wired up, feeding into the raw memory, instead of all memory, is correct, though? Because other arraycopy cases seems to set all memory. Or does it not matter?

The wiring is correct AFAIU. There's a MemBar node following the ArrayCopy node. That MemBar should guarantee correct ordering.

But, the graph here after array copy expansion is also broken, because even with the patch above, the memory input of the LoadI ends up above that MemBar which is not correct even though, it appears to be harmless. The reason the memory input of the LoadI is allowed to be above the MemBar is that C2 treats loads from non escaping allocations specially wrt to MemBars. There used to be code ArrayCopyNode::may_modify() that would pattern match the expanded ArrayCopy shapes so a load from an array would not bypass an ArrayCopy that writes to that array but I reworked it with : https://github.com/openjdk/jdk/pull/23465/changes (and likely broke this particular case). The expectation now is that we would run into a MemBar that's flagged TrailingExpandedArrayCopy when we walk the memory graph after expansion. But we don't add one in this particular case MemBar. So I would suggest that as a fix:

Ok, great - thank you for your suggestions! Do you want to reassign the bug to yourself and propose a fix? Because what you are suggesting is totally disjoint from my changes, and it seems wrong that I am pushing your changes? Whatever you prefer, really.

@rwestrel
Copy link
Copy Markdown
Contributor

Ok, great - thank you for your suggestions! Do you want to reassign the bug to yourself and propose a fix? Because what you are suggesting is totally disjoint from my changes, and it seems wrong that I am pushing your changes? Whatever you prefer, really.

Ok. Let me assign it to myself.

@vnkozlov
Copy link
Copy Markdown
Contributor

Ok. Let me assign it to myself.

Thank you, @rwestrel

I assume you will have separate PR and we can close this one.

@rwestrel
Copy link
Copy Markdown
Contributor

I assume you will have separate PR and we can close this one.

Yes, I will do that.

@krk
Copy link
Copy Markdown

krk commented Apr 21, 2026

Is #30075 a potential solution to this as well?

@ysramakrishna
Copy link
Copy Markdown
Member

Ok, great - thank you for your suggestions! Do you want to reassign the bug to yourself and propose a fix? Because what you are suggesting is totally disjoint from my changes, and it seems wrong that I am pushing your changes? Whatever you prefer, really.

Ok. Let me assign it to myself.

Could you link the new PR here when you create it, @rwestrel. FWIW, the linked JBS ticket isn't visible at this time, as far as I can tell; not sure if that's intentional.

@rwestrel
Copy link
Copy Markdown
Contributor

Is #30075 a potential solution to this as well?

I think so. The issue from:
#30775 (comment)
needs to be fixed as well. I'll take care of that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org ready Pull request is ready to be integrated rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

7 participants