8380060: C2: Wrong execution with COH and arraycopy#30775
8380060: C2: Wrong execution with COH and arraycopy#30775rkennke wants to merge 8 commits intoopenjdk:masterfrom
Conversation
|
/contributor add chagedorn |
|
/contributor add Luca Ardueser, BSI Software AG |
|
👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into |
|
@rkennke This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 111 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
@rkennke |
|
@rkennke Syntax:
User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address. |
|
The total number of required reviews for this PR has been set to 2 based on the presence of this label: |
|
Just a drive-by comment: I don't really understand. The comment talks about the possibility that
An @rwestrel recently work with this, I believe he should take a look and/or have a better idea. |
|
@rkennke |
Right. The dest_initialized check did nothing. That path is only ever taken when dest_initialized=true to begin with, meaning the change disabled the pick-off load/store wholesale. And it is not the right fix. The real problem (as far as my limited understanding of C2 goes) is that the preceding clone/arraycopy only feeds into the 'raw' memory slice. The pick-off load/store would (via complicated ways) not see this, but instead see the initialize node, and get turned into constant 0. I am not sure why regular Java-level loads are not affected by this. Maybe a lucky coincidence? The trailing membar after clone might also have helped? Anyhow, the real fix seems to be that the AC in clone needs to update all memory slices, not just raw. This is also what all other occurances of ArrayCopyNode do (e.g. copyOf/copyOfRange/arraycopy). See updated proposed change. It fixes the test-case for me. I'm currently running tier1 tests... |
Webrevs
|
| // Expected pattern: out_mem → MergeMem → MemBar, or | ||
| // out_mem → MemBar (when MergeMem was folded away). | ||
| Node* mem_use = out_mem->raw_out(0); | ||
| if (out_mem->outcnt() != 1 || |
There was a problem hiding this comment.
This De Morgan is quite hard to grasp, do you think !(mem_use->is_MemBar() || (mem_use->is_MergeMem() && mem_use->outcnt() == 1 && mem_use->raw_out(0)->is_MemBar()) would be a little easier to read? Not so sure tbh.
There was a problem hiding this comment.
I agree, it is easier to read when it's mostly positive tests, with a single leading negation. Also, I am pulling out the outcnt == 1 test to before we actually fetch out_mem->raw_out(0), while that should be safe, it looks brittle and is definitely safer if checked before.
| return false; | ||
| } | ||
|
|
||
| igvn->replace_node(out_mem->raw_out(0), mem); |
There was a problem hiding this comment.
I don't quite follow this piece, why do we try to replace only the first output of out_mem, what happens if out_mem has multiple outputs? If out_mem should only have 1 output, then an assert seems more suitable.
There was a problem hiding this comment.
I think the assumption is that out_mem only has one use. We check that in the big expression above and it was pre-existing. I also think this should be an assertion, but I'm a little hesitant tbh.
There was a problem hiding this comment.
ah wait, I think the assertion there already answers this: the known case where the expectation outcnt == 1 might not hold true is with card-marking, and we bail out and not decompose the clone to begin with.
| return phase->C->top(); | ||
| } | ||
|
|
||
| bool ArrayCopyNode::finish_transform(PhaseGVN *phase, bool can_reshape, |
There was a problem hiding this comment.
It would be helpful if there is a description about what is mem, and what memory states it captures.
There was a problem hiding this comment.
I'm adding a comment.
|
@vnkozlov Can you clarify? The fix looks pretty correct to me. |
The original fix was not right, as @merykitty pointed out last week, but I think this new fix should be mostly correct. It'd be good if @rwestrel would look at it anyway. |
| const TypePtr* raw_adr_type = TypeRawPtr::BOTTOM; | ||
| ac->set_adr_type(TypeRawPtr::BOTTOM); | ||
| kit->set_predefined_output_for_runtime_call(ac, ac->in(TypeFunc::Memory), raw_adr_type); | ||
| // The clone copies the entire object content. Set all memory slices | ||
| // to the clone's output so that subsequent typed loads see the | ||
| // clone's writes. Using set_predefined_output_for_runtime_call with | ||
| // TypeRawPtr::BOTTOM would only update the raw slice, leaving typed | ||
| // per-instance slices (from escape analysis) pointing to the | ||
| // pre-clone memory state via the Initialize's NarrowMemProj. | ||
| kit->set_predefined_output_for_runtime_call(ac); | ||
| kit->set_all_memory_call(ac); |
There was a problem hiding this comment.
Why we can't use ac->connect_outputs(kit); here?
There was a problem hiding this comment.
connect_outputs() also handles IO and exception paths, which are not needed/used here?
There was a problem hiding this comment.
Looking on set_predefined_output_for_runtime_call(ac) and it is only do set_control() and set_all_memory_call(call) for your case. So you don't actually need call memory set again.
There was a problem hiding this comment.
Correct, that is redundant. I'll remove it.
vnkozlov
left a comment
There was a problem hiding this comment.
Good. I will submit testing
merykitty
left a comment
There was a problem hiding this comment.
Looks good to me, thanks for making the changes and answering my questions.
| return false; | ||
| } | ||
|
|
||
| igvn->replace_node(out_mem->raw_out(0), mem); |
| * | ||
| * @run main/othervm -Xbatch -XX:+UseCompactObjectHeaders | ||
| * -XX:CompileCommand=compileonly,compiler.arraycopy.TestBlockArrayCopyMisaligned::test* | ||
| * compiler.arraycopy.TestBlockArrayCopyMisaligned |
There was a problem hiding this comment.
* @run main/othervm -Xbatch -XX:+UseCompactObjectHeaders
* -XX:CompileCommand=compileonly,${test.main.class}::test*
* ${test.main.class}
This should be better.
|
Does this only reproduce with product builds? explains why ( |
I just checked and I could not reproduce with debug build. Likewise, if I comment that line in debug, then it reproduces there, too. And yes, your change fixes the problem for me, too. That's a nasty side-effect... Do you think the way that clone is currently wired up, feeding into the raw memory, instead of all memory, is correct, though? Because other arraycopy cases seems to set all memory. Or does it not matter? |
The wiring is correct AFAIU. There's a But, the graph here after array copy expansion is also broken, because even with the patch above, the memory input of the |
Ok, great - thank you for your suggestions! Do you want to reassign the bug to yourself and propose a fix? Because what you are suggesting is totally disjoint from my changes, and it seems wrong that I am pushing your changes? Whatever you prefer, really. |
Ok. Let me assign it to myself. |
Thank you, @rwestrel I assume you will have separate PR and we can close this one. |
Yes, I will do that. |
|
Is #30075 a potential solution to this as well? |
Could you link the new PR here when you create it, @rwestrel. FWIW, the linked JBS ticket isn't visible at this time, as far as I can tell; not sure if that's intentional. |
I think so. The issue from: |
C2's generate_block_arraycopy miscompiles when compact object headers are enabled. With compact headers, arrayOopDesc::base_offset_in_bytes(T_BYTE) returns 12 instead of 16. Since 12 is not 8-byte aligned, the block arraycopy optimization picks off an initial 32-bit word via a typed LoadI/StoreI pair before switching to a long-word copy.
The LoadI reads from the source array using the typed memory slice. When the source array was initialized by a prior arraycopy (e.g. Arrays.copyOfRange optimized to a clone), that arraycopy wrote to raw memory. The typed memory slice does not reflect these raw writes. The IGVN folds the typed load to zero, corrupting the first 4 bytes of the copied array.
The initial-word pick-off path is only reached when the array data offset is not 8-byte aligned. Without compact headers the offset is 16 (aligned), so this path is never taken through standard library code. With compact headers it is always taken for byte/boolean/short/char/int/float arrays copied via tightly-coupled allocations.
The proposed fix is to make the clone AC update all memory slices, not just raw. This is already done for all other places where we create an ArrayCopyNode.
I also turned the Test.java test into a jtreg test (with some improvements to cover more scenarios).
Special thanks to Luca Ardueser, BSI Software AG who found and reported the issue, and helped with initial investigation and came up with a test-case.
Testing:
Progress
Issue
Reviewers
Contributors
<chagedorn@openjdk.org>Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30775/head:pull/30775$ git checkout pull/30775Update a local copy of the PR:
$ git checkout pull/30775$ git pull https://git.openjdk.org/jdk.git pull/30775/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 30775View PR using the GUI difftool:
$ git pr show -t 30775Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30775.diff
Using Webrev
Link to Webrev Comment