Redesign intrinsic-test to use simple comparison#2063
Redesign intrinsic-test to use simple comparison#2063sayantn wants to merge 16 commits intorust-lang:mainfrom
intrinsic-test to use simple comparison#2063Conversation
feb1dcd to
6ef8b8f
Compare
This seems weird ( unsafe extern "C" {
fn vdup_n_f16_wrapper(value: f16) -> float16x4_t;
}In fact most Edit: To work around this issue I have modified the tool to communicate with C via pointers (e.g. the C wrapper for @Amanieu is this intended behavior or a bug? |
e2346ff to
db1b2ca
Compare
|
Btw the time gains are significant, it reduces the Arm and aarch64 times to 2-3 minutes, and the full x86 run (we did 20% previously) to around 12 mins for release and 17 mins for dev |
|
Great work. Quick sanity check on |
|
@folkertdev ooh, that makes sense. I don't particularly care about windows, but we are using LLVM20 in the CI. I can change it to use the build from kernel.org |
|
I'm seeing |
|
yeah, but I can use the LLVM github builds or the kernel.org builds |
ce53e81 to
76dd339
Compare
|
Can f16 tests just be gated with |
|
@tgross35 the f16 tests are mostly fine now. More concerning is that a lot of tests are failing in all 3 arm archs, e.g. edit: sorry, my mistake, they are still failing in ARMv7. I will gate them against the flag |
|
With LLVM 22 |
|
FTZ/DAZ-related perhaps? |
I don't really think so, the outputs seem completely distinct. I noticed that use core::arch::aarch64::*;
#[unsafe(no_mangle)]
#[target_feature(enable = "neon")]
pub unsafe extern "C" fn foo(dst: *mut uint8x16x2_t, a: *const uint8x16_t, b: *const uint8x16_t) {
unsafe {
*dst = vzipq_u8(*a, *b);
}
}produces foo:
ld1 { v0.16b }, [x1]
ld1 { v1.16b }, [x2]
add x8, x0, #16
zip1 v2.16b, v0.16b, v1.16b
zip2 v0.16b, v0.16b, v1.16b
st1 { v2.16b }, [x0]
st1 { v0.16b }, [x8]
retBut the C code seemingly has different behavior on GCC and clang https://godbolt.org/z/T3YnrejjG @adamgemmell can you help in this? |
|
I'm not sure it will fix your issue but the difference in instructions comes from the fact that in arm_neon.h, they reverse every vector before and after the operation on big endian. It's not always actually necessary so we only do it if it's broken without it - however, the intrinsic test tool doesn't detect the difference in behaviour because both arguments it picks are identical. e.g.: |
|
You can try adding Also I don't actually see vzipq_u8 on the latest CI run, why is that? |
I have no idea, I can confirm that locally the test is generated and run.
I will check. Thanks Edit: @adamgemmell adding Edit2: sorry, |
|
None of the unsigned variants of vzipq seem to be seen there, weird. I'd quite like to know why this patch detects the difference - when I looked locally the codegen of the tests seemed very similar |
|
Yeah I fixed the test not being included, I used |
This comment has been minimized.
This comment has been minimized.
a057d30 to
2dfa840
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
17f5a39 to
bb03cef
Compare
This comment has been minimized.
This comment has been minimized.
0316c79 to
76c3d5a
Compare
This comment has been minimized.
This comment has been minimized.
bf75e04 to
20ac12e
Compare
This reverts commit 5f676a5.
This enables the `big_endian_inverse` transformation to apply to some of these intrinsics only when appropriate.
…sted specifically
Instead of doing an `as {i,u}16` cast, we can use the same LLVM
intrinsics as when converting to `{i,u}{32,64}`, which is what Clang does
and ensures the intrinsic result matches.
Currently
intrinsic-testprints the outputs and then compares the outputs manually. This PR uses a different approach -- generate C wrappers for the intrinsics, link to them from Rust, and then just use simple rust tests to compare outputsBlocked on
#2095#2096#2099#2100#2101#2105#2106#2107#2108#2111 #2112edit: I won't split this any further, so after #2106, #2111 and #2112 merges, I will mark this ready for review