Commit a6e8a31
Rollup merge of #151611 - bonega:improve-is-slice-is-ascii-performance, r=folkertdev
Improve is_ascii performance on x86_64 with explicit SSE2 intrinsics
# Summary
Improves `slice::is_ascii` performance on SSE2 targets by roughly 1.5-2x on larger inputs.
AVX-512 keeps similar performance characteristics.
This builds on the work already merged in #151259.
In particular, this PR improves the default SSE2 performance, so I no longer consider this a temporary fix.
Thanks to @folkertdev for prompting me to reconsider `as_chunk`.
# The implementation:
- Uses 64-byte chunks with 4x 16-byte SSE2 loads OR'd together
- Extracts the MSB mask with a single `pmovmskb` instruction
- Falls back to usize-at-a-time SWAR for inputs < 64 bytes
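
For readers who want to see the shape of the three steps above, here is a minimal sketch. It is not the code from this PR: the function names (`is_ascii_sketch`, `is_ascii_sse2`, `is_ascii_swar`) are hypothetical, and it uses `chunks_exact` rather than the chunking helper the PR mentions. It assumes SSE2 is enabled at compile time, which is the default on `x86_64`.

```rust
/// Hypothetical entry point: returns true if every byte is < 0x80.
pub fn is_ascii_sketch(bytes: &[u8]) -> bool {
    #[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
    {
        if bytes.len() >= 64 {
            return is_ascii_sse2(bytes);
        }
    }
    // Inputs shorter than 64 bytes (or non-SSE2 targets) use SWAR.
    is_ascii_swar(bytes)
}

#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
fn is_ascii_sse2(bytes: &[u8]) -> bool {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8, _mm_or_si128};

    let mut chunks = bytes.chunks_exact(64);
    for chunk in &mut chunks {
        let p = chunk.as_ptr() as *const __m128i;
        // SAFETY: `chunk` is exactly 64 bytes long, so the four unaligned
        // 16-byte loads at offsets 0, 16, 32 and 48 stay in bounds, and
        // SSE2 is statically enabled by the surrounding `cfg`.
        let mask = unsafe {
            let a = _mm_loadu_si128(p);
            let b = _mm_loadu_si128(p.add(1));
            let c = _mm_loadu_si128(p.add(2));
            let d = _mm_loadu_si128(p.add(3));
            // OR the four vectors so any set high bit survives.
            let ored = _mm_or_si128(_mm_or_si128(a, b), _mm_or_si128(c, d));
            // `pmovmskb`: collect the MSB of each of the 16 bytes.
            _mm_movemask_epi8(ored)
        };
        if mask != 0 {
            return false;
        }
    }
    // Fewer than 64 bytes remain.
    is_ascii_swar(chunks.remainder())
}

/// usize-at-a-time check: a word is ASCII if no byte has its high bit set.
fn is_ascii_swar(bytes: &[u8]) -> bool {
    const WORD: usize = std::mem::size_of::<usize>();
    const HIGH_BITS: usize = usize::from_ne_bytes([0x80; WORD]);

    let mut words = bytes.chunks_exact(WORD);
    for w in &mut words {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(w);
        if usize::from_ne_bytes(buf) & HIGH_BITS != 0 {
            return false;
        }
    }
    // Trailing bytes that do not fill a whole word.
    words.remainder().iter().all(|b| *b < 0x80)
}
```

Calling `is_ascii_sketch(b"hello")` takes the SWAR path, while a 200-byte buffer goes through three 64-byte SSE2 iterations and then checks the last 8 bytes with SWAR. ORing four loads before a single mask extraction means one branch per 64 bytes instead of one per 16.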
# Performance impact (vs before #151259):
- AVX-512: 34-48x faster
- SSE2: 1.5-2x faster
<details>
<summary>Benchmark Results (click to expand)</summary>
Benchmarked on AMD Ryzen 9 9950X (AVX-512 capable). Values show relative performance (1.00 = fastest).
Throughput tops out at 139 GB/s for large inputs.
### early_non_ascii
| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 64 | 1.01 | **1.00** | 13.45 | 1.13 |
| 1024 | 1.01 | **1.00** | 13.53 | 1.14 |
| 65536 | 1.01 | **1.00** | 13.99 | 1.12 |
| 1048576 | 1.02 | **1.00** | 13.29 | 1.12 |
### late_non_ascii
| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 64 | **1.00** | 1.01 | 13.37 | 1.13 |
| 1024 | 1.10 | **1.00** | 42.42 | 1.95 |
| 65536 | **1.00** | 1.06 | 42.22 | 1.73 |
| 1048576 | **1.00** | 1.03 | 34.73 | 1.46 |
### pure_ascii
| Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
|------------|------------|----------|------------|----------|
| 4 | 1.03 | **1.00** | 1.75 | 1.32 |
| 8 | **1.00** | 1.14 | 3.89 | 2.06 |
| 16 | **1.00** | 1.04 | 1.13 | 1.62 |
| 32 | 1.07 | 1.19 | 5.11 | **1.00** |
| 64 | **1.00** | 1.13 | 13.32 | 1.57 |
| 128 | **1.00** | 1.01 | 19.97 | 1.55 |
| 256 | **1.00** | 1.02 | 27.77 | 1.61 |
| 1024 | **1.00** | 1.02 | 41.34 | 1.84 |
| 4096 | 1.02 | **1.00** | 45.61 | 1.98 |
| 16384 | 1.01 | **1.00** | 48.67 | 2.04 |
| 65536 | **1.00** | 1.03 | 43.86 | 1.77 |
| 262144 | **1.00** | 1.06 | 41.44 | 1.79 |
| 1048576 | 1.02 | **1.00** | 35.36 | 1.44 |
</details>
## Reproduction / Test Projects
Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation
- `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
- `fuzz/` - Compares old/new implementations with libfuzzer
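
To illustrate the idea behind comparing two implementations, here is a small differential check. It is only a sketch and does not reproduce the linked repository: it swaps libfuzzer for a deterministic xorshift generator so it has no dependencies, and the names `reference_is_ascii` and `find_mismatch` are hypothetical.

```rust
/// Byte-at-a-time reference that is easy to trust.
fn reference_is_ascii(bytes: &[u8]) -> bool {
    bytes.iter().all(|b| *b < 0x80)
}

/// Runs `candidate` against the reference on pseudo-random inputs and
/// returns the first input on which the two disagree, if any.
fn find_mismatch(
    candidate: fn(&[u8]) -> bool,
    max_len: usize,
    rounds: usize,
) -> Option<Vec<u8>> {
    // xorshift64 with a fixed non-zero seed, so runs are reproducible.
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };

    for _ in 0..rounds {
        let len = (next() % (max_len as u64 + 1)) as usize;
        // Start from pure ASCII bytes.
        let mut input: Vec<u8> = (0..len).map(|_| (next() & 0x7F) as u8).collect();
        // In about half of the non-empty cases, set the high bit of one byte.
        if len > 0 && next() & 1 == 1 {
            let idx = (next() % len as u64) as usize;
            input[idx] |= 0x80;
        }
        if candidate(&input) != reference_is_ascii(&input) {
            return Some(input);
        }
    }
    None
}
```

For example, `find_mismatch(|b: &[u8]| b.is_ascii(), 300, 2000)` checks the standard library implementation against the reference on inputs up to 300 bytes, which covers both the sub-64-byte path and the chunked path.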
Relates to: llvm/llvm-project#176906
2 files changed
Lines changed: 25 additions & 43 deletions