Add SIMD optimization for int_to_float conversion #580
Merged: veluca93 merged 14 commits into libjxl:main, Jan 21, 2026
Conversation
Benchmark @ c1f5321: comparing 47e5c029 (Base) vs 37f8a2a9 (PR)
veluca93 reviewed Dec 22, 2025
veluca93 reviewed Jan 17, 2026
Force-pushed from 7d1f674 to 4c8acef
veluca93 reviewed Jan 20, 2026
Force-pushed from 9dec3f8 to a83d955
Force-pushed from a83d955 to 7aafdb8
Add SIMD fast paths for converting custom bit-depth floats to f32:
- 32-bit float passthrough: simple bitcast using SIMD
- 16-bit float (f16/half-precision): SIMD conversion with a scalar fallback for subnormal values

The 16-bit float SIMD path handles the normal, zero, and inf/NaN cases directly, falling back to scalar for the rare subnormal case, which requires a variable-iteration normalization loop. Also adds a BitDepth::f16() test helper and comprehensive unit tests for the conversion functions.
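The case split described in this commit can be sketched at the bit level. The following is an illustrative scalar model (the function name `f16_bits_to_f32` is hypothetical, not an identifier from the PR), including the variable-iteration subnormal loop that forces the SIMD path to fall back to scalar:

```rust
// Illustrative scalar model of f16 bit decoding; `f16_bits_to_f32` is a
// hypothetical name, not the crate's actual API.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = (bits as u32 & 0x8000) << 16;
    let exp = (bits >> 10) & 0x1f;
    let mant = (bits & 0x3ff) as u32;
    let out = match (exp, mant) {
        (0, 0) => sign,                                 // signed zero
        (0x1f, 0) => sign | 0x7f80_0000,                // +/- infinity
        (0x1f, _) => sign | 0x7fc0_0000 | (mant << 13), // NaN (quieted)
        (0, _) => {
            // Subnormal: shift until the implicit bit appears. The loop
            // length depends on the value, which is why the SIMD path
            // falls back to scalar for this case.
            let mut e: u32 = 113; // f32 exponent of the smallest f16 normal
            let mut m = mant;
            while m & 0x400 == 0 {
                m <<= 1;
                e -= 1;
            }
            sign | (e << 23) | ((m & 0x3ff) << 13)
        }
        // Normal: rebias the exponent from 15 (f16) to 127 (f32).
        _ => sign | ((exp as u32 + 112) << 23) | (mant << 13),
    };
    f32::from_bits(out)
}

fn main() {
    assert_eq!(f16_bits_to_f32(0x3c00), 1.0);            // normal
    assert_eq!(f16_bits_to_f32(0x0001), 2f32.powi(-24)); // smallest subnormal
    assert!(f16_bits_to_f32(0x7c01).is_nan());
    println!("f16 decode sketch ok");
}
```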
Address veluca93's review: add load_f16_bits() and store_f16() methods to the F32SimdVec trait instead of implementing the conversion in convert.rs.
- AVX2+F16C: hardware _mm256_cvtph_ps/_mm256_cvtps_ph
- AVX-512: hardware _mm512_cvtph_ps/_mm512_cvtps_ph
- SSE4.2/NEON/scalar: scalar fallback

This simplifies convert.rs by roughly 100 lines.
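A minimal sketch of the trait-method shape this commit describes, under assumed signatures (the real F32SimdVec trait and its backends differ); the one-lane "scalar backend" below handles only normal-range, exactly representable values, with rounding and special cases deliberately elided:

```rust
// Sketch: put the f16 conversion on the SIMD vector trait so each backend
// can choose hardware intrinsics or a fallback. The signatures and the
// ScalarF32 backend are illustrative assumptions, not the crate's code.
trait F32SimdVec: Sized {
    const LEN: usize;
    fn load_f16_bits(src: &[u16]) -> Self; // raw f16 bit patterns -> f32 lanes
    fn store_f16(&self, dst: &mut [u16]);  // f32 lanes -> raw f16 bit patterns
}

/// One-lane stand-in backend. Only normal-range, exactly representable
/// values are handled; zero/subnormal/inf/NaN and rounding are elided.
struct ScalarF32(f32);

impl F32SimdVec for ScalarF32 {
    const LEN: usize = 1;

    fn load_f16_bits(src: &[u16]) -> Self {
        let b = src[0] as u32;
        let sign = (b & 0x8000) << 16;
        let exp = (b >> 10) & 0x1f;
        let mant = b & 0x3ff;
        // Rebias the exponent from 15 (f16) to 127 (f32).
        ScalarF32(f32::from_bits(sign | ((exp + 112) << 23) | (mant << 13)))
    }

    fn store_f16(&self, dst: &mut [u16]) {
        let b = self.0.to_bits();
        let sign = ((b >> 16) & 0x8000) as u16;
        let exp = (((b >> 23) & 0xff) - 112) as u16; // rebias 127 -> 15
        let mant = ((b >> 13) & 0x3ff) as u16;       // truncate the mantissa
        dst[0] = sign | (exp << 10) | mant;
    }
}

fn main() {
    assert_eq!(ScalarF32::load_f16_bits(&[0x3c00]).0, 1.0);
    let mut out = [0u16; 1];
    ScalarF32(1.5).store_f16(&mut out);
    assert_eq!(out[0], 0x3e00); // f16 encoding of 1.5
    println!("trait sketch ok");
}
```

Per the commit message, a hardware backend would implement the same two methods with the F16C/AVX-512 conversion intrinsics instead of bit manipulation; the trait boundary is what keeps that choice out of convert.rs.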
- AVX: always require F16C for the AVX2 path (removes the runtime check)
- AVX-512: restructure the inner functions so they are not unsafe; only wrap memory operations in unsafe blocks with SAFETY comments
- NEON: use inline assembly for the f16 conversion (fcvtl/fcvtn), since stdarch incorrectly requires the fp16 feature for the basic conversion
- Add an f16 type module to jxl_simd and use it instead of u16/standalone functions throughout the crate
…mainder
- Add an I32Vec::store_u16() method to extract the low 16 bits from each i32 lane and store them as u16 values, implemented for all SIMD backends
- Remove scalar remainder handling in the int_to_float functions, since render-pipeline buffers are always padded to the SIMD width
- Use the div_ceil pattern consistent with the other SIMD functions in convert.rs
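A scalar model of the store_u16 operation this commit describes: keep the low 16 bits of each i32 lane and write them out as u16. The free-function signature here is an assumption for illustration; in the PR it is a method on the I32Vec trait implemented per backend.

```rust
// Scalar model of I32Vec::store_u16 (signature assumed for illustration):
// keep the low 16 bits of each i32 lane and store them as u16.
fn store_u16(lanes: &[i32], dst: &mut [u16]) {
    for (d, &lane) in dst.iter_mut().zip(lanes) {
        *d = (lane as u32 & 0xffff) as u16; // truncating narrow, no saturation
    }
}

fn main() {
    let mut out = [0u16; 3];
    store_u16(&[0x1_2345, -1, 7], &mut out);
    assert_eq!(out, [0x2345, 0xffff, 7]); // high bits dropped, not clamped
    println!("store_u16 sketch ok");
}
```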
The method takes &mut [u16] (raw bits), so the name should match load_f16_bits for consistency.
The SIMD conversion functions were using chunks_exact(), which only processes complete SIMD vectors, leaving remainder elements unprocessed. This caused test failures when the row size wasn't divisible by the SIMD width (e.g., 244 pixels with an AVX2 width of 8).

Fix by adding scalar fallback loops to handle the remainder elements in both the 32-bit float passthrough and the 16-bit float conversion paths. Also use a const assert to verify the buffer-size assumption at compile time rather than at runtime.
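The bug and its fix follow the standard chunks_exact-plus-remainder pattern, sketched here with a plain loop standing in for the SIMD body (`WIDTH` and `double_all` are illustrative names, not the PR's code):

```rust
// chunks_exact() visits only complete WIDTH-sized chunks, so a scalar tail
// loop is required; omitting it is exactly the bug described above.
const WIDTH: usize = 8; // stand-in for the backend's SIMD lane count

fn double_all(src: &[f32], dst: &mut [f32]) {
    let mut in_chunks = src.chunks_exact(WIDTH);
    let mut out_chunks = dst.chunks_exact_mut(WIDTH);
    for (s, d) in (&mut in_chunks).zip(&mut out_chunks) {
        // Stand-in for the vectorized body (load, convert, store).
        for i in 0..WIDTH {
            d[i] = s[i] * 2.0;
        }
    }
    // Scalar fallback: without this, e.g. 244 % 8 == 4 trailing pixels
    // would be left unprocessed.
    for (s, d) in in_chunks.remainder().iter().zip(out_chunks.into_remainder()) {
        *d = s * 2.0;
    }
}

fn main() {
    let src: Vec<f32> = (0..11).map(|i| i as f32).collect();
    let mut dst = vec![0.0f32; 11];
    double_all(&src, &mut dst);
    assert_eq!(dst[10], 20.0); // the remainder element was processed
    println!("remainder sketch ok");
}
```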
Force-pushed from c673732 to ec6ae4b
Force-pushed from f2c096b to 2b41c51
veluca93 approved these changes Jan 21, 2026
SIMD fast paths for the int_to_float function, which converts custom bit-depth floats stored as i32 back to f32.
- 32-bit float: straightforward bitcast via SIMD.
- 16-bit float (f16): SIMD handles normal values, zeros, and inf/NaN. Subnormals fall back to scalar since they need a variable-iteration normalization loop.
Waiting for perf CI to see the impact.