Skip to content

Fix incorrect CPU float16 implementation#2395

Open
eliasachermann wants to merge 7 commits into
spcl:mainfrom
eliasachermann:fix/cpu-half-conversion
Open

Fix incorrect CPU float16 implementation#2395
eliasachermann wants to merge 7 commits into
spcl:mainfrom
eliasachermann:fix/cpu-half-conversion

Conversation

@eliasachermann

Copy link
Copy Markdown

Problem

The dace::half fallback in dace/runtime/include/dace/types.h is broken in two independent ways:

  1. No default constructor.
    Generated code that declares an uninitialized dace::float16 fails to compile:

    error: no matching function for call to 'dace::half::half()'
    
  2. Incorrect conversions.
    The float↔half conversion is wrong for zero, subnormals, Inf/NaN, and does no rounding.

Fix

  • Replace both conversions with IEEE-754 round-to-nearest-even routines, correctly handling zero, subnormals, overflow→Inf and Inf/NaN.
  • Add a default constructor.

Tests

Adds tests/half_cpu_test.py, to test the conversion against NumPy as the reference.

@eliasachermann eliasachermann changed the title correct CPU half (float16) conversion, add default constructor Fix incorrect CPU float16 implementation Jun 6, 2026
Comment thread tests/half_cpu_test.py Outdated
Remove print statement indicating all tests passed.
@ThrudPrimrose ThrudPrimrose self-requested a review June 8, 2026 16:18

@ThrudPrimrose ThrudPrimrose left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need subnormal, inf and NaN in roundtrip tests to ensure we handle them correctly.

Comment thread tests/half_cpu_test.py
1e-3,
6e-5,
6e-8,
-6e-8,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a subnormal number to your tests?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added inf and nan tests

@acalotoiu acalotoiu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ThrudPrimrose added a commit that referenced this pull request Jun 17, 2026
Merges PR #2395 (head e33d5b0) into
extended: replaces the buggy CPU half-precision conversion in
dace/runtime/include/dace/types.h with the round-to-nearest-even
implementation from the PR, and adds tests/half_cpu_test.py.

Conflict resolution: extended had clang-format-reformatted types.h, so
the PR's surrounding context conflicted. Kept extended's formatting and
swapped in only the corrected half struct (the PR's functional change).
half_cpu_test.py 2/2 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ahmadbelb

Copy link
Copy Markdown

@eliasachermann @ThrudPrimrose Quick follow up. On ARMv8.2-A targets built with the FP16 arithmetic extension (+fp16 / FEAT_FP16, which Apple Silicon has), _Float16 lowers to native fp16 instructions. So we could use it directly there and keep the software half as the fallback for x86 and anything without native fp16.

#elif (defined(__aarch64__) || defined(__arm64__)) && defined(__FLT16_MANT_DIG__)
    typedef std::complex<float> complex64;
    typedef std::complex<double> complex128;
    typedef _Float16 float16;   // native ARM hardware fp16
#else

@ThrudPrimrose Not sure if ARM is something you want to support here, but if it is I'm happy to open a
separate PR for this, or add it here if that's easier. What do you think?

@ThrudPrimrose

Copy link
Copy Markdown
Collaborator

@eliasachermann @ThrudPrimrose Quick follow up. On ARMv8.2-A targets built with the FP16 arithmetic extension (+fp16 / FEAT_FP16, which Apple Silicon has), _Float16 lowers to native fp16 instructions. So we could use it directly there and keep the software half as the fallback for x86 and anything without native fp16.

#elif (defined(__aarch64__) || defined(__arm64__)) && defined(__FLT16_MANT_DIG__)
    typedef std::complex<float> complex64;
    typedef std::complex<double> complex128;
    typedef _Float16 float16;   // native ARM hardware fp16
#else

@ThrudPrimrose Not sure if ARM is something you want to support here, but if it is I'm happy to open a separate PR for this, or add it here if that's easier. What do you think?

This would be a nice extension in my opinion.

Once this PR is merged, I would propose that you open a new PR on top of what has merged such that we have native fp16 support for corresponding ARM target. I also know that latest avx512 instruction sets also have native fp16, could be nice to also support them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants