feat(cudf): GPU Decimal (Part 4)#17806

Open
simoneves wants to merge 7 commits into facebookincubator:main from simoneves:simoneves/decimal_pr4

@simoneves simoneves commented Jun 11, 2026

Decimal expression code refactoring and tidying moved from Part 3.

@bdice's comments moved from #16751.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 11, 2026
netlify Bot commented Jun 11, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit d9ec8ac
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/6a2c7ee01c2afa0008cf0bbd

  }
  int sign = 1;
  if (numerator < 0) {
    numerator = -numerator;

@simoneves simoneves Jun 11, 2026

Per @bdice:
This is UB if numerator == INT128_MIN.
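
One way to sidestep that, sketched in host-side C++ and assuming the code only needs the magnitude of the operand: take the absolute value in unsigned arithmetic, where negation wraps modulo 2^128 instead of overflowing. The names `unsignedMagnitude`, `kInt128Max`, and `kInt128Min` are illustrative and do not come from the PR.

```cpp
#include <cassert>

// Largest and smallest __int128_t values, built without overflowing.
constexpr __int128_t kInt128Max =
    static_cast<__int128_t>(~static_cast<__uint128_t>(0) >> 1);
constexpr __int128_t kInt128Min = -kInt128Max - 1;

// Returns |value| as an unsigned 128-bit integer. Converting to unsigned and
// subtracting from zero wraps modulo 2^128, so this is well defined even for
// kInt128Min, whose magnitude (2^127) has no signed representation.
inline __uint128_t unsignedMagnitude(__int128_t value) {
  const __uint128_t bits = static_cast<__uint128_t>(value);
  return value < 0 ? (static_cast<__uint128_t>(0) - bits) : bits;
}
```

The caller would still track the sign separately, as the existing code does, and would have to decide how to handle a magnitude of 2^127, which does not fit back into `__int128_t`.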

  __int128_t scaled = numerator * scale;
  __int128_t quotient = scaled / denom;
  __int128_t remainder = scaled % denom;
  if (remainder * 2 >= denom) {

@simoneves simoneves Jun 11, 2026

Per @bdice:
This may help guard against overflow.
Suggested change:

- if (remainder * 2 >= denom) {
+ if (remainder >= denom - remainder) {
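
For `0 <= remainder < denom` the two conditions are equivalent, but the second never leaves the range `(0, denom]`, while `remainder * 2` can exceed the type's maximum. A small host-side sketch of the rearranged check follows. It uses `int64_t` as a stand-in for `__int128_t` so the overflowing case is easy to write down, and `atLeastHalf` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// True when remainder is at least half of denom, i.e. the mathematical
// condition remainder * 2 >= denom, evaluated without the multiplication.
// Requires 0 <= remainder < denom, so denom - remainder cannot overflow.
inline bool atLeastHalf(std::int64_t remainder, std::int64_t denom) {
  assert(denom > 0 && remainder >= 0 && remainder < denom);
  return remainder >= denom - remainder;
}
```

With `denom == INT64_MAX` and `remainder == INT64_MAX - 1`, the original form would overflow in `remainder * 2`, whereas the subtraction yields 1.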

Also per @bdice:
Just verifying: Is -1.5 supposed to round "half up" to -1 or "half away from zero" to -2? This does the latter. Please add a comment indicating the convention being followed.
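
To make the two conventions concrete, here is a host-side sketch of both, using `int64_t` for brevity. The function names are hypothetical, and the code assumes `denom != 0` and that neither input is `INT64_MIN`, so the negations cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Nearest integer to numerator / denom; ties go away from zero
// (1.5 -> 2, -1.5 -> -2). Works on magnitudes and reapplies the sign.
inline std::int64_t divideHalfAwayFromZero(std::int64_t numerator,
                                           std::int64_t denom) {
  assert(denom != 0);
  int sign = 1;
  if (numerator < 0) {
    numerator = -numerator;
    sign = -sign;
  }
  if (denom < 0) {
    denom = -denom;
    sign = -sign;
  }
  std::int64_t quotient = numerator / denom;
  const std::int64_t remainder = numerator % denom;
  if (remainder >= denom - remainder) {
    ++quotient;
  }
  return sign * quotient;
}

// Nearest integer to numerator / denom; ties go toward positive infinity
// ("half up": 1.5 -> 2, -1.5 -> -1). Uses floor division, then rounds.
inline std::int64_t divideHalfUp(std::int64_t numerator, std::int64_t denom) {
  assert(denom != 0);
  if (denom < 0) {
    numerator = -numerator;
    denom = -denom;
  }
  std::int64_t quotient = numerator / denom;   // truncates toward zero
  std::int64_t remainder = numerator % denom;  // has the sign of numerator
  if (remainder < 0) {                         // convert to floor division
    --quotient;
    remainder += denom;
  }
  if (remainder >= denom - remainder) {
    ++quotient;
  }
  return quotient;
}
```

The two functions agree everywhere except on negative ties: for -3 / 2 the first returns -2 and the second returns -1.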


template <typename OutT>
__device__ OutT
decimalDivideImpl(__int128_t numerator, __int128_t denom, __int128_t scale) {

Per @bdice:
The parameter named scale is actually something like 10 ^ scale. We should name it more clearly, since scale is typically the exponent and not the value ten-to-the-exponent.
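
A sketch of what the clearer naming could look like, in host-side C++. The identifiers `powerOfTen`, `scaleExponent`, `scaleFactor`, and `rescaleUp` are suggestions for illustration only, not names taken from the PR.

```cpp
#include <cassert>

// Returns 10^exponent as a 128-bit integer. 10^38 is the largest power of
// ten that fits in a signed 128-bit value.
inline __int128_t powerOfTen(int exponent) {
  assert(exponent >= 0 && exponent <= 38);
  __int128_t result = 1;
  for (int i = 0; i < exponent; ++i) {
    result *= 10;
  }
  return result;
}

// The caller passes the exponent ("scale" in the usual decimal sense); the
// multiplier derived from it gets its own, distinct name.
inline __int128_t rescaleUp(__int128_t value, int scaleExponent) {
  const __int128_t scaleFactor = powerOfTen(scaleExponent);
  return value * scaleFactor;  // not overflow-checked in this sketch
}
```

Keeping the exponent and the factor under separate names makes it harder to pass one where the other is expected.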

    denom = -denom;
    sign = -sign;
  }
  __int128_t scaled = numerator * scale;

Per @bdice:
This function should explicitly document that the rescaling has the potential for UB (overflows).
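
If documenting the hazard is not enough, the multiplication can be guarded with a division-based pre-check. The sketch below is host-side C++ and assumes the operands are already non-negative, as they are at this point in the function once the signs have been stripped. `rescaleWouldOverflow` and `kInt128Max` are hypothetical names.

```cpp
#include <cassert>

// Largest __int128_t value, built from the unsigned all-ones pattern.
constexpr __int128_t kInt128Max =
    static_cast<__int128_t>(~static_cast<__uint128_t>(0) >> 1);

// Returns true if value * factor would exceed kInt128Max.
// Requires value >= 0 and factor > 0. The check itself cannot overflow
// because it only divides.
inline bool rescaleWouldOverflow(__int128_t value, __int128_t factor) {
  assert(value >= 0 && factor > 0);
  return value > kInt128Max / factor;
}
```

A caller would test `rescaleWouldOverflow(numerator, factor)` before computing the product and take whatever error path the surrounding code prefers. The check costs one extra 128-bit division per row, which may or may not be acceptable in a kernel.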

  OutT* out;
  __int128_t scale;

  __device__ void operator()(int32_t idx) const {

Per @bdice:
As before, int32_t indicates a row index? Use cudf::size_type here and in the counting iterators if so.

    int32_t aRescale,
    rmm::cuda_stream_view stream) {
  if (inType == cudf::type_id::DECIMAL64) {
    if (outType == cudf::type_id::DECIMAL64) {

Per @bdice:
We could use cudf::double_type_dispatcher here (potentially with a custom type map that only accepts numeric::decimal64 / numeric::decimal128), then use cudf::device_storage_type_t to recover the underlying types. It sounds like a lot of logic but it's reusable across the repeating dispatches for column/scalar variants.

@simoneves simoneves Jun 12, 2026

Done, but without a custom type map. It's not clear to me how to use one with double_type_dispatcher, and I'm not sure it would make the code any cleaner: we'd still need to validate the type combinations even if the custom map admitted only certain types. I added CUDF_UNREACHABLE in the stub for the other type combinations instead.
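
For readers unfamiliar with the pattern, here is a standalone C++ sketch of double dispatch with a fallback for unsupported type pairs. It does not use cudf: `TypeId`, `Unsupported`, `doubleDispatch`, and `StorageBytes` are stand-ins invented for this example, and the `-1` return marks the spot where the PR uses `CUDF_UNREACHABLE`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in type ids: two decimal ids plus one other id to exercise the
// unsupported path. This is not the real cudf::type_id enum.
enum class TypeId { kDecimal64, kDecimal128, kInt32 };

// Tag passed to the functor for ids without a decimal storage type.
struct Unsupported {};

template <typename In, typename Out>
constexpr bool kSupportedPair =
    !std::is_same<In, Unsupported>::value &&
    !std::is_same<Out, Unsupported>::value;

// Second dispatch: In is already resolved, resolve Out and call the functor.
template <typename In, typename Functor>
int dispatchOut(TypeId out, Functor f) {
  switch (out) {
    case TypeId::kDecimal64:
      return f.template operator()<In, std::int64_t>();
    case TypeId::kDecimal128:
      return f.template operator()<In, __int128_t>();
    default:
      return f.template operator()<In, Unsupported>();
  }
}

// First dispatch: resolve In, then hand off to dispatchOut.
template <typename Functor>
int doubleDispatch(TypeId in, TypeId out, Functor f) {
  switch (in) {
    case TypeId::kDecimal64:
      return dispatchOut<std::int64_t>(out, f);
    case TypeId::kDecimal128:
      return dispatchOut<__int128_t>(out, f);
    default:
      return dispatchOut<Unsupported>(out, f);
  }
}

// Example functor: combined storage width in bytes for supported pairs.
struct StorageBytes {
  template <typename In, typename Out>
  std::enable_if_t<kSupportedPair<In, Out>, int> operator()() const {
    return static_cast<int>(sizeof(In) + sizeof(Out));
  }

  // Stub for every other combination.
  template <typename In, typename Out>
  std::enable_if_t<!kSupportedPair<In, Out>, int> operator()() const {
    return -1;
  }
};
```

The functor is written once and reused for every input/output pairing, which is the reuse benefit mentioned in the review comment. Validation of the pair lives in the functor's overloads rather than in nested `if` chains at each call site.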


target_compile_options(velox_cudf_expression PRIVATE -Wno-missing-field-initializers)

set_target_properties(velox_cudf_expression PROPERTIES CUDA_STANDARD 20 CUDA_STANDARD_REQUIRED ON)

Moved to root CMakeLists.txt (possibly contentious).

Comment thread CMakeLists.txt
enable_language(CUDA)
# Use same C++ standard throughout
set(CMAKE_CUDA_STANDARD ${CMAKE_CXX_STANDARD})
set(CMAKE_CUDA_STANDARD_REQUIRED ${CMAKE_CXX_STANDARD_REQUIRED})

Not sure why this was deemed necessary. It might have been a leftover from Part 2, although Part 2 obviously works just fine without it. Should it still be moved here (from the CMakeLists.txt in cudf/expression)? Will it break Wave, or at least offend those devs?

github-actions Bot commented Jun 11, 2026

Selective Build Plan

Linux release with adapters is running a full build (changes touch velox/experimental/ or velox/external/). See the CI workflows README for what this means.


@simoneves simoneves force-pushed the simoneves/decimal_pr4 branch 3 times, most recently from 65a3c66 to 95b71f3 Compare June 11, 2026 18:26
@github-actions github-actions Bot added the cudf cudf related - GPU acceleration label Jun 12, 2026
github-actions Bot commented Jun 12, 2026

CI Failure Analysis

Auto-generated by the CI Failure Analysis workflow. This comment is updated in place each time CI fails on a new commit, so it always reflects the latest run — re-pushing or re-running CI will refresh the analysis below. Last updated 2026-06-12 23:06:08 UTC from workflow run 27444049307.

❌ Linux release with adapters: BUILD Failure

Build errors:

Linker error when building velox_hive_connector_test — missing ORC reader factory symbols:

FAILED: velox/connectors/hive/tests/velox_hive_connector_test

ld: FileConnectorUtilTest.cpp.o: in function `FileConnectorUtilTest::TearDown()':
  undefined reference to `facebook::velox::orc::unregisterOrcReaderFactory()'

ld: FileConnectorUtilTest.cpp.o: in function `FileConnectorUtilTest::SetUp()':
  undefined reference to `facebook::velox::orc::registerOrcReaderFactory()'

ld: HiveConnectorUtilTest.cpp.o: in function `HiveConnectorUtilTest::TearDown()':
  undefined reference to `facebook::velox::orc::unregisterOrcReaderFactory()'

ld: HiveConnectorUtilTest.cpp.o: in function `HiveConnectorUtilTest::SetUp()':
  undefined reference to `facebook::velox::orc::registerOrcReaderFactory()'

collect2: error: ld returned 1 exit status

The velox_hive_connector_test binary links against test files (FileConnectorUtilTest.cpp, HiveConnectorUtilTest.cpp) that call registerOrcReaderFactory() / unregisterOrcReaderFactory(), but the ORC reader library (velox_dwio_orc_reader) is not included in the link dependencies.


Correlation with PR changes:

This failure is not caused by this PR. PR #17806 only modifies files under velox/experimental/cudf/expression/ (CUDA decimal kernels) and the top-level CMakeLists.txt (CUDA standard settings). It does not touch any Hive connector code, ORC reader code, or the Hive connector test CMakeLists.txt.

Known issues:

  • 🔗 Tracked by #17816 — "Link error for velox_hive_connector_test in Linux adapter CI"
  • This is a pre-existing failure on the main branch. The most recent main CI run (27445031257) also fails on the same job with the same error.
  • A fix was merged to main in commit 16d2e3b ("fix: Link ORC reader into Hive connector tests", #17817). Rebasing this PR onto the latest main should resolve this failure.

Recommended fix:

Rebase this PR branch onto the latest main to pick up the ORC reader link fix from commit 16d2e3b:

git fetch upstream && git rebase upstream/main
