[core] Fix silent data corruption due to overflow in FieldProductAgg#7906
[core] Fix silent data corruption due to overflow in FieldProductAgg#7906ArnavBalyan wants to merge 1 commit into
Conversation
|
cc @JingsongLi thanks! :) |
|
cc @leaves12138 @JingsongLi can you pls review thanks! |
JingsongLi
left a comment
There was a problem hiding this comment.
Review of FieldProductAgg overflow detection
The intent is good — silent integer overflow during aggregation is indeed a correctness bug that corrupts data. The agg() fix for INTEGER/BIGINT using Math.multiplyExact is clean. The toExactByte/toExactShort helpers are correct since Java promotes byte * byte and short * short to int before the multiplication, so the int result can be range-checked without itself overflowing.
Issues
1. retract() method has the same overflow bug but is not fixed
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/aggregate/FieldProductAgg.java
The retract method performs division, which can also overflow silently in the same cases:
(byte) ((byte) Byte.MIN_VALUE / (byte) -1)→ result is 128, which wraps to -128 when cast to byte(short) ((short) Short.MIN_VALUE / (short) -1)→ result is 32768, wraps to -32768(int) Integer.MIN_VALUE / (int) -1→ wraps toInteger.MIN_VALUEin Java(long) Long.MIN_VALUE / (long) -1→ wraps toLong.MIN_VALUEin Java
If the goal of this PR is to eliminate silent data corruption from overflow, the retract() path should get the same treatment for consistency. At minimum, the TINYINT and SMALLINT cases need the same range-check pattern since the int promotion makes the overflow detectable. For INTEGER and BIGINT, you would need explicit checks (if (accumulator == Integer.MIN_VALUE && inputField == -1) etc.) since Java's / operator does not throw.
2. Exception message could include operand values for debuggability
When a compaction task fails with ArithmeticException("byte overflow"), the operator has no idea which values caused it. Consider including the operands:
throw new ArithmeticException(
String.format("byte overflow: %d * %d = %d", (byte) accumulator, (byte) inputField, value));This is especially important because this exception will surface during compaction, potentially long after the data was written, making reproduction difficult.
3. Consider the failure mode during compaction
An unchecked ArithmeticException thrown during compaction will fail the compaction task. Depending on the retry/error-handling policy, this could block compaction indefinitely for a partition that has overflowing values. This is arguably better than silent corruption, but it would be worth documenting this behavioral change (e.g., in the PR description or release notes) so users understand that previously-silent overflows will now cause task failures.
Minor notes
- The test cases are well chosen (boundary values, both positive and negative overflow). Good coverage.
- FLOAT/DOUBLE overflow to infinity is not addressed, which is fine — that is standard IEEE 754 semantics and expected for floating-point types.
FieldSumAgghas the same class of overflow bugs, but that is out of scope for this PR.
Overall this is a useful correctness fix. The main request is to apply the same protection to retract() for completeness, since a half-fix leaves an inconsistency where agg() detects overflow but retract() on the same type silently corrupts.
53057d1 to
0885945
Compare
|
Thanks for the review! Have addressed all comments:
|
Purpose
Tests