Skip to content

fix(subq/cache): bypass subquery cache per event#35335

Open
stephenkgu wants to merge 36 commits into
mainfrom
fix/6984935627
Open

fix(subq/cache): bypass subquery cache per event#35335
stephenkgu wants to merge 36 commits into
mainfrom
fix/6984935627

Conversation

@stephenkgu
Copy link
Copy Markdown
Contributor

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings May 14, 2026 07:17
@stephenkgu stephenkgu requested review from a team and dapan1121 as code owners May 14, 2026 07:17
gemini-code-assist[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to make stream scalar subqueries re-evaluate per trigger event instead of reusing cached remote subquery results.

Changes:

  • Propagates stream-mode state through scalar/executor context.
  • Keeps REMOTE_VALUE nodes dispatchable in stream mode and bypasses remote-node cache.
  • Adds stream fetch worker initialization and a CI test entry.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/ci/cases.task Adds a new stream subquery per-event pytest case to CI.
include/libs/scalar/scalar.h Adds isStream to scalar extra info.
source/libs/scalar/inc/sclInt.h Adds isStream to scalar runtime context.
source/libs/scalar/src/scalar.c Handles REMOTE_VALUE as a value while preserving remote-node dispatch and propagates stream state.
source/libs/executor/src/executor.c Initializes thread-local scalar extra info with stream state.
source/libs/executor/src/executil.c Changes stream remote fetch/cache behavior and primary time-range handling for remote subquery conditions.
source/dnode/vnode/src/vnd/vnodeStream.c Initializes scalar extra info before vnode stream fetch execution.
source/dnode/mgmt/mgmt_mnode/src/mmWorker.c Initializes scalar extra info before mnode stream fetch execution.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/ci/cases.task
Comment on lines +5271 to +5272
if (ctx->isStream) {
TAOS_CHECK_EXIT(fetchRemoteNodeImpl(ctx, subQIdx, pRes));
Comment on lines +4514 to 4517
// First-call empty result: keep node type as REMOTE_VALUE so the
// scalar walker re-dispatches sclWalkRemoteValue per evaluation.
pRemote->val.isNull = true;
pRemote->val.translate = true;
ctx.stream.pStreamRuntimeFuncInfo = pExtra->pStreamInfo;
ctx.stream.streamTsRange = pExtra->pStreamRange;
ctx.pSubJobCtx = pExtra->pSubJobCtx;
ctx.isStream = pExtra->isStream;
Comment thread source/libs/executor/src/executil.c
Copilot AI review requested due to automatic review settings May 14, 2026 09:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

Comment on lines +4514 to 4518
// First-call empty result: keep node type as REMOTE_VALUE so the
// scalar walker re-dispatches sclWalkRemoteValue per evaluation.
pRemote->val.isNull = true;
pRemote->val.translate = true;
pRemote->val.flag &= (~VALUE_FLAG_VAL_UNSET);
Comment on lines +5271 to +5279
if (ctx->isStream) {
// Clear the slot before refetching so that an empty result from this
// event takes handleRemoteValueRes's first-call branch (which marks
// the placeholder NULL) instead of the "EOF after data" branch
// (which would silently retain the previous event's value).
if (ppRes) {
*ppRes = NULL;
}
TAOS_CHECK_EXIT(fetchRemoteNodeImpl(ctx, subQIdx, pRes));
Comment thread test/cases/18-StreamProcessing/02-Stream/test_stream_subquery_per_event.py Outdated
Comment thread test/cases/18-StreamProcessing/02-Stream/test_stream_subquery_per_event.py Outdated
if (ppRes) {
*ppRes = NULL;
}
TAOS_CHECK_EXIT(fetchRemoteNodeImpl(ctx, subQIdx, pRes));
Comment thread source/libs/executor/src/executil.c Outdated
Comment on lines +225 to +228
tdSql.execute("delete from inicio_descarga")
tdSql.execute(
"insert into linea_descarga values ('2026-05-01 00:00:03', 1)"
)
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 14, 2026 09:25
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Comment on lines +5273 to +5281
if (ctx->isStream) {
// Clear the slot before refetching so that an empty result from this
// event takes handleRemoteValueRes's first-call branch (which marks
// the placeholder NULL) instead of the "EOF after data" branch
// (which would silently retain the previous event's value).
if (ppRes) {
*ppRes = NULL;
}
TAOS_CHECK_EXIT(fetchRemoteNodeImpl(ctx, subQIdx, pRes));
Comment on lines +5273 to +5281
if (ctx->isStream) {
// Clear the slot before refetching so that an empty result from this
// event takes handleRemoteValueRes's first-call branch (which marks
// the placeholder NULL) instead of the "EOF after data" branch
// (which would silently retain the previous event's value).
if (ppRes) {
*ppRes = NULL;
}
TAOS_CHECK_EXIT(fetchRemoteNodeImpl(ctx, subQIdx, pRes));
Comment thread test/cases/18-StreamProcessing/02-Stream/test_stream_subquery_per_event.py Outdated
Comment thread test/cases/18-StreamProcessing/02-Stream/test_stream_subquery_per_event.py Outdated
Comment on lines +5273 to +5281
if (ctx->isStream) {
// Clear the slot before refetching so that an empty result from this
// event takes handleRemoteValueRes's first-call branch (which marks
// the placeholder NULL) instead of the "EOF after data" branch
// (which would silently retain the previous event's value).
if (ppRes) {
*ppRes = NULL;
}
TAOS_CHECK_EXIT(fetchRemoteNodeImpl(ctx, subQIdx, pRes));
Comment on lines +5273 to +5281
if (ctx->isStream) {
// Clear the slot before refetching so that an empty result from this
// event takes handleRemoteValueRes's first-call branch (which marks
// the placeholder NULL) instead of the "EOF after data" branch
// (which would silently retain the previous event's value).
if (ppRes) {
*ppRes = NULL;
}
TAOS_CHECK_EXIT(fetchRemoteNodeImpl(ctx, subQIdx, pRes));
ctx.stream.pStreamRuntimeFuncInfo = pExtra->pStreamInfo;
ctx.stream.streamTsRange = pExtra->pStreamRange;
ctx.pSubJobCtx = pExtra->pSubJobCtx;
ctx.isStream = pExtra->isStream;
Comment thread test/cases/18-StreamProcessing/02-Stream/test_stream_subquery_per_event.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 14, 2026 09:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

// (which would silently retain the previous event's value).
if (ppRes) {
*ppRes = NULL;
}
Comment on lines +4514 to 4518
// First-call empty result: keep node type as REMOTE_VALUE so the
// scalar walker re-dispatches sclWalkRemoteValue per evaluation.
pRemote->val.isNull = true;
pRemote->val.translate = true;
pRemote->val.flag &= (~VALUE_FLAG_VAL_UNSET);
Comment thread test/cases/18-StreamProcessing/02-Stream/test_stream_subquery_per_event.py Outdated
Comment on lines +5273 to +5281
if (ctx->isStream) {
// Clear the slot before refetching so that an empty result from this
// event takes handleRemoteValueRes's first-call branch (which marks
// the placeholder NULL) instead of the "EOF after data" branch
// (which would silently retain the previous event's value).
if (ppRes) {
*ppRes = NULL;
}
TAOS_CHECK_EXIT(fetchRemoteNodeImpl(ctx, subQIdx, pRes));
Comment thread test/cases/18-StreamProcessing/02-Stream/test_stream_subquery_per_event.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 14, 2026 09:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Comment thread test/cases/18-StreamProcessing/02-Stream/test_stream_subquery_per_event.py Outdated
Comment thread source/libs/executor/src/executil.c
Comment thread source/libs/executor/src/executil.c
Comment thread include/libs/scalar/scalar.h
Comment thread source/libs/scalar/inc/sclInt.h
Comment on lines +4538 to +4542
// Completion sentinel after a data row: do NOT mutate the AST node
// type to QUERY_NODE_VALUE. In stream mode the node must remain
// REMOTE_VALUE so the next per-event walker pass re-fires the fetch
// (otherwise every subsequent trigger event replays the literal
// captured on the first event).
…Null in setValueFromResBlock

- setValueFromResBlock: clear pRes->isNull so a NULL placeholder from an
  earlier empty stream-event subquery does not mask a later non-NULL value.
- sclInitParam REMOTE_VALUE_LIST: in stream mode free cached hash filter
  and re-arm VALUELIST_FLAG_VAL_UNSET so each trigger re-evaluates the IN
  list.
- sclInitParam REMOTE_ROW: in stream mode force valSet=false so each
  trigger re-fetches the row subquery (e.g. > ANY rewrites).
- Tests cover IN-list, ROW, and the empty->non-empty WHERE transition.
The previous tdSql.checkResultsByFunc call could return on the pre-event
3-row snapshot, missing a stale (1,1) row emitted seconds later. Replace
with a bounded poll that waits for either rows>=4 or a 60s deadline,
then asserts the stale value is absent. Verified the test still catches
the regression by reverting the qFetchRemoteNode slot-clear.
Previously when a stream WHERE cond contained a remote subquery, all
primary-key range pruning was disabled.  Now still call filterGetTimeRange
to extract independent literal bounds (e.g. `ts >= '2026-05-01' AND ts <
'2026-05-02' AND ts >= (SELECT ...)`) for scan pruning while keeping
the residual remote predicate; isStrict=false ensures the cond stays in
pConditions for per-event re-evaluation.

This comment was marked as outdated.

Copilot AI review requested due to automatic review settings May 15, 2026 07:23
stephenkgu and others added 4 commits May 15, 2026 00:24
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
unreachable; non-stream EOF is handled below at the post-return path.
Drop the dead conditional and clarify the comment.

This comment was marked as outdated.

Streams call this per trigger event; for VAR/DECIMAL types datum.p
owns a heap buffer (ownership transferred from the response block via
nodesSetValueNodeValueExt with needFree=false). Without freeing the
previous buffer first, every event leaks one allocation in stream
REMOTE_VALUE / REMOTE_ROW subqueries with string/binary results.
filterGetTimeRange aborts collection when any AND-child is uncollectable
(e.g. ts >= remote-subquery), so passing the original cond returned a
full window and disabled scan pruning even when independent literal
bounds were present.  Clone the cond, drop remote-bearing children from
the AND, then call filterGetTimeRange so literal bounds in
`ts >= '2026-05-01' AND ts < '2026-05-02' AND ts >= (SELECT ...)` are
extracted; isStrict stays false so the residual remote predicate is
evaluated per event.
gemini-code-assist[bot]

This comment was marked as outdated.

Copilot AI review requested due to automatic review settings May 15, 2026 08:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread test/cases/18-StreamProcessing/02-Stream/test_stream_subquery_per_event.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 15, 2026 08:36

This comment was marked as outdated.

stephenkgu and others added 3 commits May 15, 2026 01:52
initQueryTableDataCond seeds pCond->twindows from scanRange/winRange
before calling getPrimaryTimeRange.  The previous stream branch reset
that to TSWINDOW_INITIALIZER (full scan) when no static bound could be
extracted, widening the scan past the trigger window.  Save the incoming
window, intersect any extracted literal range with it, and fall back to
the original on extraction failure.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 15, 2026 08:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment on lines +618 to +626
deadline = time.monotonic() + 1.0
while True:
tdSql.checkResultsByFunc(
sql=f"select total from {self.db}.r order by ts",
func=expected_rows,
)
if time.monotonic() >= deadline:
break
time.sleep(0.1)
Event 3 empties the whitelist.  check3 ran without waiting, so insert4
could re-populate the whitelist before the stream evaluated event 3's
subquery.  The stream then saw id=1 and produced a stale SUM=30 row.

Fix: replace the two conflicting check3 methods with one that polls for
15 s while asserting rows stays at 2.  This both detects a stale row
immediately and ensures event 3 is processed with an empty whitelist
before insert4 runs.  Simplify check4 accordingly: it now just waits
for the correct (10, 30, 10) sequence.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants