dmchen/try-long-startup-in-2-replica#35307
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses a recovery issue in dual-replica vgroups where the system prematurely exits the ASSIGNED_LEADER state before the follower has caught up, causing write stalls. The changes introduce a lag threshold (syncLogLagThreshold) and progress logging to ensure followers are sufficiently synced before returning to normal leader mode. Feedback identifies a critical regression where the new lag check incorrectly affects 3-replica groups, potentially causing them to report as unsynced if one follower is slow. Additionally, the catchup logging logic needs refinement to prevent log flooding in high-throughput scenarios caused by frequent lag updates.
| bool isSync = false; | ||
| int64_t maxLag = 0; | ||
| SyncIndex minMatchIndex = SYNC_INDEX_INVALID; | ||
| bool snapshotSending = syncNodeSnapshotSending(pSyncNode); | ||
|
|
||
| bool leaderBaseReady = pSyncNode->commitIndex >= pSyncNode->assignedCommitIndex; | ||
| if (leaderBaseReady) { | ||
| if (pSyncNode->peersNum <= 0) { | ||
| isSync = true; | ||
| } else { | ||
| maxLag = syncNodeGetMaxFollowerLag(pSyncNode, &minMatchIndex); | ||
| isSync = (maxLag <= syncNodeGetLagThreshold()); | ||
| if (snapshotSending && maxLag > syncNodeGetLagThreshold()) { | ||
| isSync = false; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The current implementation applies the lag threshold check to all states, including the normal LEADER state. This introduces a regression for 3-replica groups: if one follower is down or slow, syncCheckSynced will now return NOT_SYNCED, whereas previously it would return SUCCESS as long as the leader was healthy. This violates the assumption that 3-replica behavior remains unchanged. The lag check should be restricted to the ASSIGNED_LEADER state to ensure it only affects the recovery transition of 2-replica groups. Additionally, the explicit check for snapshotSending at line 865 is redundant because the maxLag check already covers that condition.
bool isSync = false;
int64_t maxLag = 0;
SyncIndex minMatchIndex = SYNC_INDEX_INVALID;
bool snapshotSending = syncNodeSnapshotSending(pSyncNode);
bool leaderBaseReady = pSyncNode->commitIndex >= pSyncNode->assignedCommitIndex;
if (leaderBaseReady) {
if (pSyncNode->peersNum > 0) {
maxLag = syncNodeGetMaxFollowerLag(pSyncNode, &minMatchIndex);
if (pSyncNode->state == TAOS_SYNC_STATE_ASSIGNED_LEADER) {
isSync = (maxLag <= syncNodeGetLagThreshold());
} else {
isSync = true;
}
} else {
isSync = true;
}
}| if (pSyncNode->catchupLastLag != maxLag) { | ||
| shouldLog = true; | ||
| } |
There was a problem hiding this comment.
The condition pSyncNode->catchupLastLag != maxLag will cause log flooding in high-throughput scenarios. Since the maxLag (difference between commitIndex and matchIndex) changes with almost every write operation, this condition effectively bypasses the intended throttling provided by syncCatchupLogIntervalMs. This could lead to excessive disk I/O and log file growth.
if (pSyncNode->catchupLastLogMs == 0 || nowMs - pSyncNode->catchupLastLogMs >= intervalMs) {
shouldLog = true;
}
if (isSync && pSyncNode->catchupLastLag > syncNodeGetLagThreshold()) {
shouldLog = true;
}There was a problem hiding this comment.
Pull request overview
This PR addresses long write unavailability during dual-replica recovery by refining “synced” determination to account for follower catch-up lag, adding catch-up diagnostics/log throttling, and introducing config knobs for tuning.
Changes:
- Update
syncCheckSynced()to require follower lag within a configurable threshold (with snapshot-aware logic) and emit periodic catch-up progress logs. - Allow
ASSIGNED_LEADERto pass leader-state checks for arbiter sync validation paths. - Add global server configs
syncLogLagThresholdandsyncCatchupLogIntervalMs, plus feature design/spec documentation.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| TD-xxx-try-long-startup-in-2-replica | Adds a one-line task/issue note at repo root |
| source/libs/sync/src/syncMain.c | Implements lag-threshold-based sync check + catch-up progress logging |
| source/libs/sync/inc/syncInt.h | Adds per-node fields for catch-up log throttling/diagnostics |
| source/dnode/vnode/src/vnd/vnodeSvr.c | Treats ASSIGNED_LEADER as acceptable for arbiter sync-check handling |
| source/dnode/mnode/impl/src/mndArbGroup.c | Adjusts assigned-leader handling when member token changes via heartbeat |
| source/common/src/tglobal.c | Introduces and wires new server config items for lag threshold and log interval |
| include/common/tglobal.h | Declares new global config variables |
| docs/specs/001-fix-dual-replica-sync/tasks.md | Adds task breakdown for the feature |
| docs/specs/001-fix-dual-replica-sync/spec.md | Adds feature specification |
| docs/specs/001-fix-dual-replica-sync/research.md | Adds research/decision record |
| docs/specs/001-fix-dual-replica-sync/quickstart.md | Adds validation quickstart |
| docs/specs/001-fix-dual-replica-sync/plan.md | Adds implementation plan |
| docs/specs/001-fix-dual-replica-sync/data-model.md | Adds data model notes |
| docs/specs/001-fix-dual-replica-sync/contracts/internal-sync-check-contract.md | Adds internal behavior contract |
| docs/specs/001-fix-dual-replica-sync/CLARIFICATION_REPORT.md | Adds clarification workflow report |
| docs/specs/001-fix-dual-replica-sync/checklists/requirements.md | Adds requirements checklist |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| mndArbGroupResetAssignedLeader(pNewGroup); | ||
| resetAssigned = true; | ||
| tstrncpy(pNewGroup->assignedLeader.token, pRspMember->memberToken, TSDB_ARB_TOKEN_SIZE); | ||
| pNewGroup->assignedLeader.assignAcked = true; |
| @@ -0,0 +1 @@ | |||
| /speckit.specify 分析项目的代码,理清现有双副本实现。目前双副本实现有个问题,就是当其中一个副本宕掉,vgroup变成单副本运行,这时写入大量数据,然后宕掉的副本起来,vgroup又变回双副本,这时vgroup会长时间无法进入到正常状态,估计是刚起来的副本在追数据。请分析这个问题,并且修复这个问题。 No newline at end of file | |||
| ```bash | ||
| cd /root/github/taosdata/TDinternal | ||
| cmake -S . -B build | ||
| cmake --build build -j | ||
| ``` |
| - [ ] T001 校验构建与测试基线于 `/root/github/taosdata/TDinternal/build`(`cmake --build` 与 `ctest --output-on-failure`) | ||
| - [X] T002 创建 sync 单测目录与初始化文件于 `/root/github/taosdata/TDinternal/community/tests/pytest/sync/` | ||
| - [X] T003 [P] 创建双副本系统测试目录于 `/root/github/taosdata/TDinternal/community/tests/system-test/2-query/dual_replica/` | ||
| - [X] T004 [P] 对齐 quickstart 执行入口说明于 `/root/github/taosdata/TDinternal/specs/001-fix-dual-replica-sync/quickstart.md` | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 2: Foundational (Blocking Prerequisites) | ||
|
|
||
| **Purpose**: 建立所有用户故事共享的判定/配置/诊断基础能力 | ||
|
|
||
| **⚠️ CRITICAL**: 本阶段完成前,不进入任何用户故事实现 | ||
|
|
||
| - [X] T005 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/src/syncMain.c` 增加 follower lag 统一计算辅助逻辑(leaderCommitIndex、matchIndex、lag) | ||
| - [X] T006 [P] 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/src/syncMain.c` 增加 `syncLogLagThreshold` 默认值、读取与非法值回退 | ||
| - [X] T007 [P] 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/src/syncMain.c` 增加 `syncCatchupLogIntervalMs` 默认值、读取与非法值回退 | ||
| - [X] T008 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/inc/syncInt.h` 增加 catchup 观测字段声明(进度日志节流与诊断复用) | ||
| - [X] T009 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/src/syncMain.c` 增加 CHECK_SYNC 诊断输出结构(lag/threshold/snapshotActive) |
| ### Source Code (repository root) | ||
|
|
||
| ```text | ||
| community/source/libs/sync/ | ||
| ├── inc/ | ||
| └── src/ | ||
| ├── syncMain.c | ||
| ├── syncPipeline.c | ||
| ├── syncReplication.c | ||
| ├── syncCommit.c | ||
| └── syncSnapshot.c | ||
|
|
||
| community/source/dnode/mnode/impl/src/ | ||
| └── mndArbGroup.c | ||
|
|
||
| community/source/dnode/vnode/src/vnd/ | ||
| └── vnodeSvr.c | ||
|
|
||
| community/tests/ | ||
| ├── system-test/ | ||
| └── pytest/ | ||
| ``` |
| @@ -772,19 +844,34 @@ int32_t syncCheckSynced(int64_t rid) { | |||
| TAOS_RETURN(code); | |||
| } | |||
|
|
|||
| if (pSyncNode->state != TAOS_SYNC_STATE_LEADER) { | |||
| if (pSyncNode->state != TAOS_SYNC_STATE_LEADER && pSyncNode->state != TAOS_SYNC_STATE_ASSIGNED_LEADER) { | |||
| code = TSDB_CODE_SYN_NOT_LEADER; | |||
| syncNodeRelease(pSyncNode); | |||
| TAOS_RETURN(code); | |||
| } | |||
|
|
|||
| bool isSync = pSyncNode->commitIndex >= pSyncNode->assignedCommitIndex; | |||
| code = (isSync ? TSDB_CODE_SUCCESS : TSDB_CODE_VND_ARB_NOT_SYNCED); | |||
| if (!isSync) { | |||
| sInfo("vgId:%d, not synced, assignedCommitIndex:%" PRId64 ", commitIndex:%" PRId64, pSyncNode->vgId, | |||
| pSyncNode->assignedCommitIndex, pSyncNode->commitIndex); | |||
| bool isSync = false; | |||
| int64_t maxLag = 0; | |||
| SyncIndex minMatchIndex = SYNC_INDEX_INVALID; | |||
| bool snapshotSending = syncNodeSnapshotSending(pSyncNode); | |||
|
|
|||
| bool leaderBaseReady = pSyncNode->commitIndex >= pSyncNode->assignedCommitIndex; | |||
| if (leaderBaseReady) { | |||
| if (pSyncNode->peersNum <= 0) { | |||
| isSync = true; | |||
| } else { | |||
| maxLag = syncNodeGetMaxFollowerLag(pSyncNode, &minMatchIndex); | |||
| isSync = (maxLag <= syncNodeGetLagThreshold()); | |||
| if (snapshotSending && maxLag > syncNodeGetLagThreshold()) { | |||
| isSync = false; | |||
| } | |||
| } | |||
| } | |||
|
|
|||
| syncNodeLogCatchupProgress(pSyncNode, isSync, maxLag, minMatchIndex, snapshotSending); | |||
|
|
|||
| code = (isSync ? TSDB_CODE_SUCCESS : TSDB_CODE_VND_ARB_NOT_SYNCED); | |||
|
|
|||
| | `syncCheckSynced()` | `community/source/libs/sync/src/syncMain.c` | 判断 vgroup 是否已同步 | | ||
| | `mndArbCheckSync()` | `community/source/dnode/mnode/impl/src/mndArbGroup.c` | Arbiter 定期检查 isSync | | ||
| | `ASSIGNED_LEADER` 状态 | `community/source/libs/sync/src/syncMain.c` | 单副本降级运行模式 | | ||
| | `SSyncLogReplMgr` | `community/source/libs/sync/inc/syncInt.h` | 控制 leader → follower 日志复制进度 | | ||
| | `snapshotSender` | `community/source/libs/sync/src/syncSnapshot.c` | 快照发送,用于追落太远的副本 | |
Description
Issue(s)
Checklist
Please check the items in the checklist if applicable.