Skip to content

dmchen/try-long-startup-in-2-replica#35307

Open
cadem wants to merge 1 commit into
3.0from
dmchen/try-long-startup-in-2-replica
Open

dmchen/try-long-startup-in-2-replica#35307
cadem wants to merge 1 commit into
3.0from
dmchen/try-long-startup-in-2-replica

Conversation

@cadem
Copy link
Copy Markdown
Contributor

@cadem cadem commented May 11, 2026

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings May 11, 2026 02:06
@cadem cadem requested review from a team, dapan1121, guanshengliang and zitsen as code owners May 11, 2026 02:06
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a recovery issue in dual-replica vgroups where the system prematurely exits the ASSIGNED_LEADER state before the follower has caught up, causing write stalls. The changes introduce a lag threshold (syncLogLagThreshold) and progress logging to ensure followers are sufficiently synced before returning to normal leader mode. Feedback identifies a critical regression where the new lag check incorrectly affects 3-replica groups, potentially causing them to report as unsynced if one follower is slow. Additionally, the catchup logging logic needs refinement to prevent log flooding in high-throughput scenarios caused by frequent lag updates.

Comment on lines +853 to 869
bool isSync = false;
int64_t maxLag = 0;
SyncIndex minMatchIndex = SYNC_INDEX_INVALID;
bool snapshotSending = syncNodeSnapshotSending(pSyncNode);

bool leaderBaseReady = pSyncNode->commitIndex >= pSyncNode->assignedCommitIndex;
if (leaderBaseReady) {
if (pSyncNode->peersNum <= 0) {
isSync = true;
} else {
maxLag = syncNodeGetMaxFollowerLag(pSyncNode, &minMatchIndex);
isSync = (maxLag <= syncNodeGetLagThreshold());
if (snapshotSending && maxLag > syncNodeGetLagThreshold()) {
isSync = false;
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation applies the lag threshold check to all states, including the normal LEADER state. This introduces a regression for 3-replica groups: if one follower is down or slow, syncCheckSynced will now return NOT_SYNCED, whereas previously it would return SUCCESS as long as the leader was healthy. This violates the assumption that 3-replica behavior remains unchanged. The lag check should be restricted to the ASSIGNED_LEADER state to ensure it only affects the recovery transition of 2-replica groups. Additionally, the explicit check for snapshotSending at line 865 is redundant because the maxLag check already covers that condition.

  bool      isSync = false;
  int64_t   maxLag = 0;
  SyncIndex minMatchIndex = SYNC_INDEX_INVALID;
  bool      snapshotSending = syncNodeSnapshotSending(pSyncNode);

  bool leaderBaseReady = pSyncNode->commitIndex >= pSyncNode->assignedCommitIndex;
  if (leaderBaseReady) {
    if (pSyncNode->peersNum > 0) {
      maxLag = syncNodeGetMaxFollowerLag(pSyncNode, &minMatchIndex);
      if (pSyncNode->state == TAOS_SYNC_STATE_ASSIGNED_LEADER) {
        isSync = (maxLag <= syncNodeGetLagThreshold());
      } else {
        isSync = true;
      }
    } else {
      isSync = true;
    }
  }

Comment on lines +811 to +813
if (pSyncNode->catchupLastLag != maxLag) {
shouldLog = true;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The condition pSyncNode->catchupLastLag != maxLag will cause log flooding in high-throughput scenarios. Since the maxLag (difference between commitIndex and matchIndex) changes with almost every write operation, this condition effectively bypasses the intended throttling provided by syncCatchupLogIntervalMs. This could lead to excessive disk I/O and log file growth.

  if (pSyncNode->catchupLastLogMs == 0 || nowMs - pSyncNode->catchupLastLogMs >= intervalMs) {
    shouldLog = true;
  }

  if (isSync && pSyncNode->catchupLastLag > syncNodeGetLagThreshold()) {
    shouldLog = true;
  }

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses long write unavailability during dual-replica recovery by refining “synced” determination to account for follower catch-up lag, adding catch-up diagnostics/log throttling, and introducing config knobs for tuning.

Changes:

  • Update syncCheckSynced() to require follower lag within a configurable threshold (with snapshot-aware logic) and emit periodic catch-up progress logs.
  • Allow ASSIGNED_LEADER to pass leader-state checks for arbiter sync validation paths.
  • Add global server configs syncLogLagThreshold and syncCatchupLogIntervalMs, plus feature design/spec documentation.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
TD-xxx-try-long-startup-in-2-replica Adds a one-line task/issue note at repo root
source/libs/sync/src/syncMain.c Implements lag-threshold-based sync check + catch-up progress logging
source/libs/sync/inc/syncInt.h Adds per-node fields for catch-up log throttling/diagnostics
source/dnode/vnode/src/vnd/vnodeSvr.c Treats ASSIGNED_LEADER as acceptable for arbiter sync-check handling
source/dnode/mnode/impl/src/mndArbGroup.c Adjusts assigned-leader handling when member token changes via heartbeat
source/common/src/tglobal.c Introduces and wires new server config items for lag threshold and log interval
include/common/tglobal.h Declares new global config variables
docs/specs/001-fix-dual-replica-sync/tasks.md Adds task breakdown for the feature
docs/specs/001-fix-dual-replica-sync/spec.md Adds feature specification
docs/specs/001-fix-dual-replica-sync/research.md Adds research/decision record
docs/specs/001-fix-dual-replica-sync/quickstart.md Adds validation quickstart
docs/specs/001-fix-dual-replica-sync/plan.md Adds implementation plan
docs/specs/001-fix-dual-replica-sync/data-model.md Adds data model notes
docs/specs/001-fix-dual-replica-sync/contracts/internal-sync-check-contract.md Adds internal behavior contract
docs/specs/001-fix-dual-replica-sync/CLARIFICATION_REPORT.md Adds clarification workflow report
docs/specs/001-fix-dual-replica-sync/checklists/requirements.md Adds requirements checklist

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mndArbGroupResetAssignedLeader(pNewGroup);
resetAssigned = true;
tstrncpy(pNewGroup->assignedLeader.token, pRspMember->memberToken, TSDB_ARB_TOKEN_SIZE);
pNewGroup->assignedLeader.assignAcked = true;
@@ -0,0 +1 @@
/speckit.specify 分析项目的代码,理清现有双副本实现。目前双副本实现有个问题,就是当其中一个副本宕掉,vgroup变成单副本运行,这时写入大量数据,然后宕掉的副本起来,vgroup又变回双副本,这时vgroup会长时间无法进入到正常状态,估计是刚起来的副本在追数据。请分析这个问题,并且修复这个问题。 No newline at end of file
Comment on lines +4 to +8
```bash
cd /root/github/taosdata/TDinternal
cmake -S . -B build
cmake --build build -j
```
Comment on lines +20 to +37
- [ ] T001 校验构建与测试基线于 `/root/github/taosdata/TDinternal/build`(`cmake --build` 与 `ctest --output-on-failure`)
- [X] T002 创建 sync 单测目录与初始化文件于 `/root/github/taosdata/TDinternal/community/tests/pytest/sync/`
- [X] T003 [P] 创建双副本系统测试目录于 `/root/github/taosdata/TDinternal/community/tests/system-test/2-query/dual_replica/`
- [X] T004 [P] 对齐 quickstart 执行入口说明于 `/root/github/taosdata/TDinternal/specs/001-fix-dual-replica-sync/quickstart.md`

---

## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: 建立所有用户故事共享的判定/配置/诊断基础能力

**⚠️ CRITICAL**: 本阶段完成前,不进入任何用户故事实现

- [X] T005 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/src/syncMain.c` 增加 follower lag 统一计算辅助逻辑(leaderCommitIndex、matchIndex、lag)
- [X] T006 [P] 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/src/syncMain.c` 增加 `syncLogLagThreshold` 默认值、读取与非法值回退
- [X] T007 [P] 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/src/syncMain.c` 增加 `syncCatchupLogIntervalMs` 默认值、读取与非法值回退
- [X] T008 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/inc/syncInt.h` 增加 catchup 观测字段声明(进度日志节流与诊断复用)
- [X] T009 在 `/root/github/taosdata/TDinternal/community/source/libs/sync/src/syncMain.c` 增加 CHECK_SYNC 诊断输出结构(lag/threshold/snapshotActive)
Comment on lines +63 to +84
### Source Code (repository root)

```text
community/source/libs/sync/
├── inc/
└── src/
├── syncMain.c
├── syncPipeline.c
├── syncReplication.c
├── syncCommit.c
└── syncSnapshot.c

community/source/dnode/mnode/impl/src/
└── mndArbGroup.c

community/source/dnode/vnode/src/vnd/
└── vnodeSvr.c

community/tests/
├── system-test/
└── pytest/
```
Comment on lines 838 to +874
@@ -772,19 +844,34 @@ int32_t syncCheckSynced(int64_t rid) {
TAOS_RETURN(code);
}

if (pSyncNode->state != TAOS_SYNC_STATE_LEADER) {
if (pSyncNode->state != TAOS_SYNC_STATE_LEADER && pSyncNode->state != TAOS_SYNC_STATE_ASSIGNED_LEADER) {
code = TSDB_CODE_SYN_NOT_LEADER;
syncNodeRelease(pSyncNode);
TAOS_RETURN(code);
}

bool isSync = pSyncNode->commitIndex >= pSyncNode->assignedCommitIndex;
code = (isSync ? TSDB_CODE_SUCCESS : TSDB_CODE_VND_ARB_NOT_SYNCED);
if (!isSync) {
sInfo("vgId:%d, not synced, assignedCommitIndex:%" PRId64 ", commitIndex:%" PRId64, pSyncNode->vgId,
pSyncNode->assignedCommitIndex, pSyncNode->commitIndex);
bool isSync = false;
int64_t maxLag = 0;
SyncIndex minMatchIndex = SYNC_INDEX_INVALID;
bool snapshotSending = syncNodeSnapshotSending(pSyncNode);

bool leaderBaseReady = pSyncNode->commitIndex >= pSyncNode->assignedCommitIndex;
if (leaderBaseReady) {
if (pSyncNode->peersNum <= 0) {
isSync = true;
} else {
maxLag = syncNodeGetMaxFollowerLag(pSyncNode, &minMatchIndex);
isSync = (maxLag <= syncNodeGetLagThreshold());
if (snapshotSending && maxLag > syncNodeGetLagThreshold()) {
isSync = false;
}
}
}

syncNodeLogCatchupProgress(pSyncNode, isSync, maxLag, minMatchIndex, snapshotSending);

code = (isSync ? TSDB_CODE_SUCCESS : TSDB_CODE_VND_ARB_NOT_SYNCED);

Comment on lines +18 to +22
| `syncCheckSynced()` | `community/source/libs/sync/src/syncMain.c` | 判断 vgroup 是否已同步 |
| `mndArbCheckSync()` | `community/source/dnode/mnode/impl/src/mndArbGroup.c` | Arbiter 定期检查 isSync |
| `ASSIGNED_LEADER` 状态 | `community/source/libs/sync/src/syncMain.c` | 单副本降级运行模式 |
| `SSyncLogReplMgr` | `community/source/libs/sync/inc/syncInt.h` | 控制 leader → follower 日志复制进度 |
| `snapshotSender` | `community/source/libs/sync/src/syncSnapshot.c` | 快照发送,用于追落太远的副本 |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants