taosdata · cadem · May 6, 2026
@@ -0,0 +1 @@
+改进目前raft的选举实现，加入对appliedIndex的判断，appliedIndex大的更优先被选举为leader。不要替我创建git分支。
@@ -0,0 +1,36 @@
+# Specification Quality Checklist: Raft Election with Applied Progress Priority
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-04-10
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [x] No implementation details (languages, frameworks, APIs)
+- [x] Focused on user value and business needs
+- [x] Written for non-technical stakeholders
+- [x] All mandatory sections completed
+
+## Requirement Completeness
+
+- [x] No [NEEDS CLARIFICATION] markers remain
+- [x] Requirements are testable and unambiguous
+- [x] Success criteria are measurable
+- [x] Success criteria are technology-agnostic (no implementation details)
+- [x] All acceptance scenarios are defined
+- [x] Edge cases are identified
+- [x] Scope is clearly bounded
+- [x] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [x] All functional requirements have clear acceptance criteria
+- [x] User scenarios cover primary flows
+- [x] Feature meets measurable outcomes defined in Success Criteria
+- [x] No implementation details leak into specification
+
+## Notes
+
+- Validation iteration 1: all checklist items passed.
+- No clarification questions required.
+- Branch creation step was intentionally skipped per user instruction; artifacts were created directly under the sequential feature directory.
@@ -0,0 +1,43 @@
+# Contract: RequestVote Applied-Index Preference
+
+## Interface
+
+- Interface type: Internal node-to-node sync RPC contract
+- Producer: `syncNodeRequestVotePeers`
+- Consumer: `syncNodeOnRequestVote`
+- Message: `SyncRequestVote`
+
+## Request Contract
+
+Required fields for vote evaluation:
+
+| Field | Meaning | Requirement |
+|------|---------|-------------|
+| `term` | Candidate term | Must equal sender current term |
+| `lastLogTerm` | Candidate last log term | Must be populated from local sync log state |
+| `lastLogIndex` | Candidate last log index | Must be populated from local sync log state |
+| `candidateAppliedIndex` | Candidate applied progress snapshot | Must be populated from `FpAppliedIndexCb` at request-build time |
+
+## Vote Evaluation Rules
+
+1. Reject requests from nodes not in the raft group.
+2. Update local term and step down if the request term is higher, preserving existing behavior.
+3. Apply the existing log-recency gate based on `lastLogTerm` and `lastLogIndex`.
+4. If the candidate fails the log-recency gate, reject the vote regardless of applied progress.
+5. If the candidate passes the log-recency gate, compare `candidateAppliedIndex` with the receiver's local applied index.
+6. Prefer granting votes to the candidate with greater or equal applied progress.
+7. If either side reports an unavailable applied index, treat applied progress as unavailable for ordering and fall back to existing deterministic behavior.
+8. If applied progress is equal, fall back to existing deterministic behavior.
+9. Reset the election timer only when the vote is actually granted, preserving current liveness behavior.
+
+## Observability Contract
+
+- Vote-decision logs must include both log-recency inputs and applied-progress inputs.
+- Logs for unequal applied indexes must reveal whether the applied-progress comparison influenced the grant result.
+- Election outcome logs must allow operators to explain why the elected node was preferred.
+
+## Compatibility Assumption
+
+- This contract change assumes homogeneous cluster binaries for the feature rollout.
+- Mixed-version compatibility is not included in this feature plan.
+- Operational rollout guidance MUST document same-version deployment as a prerequisite.
@@ -0,0 +1,61 @@
+# Data Model: Raft Election with Applied Progress Priority
+
+## Candidate Vote Request
+
+- Purpose: Internal request-vote message emitted by a candidate to each voting peer during an election round.
+- Fields:
+  - `srcId`: Candidate raft identity.
+  - `destId`: Target voter identity.
+  - `term`: Candidate term for the election round.
+  - `lastLogIndex`: Candidate last log index used by the existing recency gate.
+  - `lastLogTerm`: Candidate last log term used by the existing recency gate.
+  - `candidateAppliedIndex`: Candidate applied progress snapshot used for preference ordering.
+- Validation rules:
+  - `term` must equal the candidate's current persisted term at send time.
+  - `candidateAppliedIndex` must be derived from the FSM callback at the time the request is built.
+  - Message consumers must tolerate equal applied-index values and preserve deterministic tie fallback.
+
+## Local Applied Progress Snapshot
+
+- Purpose: The receiving node's current view of its own applied progress at vote-evaluation time.
+- Fields:
+  - `localAppliedIndex`: Value returned by `FpAppliedIndexCb`.
+  - `localLastLogIndex`: Last local log index.
+  - `localLastLogTerm`: Last local log term.
+- Validation rules:
+  - `localLastLogTerm` and `localLastLogIndex` remain the first-stage election safety check.
+  - `localAppliedIndex` is compared only after the candidate satisfies the safety gate.
+
+## Election Decision Record
+
+- Purpose: Structured evidence for why a vote was granted or rejected and why a leader was elected.
+- Fields:
+  - `term`
+  - `candidateId`
+  - `voterId`
+  - `candidateLastLogTerm`
+  - `candidateLastLogIndex`
+  - `candidateAppliedIndex`
+  - `localLastLogTerm`
+  - `localLastLogIndex`
+  - `localAppliedIndex`
+  - `grantDecision`
+  - `decisionReason`
+- Validation rules:
+  - Every unequal-applied-index vote decision must be explainable from logged decision factors.
+  - Logging must not remove or obscure the existing recency decision context.
+
+## Election Round
+
+- Purpose: One leader-election attempt for a vgroup term.
+- Fields:
+  - `term`
+  - `candidateSet`
+  - `voteResponses`
+  - `winner`
+  - `winnerAppliedIndex`
+  - `tieFallbackUsed`
+- State transitions:
+  - `Follower -> Candidate`: election starts and self-vote recorded.
+  - `Candidate -> Leader`: quorum obtained, possibly influenced by applied-progress preference.
+  - `Candidate/Follower -> Follower`: step-down on higher term or vote for another candidate.
@@ -0,0 +1,87 @@
+# Implementation Plan: Raft Election with Applied Progress Priority
+
+**Branch**: `[001-raft-appliedindex-election]` | **Date**: 2026-04-10 | **Spec**: `/root/github/taosdata/TDinternal/specs/001-raft-appliedindex-election/spec.md`
+**Input**: Feature specification from `/specs/001-raft-appliedindex-election/spec.md`
+
+**Note**: Planning was bound to `SPECIFY_FEATURE=001-raft-appliedindex-election` because the user explicitly requested no branch creation.
+
+## Summary
+
+Improve TDinternal's Raft leader election so that eligible candidates with higher applied progress are preferred, while preserving existing last-log-term/last-log-index safety checks and election liveness. The implementation will extend the internal `SyncRequestVote` contract to carry the candidate's applied index, incorporate an applied-progress comparison into vote granting after existing log recency validation, and add observability plus unit/integration coverage for unequal-progress, tie, and lagging-node election scenarios.
+
+## Technical Context
+
+**Language/Version**: C for sync runtime, C++ for existing Google Test coverage  
+**Primary Dependencies**: TDinternal sync library, internal RPC message layer, WAL/log store, FSM callbacks, Google Test  
+**Storage**: Raft store metadata, WAL-backed log store, FSM-reported applied progress  
+**Testing**: Existing `community/source/libs/sync/test` Google Test executables plus targeted cluster-style sync tests  
+**Target Platform**: Linux server nodes running TDinternal sync/raft services  
+**Project Type**: Native distributed systems library  
+**Performance Goals**: Keep vote evaluation O(1), add no extra election round-trips, and avoid measurable regression in election completion under current replica counts  
+**Constraints**: Preserve Raft log safety, preserve deterministic behavior on ties, keep elections live when applied progress is unavailable/equal, and treat mixed-version wire compatibility as out of scope unless separately planned  
+**Scale/Scope**: Per-vgroup replica elections across small quorum-based replica sets with optional learners; change scope is limited to sync election logic, internal message schema, and sync tests
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+- Constitution file review result: `.specify/memory/constitution.md` still contains template placeholders and no ratified enforceable principles.
+- Gate status before research: PASS by absence of enforceable constitutional constraints.
+- Required caution despite missing constitution rules: preserve existing Raft safety invariants, keep plan scoped to sync election paths, and do not assume branch creation is available.
+- Post-design re-check: PASS. The design stays within the sync library boundary, preserves current safety gating on log recency, and adds only one internal message-contract change plus corresponding tests.
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/001-raft-appliedindex-election/
+├── plan.md
+├── research.md
+├── data-model.md
+├── quickstart.md
+├── contracts/
+│   └── request-vote-applied-index.md
+└── tasks.md
+```
+
+### Source Code (repository root)
+
+```text
+community/source/libs/sync/
+├── inc/
+│   ├── syncElection.h
+│   ├── syncInt.h
+│   ├── syncMessage.h
+│   ├── syncRequestVote.h
+│   └── syncUtil.h
+├── src/
+│   ├── syncElection.c
+│   ├── syncMain.c
+│   ├── syncMessage.c
+│   ├── syncRequestVote.c
+│   └── syncUtil.c
+└── test/
+    ├── syncElectTest.cpp
+    ├── syncRequestVoteTest.cpp
+    ├── syncVotesGrantedTest.cpp
+    └── sync_test_lib/
+
+community/tests/
+└── pytest/
+    └── cluster/
+        └── syncingTest.py
+
+community/contrib/test/traft/
+├── cluster/
+├── join_into_vgroup/
+└── single_node/
+```
+
+**Structure Decision**: Keep all runtime changes inside `community/source/libs/sync/{inc,src}` and validate behavior primarily through `community/source/libs/sync/test`. Use existing cluster-style test scaffolding under `community/contrib/test/traft` or `community/tests/pytest/cluster` only if unit-level coverage cannot express a required election scenario.
+
+## Complexity Tracking
+
+| Violation | Why Needed | Simpler Alternative Rejected Because |
+|-----------|------------|-------------------------------------|
+| Internal wire contract change | `SyncRequestVote` must carry candidate applied progress so remote voters can compare candidates using data they do not currently receive | Deriving candidate applied progress locally is impossible with current message payloads |
@@ -0,0 +1,51 @@
+# Quickstart: Raft Election with Applied Progress Priority
+
+## 1. Build targeted sync tests
+
+From the repository root:
+
+```sh
+cmake --build build --target syncRequestVoteTest syncVotesGrantedTest syncElectTest
+```
+
+## 2. Implement the runtime change
+
+Update the sync runtime in these areas:
+
+- Add the applied-index field to `SyncRequestVote` in `community/source/libs/sync/inc/syncMessage.h`.
+- Populate the field in `community/source/libs/sync/src/syncElection.c` when building vote requests.
+- Read local applied progress from `FpAppliedIndexCb` in `community/source/libs/sync/src/syncRequestVote.c` and incorporate it after the existing log-recency gate.
+- Extend request-vote logging in `community/source/libs/sync/src/syncUtil.c` or adjacent sync logging paths so operators can see the comparison inputs.
+
+## 3. Add focused test coverage
+
+Recommended first-pass coverage:
+
+- `syncRequestVoteTest`: grant and reject decisions for higher, lower, equal, and unavailable applied-index cases.
+- `syncVotesGrantedTest` or adjacent helpers: ensure tie handling remains deterministic.
+- `syncElectTest`: verify a higher-applied candidate wins once all candidates are otherwise log-eligible.
+
+## 4. Run targeted tests
+
+Run the built binaries from the build output directory, for example:
+
+```sh
+./build/community/source/libs/sync/test/syncRequestVoteTest
+./build/community/source/libs/sync/test/syncVotesGrantedTest
+./build/community/source/libs/sync/test/syncElectTest
+```
+
+## 5. Capture failover baseline and comparison
+
+- Record a baseline failover drill before enabling the new election preference.
+- Re-run the same drill after the change using the same replica topology and workload.
+- Compare median leader recovery completion time and keep the before/after timestamps with the test logs.
+- If using `community/tests/pytest/cluster/syncingTest.py`, record total elapsed time around the relevant replica and schema-change sequence.
+
+## 6. Verify observability and rollout assumptions
+
+- Trigger an election scenario with unequal applied progress.
+- Confirm logs include candidate/local applied index, last-log term/index, and the grant reason.
+- Confirm tie scenarios still elect a single leader without repeated deadlock.
+- Confirm unavailable applied index falls back to the existing deterministic path instead of blocking votes.
+- Roll out only to same-version clusters because the RequestVote wire contract has changed.
@@ -0,0 +1,37 @@
+# Research: Raft Election with Applied Progress Priority
+
+## Decision 1: Keep last-log-term and last-log-index as the primary safety gate
+
+- Decision: Preserve the current `lastLogTerm` and `lastLogIndex` vote eligibility logic as the first-stage gate, and only evaluate applied progress after the candidate passes that existing Raft recency check.
+- Rationale: The current implementation in `syncRequestVote.c` enforces Raft-style log freshness. Replacing that check with applied progress would risk violating election safety and diverging from established sync behavior.
+- Alternatives considered: Use applied progress as the sole election criterion. Rejected because applied progress does not encode log freshness or term ordering and would weaken the existing safety guard.
+
+## Decision 2: Extend `SyncRequestVote` with candidate applied index
+
+- Decision: Add a candidate applied-index field to the internal `SyncRequestVote` message and populate it during `syncNodeRequestVotePeers`.
+- Rationale: Remote voters cannot prefer the most up-to-date candidate unless the candidate's applied progress is transmitted with the vote request. The existing contract only carries term and log recency metadata.
+- Alternatives considered: Infer candidate applied progress from heartbeats or cached peer state. Rejected because cached progress may be stale or absent exactly when an election is needed.
+
+## Decision 3: Source applied progress from the FSM callback already exposed by sync
+
+- Decision: Use `SSyncFSM::FpAppliedIndexCb` as the authoritative provider for local applied progress during vote request construction and vote evaluation.
+- Rationale: The sync layer already invokes this callback for observability in `syncUtil.c`, which makes it the least invasive and most consistent source of applied progress.
+- Alternatives considered: Introduce a new state field on `SSyncNode` or reuse `commitIndex`. Rejected because that duplicates existing data flow or fails to represent actually applied progress.
+
+## Decision 4: Define applied progress as a deterministic preference factor, not a liveness blocker
+
+- Decision: Use applied progress to prefer the candidate with the highest applied index when log recency is acceptable, but retain deterministic fallback behavior when applied indexes are equal or unavailable.
+- Rationale: The feature request requires higher applied progress to be preferred, while the specification also requires elections to remain live under ties and degraded conditions.
+- Alternatives considered: Refuse all votes for candidates with missing or lower applied index. Rejected because it could deadlock elections during partial visibility or uniform lag.
+
+## Decision 5: Treat mixed-version message compatibility as out of scope for this feature slice
+
+- Decision: Plan this feature as a homogeneous-version cluster change that updates both message producer and consumer together.
+- Rationale: `SyncRequestVote` is a raw internal message contract with fixed-size allocation in `syncMessage.c`; adding fields changes the on-wire layout. Supporting mixed-version clusters would require a broader version-negotiation or dual-decoding design not requested here.
+- Alternatives considered: Add backward-compatible dual-format decoding in the same feature. Rejected because it expands scope materially beyond election preference behavior.
+
+## Decision 6: Validate behavior primarily with sync unit tests, plus optional cluster-style follow-up coverage
+
+- Decision: Add or update tests in `community/source/libs/sync/test` for message contract coverage, vote-grant decision logic, and election outcomes, with cluster-style tests only if a unit harness cannot express a scenario.
+- Rationale: The sync test suite already contains focused executables for request-vote, vote-manager, and election behavior, which is the most efficient place to pin regressions.
+- Alternatives considered: Rely solely on manual `traft` cluster programs. Rejected because they are slower and less precise for guarding edge cases like ties and unavailable applied progress.
Original file line number	Diff line number	Diff line change
		@@ -0,0 +1 @@
		改进目前raft的选举实现，加入对appliedIndex的判断，appliedIndex大的更优先被选举为leader。不要替我创建git分支。