
perf(sdb): optimize sdbWriteFileImp to eliminate lock contention and reduce I/O syscalls. #35286

Open

xiao-77 wants to merge 1 commit into main from enh/main/sdb-write

Conversation

xiao-77 (Contributor) commented May 7, 2026

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

gemini-code-assist (Bot) left a comment


Code Review

This pull request optimizes the SDB checkpointing process by introducing a write buffer to reduce syscalls, moving encryption key checks outside the main loop, and implementing a two-phase row snapshotting mechanism to minimize lock contention. Feedback highlights a potential data consistency issue due to the 'fuzzy checkpoint' behavior where locks are released before encoding. Additionally, improvements were suggested for error handling during key loading and optimizing memory management for the encryption buffer.

Comment on lines +728 to +731

  // Perf-A: Phase 2 — encode and write with no lock held.
  // Pinned rows remain valid in memory even if concurrently deleted from the
  // hash. Re-check status before encoding: a row that transitioned to DROPPED
  // after the snapshot must not be written to the checkpoint file.

high

Releasing the table lock before encoding and writing rows introduces a "fuzzy checkpoint" behavior. While this eliminates lock contention, it means that concurrent updates to the same row could result in inconsistent data being written to the checkpoint file (e.g., a row containing a mix of old and new field values if the encoding process is not atomic relative to updates). If the system requires strict consistency for the checkpoint file, the lock should be held during the encoding phase, or an MVCC-like mechanism should be used.
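To make the trade-off concrete, here is a minimal, single-threaded sketch of the "pin, then re-check" pattern the PR comment describes. All names (`Row`, `snapshot_rows`, `encode_pinned`) are hypothetical stand-ins, not TDengine APIs; note that the status re-check only prevents writing a fully dropped row, and does nothing about the intra-row tearing the reviewer is warning about.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ROW_READY, ROW_DROPPED } RowStatus;

typedef struct {
  int       id;
  RowStatus status;
  int       refCount;  /* pin count: a pinned row stays valid in memory */
} Row;

/* Phase 1 (held under the read lock in the real code): pin each row so
 * a concurrent drop cannot free its memory. */
static int snapshot_rows(Row *rows, int n, Row **out) {
  for (int i = 0; i < n; i++) {
    rows[i].refCount++;  /* pin */
    out[i] = &rows[i];
  }
  return n;
}

/* Phase 2 (lock released): re-check status, so a row that transitioned
 * to DROPPED after the snapshot is skipped rather than written. */
static int encode_pinned(Row **pinned, int cnt) {
  int written = 0;
  for (int i = 0; i < cnt; i++) {
    if (pinned[i]->status != ROW_DROPPED) written++;  /* "encode" it */
    pinned[i]->refCount--;  /* unpin */
  }
  return written;
}
```

The gap remains: if an updater mutates a row's fields while phase 2 is encoding it, the checkpoint can contain a mix of old and new values, which is exactly the "fuzzy checkpoint" concern.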

Comment on lines +651 to +652
if (taosWaitCfgKeyLoaded() != 0) {
code = terrno;

medium

The return value of taosWaitCfgKeyLoaded() should be used directly to set the code variable, rather than relying on terrno. This is more robust as it avoids potential issues if terrno was not correctly set by the callee.

  if ((code = taosWaitCfgKeyLoaded()) != 0) {

Comment on lines +755 to +764
  if (needed > encBufLen) {
    taosMemoryFree(encBuf);
    encBuf = taosMemoryMalloc(needed);
    if (encBuf == NULL) {
      code = terrno;
      encBufLen = 0;
    } else {
      encBufLen = needed;
    }
  }

medium

Instead of freeing and re-allocating the encryption buffer when a larger size is needed, consider using taosMemoryRealloc. This is generally more efficient as it may avoid a full copy and re-allocation if the current memory block can be extended in place.

              if (needed > encBufLen) {
                char *tmp = taosMemoryRealloc(encBuf, needed);
                if (tmp == NULL) {
                  code = terrno;
                } else {
                  encBuf = tmp;
                  encBufLen = needed;
                }
              }

Copilot AI review requested due to automatic review settings May 7, 2026 06:15

Copilot AI left a comment


Pull request overview

This PR optimizes the mnode SDB checkpoint writer (sdbWriteFileImp) to reduce lock contention and cut down on write() syscalls during sdb.data generation.

Changes:

  • Add a 256 KiB write buffer (sdbBufWrite/sdbFlushBuf) to batch writes and reduce syscall frequency.
  • Move taosWaitCfgKeyLoaded() to a one-time check before row loops.
  • Snapshot row pointers under a read lock and reuse an encryption buffer to avoid per-row malloc/free.
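The batching idea behind the first bullet can be sketched generically. `WriteBuf`, `buf_write`, and `buf_flush` below are illustrative stand-ins for the PR's `sdbBufWrite`/`sdbFlushBuf`, and `raw_write` stands in for the underlying `write()`/`taosWriteFile` call; here it just counts invocations to show the syscall reduction.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_CAP (256 * 1024)  /* 256 KiB, matching the PR's buffer size */

typedef struct {
  char   buf[BUF_CAP];
  size_t used;
  int    syscalls;  /* counts simulated write() calls */
} WriteBuf;

/* Stand-in for the real write()/taosWriteFile call. */
static void raw_write(WriteBuf *wb, const char *p, size_t n) {
  (void)p; (void)n;
  wb->syscalls++;
}

/* Push any buffered bytes to the underlying file. */
static void buf_flush(WriteBuf *wb) {
  if (wb->used > 0) {
    raw_write(wb, wb->buf, wb->used);
    wb->used = 0;
  }
}

/* Append n bytes; only issues a raw write when the buffer fills or the
 * payload is too large to buffer at all. */
static void buf_write(WriteBuf *wb, const char *p, size_t n) {
  if (n >= BUF_CAP) {          /* oversized payload bypasses the buffer */
    buf_flush(wb);
    raw_write(wb, p, n);
    return;
  }
  if (wb->used + n > BUF_CAP) buf_flush(wb);
  memcpy(wb->buf + wb->used, p, n);
  wb->used += n;
}
```

With ~100-byte encoded rows, thousands of appends collapse into a single underlying write per 256 KiB of data, which is the syscall reduction the PR is after.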


Comment on lines +728 to +743
  // Perf-A: Phase 2 — encode and write with no lock held.
  // Pinned rows remain valid in memory even if concurrently deleted from the
  // hash. Re-check status before encoding: a row that transitioned to DROPPED
  // after the snapshot must not be written to the checkpoint file.
  int32_t rowCount = (int32_t)taosArrayGetSize(pRowList);
  for (int32_t j = 0; j < rowCount; j++) {
    SSdbRow *pRow = *(SSdbRow **)taosArrayGet(pRowList, j);

    if (tsMetaKey[0] != '\0') {
      newDataLen = ENCRYPTED_LEN(pRaw->dataLen);
      newData = taosMemoryMalloc(newDataLen);
      if (newData == NULL) {
        code = terrno;
        taosHashCancelIterate(hash, ppRow);
        sdbFreeRaw(pRaw);
        break;
      }
    if (code == 0) {
      if (pRow->status == SDB_STATUS_DROPPED) {
        sdbPrintOper(pSdb, pRow, "not-write");
      } else {
        sdbPrintOper(pSdb, pRow, "write");

        SSdbRaw *pRaw = (*encodeFp)(pRow->pObj);
        if (pRaw == NULL) {

2 participants