Flagcx lat impr by mikethegoblin · Pull Request #2622 · kvcache-ai/Mooncake

mikethegoblin · 2026-06-25T11:28:10Z

Description

Module

Type of Change

How Has This Been Tested?

Ran tebench as well as vLLM benchmark in 1P1D scenario
tebench result:

Checklist

I have performed a self-review of my own code
I have formatted my code using ./scripts/code_format.sh
I have run pre-commit run --all-files and all hooks pass
I have updated the documentation (if applicable)
I have added tests to prove my changes are effective
For changes >500 LOC: I have filed an RFC issue

AI Assistance Disclosure

No AI tools were used
AI tools were used (specify below)

used codex

gemini-code-assist

Code Review

This pull request refactors the FlagCxTransport to remove the background I/O worker thread, transitioning to direct non-blocking submission of slices and a polling-based completion mechanism via getTransferStatus. The review feedback highlights two key areas for improvement: first, avoiding holding the flagcx_mu_ mutex while calling connForSegment to prevent potential performance bottlenecks from blocking metadata queries; second, optimizing submitSlices to bypass dynamic memory allocation of std::unordered_map when all slices in a batch are homogeneous.

gemini-code-assist · 2026-06-25T11:30:59Z

+    FlagcxP2pConn *conn = nullptr;
+    {
+        std::lock_guard<std::mutex> lk(flagcx_mu_);
+        conn = connForSegment(group.front()->target_id);
+    }


Holding flagcx_mu_ while calling connForSegment can lead to performance bottlenecks. connForSegment calls metadata_->getSegmentDescByID, which may perform blocking network I/O (e.g., querying etcd or performing handshakes) if the metadata cache is disabled or needs updating. Holding the mutex during this time blocks all other threads trying to submit transfers or poll status.

Consider refactoring connForSegment to only lock flagcx_mu_ around the flagcxP2pEngineGetConn call, and call connForSegment without holding the lock here.
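
The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the actual Mooncake code: `Transport`, `Conn`, `lookupSegment`, and the `conns_` table are hypothetical stand-ins for `FlagCxTransport`, `FlagcxP2pConn`, `metadata_->getSegmentDescByID`, and `flagcxP2pEngineGetConn`. The point is only that the potentially slow lookup runs before the mutex is taken, so the critical section covers just the shared connection table.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for a per-segment connection object.
struct Conn {
    uint64_t segment_id;
};

class Transport {
public:
    Conn *connForSegment(uint64_t target_id) {
        // Step 1: metadata lookup, which may block, runs WITHOUT the mutex.
        const uint64_t segment_id = lookupSegment(target_id);

        // Step 2: short critical section around the shared connection table.
        std::lock_guard<std::mutex> lk(mu_);
        auto it = conns_.find(segment_id);
        if (it == conns_.end()) {
            it = conns_.emplace(segment_id, Conn{segment_id}).first;
        }
        // Pointers to unordered_map elements stay valid across rehashing.
        return &it->second;
    }

    int lookups() const { return lookups_.load(); }

private:
    // Assumed placeholder for a metadata query that may hit the network.
    uint64_t lookupSegment(uint64_t target_id) {
        ++lookups_;
        return target_id;
    }

    std::mutex mu_;
    std::unordered_map<uint64_t, Conn> conns_;
    std::atomic<int> lookups_{0};
};
```

With this shape, the caller in the diff hunk would no longer need its own `lock_guard` block around `connForSegment`.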

gemini-code-assist · 2026-06-25T11:30:59Z

+void FlagCxTransport::submitSlices(const std::vector<Slice *> &slices) {
+    std::lock_guard<std::mutex> lk(submit_mu_);
+    std::unordered_map<uint64_t, std::vector<Slice *>> groups;
+    for (Slice *s : slices) {
+        const uint64_t key =
+            (static_cast<uint64_t>(s->target_id) << 1) |
+            (s->opcode == TransferRequest::WRITE ? 0u : 1u);
+        groups[key].push_back(s);
+    }
+    for (auto &kv : groups) runSliceGroup(kv.second);
+}


In high-performance networking, avoiding dynamic memory allocations on the data path is crucial. Currently, submitSlices allocates a std::unordered_map on every call. For the very common case where all slices in the batch are homogeneous (same target_id and opcode) or when there is only a single slice, we can completely bypass the map allocation and call runSliceGroup directly.

void FlagCxTransport::submitSlices(const std::vector<Slice *> &slices) {
    if (slices.empty()) return;
    std::lock_guard<std::mutex> lk(submit_mu_);
    bool all_same = true;
    for (size_t i = 1; i < slices.size(); ++i) {
        if (slices[i]->target_id != slices[0]->target_id ||
            slices[i]->opcode != slices[0]->opcode) {
            all_same = false;
            break;
        }
    }
    if (all_same) {
        runSliceGroup(slices);
        return;
    }
    std::unordered_map<uint64_t, std::vector<Slice *>> groups;
    for (Slice *s : slices) {
        const uint64_t key =
            (static_cast<uint64_t>(s->target_id) << 1) |
            (s->opcode == TransferRequest::WRITE ? 0u : 1u);
        groups[key].push_back(s);
    }
    for (auto &kv : groups) runSliceGroup(kv.second);
}
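
To check the fast-path predicate in isolation, the homogeneity test can be pulled into a small free function. The sketch below is an assumption-laden stand-alone version: the `Slice` and `Opcode` types here are simplified placeholders, not the real Mooncake definitions, and `isHomogeneous` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified placeholders for the real Slice and opcode types.
enum class Opcode { READ, WRITE };

struct Slice {
    uint64_t target_id;
    Opcode opcode;
};

// Returns true when every slice shares the first slice's target and opcode,
// i.e. when the batch could be handed to a single group without a map.
bool isHomogeneous(const std::vector<Slice *> &slices) {
    if (slices.empty()) return true;
    const Slice *first = slices.front();
    return std::all_of(slices.begin() + 1, slices.end(),
                       [first](const Slice *s) {
                           return s->target_id == first->target_id &&
                                  s->opcode == first->opcode;
                       });
}
```

A single-slice batch trivially passes the check, which matches the reviewer's note that the single-slice case can skip the map as well.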

alogfans · 2026-06-26T09:11:35Z

I don't think it's a good idea to call pollPendingTransfers() from getTransferStatus(). This forces users to call getTransferStatus() in order to make progress; that is, the user must actively poll.
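
The coupling described in this comment can be shown with a toy model. Everything below is hypothetical: `PollDrivenTransport` only borrows the method names `pollPendingTransfers` and `getTransferStatus` from the PR, and it assumes, purely for illustration, that each poll completes at most one pending transfer. It is not a model of the real FlagCX completion path.

```cpp
#include <cassert>

// Toy model: progress happens only inside the status query.
class PollDrivenTransport {
public:
    void submit(int n) { pending_ += n; }

    // Status query that also drives progress, as in the reviewed design.
    bool getTransferStatus() {
        pollPendingTransfers();
        return pending_ == 0;
    }

    int pending() const { return pending_; }

private:
    // Assumed behavior: completes at most one pending transfer per call.
    void pollPendingTransfers() {
        if (pending_ > 0) --pending_;
    }

    int pending_ = 0;
};
```

In this model, transfers submitted by a caller that never queries status stay pending indefinitely, which is the behavior the comment objects to.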

mikethegoblin added 2 commits June 25, 2026 17:28

implement direct submit

4ff3681

move transfer status polling to getTransferStatus

d3896db

mikethegoblin requested review from alogfans, chestnut-Q and doujiang24 as code owners June 25, 2026 11:28

gemini-code-assist Bot reviewed Jun 25, 2026

github-actions Bot added the run-ci and Transfer Engine labels Jun 25, 2026

mikethegoblin wants to merge 2 commits into kvcache-ai:feat/flagcx-support from mikethegoblin:flagcx-lat-impr

2 participants
