[core][distributed] use tcp store directly #10275
Signed-off-by: youkaichao <youkaichao@gmail.com>
Running stress tests locally to see whether it hangs or has race conditions.
Tested locally 100 times, and it's good.
Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
Inspired by pytorch/rfcs#71 (comment), we can use `torch.distributed.TCPStore` (https://pytorch.org/docs/stable/distributed.html#torch.distributed.TCPStore) directly, instead of depending on internal APIs like rendezvous and prefix store.
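For illustration only, here is a minimal sketch of what creating a `TCPStore` directly can look like. It is not the code from this PR: the helpers `find_free_port` and `make_store` are hypothetical names, and the single-process setup (world size 1, rank 0 acting as the server) is an assumption made so the snippet can run on one machine.

```python
import socket
from datetime import timedelta

from torch.distributed import TCPStore


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def make_store(host: str, port: int, rank: int, world_size: int) -> TCPStore:
    """Hypothetical helper: rank 0 hosts the store, other ranks connect to it."""
    return TCPStore(
        host_name=host,
        port=port,
        world_size=world_size,
        is_master=(rank == 0),
        timeout=timedelta(seconds=30),
    )


# Single-process demo (assumption): world size 1, so rank 0 is the server.
store = make_store("127.0.0.1", find_free_port(), rank=0, world_size=1)

store.set("greeting", "hello")
value = store.get("greeting")      # values are returned as bytes
counter = store.add("counter", 1)  # atomic increment; returns the new value
```

In a real multi-process job, every rank would call the same constructor with the shared host and port, and only rank 0 would pass `is_master=True`.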