dist: skip stale cert when rebuilding scheduler HTTP client#2707
Open
timn-nexthop wants to merge 1 commit into
Open
dist: skip stale cert when rebuilding scheduler HTTP client#2707timn-nexthop wants to merge 1 commit into
timn-nexthop wants to merge 1 commit into
Conversation
When a build server registers with a new certificate, the scheduler rebuilds its outbound reqwest client with the new cert plus every other cert it knows about. The loop over `certs.values()` ran before `certs.insert(...)` overwrote the map entry, so it still contained the stale cert for the same `server_id` — meaning both the old and new self-signed certs for that server were installed as trust anchors in the rebuilt client. Each build server's cert is self-signed, so old and new share a Subject DN. TLS validators index trust anchors by Subject; with two anchors having the same name, path building can pick the stale one, fail signature verification against its public key, and reject the handshake. The result is that cert rotation on a build server deterministically breaks the scheduler's ability to talk to it until the scheduler restarts. Skip the entry matching `server_id` when iterating existing certs so only the up-to-date cert for that server ends up as a trust anchor.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a build server registers with a new certificate, the scheduler rebuilds its outbound reqwest client with the new cert plus every other cert it knows about. The loop over
certs.values()ran beforecerts.insert(...)overwrote the map entry, so it still contained the stale cert for the sameserver_id— meaning both the old and new self-signed certs for that server were installed as trust anchors in the rebuilt client.Each build server's cert is self-signed, so old and new share a Subject DN. TLS validators index trust anchors by Subject; with two anchors having the same name, path building can pick the stale one, fail signature verification against its public key, and reject the handshake. The result is that cert rotation on a build server deterministically breaks the scheduler's ability to talk to it until the scheduler restarts.
Skip the entry matching
server_idwhen iterating existing certs so only the up-to-date cert for that server ends up as a trust anchor.We've been running with this change since mid December and it's fixed this problem for us.