Skip to content

feat: retain working Lantern servers across config updates#534

Open
garmr-ulfr wants to merge 5 commits into
mainfrom
retain-working-servers
Open

feat: retain working Lantern servers across config updates#534
garmr-ulfr wants to merge 5 commits into
mainfrom
retain-working-servers

Conversation

@garmr-ulfr

@garmr-ulfr garmr-ulfr commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

This pull request updates the logic for handling config responses to keep working Lantern servers from previous configs instead of replacing the full set entirely. Servers are retained until they stop working or the max limit is reached (60).

Lantern server management improvements

  • Refactored the server update logic in LocalBackend: replaced setServers with a new updateServers method that intelligently evicts Lantern servers when the configured limit is exceeded, prioritizing removal of hard-demoted and oldest servers, and avoids disrupting refreshed or active servers. Added helper functions for server tag management and eviction logic. (backend/radiance.go) [1] [2] [3]
  • Defined a maximum retention limit for Lantern servers (maxRetainedLanternServers) to prevent excessive resource usage. (backend/radiance.go)

CLI and usability improvements

  • Enhanced the lantern servers list command with a --limit flag to restrict the number of servers shown, and added sorting by lowest latency. The output now also prints the total number of servers available. (cmd/lantern/servers.go) [1] [2] [3] [4] [5]
  • Commented out routing the child process's stdout into the log file. The child process logs to both the file and std, so logs were being duplicated. Added a TODO comment for coming up with a better solution.

Dependency updates

  • Updated lantern-box dependency to version v0.0.95 in go.mod.

Replace setServers with updateServers, which keeps working Lantern
servers from prior config batches instead of replacing the full set.
Servers present in the incoming config are pruned from the add list so
their selection history and connections survive; retained servers are
capped at 60 and evicted oldest-first by selection age, with hard-demoted
servers evicted first.

Sort 'servers list' by latest success delay and add a --limit flag.
Include the event value in the panic-in-callback log.

Bumps lantern-box to v0.0.95.
Copilot AI review requested due to automatic review settings June 19, 2026 00:51

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes how Lantern server configs are applied so previously working servers can be retained across config updates (with a cap), and it improves the lantern servers list UX while updating the lantern-box dependency.

Changes:

  • Replace full Lantern server set replacement with an “update + evict” strategy that retains prior working servers up to a fixed limit.
  • Add --limit to lantern servers list and sort output by lowest observed latency.
  • Bump github.com/getlantern/lantern-box to v0.0.95.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
backend/radiance.go Adds updateServers + eviction helpers to retain servers across config updates.
backend/radiance_test.go Adds/relocates backend tests, including eviction tests for retained Lantern servers.
backend/exhaustion_test.go Removes the old exhaustion gate test file (moved into backend/radiance_test.go).
cmd/lantern/servers.go Adds --limit, sorts by latency, and updates list output behavior.
cmd/lanternd/lanternd.go Adjusts how the parent process derives the logger writer when draining child output.
events/events.go Enhances panic logging in event callbacks to include event context.
go.mod Updates github.com/getlantern/lantern-box version.
go.sum Updates checksums for the lantern-box version bump.
Comments suppressed due to low confidence (1)

cmd/lantern/servers.go:137

  • The new --limit flag is ignored when --json is used, so the CLI behaves inconsistently across output modes.
func serversList(ctx context.Context, c *ipc.Client, showLatency, asJSON bool, limit int) error {
	srvs, err := c.Servers(ctx)
	if err != nil {
		return err
	}
	if asJSON {
		out := make([]ServerListEntry, 0, len(srvs))
		for _, s := range srvs {
			out = append(out, ServerListEntry{
				Tag:              s.Tag,
				Type:             s.Type,
				Location:         s.Location,
				SelectionHistory: s.SelectionHistory,
			})
		}
		return printJSON(out)
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cmd/lanternd/lanternd.go Outdated
Comment thread cmd/lantern/servers.go
Comment thread cmd/lantern/servers.go Outdated
Comment thread events/events.go
Comment thread backend/radiance.go Outdated
Prune servers already present in the manager from the incoming list
before computing evictions, and evict hard-demoted Lantern servers even
when they reappear in the new config batch. Drops the now-unused
incomingTags filtering from lanternServersToEvict.

Guard against a nil SelectionHistory when sorting 'servers list', log
only the event type on a panic-in-callback, and write child process
output to stdout only to avoid duplicate log lines.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Comment thread cmd/lantern/servers.go
Comment thread cmd/lantern/servers.go Outdated
Comment thread backend/radiance.go
Sort and truncate the server list before branching on output format so
'servers list --json --limit N' returns the N fastest instead of all
servers. Guard SelectionHistory before dereferencing it in
printServerEntry, and correct the config-update error log to say
'updating servers in manager'.

@myleshorton myleshorton left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a thorough pass across both repos. The core idea — stop discarding selection history on every config refresh — is a real improvement, and Copilot's surface comments are all addressed. One significant correctness issue plus a few design tradeoffs to weigh before merge.

🔴 The prune-by-tag assumption doesn't hold — in-place server updates get silently dropped

updateServers prunes incoming servers whose tag already exists, justified (in the radiance.go:656 thread) by "the tag will change if any of the options change." That isn't true: the client outbound tag is route-ID-derived, not content-derived.

  • Tag = <type>-out-<track>-<routeID> — lantern-cloud cmd/api/config.go:420/:426 (proxy.Name = trackName-routeID, config.go:390).
  • lantern-cloud mutates a route's client-visible options in place under a stable route ID: proxy_infrastructure.sql:957 UpdateVPSRouteRunning … SET address=@address, port=@port, launch_cfg=@launch_cfg WHERE id=@id (+ route_update.go, POST /proxy/routes/{id}).

→ When address/port/launch_cfg change but the route ID (tag) stays the same — in-place reprovision, key/credential rotation, launch_cfg edits — the prune drops the update and the client keeps stale config until that server fails, hard-demotes, is evicted, and is re-added fresh on a later cycle. Full-replace SetServers applied it immediately, so this is a regression. Details in the radiance.go:656 thread. If reprovision/rotation always issues a new route ID this is safe — but that invariant is unenforced/undocumented and load-bearing; please confirm + document, and ideally update-in-place when a tag matches but options differ (preserving SelectionHistory).

🟡 Design tradeoffs worth a decision

  • Config loses authority to force-remove a server. A server dropped from the config is retained until it hard-demotes or ages out — there's no longer a config-driven way to pull a bad/compromised/decommissioned server quickly. Confirm that's acceptable and document it.
  • Hard-demoted-but-still-offered servers flap. They're evicted unconditionally and pruned from the incoming batch, so they're removed this cycle and re-added fresh next cycle (re-probed, demotion lost), repeating every refresh. If the reset is intentional, a comment noting the per-cycle reprobe churn would help.
  • Retaining up to 60 servers means more outbounds + more probing. Worth confirming the auto-select/URL-test path doesn't aggressively probe all 60 retained servers (resource cost).
  • maxRetainedLanternServers caps retained, not total. retentionBudget = max(limit-incomingCount, 0), so a single config batch > 60 still exceeds the cap. Unlikely, but the name implies a hard limit it doesn't enforce.
  • AddServers no longer sets IsLantern. It works only because the caller sets IsLantern: true (radiance.go:238) and Clone() preserves it; the old SetServers enforced it. Eviction silently keys on IsLantern, so a future caller that forgets breaks the cap invisibly — consider keeping that in the manager or asserting it.

🟢 Minor

  • RemoveServersAddServers isn't atomic: if the add fails after the remove, you net-lose servers (low odds given the prune, but silent).
  • Test gap: lanternServersToEvict is well covered, but there's no test for updateServers itself (prune + evict + add + outbounds) — which is where the tag issue lives — nor for incomingCount > limit.
  • cmd/lanternd/lanternd.go: the commented-out handler + TODO is unrelated scope creep for this PR.

👍 Good

Preserving SelectionHistory for working servers across refreshes is the right call and a clear win over reset-everything; eviction unit tests are clean; events.go now names the panicking event type; CI green.

Remove Manager.SetServers, which is now dead code. Also clarify
that hard-demoted Lantern servers are always evicted so a later
re-offer is re-probed as a fresh candidate.
@garmr-ulfr

garmr-ulfr commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator Author

On "🔴 The prune-by-tag assumption doesn't hold"

updateServers prunes incoming servers whose tag already exists, justified (in the radiance.go:656 thread) by "the tag will change if any of the options change." That isn't true: the client outbound tag is route-ID-derived, not content-derived.

This is true. Existing routes will not pick up changes to serverOpts or clientOpts. The launch_cfg is a write-once snapshot baked in at route creation time — there is no reconciliation loop that rewrites it when track config changes. Therefore, route ID and config (tag included) are one-to-one. This is internal details of lantern-cloud, which is where it should documented, not radiance.

The cited UpdateVPSRouteRunning write only fires during a route's initial provisioning, never on a live route a client holds; route_update.go only toggles deprecated. Config changes ship as new routes with new IDs (new tags). Not a defect.

On "🟡 Design tradeoffs worth a decision"

  • Config loses authority to force-remove a server.

Overstated. There is no force-remove channel in the config or in lantern-cloud; removal-by-omission was a side effect of full-replace, not a control surface. When lantern-cloud deprecates a route, it tears the box down promptly, so the server goes unreachable; client hard-demotes within ~5–15 min and evicts it on the next config fetch (≤15 min). Bounded and self-healing; that trade is the point of the PR. Not a blocker.

  • Hard-demoted-but-still-offered servers flap.

The "demotion lost" flagged as a downside is the intended behavior. If a server is re-offered, it's healthy ASN-wide, so re-adding it fresh and discarding SelectionHistory is correct: a re-offer is the bandit saying "still a good arm, re-evaluate," and the client must respect that. Reprobe is bounded to once per poll. Worth a code comment marking the reset as intentional — nothing to fix.

  • Retaining up to 60 servers means more outbounds + more probing.

Confirmed: the background cycle does probe all retained members, but it's concurrency-capped and only while the tunnel is active. That's by design — retention is pointless without it. It's bounded by the cap + interval; if it's too much, we can lower it — it's a tuning knob, not a correctness issue.

  • maxRetainedLanternServers caps retained, not total.

True and irrelevant. total ≤ max(incomingCount, limit): retention never pushes the population above limit. Exceeding it requires the server to assign >60 in one batch — which is lantern-cloud's decision, not retention's. It's named for exactly what it controls — retained Lantern servers. No change needed.

  • AddServers no longer sets IsLantern.

Wrong in both directions. AddServers never set it, and it shouldn't — IsLantern is provenance, known only at the origin; the manager holds user servers too (IsLantern: false) and stamping would mislabel them. SetServers was the only method that ever set it, where the write was incidental to its group-scoped replace and redundant with the construction site (radiance.go:238). The correct action is to delete SetServers since it's now dead code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants