feat: add EndPoint.resolveAll() for multi-address DNS expansion (DRIVER-201), Part 2/2 #890
…ER-201) Addresses the endpoint-API aspect of DRIVER-201.
Problem: EndPoint.resolve() returns a single SocketAddress. When a hostname maps to multiple IPs, the driver can only try the first one and fails with AllNodesFailedException if it is unreachable; the remaining IPs are invisible to the connection layer.
Solution (per @dkropachev's architectural direction):
- Deprecate EndPoint.resolve(). Add EndPoint.resolveAll() with a default implementation that wraps resolve() in a single-element array for backward compatibility with third-party implementations.
- DefaultEndPoint.resolveAll(): if the stored InetSocketAddress is unresolved, calls InetAddress.getAllByName() to expand the hostname to all known IPs, returning one InetSocketAddress per IP. Falls back to the single-element unresolved address if DNS fails, so the connect attempt surfaces a descriptive error rather than returning empty.
- SniEndPoint.resolveAll(): re-resolves the proxy hostname on each call and returns all A-records sorted by IP, enabling the caller to try each proxy address in sequence.
- ClientRoutesEndPoint.resolveAll(): delegates to resolve() (single-address topology-monitor lookup) and wraps the result in a one-element array.
- ChannelFactory.connect(): replaced endPoint.resolve() with endPoint.resolveAll(). Iterates through the returned candidates via tryNextCandidate(); on per-address failure logs and tries the next; only fails the overall resultFuture when all candidates are exhausted. Protocol-version negotiation (downgrade retries) is scoped to the same address via connectToAddress(), which is semantically correct.
Tests:
- DefaultEndPointTest: 3 new cases (already-resolved passthrough, unresolved hostname expansion, unresolvable hostname fallback).
- SniEndPointTest: new class with cases for resolveAll() happy path, unresolvable host exception, and resolve() sanity check.
- All 13 existing ChannelFactory tests continue to pass (LocalEndPoint uses the default single-element resolveAll() via the interface default).
…VER-201) newControlReconnectionQueryPlan() now creates copies of the original contact-point nodes (with their unresolved hostname endpoints) instead of synthetic nodes with resolved IPs. This ensures the control channel carries the hostname endpoint, which is preserved in metadata after topology refresh. DNS expansion for connection fallback is handled by ChannelFactory (PR scylladb#890), so the control-reconnection path does not need to inject resolved-IP nodes into the query plan. Also adds getContactPoints() stub back to LoadBalancingPolicyWrapperTest so tests that cover the control-reconnect path continue to pass.
Before-init query plan now uses getContactPoints() (original unresolved hostname nodes) instead of getResolvedContactPoints(). The DNS expansion to all IPs happens at the ChannelFactory level (PR scylladb#890), so expanding here was redundant and broke should_connect_with_mocked_hostname by replacing hostname endpoints with resolved-IP endpoints. Also remove the should_connect_when_first_dns_entry_is_non_responsive integration test from this PR; it belongs in PR scylladb#890 where ChannelFactory expansion actually enables it to pass.
Pull request overview
Part 2/2 of DRIVER-201: extends the EndPoint API and ChannelFactory so that a hostname mapping to multiple IPs is tried address-by-address at the connection layer, instead of only the first IP. The EndPoint.resolve() method is deprecated in favor of a new resolveAll() default method; DefaultEndPoint, SniEndPoint, and ClientRoutesEndPoint override it; ChannelFactory.connect() now iterates over candidates and only fails when all are exhausted, while keeping protocol-version downgrade scoped to a single address.
Changes:
- Add EndPoint.resolveAll() (default impl delegating to deprecated resolve()); override in DefaultEndPoint, SniEndPoint, ClientRoutesEndPoint.
- Rework ChannelFactory.connect() into tryNextCandidate/connectToAddress so per-address failures fall back to the next IP while protocol-version downgrades stay scoped to one address.
- Add unit tests for DefaultEndPoint.resolveAll() and a new SniEndPointTest.
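The candidate fallback summarized above can be sketched independently of the driver. This is a minimal illustration, not the code in ChannelFactory: the class name `CandidateFallbackSketch`, the `connect` entry point, and the `connectToAddress` function parameter are assumptions made for the example, and all driver-specific arguments (sharding info, protocol version, metrics) are left out.

```java
import java.net.SocketAddress;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical stand-in for the candidate loop; names are illustrative only.
final class CandidateFallbackSketch {

  /** Tries candidates[index]; on failure moves on to the next candidate, if any. */
  static <T> void tryNextCandidate(
      SocketAddress[] candidates,
      int index,
      Function<SocketAddress, CompletableFuture<T>> connectToAddress,
      CompletableFuture<T> resultFuture) {
    connectToAddress
        .apply(candidates[index])
        .whenComplete(
            (channel, error) -> {
              if (error == null) {
                resultFuture.complete(channel);
              } else if (index + 1 < candidates.length) {
                // Per-address failure: fall back to the next address.
                tryNextCandidate(candidates, index + 1, connectToAddress, resultFuture);
              } else {
                // All candidates exhausted: surface the last error.
                resultFuture.completeExceptionally(error);
              }
            });
  }

  /** Assumed entry point: starts the iteration at the first candidate. */
  static <T> CompletableFuture<T> connect(
      SocketAddress[] candidates,
      Function<SocketAddress, CompletableFuture<T>> connectToAddress) {
    CompletableFuture<T> resultFuture = new CompletableFuture<>();
    if (candidates.length == 0) {
      resultFuture.completeExceptionally(
          new IllegalArgumentException("no candidate addresses"));
      return resultFuture;
    }
    tryNextCandidate(candidates, 0, connectToAddress, resultFuture);
    return resultFuture;
  }
}
```

In this sketch only the error from the last candidate reaches the caller; earlier failures would have to be logged or collected separately if they matter for diagnosis.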
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| core/src/main/java/com/datastax/oss/driver/api/core/metadata/EndPoint.java | Deprecates resolve(); adds default resolveAll() method. |
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/DefaultEndPoint.java | Overrides resolveAll() using InetAddress.getAllByName with single-address fallback. |
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/SniEndPoint.java | Overrides resolveAll() returning one address per sorted A-record. |
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/ClientRoutesEndPoint.java | Overrides resolveAll() to wrap the single topology-monitor address. |
| core/src/main/java/com/datastax/oss/driver/internal/core/channel/ChannelFactory.java | Adds candidate-iteration and per-address protocol-negotiation methods. |
| core/src/test/java/com/datastax/oss/driver/internal/core/metadata/DefaultEndPointTest.java | New tests for resolveAll() (resolved, unresolved expansion, unresolvable fallback). |
| core/src/test/java/com/datastax/oss/driver/internal/core/metadata/SniEndPointTest.java | New test class covering SNI resolveAll() happy path, unresolvable host, and resolve() sanity check. |
Comments suppressed due to low confidence (1)
core/src/main/java/com/datastax/oss/driver/internal/core/channel/ChannelFactory.java:303
- When connectToAddress fails with UnsupportedProtocolVersionException.forNegotiation (i.e. all protocol downgrades exhausted), tryNextCandidate will treat this like any other per-address failure and try the next IP, even though the protocol-negotiation failure is a server-wide condition that will recur on every other IP of the same node. This also reuses the shared attemptedVersions CopyOnWriteArrayList across candidates, so on each subsequent address the downgrade loop re-attempts the same protocol versions and adds duplicate entries, and the final exception ultimately reported will list each version multiple times. Consider distinguishing non-address-specific failures (UnsupportedProtocolVersionException, authentication errors, etc.) and short-circuiting the candidate loop in those cases.
```java
perAddressFuture.whenComplete(
    (channel, error) -> {
      if (error == null) {
        resultFuture.complete(channel);
      } else if (index + 1 < candidates.length) {
        LOG.debug(
            "[{}] Failed to connect to {} ({}), trying next address",
            logPrefix,
            candidate,
            error.getMessage());
        tryNextCandidate(
            endPoint,
            shardingInfo,
            shardId,
            options,
            nodeMetricUpdater,
            currentVersion,
            isNegotiating,
            attemptedVersions,
            resultFuture,
            candidates,
            index + 1);
      } else {
        // Note: might be completed already if the failure happened in initializer()
        resultFuture.completeExceptionally(error);
      }
    });
```
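The short-circuit suggested in the comment above could look roughly like the following. This is a sketch under assumptions, not a proposed patch: `NodeWideException` is a hypothetical stand-in for failures such as UnsupportedProtocolVersionException or authentication errors, and `isAddressSpecific` is an illustrative helper that does not exist in the driver.

```java
import java.net.SocketAddress;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the driver.
final class ShortCircuitSketch {

  /** Stand-in for failures that would recur on every address of the same node. */
  static final class NodeWideException extends RuntimeException {
    NodeWideException(String message) {
      super(message);
    }
  }

  /** Returns true if trying another address could plausibly help. */
  static boolean isAddressSpecific(Throwable error) {
    // Dependent stages wrap failures in CompletionException, so unwrap first.
    Throwable cause =
        (error instanceof CompletionException && error.getCause() != null)
            ? error.getCause()
            : error;
    return !(cause instanceof NodeWideException);
  }

  static <T> void tryNextCandidate(
      SocketAddress[] candidates,
      int index,
      Function<SocketAddress, CompletableFuture<T>> connectToAddress,
      CompletableFuture<T> resultFuture) {
    connectToAddress
        .apply(candidates[index])
        .whenComplete(
            (channel, error) -> {
              if (error == null) {
                resultFuture.complete(channel);
              } else if (isAddressSpecific(error) && index + 1 < candidates.length) {
                tryNextCandidate(candidates, index + 1, connectToAddress, resultFuture);
              } else {
                // Either a node-wide failure or no candidates left: stop here.
                resultFuture.completeExceptionally(error);
              }
            });
  }
}
```

With this check a node-wide failure on the first address ends the loop after a single attempt, so the remaining addresses are never contacted and the list of attempted protocol versions is not filled with duplicates.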
```java
    SocketAddress[] candidates;
    try {
      // Replaces the previous call: resolvedAddress = endPoint.resolve();
      candidates = endPoint.resolveAll();
    } catch (Exception e) {
      resultFuture.completeExceptionally(e);
      return;
    }

    tryNextCandidate(
        endPoint,
        shardingInfo,
        shardId,
        options,
        nodeMetricUpdater,
        currentVersion,
        isNegotiating,
        attemptedVersions,
        resultFuture,
        candidates,
        0);
  }
```
```java
  public SocketAddress[] resolveAll() {
    if (!address.isUnresolved()) {
      return new SocketAddress[] {address};
    }
    try {
      InetAddress[] all = InetAddress.getAllByName(address.getHostString());
      SocketAddress[] result = new SocketAddress[all.length];
      for (int i = 0; i < all.length; i++) {
        result[i] = new InetSocketAddress(all[i], address.getPort());
      }
      return result;
    } catch (UnknownHostException e) {
      // Fallback: return the single unresolved address; the connect attempt will fail with a
      // descriptive error rather than silently returning an empty array.
      return new SocketAddress[] {address};
    }
  }
```
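To try the three cases outside the driver (already-resolved passthrough, expansion of an unresolved address, fallback when DNS fails), the same logic can be lifted into a standalone helper. This is a sketch under assumptions: `ResolveAllSketch` is a hypothetical class, and the address is passed as a parameter instead of being read from a field.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.UnknownHostException;

// Hypothetical standalone variant of the method shown above.
final class ResolveAllSketch {

  static SocketAddress[] resolveAll(InetSocketAddress address) {
    if (!address.isUnresolved()) {
      // Already resolved: pass the same instance through.
      return new SocketAddress[] {address};
    }
    try {
      // Expand the host string to every address the resolver knows about.
      InetAddress[] all = InetAddress.getAllByName(address.getHostString());
      SocketAddress[] result = new SocketAddress[all.length];
      for (int i = 0; i < all.length; i++) {
        result[i] = new InetSocketAddress(all[i], address.getPort());
      }
      return result;
    } catch (UnknownHostException e) {
      // Fallback: keep the unresolved address so the connect attempt reports the failure.
      return new SocketAddress[] {address};
    }
  }
}
```

Passing `InetSocketAddress.createUnresolved("127.0.0.1", 9042)` exercises the expansion path without a DNS lookup, because `InetAddress.getAllByName` only validates the format of a literal IP address. A hostname that cannot be resolved comes back as the same unresolved instance.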
```java
   * @deprecated Use {@link #resolveAll()} instead. When a hostname maps to multiple IPs (e.g. in
   *     dynamic DNS environments) only one address is returned here, causing the driver to miss
   *     fallback IPs when the first one is unreachable. {@code resolveAll()} returns the full set.
   */
  @Deprecated
```
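The backward-compatibility argument for the default method can be illustrated with a trimmed-down interface. This is a sketch, not the driver's EndPoint: `SketchEndPoint` and `LegacyEndPoint` are hypothetical names, and the real interface declares more members than the two shown here.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;

/** Hypothetical, trimmed-down stand-in for the EndPoint interface. */
interface SketchEndPoint {

  /** Single-address lookup, kept only for existing implementations. */
  @Deprecated
  SocketAddress resolve();

  /** Default: wrap the single address so old implementations keep working. */
  default SocketAddress[] resolveAll() {
    return new SocketAddress[] {resolve()};
  }
}

/** A third-party style implementation written before resolveAll() existed. */
final class LegacyEndPoint implements SketchEndPoint {
  private final InetSocketAddress address;

  LegacyEndPoint(InetSocketAddress address) {
    this.address = address;
  }

  @Override
  public SocketAddress resolve() {
    return address;
  }
}
```

Because `LegacyEndPoint` does not override `resolveAll()`, callers that switch to the new method get a one-element array containing whatever `resolve()` returns, so such an implementation needs no source change.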
```java
  public void resolve_all_returns_all_proxy_addresses_for_resolvable_hostname() {
    // localhost reliably resolves to at least one address
    SniEndPoint endPoint =
        new SniEndPoint(new InetSocketAddress("localhost", 9042), "test-server-name");
    SocketAddress[] all = endPoint.resolveAll();
    assertThat(all).isNotEmpty();
    for (SocketAddress addr : all) {
      InetSocketAddress inet = (InetSocketAddress) addr;
      assertThat(inet.isUnresolved()).isFalse();
      assertThat(inet.getPort()).isEqualTo(9042);
    }
  }
```
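The test above checks resolution state and port, but not the ordering that the PR description mentions ("sorted by IP"). One deterministic ordering is a lexicographic comparison of the raw address bytes, with each byte treated as unsigned. The sketch below assumes that ordering; the snippets here do not show the comparator SniEndPoint actually uses. `IpSortSketch` and its helpers are hypothetical, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of a stable "sort by IP" ordering.
final class IpSortSketch {

  /** Orders addresses lexicographically by raw bytes, each byte read as unsigned. */
  static final Comparator<InetAddress> BY_RAW_BYTES =
      (a, b) -> Arrays.compareUnsigned(a.getAddress(), b.getAddress());

  /** Returns a sorted copy, leaving the input array untouched. */
  static InetAddress[] sorted(InetAddress[] addresses) {
    InetAddress[] copy = addresses.clone();
    Arrays.sort(copy, BY_RAW_BYTES);
    return copy;
  }

  /** Builds an IPv4 address from four octets without any DNS lookup. */
  static InetAddress ipv4(int a, int b, int c, int d) {
    try {
      return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
    } catch (UnknownHostException e) {
      // Only thrown for arrays of illegal length, which cannot happen here.
      throw new IllegalArgumentException(e);
    }
  }
}
```

The unsigned comparison matters because Java bytes are signed: with a signed comparison, 192.168.0.1 would sort before 10.0.0.1, since `(byte) 192` is negative.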
Problem
EndPoint.resolve() returns a single SocketAddress. When a hostname maps to multiple IPs, the driver can only try the first one at the connection layer. If that IP is unreachable the driver fails with AllNodesFailedException even though other IPs are available.
Fixes DRIVER-201 at the general connection layer.
Changes
- EndPoint interface: resolve() is now @Deprecated. resolveAll() default method returns SocketAddress[]. The default implementation wraps resolve() for backward compatibility with third-party implementations.
- DefaultEndPoint resolveAll(): for unresolved addresses calls InetAddress.getAllByName() and returns one InetSocketAddress per IP. Falls back to a single-element array (the unresolved address) if DNS fails, so the connect attempt surfaces a descriptive error.
- SniEndPoint resolveAll(): re-resolves the proxy hostname on each call, sorts all A-records by IP, and returns one InetSocketAddress per record, enabling the driver to try each proxy IP in sequence.
- ClientRoutesEndPoint resolveAll(): wraps the single topology-monitor-resolved address in a one-element array (single-address by design).
- ChannelFactory: connect() now calls endPoint.resolveAll() instead of endPoint.resolve(). tryNextCandidate() iterates through the returned array; on per-address failure it logs and tries the next; only fails the overall resultFuture when all candidates are exhausted. connectToAddress() scopes protocol-version negotiation (downgrade retries) to a single address, which is semantically correct.
Tests
- DefaultEndPointTest: 3 new cases (already-resolved passthrough, unresolved hostname expansion, unresolvable hostname fallback).
- SniEndPointTest: new class covering resolveAll() happy path, unresolvable host exception, and resolve() sanity check.
- ChannelFactory tests pass unchanged (LocalEndPoint uses the default single-element resolveAll() via the interface default).