feat: add EndPoint.resolveAll() for multi-address DNS expansion (DRIVER-201) — Part 2/2 #890

Draft
nikagra wants to merge 1 commit into scylladb:scylla-4.x from nikagra:fix/DRIVER-201-endpoint-resolve-all

@nikagra nikagra commented May 15, 2026

Problem

EndPoint.resolve() returns a single SocketAddress. When a hostname maps to multiple IPs, the driver can only try the first one at the connection layer. If that IP is unreachable, the driver fails with AllNodesFailedException even though other IPs are available.

Fixes DRIVER-201 at the general connection layer.

Note: This is part 2 of 2. Part 1 (#889) fixes the initial contact endpoints by expanding hostnames in the load-balancing query plan. This PR fixes the underlying EndPoint API and ChannelFactory so that any connection attempt — pool connections, reconnections, cloud SNI — benefits from multi-address fallback.

Changes

EndPoint interface

  • resolve() is now @Deprecated.
  • New resolveAll() default method returns SocketAddress[]. The default implementation wraps resolve() for backward compatibility with third-party implementations.
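
The backward-compatibility idea can be sketched in isolation. The interface below is a hypothetical stand-in (`SketchEndPoint`), not the driver's actual `EndPoint`; it only models the two resolution methods described above and assumes an implementation that predates `resolveAll()`.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;

// Hypothetical stand-in for the driver's EndPoint interface; only the two
// resolution methods discussed in this PR are modeled.
interface SketchEndPoint {
  /** Legacy single-address resolution. */
  @Deprecated
  SocketAddress resolve();

  /** Default: wrap resolve() so implementations that predate resolveAll() keep working. */
  default SocketAddress[] resolveAll() {
    return new SocketAddress[] {resolve()};
  }
}

class DefaultResolveAllDemo {
  // A "third-party" implementation that only knows about resolve().
  static SketchEndPoint legacyEndPoint(String ip, int port) {
    return () -> new InetSocketAddress(ip, port);
  }

  public static void main(String[] args) {
    SocketAddress[] all = legacyEndPoint("127.0.0.1", 9042).resolveAll();
    System.out.println(all.length + " candidate(s): " + all[0]);
  }
}
```

Because the default method delegates to `resolve()`, a caller that switches to `resolveAll()` sees exactly one candidate for such implementations, so behavior is unchanged for them.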

DefaultEndPoint

  • Overrides resolveAll(): for an unresolved address, calls InetAddress.getAllByName() and returns one InetSocketAddress per IP. If DNS resolution fails, falls back to a single-element array containing the unresolved address, so the connect attempt surfaces a descriptive error.
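
A standalone approximation of this expansion, written as a static helper so it can be run without the driver. The class and method names (`ExpandAddressSketch`, `expand`) are illustrative only. The demo uses an IP literal as the "hostname" so that no real DNS lookup is needed.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.UnknownHostException;

class ExpandAddressSketch {
  // Expands an unresolved address into one resolved address per IP.
  static SocketAddress[] expand(InetSocketAddress address) {
    if (!address.isUnresolved()) {
      // Already resolved: nothing to expand.
      return new SocketAddress[] {address};
    }
    try {
      InetAddress[] all = InetAddress.getAllByName(address.getHostString());
      SocketAddress[] result = new SocketAddress[all.length];
      for (int i = 0; i < all.length; i++) {
        result[i] = new InetSocketAddress(all[i], address.getPort());
      }
      return result;
    } catch (UnknownHostException e) {
      // Fallback: hand back the unresolved address so the connect attempt
      // itself reports the failure.
      return new SocketAddress[] {address};
    }
  }

  public static void main(String[] args) {
    InetSocketAddress unresolved = InetSocketAddress.createUnresolved("127.0.0.1", 9042);
    for (SocketAddress candidate : expand(unresolved)) {
      System.out.println(candidate);
    }
  }
}
```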

SniEndPoint

  • Overrides resolveAll(): re-resolves the proxy hostname on each call, sorts all A-records by IP, and returns one InetSocketAddress per record — enabling the driver to try each proxy IP in sequence.
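
One way the "sort by IP, one address per record" step might look. The ordering used here (unsigned lexicographic comparison of the raw address bytes) is an assumption, since the PR description does not spell out the comparator; all names in the sketch are hypothetical. The records are built by hand rather than looked up, so the example does not depend on DNS.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Comparator;

class SortedProxyAddressesSketch {
  // Assumed ordering: unsigned, byte-by-byte comparison of the raw address.
  static final Comparator<InetAddress> BY_IP =
      (a, b) -> compareUnsigned(a.getAddress(), b.getAddress());

  static int compareUnsigned(byte[] x, byte[] y) {
    int n = Math.min(x.length, y.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(x.length, y.length);
  }

  // Returns one socket address per record, in sorted IP order.
  static SocketAddress[] toSortedCandidates(InetAddress[] records, int port) {
    InetAddress[] sorted = records.clone();
    Arrays.sort(sorted, BY_IP);
    SocketAddress[] result = new SocketAddress[sorted.length];
    for (int i = 0; i < sorted.length; i++) {
      result[i] = new InetSocketAddress(sorted[i], port);
    }
    return result;
  }

  // Builds an IPv4 address from its four octets without any DNS lookup.
  static InetAddress ip(int a, int b, int c, int d) {
    try {
      return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    InetAddress[] records = {ip(10, 0, 0, 200), ip(10, 0, 0, 3)};
    for (SocketAddress candidate : toSortedCandidates(records, 9042)) {
      System.out.println(candidate);
    }
  }
}
```

A deterministic order means every call returns the proxy IPs in the same sequence regardless of the order the resolver happens to return them in.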

ClientRoutesEndPoint

  • Overrides resolveAll(): wraps the single topology-monitor-resolved address in a one-element array (single-address by design).

ChannelFactory

  • connect() now calls endPoint.resolveAll() instead of endPoint.resolve().
  • New tryNextCandidate() iterates through the returned array; on per-address failure it logs and tries the next; only fails the overall resultFuture when all candidates are exhausted.
  • New connectToAddress() scopes protocol-version negotiation (downgrade retries) to a single address, which is semantically correct.
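
The candidate iteration can be illustrated with plain CompletableFuture chaining. This is a simplified sketch, not the ChannelFactory code: the `connector` function stands in for a per-address connect (including any protocol negotiation), and `onlyPortAccepts` is a fake connector invented for the demo.

```java
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class CandidateFallbackSketch {
  // Tries candidates in order; completes with the first success, or with the
  // last error once every candidate has failed.
  static <C> CompletableFuture<C> connectFirstAvailable(
      SocketAddress[] candidates, Function<SocketAddress, CompletableFuture<C>> connector) {
    CompletableFuture<C> result = new CompletableFuture<>();
    if (candidates.length == 0) {
      result.completeExceptionally(new IllegalArgumentException("no candidate addresses"));
    } else {
      tryNext(candidates, 0, connector, result);
    }
    return result;
  }

  private static <C> void tryNext(
      SocketAddress[] candidates,
      int index,
      Function<SocketAddress, CompletableFuture<C>> connector,
      CompletableFuture<C> result) {
    connector
        .apply(candidates[index])
        .whenComplete(
            (channel, error) -> {
              if (error == null) {
                result.complete(channel);
              } else if (index + 1 < candidates.length) {
                tryNext(candidates, index + 1, connector, result);
              } else {
                result.completeExceptionally(error);
              }
            });
  }

  // Fake connector for the demo: only the given port "accepts" connections.
  static Function<SocketAddress, CompletableFuture<String>> onlyPortAccepts(int port) {
    return address -> {
      InetSocketAddress inet = (InetSocketAddress) address;
      CompletableFuture<String> attempt = new CompletableFuture<>();
      if (inet.getPort() == port) {
        attempt.complete("channel:" + inet.getHostString());
      } else {
        attempt.completeExceptionally(new ConnectException("refused: " + inet.getHostString()));
      }
      return attempt;
    };
  }

  public static void main(String[] args) {
    SocketAddress[] candidates = {
      InetSocketAddress.createUnresolved("proxy-a", 9042),
      InetSocketAddress.createUnresolved("proxy-b", 9043)
    };
    // prints "channel:proxy-b"
    System.out.println(connectFirstAvailable(candidates, onlyPortAccepts(9043)).join());
  }
}
```

Keeping the whole per-address attempt behind one function mirrors the split described above: whatever retries happen inside `connector` stay tied to the address it was given.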

Tests

  • DefaultEndPointTest: 3 new cases — already-resolved passthrough, unresolved hostname expansion, unresolvable hostname fallback.
  • SniEndPointTest: new class covering resolveAll() happy path, unresolvable host exception, and resolve() sanity check.
  • All 13 existing ChannelFactory tests pass unchanged (LocalEndPoint uses the default single-element resolveAll() via the interface default).

…ER-201)

Addresses the endpoint-API aspect of DRIVER-201.

Problem: EndPoint.resolve() returns a single SocketAddress. When a
hostname maps to multiple IPs, the driver can only try the first one
and fails with AllNodesFailedException if it is unreachable — the
remaining IPs are invisible to the connection layer.

Solution (per @dkropachev's architectural direction):
- Deprecate EndPoint.resolve(). Add EndPoint.resolveAll() with a default
  implementation that wraps resolve() in a single-element array for
  backward compatibility with third-party implementations.
- DefaultEndPoint.resolveAll(): if the stored InetSocketAddress is
  unresolved, calls InetAddress.getAllByName() to expand the hostname
  to all known IPs, returning one InetSocketAddress per IP. Falls back
  to the single-element unresolved address if DNS fails, so the connect
  attempt surfaces a descriptive error rather than returning empty.
- SniEndPoint.resolveAll(): re-resolves the proxy hostname on each call
  and returns all A-records sorted by IP, enabling the caller to try
  each proxy address in sequence.
- ClientRoutesEndPoint.resolveAll(): delegates to resolve() (single-
  address topology-monitor lookup) and wraps in a one-element array.
- ChannelFactory.connect(): replaced endPoint.resolve() with
  endPoint.resolveAll(). Iterates through the returned candidates via
  tryNextCandidate(); on per-address failure logs and tries the next;
  only fails the overall resultFuture when all candidates are exhausted.
  Protocol-version negotiation (downgrade retries) is scoped to the
  same address via connectToAddress(), which is semantically correct.

Tests:
- DefaultEndPointTest: 3 new cases — already-resolved passthrough,
  unresolved hostname expansion, unresolvable hostname fallback.
- SniEndPointTest: new class with cases for resolveAll() happy path,
  unresolvable host exception, and resolve() sanity check.
- All 13 existing ChannelFactory tests continue to pass (LocalEndPoint
  uses the default single-element resolveAll() via the interface default).
@nikagra nikagra marked this pull request as draft May 15, 2026 18:18
nikagra added a commit to nikagra/java-driver that referenced this pull request May 15, 2026
…VER-201)

newControlReconnectionQueryPlan() now creates copies of the original
contact-point nodes (with their unresolved hostname endpoints) instead
of synthetic nodes with resolved IPs. This ensures the control channel
carries the hostname endpoint, which is preserved in metadata after
topology refresh.

DNS expansion for connection fallback is handled by ChannelFactory
(PR scylladb#890), so the control-reconnection path does not need to inject
resolved-IP nodes into the query plan.

Also adds getContactPoints() stub back to LoadBalancingPolicyWrapperTest
so tests that cover the control-reconnect path continue to pass.
nikagra added a commit to nikagra/java-driver that referenced this pull request May 15, 2026
Before-init query plan now uses getContactPoints() (original unresolved
hostname nodes) instead of getResolvedContactPoints(). The DNS expansion
to all IPs happens at the ChannelFactory level (PR scylladb#890), so expanding
here was redundant and broke should_connect_with_mocked_hostname by
replacing hostname endpoints with resolved-IP endpoints.

Also remove the should_connect_when_first_dns_entry_is_non_responsive
integration test from this PR; it belongs in PR scylladb#890 where ChannelFactory
expansion actually enables it to pass.
@nikagra nikagra requested a review from Copilot May 19, 2026 23:02

Copilot AI left a comment

Pull request overview

Part 2/2 of DRIVER-201: extends the EndPoint API and ChannelFactory so that a hostname mapping to multiple IPs is tried address-by-address at the connection layer, instead of only the first IP. The EndPoint.resolve() method is deprecated in favor of a new resolveAll() default method; DefaultEndPoint, SniEndPoint, and ClientRoutesEndPoint override it; ChannelFactory.connect() now iterates over candidates and only fails when all are exhausted, while keeping protocol-version downgrade scoped to a single address.

Changes:

  • Add EndPoint.resolveAll() (default impl delegating to deprecated resolve()); override in DefaultEndPoint, SniEndPoint, ClientRoutesEndPoint.
  • Rework ChannelFactory.connect() into tryNextCandidate / connectToAddress so per-address failures fall back to the next IP while protocol-version downgrades stay scoped to one address.
  • Add unit tests for DefaultEndPoint.resolveAll() and a new SniEndPointTest.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| core/src/main/java/com/datastax/oss/driver/api/core/metadata/EndPoint.java | Deprecates resolve(); adds default resolveAll() method. |
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/DefaultEndPoint.java | Overrides resolveAll() using InetAddress.getAllByName with single-address fallback. |
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/SniEndPoint.java | Overrides resolveAll() returning one address per sorted A-record. |
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/ClientRoutesEndPoint.java | Overrides resolveAll() to wrap the single topology-monitor address. |
| core/src/main/java/com/datastax/oss/driver/internal/core/channel/ChannelFactory.java | Adds candidate-iteration and per-address protocol-negotiation methods. |
| core/src/test/java/com/datastax/oss/driver/internal/core/metadata/DefaultEndPointTest.java | New tests for resolveAll() (resolved, unresolved expansion, unresolvable fallback). |
| core/src/test/java/com/datastax/oss/driver/internal/core/metadata/SniEndPointTest.java | New test class covering SNI resolveAll() happy path, unresolvable host, and resolve() sanity check. |

Comments suppressed due to low confidence (1)

core/src/main/java/com/datastax/oss/driver/internal/core/channel/ChannelFactory.java:303

  • When connectToAddress fails with UnsupportedProtocolVersionException.forNegotiation (i.e. all protocol downgrades exhausted), tryNextCandidate will treat this like any other per-address failure and try the next IP, even though the protocol-negotiation failure is a server-wide condition that will recur on every other IP of the same node. This also reuses the shared attemptedVersions CopyOnWriteArrayList across candidates, so on each subsequent address the downgrade loop re-attempts the same protocol versions and adds duplicate entries, and the final exception ultimately reported will list each version multiple times. Consider distinguishing non-address-specific failures (UnsupportedProtocolVersionException, authentication errors, etc.) and short-circuiting the candidate loop in those cases.
    perAddressFuture.whenComplete(
        (channel, error) -> {
          if (error == null) {
            resultFuture.complete(channel);
          } else if (index + 1 < candidates.length) {
            LOG.debug(
                "[{}] Failed to connect to {} ({}), trying next address",
                logPrefix,
                candidate,
                error.getMessage());
            tryNextCandidate(
                endPoint,
                shardingInfo,
                shardId,
                options,
                nodeMetricUpdater,
                currentVersion,
                isNegotiating,
                attemptedVersions,
                resultFuture,
                candidates,
                index + 1);
          } else {
            // Note: might be completed already if the failure happened in initializer()
            resultFuture.completeExceptionally(error);
          }
        });
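
The short-circuit suggested in this comment could be expressed as a predicate that the candidate loop consults before moving on to the next address. This is only a sketch: the two exception classes are hypothetical stand-ins for the driver's protocol-negotiation and authentication errors, and which failures count as non-address-specific is an assumption.

```java
import java.net.ConnectException;
import java.util.concurrent.CompletionException;

class FailureClassificationSketch {
  // Hypothetical stand-in for a protocol-negotiation failure.
  static class ProtocolNegotiationFailure extends RuntimeException {
    ProtocolNegotiationFailure(String message) {
      super(message);
    }
  }

  // Hypothetical stand-in for an authentication failure.
  static class AuthFailure extends RuntimeException {
    AuthFailure(String message) {
      super(message);
    }
  }

  // Returns true when the failure looks specific to one address, so trying
  // the next candidate IP might help; false when it would likely recur.
  static boolean shouldTryNextAddress(Throwable error) {
    Throwable cause =
        (error instanceof CompletionException && error.getCause() != null)
            ? error.getCause()
            : error;
    return !(cause instanceof ProtocolNegotiationFailure || cause instanceof AuthFailure);
  }

  public static void main(String[] args) {
    System.out.println(shouldTryNextAddress(new ConnectException("connection refused")));
    System.out.println(shouldTryNextAddress(new ProtocolNegotiationFailure("no common version")));
  }
}
```

In a loop like the one quoted above, the `else if (index + 1 < candidates.length)` branch would additionally check such a predicate and fail the result future immediately when it returns false.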

Comment on lines +222 to +242
    SocketAddress[] candidates;
    try {
      resolvedAddress = endPoint.resolve();
      candidates = endPoint.resolveAll();
    } catch (Exception e) {
      resultFuture.completeExceptionally(e);
      return;
    }

    tryNextCandidate(
        endPoint,
        shardingInfo,
        shardId,
        options,
        nodeMetricUpdater,
        currentVersion,
        isNegotiating,
        attemptedVersions,
        resultFuture,
        candidates,
        0);
  }
Comment on lines +61 to +77
  public SocketAddress[] resolveAll() {
    if (!address.isUnresolved()) {
      return new SocketAddress[] {address};
    }
    try {
      InetAddress[] all = InetAddress.getAllByName(address.getHostString());
      SocketAddress[] result = new SocketAddress[all.length];
      for (int i = 0; i < all.length; i++) {
        result[i] = new InetSocketAddress(all[i], address.getPort());
      }
      return result;
    } catch (UnknownHostException e) {
      // Fallback: return the single unresolved address; the connect attempt will fail with a
      // descriptive error rather than silently returning an empty array.
      return new SocketAddress[] {address};
    }
  }
Comment on lines +39 to +43
   * @deprecated Use {@link #resolveAll()} instead. When a hostname maps to multiple IPs (e.g. in
   *     dynamic DNS environments) only one address is returned here, causing the driver to miss
   *     fallback IPs when the first one is unreachable. {@code resolveAll()} returns the full set.
   */
  @Deprecated
Comment on lines +30 to +41
  public void resolve_all_returns_all_proxy_addresses_for_resolvable_hostname() {
    // localhost reliably resolves to at least one address
    SniEndPoint endPoint =
        new SniEndPoint(new InetSocketAddress("localhost", 9042), "test-server-name");
    SocketAddress[] all = endPoint.resolveAll();
    assertThat(all).isNotEmpty();
    for (SocketAddress addr : all) {
      InetSocketAddress inet = (InetSocketAddress) addr;
      assertThat(inet.isUnresolved()).isFalse();
      assertThat(inet.getPort()).isEqualTo(9042);
    }
  }