Skip to content

fix: expand contact point hostnames to all DNS IPs at connection time (DRIVER-201) — Part 1/2#889

Draft
nikagra wants to merge 3 commits into
scylladb:scylla-4.xfrom
nikagra:fix/DRIVER-201-contact-point-dns-expansion
Draft

fix: expand contact point hostnames to all DNS IPs at connection time (DRIVER-201) — Part 1/2#889
nikagra wants to merge 3 commits into
scylladb:scylla-4.xfrom
nikagra:fix/DRIVER-201-contact-point-dns-expansion

Conversation

@nikagra
Copy link
Copy Markdown

@nikagra nikagra commented May 15, 2026

Problem

When RESOLVE_CONTACT_POINTS=false (the default), a contact point hostname was stored as a single unresolved InetSocketAddress. At connection time the load-balancing query plan contained exactly one Node per hostname, so only the first IP returned by DNS was ever tried. If that IP was non-responsive the driver raised AllNodesFailedException with no fallback to other IPs the hostname might resolve to.

This is particularly impactful in dynamic DNS environments where a hostname maps to multiple nodes and the first one may be temporarily unavailable.

Fixes DRIVER-201.

Note: This is part 1 of 2, implementing the initial contact endpoints fix per @dkropachev's architectural direction. Part 2 will expand the EndPoint API with resolveAll() to address the broader connection path.

Changes

DefaultDriverOption / TypedDriverOption

  • RESOLVE_CONTACT_POINTS is now @Deprecated. Contact points are always kept as unresolved hostnames; the option is a no-op.

SessionBuilder

  • Removed the RESOLVE_CONTACT_POINTS read; ContactPoints.merge() is now always called with resolve=false, deferring all DNS expansion to connection time.

MetadataManager

  • New getResolvedContactPoints() method: for each contact point backed by an unresolved hostname, calls InetAddress.getAllByName() to obtain all IPs and creates a synthetic DefaultNode for each. Already-resolved and non-InetSocketAddress endpoints are returned as-is.

LoadBalancingPolicyWrapper

  • newQueryPlan() (in BEFORE_INIT/DURING_INIT states) and newControlReconnectionQueryPlan() now call getResolvedContactPoints() instead of getContactPoints(), so the query plan contains one entry per resolved IP. The driver naturally falls back to the next candidate if one is unreachable.

Tests

  • 4 new MetadataManagerTest cases covering: null state, already-resolved passthrough, single-hostname DNS expansion, multi-endpoint expansion.
  • LoadBalancingPolicyWrapperTest updated to stub getResolvedContactPoints().
  • New MockResolverIT.should_connect_when_first_dns_entry_is_non_responsive: 2-node CCM cluster on 127.0.1.x; first DNS entry points to a non-existent IP (127.0.1.11); asserts session opens successfully against the real nodes.

… (DRIVER-201)

Addresses the initial-contact-endpoints aspect of DRIVER-201.

Problem: with RESOLVE_CONTACT_POINTS=false (the default), a contact
point hostname was stored as a single unresolved InetSocketAddress.
At connection time the load-balancing query plan contained exactly one
Node per hostname, so only the first IP returned by DNS was ever tried.
If that IP was non-responsive the driver raised AllNodesFailedException
with no fallback to other IPs the hostname might resolve to.

Solution (per @dkropachev's architectural direction):
- Deprecate DefaultDriverOption.RESOLVE_CONTACT_POINTS. Contact points
  are now always kept as unresolved hostnames (resolve=false is hardcoded
  in SessionBuilder), deferring DNS expansion to connection time.
- Add MetadataManager.getResolvedContactPoints(): for each contact point
  backed by an unresolved hostname it calls InetAddress.getAllByName() to
  expand the hostname to all known IPs, creating a synthetic DefaultNode
  for each IP. Already-resolved or non-InetSocketAddress endpoints pass
  through unchanged.
- LoadBalancingPolicyWrapper now calls getResolvedContactPoints() instead
  of getContactPoints() in newQueryPlan() (BEFORE/DURING_INIT states) and
  newControlReconnectionQueryPlan(), so the query plan contains one node
  per resolved IP and the driver naturally falls back to the next IP when
  one is unreachable.

Tests:
- 4 new MetadataManagerTest cases covering null state, already-resolved
  passthrough, single-hostname expansion, and multi-endpoint expansion.
- LoadBalancingPolicyWrapperTest updated to stub getResolvedContactPoints().
- New MockResolverIT.should_connect_when_first_dns_entry_is_non_responsive
  integration test: first DNS entry is a non-existent IP, session must
  open successfully against the remaining real IPs.
nikagra added 2 commits May 15, 2026 20:37
…VER-201)

newControlReconnectionQueryPlan() now creates copies of the original
contact-point nodes (with their unresolved hostname endpoints) instead
of synthetic nodes with resolved IPs. This ensures the control channel
carries the hostname endpoint, which is preserved in metadata after
topology refresh.

DNS expansion for connection fallback is handled by ChannelFactory
(PR scylladb#890), so the control-reconnection path does not need to inject
resolved-IP nodes into the query plan.

Also adds getContactPoints() stub back to LoadBalancingPolicyWrapperTest
so tests that cover the control-reconnect path continue to pass.
Before-init query plan now uses getContactPoints() (original unresolved
hostname nodes) instead of getResolvedContactPoints(). The DNS expansion
to all IPs happens at the ChannelFactory level (PR scylladb#890), so expanding
here was redundant and broke should_connect_with_mocked_hostname by
replacing hostname endpoints with resolved-IP endpoints.

Also remove the should_connect_when_first_dns_entry_is_non_responsive
integration test from this PR; it belongs in PR scylladb#890 where ChannelFactory
expansion actually enables it to pass.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to address DRIVER-201 by ensuring a hostname contact point can be tried via all DNS-resolved IPs (instead of only the first), and begins deprecating RESOLVE_CONTACT_POINTS as a no-op to preserve unresolved hostnames until connection time.

Changes:

  • Deprecates RESOLVE_CONTACT_POINTS in DefaultDriverOption / TypedDriverOption and stops reading it in SessionBuilder (always merges contact points with resolve=false).
  • Adds MetadataManager#getResolvedContactPoints() plus unit tests that validate DNS expansion / passthrough behavior.
  • Updates comments/tests around contact-point handling in load-balancing wrapper tests.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
core/src/main/java/com/datastax/oss/driver/internal/core/metadata/MetadataManager.java Adds getResolvedContactPoints() that expands unresolved hostname contact points to all DNS IPs.
core/src/main/java/com/datastax/oss/driver/internal/core/metadata/LoadBalancingPolicyWrapper.java Adjusts comments around contact point handling (but currently still uses getContactPoints() in query plan construction).
core/src/main/java/com/datastax/oss/driver/api/core/session/SessionBuilder.java Stops honoring RESOLVE_CONTACT_POINTS; always stores contact points as unresolved.
core/src/main/java/com/datastax/oss/driver/api/core/config/TypedDriverOption.java Marks RESOLVE_CONTACT_POINTS deprecated with updated Javadoc.
core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java Marks RESOLVE_CONTACT_POINTS deprecated with updated Javadoc.
core/src/test/java/com/datastax/oss/driver/internal/core/metadata/MetadataManagerTest.java Adds tests for getResolvedContactPoints() behavior (null state, passthrough, DNS expansion).
core/src/test/java/com/datastax/oss/driver/internal/core/metadata/LoadBalancingPolicyWrapperTest.java Updates an assertion comment related to appended contact points.
Comments suppressed due to low confidence (1)

core/src/main/java/com/datastax/oss/driver/internal/core/metadata/LoadBalancingPolicyWrapper.java:176

  • The new comment says DNS expansion is handled by ChannelFactory at connection time, but ChannelFactory.connect() currently just calls endPoint.resolve() once and bootstrap.connect(resolvedAddress), which won’t iterate over all A/AAAA records for an unresolved InetSocketAddress. Either update this logic to actually use getResolvedContactPoints() here, or implement/point to the connection-time DNS expansion in ChannelFactory; as-is, the comment (and intended fallback behavior) is incorrect.
      // Use the original (potentially unresolved) contact-point endpoints so that the control
      // connection channel retains the hostname, preserving hostname-based node identity in
      // metadata. DNS expansion to all IPs for each hostname is handled by ChannelFactory at
      // actual connection time.
      Set<DefaultNode> originalNodes = context.getMetadataManager().getContactPoints();

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

case BEFORE_INIT:
case DURING_INIT:
// The contact points are not stored in the metadata yet:
List<Node> nodes = new ArrayList<>(context.getMetadataManager().getContactPoints());
Comment on lines +938 to +941
// RESOLVE_CONTACT_POINTS is deprecated: contact points are always kept as unresolved
// hostnames and expanded to all their DNS IPs lazily at connection time.
Set<EndPoint> contactPoints =
ContactPoints.merge(programmaticContactPoints, configContactPoints, resolveAddresses);
ContactPoints.merge(programmaticContactPoints, configContactPoints, false);
assertThat(queryPlan.poll()).isEqualTo(node1);
// Remaining nodes are contact points appended at the end.
// They are new DefaultNode instances created via newContactPoint, so compare by endpoint.
// Remaining nodes are the resolved contact points appended at the end.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants