29 commits
00b8d4b
Add design doc: Result Set Heartbeat / Keep-Alive
gopalldb Apr 22, 2026
6130304
Implement result set heartbeat / keep-alive (PECOBLR-2321)
gopalldb Apr 22, 2026
d3f0c47
Fix: don't exclude metadata results from heartbeat eligibility
gopalldb Apr 22, 2026
cedf144
Skip heartbeat for direct results (CLOSED state)
gopalldb Apr 22, 2026
5d9aadc
Document async execution heartbeat policy + update eligibility table
gopalldb Apr 22, 2026
fa92347
Add changelog entry for result set heartbeat feature
gopalldb Apr 22, 2026
291be6c
Add heartbeat eligibility tests + skip async PENDING/RUNNING
gopalldb Apr 22, 2026
fe84cc4
Fix thread-safety and robustness issues in heartbeat
gopalldb Apr 22, 2026
53db645
Address should-fix review feedback
gopalldb Apr 22, 2026
994dbc2
Address test gaps + add DEBUG log on heartbeat start
gopalldb Apr 22, 2026
a51bccc
Add e2e integration test for heartbeat against real warehouse
gopalldb Apr 22, 2026
723ce06
Fix all critical and high-severity heartbeat review findings
gopalldb May 11, 2026
7534523
update merge conflict
gopalldb May 11, 2026
d24d62a
Address additional heartbeat review feedback
gopalldb May 11, 2026
7631ee3
Add missing heartbeat tests for concurrency, sentinel flag, and cance…
gopalldb May 11, 2026
469a459
Merge branch 'main' into design/heartbeat-keep-alive
gopalldb May 11, 2026
a037ae1
Proactive heartbeat stop when all data fetched from server
gopalldb May 12, 2026
b881d4c
Add tests for isAllDataFetched() across all implementations
gopalldb May 12, 2026
1910332
Merge branch 'main' into design/heartbeat-keep-alive
gopalldb May 12, 2026
a859117
Merge branch 'main' into design/heartbeat-keep-alive
gopalldb May 12, 2026
458aa5e
Address review feedback on isAllDataFetched and NPE guards
gopalldb May 12, 2026
a40466a
Remove isAllDataFetched — heartbeat stops when next() returns false
gopalldb May 12, 2026
832a1c8
Add coverage tests for heartbeat config and checkStatementAlive
gopalldb May 12, 2026
fc46126
Add more coverage tests to push past 85% threshold
gopalldb May 12, 2026
9798aa5
Merge branch 'main' into design/heartbeat-keep-alive
gopalldb May 15, 2026
0a46f52
Address May 12 review feedback: hashCode, thread names, Throwable, un…
gopalldb May 15, 2026
fd4e90a
Downgrade unsupported heartbeat log from INFO to DEBUG
gopalldb May 15, 2026
75be9b7
Use lightweight /status endpoint for SEA heartbeat instead of full Ge…
gopalldb May 15, 2026
c4a190d
Merge remote-tracking branch 'upstream/main' into design/heartbeat-ke…
gopalldb May 15, 2026
1 change: 1 addition & 0 deletions NEXT_CHANGELOG.md
@@ -3,6 +3,7 @@
## [Unreleased]

### Added
- Added result set heartbeat / keep-alive to prevent server-side result expiry during slow consumption. When enabled via `EnableHeartbeat=1`, the driver periodically polls `GetStatementStatus` (SEA) or `GetOperationStatus` (Thrift) to keep the operation alive while the client reads results. Configurable interval via `HeartbeatIntervalSeconds` (default 60s). Heartbeat automatically stops when results are fully consumed, ResultSet is closed, or the server returns a terminal state. Disabled by default due to cost implications (heartbeats keep the warehouse running).
- Added `CallableStatement` support with IN parameters. `Connection.prepareCall()` now returns a working `DatabricksCallableStatement` that supports positional parameter binding and execution via `{call proc(?)}` JDBC escape syntax. OUT/INOUT parameters and named parameters throw `SQLFeatureNotSupportedException`.
- Added AI coding agent detection to the User-Agent header. When the driver is invoked by a known AI coding agent (e.g. Claude Code, Cursor, Gemini CLI), `agent/<product>` is appended to the User-Agent string.

535 changes: 535 additions & 0 deletions docs/design/HEARTBEAT_KEEP_ALIVE.md

Large diffs are not rendered by default.

@@ -38,6 +38,7 @@ public class DatabricksConnection implements IDatabricksConnection, IDatabricksC
  private final Set<IDatabricksStatementInternal> statementSet = ConcurrentHashMap.newKeySet();
  private SQLWarning warnings = null;
  private final IDatabricksConnectionContext connectionContext;
  private final ResultHeartbeatManager heartbeatManager;

  /**
   * Creates an instance of Databricks connection for given connection context.
@@ -49,6 +50,7 @@ public DatabricksConnection(IDatabricksConnectionContext connectionContext)
    this.connectionContext = connectionContext;
    DatabricksThreadContextHolder.setConnectionContext(connectionContext);
    this.session = new DatabricksSession(connectionContext);
    this.heartbeatManager = createHeartbeatManager(connectionContext);
  }

  @VisibleForTesting
@@ -58,10 +60,27 @@ public DatabricksConnection(
    this.connectionContext = connectionContext;
    DatabricksThreadContextHolder.setConnectionContext(connectionContext);
    this.session = new DatabricksSession(connectionContext, testDatabricksClient);
    this.heartbeatManager = createHeartbeatManager(connectionContext);
    UserAgentManager.setUserAgent(connectionContext);
    TelemetryHelper.updateTelemetryAppName(connectionContext, null);
  }

  private static ResultHeartbeatManager createHeartbeatManager(
      IDatabricksConnectionContext connectionContext) {
    if (connectionContext instanceof DatabricksConnectionContext) {

Collaborator

[HIGH] instanceof DatabricksConnectionContext silently disables heartbeat for any other context impl

isHeartbeatEnabled() and getHeartbeatIntervalSeconds() live on the concrete class DatabricksConnectionContext, not on the IDatabricksConnectionContext interface. Any test mock, test double, or alternate implementation of IDatabricksConnectionContext falls through to return null — heartbeat silently disabled.

This pattern also makes the feature impossible to enable from any future context implementation (e.g., a wrapped/decorated context for telemetry or testing) without modifying this exact instanceof check.

Fix: Add the two methods to IDatabricksConnectionContext with default impls and drop the instanceof:

// IDatabricksConnectionContext.java
default boolean isHeartbeatEnabled() { return false; }
default int getHeartbeatIntervalSeconds() { return 60; }

Then this method becomes:

private static ResultHeartbeatManager createHeartbeatManager(IDatabricksConnectionContext ctx) {
  if (ctx.isHeartbeatEnabled()) {
    return new ResultHeartbeatManager(ctx.getHeartbeatIntervalSeconds());
  }
  return null;
}
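
A minimal, self-contained sketch of the suggestion above. The types `ConnectionContextSketch`, `HeartbeatManagerSketch`, and `HeartbeatFactorySketch` are hypothetical stand-ins for the driver's `IDatabricksConnectionContext`, `ResultHeartbeatManager`, and the factory method; they only illustrate how interface defaults let any implementation opt in without an `instanceof` check.

```java
// Hypothetical stand-in for IDatabricksConnectionContext (assumption, not driver code).
interface ConnectionContextSketch {
  // Defaults keep existing mocks and decorators compiling, with heartbeat off.
  default boolean isHeartbeatEnabled() {
    return false;
  }

  default int getHeartbeatIntervalSeconds() {
    return 60;
  }
}

// Hypothetical stand-in for ResultHeartbeatManager; it only records the interval.
final class HeartbeatManagerSketch {
  private final int intervalSeconds;

  HeartbeatManagerSketch(int intervalSeconds) {
    this.intervalSeconds = intervalSeconds;
  }

  int getIntervalSeconds() {
    return intervalSeconds;
  }
}

final class HeartbeatFactorySketch {
  // No instanceof: any context implementation can enable heartbeat by overriding the defaults.
  static HeartbeatManagerSketch create(ConnectionContextSketch ctx) {
    if (ctx.isHeartbeatEnabled()) {
      return new HeartbeatManagerSketch(ctx.getHeartbeatIntervalSeconds());
    }
    return null; // heartbeat disabled
  }
}
```

With this shape, a decorated or test context enables the feature by overriding the two methods, and a context that overrides nothing keeps the current disabled behaviour.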

      DatabricksConnectionContext ctx = (DatabricksConnectionContext) connectionContext;
      if (ctx.isHeartbeatEnabled()) {
        return new ResultHeartbeatManager(ctx.getHeartbeatIntervalSeconds());
      }
    }
    return null;
  }

  /** Returns the heartbeat manager, or null if heartbeat is disabled. */
  ResultHeartbeatManager getHeartbeatManager() {
    return heartbeatManager;
  }

  @Override
  public void open() throws SQLException {
    this.session.open();
@@ -420,6 +439,9 @@ public void close() throws SQLException {
      statement.close(false);
      statementSet.remove(statement);
    }
    if (heartbeatManager != null) {
      heartbeatManager.shutdown();

Collaborator

[HIGH] heartbeatManager.shutdown() is skipped if any statement.close() throws — scheduler + thread leak

for (IDatabricksStatementInternal statement : statementSet) {
  statement.close(false);          // makes RPCs — can throw
  statementSet.remove(statement);
}
if (heartbeatManager != null) {
  heartbeatManager.shutdown();     // never reached on throw above
}

statement.close(false) issues a closeStatement RPC — any network/server error throws SQLException out of this loop. The heartbeatManager.shutdown() and session.close() calls below it are skipped, leaking:

  • The ScheduledExecutorService daemon thread (yes, daemon — but still leaks until JVM exit)
  • All scheduled futures and references they hold (see the this-capture issue on DatabricksResultSet.java:330-376)

Fix: Wrap in try/finally so heartbeatManager.shutdown() always runs. Also catch per-statement exceptions so the loop completes:

try {
  for (IDatabricksStatementInternal statement : statementSet) {
    try { statement.close(false); } catch (Exception e) {
      LOGGER.warn("Error closing statement: {}", e.getMessage());
    }
    statementSet.remove(statement);
  }
} finally {
  if (heartbeatManager != null) {
    heartbeatManager.shutdown();
  }
}
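
The same close-ordering idea as a self-contained sketch. `CloseOrderSketch` is a hypothetical model, not the driver's `DatabricksConnection`: statements are plain `AutoCloseable`s and the heartbeat shutdown is a `Runnable`, so the behaviour can be exercised without a server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a connection that owns statements and a heartbeat scheduler.
final class CloseOrderSketch {
  // Concurrent key set: removing during iteration is safe (weakly consistent iterator).
  private final Set<AutoCloseable> statements = ConcurrentHashMap.newKeySet();
  private final Runnable heartbeatShutdown;
  private final List<String> closeErrors = new ArrayList<>();

  CloseOrderSketch(Runnable heartbeatShutdown) {
    this.heartbeatShutdown = heartbeatShutdown;
  }

  void register(AutoCloseable statement) {
    statements.add(statement);
  }

  void close() {
    try {
      for (AutoCloseable statement : statements) {
        try {
          statement.close(); // may throw, e.g. on a failed close RPC
        } catch (Exception e) {
          closeErrors.add(String.valueOf(e.getMessage())); // record and keep going
        }
        statements.remove(statement);
      }
    } finally {
      heartbeatShutdown.run(); // always runs, even if the loop above throws
    }
  }

  List<String> closeErrors() {
    return closeErrors;
  }

  int openStatements() {
    return statements.size();
  }
}
```

Catching per statement lets the loop finish; the `finally` block is the backstop for anything that still escapes.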

    }
    this.session.close();
    TelemetryClientFactory.getInstance().closeTelemetryClient(connectionContext);
    DatabricksClientConfiguratorManager.getInstance().removeInstance(connectionContext);
@@ -908,6 +908,26 @@ public boolean isTelemetryEnabled() {
    return getParameter(DatabricksJdbcUrlParams.ENABLE_TELEMETRY).equals("1");
  }

  public boolean isHeartbeatEnabled() {
    return getParameter(DatabricksJdbcUrlParams.ENABLE_HEARTBEAT).equals("1");
  }

  public int getHeartbeatIntervalSeconds() {
    int interval =
        Integer.parseInt(getParameter(DatabricksJdbcUrlParams.HEARTBEAT_INTERVAL_SECONDS));
    if (interval <= 0) {
      LOGGER.warn("HeartbeatIntervalSeconds must be positive, got {}. Using default 60.", interval);
      return 60;
    }
    if (interval > 3600) {
      LOGGER.warn(
          "HeartbeatIntervalSeconds {} is very large (> 1 hour). "
              + "Heartbeat may not keep the operation alive.",
          interval);
    }
    return interval;
  }

  @Override
  public String getVolumeOperationAllowedPaths() {
    return getParameter(
151 changes: 151 additions & 0 deletions src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java
@@ -21,6 +21,7 @@
import com.databricks.jdbc.common.Nullable;
import com.databricks.jdbc.common.StatementType;
import com.databricks.jdbc.common.util.WarningUtil;
import com.databricks.jdbc.dbclient.IDatabricksClient;
import com.databricks.jdbc.dbclient.impl.common.StatementId;
import com.databricks.jdbc.exception.DatabricksParsingException;
import com.databricks.jdbc.exception.DatabricksSQLException;
@@ -123,6 +124,7 @@ public DatabricksResultSet(
    this.cachedTelemetryCollector = resolveTelemetryCollector(parentStatement);
    this.isClosed = false;
    this.wasNull = false;
    startHeartbeatIfEnabled();

Collaborator

[CRITICAL] Heartbeat never starts on Thrift result sets — feature is dead-on-arrival on the Thrift path

The Thrift constructor (this method, lines 153-196) does not call startHeartbeatIfEnabled(). Only the SEA constructor at line 127 does.

All Thrift result sets are constructed via DatabricksThriftAccessor (executeStatement, getStatementResult, etc.) using this constructor — so on a transportMode=thrift connection with EnableHeartbeat=1, the manager is created and the eligibility logic correctly returns true for THRIFT_INLINE / THRIFT_ARROW_ENABLED, but no heartbeat ever starts.

Per the design doc's eligibility table, Thrift inline (data only on cluster, server-evictable) is one of the most critical scenarios this feature is meant to cover. It's silently broken.

The eligibility tests in ResultSetHeartbeatEligibilityTest.testThriftInlineIsEligible / testThriftArrowIsEligible mock the instance via reflection and bypass the constructor entirely, so they pass while production reality is broken.

Fix: Add startHeartbeatIfEnabled(); at the end of this constructor (line 196). Add a real-constructor smoke test that builds a Thrift DatabricksResultSet via the production constructor and asserts mgr.getActiveHeartbeatCount() == 1.

  }

  @VisibleForTesting
@@ -283,18 +285,167 @@ public boolean next() throws SQLException {
      cachedTelemetryCollector.recordResultSetIteration(
          statementId.toSQLExecStatementId(), resultSetMetaData.getChunkCount(), hasNext);
    }
    if (!hasNext) {
      stopHeartbeat();
    }
    return hasNext;
  }

  @Override
  public void close() throws DatabricksSQLException {
    stopHeartbeat();
    isClosed = true;
    this.executionResult.close();
    if (parentStatement != null) {
      parentStatement.handleResultSetClose(this);
    }
  }

  /** Starts heartbeat polling if enabled on the connection and this result set is eligible. */
  private void startHeartbeatIfEnabled() {
    if (parentStatement == null || statementId == null) {
      return;
    }
    if (!isHeartbeatEligible()) {
      return;
    }

    try {
      DatabricksConnection conn =
          (DatabricksConnection) parentStatement.getStatement().getConnection();

Collaborator

[CRITICAL] Pooled connections (HikariCP, DBCP, DatabricksPooledConnection) silently get NO heartbeat

This direct cast (DatabricksConnection) parentStatement.getStatement().getConnection() will throw ClassCastException for any pooled connection wrapper:

  • DatabricksPooledConnection returns a JDK dynamic Proxy declaring Connection.class, IDatabricksConnectionInternal.class (see DatabricksPooledConnection.java:155-158) — not DatabricksConnection.
  • HikariCP returns HikariProxyConnection; DBCP returns PoolGuardConnectionWrapper — same story.

The exception is swallowed by the outer catch (Exception e) { LOGGER.debug(...) } at line 384-386 (and again at line 401-402 for stopHeartbeat). Result: users opt in to EnableHeartbeat=1 on the most common Java connection pool deployment, get no protection, and see no error — just a DEBUG line they have to enable to find.

Fix (one of):

  1. connection.unwrap(DatabricksConnection.class) — works through the proxy via IDatabricksConnectionInternal.
  2. Add getHeartbeatManager() to IDatabricksConnectionInternal so the pool proxy forwards it transparently.

Option 2 is cleaner and matches how the rest of the driver handles pooled access.
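
A small sketch of why option 2 survives a pool proxy while the raw cast does not. All names here (`InternalConnectionSketch`, `PhysicalConnectionSketch`, `PoolProxySketch`) are hypothetical, and a `String` stands in for `ResultHeartbeatManager`; the proxy is a plain JDK dynamic proxy, which is what the comment says `DatabricksPooledConnection` returns.

```java
import java.lang.reflect.Proxy;

// Hypothetical stand-in for IDatabricksConnectionInternal with the accessor added.
interface InternalConnectionSketch {
  String getHeartbeatManager(); // String stands in for ResultHeartbeatManager
}

// Hypothetical stand-in for the physical DatabricksConnection.
final class PhysicalConnectionSketch implements InternalConnectionSketch {
  @Override
  public String getHeartbeatManager() {
    return "manager";
  }
}

final class PoolProxySketch {
  // Models a pool handing out a dynamic proxy that only declares the interface.
  static InternalConnectionSketch wrap(InternalConnectionSketch physical) {
    return (InternalConnectionSketch)
        Proxy.newProxyInstance(
            InternalConnectionSketch.class.getClassLoader(),
            new Class<?>[] {InternalConnectionSketch.class},
            (proxy, method, args) -> method.invoke(physical, args));
  }

  // Interface-based lookup: works for both the physical connection and the proxy.
  static String resolveManager(Object connection) {
    if (connection instanceof InternalConnectionSketch) {
      return ((InternalConnectionSketch) connection).getHeartbeatManager();
    }
    return null; // unknown wrapper: treat heartbeat as unavailable
  }
}
```

The proxy is not an instance of the concrete class, so a cast to it would throw `ClassCastException`, but the interface call is forwarded to the physical connection.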

      ResultHeartbeatManager mgr = conn.getHeartbeatManager();
      if (mgr == null) {
        return; // heartbeat not enabled
      }

      IDatabricksClient client = conn.getSession().getDatabricksClient();
      final int maxConsecutiveFailures = 10;
      final java.util.concurrent.atomic.AtomicInteger consecutiveFailures =
          new java.util.concurrent.atomic.AtomicInteger(0);
      // Get the stopped flag from the manager; it is shared between the heartbeat task and
      // stopHeartbeat(). Prevents RPC on a just-closed client/session: stopHeartbeat sets
      // the flag before cancel(false), so an in-flight tick sees it and skips the RPC.
      final java.util.concurrent.atomic.AtomicBoolean stopped = mgr.getStoppedFlag(statementId);

Collaborator

[CRITICAL] Orphan stopped flag — heartbeat RPC never actually fires

The stopped flag is captured here at line 328 before mgr.startHeartbeat(...) is called at line 378. Inside ResultHeartbeatManager.startHeartbeat():

// ResultHeartbeatManager.java
void startHeartbeat(StatementId statementId, Runnable heartbeatTask) {
  ...
  stopHeartbeat(statementId);              // line 63 — REMOVES this flag from map AND sets it to true
  getStoppedFlag(statementId).set(false);  // line 66 — computeIfAbsent creates a NEW AtomicBoolean
  ...
}

So the AtomicBoolean captured by the closure here is the removed/orphaned one — permanently set to true. The new flag in the map (which mgr.stopHeartbeat(...) later mutates from DatabricksResultSet.stopHeartbeat, Statement.close, Connection.close) is invisible to the closure.

Net effect: every tick, if (stopped.get()) return; short-circuits → client.checkStatementAlive(statementId) is never called. The whole feature is non-functional.

The integration test only passes because warehouses don't actually expire results in 15s — so the absence of heartbeats isn't observed.

Fix options (any one):

  1. Capture the flag after mgr.startHeartbeat(...) returns.
  2. Reuse the same AtomicBoolean in startHeartbeat/stopHeartbeat (don't remove from the map — just set(true)/set(false)).
  3. Have the closure call mgr.getStoppedFlag(statementId).get() per tick instead of holding a captured reference.

Add a unit test that asserts client.checkStatementAlive is invoked at least once via the production wiring — currently no such test exists.
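
The ordering problem can be reproduced in isolation. The sketch below is a hypothetical model (`StoppedFlagSketch`, with `String` statement IDs) that assumes the manager behaves as described in this comment: `stop` removes the flag and sets it, and `start` calls `stop` before re-arming through `computeIfAbsent`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the manager's stopped-flag bookkeeping (assumption, not driver code).
final class StoppedFlagSketch {
  private final ConcurrentHashMap<String, AtomicBoolean> flags = new ConcurrentHashMap<>();

  AtomicBoolean getStoppedFlag(String id) {
    return flags.computeIfAbsent(id, k -> new AtomicBoolean(false));
  }

  // Removes the flag from the map and marks the removed instance as stopped.
  void stop(String id) {
    AtomicBoolean flag = flags.remove(id);
    if (flag != null) {
      flag.set(true);
    }
  }

  // Stops any previous heartbeat, then arms a fresh flag for the new one.
  void start(String id) {
    stop(id);
    getStoppedFlag(id).set(false);
  }

  // Capturing before start(): the reference is the removed, permanently-true flag.
  static boolean capturedBeforeStartIsOrphaned() {
    StoppedFlagSketch mgr = new StoppedFlagSketch();
    AtomicBoolean captured = mgr.getStoppedFlag("stmt-1");
    mgr.start("stmt-1");
    return captured.get() && captured != mgr.getStoppedFlag("stmt-1");
  }

  // Capturing after start() (fix option 1): the reference is the live flag.
  static boolean capturedAfterStartIsLive() {
    StoppedFlagSketch mgr = new StoppedFlagSketch();
    mgr.start("stmt-1");
    AtomicBoolean captured = mgr.getStoppedFlag("stmt-1");
    return !captured.get() && captured == mgr.getStoppedFlag("stmt-1");
  }
}
```

Under these assumptions both helper methods return `true`, which matches the diagnosis: a flag captured before `start` can never become `false` again.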


      Runnable heartbeatTask =

Collaborator

[CRITICAL] Lambda strong-captures this — abandoned ResultSet keeps warehouse alive forever

This lambda invokes stopHeartbeat() (instance method, line 342, 373) and reads statementId (instance field, lines 336/340/352/353/358/367/369). Both implicitly capture this — the entire DatabricksResultSet, including executionResult (Arrow buffers, chunk providers, potentially MB of cached row data).

The future is held in ResultHeartbeatManager.activeHeartbeats for the connection's lifetime. So:

  • A user that does stmt.executeQuery(...).next() once and abandons the ResultSet reference (a real-world bug, but a JDBC driver shouldn't amplify it) will:
    • Never trigger next()→false or close() (the only auto-stop paths)
    • Have the entire ResultSet and its data retained until Connection.close() — typically hours in pooled environments
    • Have the heartbeat poll forever, holding the warehouse open and accumulating cost
  • This is the exact "cost forever" failure mode the design doc Requirements §3 explicitly tries to prevent.
  • It is also a denial-of-service amplifier: an app opening 10k orphaned result sets per hour holds 10k Arrow batches in heap until Connection.close().

The C# ADBC reference avoids this: its poller is per-statement with linked cancellation, so even GC of the statement helps. The Java implementation here is connection-scoped, so GC of the ResultSet alone won't help — the future keeps a hard reference back to the ResultSet.

Fix: Don't capture this. Pull statementId and mgr (or just Runnable stopFn = () -> mgr.stopHeartbeat(localStatementId)) into locals so the lambda has no implicit this reference. Verify with javap -p -c (the synthetic lambda$... method should be static; a lambda that captures this compiles to an instance method instead) or a simple unit test that holds a WeakReference<DatabricksResultSet> and asserts it's collectable after the strong reference is dropped.
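
One way to make the "no `this` capture" property hard to break is to build the task in a static factory, where there is no enclosing instance to capture. This is only a sketch: `NoCaptureTaskSketch` and its parameters are hypothetical, with a `BooleanSupplier` standing in for `client.checkStatementAlive(...)` and a `Consumer<String>` standing in for `mgr.stopHeartbeat(...)`.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical factory for the heartbeat task (assumption, not driver code).
final class NoCaptureTaskSketch {
  // Static method: the lambda can only close over these parameters,
  // so it cannot keep a result set (and its buffers) reachable.
  static Runnable heartbeatTask(
      String statementId, BooleanSupplier isAlive, Consumer<String> stop) {
    return () -> {
      if (!isAlive.getAsBoolean()) {
        stop.accept(statementId); // terminal state: stop via the manager, not via the result set
      }
    };
  }
}
```

The result set would pass its statement ID and manager-backed callbacks in, and the scheduled future then holds references only to those.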

          () -> {
            if (stopped.get()) {
              return; // client/session may be closed, skip RPC
            }
            try {
              boolean alive = client.checkStatementAlive(statementId);
              consecutiveFailures.set(0); // reset on success
              if (!alive) {
                LOGGER.info(
                    "Heartbeat detected terminal state for statement {}, stopping", statementId);
                stopped.set(true);
                stopHeartbeat();
              }
            } catch (Exception e) {

Collaborator

[Medium] catch (Exception) misses Error — silent heartbeat death without log or cleanup

The heartbeat lambda at line 374 catches Exception, not Throwable. Per the ScheduledExecutorService.scheduleWithFixedDelay javadoc:

If any execution of the task encounters an exception, subsequent executions are suppressed.

Error subclasses (OutOfMemoryError, NoClassDefFoundError, etc.) are not Exception subclasses, so they escape the catch — the scheduler then suppresses the recurring task.

Empirically demonstrated: a JUnit test wires a task that throws Error to ResultHeartbeatManager.startHeartbeat. After 3.5s with a 1s interval:

  • ticks = 1 (only one execution; the rest suppressed)
  • manager.getActiveHeartbeatCount() = 1 (entry leaked in activeHeartbeats)

The consequence is worse than swallowing exceptions: there's no consecutiveFailures increment, no max-failures WARN, no mgr.stopHeartbeat() cleanup. The heartbeat silently dies and the user has no idea their results may expire.

Fix:

} catch (Throwable t) {
  if (capturedMgr.getStoppedFlag(capturedStatementId).get()) return;
  // ... same failure-counter logic ...
  if (t instanceof Error && !(t instanceof VirtualMachineError)) {
    // log + stop cleanly; VirtualMachineError should still propagate
    capturedMgr.stopHeartbeat(capturedStatementId);
  }
  if (t instanceof VirtualMachineError) throw (VirtualMachineError) t;
}

(Or at minimum, change the catch to Throwable so the existing 10-strike path handles Error like any other failure.)
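
A compact sketch of the same policy as a reusable wrapper. `GuardedTaskSketch.guard` is a hypothetical helper: it routes every `Throwable` except `VirtualMachineError` to a failure callback, so a recurring task scheduled through it is not silently suppressed by an `Error`.

```java
import java.util.function.Consumer;

// Hypothetical wrapper for recurring tasks (assumption, not driver code).
final class GuardedTaskSketch {
  static Runnable guard(Runnable body, Consumer<Throwable> onFailure) {
    return () -> {
      try {
        body.run();
      } catch (VirtualMachineError e) {
        throw e; // e.g. OutOfMemoryError: let it propagate
      } catch (Throwable t) {
        onFailure.accept(t); // Exceptions and other Errors reach the failure-counter path
      }
    };
  }
}
```

In the heartbeat lambda, the callback would be the existing failure-counter logic plus the `stopHeartbeat` cleanup.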

              // If stopped was set during the RPC (connection closing), don't count as failure
              if (stopped.get()) {
                return;
              }
              int failures = consecutiveFailures.incrementAndGet();
              if (failures == 1) {

Collaborator

[Medium] Default SQLFeatureNotSupportedException retries 10× and emits misleading "results may expire" WARN

The default checkStatementAlive here throws SQLFeatureNotSupportedException("Heartbeat not supported by this client"). The heartbeat lambda in DatabricksResultSet.java:374-403 catches Exception (no instanceof short-circuit for the unsupported case) — so it counts each call as a transient failure.

Empirically verified with a JUnit test that builds a no-override IDatabricksClient via InvocationHandler.invokeDefault:

  • First call throws SQLFeatureNotSupportedException: "Heartbeat not supported by this client".
  • The heartbeat lambda body (grep on SQLFeatureNotSupportedException / instanceof SQL inside the lambda block) contains zero short-circuit — confirmed absent.

User-visible consequence: if anyone wires a custom IDatabricksClient impl without overriding checkStatementAlive, they get 1 INFO line, 9 DEBUG lines, and 1 misleading WARN over ~10 min (10 failures at the default 60s interval):

INFO  Heartbeat failed for statement <id> (first failure): Heartbeat not supported by this client
DEBUG Heartbeat failed for statement <id> (failure 2/10): Heartbeat not supported by this client
...
WARN  Heartbeat stopped for statement <id> after 10 consecutive failures.
      Server-side results may expire. Last error: Heartbeat not supported by this client

The WARN says "results may expire" — but the actual cause is a missing client-side override.

Fix options:

  1. Short-circuit in the lambda: treat SQLFeatureNotSupportedException as permanent → call mgr.stopHeartbeat(...) immediately, log a single WARN naming the offending class:
    catch (Exception e) {
      if (e instanceof SQLFeatureNotSupportedException) {
        LOGGER.warn("Heartbeat permanently disabled for statement {} — "
            + "client {} does not implement checkStatementAlive. "
            + "Set EnableHeartbeat=0 to silence.", capturedStatementId, client.getClass().getName());
        capturedMgr.stopHeartbeat(capturedStatementId);
        return;
      }
      // ... existing transient-failure logic ...
    }
  2. Improve the exception message: include this.getClass().getName() and a remediation hint pointing at EnableHeartbeat=0.
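
Option 1 amounts to classifying a failure before counting it. The sketch below shows that decision on its own; `FailurePolicySketch` and its `Action` enum are hypothetical names, and only `SQLFeatureNotSupportedException` is a real JDK type.

```java
import java.sql.SQLFeatureNotSupportedException;

// Hypothetical failure classifier for the heartbeat task (assumption, not driver code).
final class FailurePolicySketch {
  enum Action {
    RETRY, // transient failure, keep polling
    STOP_PERMANENT, // client cannot heartbeat at all, stop immediately
    STOP_EXHAUSTED // too many consecutive transient failures
  }

  static Action classify(Exception e, int consecutiveFailures, int maxConsecutiveFailures) {
    if (e instanceof SQLFeatureNotSupportedException) {
      return Action.STOP_PERMANENT; // retrying cannot help
    }
    return consecutiveFailures >= maxConsecutiveFailures ? Action.STOP_EXHAUSTED : Action.RETRY;
  }
}
```

Separating the two stop reasons also lets the log message say which one happened, instead of reusing the "results may expire" wording for a missing override.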

                // First failure: log at INFO so users see the initial problem
                LOGGER.info(
                    "Heartbeat failed for statement {} (first failure): {}",
                    statementId,
                    e.getMessage());
              } else {
                LOGGER.debug(
                    "Heartbeat failed for statement {} (failure {}/{}): {}",
                    statementId,
                    failures,
                    maxConsecutiveFailures,
                    e.getMessage());
              }
              if (failures >= maxConsecutiveFailures) {
                // Terminal failure: log at WARN so it's visible in default log config
                LOGGER.warn(
                    "Heartbeat stopped for statement {} after {} consecutive failures. "
                        + "Server-side results may expire. Last error: {}",
                    statementId,
                    failures,
                    e.getMessage());
                stopped.set(true);
                stopHeartbeat();
              }
            }
          };

      mgr.startHeartbeat(statementId, heartbeatTask);
      LOGGER.debug(
          "Heartbeat started for statement {} (resultType={}, interval={}s)",
          statementId,
          resultSetType,
          mgr.getIntervalSeconds());
    } catch (Exception e) {
      LOGGER.debug("Failed to start heartbeat: {}", e.getMessage());
    }
  }

  /** Stops the heartbeat for this result set's statement. Idempotent. */
  private void stopHeartbeat() {
    if (parentStatement == null || statementId == null) {
      return;
    }
    try {
      DatabricksConnection conn =

Collaborator

[High] stopHeartbeat() uses raw (DatabricksConnection) cast — pooled connections silently leak heartbeats

Asymmetric with the C3 fix on the start path. startHeartbeatIfEnabled at lines 322-333 correctly uses instanceof + unwrap(DatabricksConnection.class) to handle HikariCP / DBCP / DatabricksPooledConnection proxies. But stopHeartbeat at lines 421-423 still has:

DatabricksConnection conn =
    (DatabricksConnection) parentStatement.getStatement().getConnection();

On any pooled connection this throws ClassCastException, swallowed silently by the surrounding catch (Exception) { LOGGER.debug(...) }.

Empirical verification:

  • JUnit test inspects this method body — confirms raw cast with no unwrap() fallback.
  • Heartbeats started successfully under a pool (via the start-path unwrap) are never stopped via next() returns false or ResultSet.close(). They only terminate when the physical connection's heartbeatManager.shutdown() runs at pool eviction — which in pooled environments can be hours.

Fix: Extract a private DatabricksConnection resolveDatabricksConnection() helper that mirrors the start-path unwrap logic and call it from both startHeartbeatIfEnabled and stopHeartbeat.
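
A possible shape for such a shared helper, written against the standard `java.sql.Wrapper` contract so it runs without the driver. `ResolveConnectionSketch.resolve` is a hypothetical name; in the driver the target class would be `DatabricksConnection.class`, and whether each pool's proxy honours `unwrap` for that class is an assumption to verify.

```java
import java.sql.SQLException;
import java.sql.Wrapper;

// Hypothetical shared helper for the start and stop paths (assumption, not driver code).
final class ResolveConnectionSketch {
  /** Returns the target behind a possibly wrapped connection, or null if unreachable. */
  static <T> T resolve(Object connection, Class<T> target) {
    if (target.isInstance(connection)) {
      return target.cast(connection); // already the physical connection
    }
    if (connection instanceof Wrapper) {
      try {
        Wrapper wrapper = (Wrapper) connection;
        if (wrapper.isWrapperFor(target)) {
          return wrapper.unwrap(target); // pool proxy: ask it for the delegate
        }
      } catch (SQLException e) {
        // fall through: caller treats null as "heartbeat manager unavailable"
      }
    }
    return null;
  }
}
```

Calling the same helper from both paths keeps start and stop symmetric, so a heartbeat that could be started can also be stopped.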

          (DatabricksConnection) parentStatement.getStatement().getConnection();
      ResultHeartbeatManager mgr = conn.getHeartbeatManager();
      if (mgr != null) {
        mgr.stopHeartbeat(statementId);
      }
    } catch (Exception e) {
      LOGGER.debug("Failed to stop heartbeat: {}", e.getMessage());
    }
  }

  /**
   * Determines whether this result set is eligible for heartbeat polling. Package-visible for
   * testing.
   *
   * <p>Heartbeat is NOT needed when:
   *
   * <ul>
   *   <li>No execution result (nothing to fetch, also covers async PENDING/RUNNING with no data)
   *   <li>SEA inline (InlineJsonResult): all rows loaded in memory at construction
   *   <li>Update count (DML): no result rows to keep alive
   *   <li>Direct results (CLOSED state): server already closed, data fully delivered
   *   <li>Async execution (PENDING/RUNNING): user controls polling via getExecutionResult()
   * </ul>
   */
  boolean isHeartbeatEligible() {
    // No execution result: nothing to fetch
    if (executionResult == null) {
      return false;
    }
    // SEA inline: all data loaded in memory at construction
    if (resultSetType == ResultSetType.SEA_INLINE) {
      return false;
    }
    // Update count: no result rows
    if (statementType == StatementType.UPDATE) {
      return false;
    }
    // Check execution state
    if (executionStatus != null) {
      com.databricks.jdbc.api.ExecutionState state = executionStatus.getExecutionState();
      // Direct results: server already closed
      if (state == com.databricks.jdbc.api.ExecutionState.CLOSED) {
        return false;
      }
      // Async execution: user controls polling
      if (state == com.databricks.jdbc.api.ExecutionState.PENDING
          || state == com.databricks.jdbc.api.ExecutionState.RUNNING) {
        return false;
      }
    }
    return true;
  }

  private static TelemetryCollector resolveTelemetryCollector(
      IDatabricksStatementInternal parentStatement) {
    try {
@@ -172,6 +172,13 @@ public void close(boolean removeFromSession) throws DatabricksSQLException {
      this.connection.closeStatement(this);
    }
    DatabricksThreadContextHolder.clearStatementInfo();
    // Safety net: stop any heartbeat for this statement
    if (statementId != null) {
      ResultHeartbeatManager mgr = connection.getHeartbeatManager();
      if (mgr != null) {
        mgr.stopHeartbeat(statementId);

Collaborator

[HIGH] Statement.cancel() does not stop the heartbeat

This cancel() calls cancelStatement on the server but does not call mgr.stopHeartbeat(statementId). Only close() (line 175-181) and resetForNewExecution() (line 982-988) clear the heartbeat.

After cancel() returns, the heartbeat keeps polling against a cancelled operation. In the happy path the server returns CANCELED_STATE and the heartbeat task self-stops on the terminal-state check — fine. But if there's a race or "operation not found" before the server registers the cancel, those errors count as transient failures, churning the 10-strike counter and emitting WARN/INFO log noise for up to ~10 minutes after a successful cancel.

Fix: Add a heartbeat stop to cancel(), mirroring the pattern in close():

public void cancel() throws SQLException {
  ...
  if (statementId != null) {
    ResultHeartbeatManager mgr = connection.getHeartbeatManager();
    if (mgr != null) {
      mgr.stopHeartbeat(statementId);
    }
  }
  this.connection.getSession().getDatabricksClient().cancelStatement(statementId);
  ...
}

      }
    }
    shutDownExecutor();
    this.updateCount = -1;
    this.isClosed = true;
@@ -672,6 +679,8 @@ public ResultSet executeAsync(String sql) throws SQLException {
    LOGGER.debug("ResultSet executeAsync() for statement {%s}", sql);
    checkIfClosed();

    // No heartbeat during async wait: the user controls polling via getExecutionResult().
    // Heartbeat starts later when the ResultSet is constructed (after getExecutionResult()).
    resetForNewExecution();

    IDatabricksClient client = connection.getSession().getDatabricksClient();
@@ -969,6 +978,16 @@ private void resetForNewExecution() {
    // when the server returns unexpected responses (e.g., WireMock 404 in tests).
    // For direct results, the server already closed the handle.

    // Stop heartbeat for the previous execution before clearing state.
    // Without this, the old heartbeat (keyed by old statementId) would fail and self-terminate
    // after 10 consecutive failures, which is wasteful and noisy in logs.
    if (statementId != null) {
      ResultHeartbeatManager mgr = connection.getHeartbeatManager();
      if (mgr != null) {
        mgr.stopHeartbeat(statementId);
      }
    }

    directResultsReceived = false;

    // Per JDBC spec, re-executing a Statement implicitly closes the current ResultSet.