Skip to content

[FIX] Preserve response headers across context restore in SecurityInterceptor#6123

Open
KishoreKicha14 wants to merge 2 commits intoopensearch-project:mainfrom
KishoreKicha14:fix/preserve-response-headers-on-context-restore
Open

[FIX] Preserve response headers across context restore in SecurityInterceptor#6123
KishoreKicha14 wants to merge 2 commits intoopensearch-project:mainfrom
KishoreKicha14:fix/preserve-response-headers-on-context-restore

Conversation

@KishoreKicha14
Copy link
Copy Markdown

Description

Use newRestorableContext(true) instead of stashContext() for context restoration in RestoringTransportResponseHandler to preserve response headers (including TASK_RESOURCE_USAGE) across transport round-trips.

Category: Bug fix

Why these changes are required?

When the security plugin is enabled, SecurityInterceptor.sendRequestDecorate() stashes the ThreadContext via stashContext() before sending transport requests. On response, RestoringTransportResponseHandler.handleResponse() calls StoredContext.restore(), which blindly overwrites the thread-local context — discarding all response headers added during shard-level execution on data nodes. This causes shard-level TASK_RESOURCE_USAGE entries to be silently dropped, so the coordinator only sees its own resource usage.

Old behavior: RestoringTransportResponseHandler used ThreadContext.StoredContext from stashContext(), whose restore() discards response headers added after stashing. Shard-level task resource usage data was lost when security plugin was enabled.

New behavior: RestoringTransportResponseHandler uses Supplier<ThreadContext.StoredContext> from newRestorableContext(true), whose get() merges current response headers back into the restored context — matching the pattern used by OpenSearch core's ContextRestoreResponseHandler. All response headers including TASK_RESOURCE_USAGE now survive context restore.

Issues Resolved

[List any issues this PR will resolve]

Testing

  1. Unit tests verifying TASK_RESOURCE_USAGE response header survives context restore in handleResponse()
  2. Unit tests verifying multiple response headers survive context restore
  3. Preservation tests for DLS/FLS/masked-field transient header propagation on ClusterSearchShardsResponse
  4. Preservation tests for handleException() context restore and exception propagation
  5. Preservation tests for handleStreamResponse() direct delegation
  6. Combinatorial tests covering all 8 subsets of {DLS, FLS, MaskedField} headers
  7. Randomized TransportException variant tests for exception handling robustness

Manual integration test — built OpenSearch 3.7.0-SNAPSHOT with security + query-insights plugins, created single-shard index, ran search, confirmed _insights/top_queries returns both entries in task_resource_usages:

"task_resource_usages": [
  {
    "action": "indices:data/read/search[phase/query]",
    "taskId": 156,
    "parentTaskId": 155,
    "nodeId": "nJn3W-S9TwqMHeWPcsK9gw",
    "taskResourceUsage": {
      "cpu_time_in_nanos": 27630000,
      "memory_in_bytes": 8088664
    }
  },
  {
    "action": "indices:data/read/search",
    "taskId": 155,
    "parentTaskId": -1,
    "nodeId": "nJn3W-S9TwqMHeWPcsK9gw",
    "taskResourceUsage": {
      "cpu_time_in_nanos": 3343000,
      "memory_in_bytes": 112632
    }
  }
]

Check List

  • New functionality includes testing
  • New functionality has been documented
  • New Roles/Permissions have a corresponding security dashboards plugin PR
  • API changes companion pull request created
  • [X ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 28, 2026

PR Reviewer Guide 🔍

(Review updated until commit 520b2bc)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Resource Leak

The restorableContextSupplier created at line 181 is never consumed if an exception occurs before RestoringTransportResponseHandler is instantiated (lines 183-186). If getThreadContext().putHeader() at line 187 or subsequent operations throw, the supplier's internal context snapshot leaks. The supplier should be created after all operations that might throw, or wrapped in try-catch to ensure cleanup.

final Supplier<ThreadContext.StoredContext> restorableContextSupplier = getThreadContext().newRestorableContext(true);
try (ThreadContext.StoredContext stashedContext = getThreadContext().stashContext()) {
    final TransportResponseHandler<T> restoringHandler = new RestoringTransportResponseHandler<T>(
        handler,
        restorableContextSupplier
    );
Incomplete Restoration

In handleResponse() at line 417, calling contextToRestore.get() without assigning or closing the returned StoredContext leaves the context in a partially restored state. The returned context should be closed in a try-with-resources to ensure proper cleanup, matching the pattern used in handleException() at lines 448-450.

contextToRestore.get();

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 28, 2026

PR Code Suggestions ✨

Latest suggestions up to 520b2bc

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Close StoredContext after restoration

The get() call on contextToRestore returns a StoredContext that should be closed to
properly restore the thread context. Without closing it, the context restoration may
not complete correctly, potentially leaving thread-local state in an inconsistent
state.

src/main/java/org/opensearch/security/transport/SecurityInterceptor.java [417]

-contextToRestore.get();
+try (ThreadContext.StoredContext ignore = contextToRestore.get()) {
+    // Context will be restored when exiting this block
+}
Suggestion importance[1-10]: 10

__

Why: The contextToRestore.get() returns a StoredContext that must be closed to properly restore the thread context. Without closing it, the context restoration is incomplete, which is a critical bug that can lead to thread-local state corruption and loss of response headers.

High
Wrap response handling in context scope

After calling contextToRestore.get(), the returned StoredContext is not stored or
closed, which means the context restoration is incomplete. The subsequent code that
checks response headers and sets transients may execute with the wrong context
state, potentially losing response headers or setting transients in the wrong
context.

src/main/java/org/opensearch/security/transport/SecurityInterceptor.java [417-444]

-contextToRestore.get();
+try (ThreadContext.StoredContext ignore = contextToRestore.get()) {
+    final boolean isDebugEnabled = log.isDebugEnabled();
+    if (response instanceof ClusterSearchShardsResponse) {
+        // ... rest of the logic
+    }
 
-final boolean isDebugEnabled = log.isDebugEnabled();
-if (response instanceof ClusterSearchShardsResponse) {
+    innerHandler.handleResponse(response);
+}
Suggestion importance[1-10]: 10

__

Why: This correctly identifies that the StoredContext returned by contextToRestore.get() must be closed to complete restoration. The suggestion to wrap the entire response handling logic in a try-with-resources block ensures proper context management and prevents loss of response headers.

High

Previous suggestions

Suggestions up to commit 520b2bc
CategorySuggestion                                                                                                                                    Impact
Possible issue
Close StoredContext after restoration

The get() call on contextToRestore returns a StoredContext that should be closed to
properly restore the thread context. Without closing it, the context restoration may
not complete correctly, potentially leaving thread-local state in an inconsistent
state.

src/main/java/org/opensearch/security/transport/SecurityInterceptor.java [417]

-contextToRestore.get();
+try (ThreadContext.StoredContext ignore = contextToRestore.get()) {
+    // Context will be restored when exiting this block
+}
Suggestion importance[1-10]: 10

__

Why: Critical bug fix. The contextToRestore.get() call returns a StoredContext that must be closed to properly restore the thread context. Without the try-with-resources block, the context restoration is incomplete, which can lead to thread-local state corruption and security issues. The improved code correctly wraps the call in a try-with-resources block.

High
Suggestions up to commit 658a214
CategorySuggestion                                                                                                                                    Impact
General
Avoid passing null to stream response handler

Passing null as the StreamTransportResponse argument to handleStreamResponse may
cause a NullPointerException inside the handler implementation if it dereferences
the response. Use a mock or a no-op stub instead to avoid masking real bugs in the
handler.

src/test/java/org/opensearch/security/transport/SecurityInterceptorTests.java [816]

-handler.handleStreamResponse(null);
+handler.handleStreamResponse(mock(org.opensearch.transport.stream.StreamTransportResponse.class));
Suggestion importance[1-10]: 4

__

Why: Passing null to handleStreamResponse could mask real bugs, but in this test the inner handler's handleStreamResponse only sets a boolean flag and doesn't dereference the response, so the risk is low in the current test context.

Low
Remove unused random variable from test

The random variable is declared but never actually used in the test body — all test
cases are statically defined in the exceptions list. Either remove the unused
variable or use it to actually randomize the test inputs to fulfill the
"property-based style" intent described in the Javadoc.

src/test/java/org/opensearch/security/transport/SecurityInterceptorTests.java [986]

-Random random = new Random(42); // Fixed seed for reproducibility
+// random variable removed since all exceptions are statically defined
Suggestion importance[1-10]: 3

__

Why: The random variable is indeed declared but never used in the test body, making it dead code. Removing it improves clarity, though the impact is minor.

Low
Verify response header preservation semantics

The restorableContextSupplier is created before stashContext(), which means it
captures the current (pre-stash) context. However, newRestorableContext(true) with
true means it will also restore response headers. When contextToRestore.get() is
called in the handler, it restores the pre-stash context including its response
headers, which would overwrite the response headers collected during the request.
The response headers are manually re-applied after the restore in handleResponse,
but in handleException they are not — verify that this is the intended behavior and
that the true parameter correctly preserves response headers accumulated during the
stashed context.

src/main/java/org/opensearch/security/transport/SecurityInterceptor.java [181-182]

+final Supplier<ThreadContext.StoredContext> restorableContextSupplier = getThreadContext().newRestorableContext(true);
+try (ThreadContext.StoredContext stashedContext = getThreadContext().stashContext()) {
 
-
Suggestion importance[1-10]: 2

__

Why: The suggestion asks to verify behavior rather than proposing a concrete fix, and the existing_code and improved_code are identical. The concern about response headers in handleException is worth noting but the suggestion doesn't provide actionable code changes.

Low
Suggestions up to commit 20fe3b8
CategorySuggestion                                                                                                                                    Impact
Possible issue
Verify restorable context capture ordering

The restorableContextSupplier is created before stashContext(), which means it
captures the current (pre-stash) context. However, newRestorableContext(true) with
true means response headers will be preserved on restore. Make sure the ordering is
intentional: the restorable context is created from the outer context (before
stash), so that restoring it later will bring back the original context with merged
response headers. If the intent is to restore the stashed (inner) context instead,
the supplier should be created after stashContext(). Verify this ordering matches
the intended semantics.

src/main/java/org/opensearch/security/transport/SecurityInterceptor.java [181-182]

-final Supplier<ThreadContext.StoredContext> restorableContextSupplier = getThreadContext().newRestorableContext(true);
 try (ThreadContext.StoredContext stashedContext = getThreadContext().stashContext()) {
+    final Supplier<ThreadContext.StoredContext> restorableContextSupplier = getThreadContext().newRestorableContext(true);
+    final TransportResponseHandler<T> restoringHandler = new RestoringTransportResponseHandler<T>(
+        handler,
+        restorableContextSupplier
+    );
Suggestion importance[1-10]: 6

__

Why: The ordering of newRestorableContext(true) before stashContext() is actually intentional and correct — it captures the outer (pre-stash) context so that when restored later, the original context is brought back with response headers preserved. The suggestion to move it after stashContext() would change the semantics and likely break the intended behavior. However, the suggestion raises a valid point worth verifying, so a moderate score is appropriate.

Low
General
Avoid passing null to stream response handler

Passing null to handleStreamResponse may cause a NullPointerException inside the
handler implementation if it dereferences the response object. Use a proper mock or
stub StreamTransportResponse instance to avoid masking real bugs in the handler.

src/test/java/org/opensearch/security/transport/SecurityInterceptorTests.java [826]

-handler.handleStreamResponse(null);
+handler.handleStreamResponse(mock(org.opensearch.transport.stream.StreamTransportResponse.class));
Suggestion importance[1-10]: 4

__

Why: Passing null to handleStreamResponse could cause a NullPointerException in some implementations, but in this test the streamCapturingHandler only sets a boolean flag without dereferencing the response, so it won't fail. Using a mock would be safer practice, but the current code works for this specific test.

Low
Suggestions up to commit 20fe3b8
CategorySuggestion                                                                                                                                    Impact
General
Avoid potentially invalid exception constructor arguments

TransportException does not have a two-argument constructor (String, Throwable) that
accepts null as the second argument in all versions — but more importantly, new
TransportException((String) null) may cause a NullPointerException depending on the
implementation. These edge cases should be validated to ensure the test itself
doesn't fail due to constructor issues rather than the code under test.

src/test/java/org/opensearch/security/transport/SecurityInterceptorTests.java [1014]

-new TransportException("empty cause", null)
+new TransportException("empty cause message")
Suggestion importance[1-10]: 3

__

Why: The concern about new TransportException((String) null) and new TransportException("empty cause", null) is valid as edge cases, but TransportException typically extends Exception which handles null messages and causes gracefully. The improvement is minor and the risk is low.

Low
Possible issue
Verify context capture timing relative to stash

The restorableContextSupplier is created before stashContext() is called, which
means it captures the pre-stash context. However, the try-with-resources block will
automatically close stashedContext when the block exits, which restores the context.
Since RestoringTransportResponseHandler now holds the supplier and calls it
asynchronously (after the try block exits), the stashedContext will already be
closed/restored by then. You should verify that newRestorableContext(true) indeed
captures the current (pre-stash) context and that calling get() on the supplier
after the try block exits correctly restores to that pre-stash state, rather than
being a no-op or causing double-restore issues.

src/main/java/org/opensearch/security/transport/SecurityInterceptor.java [181-182]

+final Supplier<ThreadContext.StoredContext> restorableContextSupplier = getThreadContext().newRestorableContext(true);
+try (ThreadContext.StoredContext stashedContext = getThreadContext().stashContext()) {
 
-
Suggestion importance[1-10]: 2

__

Why: The suggestion asks to verify behavior rather than proposing a concrete fix, and the existing_code equals the improved_code. The design of capturing the context before stashContext() is intentional — newRestorableContext(true) captures the pre-stash context so it can be restored asynchronously later, which is the correct pattern.

Low

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 20fe3b8

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 658a214

@cwperks
Copy link
Copy Markdown
Member

cwperks commented Apr 29, 2026

@KishoreKicha14 can you please take a look at the unit test failures? Looks like it might be related to the change in this PR.

The change looks sensible to me and I'm surprised it hasn't been brought up before. How exactly does QI use these response headers in a way that other plugins do not?

Edit: appears to be due to flakiness

@SuppressWarnings({ "rawtypes", "unchecked" })
@Test
public void testPreservation_RandomTransportExceptions_HandleExceptionRestoresAndDelegates() {
Random random = new Random(42); // Fixed seed for reproducibility
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable is unused

Comment on lines 446 to 450
@Override
public void handleException(TransportException e) {
contextToRestore.restore();
contextToRestore.get();
innerHandler.handleException(e);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: contextToRestore.get() returns a Closeable. You could close it by wrapping the call in a try-with-resources, like:

@Override
public void handleException(TransportException e) {
    try (ThreadContext.StoredContext ignore = contextToRestore.get()) {
        innerHandler.handleException(e);
    }
}

Reference: TransportService.java#L1606-L1608

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit c27ba57

…rceptor

Signed-off-by: Kishore Kumaar Natarajan <kkumaarn@amazon.com>
@KishoreKicha14 KishoreKicha14 force-pushed the fix/preserve-response-headers-on-context-restore branch from c27ba57 to 8b2b652 Compare May 7, 2026 22:51
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit 520b2bc

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit 520b2bc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants