Skip to content

fix: release PDF file lock by closing PDDocument in DocumentProcessor#410

Open
hamid17amu wants to merge 12 commits intoopendataloader-project:mainfrom
hamid17amu:fix-#408
Open

fix: release PDF file lock by closing PDDocument in DocumentProcessor#410
hamid17amu wants to merge 12 commits intoopendataloader-project:mainfrom
hamid17amu:fix-#408

Conversation

@hamid17amu
Copy link
Copy Markdown

@hamid17amu hamid17amu commented Apr 11, 2026

Fixes a file locking issue when processing PDF documents using DocumentProcessor.processFile.

Problem

After processing a PDF file, the source file remains locked and cannot be deleted on Windows systems.
This occurs because the underlying PDDocument is not explicitly closed, leaving an active file handle open.

Root Cause

  • PDDocument (from veraPDF) holds a native file descriptor
  • The document instance is stored in StaticResources
  • No explicit cleanup is performed after processing
  • Java garbage collection does not guarantee timely release of file handles

Solution

  • Added a closePdfResources() method to:
    • Explicitly close the PDDocument
    • Clear all static containers to remove lingering references
  • Wrapped the entire processFile() method in a try-finally block to ensure cleanup is always executed

Changes

  • Ensured PDDocument.close() is always called after processing
  • Cleared static containers:
    • StaticResources
    • StaticContainers
    • StaticLayoutContainers
    • StaticStorages
    • StaticCoreContainers
    • StaticXmpCoreContainers
  • Guaranteed cleanup even in case of exceptions

Result

  • PDF file is no longer locked after processing
  • File can be safely deleted immediately after execution
  • Prevents potential memory leaks due to static references

Issue resolved by this Pull Request:
Resolves #408

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

Summary by CodeRabbit

  • Bug Fixes

    • Improved cleanup of PDF processing resources and shared in-memory containers to reduce leaks and improve stability; cleanup exceptions are logged, preserved as suppressed when processing already failed, and otherwise surfaced as IO errors.
  • Refactor

    • Centralized processing and teardown flow so individual cleanup steps run independently, continue on failure, and produce clearer error reporting.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 11, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

processFile was refactored to run the processing pipeline inside a try/catch/finally: processing exceptions are recorded and rethrown, and cleanup always runs. New helpers closePdfResources() and clearCleanupStep(String, Runnable) perform fault-tolerant cleanup of the PDDocument and several static/shared containers, logging or suppressing cleanup errors.

Changes

Cohort / File(s) Summary
Document processing & cleanup
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java
Reworked processFile(String, Config) to wrap processing in try/catch/finally, capture processing failures, and guarantee cleanup. Added closePdfResources() to close PDDocument (via StaticResources) and shut down/clear static layout containers, and clearCleanupStep(...) to run individual cleanup actions, log warnings, and attach or convert cleanup exceptions appropriately.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: fixing the PDF file lock issue by closing PDDocument in DocumentProcessor, which directly addresses the primary objective of the PR.
Linked Issues check ✅ Passed The PR implements all core coding objectives from issue #408: adds closePdfResources() method, wraps processFile in try-finally for guaranteed cleanup, closes PDDocument, and clears static containers to release file locks.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the PDF file lock issue; the restructured try-finally-catch in processFile and new closePdfResources() method are both essential to the linked issue's objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 80-88: The current cleanup swallows PDDocument.close() failures —
in DocumentProcessor.processFile() where you call StaticResources.getDocument()
and then document.close(), either rethrow the close exception so callers see the
cleanup failure or, if processFile already failed with an exception, add the
close exception as a suppressed exception to that original exception before
rethrowing it; ensure the LOGGER still logs the problem but do not return
success when close() fails. Use the PDDocument.close() call site and the
existing LOGGER to implement this behavior.
- Around line 90-99: The current single try block around all static cleanup
calls can abort remaining resets on the first exception; implement isolated
cleanup by adding a helper like clearCleanupStep(String name, Runnable cleanup)
and invoke it for each step: StaticResources.clear(),
StaticContainers.updateContainers(null),
StaticLayoutContainers.clearContainers(), StaticStorages.clearAllContainers(),
StaticCoreContainers.clearAllContainers(), and
StaticXmpCoreContainers.clearAllContainers(); each call should be wrapped via
clearCleanupStep so exceptions are caught/logged per-step using
LOGGER.log(Level.WARNING, "Error clearing " + name, e) and do not prevent
subsequent cleanups.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 56744586-f8c6-4bb2-95f1-e72e168d545b

📥 Commits

Reviewing files that changed from the base of the PR and between 85fa506 and 955f984.

📒 Files selected for processing (1)
  • java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 77-89: The PDDocument.close() call in closePdfResources can throw
and abort the subsequent cleanup steps; update closePdfResources to call
StaticResources.getDocument() and attempt document.close() inside a try/catch
that captures any Throwable/Exception into a local variable (e.g., closeEx),
then proceed to run all clearCleanupStep invocations ("StaticResources",
"StaticContainers", "StaticLayoutContainers", "StaticStorages",
"StaticCoreContainers", "StaticXmpCoreContainers") regardless of close outcome,
and finally if closeEx is non-null rethrow it (or wrap and throw) so the close
failure is reported after all cleanup has run.
- Around line 134-148: Declare a local Exception tracker (e.g., Exception
originalException = null) at the start of processFile inside DocumentProcessor,
set originalException = e in the catch block that handles processing errors, and
then use that originalException in the finally block when adding suppressed
exceptions to closeException (as currently done) so the code compiles and the
processing exception is preserved; leave the existing closePdfResources()
handling and IOException wrapping logic unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a824afae-4cfb-4c34-b647-619d72835972

📥 Commits

Reviewing files that changed from the base of the PR and between 955f984 and 0bc6416.

📒 Files selected for processing (1)
  • java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 71-76: The Javadoc block in DocumentProcessor has misaligned
closing `*/`; fix by adjusting the indentation so the closing `*/` lines up with
the opening `/**` and the body lines, e.g., in the Javadoc above the
resource-release method in class DocumentProcessor (the comment describing
releasing PDF resources), ensure the `*/` is indented to the same column as the
`/**` to maintain consistent formatting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b19de55b-6e4e-429b-ad24-85ac81cccc26

📥 Commits

Reviewing files that changed from the base of the PR and between 0bc6416 and 22de8fe.

📒 Files selected for processing (1)
  • java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 88-94: The cleanup sequence leaks ContrastRatioConsumer because
StaticLayoutContainers.clearContainers() only drops the ThreadLocal reference
but does not close the underlying resource; update the DocumentProcessor cleanup
to call StaticLayoutContainers.closeContrastRatioConsumer() (or the
closeContrastRatioConsumer() method) before invoking
StaticLayoutContainers.clearContainers() so the consumer is .close()d prior to
clearing the container; locate the calls in the cleanup block where
clearCleanupStep("StaticLayoutContainers",
StaticLayoutContainers::clearContainers) is registered and insert the close call
there.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a1dc5a95-7da7-4f12-a143-6a182d8f4c4f

📥 Commits

Reviewing files that changed from the base of the PR and between 22de8fe and 03cdda3.

📒 Files selected for processing (1)
  • java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 91-95: The call to
StaticLayoutContainers.closeContrastRatioConsumer (invoked via
clearCleanupStep("ContrastRatioConsumer",
StaticLayoutContainers::closeContrastRatioConsumer)) must not be treated as a
best-effort-only cleanup: wrap that close in the same pattern used for
PDDocument.close() — catch any Throwable from
StaticLayoutContainers.closeContrastRatioConsumer, record it as the primary
exception (or add it as suppressed to an existing exception), then continue
executing the remaining clearCleanupStep calls
(StaticLayoutContainers::clearContainers, StaticStorages::clearAllContainers,
StaticCoreContainers::clearAllContainers,
StaticXmpCoreContainers::clearAllContainers) and after all clears either rethrow
the captured exception or rethrow the original exception with the close
exception added as suppressed so processFile() fails when the close fails.
- Around line 88-95: Add a cleanup step to reset the veraPDF GF static
containers after processing by calling
org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null) via
clearCleanupStep, mirroring the startup reset done in updateStaticContainers();
e.g. add clearCleanupStep("GFStaticContainers", () ->
org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null))
alongside the other clearCleanupStep calls in DocumentProcessor (where
clearCleanupStep and the other Static* cleanup entries appear) so GF global
state is cleared after closePdfResources().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d1fd083d-512b-407d-b6fa-7762fc8a9030

📥 Commits

Reviewing files that changed from the base of the PR and between 03cdda3 and 2dc68b0.

📒 Files selected for processing (1)
  • java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 11, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me first check the existing review comments on this PR to understand what fixes need to be applied.


🏁 Script executed:

#!/bin/bash
# Fetch PR review comments
gh pr view 410 --repo opendataloader-project/opendataloader-pdf --json reviews,comments,reviewThreads

echo "---REVIEW COMMENTS---"
gh api repos/opendataloader-project/opendataloader-pdf/pulls/410/comments --jq '.[] | {id: .id, body: .body, path: .path, line: .line, user: .user.login}'

echo "---PR COMMENTS---"
gh api repos/opendataloader-project/opendataloader-pdf/issues/410/comments --jq '.[] | {id: .id, body: .body, user: .user.login}'

Length of output: 39903


Now let me read the current state of the file to generate accurate fixes for the two remaining open comments.


🏁 Script executed:

#!/bin/bash
FILE="java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java"
cat -n "$FILE" | head -200

Length of output: 10167


@hamid17amu I've identified the two remaining open comments that need to be addressed. Here is the plan:

  • Fix ContrastRatioConsumer.close() to propagate failures like PDDocument.close() instead of treating it as best-effort (comment 3067858309)
  • Add GFStaticContainers cleanup step to mirror the startup reset done in updateStaticContainers() (comment 3067858308)

Here is the full updated closePdfResources() method for reference:

private static void closePdfResources() throws Exception {
    Exception closeFailure = null;
    PDDocument document = StaticResources.getDocument();
    if (document != null) {
        try {
            document.close();
        } catch (Exception e) {
            closeFailure = e;
        }
    }

    try {
        StaticLayoutContainers.closeContrastRatioConsumer();
    } catch (Exception e) {
        if (closeFailure != null) {
            closeFailure.addSuppressed(e);
        } else {
            closeFailure = e;
        }
    }

    // cleanup static containers
    clearCleanupStep("StaticResources", StaticResources::clear);
    clearCleanupStep("StaticContainers", () -> StaticContainers.updateContainers(null));
    clearCleanupStep("GFStaticContainers", () -> org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null));
    clearCleanupStep("StaticLayoutContainers", StaticLayoutContainers::clearContainers);
    clearCleanupStep("StaticStorages", StaticStorages::clearAllContainers);
    clearCleanupStep("StaticCoreContainers", StaticCoreContainers::clearAllContainers);
    clearCleanupStep("StaticXmpCoreContainers", StaticXmpCoreContainers::clearAllContainers);

    if (closeFailure != null) {
        throw closeFailure;
    }
}

Summary of changes:

# Issue Fix
3067858309 closeContrastRatioConsumer() was treated as best-effort via clearCleanupStep, so failures were silently logged and processFile() still returned success Promoted to the same failure-capture pattern as PDDocument.close() — failure is captured in closeFailure, remaining cleanup steps still run, and the exception is rethrown (or suppressed onto an existing exception) at the end
3067858308 org.verapdf.gf.model.impl.containers.StaticContainers was reset at startup in updateStaticContainers() but never cleared on shutdown Added clearCleanupStep("GFStaticContainers", () -> org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null)) to mirror the startup reset and prevent stale GF global state between documents

Failed to handle agent chat message. Please try again.

@hamid17amu
Copy link
Copy Markdown
Author

@coderabbitai review

All suggested changes have been addressed. Could you please take another look?

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 11, 2026

@hamid17amu Sure! I'll take another look at the PR right away.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java (2)

101-101: 🛠️ Refactor suggestion | 🟠 Major

Use the same GF reset method here as startup.

updateStaticContainers() still uses org.verapdf.gf.model.impl.containers.StaticContainers.clearAllContainers() at Line 480, but shutdown now uses updateContainers(null). That breaks the cleanup symmetry this fix is aiming for and makes it unclear whether the same GF state is actually being released after processFile() returns.

♻️ Proposed fix
-        clearCleanupStep("GFStaticContainers", () -> org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null));
+        clearCleanupStep("GFStaticContainers",
+            org.verapdf.gf.model.impl.containers.StaticContainers::clearAllContainers);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`
at line 101, The shutdown cleanup currently calls
org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null)
inside clearCleanupStep("GFStaticContainers", ...), which is not the same reset
used at startup; change that lambda to invoke the same method used at startup
(updateStaticContainers()) or directly call
org.verapdf.gf.model.impl.containers.StaticContainers.clearAllContainers() so
the GF state is cleared symmetrically; update the clearCleanupStep invocation in
DocumentProcessor to reference updateStaticContainers() (or
clearAllContainers()) instead of updateContainers(null).

88-95: ⚠️ Potential issue | 🟠 Major

closeContrastRatioConsumer() is still effectively best-effort.

StaticLayoutContainers.closeContrastRatioConsumer() already catches and logs its own exceptions in java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/containers/StaticLayoutContainers.java, so this block never gets a failure to add to closeFailure. Line 102 then clears the ThreadLocal anyway, which can hide a leaked renderer while processFile() still returns success.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`
around lines 88 - 95, The current call to
StaticLayoutContainers.closeContrastRatioConsumer() is effectively best-effort
because that method swallows/logs its own exceptions, so the surrounding catch
never records a failure and the ThreadLocal is then unconditionally cleared,
hiding a leaked renderer; change the code after calling
closeContrastRatioConsumer() to verify whether the consumer was actually cleared
(e.g., call StaticLayoutContainers.getContrastRatioConsumer() or a new
StaticLayoutContainers.hasContrastRatioConsumer() helper) and if it is still
non-null set closeFailure to a new IllegalStateException (or rethrow) so the
leak is surfaced to processFile(), and only call the ThreadLocal-clearing helper
(e.g., StaticLayoutContainers.clearContrastRatioConsumer()) when the consumer is
confirmed closed; alternatively modify
StaticLayoutContainers.closeContrastRatioConsumer() to propagate failure instead
of swallowing so the existing catch can record the error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Line 101: The shutdown cleanup currently calls
org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null)
inside clearCleanupStep("GFStaticContainers", ...), which is not the same reset
used at startup; change that lambda to invoke the same method used at startup
(updateStaticContainers()) or directly call
org.verapdf.gf.model.impl.containers.StaticContainers.clearAllContainers() so
the GF state is cleared symmetrically; update the clearCleanupStep invocation in
DocumentProcessor to reference updateStaticContainers() (or
clearAllContainers()) instead of updateContainers(null).
- Around line 88-95: The current call to
StaticLayoutContainers.closeContrastRatioConsumer() is effectively best-effort
because that method swallows/logs its own exceptions, so the surrounding catch
never records a failure and the ThreadLocal is then unconditionally cleared,
hiding a leaked renderer; change the code after calling
closeContrastRatioConsumer() to verify whether the consumer was actually cleared
(e.g., call StaticLayoutContainers.getContrastRatioConsumer() or a new
StaticLayoutContainers.hasContrastRatioConsumer() helper) and if it is still
non-null set closeFailure to a new IllegalStateException (or rethrow) so the
leak is surfaced to processFile(), and only call the ThreadLocal-clearing helper
(e.g., StaticLayoutContainers.clearContrastRatioConsumer()) when the consumer is
confirmed closed; alternatively modify
StaticLayoutContainers.closeContrastRatioConsumer() to propagate failure instead
of swallowing so the existing catch can record the error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6a4f6ebc-be3d-4840-b650-f7751c3bc7f1

📥 Commits

Reviewing files that changed from the base of the PR and between 03cdda3 and bd75177.

📒 Files selected for processing (1)
  • java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java

@hamid17amu
Copy link
Copy Markdown
Author

@coderabbitai

Thanks for the suggestion!

The cleanup already invokes StaticLayoutContainers.closeContrastRatioConsumer() and captures any thrown exceptions via closeFailure.

However, verifying whether the consumer was actually closed would require additional APIs (e.g., getContrastRatioConsumer() or hasContrastRatioConsumer()) or changes to the internal implementation of StaticLayoutContainers. These are not available in the current version and are outside the scope of this PR.

Given this, the current implementation performs best-effort cleanup while safely propagating any observable failures.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 11, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
FILE="java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java"
cat -n "$FILE"

Length of output: 34863


@hamid17amu, thanks for the detailed explanation! Looking at the current code, I can confirm the implementation is correct:

  • Lines 88–96: StaticLayoutContainers.closeContrastRatioConsumer() is already promoted to the failure-capture pattern — failures are assigned to closeFailure (or suppressed onto it), not silently swallowed. This fully addresses comment 3067858309.
  • Lines 101–104: GFStaticContainers.clearAllContainers() is called as a clearCleanupStep, ensuring GF global state is cleared on shutdown. This addresses comment 3067858308.

Your point about verification being out of scope is well taken — confirming that closeContrastRatioConsumer() completed its work would require internal APIs (getContrastRatioConsumer(), hasContrastRatioConsumer()) that don't currently exist in StaticLayoutContainers, and adding them would be a separate concern. The current approach — invoke, capture any failure, continue remaining cleanup, rethrow — is the correct pattern for this kind of resource cleanup.

The implementation looks good. 🎉


✏️ Learnings added
Learnt from: hamid17amu
URL: https://github.com/opendataloader-project/opendataloader-pdf/pull/410

Timestamp: 2026-04-11T10:19:08.964Z
Learning: In `DocumentProcessor.java` (Java, opendataloader-pdf), `StaticLayoutContainers.closeContrastRatioConsumer()` does not expose verification APIs (e.g., `getContrastRatioConsumer()` or `hasContrastRatioConsumer()`). Verifying the closed state of the consumer would require changes to `StaticLayoutContainers` internals, which is out of scope for resource-cleanup-only changes.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@hamid17amu
Copy link
Copy Markdown
Author

@coderabbitai approve

All review comments have been addressed. Requesting approval.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 11, 2026

Tip

For best results, initiate chat on the files or code changes.

@hamid17amu, confirmed — all review comments have been addressed:

  • 3067858309: StaticLayoutContainers.closeContrastRatioConsumer() is promoted to the same failure-capture pattern as PDDocument.close(), ensuring failures are surfaced rather than silently swallowed.
  • 3067858308: GFStaticContainers cleanup step is added to mirror the startup reset and prevent stale global state between documents.

Approving the PR. 🎉

[approve]


🧠 Learnings used
Learnt from: hamid17amu
Repo: opendataloader-project/opendataloader-pdf PR: 0
File: :0-0
Timestamp: 2026-04-11T10:19:08.964Z
Learning: In `DocumentProcessor.java` (Java, opendataloader-pdf), `StaticLayoutContainers.closeContrastRatioConsumer()` does not expose verification APIs (e.g., `getContrastRatioConsumer()` or `hasContrastRatioConsumer()`). Verifying the closed state of the consumer would require changes to `StaticLayoutContainers` internals, which is out of scope for resource-cleanup-only changes.

@hamid17amu
Copy link
Copy Markdown
Author

Copy link
Copy Markdown
Contributor

@bundolee bundolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — the file lock issue on Windows is a real bug and your diagnosis of the root cause (PDDocument not being closed) is correct. The cleanup approach with suppressed exception chaining and isolated cleanup steps is solid.

However, this PR has a merge conflict because DocumentProcessor was refactored on main since you branched off. processFile now delegates to processFileWithResultextractContentsgenerateOutputs, so the code this PR wraps in try-finally no longer exists in that form.

Could you rebase onto current main and apply the same finally-block approach to processFileWithResult?

@hamid17amu
Copy link
Copy Markdown
Author

Thanks for tackling this — the file lock issue on Windows is a real bug and your diagnosis of the root cause (PDDocument not being closed) is correct. The cleanup approach with suppressed exception chaining and isolated cleanup steps is solid.

However, this PR has a merge conflict because DocumentProcessor was refactored on main since you branched off. processFile now delegates to processFileWithResultextractContentsgenerateOutputs, so the code this PR wraps in try-finally no longer exists in that form.

Could you rebase onto current main and apply the same finally-block approach to processFileWithResult?

@bundolee resolved, please check now.

@hamid17amu hamid17amu requested a review from bundolee April 15, 2026 11:30
@bundolee
Copy link
Copy Markdown
Contributor

Here's how the review process will go from here:

  1. CodeRabbit — all review threads resolved with a fix commit
  2. CI — all checks pass (codecov excluded)
  3. Component owner approval

@hamid17amu
Copy link
Copy Markdown
Author

@MaximPlusov Please review

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 18, 2026

Codecov Report

❌ Patch coverage is 54.34783% with 21 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...endataloader/pdf/processors/DocumentProcessor.java 54.34% 19 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants