fix: release PDF file lock by closing PDDocument in DocumentProcessor#410
fix: release PDF file lock by closing PDDocument in DocumentProcessor#410hamid17amu wants to merge 12 commits intoopendataloader-project:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughprocessFile was refactored to run the processing pipeline inside a try/catch/finally: processing exceptions are recorded and rethrown, and cleanup always runs. New helpers Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 80-88: The current cleanup swallows PDDocument.close() failures —
in DocumentProcessor.processFile() where you call StaticResources.getDocument()
and then document.close(), either rethrow the close exception so callers see the
cleanup failure or, if processFile already failed with an exception, add the
close exception as a suppressed exception to that original exception before
rethrowing it; ensure the LOGGER still logs the problem but do not return
success when close() fails. Use the PDDocument.close() call site and the
existing LOGGER to implement this behavior.
- Around line 90-99: The current single try block around all static cleanup
calls can abort remaining resets on the first exception; implement isolated
cleanup by adding a helper like clearCleanupStep(String name, Runnable cleanup)
and invoke it for each step: StaticResources.clear(),
StaticContainers.updateContainers(null),
StaticLayoutContainers.clearContainers(), StaticStorages.clearAllContainers(),
StaticCoreContainers.clearAllContainers(), and
StaticXmpCoreContainers.clearAllContainers(); each call should be wrapped via
clearCleanupStep so exceptions are caught/logged per-step using
LOGGER.log(Level.WARNING, "Error clearing " + name, e) and do not prevent
subsequent cleanups.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 56744586-f8c6-4bb2-95f1-e72e168d545b
📒 Files selected for processing (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 77-89: The PDDocument.close() call in closePdfResources can throw
and abort the subsequent cleanup steps; update closePdfResources to call
StaticResources.getDocument() and attempt document.close() inside a try/catch
that captures any Throwable/Exception into a local variable (e.g., closeEx),
then proceed to run all clearCleanupStep invocations ("StaticResources",
"StaticContainers", "StaticLayoutContainers", "StaticStorages",
"StaticCoreContainers", "StaticXmpCoreContainers") regardless of close outcome,
and finally if closeEx is non-null rethrow it (or wrap and throw) so the close
failure is reported after all cleanup has run.
- Around line 134-148: Declare a local Exception tracker (e.g., Exception
originalException = null) at the start of processFile inside DocumentProcessor,
set originalException = e in the catch block that handles processing errors, and
then use that originalException in the finally block when adding suppressed
exceptions to closeException (as currently done) so the code compiles and the
processing exception is preserved; leave the existing closePdfResources()
handling and IOException wrapping logic unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a824afae-4cfb-4c34-b647-619d72835972
📒 Files selected for processing (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 71-76: The Javadoc block in DocumentProcessor has misaligned
closing `*/`; fix by adjusting the indentation so the closing `*/` lines up with
the opening `/**` and the body lines, e.g., in the Javadoc above the
resource-release method in class DocumentProcessor (the comment describing
releasing PDF resources), ensure the `*/` is indented to the same column as the
`/**` to maintain consistent formatting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b19de55b-6e4e-429b-ad24-85ac81cccc26
📒 Files selected for processing (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 88-94: The cleanup sequence leaks ContrastRatioConsumer because
StaticLayoutContainers.clearContainers() only drops the ThreadLocal reference
but does not close the underlying resource; update the DocumentProcessor cleanup
to call StaticLayoutContainers.closeContrastRatioConsumer() (or the
closeContrastRatioConsumer() method) before invoking
StaticLayoutContainers.clearContainers() so the consumer is .close()d prior to
clearing the container; locate the calls in the cleanup block where
clearCleanupStep("StaticLayoutContainers",
StaticLayoutContainers::clearContainers) is registered and insert the close call
there.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a1dc5a95-7da7-4f12-a143-6a182d8f4c4f
📒 Files selected for processing (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Around line 91-95: The call to
StaticLayoutContainers.closeContrastRatioConsumer (invoked via
clearCleanupStep("ContrastRatioConsumer",
StaticLayoutContainers::closeContrastRatioConsumer)) must not be treated as a
best-effort-only cleanup: wrap that close in the same pattern used for
PDDocument.close() — catch any Throwable from
StaticLayoutContainers.closeContrastRatioConsumer, record it as the primary
exception (or add it as suppressed to an existing exception), then continue
executing the remaining clearCleanupStep calls
(StaticLayoutContainers::clearContainers, StaticStorages::clearAllContainers,
StaticCoreContainers::clearAllContainers,
StaticXmpCoreContainers::clearAllContainers) and after all clears either rethrow
the captured exception or rethrow the original exception with the close
exception added as suppressed so processFile() fails when the close fails.
- Around line 88-95: Add a cleanup step to reset the veraPDF GF static
containers after processing by calling
org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null) via
clearCleanupStep, mirroring the startup reset done in updateStaticContainers();
e.g. add clearCleanupStep("GFStaticContainers", () ->
org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null))
alongside the other clearCleanupStep calls in DocumentProcessor (where
clearCleanupStep and the other Static* cleanup entries appear) so GF global
state is cleared after closePdfResources().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d1fd083d-512b-407d-b6fa-7762fc8a9030
📒 Files selected for processing (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me first check the existing review comments on this PR to understand what fixes need to be applied. 🏁 Script executed: Length of output: 39903 Now let me read the current state of the file to generate accurate fixes for the two remaining open comments. 🏁 Script executed: Length of output: 10167
Here is the full updated private static void closePdfResources() throws Exception {
Exception closeFailure = null;
PDDocument document = StaticResources.getDocument();
if (document != null) {
try {
document.close();
} catch (Exception e) {
closeFailure = e;
}
}
try {
StaticLayoutContainers.closeContrastRatioConsumer();
} catch (Exception e) {
if (closeFailure != null) {
closeFailure.addSuppressed(e);
} else {
closeFailure = e;
}
}
// cleanup static containers
clearCleanupStep("StaticResources", StaticResources::clear);
clearCleanupStep("StaticContainers", () -> StaticContainers.updateContainers(null));
clearCleanupStep("GFStaticContainers", () -> org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null));
clearCleanupStep("StaticLayoutContainers", StaticLayoutContainers::clearContainers);
clearCleanupStep("StaticStorages", StaticStorages::clearAllContainers);
clearCleanupStep("StaticCoreContainers", StaticCoreContainers::clearAllContainers);
clearCleanupStep("StaticXmpCoreContainers", StaticXmpCoreContainers::clearAllContainers);
if (closeFailure != null) {
throw closeFailure;
}
}Summary of changes:
Failed to handle agent chat message. Please try again. |
|
@coderabbitai review All suggested changes have been addressed. Could you please take another look? |
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
♻️ Duplicate comments (2)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java (2)
101-101: 🛠️ Refactor suggestion | 🟠 MajorUse the same GF reset method here as startup.
updateStaticContainers()still usesorg.verapdf.gf.model.impl.containers.StaticContainers.clearAllContainers()at Line 480, but shutdown now usesupdateContainers(null). That breaks the cleanup symmetry this fix is aiming for and makes it unclear whether the same GF state is actually being released afterprocessFile()returns.♻️ Proposed fix
- clearCleanupStep("GFStaticContainers", () -> org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null)); + clearCleanupStep("GFStaticContainers", + org.verapdf.gf.model.impl.containers.StaticContainers::clearAllContainers);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java` at line 101, The shutdown cleanup currently calls org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null) inside clearCleanupStep("GFStaticContainers", ...), which is not the same reset used at startup; change that lambda to invoke the same method used at startup (updateStaticContainers()) or directly call org.verapdf.gf.model.impl.containers.StaticContainers.clearAllContainers() so the GF state is cleared symmetrically; update the clearCleanupStep invocation in DocumentProcessor to reference updateStaticContainers() (or clearAllContainers()) instead of updateContainers(null).
88-95:⚠️ Potential issue | 🟠 Major
closeContrastRatioConsumer()is still effectively best-effort.
StaticLayoutContainers.closeContrastRatioConsumer()already catches and logs its own exceptions injava/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/containers/StaticLayoutContainers.java, so this block never gets a failure to add tocloseFailure. Line 102 then clears theThreadLocalanyway, which can hide a leaked renderer whileprocessFile()still returns success.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java` around lines 88 - 95, The current call to StaticLayoutContainers.closeContrastRatioConsumer() is effectively best-effort because that method swallows/logs its own exceptions, so the surrounding catch never records a failure and the ThreadLocal is then unconditionally cleared, hiding a leaked renderer; change the code after calling closeContrastRatioConsumer() to verify whether the consumer was actually cleared (e.g., call StaticLayoutContainers.getContrastRatioConsumer() or a new StaticLayoutContainers.hasContrastRatioConsumer() helper) and if it is still non-null set closeFailure to a new IllegalStateException (or rethrow) so the leak is surfaced to processFile(), and only call the ThreadLocal-clearing helper (e.g., StaticLayoutContainers.clearContrastRatioConsumer()) when the consumer is confirmed closed; alternatively modify StaticLayoutContainers.closeContrastRatioConsumer() to propagate failure instead of swallowing so the existing catch can record the error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`:
- Line 101: The shutdown cleanup currently calls
org.verapdf.gf.model.impl.containers.StaticContainers.updateContainers(null)
inside clearCleanupStep("GFStaticContainers", ...), which is not the same reset
used at startup; change that lambda to invoke the same method used at startup
(updateStaticContainers()) or directly call
org.verapdf.gf.model.impl.containers.StaticContainers.clearAllContainers() so
the GF state is cleared symmetrically; update the clearCleanupStep invocation in
DocumentProcessor to reference updateStaticContainers() (or
clearAllContainers()) instead of updateContainers(null).
- Around line 88-95: The current call to
StaticLayoutContainers.closeContrastRatioConsumer() is effectively best-effort
because that method swallows/logs its own exceptions, so the surrounding catch
never records a failure and the ThreadLocal is then unconditionally cleared,
hiding a leaked renderer; change the code after calling
closeContrastRatioConsumer() to verify whether the consumer was actually cleared
(e.g., call StaticLayoutContainers.getContrastRatioConsumer() or a new
StaticLayoutContainers.hasContrastRatioConsumer() helper) and if it is still
non-null set closeFailure to a new IllegalStateException (or rethrow) so the
leak is surfaced to processFile(), and only call the ThreadLocal-clearing helper
(e.g., StaticLayoutContainers.clearContrastRatioConsumer()) when the consumer is
confirmed closed; alternatively modify
StaticLayoutContainers.closeContrastRatioConsumer() to propagate failure instead
of swallowing so the existing catch can record the error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6a4f6ebc-be3d-4840-b650-f7751c3bc7f1
📒 Files selected for processing (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java
|
Thanks for the suggestion! The cleanup already invokes StaticLayoutContainers.closeContrastRatioConsumer() and captures any thrown exceptions via closeFailure. However, verifying whether the consumer was actually closed would require additional APIs (e.g., getContrastRatioConsumer() or hasContrastRatioConsumer()) or changes to the internal implementation of StaticLayoutContainers. These are not available in the current version and are outside the scope of this PR. Given this, the current implementation performs best-effort cleanup while safely propagating any observable failures. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 34863
Your point about verification being out of scope is well taken — confirming that The implementation looks good. 🎉 ✏️ Learnings added
|
|
@coderabbitai approve All review comments have been addressed. Requesting approval. |
|
Tip For best results, initiate chat on the files or code changes.
Approving the PR. 🎉 [approve] 🧠 Learnings used |
|
@MaximPlusov @LonelyMidoriya @hyunhee-jo @bundolee @hnc-hjryu |
bundolee
left a comment
There was a problem hiding this comment.
Thanks for tackling this — the file lock issue on Windows is a real bug and your diagnosis of the root cause (PDDocument not being closed) is correct. The cleanup approach with suppressed exception chaining and isolated cleanup steps is solid.
However, this PR has a merge conflict because DocumentProcessor was refactored on main since you branched off. processFile now delegates to processFileWithResult → extractContents → generateOutputs, so the code this PR wraps in try-finally no longer exists in that form.
Could you rebase onto current main and apply the same finally-block approach to processFileWithResult?
Fixed formatting of the license URL in the comments.
@bundolee resolved, please check now. |
|
Here's how the review process will go from here:
|
|
@MaximPlusov Please review |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Fixes a file locking issue when processing PDF documents using
DocumentProcessor.processFile.Problem
After processing a PDF file, the source file remains locked and cannot be deleted on Windows systems.
This occurs because the underlying
PDDocumentis not explicitly closed, leaving an active file handle open.Root Cause
PDDocument(from veraPDF) holds a native file descriptorStaticResourcesSolution
closePdfResources()method to:PDDocumentprocessFile()method in atry-finallyblock to ensure cleanup is always executedChanges
PDDocument.close()is always called after processingStaticResourcesStaticContainersStaticLayoutContainersStaticStoragesStaticCoreContainersStaticXmpCoreContainersResult
Issue resolved by this Pull Request:
Resolves #408
Checklist:
Summary by CodeRabbit
Bug Fixes
Refactor