Fix: Handle invalid file paths in OpenDataLoaderPDF#431
Fix: Handle invalid file paths in OpenDataLoaderPDF#431pavanpai769 wants to merge 6 commits intoopendataloader-project:mainfrom
Conversation
Signed-off-by: pavanpai769 <151814231+pavanpai769@users.noreply.github.com>
Signed-off-by: pavanpai769 <151814231+pavanpai769@users.noreply.github.com>
Signed-off-by: pavanpai769 <151814231+pavanpai769@users.noreply.github.com>
Signed-off-by: pavanpai769 <151814231+pavanpai769@users.noreply.github.com>
WalkthroughThe Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java (1)
41-47:⚠️ Potential issue | 🟡 MinorDocument thrown exceptions in the public method Javadoc.
processFile(...)now rejects invalid input withIllegalArgumentExceptionand still propagatesIOException, but the Javadoc does not declare either via@throws. Please make the API contract explicit.📝 Suggested Javadoc update
/** * Processes a PDF file to extract its content and structure based on the provided configuration. * * `@param` inputPdfName The path to the input PDF file. * `@param` config The configuration object specifying output formats and other options. + * `@throws` IllegalArgumentException if the input path is null, blank, invalid, missing, not a regular file, or not a PDF. + * `@throws` IOException if an I/O error occurs during processing. * */🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java` around lines 41 - 47, The Javadoc for the public method processFile in class OpenDataLoaderPDF is missing `@throws` tags for the runtime and checked exceptions it can produce; update the method Javadoc to explicitly document that it throws IllegalArgumentException for invalid input (e.g., null/empty inputPdfName or invalid config) and IOException for I/O failures encountered while reading/processing the PDF, using `@throws` IllegalArgumentException and `@throws` IOException lines so the API contract matches the method implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java`:
- Around line 61-88: The warning logs in OpenDataLoaderPDF (the block using
inputPdfName, path, LOGGER) currently print the full user-supplied path and must
be removed or sanitized; update each LOGGER.log call in that validation block to
avoid echoing inputPdfName or the full path and instead log a safe identifier
such as the file base name (path.getFileName().toString()) or a constant
placeholder like "<redacted-path>" (for the InvalidPathException case where path
is not available), e.g. replace messages that concatenate inputPdfName with ones
that use only the safeName or the placeholder so no absolute/user path is
emitted.
---
Outside diff comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java`:
- Around line 41-47: The Javadoc for the public method processFile in class
OpenDataLoaderPDF is missing `@throws` tags for the runtime and checked exceptions
it can produce; update the method Javadoc to explicitly document that it throws
IllegalArgumentException for invalid input (e.g., null/empty inputPdfName or
invalid config) and IOException for I/O failures encountered while
reading/processing the PDF, using `@throws` IllegalArgumentException and `@throws`
IOException lines so the API contract matches the method implementation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6bbb458e-d96c-4d3e-b2c0-69f9695fa559
📒 Files selected for processing (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java
| if (inputPdfName == null || inputPdfName.isBlank()) { | ||
| LOGGER.log(Level.WARNING,"Input PDF name is null or Empty"); | ||
| throw new IllegalArgumentException("Input PDF name is null or Empty"); | ||
| } | ||
|
|
||
| final Path path; | ||
|
|
||
| try { | ||
| path = Paths.get(inputPdfName); | ||
| } catch (InvalidPathException ex) { | ||
| LOGGER.log(Level.WARNING,"Invalid Path: " + inputPdfName); | ||
| throw new IllegalArgumentException("Invalid Path: " + inputPdfName); | ||
| } | ||
|
|
||
| if (!Files.exists(path)) { | ||
| LOGGER.log(Level.WARNING,"File not found at " + inputPdfName + " location"); | ||
| throw new IllegalArgumentException("File not found at " + inputPdfName + " location"); | ||
| } | ||
|
|
||
| if (!Files.isRegularFile(path)) { | ||
| LOGGER.log(Level.WARNING,"Not a valid file " + inputPdfName); | ||
| throw new IllegalArgumentException("Not a valid file " + inputPdfName); | ||
| } | ||
|
|
||
| if (!path.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf")) { | ||
| LOGGER.log(Level.WARNING,"Not a PDF file"); | ||
| throw new IllegalArgumentException("Not a PDF file"); | ||
| } |
There was a problem hiding this comment.
Avoid logging full user-supplied paths at WARNING level.
These warning logs include raw input paths, which can leak sensitive local filesystem details (e.g., usernames/home paths) into centralized logs.
🔧 Suggested hardening
- LOGGER.log(Level.WARNING,"Input PDF name is null or Empty");
+ LOGGER.log(Level.WARNING, "Input PDF path is null or blank");
- LOGGER.log(Level.WARNING,"Invalid Path: " + inputPdfName);
+ LOGGER.log(Level.WARNING, "Invalid input PDF path");
- LOGGER.log(Level.WARNING,"File not found at " + inputPdfName + " location");
+ LOGGER.log(Level.WARNING, "Input PDF file does not exist");
- LOGGER.log(Level.WARNING,"Not a valid file " + inputPdfName);
+ LOGGER.log(Level.WARNING, "Input PDF path is not a regular file");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (inputPdfName == null || inputPdfName.isBlank()) { | |
| LOGGER.log(Level.WARNING,"Input PDF name is null or Empty"); | |
| throw new IllegalArgumentException("Input PDF name is null or Empty"); | |
| } | |
| final Path path; | |
| try { | |
| path = Paths.get(inputPdfName); | |
| } catch (InvalidPathException ex) { | |
| LOGGER.log(Level.WARNING,"Invalid Path: " + inputPdfName); | |
| throw new IllegalArgumentException("Invalid Path: " + inputPdfName); | |
| } | |
| if (!Files.exists(path)) { | |
| LOGGER.log(Level.WARNING,"File not found at " + inputPdfName + " location"); | |
| throw new IllegalArgumentException("File not found at " + inputPdfName + " location"); | |
| } | |
| if (!Files.isRegularFile(path)) { | |
| LOGGER.log(Level.WARNING,"Not a valid file " + inputPdfName); | |
| throw new IllegalArgumentException("Not a valid file " + inputPdfName); | |
| } | |
| if (!path.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf")) { | |
| LOGGER.log(Level.WARNING,"Not a PDF file"); | |
| throw new IllegalArgumentException("Not a PDF file"); | |
| } | |
| if (inputPdfName == null || inputPdfName.isBlank()) { | |
| LOGGER.log(Level.WARNING, "Input PDF path is null or blank"); | |
| throw new IllegalArgumentException("Input PDF name is null or Empty"); | |
| } | |
| final Path path; | |
| try { | |
| path = Paths.get(inputPdfName); | |
| } catch (InvalidPathException ex) { | |
| LOGGER.log(Level.WARNING, "Invalid input PDF path"); | |
| throw new IllegalArgumentException("Invalid Path: " + inputPdfName); | |
| } | |
| if (!Files.exists(path)) { | |
| LOGGER.log(Level.WARNING, "Input PDF file does not exist"); | |
| throw new IllegalArgumentException("File not found at " + inputPdfName + " location"); | |
| } | |
| if (!Files.isRegularFile(path)) { | |
| LOGGER.log(Level.WARNING, "Input PDF path is not a regular file"); | |
| throw new IllegalArgumentException("Not a valid file " + inputPdfName); | |
| } | |
| if (!path.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf")) { | |
| LOGGER.log(Level.WARNING, "Not a PDF file"); | |
| throw new IllegalArgumentException("Not a PDF file"); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java`
around lines 61 - 88, The warning logs in OpenDataLoaderPDF (the block using
inputPdfName, path, LOGGER) currently print the full user-supplied path and must
be removed or sanitized; update each LOGGER.log call in that validation block to
avoid echoing inputPdfName or the full path and instead log a safe identifier
such as the file base name (path.getFileName().toString()) or a constant
placeholder like "<redacted-path>" (for the InvalidPathException case where path
is not available), e.g. replace messages that concatenate inputPdfName with ones
that use only the safeName or the placeholder so no absolute/user path is
emitted.
Signed-off-by: pavanpai769 <151814231+pavanpai769@users.noreply.github.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java (1)
41-47:⚠️ Potential issue | 🟡 MinorRestore the missing
@throws IOExceptionJavadoc contract.Line 48 still declares
throws IOException, but the Javadoc no longer documents it. This creates API-doc drift.🛠️ Suggested Javadoc fix
/** * Processes a PDF file to extract its content and structure based on the provided configuration. * * `@param` inputPdfName The path to the input PDF file. * `@param` config The configuration object specifying output formats and other options. + * `@throws` IllegalArgumentException if the input path is invalid. + * `@throws` IOException if an I/O error occurs during processing. * */🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java` around lines 41 - 47, The Javadoc for the PDF processing method in class OpenDataLoaderPDF is missing the `@throws` IOException tag; update the Javadoc for the method that "Processes a PDF file to extract its content and structure" (the method that declares "throws IOException") to include an `@throws` IOException entry describing when an IOException is thrown (e.g., when the input PDF cannot be read or I/O fails), ensuring the docblock matches the method signature.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java`:
- Around line 41-47: The Javadoc for the PDF processing method in class
OpenDataLoaderPDF is missing the `@throws` IOException tag; update the Javadoc for
the method that "Processes a PDF file to extract its content and structure" (the
method that declares "throws IOException") to include an `@throws` IOException
entry describing when an IOException is thrown (e.g., when the input PDF cannot
be read or I/O fails), ensuring the docblock matches the method signature.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 26c2ad76-59dd-4c7d-9020-3e6ddfa1b820
📒 Files selected for processing (1)
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java
|
Here's how the review process will go from here:
|
Fixes the issue discussed in #375.
Changes:
Summary by CodeRabbit
Bug Fixes
Documentation