After executing OpenDataLoaderPDF.processFile, an error occurs when trying to delete the PDF file: **The file cannot be deleted because it is being used by another process.** #408

@caojiebing

Bug: PDF file cannot be deleted after OpenDataLoaderPDF.processFile (file in use by another process)

Environment

  • OpenDataLoader Version: 2.2.1
  • Java Version: OpenJDK 17.0.10
  • OS: Windows (file lock behavior observed)

Issue Description

After calling OpenDataLoaderPDF.processFile to process a PDF document, attempting to delete the source PDF file fails with the error:
The file cannot be deleted because it is being used by another process.

Root cause: The PDDocument and its related resources are not closed after processing completes, which leaves an open file handle (and, on Windows, a lock) on the PDF file.
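
To illustrate the mechanism described above without depending on OpenDataLoader itself, here is a minimal sketch using only the JDK. `FileHandleDemo` and its methods are hypothetical names introduced for this example. It assumes the lock comes from a plain `java.io` handle left open on the file; on Windows, deleting a file while such a handle is open typically fails, while on Linux and macOS it usually succeeds.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: not part of OpenDataLoader.
class FileHandleDemo {

    // Creates a small temporary file to experiment with.
    static Path newTempFile() {
        try {
            Path file = Files.createTempFile("handle-demo", ".pdf");
            Files.write(file, "%PDF".getBytes(StandardCharsets.US_ASCII));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Attempts the delete while a java.io handle is still open.
    // Typically returns false on Windows and true on Linux/macOS.
    static boolean deleteWhileOpen(Path file) {
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            in.read(); // make sure the handle is really in use
            try {
                Files.delete(file);
                return true;
            } catch (IOException e) {
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Closes the handle first (try-with-resources), then deletes.
    static boolean deleteAfterClosing(Path file) {
        try {
            try (FileInputStream in = new FileInputStream(file.toFile())) {
                in.read();
            } // handle released here
            Files.delete(file);
            return Files.notExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path first = newTempFile();
        System.out.println("delete while open:    " + deleteWhileOpen(first));
        Files.deleteIfExists(first); // clean up if the locked delete failed

        Path second = newTempFile();
        System.out.println("delete after closing: " + deleteAfterClosing(second));
    }
}
```

The second case is the behavior the fix aims for: once every handle on the file is closed, the delete should go through on any platform.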

Steps to Reproduce

  1. Follow the official Java Quick Start guide: https://opendataloader.org/docs/quick-start-java
  2. Process a local PDF file using OpenDataLoaderPDF.processFile(...)
  3. Immediately attempt to delete the processed PDF file
  4. Observe the file-in-use deletion error

Suggested Fix

Add resource cleanup logic to ensure all PDF-related resources (including PDDocument) are closed after processing finishes.

Changes to DocumentProcessor.java

  1. Add a new closePdfResources() private method to safely close PDF resources
  2. Wrap the existing processFile logic in a try-finally block to guarantee cleanup

// New method to release all PDF resources
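private static void closePdfResources() {
    // Release the contrast ratio consumer first; a failure here must not
    // prevent the document from being closed below.
    try {
        StaticLayoutContainers.closeContrastRatioConsumer();
    } catch (Exception e) {
        // Pass the exception itself so the stack trace is logged, not only the message
        LOGGER.log(Level.WARNING, "Unable to close contrast ratio consumer", e);
    }

    // Close the PDDocument so that its handle on the source PDF is released
    PDDocument document = StaticResources.getDocument();
    if (document != null) {
        try {
            document.close();
        } catch (Exception e) {
            LOGGER.log(Level.WARNING, "Unable to close PDF document", e);
        }
    }
}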

// Updated processFile method with try-finally cleanup
public static void processFile(String inputPdfName, Config config) throws IOException {
    try {
        preprocessing(inputPdfName, config);
        calculateDocumentInfo();
        Set<Integer> pagesToProcess = getValidPageNumbers(config);
        List<List<IObject>> contents;
        
        if (StaticLayoutContainers.isUseStructTree()) {
            contents = TaggedDocumentProcessor.processDocument(inputPdfName, config, pagesToProcess);
        } else if (config.isHybridEnabled()) {
            contents = HybridDocumentProcessor.processDocument(inputPdfName, config, pagesToProcess);
        } else {
            contents = processDocument(inputPdfName, config, pagesToProcess);
        }
        
        if (config.needsStructuredProcessing()) {
            sortContents(contents, config);
        }
        
        ContentSanitizer contentSanitizer = new ContentSanitizer(
            config.getFilterConfig().getFilterRules(),
            config.getFilterConfig().isFilterSensitiveData()
        );
        contentSanitizer.sanitizeContents(contents);
        generateOutputs(inputPdfName, contents, config);
    } finally {
        // Critical: Ensure resources are closed even if an exception occurs
        closePdfResources();
    }
}
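
The guarantee that the try-finally wrapper relies on can be checked in isolation. The sketch below is not OpenDataLoader code: `TrackedResource` and `CleanupDemo` are hypothetical stand-ins that mimic the shape of the proposed `processFile` (work inside `try`, cleanup inside `finally`) and record whether the cleanup ran.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a resource such as an open PDF document.
class TrackedResource implements Closeable {
    private boolean closed;

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class CleanupDemo {

    // Same shape as the proposed processFile: the finally block runs
    // whether the body completes normally or throws.
    static void process(TrackedResource resource, boolean failMidway) throws IOException {
        try {
            if (failMidway) {
                throw new IOException("simulated processing failure");
            }
        } finally {
            resource.close();
        }
    }

    // Runs process() and reports whether the resource ended up closed.
    static boolean closedAfter(boolean failMidway) {
        TrackedResource resource = new TrackedResource();
        try {
            process(resource, failMidway);
        } catch (IOException expected) {
            // Only the cleanup matters here, so the failure is ignored.
        }
        return resource.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("closed after success: " + closedAfter(false));
        System.out.println("closed after failure: " + closedAfter(true));
    }
}
```

Both lines print `true`, since `finally` runs on the normal path and on the exception path alike. The original exception still propagates to the caller after the cleanup.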

Verification

After applying this fix:

  • PDF files are properly unlocked after processing
  • Files can be deleted immediately after processFile returns
  • No resource leaks or file locks remain
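
The checks above could be automated with a small regression check along these lines. This is a sketch under stated assumptions: `DeleteAfterProcessingCheck` and `wellBehavedStep` are hypothetical names, and the processing step is passed in as a callback so the example runs without OpenDataLoader on the classpath. To verify the real fix, the callback would be replaced by a call to OpenDataLoaderPDF.processFile(...) with the arguments from the Quick Start, and the check would need to run on Windows, where the lock was observed.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Illustrative only: not part of OpenDataLoader.
class DeleteAfterProcessingCheck {

    // Creates a temp file, runs the processing step on it, then tries to
    // delete it immediately. Returns true if the delete succeeded.
    static boolean deletableAfter(Consumer<Path> processingStep) {
        try {
            Path pdf = Files.createTempFile("odl-check", ".pdf");
            pdf.toFile().deleteOnExit(); // fallback cleanup if the delete below fails
            Files.write(pdf, "%PDF".getBytes(StandardCharsets.US_ASCII));

            processingStep.accept(pdf);

            try {
                Files.delete(pdf);
                return true;
            } catch (IOException e) {
                return false; // for example, the file is still locked
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for a processing step that closes its handle before returning.
    static void wellBehavedStep(Path pdf) {
        try (FileInputStream in = new FileInputStream(pdf.toFile())) {
            in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean ok = deletableAfter(DeleteAfterProcessingCheck::wellBehavedStep);
        System.out.println("deletable after processing: " + ok);
    }
}
```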

Labels: bug (Something isn't working)
