
"ScratchFileBuffer not closed!" message when reading a PDF

Tags:

java

I have this code to read and extract a string from a pdf.

It works well, but the log repeatedly shows the "ScratchFileBuffer not closed!" message (output included after the code), and I don't know why:

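import java.io.File;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper; // package name as of PDFBox 2.x

public class Test {
    public static void main(String[] args) {

        PDDocument doc = null;
        try {
            // Load the PDF and extract its full text
            doc = PDDocument.load(new File("C:/prueba.pdf"));
            PDFTextStripper pdfs = new PDFTextStripper();
            String textOfPdf = pdfs.getText(doc);

            // Six hyphen-separated groups of five uppercase letters or digits
            String regex = "([A-Z0-9]{5}-[A-Z0-9]{5}-[A-Z0-9]{5}-[A-Z0-9]{5}-[A-Z0-9]{5}-[A-Z0-9]{5})";
            Pattern patron = Pattern.compile(regex);

            Matcher emparejador = patron.matcher(textOfPdf);
            // group(0) throws IllegalStateException unless find() succeeded
            if (emparejador.find()) {
                String text = emparejador.group(0);
                System.out.print(text);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Always close the document, even if extraction failed
            try {
                if (doc != null) {
                    doc.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}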

12:52:37.335 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{25, 0}
12:52:37.336 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{26, 0}
12:52:37.336 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{28, 0}
12:52:37.336 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{27, 0}
12:52:37.336 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{30, 0}
12:52:37.337 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{31, 0}
12:52:37.338 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser - parsed=COSObject{5, 0}

12:52:37.772 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.772 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.772 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
12:52:37.773 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!

I have also tried the tess4j library, but the same thing happens. Any ideas?

Regards

Cybor696 asked Nov 15 '25 20:11



1 Answer

This is most likely an internal parser issue. By the looks of it, some of the PDF objects aren't explicitly closing the scratch file buffers they are using; the buffers are instead getting closed in a finalize method, which is what logs the message from the Finalizer thread.
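To illustrate the difference between an explicit close and one left to finalization, here is a minimal sketch. `TrackedBuffer` and `CloseDemo` are hypothetical names invented for this example and are not PDFBox classes; the point is only that try-with-resources closes a resource deterministically, so nothing is left for a finalizer to clean up (or complain about) later.

```java
// Hypothetical stand-in for a buffer that owns a scratch resource.
// This is NOT a PDFBox class; it only models the "closed or not" state.
final class TrackedBuffer implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        // Explicit, deterministic release of the resource
        closed = true;
    }
}

final class CloseDemo {
    // Opens a buffer inside try-with-resources and returns the same
    // instance afterwards so the caller can inspect its state.
    static TrackedBuffer useAndClose() {
        TrackedBuffer ref;
        try (TrackedBuffer buffer = new TrackedBuffer()) {
            ref = buffer; // work with the buffer here
        }
        // close() has already run by the time we get here
        return ref;
    }
}
```

In the question's code only `PDDocument` is under the caller's control, and it is already closed in the `finally` block; the buffers in the log are internal to the library, so this pattern cannot be applied to them from outside.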

It doesn't look like a real problem to me, and there is not much you can do except turn off debug-level logging for that class, for example with this log4j properties entry:

log4j.logger.org.apache.pdfbox.io.ScratchFileBuffer=WARN
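As a side note unrelated to the log message, the key-extraction step from the question can be sketched in isolation so that a PDF without a matching key does not cause an `IllegalStateException` from `group()`. `KeyExtractor` and `firstKey` are hypothetical names chosen for this example; the pattern is a shorter equivalent of the question's regex (six hyphen-separated groups of five characters).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the name and API are illustrative only.
final class KeyExtractor {
    // Five characters, followed by five more "-XXXXX" groups
    private static final Pattern KEY =
            Pattern.compile("[A-Z0-9]{5}(?:-[A-Z0-9]{5}){5}");

    // Returns the first key found in the text, or empty if there is none.
    static Optional<String> firstKey(String text) {
        Matcher m = KEY.matcher(text);
        // group() is only valid after a successful find()
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```

With this shape, the caller could pass in the string returned by `PDFTextStripper.getText` and decide what to do when no key is present.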
Marc G. Smith answered Nov 18 '25 10:11



