
How do I display Arabic text extracted from a PDF with PDFBox?

The letters are extracted, but the Arabic text comes out in reverse order. I am using the following code:

    PDDocument pddDocument = PDDocument.load(new File("test1.pdf"));
    PDFTextStripper textStripper = new PDFTextStripper();

    TextNormalize normalize = new TextNormalize("UTF-8");
    String Text = textStripper.getText(pddDocument);

    Text = normalize.makeLineLogicalOrder(Text, true);
    Text = normalize.normalizePres(Text);
    Text = normalize.normalizeDiac(Text);
    System.out.println(Text);

Asked by Mohab, Dec 20 '25 06:12


1 Answer

The problem was solved by downloading icu4j-49_1.jar from http://site.icu-project.org/download/49#TOC-ICU4J-Download and putting it on the classpath.

Then I rewrote the code as follows:

    PDDocument pddDocument = PDDocument.load(new File("test1.pdf"));
    PDFTextStripper textStripper = new PDFTextStripper();
    String Text = textStripper.getText(pddDocument);
    System.out.println(Text);

Answered by Mohab, Dec 21 '25 20:12


