I want to use the Xpdf-based pdftotext command-line tool to extract text from PDF files, and I need the output in UTF-8. Others on StackOverflow appear to have done this successfully; see questions 4039930, 3809761 and 13618330.
When I use the option -enc utf-8 these messages are displayed:
Syntax Error: Couldn't find unicodeMap file for the 'utf-8' encoding
Config Error: Couldn't get text encoding
I've seen documentation that (among others) UTF-8 encoding is "predefined" but I cannot find the file that I need to point to. (I've looked at multiple different downloads of XPDF-based software and have not yet found it.)
Any pointers would be appreciated.
EDIT: I am on Windows.
You should use UTF-8 instead of utf-8: the encoding name is matched case-sensitively. See the list of encodings that pdftotext reports:
$ pdftotext -listenc
Available encodings are:
UCS-2
ASCII7
Latin1
UTF-8
ZapfDingbats
Symbol
Proof code:
$ pdftotext -eol unix -nopgbrk -layout -enc utf-8 file.pdf
Syntax Error: Couldn't find unicodeMap file for the 'utf-8' encoding
Command Line Error: Couldn't get text encoding
$ pdftotext -eol unix -nopgbrk -layout -enc UTF-8 file.pdf
$ echo $?
0
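If the encoding name comes from user input or a script variable, a small wrapper can map it to the exact spelling before it reaches pdftotext. The following is only a sketch: canonical_enc is a hypothetical helper, not part of pdftotext, and its name list is copied by hand from the -listenc output above, so a build with a different set of encodings would need a different list.

```shell
# Hypothetical helper (not part of pdftotext): print the exact spelling
# of an encoding name, matching the argument case-insensitively.
canonical_enc() {
    wanted=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    # Names copied from the -listenc output above; adjust for your build.
    for name in UCS-2 ASCII7 Latin1 UTF-8 ZapfDingbats Symbol; do
        upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
        if [ "$upper" = "$wanted" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    # No match: print nothing and report failure to the caller.
    return 1
}

# Usage, assuming pdftotext is on PATH and file.pdf exists:
#   pdftotext -enc "$(canonical_enc utf-8)" file.pdf
```

The function returns a non-zero status for an unknown name, so a calling script can stop before invoking pdftotext with an encoding it would reject.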