
How to detect superscript with iTextSharp?

Hi,

I am using iTextSharp to parse a PDF file into text output. I want to know whether I can detect that the PDF contains subscript or superscript text. Does anyone know how to tell a normal character from a superscript in a PDF, using iTextSharp or another library?

Thanks

nba bogdan asked Jan 24 '26 06:01



1 Answer

Disclaimer: I don't actually have any evidence for this but...

I would expect superscript and subscript text to be stored just like normal text: the same font, only smaller. When it sits on the same line as other text, it is raised or lowered, but a layout-oriented format such as PDF is unlikely to carry an explicit meta-tag that lets you detect that.

In other words, I'd guess that you need to identify super/subscripts by heuristics: finding text that's smaller and vertically displaced compared to other text on the "same" line. Whether that's easy to do or not depends on the PDF creator and the details of ITextSharp, since even identifying a "line" is not necessarily straightforward.
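To make the heuristic concrete, here is a minimal sketch in Python rather than iTextSharp code. It assumes you have already extracted text chunks with a baseline position and font size, and already grouped them into one visual line; how you get that data depends on your library. The Chunk type, the classify_line function and the two thresholds are hypothetical names and values chosen for illustration, not part of any library.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Chunk:
    """One run of text as reported by a PDF extractor (hypothetical shape)."""
    text: str
    baseline_y: float  # baseline position; PDF y coordinates grow upward
    font_size: float   # effective font size in the same units


def classify_line(chunks, size_ratio=0.85, shift_ratio=0.15):
    """Label each chunk of one visual line as 'normal', 'super' or 'sub'.

    The reference size and baseline are medians weighted by character
    count, so a short raised digit does not skew the reference.
    The thresholds are guesses and will need tuning per document.
    """
    sizes, baselines = [], []
    for c in chunks:
        weight = max(len(c.text), 1)
        sizes.extend([c.font_size] * weight)
        baselines.extend([c.baseline_y] * weight)
    ref_size = median(sizes)
    ref_base = median(baselines)

    labels = []
    for c in chunks:
        shift = c.baseline_y - ref_base
        smaller = c.font_size <= size_ratio * ref_size
        if smaller and shift >= shift_ratio * ref_size:
            labels.append("super")   # smaller and raised
        elif smaller and shift <= -shift_ratio * ref_size:
            labels.append("sub")     # smaller and lowered
        else:
            labels.append("normal")
    return labels
```

For example, a line made of Chunk("E = mc", 700.0, 12.0) followed by Chunk("2", 704.0, 8.0) would label the "2" as a superscript, because it is both smaller than the rest of the line and shifted upward. The hard part remains deciding which chunks belong to the same line in the first place.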

Eamon Nerbonne answered Jan 25 '26 20:01



