I want to process the output of a running program line-by-line (think tail -f) with a Python 3 script (on Linux).
The program's output, which is piped to the script, is encoded in latin-1. In Python 2, I used the codecs module to decode the input from sys.stdin properly:
#!/usr/bin/env python
import sys, codecs
sin = codecs.getreader('latin-1')(sys.stdin)
for line in sin:
    print '%s "%s"' % (type(line), line.encode('ascii', 'xmlcharrefreplace').strip())
This worked:
<type 'unicode'> "Hi! &#246;&#228;&#223;"
...
However, in Python 3, sys.stdin.encoding is UTF-8, and if I just read naively from stdin:
#!/usr/bin/env python3
import sys
for line in sys.stdin:
    print('type:{0} line:{1}'.format(type(line), line))
I get this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 4: invalid start byte
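The error can be reproduced without a pipe, which may help show where it comes from. This is only a small sketch: the byte string `raw` is an assumed stand-in for one line of the program's output, and the variable names are made up for illustration.

```python
# "Hi! öäß" as the piped program would emit it in latin-1 (assumed sample data)
raw = b"Hi! \xf6\xe4\xdf"

# Decoding as UTF-8 fails: 0xf6 cannot start a UTF-8 sequence
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc  # exc.start is the offset of the offending byte

# Decoding as latin-1 maps every byte to exactly one character
text = raw.decode("latin-1")
```

So the bytes themselves are fine; the problem is only that sys.stdin decodes them with the wrong codec.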
How can I read non-UTF-8 text data piped to stdin in Python 3?
import sys
import io
# Reopen the stdin file descriptor in text mode with the encoding the data really uses
with io.open(sys.stdin.fileno(), 'r', encoding='latin-1') as sin:
    for line in sin:
        print('type:{0} line:{1}'.format(type(line), line))
yields
type:<class 'str'> line:Hi! öäß
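A possible alternative, sketched here rather than taken from the answer above, is to wrap the underlying binary stream `sys.stdin.buffer` in an `io.TextIOWrapper` with the desired encoding. The helper name `decode_lines` is hypothetical and only exists for this illustration.

```python
import io


def decode_lines(binary_stream, encoding="latin-1"):
    """Yield decoded text lines from a binary stream (hypothetical helper)."""
    # TextIOWrapper decodes the raw bytes with the given encoding
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding)
    for line in text_stream:
        yield line


# Intended use in the script (not executed here):
#     import sys
#     for line in decode_lines(sys.stdin.buffer):
#         print('type:{0} line:{1}'.format(type(line), line))
```

On Python 3.7 and later, `sys.stdin.reconfigure(encoding='latin-1')` should achieve the same effect without creating a new wrapper, provided it is called before anything has been read from stdin.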