How to read Unicode file as Unicode string in Python [closed]

I have a file that is encoded in Unicode or UTF-8 (I don't know which). When I read the file in Python 3.4, the resulting string is interpreted as an ASCII string. How do I convert it to a Unicode string like u"text"?

Melab asked May 22 '26 19:05


1 Answer

The term "Unicode" refers to the standard, not to a particular encoding. Because files are stored as bytes, Unicode text has to be encoded into bytes in some way, and there are several encodings for doing so. UTF-8 is one of them.
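To make the distinction concrete, here is a small sketch showing one Unicode string producing different bytes under two encodings. The sample string is an arbitrary choice for illustration, not something from the question.

```python
# One Unicode string, two different binary encodings.
text = "caf\u00e9"  # 'café': 4 code points

utf8_bytes = text.encode("utf-8")       # the é becomes two bytes: 0xC3 0xA9
utf16_bytes = text.encode("utf-16-le")  # each character here takes two bytes

print(len(text), len(utf8_bytes), len(utf16_bytes))  # 4 5 8

# Decoding with the matching encoding restores the original string.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text
```

The string is the same in both cases; only its byte representation on disk differs.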

You can consult https://docs.python.org/3/howto/unicode.html

An example taken from that document (in the section "Reading and Writing Unicode Data"):

# Passing encoding= makes open() decode the file's bytes into str as it reads.
with open('unicode.txt', encoding='utf-8') as f:
    for line in f:
        print(repr(line))
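If the encoding really is unknown, one possible approach is to read the raw bytes and try a short list of candidate encodings. The following is only a sketch: `read_text` and the candidate list are hypothetical, not part of the Python documentation, and the guess can be wrong (for example, UTF-16 text without a byte order mark may also decode as UTF-8 without raising an error).

```python
import os
import tempfile


def read_text(path, encodings=("utf-8", "utf-16")):
    """Return the file's contents as str, trying each encoding in turn.

    Hypothetical helper: the candidate encodings are an assumption.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("none of the candidate encodings matched")


# Demo: write a UTF-16 file (with BOM), then read it back without naming
# its encoding. The BOM bytes are invalid UTF-8, so the fallback is used.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write("na\u00efve\n".encode("utf-16"))
    result = read_text(path)

print(repr(result))
```

When the encoding is known, passing it directly to `open()` as in the example above is simpler and more reliable.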

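In Python 3, unlike Python 2, every string literal is already a Unicode string, so the "u" prefix is not needed. Python 3.3 and later still accept u"text" for compatibility, but it means exactly the same as "text". Likewise, the strings you get from a file opened in text mode are already Unicode strings (type str).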
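A quick check of this in Python 3.3 or later (the variable names are just for illustration):

```python
plain = "text"
prefixed = u"text"  # the u prefix is accepted but adds nothing

# Both literals are the same value and the same type.
assert plain == prefixed
assert type(plain) is str and type(prefixed) is str

# Bytes are a separate type; encode/decode converts between the two.
data = plain.encode("utf-8")
assert isinstance(data, bytes)
assert data.decode("utf-8") == plain
```

So there is no conversion step from "ASCII string" to "Unicode string" in Python 3; the only conversion is between bytes and str.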

Sci Prog answered May 25 '26 07:05