
Can't print a Chinese string character by character in Python

Tags: python, unicode

My test.txt file contains these characters:

地藏菩萨本愿经卷上
忉利天宫神通品第一

I have this simple program:

f = open("test.txt")
text = f.read()
f.close()

print text

for c in text:
    print c,

print "\n------------"

for i in range(len(text)):
    print text[i],

Here is the result:

地藏菩萨本愿经卷上
忉利天宫神通品第一
------------ 
å œ ° è — マ è マ © è ミ ¨ æ œ ¬ æ „ ¿ ç » マ å ヘ · ä ¸ Š 
å ¿ ‰ å ˆ © å ¤ © å ® « ç ¥ ž é € š å “ チ ç ¬ ¬ ä ¸ € 


å œ ° è — マ è マ © è ミ ¨ æ œ ¬ æ „ ¿ ç » マ å ヘ · ä ¸ Š 
å ¿ ‰ å ˆ © å ¤ © å ® « ç ¥ ž é € š å “ チ ç ¬ ¬ ä ¸ €

"text" prints correctly when I use "print text", but both attempts to print it character by character fail.

What's happening?

lessthanl0l asked Nov 30 '25 21:11



1 Answer

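In Python 2, f.read() returns a byte string, so iterating over it yields one byte at a time, and each Chinese character takes several bytes in UTF-8. Decode the data from UTF-8 into a unicode string first: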

>>> with open('abc1') as f:
...     text = f.read().decode('utf-8')
...
>>> print text
地藏菩萨本愿经卷上 忉利天宫神通品第一
>>> for x in text:
...     print x,
...
地 藏 菩 萨 本 愿 经 卷 上   忉 利 天 宫 神 通 品 第 一
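
To illustrate why decoding matters, here is a minimal sketch written in Python 3 syntax (the question itself uses Python 2). The variable names and the four-character sample are assumptions chosen for illustration, not part of the original code. It compares the number of characters with the number of UTF-8 bytes, and shows the three bytes behind the first character. Those bytes, 0xe5 0x9c 0xb0, appear to match the "å œ °" at the start of the garbled output, assuming the terminal rendered each byte as a Latin-1-style character.

```python
# Python 3 sketch; the question itself uses Python 2.
sample = "地藏菩萨"               # hypothetical sample: first four characters of the file
raw = sample.encode("utf-8")      # the bytes as they would be stored on disk

char_count = len(sample)          # number of characters
byte_count = len(raw)             # number of UTF-8 bytes (three per character here)

# The three bytes that make up the first character.
first_char_bytes = [hex(b) for b in raw[:3]]

# Decoding first yields whole characters instead of single bytes.
chars = list(raw.decode("utf-8"))

print(char_count, byte_count)     # 4 12
print(first_char_bytes)           # ['0xe5', '0x9c', '0xb0']
print(" ".join(chars))
```

Iterating over the undecoded bytes walks through twelve values, none of which is a complete character on its own, which is what the loops in the question were doing.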

Or use io.open to open the file with the required encoding, so that read() returns already-decoded text:

>>> import io
>>> with io.open('abc1', encoding='utf-8') as f:
...     text = f.read()
...
>>> for x in text:
...     print x,
...
地 藏 菩 萨 本 愿 经 卷 上   忉 利 avoided
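
The io.open approach can be tried end to end with the following sketch. It is written in Python 3 syntax, where the built-in open accepts the same encoding argument; the temporary file path and the variable names are assumptions for illustration and stand in for the asker's test.txt.

```python
import io
import os
import tempfile

# Hypothetical stand-in for test.txt, written to a temporary directory.
content = "地藏菩萨本愿经卷上\n忉利天宫神通品第一\n"
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Write the sample text as UTF-8 bytes.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(content)

# Read it back; the encoding argument makes read() return decoded text.
with io.open(path, encoding="utf-8") as f:
    text = f.read()

# Print each line with a space between characters.
lines = text.splitlines()
for line in lines:
    print(" ".join(line))
```

Because the file object decodes as it reads, no separate decode call is needed, and each line splits into nine whole characters.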
Ashwini Chaudhary answered Dec 02 '25 11:12



