Create a Chinese-named folder using Python

Question

I've run into a problem with Chinese characters. I use BeautifulSoup to extract data, and I want to create a folder named after the extracted data. The data looks like this:

<A href="love">星座(1824)</A>

I want to extract '星座(1824)', so I do this:

soup.find('a',href='love')

but in the console it comes out as:

ÐÇ×ù(1824)

I have put '# -*- coding: utf-8 -*-' at the top of my source file. It must be some encoding problem. Can anyone point me to good material on working with non-English text in Python?

I want to create a folder named '星座(1824)', so I do:

if not os.path.exists(dir_name):
        os.mkdir('./pic/'+dir_name)

I then find that a folder named 'ÐÇ×ù(1824)' already exists, so why do I still get:

OSError: [Errno 17] File exists: './vguagua_pic/\xc3\x90\xc3\x87\xc3\x97\xc3\xb9(1824)'

thx

kennytm · Accepted Answer

Even if your .py script is written in UTF-8, if the webpage is not, the parsed text may not be correct.

The webpage's encoding is actually GB-2312 (or GB-18030), but BeautifulSoup wrongly guessed it to be ISO-8859-1. Converting the text to UTF-8 under that incorrect assumption produces mojibake. We can verify:

>>> b'\xc3\x90\xc3\x87\xc3\x97\xc3\xb9'.decode('utf8').encode('latin1').decode('gb2312')
'星座'
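The same round trip can be run in the forward direction to see how the bytes in the error message arise. This is a minimal sketch assuming Python 3; the variable names are illustrative, and the Chinese text is written with Unicode escapes so the snippet does not depend on the source file's encoding.

```python
# The link text from the question, as it was meant to be read.
original = '\u661f\u5ea7'  # the two characters 星座

# Bytes as the server sent them: the text encoded as GB2312.
page_bytes = original.encode('gb2312')

# Wrong guess: reading those bytes as ISO-8859-1 yields one character per byte.
misread = page_bytes.decode('latin1')

# Writing the misread text out as UTF-8 gives the byte sequence
# that appears in the folder name of the OSError.
on_disk = misread.encode('utf8')

print(repr(page_bytes))
print(repr(misread))
print(repr(on_disk))
```

The last line prints the same `\xc3\x90\xc3\x87\xc3\x97\xc3\xb9` sequence shown in the traceback, which is consistent with the ISO-8859-1 misreading described above.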

You could add from_encoding="gb2312" (in bs4) or fromEncoding="gb2312" (in BeautifulSoup 3.x) to the BeautifulSoup constructor to force the encoding, as documented in the Beautiful Soup Documentation (also available in Chinese: 中文文档).
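A minimal sketch of that fix, assuming Python 3 with bs4 installed. `RAW_HTML`, `extract_name`, and `ensure_folder` are hypothetical names used only for illustration, and the sample bytes stand in for the downloaded page. The folder helper builds one path and uses it for both the existence check and the creation; the question's snippet tests `dir_name` but creates `'./pic/' + dir_name`, which may be why `mkdir` still reported that the folder exists.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample: the question's anchor tag as raw GB2312 bytes.
RAW_HTML = b'<A href="love">\xd0\xc7\xd7\xf9(1824)</A>'


def extract_name(raw_html):
    """Parse raw bytes as GB2312 and return the text of the 'love' link."""
    # from_encoding (bs4 spelling) tells the parser not to guess the encoding.
    soup = BeautifulSoup(raw_html, 'html.parser', from_encoding='gb2312')
    return soup.find('a', href='love').get_text()


def ensure_folder(base_dir, name):
    """Create base_dir/name if it is missing and return the full path."""
    # The same joined path is used for the check and the creation.
    target = os.path.join(base_dir, name)
    os.makedirs(target, exist_ok=True)
    return target


if __name__ == '__main__':
    name = extract_name(RAW_HTML)
    # A temporary directory is used here instead of './pic/'.
    with tempfile.TemporaryDirectory() as base:
        print(ensure_folder(base, name))
```

`os.makedirs(..., exist_ok=True)` is used so that a second run against an existing folder does not raise; whether the name displays correctly afterwards still depends on the filesystem and terminal handling Unicode names.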

Tags:

python

beautifulsoup

kuafu