
Scrapy returning weirdly encoded string

Tags:

python

scrapy

I'm using Scrapy and getting a weird response. The URL looks like this (notice the UTF-8 encoded check mark): https://www.example.com?sort=relevancy&utf8=%E2%9C%9

I get a 200 response, but the body is a bytes string that looks like this:

b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xec\xbd\xedv\xdb\xb6\xb20\xfc?W\x81r\x9f\'\xb6OE\x8a\....

What is this? How do I handle this? Can I have scrapy automatically decode stuff that looks like this?

superdee asked Dec 08 '25 10:12


1 Answer

The answer is in the comments by @drec4s and @furas.

You can first try to decode the response body:

response.body.decode('utf-8')

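Or use the text attribute, which decodes the body with the encoding Scrapy detected:

response.text

(Older code uses response.body_as_unicode(), which newer Scrapy releases deprecate in favor of response.text.)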
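
When the encoding is unknown, a few candidates can be tried in order. This is a minimal sketch using only the standard library; `try_decode` and the candidate list are hypothetical names chosen for illustration, not part of Scrapy.

```python
def try_decode(body: bytes, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Returns (None, None) when every candidate fails, which is a hint that
    the bytes are not plain text at all (for example, compressed data).
    """
    for encoding in encodings:
        try:
            return body.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return None, None


# Illustrative input: the UTF-8 bytes of a check mark.
text, used = try_decode("\u2713".encode("utf-8"))
```

In a spider callback you would pass `response.body` instead of the sample bytes.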

If you get decoding errors or an unreadable string, you might try different encodings, but most likely the response body is compressed. Check the response headers for something like:

content-encoding: br

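Or it could be gzip. The body in the question starts with \x1f\x8b, which is the gzip magic number, so gzip is the likely candidate here.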
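
If the header is missing or you only have the raw bytes, the gzip case can be checked from the first two bytes of the body. The sketch below uses only the standard library; `maybe_decompress` is a hypothetical helper name, and it does not cover Brotli (br), which needs a third-party package.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_decompress(body: bytes) -> bytes:
    """Decompress body if it looks like gzip data, otherwise return it unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


# Illustrative round trip: compress some text, then recover it.
sample = gzip.compress("sort=relevancy \u2713".encode("utf-8"))
text = maybe_decompress(sample).decode("utf-8")
```

In a spider callback you would call `maybe_decompress(response.body)` before decoding.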

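If the body is compressed and Scrapy is not decoding it for you, you can ask the server to return an uncompressed response by setting in the request headers:

accept-encoding: identity

Note that deflate is itself a compressed encoding, so requesting it does not by itself give you plain text.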
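
One way to send such a header on every request is Scrapy's DEFAULT_REQUEST_HEADERS setting. This is a minimal sketch of a settings.py fragment, assuming you want the server to skip compression (identity is the HTTP token for "no encoding"); the Accept and Accept-Language values are illustrative placeholders.

```python
# settings.py (fragment)
# Overriding DEFAULT_REQUEST_HEADERS replaces the whole dict, so the
# other default headers are restated here with placeholder values.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    # Ask the server not to compress the response body.
    "Accept-Encoding": "identity",
}
```

For a single request, the same header can be passed through the `headers` argument of `scrapy.Request` instead.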

Way Too Simple answered Dec 10 '25 00:12