How to decode [email\xa0protected] while web scraping using Python

When I try to extract the email address from the tag below using Python's lxml.html, it shows [email\xa0protected]. Can anyone help me decode this?

<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="4420366a373021283e2136042921202d27212a30262520212a6a272b29">[email&#160;protected]</a>
Srinath Neela asked Feb 04 '26 08:02


1 Answer

Finally, I found the answer:

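    # Value of the data-cfemail HTML attribute, which holds the obfuscated email address
    fp = '4420366a373021283e2136042921202d27212a30262520212a6a272b29'

    def deCFEmail(fp):
        """Decode a Cloudflare data-cfemail hex string; return None if it is malformed."""
        try:
            r = int(fp[:2], 16)  # the first byte is the XOR key
            # Each following byte XORed with the key gives one character of the address
            email = ''.join(chr(int(fp[i:i+2], 16) ^ r) for i in range(2, len(fp), 2))
            return email
        except ValueError:
            return None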

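Using the above code, we can decode Cloudflare's hex-encoded value to text: the first two hex digits are a one-byte XOR key, and each following pair of hex digits, XORed with that key, gives one character of the email address.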
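To make the XOR scheme concrete, here is a minimal sketch that runs it in both directions. `encode_cf_email` is a hypothetical helper (not a Cloudflare API) that rebuilds the format as inferred from the decoder: the key written as two hex digits, followed by each character XORed with the key. It assumes ASCII addresses, so every XORed byte fits in two hex digits. Because XORing twice with the same key is the identity, decoding the encoded string returns the original address.

```python
def encode_cf_email(email, key):
    """Hypothetical inverse of the decoder: obfuscate an ASCII email with a one-byte XOR key."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in a single byte")
    # Output starts with the key itself, then one XORed byte per character,
    # each written as two lowercase hex digits (assumes ASCII characters).
    return f"{key:02x}" + "".join(f"{ord(ch) ^ key:02x}" for ch in email)


def decode_cf_email(encoded):
    """Same logic as deCFEmail: read the key, then XOR every following byte with it."""
    key = int(encoded[:2], 16)
    return "".join(chr(int(encoded[i:i + 2], 16) ^ key) for i in range(2, len(encoded), 2))


# Worked step on the value from the question: the key is 0x44 and the next byte is 0x20,
# so the first character is 0x20 ^ 0x44 = 0x64.
print(chr(0x20 ^ 0x44))  # prints d

sample = encode_cf_email("user@example.com", 0x44)
print(sample)
print(decode_cf_email(sample))  # prints user@example.com
```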

Example:

    s = '4420366a373021283e2136042921202d27212a30262520212a6a272b29'
    print(deCFEmail(s))
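To apply this while scraping, collect every `data-cfemail` attribute from the page and decode each one. With lxml.html, `lxml.html.fromstring(page).xpath('//@data-cfemail')` returns those attribute values. The sketch below swaps in the standard library's `html.parser` instead so it runs without third-party packages. `CFEmailCollector` and `extract_cf_emails` are hypothetical names chosen for illustration, and the sample HTML reuses the tag from the question.

```python
from html.parser import HTMLParser


def decode_cf_email(encoded):
    # First byte (two hex digits) is the XOR key; each later byte XOR key is one character.
    key = int(encoded[:2], 16)
    return "".join(chr(int(encoded[i:i + 2], 16) ^ key) for i in range(2, len(encoded), 2))


class CFEmailCollector(HTMLParser):
    """Hypothetical helper: records every data-cfemail attribute value in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.encoded = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag
        for name, value in attrs:
            if name == "data-cfemail" and value:
                self.encoded.append(value)


def extract_cf_emails(html):
    """Parse the HTML and return the decoded addresses in document order."""
    collector = CFEmailCollector()
    collector.feed(html)
    collector.close()
    return [decode_cf_email(value) for value in collector.encoded]


page = (
    '<p>Contact: <a href="/cdn-cgi/l/email-protection" class="__cf_email__" '
    'data-cfemail="4420366a373021283e2136042921202d27212a30262520212a6a272b29">'
    '[email&#160;protected]</a></p>'
)
print(extract_cf_emails(page))
```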
Srinath Neela answered Feb 05 '26 22:02