In Python 3, I want to extract information from a page using requests and BeautifulSoup:
import requests
from bs4 import BeautifulSoup

link = "https://portal.stf.jus.br/processos/listarPartes.asp?termo=AECIO%20NEVES%20DA%20CUNHA"
try:
    res = requests.get(link)
except requests.exceptions.RequestException as e:
    # RequestException is the base class of HTTPError, ConnectionError and Timeout
    print(str(e))
except Exception as e:
    print("Exception")
html = res.content.decode('utf-8')
soup = BeautifulSoup(html, "lxml")
# find the container div that should hold the total
pag = soup.find('div', {'id': 'total'})
print(pag)
In this case, the information is in an HTML snippet like this:
<div id="total" style="display: inline-block"><input type="hidden" name="totalProc" id="totalProc" value="35">35</div>
What I want is the value attribute, in this case 35; I want to capture the number "35".
That's why I used pag = soup.find('div', {'id': 'total'}): to narrow things down step by step until only the number 35 is left.
But the content returned was just: <div id="total" style="display: inline-block"><img src="ajax-loader.gif"/></div>
Does anyone know how to capture only the value?
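The value is loaded dynamically. The page is first served with the ajax-loader.gif placeholder, and JavaScript then fills in the number from a separate XHR request, which you can find in the Network tab of the browser's developer tools. Request that URL directly: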
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://portal.stf.jus.br/processos/totalProcessosPartes.asp?termo=AECIO%20NEVES%20DA%20CUNHA&total=0')
soup = bs(r.content, 'lxml')
print(soup.select_one('#totalProc')['value'])
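If you need the total repeatedly, the same selector can be wrapped in a small helper. This is a minimal sketch, assuming bs4 is installed; the function name parse_total is hypothetical, and it uses Python's built-in "html.parser" instead of lxml to avoid one dependency. It returns None when #totalProc is absent, which is what happens if you parse the original listarPartes.asp page, whose div#total only holds the loader image.

```python
from typing import Optional

from bs4 import BeautifulSoup


def parse_total(html: str) -> Optional[int]:
    """Return the totalProc value as an int, or None if it is not present.

    None usually means the HTML is the pre-JavaScript page, where div#total
    contains only the ajax-loader.gif placeholder.
    """
    # "html.parser" ships with Python; "lxml" works too if it is installed
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("#totalProc")
    if tag is None or not tag.get("value", "").isdigit():
        return None
    return int(tag["value"])


loaded = '<div id="total" style="display: inline-block"><input type="hidden" name="totalProc" id="totalProc" value="35">35</div>'
placeholder = '<div id="total" style="display: inline-block"><img src="ajax-loader.gif"/></div>'
print(parse_total(loaded))       # 35
print(parse_total(placeholder))  # None
```

Returning None instead of raising makes it easy to tell "the page was not rendered" apart from "the total is zero".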
Alternatively, with a regular expression (no HTML parser needed):
import re
import requests

r = requests.get('https://portal.stf.jus.br/processos/totalProcessosPartes.asp?termo=AECIO%20NEVES%20DA%20CUNHA&total=0')
# raw string avoids the invalid "\d" escape warning; "? accepts both value=35 and value="35"
print(re.search(r'value="?(\d+)', r.text).group(1))
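Both snippets hard-code the URL-encoded name. To query other parties, the URL can be built from a plain string. This is a sketch using only the standard library; build_total_url is a hypothetical helper, and the parameter names termo and total are copied from the URL above, not taken from any documented API.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the answer above (assumption: it accepts other names too)
BASE = "https://portal.stf.jus.br/processos/totalProcessosPartes.asp"


def build_total_url(name: str) -> str:
    """Build the XHR URL for a party name, encoding spaces as %20."""
    # quote_via=quote gives %20 for spaces; the default quote_plus would give "+"
    query = urlencode({"termo": name, "total": 0}, quote_via=quote)
    return f"{BASE}?{query}"


print(build_total_url("AECIO NEVES DA CUNHA"))
# https://portal.stf.jus.br/processos/totalProcessosPartes.asp?termo=AECIO%20NEVES%20DA%20CUNHA&total=0
```

Passing params= to requests.get would also work, but it encodes spaces as "+"; most servers decode that as a space, though building the URL explicitly reproduces the exact %20 form used above. Non-ASCII names (for example with accents) are encoded as UTF-8 here; whether this server expects UTF-8 is an assumption worth checking in the Network tab.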