<tr class="list even">
<td class="list">5</td>
<td class="list"><s>BI</s>→MU</td>
<td class="list"><s>TEACHER</s>→TEACHER</td>
<td class="list">Hello I am a Text</td>
<td class="list">5b</td>
<td class="list">BI3</td></tr>
This is one of the table rows. Some rows consist of a single cell that acts as an inline header, but I don't care about those.
I want to get only the rows that contain a given string (for example "5b"), and not just the matching td but the whole tr.
If multiple rows contain the string, I should get a list of them.
for row in soup.find_all('tr', class_='list even'):
    if '5b' in row.text:
        print(row)
        for cell in row.find_all('td'):
            if "5b" not in cell.text:
                print(cell.text)
for row in soup.find_all('tr', class_='list odd'):
    if '5b' in row.text:
        for cell in row.find_all('td'):
            if "5b" not in cell.text:
                print(cell.text)
I have this now, but it adds a newline before the last table field: https://haste.thevillage.chat/foguvakixa.py
if "5b" not in cell.text:
This is because if I request the data for 5d, I don't need to be told again that it's 5d. So this line just filters out the class itself.
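The stray newline most likely comes from whitespace inside the td elements, which `.text` returns unchanged. Here is a minimal sketch, assuming the markup looks like the row above: `get_text(strip=True)` trims each cell, and the matching rows are collected into a list instead of being printed. The second row in `HTML` and the helper name `rows_for_class` are made up for illustration.

```python
from bs4 import BeautifulSoup

# First row is copied from the question; the second row is invented
# to show a cell whose text starts with a newline.
HTML = """
<table>
<tr class="list even">
<td class="list">5</td>
<td class="list"><s>BI</s>→MU</td>
<td class="list"><s>TEACHER</s>→TEACHER</td>
<td class="list">Hello I am a Text</td>
<td class="list">5b</td>
<td class="list">BI3</td></tr>
<tr class="list odd">
<td class="list">6</td>
<td class="list">EN</td>
<td class="list">TEACHER</td>
<td class="list">Room change</td>
<td class="list">7a</td>
<td class="list">
R12</td></tr>
</table>
"""


def rows_for_class(soup, klasse):
    """Return one list of cell texts per row that has a cell equal to klasse."""
    result = []
    for row in soup.find_all("tr"):
        # strip=True removes leading/trailing whitespace, including newlines
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if klasse in cells:
            # Drop the class cell itself, as the question's filter does
            result.append([c for c in cells if c != klasse])
    return result


soup = BeautifulSoup(HTML, "html.parser")
print(rows_for_class(soup, "5b"))
print(rows_for_class(soup, "7a"))
```

Comparing whole cells (`klasse in cells`) rather than searching `row.text` avoids matching "5b" inside some other cell's text, but it would miss a cell that lists several classes at once, so adjust the test if the real table does that.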
You could use pandas read_html to grab the table, then filter on the class (Klasse) column:
import pandas as pd

def get_lectures_two(df, klasse):
    new_df = df[df['(Klasse(n))'] == klasse]
    return new_df

def get_df(url):
    df = pd.read_html(url)[0]
    df = df[~df['Stunde'].str.contains("LEHRER")]
    return df

df = get_df('https://niwla23.gitlab.io/download/vertreterdemo.html')
print(get_lectures_two(df, '5b'))
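One thing to watch for: `.str.contains` fails or returns missing values when the Stunde column holds numbers or empty cells, and negating such a mask with `~` then breaks. The sketch below is a guarded variant under assumptions: the small DataFrame is a made-up stand-in for what `read_html` might return, and `drop_header_rows` is a hypothetical helper name.

```python
import pandas as pd

# Made-up stand-in for the parsed table; the values are assumptions,
# only the two column names come from the answer above.
df = pd.DataFrame({
    "Stunde": [5, "LEHRER A", 6, 7],
    "(Klasse(n))": ["5b", None, "7a", "5b"],
})


def drop_header_rows(df, marker="LEHRER"):
    """Remove inline header rows whose Stunde cell contains marker."""
    # astype(str) lets .str work on numeric cells; na=False treats
    # missing values as "does not contain"; regex=False matches literally.
    mask = df["Stunde"].astype(str).str.contains(marker, na=False, regex=False)
    return df[~mask]


def get_lectures_two(df, klasse):
    """Keep only the rows whose class column equals klasse."""
    return df[df["(Klasse(n))"] == klasse]


result = get_lectures_two(drop_header_rows(df), "5b")
print(result)
```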
With bs4 4.7.1+ you can use :contains and :has, together with nth-of-type to pick the right column, to target the matching rows. (I use pandas here only to print a tidy table quickly; you already have the list of lists from bs4 and could, for example, write it out with csv.)
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

def get_lectures(klasse):
    rows = []
    for row in soup.select(f'.mon_list tr:has(td:nth-of-type(5):contains("{klasse}"))'):
        rows.append([td.text.replace('\xa0','') for td in row.select('td')])
    return rows

r = requests.get('https://niwla23.gitlab.io/download/vertreterdemo.html')
soup = bs(r.content, 'lxml')
headers = [th.text for th in soup.select('th.list')]
klasse = '5b'
df = pd.DataFrame(get_lectures(klasse), columns = headers)
print(df)
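As far as I know, newer Soup Sieve releases (2.1 and later) deprecate `:contains` in favour of `:-soup-contains`, so the same selector can be written as below. This is a sketch on an inline table rather than the live page: the `mon_list` class comes from the selector above, the first row from the question, and the second row is made up.

```python
from bs4 import BeautifulSoup

# Inline sample so the example runs without a network request.
HTML = """
<table class="mon_list">
<tr class="list even">
<td class="list">5</td>
<td class="list"><s>BI</s>→MU</td>
<td class="list"><s>TEACHER</s>→TEACHER</td>
<td class="list">Hello I am a Text</td>
<td class="list">5b</td>
<td class="list">BI3</td></tr>
<tr class="list odd">
<td class="list">6</td>
<td class="list">EN</td>
<td class="list">TEACHER</td>
<td class="list">Room change</td>
<td class="list">7a</td>
<td class="list">R12</td></tr>
</table>
"""


def get_lectures(soup, klasse):
    """Return the cell texts of every row whose 5th td contains klasse."""
    selector = f'.mon_list tr:has(td:nth-of-type(5):-soup-contains("{klasse}"))'
    return [
        [td.get_text(strip=True) for td in row.select("td")]
        for row in soup.select(selector)
    ]


soup = BeautifulSoup(HTML, "html.parser")
print(get_lectures(soup, "5b"))
```

Passing `soup` as an argument instead of reading a global keeps the function usable on any parsed page.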
Try the following code. It fetches each row's text and checks whether it contains 5b.
from bs4 import BeautifulSoup
import requests

res=requests.get("http://niwla23.gitlab.io/download/vertreterdemo.html")
soup=BeautifulSoup(res.text,'lxml')
for row in soup.find_all('tr', class_='list even'):
    if '5b' in row.text:
        print(row.text)
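Note that `class_='list even'` matches only rows whose class attribute is exactly that string, so odd rows are skipped. A small sketch, assuming the rows are classed `list even` and `list odd` as in the question: searching for the single class `list` covers both, and a list comprehension returns the whole tr elements as a list. The sample rows below are shortened and partly made up.

```python
from bs4 import BeautifulSoup

# Shortened sample rows; only the class names follow the question.
HTML = """
<table>
<tr class="list even"><td class="list">5</td><td class="list">5b</td></tr>
<tr class="list odd"><td class="list">6</td><td class="list">5b</td></tr>
<tr class="list even"><td class="list">7</td><td class="list">7a</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class_="list" matches any tr that has "list" among its classes,
# so both "list even" and "list odd" rows are considered.
matching_rows = [
    row for row in soup.find_all("tr", class_="list")
    if "5b" in row.get_text()
]

for row in matching_rows:
    print(row)
```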