Python: How to get a table row by a string inside it via BeautifulSoup?

Question

<tr class="list even">
    <td class="list">5</td>
    <td class="list"><s>BI</s>→MU</td>
    <td class="list"><s>TEACHER</s>→TEACHER</td>
    <td class="list">Hello I am a Text</td>
    <td class="list">5b</td>
    <td class="list">BI3</td></tr>

This is one of the table rows. Some rows consist of a single cell that acts as an inline header, but I don't care about those.

I want to get only the rows that contain the string "5b": not just the matching td, but the whole tr. If several rows contain the string, I should get a list of them.

for row in soup.find_all('tr', class_='list even'):
    if '5b' in row.text:
        print(row)
        for cell in row.find_all('td'):
            if "5b" not in cell.text:
                print(cell.text)

for row in soup.find_all('tr', class_='list odd'):
    if '5b' in row.text:
        for cell in row.find_all('td'):
            if "5b" not in cell.text:
                print(cell.text)

I have this now, but it adds a newline before the last table field: https://haste.thevillage.chat/foguvakixa.py

if "5b" not in cell.text:

This check is there because if I request the data for 5d, I don't need to be told again that it is 5d. It simply filters the class cell itself out.

QHarr · Accepted Answer

You could use pandas read_html to grab the table, then filter on the (Klasse(n)) column:

import pandas as pd

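def get_lectures_two(df, klasse):
    # keep only the rows whose class column equals klasse
    new_df = df[df['(Klasse(n))'] == klasse]
    return new_df

def get_df(url):
    # read_html returns a list of DataFrames; take the first table on the page
    df = pd.read_html(url)[0]
    # drop rows whose Stunde cell contains "LEHRER" (presumably the inline header rows)
    df = df[~df['Stunde'].str.contains("LEHRER")]
    return df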

df = get_df('https://niwla23.gitlab.io/download/vertreterdemo.html')
print(get_lectures_two(df, '5b'))
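One thing to watch with the filter above: if the Stunde column contains empty cells, str.contains returns NaN for them and the ~ negation can fail. A minimal sketch of a more defensive variant follows. The small DataFrame is a made-up stand-in for what read_html would return (its values are assumptions, not data from the real page), and drop_header_rows is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_html(url)[0]; the values are invented
# for illustration and only the two column names come from the answer.
df = pd.DataFrame({
    "Stunde": ["5", "LEHRER A", None, "3"],
    "(Klasse(n))": ["5b", None, "5b", "6a"],
})

def drop_header_rows(df, marker="LEHRER"):
    # na=False makes missing cells count as "no match", so the mask
    # is purely boolean and can be negated safely
    mask = df["Stunde"].str.contains(marker, na=False)
    return df[~mask]

def get_lectures_two(df, klasse):
    # exact match on the class column
    return df[df["(Klasse(n))"] == klasse]

result = get_lectures_two(drop_header_rows(df), "5b")
print(result)
```

The exact comparison with == only finds rows whose class cell is exactly "5b"; a cell listing several classes would need str.contains instead.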

With bs4 4.7.1+ you can use :contains and :has, together with nth-of-type for the column index, to target the matching rows. (I use pandas here only to quickly generate a nice tabular output for viewing; you already have the list of lists from bs4 and could, for example, write it out with csv.)

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

def get_lectures(klasse):
    rows = []
    for row in soup.select(f'.mon_list tr:has(td:nth-of-type(5):contains("{klasse}"))'):
        rows.append([td.text.replace('\xa0','') for td in row.select('td')])
    return rows

r = requests.get('https://niwla23.gitlab.io/download/vertreterdemo.html')
soup = bs(r.content, 'lxml')
headers = [th.text for th in soup.select('th.list')]
klasse = '5b'

df = pd.DataFrame(get_lectures(klasse), columns = headers)
print(df)
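In newer releases of soupsieve (the selector library behind select), :contains is deprecated in favour of :-soup-contains. The sketch below shows the same selector with the newer name, assuming soupsieve 2.1 or later is installed. The inline HTML is an invented three-column sample rather than the real page, and get_rows is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Invented sample table; the real page has more columns.
html = """
<table class="mon_list">
  <tr class="list"><th class="list">Stunde</th><th class="list">Fach</th><th class="list">Klasse</th></tr>
  <tr class="list even"><td class="list">5</td><td class="list">MU</td><td class="list">5b</td></tr>
  <tr class="list odd"><td class="list">3</td><td class="list">5b-Projekt</td><td class="list">6a</td></tr>
  <tr class="list even"><td class="list">1</td><td class="list">DE</td><td class="list">5b</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

def get_rows(soup, klasse, column):
    # Select every tr that has a td in the given column position
    # whose text contains klasse (substring match).
    selector = (
        f'.mon_list tr:has(td:nth-of-type({column})'
        f':-soup-contains("{klasse}"))'
    )
    return soup.select(selector)

rows = get_rows(soup, "5b", column=3)
for row in rows:
    print([td.get_text(strip=True) for td in row.select("td")])
```

Restricting the match to one column keeps the middle row out, even though "5b" appears in its second cell. Because the match is a substring test, a class such as "15b" would also be selected.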

KunduK · Answer

Try the following code. It fetches each row's text and checks whether it contains 5b.

from bs4 import BeautifulSoup
import requests
res=requests.get("http://niwla23.gitlab.io/download/vertreterdemo.html")
soup=BeautifulSoup(res.text,'lxml')

for row in soup.find_all('tr', class_='list even'):
    if '5b' in row.text:
        print(row.text)
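The loop above covers only the "list even" rows and prints text rather than returning the tr elements. A possible variant that collects whole rows of either parity into a list is sketched below. It assumes that every data row carries the class "list", and it uses a shortened, invented table built around the row from the question; rows_containing is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Sample markup based on the row in the question; the other rows are invented.
html = """
<table class="mon_list">
<tr class="list"><th class="list">Stunde</th></tr>
<tr class="list even">
    <td class="list">5</td>
    <td class="list"><s>BI</s>→MU</td>
    <td class="list"><s>TEACHER</s>→TEACHER</td>
    <td class="list">Hello I am a Text</td>
    <td class="list">5b</td>
    <td class="list">BI3</td></tr>
<tr class="list odd"><td class="list">6</td><td class="list">6a</td></tr>
<tr class="list odd"><td class="list">2</td><td class="list">5b</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

def rows_containing(soup, text):
    matches = []
    # class_="list" matches any tr that has "list" among its classes,
    # so both "list even" and "list odd" rows are visited
    for row in soup.find_all("tr", class_="list"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if text in cells:  # exact match against a whole cell
            matches.append(row)
    return matches

rows = rows_containing(soup, "5b")
for row in rows:
    # print every cell except the class cell itself
    others = [td.get_text(strip=True) for td in row.find_all("td")
              if td.get_text(strip=True) != "5b"]
    print(others)
```

get_text(strip=True) trims surrounding whitespace from each cell, which may also help with stray newlines in the printed output.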

Tags: python, html, beautifulsoup, web-scraping

Niwla23
