I am trying to scrape some data off a website. The data I want is in a table, but there are multiple tables and none of them have IDs. My idea was to find the header just above the table I am looking for and use it as an indicator.
This has really troubled me, so as a last resort I wanted to ask whether anyone knows how to use BeautifulSoup to find the table. A snippet of the HTML is provided below. Thanks in advance :)
The table I am interested in is the one right beneath <h2>Mine neaste vagter</h2> ("My next shifts"):
        <h2>Min aktuelle vagt</h2>
        
        
            <div>
                <a href='/shifts/detail/595212/'>Flere detaljer</a>
            <p>Vagt starter: <b>11/06 2021 - 07:00</b></p>
            <p>Vagt slutter: <b>11/06 2021 - 11:00</b></p>
            
            
            
                <h2>Masker</h2>
                <table class='list'>
                    <tr><th>Type</th><th>Fra</th><th> </th><th>Til</th></tr>
                    
                    <tr>
                        <td>Fri egen regningD</td>
                        <td>07:00</td>
                        <td> - </td>
                        <td>11:00</td>
                    </tr>
                    
                </table>
            
            </div>
        
    <hr>
    
    
    
    
    
    
    
    
    
    
    
    
    
        <h2>Mine neaste vagter</h2>
        <table class='list'>
            <tr>
                <th class="alignleft">Dato</th>
                <th class="alignleft">Rolle</th>
                <th class="alignleft">Tidsrum</th>
                <th></th>
                <th class="alignleft">Bytte</th>
                <th class="alignleft" colspan='2'></th>
            </tr>
            
                <tr class="rowA separator">
                    
                        <td>
                            <h3>12/6</h3>
                        </td>
                    
                    <td>Kundeservice</td>
                    <td>18:00 → 21:30 (3.5 t)</td>
                    <td style="max-width: 20em;"></td>
                    <td>
                      
                        <a href="/shifts/ajax/popup/595390/" class="swap shiftpop">
                          Byt denne vagt
                        </a>
                      
                    </td>
                    
                    <td><a href="/shifts/detail/595390/">Detaljer</td>
                      
                      <td>
                        
                           
                        
                    </td>
                </tr>
Here are two approaches to find the correct <table>:
Since the table you want is the last one in the HTML, you can use find_all() and take index [-1] to get the last table:
print(soup.find_all("table", class_="list")[-1])
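A minimal, self-contained sketch of this first approach, assuming an abbreviated copy of the page (the HTML string below is a trimmed, hypothetical sample based on the question, not the full page). It only works as long as the target table really is the last <table class='list'> in the document, which holds for the snippet shown but may not hold for the live page.

```python
from bs4 import BeautifulSoup

# Abbreviated sample based on the question's HTML (hypothetical, trimmed)
HTML = """
<h2>Min aktuelle vagt</h2>
<div>
  <h2>Masker</h2>
  <table class='list'>
    <tr><th>Type</th><th>Fra</th><th> </th><th>Til</th></tr>
    <tr><td>Fri egen regningD</td><td>07:00</td><td> - </td><td>11:00</td></tr>
  </table>
</div>
<hr>
<h2>Mine neaste vagter</h2>
<table class='list'>
  <tr><th class="alignleft">Dato</th><th class="alignleft">Rolle</th></tr>
  <tr class="rowA separator"><td><h3>12/6</h3></td><td>Kundeservice</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# All tables with class "list", in document order; [-1] picks the last one
tables = soup.find_all("table", class_="list")
last_table = tables[-1]

# The first header cell shows which table was selected
first_header = last_table.find("th").get_text(strip=True)
print(first_header)
```

If the site later adds another table below this one, index [-1] silently returns the wrong table, which is why anchoring on the heading text can be more robust.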
Find the h2 element by its text, and then use the find_next() method to find the table:
print(soup.find(lambda tag: tag.name == "h2" and "Mine neaste vagter" in tag.text).find_next("table"))
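A sketch of the heading-based approach wrapped in a small helper. The helper name find_table_after_heading is hypothetical (not part of BeautifulSoup), and the HTML is a trimmed sample with an extra table added after the target, to show that this approach does not depend on the table's position. It also guards against the heading not being found, since calling find_next() on None would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical sample: an extra table follows the target table
HTML = """
<h2>Masker</h2>
<table class='list'><tr><th>Type</th></tr></table>
<h2>Mine neaste vagter</h2>
<table class='list'><tr><th>Dato</th></tr></table>
<h2>Other section</h2>
<table class='list'><tr><th>Something else</th></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")


def find_table_after_heading(soup, text):
    """Return the first <table> after the <h2> whose text contains `text`, or None."""
    heading = soup.find(lambda tag: tag.name == "h2" and text in tag.get_text())
    if heading is None:
        return None  # heading missing: avoid calling find_next() on None
    return heading.find_next("table")


table = find_table_after_heading(soup, "Mine neaste vagter")
print(table.find("th").get_text(strip=True))
```

Here index [-1] would have returned the "Something else" table, while the heading lookup still finds the intended one.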
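You can use :-soup-contains (or the older :contains, which Soup Sieve now deprecates) to target the <h2> by its text, and then use find_next() to move to the table:
from bs4 import BeautifulSoup as bs

html = '''your html'''
soup = bs(html, 'lxml')  # 'lxml' requires the lxml package; 'html.parser' also works
# Find the <h2> by its text, then the first <table> after it
table = soup.select_one('h2:-soup-contains("Mine neaste vagter")').find_next('table')
print(table)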
This assumes that whatever method you use to fetch the page actually returns the HTML as shown.
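Once the table is found, the data still has to be pulled out of its rows. A minimal sketch, assuming the trimmed sample below (abbreviated from the question's HTML) and the built-in "html.parser" so lxml is not needed; :-soup-contains() is a Soup Sieve extension available from soupsieve 2.1 onward.

```python
from bs4 import BeautifulSoup

# Trimmed sample based on the question's HTML (hypothetical, abbreviated)
HTML = """
<h2>Mine neaste vagter</h2>
<table class='list'>
  <tr>
    <th class="alignleft">Dato</th>
    <th class="alignleft">Rolle</th>
    <th class="alignleft">Tidsrum</th>
  </tr>
  <tr class="rowA separator">
    <td><h3>12/6</h3></td>
    <td>Kundeservice</td>
    <td>18:00 → 21:30 (3.5 t)</td>
  </tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Locate the heading by its text, then the first table after it
heading = soup.select_one('h2:-soup-contains("Mine neaste vagter")')
table = heading.find_next("table")

# First row holds the <th> headers; the remaining rows hold <td> data cells
rows = table.find_all("tr")
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
data = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in rows[1:]
]

print(headers)
print(data)
```

On the real page each row has additional cells (an empty one, the swap link, "Detaljer"), so the lists will be longer and some entries will be empty strings; zip(headers, cells) is one way to pair them up once the columns line up.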