 

Slicing in a for loop

I am a beginner coder, using Python 3.7.1 on Windows 10 with Visual Studio Code.

As an exercise, I am trying to scrape some data from a webpage where it is organized in a table.

I want to extract only part of the information: the values nested in cells such as <td valign="top" style="width:25%;">Parte edibile, %</td><td align="left" valign="top" style="font-weight:bold;">75</td>. Each value is delimited by <td> ... </td>.

I have tried many ways to get only the first and second cell of each row, since the third one is NOT interesting to me; it is just a waste of memory that I don't need.

To do that, I am using a for loop, but as I have understood BeautifulSoup, when it loops, all the nested elements of each row are merged into one, so slicing with [0:1] to get the first and second strings (the <td> </td> cells) is not possible.

Here's the simple for loop:

for alim in soup.find_all('td')[0:1]:
    return alim.text

Am I correct? Could anyone suggest a smarter solution to my problem?

Thank you in advance for any advice. Max

Massimo asked Mar 21 '26 09:03



2 Answers

If I understand correctly, you have a table with three or more columns and you are interested only in the first two.

There are many ways to extract data from the first two columns. One is to use CSS selectors:

data = '''
    <table>
    <tr>
        <td valign="top" style="width:25%;">I. Parte edibile, %</td>
        <td align="left" valign="top" style="font-weight:bold;">I. 75</td>
        <td>This doesn't interest me</td>
    </tr>
    <tr>
        <td valign="top" style="width:25%;">II. Parte edibile, %</td>
        <td align="left" valign="top" style="font-weight:bold;">II. 75</td>
        <td>II. This doesn't interest me</td>
    </tr>
    <tr>
        <td valign="top" style="width:25%;">III. Parte edibile, %</td>
        <td align="left" valign="top" style="font-weight:bold;">III. 75</td>
        <td>III. This doesn't interest me</td>
    </tr>
    </table>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

for col1, col2 in zip(soup.select('td:nth-of-type(1)'), soup.select('td:nth-of-type(2)')):
    print('{: <25} {}'.format(col1.text, col2.text))

Prints:

I. Parte edibile, %       I. 75
II. Parte edibile, %      II. 75
III. Parte edibile, %     III. 75
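One caveat with zipping two separate selections, shown on a small hypothetical table where the second row is missing its second cell: `td:nth-of-type(2)` simply finds no match in that row, so the two lists have different lengths and the pairing drifts. This is a sketch for illustration; the table contents are made up.

```python
from bs4 import BeautifulSoup

# Hypothetical table: the second row has only one <td>
html = '''
<table>
<tr><td>A label</td><td>A value</td><td>extra</td></tr>
<tr><td>B label</td></tr>
<tr><td>C label</td><td>C value</td><td>extra</td></tr>
</table>'''

soup = BeautifulSoup(html, 'html.parser')

firsts = soup.select('td:nth-of-type(1)')   # one match per row: A, B, C
seconds = soup.select('td:nth-of-type(2)')  # only rows A and C have a second cell

# zip() stops at the shorter list, so "B label" gets paired with "C value"
pairs = [(c1.text, c2.text) for c1, c2 in zip(firsts, seconds)]
print(pairs)
# [('A label', 'A value'), ('B label', 'C value')]
```

If every row is guaranteed to have at least two cells, the zip approach is fine; otherwise iterating row by row keeps each label next to its own value.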

Or you can use list slicing:

rows = []
for tr in soup.select('tr'):
    rows.append([td.text for td in tr.select('td')[0:2]])

for row in rows:
    print('{: <25} {}'.format(*row))
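The slicing approach can be wrapped in a small reusable function. This is a minimal sketch; the name `first_two_columns` and the choice to skip rows with fewer than two cells are assumptions, not part of the answer.

```python
from bs4 import BeautifulSoup

def first_two_columns(html):
    """Return (first, second) cell text for every row that has at least two <td> cells."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.select('tr'):
        cells = tr.select('td')[0:2]   # slice end is exclusive: indices 0 and 1
        if len(cells) == 2:            # skip rows that are too short (assumption)
            rows.append((cells[0].get_text(strip=True), cells[1].get_text(strip=True)))
    return rows

sample = '''
<table>
<tr><td>Parte edibile, %</td><td>75</td><td>not needed</td></tr>
<tr><td>only one cell</td></tr>
<tr><td>Colesterolo, mg</td><td>61</td><td>not needed</td></tr>
</table>'''

for label, value in first_two_columns(sample):
    print('{: <25} {}'.format(label, value))
```

Because the third cell is never read into the result, only the two wanted strings per row are kept.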

EDIT: For parsing the page http://www.bda-ieo.it/test/ComponentiAlimento.aspx?Lan=Ita&foodid=1300_2 you can use this code:

from bs4 import BeautifulSoup
import requests

url = 'http://www.bda-ieo.it/test/ComponentiAlimento.aspx?Lan=Ita&foodid=1300_2'

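# Download the page and parse it with Python's built-in HTML parser
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# First and second <td> of every tr.testonormale row directly inside #tblComponenti
labels = soup.select('#tblComponenti > tr.testonormale > td:nth-of-type(1)')
values = soup.select('#tblComponenti > tr.testonormale > td:nth-of-type(2)')

for col1, col2 in zip(labels, values):
    print('{: <70} {}'.format(col1.text, col2.text))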

Prints:

Parte edibile, %                                                       75
Energia, ricalcolata, kJ                                               406
Energia, Ric con fibra, kJ                                             406
Energia, ricalcolata, kcal                                             96
Energia, Ric con fibra, kcal                                           96
Proteine totali, g                                                     16,8
   Proteine animali, g                                                 16,8
   Proteine vegetali, g                                                0,0
Lipidi totali, g                                                       2,6
   Lipidi animali, g                                                   2,6
   Lipidi vegetali, g                                                  0,0
Colesterolo, mg                                                        61
Carboidrati disponibili (MSE), g                                       1,5
   Amido (MSE), g                                                      0,0
   Carboidrati solubili (MSE), g                                       1,5
Fibra alimentare totale, g                                             0,0
Alcol, g                                                               0,0
Acqua, g                                                               76,5
Ferro, mg                                                              2,8
Calcio, mg                                                             148
Sodio, mg                                                              104
Potassio, mg                                                           278
Fosforo, mg                                                            196
Zinco, mg                                                              4,20
Magnesio, mg                                                           22
Rame, mg                                                               1,00
Selenio, µg                                                            37,0
Cloro, mg                                                              130
Iodio, µg                                                              29
Manganese, mg                                                          0,07
Zolfo, mg                                                              150
Vitamina B1, Tiamina, mg                                               0,06
Vitamina B2, Riboflavina, mg                                           0,26
Vitamina C, mg                                                         0
Niacina, mg                                                            14,00
Vitamina B6, mg                                                        0,14
Folati totali, µg                                                      9
Acido pantotenico, mg                                                  0,65
Biotina, µg                                                            6,0
Vitamina B12, µg                                                       0,6
Retinolo equivalente                                                   32
   Retinolo eq. (RE), µg                                               32
   Retinolo, µg                                                        tr
   ß-carotene eq., µg                                                  0,29
Vitamina E (ATE), mg                                                   11,00
Vitamina D, µg                                                         1,30
Acidi grassi saturi totali, g                                          0,00
Somma degli acidi butirrico, caproico, caprilico e caprico, g          0,00
Acido laurico, g                                                       0,14
Acido miristico, g                                                     1,01
Acido palmitico, g                                                     0,13
Acido stearico, g                                                      tr
Acido arachidico, g                                                    0,00
Acido beenico, g                                                       0,40
Acidi grassi monoinsaturi totali, g                                    0,00
Acido miristoleico, g                                                  0,10
Acido palmitoleico, g                                                  0,17
Acido oleico, g                                                        0,01
Acidi eicosenoico, g                                                   0,01
Acido erucico, g                                                       0,85
Acidi grassi polinsaturi totali, g                                     0,01
Acido linoleico, g                                                     0,01
Acido linolenico, g                                                    tr
Acido arachidonico, g                                                  0,27
Acido eicosapentaenoico (EPA), g                                       0,52
Acido decosaesaenoico (DHA), g                                         0,04
Altri acidi grassi polinsaturi, g                                      175
Triptofano, mg                                                         726
Treonina, mg                                                           823
Isoleucina, mg                                                         1330
Leucina, mg                                                            1379
Lisina, mg                                                             349
Metionina, mg                                                          183
Cistina, mg                                                            595
Fenilalanina, mg                                                       425
Tirosina, mg                                                           759
Valina, mg                                                             758
Arginina, mg                                                           675
Istidina, mg                                                           919
Alanina, mg                                                            1764
Acido aspartico, mg                                                    2261
Acido glutammico, mg                                                   722
Glicina, mg                                                            460
Prolina, mg                                                            650
Serina, mg                                                             1,5
Glucosio, g                                                            0,0
Fruttosio, g                                                           0,0
Galattosio, g                                                          0,0
Saccarosio (MSE), g                                                    0,0
Maltosio (MSE), g                                                      0,0
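A sketch of the same scrape that walks each row instead of zipping two separate selections (so a row missing its second cell cannot shift the pairing), with fetching split from parsing so the parser can be tried offline. The function names and the 10-second timeout are assumptions. The selector `#tblComponenti > tr.testonormale` relies on `html.parser`, which does not insert an implicit `<tbody>`; a parser such as html5lib would.

```python
import requests
from bs4 import BeautifulSoup

URL = 'http://www.bda-ieo.it/test/ComponentiAlimento.aspx?Lan=Ita&foodid=1300_2'

def parse_components(html):
    """Extract (label, value) pairs from the tr.testonormale rows of #tblComponenti."""
    soup = BeautifulSoup(html, 'html.parser')  # no implicit <tbody>, so 'table > tr' matches
    pairs = []
    for tr in soup.select('#tblComponenti > tr.testonormale'):
        # Direct <td> children only, stopping after the first two
        cells = tr.find_all('td', recursive=False, limit=2)
        if len(cells) == 2:
            pairs.append((cells[0].text, cells[1].text))
    return pairs

def fetch_components(url=URL):
    """Download the page and parse it (hypothetical helper)."""
    response = requests.get(url, timeout=10)  # timeout value is an arbitrary choice
    response.raise_for_status()               # raise on HTTP error status codes
    return parse_components(response.text)
```

Calling `fetch_components()` performs the network request; `parse_components` can be exercised on a saved copy of the page or on a small snippet.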
Andrej Kesely answered Mar 24 '26 00:03



find_all returns a list, so you should use [0:2], because the end index of a slice is non-inclusive. Also, a return inside the loop exits on the first iteration, so the code needs to change slightly:

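# Hypothetical helper name; return is only valid inside a function
def first_two_cells(soup):
    result = []
    # Note: this takes the first two <td> cells of the whole document, not of each row
    for alim in soup.find_all('td')[0:2]:
        result.append(alim.text)
    return result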
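The slice bounds, and the question of "each row", can be checked on a tiny hypothetical table. `soup.find_all('td')` returns one flat list of every cell in the document, so slicing it picks cells from the start of the document; slicing the cells of each `<tr>` separately gives the first two cells of every row.

```python
from bs4 import BeautifulSoup

html = '''
<table>
<tr><td>r1c1</td><td>r1c2</td><td>r1c3</td></tr>
<tr><td>r2c1</td><td>r2c2</td><td>r2c3</td></tr>
</table>'''
soup = BeautifulSoup(html, 'html.parser')

cells = soup.find_all('td')            # flat list of all six cells, across rows
print([td.text for td in cells[0:1]])  # ['r1c1']          (end index is exclusive)
print([td.text for td in cells[0:2]])  # ['r1c1', 'r1c2']  (first two cells of the document)

# Slicing per row gives the first two cells of every row
per_row = [[td.text for td in tr.find_all('td')[0:2]] for tr in soup.find_all('tr')]
print(per_row)  # [['r1c1', 'r1c2'], ['r2c1', 'r2c2']]
```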
Joshua Loader answered Mar 23 '26 22:03



