
NoneType object error when transforming a scraped table to a DataFrame

I am trying to scrape a list of stock tickers that are displayed in a table at the following link: http://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A I scraped the table using Beautiful Soup, but when I convert it to a pandas DataFrame I get an error:

TypeError: 'NoneType' object is not callable

I tried the following code:

url = 'http://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A'
res = requests.get(url)
soup = BeautifulSoup(res.content,'lxml')
table = soup.find("table",{"class":"market tab1"})
df = pd.read_html(table)

But it does not work. Why do I get this error, and how do I solve it?

Full error log:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
    796         try:
--> 797             tables = p.parse_tables()
    798         except Exception as caught:

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in parse_tables(self)
    212     def parse_tables(self):
--> 213         tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
    214         return (self._build_table(table) for table in tables)

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _build_doc(self)
    618                 # try to parse the input in the simplest way
--> 619                 r = parse(self.io, parser=parser)
    620             try:

~/anaconda3/lib/python3.7/site-packages/lxml/html/__init__.py in parse(filename_or_url, parser, base_url, **kw)
    939         parser = html_parser
--> 940     return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
    941 

src/lxml/etree.pyx in lxml.etree.parse()

src/lxml/parser.pxi in lxml.etree._parseDocument()

TypeError: 'NoneType' object is not callable

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-23-c3e05c494f63> in <module>
      5 table = soup.find("table",{"class":"market tab1"})
      6 #print(table)
----> 7 df = pd.read_html(table)

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only)
    985                   decimal=decimal, converters=converters, na_values=na_values,
    986                   keep_default_na=keep_default_na,
--> 987                   displayed_only=displayed_only)

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
    799             # if `io` is an io-like object, check if it's seekable
    800             # and try to rewind it before trying the next parser
--> 801             if hasattr(io, 'seekable') and io.seekable():
    802                 io.seek(0)
    803             elif hasattr(io, 'seekable') and not io.seekable():

TypeError: 'NoneType' object is not callable

Beginning of the table:

<table cellpadding="0" cellspacing="1" class="market tab1" width="610">
<colgroup><col/><col/><col class="c"/></colgroup>
<tr><td class="tabh" colspan="3"><b>Companies listed on the NYSE</b></td></tr>
<tr><th>Equity</th><th>Symbol</th><th>Info</th></tr>
<tr class="ts0"><td align="left"><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/stock-price">A K Steel</a></td><td><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/stock-price">AKS</a></td><td><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/chart"><img src="/s/stock-chart.gif"/></a><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/news"><img src="/s/stock-news.gif"/></a><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/financials"><img src="/s/fundamentals.gif"/></a><a href="http://ih.advfn.com/stock-market/NYSE/a-k-steel-AKS/trades"><img src="/s/stock-trades.gif"/></a></td></tr>
Ben2pop asked Oct 18 '25 08:10


1 Answer

soup.find() returns a bs4.element.Tag, and you are passing that Tag directly to pandas.read_html, which expects a string, a path or URL, or a file-like object. Convert the tag to a string first.
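As for why the message mentions NoneType: the last frame of the traceback is `io.seekable()`. A Beautiful Soup Tag treats an unknown attribute as a search for a child tag of that name, so `table.seekable` quietly evaluates to None instead of raising AttributeError, and calling it fails. A minimal sketch of that behaviour, assuming a trimmed inline copy of the table (the `html` string is made up for illustration) rather than the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed stand-in for the scraped page so this runs offline.
html = (
    '<table class="market tab1">'
    "<tr><th>Equity</th><th>Symbol</th></tr>"
    "<tr><td>A K Steel</td><td>AKS</td></tr>"
    "</table>"
)

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "market tab1"})

# Unknown attributes on a Tag are looked up as child tags, so this is
# equivalent to table.find("seekable"), which finds nothing.
seekable_attr = table.seekable

# hasattr() is True because the lookup returned None instead of raising.
has_attr = hasattr(table, "seekable")

try:
    table.seekable()  # the same call pandas makes on its input
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(table).__name__, seekable_attr, has_attr, error_message)
```

Passing `str(table)` avoids this because pandas then receives plain HTML text, as in the full script: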

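from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'http://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find("table", {"class": "market tab1"})  # a bs4.element.Tag
# str(table) serializes the tag back to HTML text, which read_html can parse.
# read_html returns a list of DataFrames, so df is a list here.
df = pd.read_html(str(table))
print(df)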

Output:

[                                    0       1     2
0        Companies listed on the NYSE     NaN   NaN
1                              Equity  Symbol  Info
2                           A K Steel     AKS   NaN
3                               A M R     AMR   NaN
4                      A M R Cp 7.875     AAR   NaN
5                               A V X     AVX   NaN
6                               A a R     AIR   NaN
7               A.h. Belo Corporation     AHC   NaN
8                         Aaron Rents   RNT.A   NaN
9                         Aaron Rents     RNT   NaN
10                        Aarons Cl A   AAN.A   NaN
11                        Aarons Inc.     AAN   NaN
12               Ab Svensk Cdss Arbmn     CBJ   NaN
13                   Ab Svensk Ekport     AXF   NaN
14               Ab Svensk Ekportkrdt     SQT   NaN
15               Ab Svensk Ekportkred     DVK   NaN
16               Ab Svensk Ekportkred     IWK   NaN
17               Ab Svensk Ekportkred     RCW   NaN
18               Ab Svensk Ekportkred     EOA   NaN
19                 Ab Svensk Msci Arn     MIS   NaN
20                  Ab Svensk Russell     REU   NaN
21                  Ab Svensk Sp Arns     SAD   NaN
22                  Ab Svensk Sp Arns     MHG   NaN
23                                Abb     ABB   NaN
24                        Abbott Labs     ABT   NaN
25                Abercrombie & Fitch     ANF   NaN
26                            Abitibi     ABY   NaN
27                                Abm     ABM   NaN
28                             Acadia     AKR   NaN
29                  Acc Bear Amex Egy     IMW   NaN
..                                ...     ...   ...
194                           Ashland     ASH   NaN
195                   Aspen Insurance     AHL   NaN
196  Assisted Living Concepts (nevada     ALC   NaN
197                Associated Estates     AEC   NaN
198                          Assurant     AIZ   NaN
199                  Assured Guaranty     AGO   NaN
200                           Astoria      AF   NaN
201                       Astrazeneca     AZN   NaN
202                 Atlanta Gas Light     ATG   NaN
203                    Atlas Pipeline     APL   NaN
204        Atlas Pipeline Holdings Lp     AHD   NaN
205                             Atmos     ATO   NaN
206                               Att       T   NaN
207                               Att     ATT   NaN
208                   Atwood Oceanics     ATW   NaN
209                      Au Optronics     AUO   NaN
210                           Autoliv     ALV   NaN
211                        Autonation      AN   NaN
212                          Autozone     AZO   NaN
213              Av Svensk Ekportkred     NEH   NaN
214                         Avalonbay     AVB   NaN
215              Aventine Renew Enrgy     AVR   NaN
216                    Avery Dennison     AVY   NaN
217                  Avis Budget Grp.     CAR   NaN
218                            Avista     AVA   NaN
219                             Avnet     AVT   NaN
220                     Avon Products     AVP   NaN
221                               Axa     AXA   NaN
222                              Axis     AXS   NaN
223                               Azz     AZZ   NaN

[224 rows x 3 columns]]
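Note the outer brackets in the output: read_html returns a list of DataFrames, and the real column names (Equity, Symbol, Info) end up in row 1 because the table starts with a title row. A possible follow-up, sketched here on a made-up, trimmed copy of the table rather than the live page, is to take the first element of the list and pass `header=1` so the second row becomes the header. Wrapping the string in `StringIO` is an extra assumption on my part: recent pandas versions warn when given literal HTML as a plain string.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of the page's table so the sketch runs offline.
html = """
<table class="market tab1">
<tr><td class="tabh" colspan="3"><b>Companies listed on the NYSE</b></td></tr>
<tr><th>Equity</th><th>Symbol</th><th>Info</th></tr>
<tr class="ts0"><td>A K Steel</td><td>AKS</td><td></td></tr>
<tr class="ts1"><td>A M R</td><td>AMR</td><td></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "market tab1"})

# read_html returns a list of DataFrames, one per <table> it finds.
# header=1 uses the second row (Equity / Symbol / Info) as column names
# and discards the title row above it.
frames = pd.read_html(StringIO(str(table)), header=1)
tickers = frames[0]

# The ticker symbols as a plain Python list.
symbols = tickers["Symbol"].tolist()
print(symbols)
```

The Info column stays empty either way, since those cells contain only images and links with no text.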
Bitto Bennichan answered Oct 20 '25 22:10