I'm trying to crawl coinmarketcap.com with BeautifulSoup (I know there is an API; for training purposes I want to use BeautifulSoup). Every piece of information crawled so far was pretty easy to select, but now I'd like to get the "Holder Statistics", which look like this:
[screenshot: holder stats]
My testing code for selecting the specific div containing the desired information looks like this:
import requests
from bs4 import BeautifulSoup
url = 'https://coinmarketcap.com/currencies/bitcoin/holders/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
holders = soup.select('div', class_='n0m7sa-0 kkBhMM')
print(holders)
The output of print(holders) is not the expected content of the div, but rather the whole HTML content of the website. I'm attaching a picture of this because the output would be too long:
[screenshot: output]
Does anybody know why this is the case?
Use .select() when you want to query with a CSS selector string; it does not accept the class_ keyword argument. In holders = soup.select('div', class_='n0m7sa-0 kkBhMM') the class part is essentially ignored, so it matches every <div> regardless of class. To target that particular class, either use .find_all(), or change your .select() to CSS selector syntax:
holders = soup.select('div.n0m7sa-0.kkBhMM')
or
holders = soup.find_all('div', class_='n0m7sa-0 kkBhMM')
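To see the difference in isolation, here is a minimal, self-contained sketch (the HTML snippet is a placeholder, not actual coinmarketcap markup):
from bs4 import BeautifulSoup

html = '''
<div class="n0m7sa-0 kkBhMM">holder stats</div>
<div class="other">something else</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# CSS selector syntax: multiple classes are chained with dots
print(soup.select('div.n0m7sa-0.kkBhMM'))

# find_all() takes the class attribute as a keyword argument instead
print(soup.find_all('div', class_='n0m7sa-0 kkBhMM'))

# Both print: [<div class="n0m7sa-0 kkBhMM">holder stats</div>]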
On this particular site, however, both of those corrected versions will still return an empty list. That is because the class attribute is not in the source HTML: the site is dynamic, and those classes are generated by JavaScript after the initial request. So you either need to use Selenium to render the page first and then pull the HTML, or see if there's an API to get the data source directly.
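If you do want to render the page, a minimal Selenium sketch would look something like this (assuming a local Chrome/chromedriver setup; the generated class names can also change between site builds, so treat the selector as a placeholder):
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://coinmarketcap.com/currencies/bitcoin/holders/')

# Hand the rendered HTML to BeautifulSoup; a slow page may also need an explicit wait
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

holders = soup.select('div.n0m7sa-0.kkBhMM')
print(holders)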
There is an API to get the data:
import requests
import pandas as pd

# The holders data comes from two endpoints: address counts and top-holder ratios
alpha = ['count', 'ratio']
payload = {
    'id': '1',      # CoinMarketCap currency id (1 = Bitcoin)
    'range': '7d'}  # time window

for each in alpha:
    url = f'https://api.coinmarketcap.com/data-api/v3/cryptocurrency/detail/holders/{each}'
    jsonData = requests.get(url, params=payload).json()['data']['points']
    if each == 'count':
        count_df = pd.DataFrame.from_dict(jsonData, orient='index')
        count_df = count_df.rename(columns={0: 'Total Addresses'})
    else:
        ratio_df = pd.DataFrame.from_dict(jsonData, orient='index')

# Join the two frames on their shared date index
df = count_df.merge(ratio_df, how='left', left_index=True, right_index=True)
df = df.sort_index()
Output:
print(df.to_string())
Total Addresses topTenHolderRatio topTwentyHolderRatio topFiftyHolderRatio topHundredHolderRatio
2021-11-24T00:00:00Z 39279627 5.25 7.19 10.51 13.26
2021-11-25T00:00:00Z 39255811 5.25 7.19 10.49 13.22
2021-11-26T00:00:00Z 39339840 5.25 7.19 10.51 13.24
2021-11-27T00:00:00Z 39391849 5.23 7.11 10.45 13.18
2021-11-28T00:00:00Z 39505340 5.24 7.11 10.45 13.18
2021-11-29T00:00:00Z 39502099 5.24 7.11 10.43 13.16
2021-11-30T00:00:00Z 39523000 5.24 7.11 10.38 13.12
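The same endpoint should work for other currencies by swapping the id in the payload (the ids come from CoinMarketCap's own id map; whether every currency exposes holder data is an assumption):
payload = {
    'id': '1027',   # hypothetical example: 1027 is Ethereum's CoinMarketCap id
    'range': '7d'}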
Your other option: the data is within the <script> tags in JSON format, so you can pull it out of the initial request that way too:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import json
import re

url = 'https://coinmarketcap.com/currencies/bitcoin/holders/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Next.js embeds the page data as JSON inside the __NEXT_DATA__ script tag;
# strip the surrounding <script> tags with a regex, then parse the rest
jsonStr = str(soup.find('script', {'id': '__NEXT_DATA__'}))
jsonStr = re.search(r"({.*})", jsonStr).groups()[0]

# Note: this key path is tied to the site's internal build and may change without notice
jsonData = json.loads(jsonStr)['props']['initialProps']['pageProps']['info']['holders']
df = pd.DataFrame(jsonData).drop('holderList', axis=1).drop_duplicates()
Output:
print(df.to_string())
holderCount dailyActive topTenHolderRatio topTwentyHolderRatio topFiftyHolderRatio topHundredHolderRatio
0 39523000 963625 5.24 7.11 10.38 13.12
For the Social Stats in the Project Info section, there's a separate API endpoint:
import requests
import pandas as pd

url = 'https://api.coinmarketcap.com/data-api/v3/project-info/detail?slug=bitcoin'
jsonData = requests.get(url).json()
socialStats = jsonData['data']['socialStats']

# Flatten the nested stats into a single row
row = {}
for k, v in socialStats.items():
    if isinstance(v, dict):
        row.update(v)
    else:
        row.update({k: v})

df = pd.DataFrame([row])
Output:
print(df.to_string())
cryptoId commits contributors stars forks watchers lastCommitAt members updatedTime
0 1 31588 836 59687 30692 3881 2021-11-30T00:09:02.000Z 3617460 2021-11-30T16:00:02.365Z