I want to extract the 20th table from a Wikipedia page https://en.wikipedia.org/wiki/...
I'm currently using this code, but it only extracts the first table on the page.
library(rvest)
the_url <- "https://en.wikipedia.org/wiki/..."
tb <- the_url %>% read_html() %>%
html_node("table") %>%
html_table(fill = TRUE)
How can I get that specific table? Thank you!
Instead of indexing by position, which can change whenever the page is edited, you could anchor the XPath to the table's relationship with the element whose id is Prize_money. Returning a single node with html_node() is also more efficient. Avoid long XPaths, as they tend to be fragile.
library(rvest)
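# Find the element with id "Prize_money", step up to its <h4> heading,
# then take the first table that follows that heading
prize_money <- read_html('https://en.wikipedia.org/wiki/2018_FIFA_World_Cup#Prize_money') %>%
html_node(xpath = "//*[@id='Prize_money']/parent::h4/following-sibling::table[1]") %>%
html_table(fill = TRUE)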
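To see why the anchored XPath keeps working when other tables are added earlier in the page, here is a minimal offline sketch. The HTML string is hypothetical and assumes the heading markup the XPath above targets (the id sits on a span inside an h4). If Wikipedia's current markup places the id elsewhere, for example on the heading itself, the parent::h4 step would need adjusting.

```r
library(rvest)

# Hypothetical HTML mimicking the assumed layout: an unrelated table before
# the heading, the target table right after it, and another table later on.
html <- '
<html><body>
  <table><tr><th>Other</th></tr><tr><td>unrelated</td></tr></table>
  <h4><span class="mw-headline" id="Prize_money">Prize money</span></h4>
  <table>
    <tr><th>Place</th><th>Amount</th></tr>
    <tr><td>Winner</td><td>38</td></tr>
  </table>
  <table><tr><th>Later</th></tr><tr><td>also unrelated</td></tr></table>
</body></html>'

# read_html() treats a string containing "<" as literal HTML, not a URL
page <- read_html(html)

# Anchor on the id, step up to the <h4>, then take the first following table;
# the unrelated tables before and after do not affect which node is matched
prize_money <- page %>%
  html_node(xpath = "//*[@id='Prize_money']/parent::h4/following-sibling::table[1]") %>%
  html_table()

print(prize_money)
```

Because the selection is relative to the heading, adding or removing tables elsewhere on the page does not change which table is returned.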
Since you want one specific table, you can identify it in the html_nodes() call by using the XPath of that element on the page:
library(dplyr)
library(rvest)
the_url <- "https://en.wikipedia.org/wiki/2018_FIFA_World_Cup"
the_url %>%
read_html() %>%
# Absolute XPath to the table; it breaks if the page layout changes
html_nodes(xpath='/html/body/div[3]/div[3]/div[5]/div[1]/table[20]') %>%
html_table(fill=TRUE)
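If you just need the table at a fixed position, a full absolute path is not strictly necessary. A minimal sketch, run against a hypothetical 25-table HTML string rather than the live page, showing two shorter options: index the node set returned by html_nodes(), or use the relative XPath (//table)[20]. Both assume the table you want really is the 20th table element in the HTML, which on Wikipedia can also count layout tables such as infoboxes, and like table[20] above they break if earlier tables are added or removed.

```r
library(rvest)

# Hypothetical page with 25 small tables, each holding its own index,
# standing in for a long Wikipedia article
cells <- paste0("<table><tr><th>id</th></tr><tr><td>", 1:25, "</td></tr></table>")
page <- read_html(paste0("<html><body>", paste(cells, collapse = ""), "</body></html>"))

# Option 1: collect every <table> node, then pick the 20th by position
all_tables <- html_nodes(page, "table")
tb <- html_table(all_tables[[20]])

# Option 2: a short relative XPath; the parentheses make [20] apply to the
# document-wide set of tables rather than to each parent's table children
tb_xpath <- page %>%
  html_node(xpath = "(//table)[20]") %>%
  html_table()

print(tb)
```

Option 1 is handy when you want to inspect several candidate tables first (for example, length(all_tables) tells you how many there are); option 2 returns a single data frame directly.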