I have a problem logging in from my R script. I found several good answers on Stack Overflow, but none of the solutions worked for me.
I am scraping a web forum for my PhD research; its URL is http://forum.axishistory.com.
The webpage I want to scrape is the memberlist, a page that lists the links to all member profiles. One can only access the memberlist when logged in. If you try to access the memberlist without logging in, it shows you the login form.
The URL of the memberlist is this: http://forum.axishistory.com/memberlist.php.
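I tried the httr package:
library(httr)
library(rvest)  # for read_html(), which replaces the deprecated html()
# authenticate() sends HTTP basic-auth credentials with the request
members <- GET("http://forum.axishistory.com/memberlist.php", authenticate("username", "password"))
members_html <- read_html(members)
The output is the login form.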
Then I tried RCurl:
library(RCurl)
library(XML)  # provides htmlParse()
members_html <- htmlParse(getURL("http://forum.axishistory.com/memberlist.php", userpwd = "username:password"))
members_html
The output is the login form, again.
Then I tried the list() approach from this topic, "Scrape password-protected website in R":
handle <- handle("http://forum.axishistory.com/")
path   <- "ucp.php?mode=login"
login <- list(
  amember_login = "username"
  ,amember_pass  = "password"
  ,amember_redirect_url = 
    "http://forum.axishistory.com/memberlist.php"
)
response <- POST(handle = handle, path = path, body = login)
And again, the output is the login form.
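A quick way to check which field names a login form actually expects is to inspect it with html_form() from rvest. The sketch below runs offline on a made-up HTML snippet that stands in for the forum's login page; the markup, the form ids and the variable names are assumptions for illustration, not the forum's real source. The amember_* names in the attempt above were copied from the other topic and may simply not match this forum's form.

```r
library(rvest)

# Hypothetical stand-in for the login page; the real markup may differ.
login_page <- read_html('
  <form id="search" action="search.php" method="get">
    <input type="text" name="keywords">
  </form>
  <form id="login" action="ucp.php?mode=login" method="post">
    <input type="text" name="username">
    <input type="password" name="password">
    <input type="submit" name="login" value="Login">
  </form>')

# html_form() returns one entry per <form> element on the page
forms <- html_form(login_page)

# In this snippet the login form is the second form
login_form <- forms[[2]]

# Collect the name of every field in the form
field_names <- unname(vapply(login_form$fields, function(f) f$name, character(1)))
print(field_names)
```

If the names in the POST body do not appear in this list, the server will most likely ignore them and return the login form again.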
The next thing I am working on is RSelenium, but after all these attempts I am wondering whether I am missing something (probably something completely obvious).
I have looked at other relevant posts here, but couldn't figure out how to apply the code to my case:
How to use R to download a zipped file from a SSL page that requires cookies
Scrape password-protected website in R
https://stackoverflow.com/questions/27485311/scrape-password-protected-https-website-in-r
Web scraping password protected website using R
Thanks to Simon, I found the answer here: Using rvest or httr to log in to non-standard forms on a webpage
library(rvest)
url <- "http://forum.axishistory.com/memberlist.php"
# Start a session; without a login, this page returns the login form
pgsession <- html_session(url)
# The login form is the second form on the page
pgform <- html_form(pgsession)[[2]]
filled_form <- set_values(pgform,
                          "username" = "username",
                          "password" = "password")
# Submit the form and keep the returned, logged-in session
pgsession <- submit_form(pgsession, filled_form)
memberlist <- jump_to(pgsession, url)
# read_html() replaces the deprecated html()
page <- read_html(memberlist)
usernames <- html_nodes(x = page, css = "#memberlist .username")
data_usernames <- html_text(usernames, trim = TRUE)
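The extraction step at the end can be tried without logging in. The following is a minimal sketch that applies the same CSS selector to a made-up table; the snippet and the names sample_page and sample_usernames are hypothetical, and the real memberlist markup may look different.

```r
library(rvest)

# Hypothetical stand-in for the memberlist page; the real markup may differ.
sample_page <- read_html('
  <table id="memberlist">
    <tr><td><a class="username" href="#"> alice </a></td></tr>
    <tr><td><a class="username" href="#">bob</a></td></tr>
  </table>')

# Select every element with class "username" inside the element with id "memberlist"
sample_nodes <- html_nodes(sample_page, css = "#memberlist .username")

# Extract the text and strip surrounding whitespace
sample_usernames <- html_text(sample_nodes, trim = TRUE)
print(sample_usernames)
```

If the same selector returns an empty result on the real page, the login probably did not succeed and the page is still the login form.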