I use curl at the command line all the time to request a URL and parse its markup.
I do this easily for authenticated pages by loading the URL in Chrome, opening the inspector, finding the URL at the top of the Network history, right-clicking it, and choosing Copy | Copy as cURL.
I'd like to do the same with a single-page application, which of course runs lots of other things, such as JavaScript, to render itself.
Are there any tools that would let me simply change "curl" to something else and then download the generated source of the page?
E.g., this is what I'd normally run (copied from Chrome) to get the source of the authenticated page if it weren't a single-page application:
curl 'https://mywebsite.com/singlePageApplication' \
-H 'Connection: keep-alive' \
-H 'Pragma: no-cache' \
-H 'Cache-Control: no-cache' \
-H 'Accept-Language: en,en-US;q=0.9' \
-H 'Cookie: session=XXX'
I'd like to be able to just swap curl for something else that takes all the same headers, preferably with exactly the same syntax as curl, and gives me the generated source:
downloadGeneratedSource 'https://mywebsite.com/singlePageApplication' \
-H 'Connection: keep-alive' \
-H 'Pragma: no-cache' \
-H 'Cache-Control: no-cache' \
-H 'Accept-Language: en,en-US;q=0.9' \
-H 'Cookie: session=XXX'
Does this exist anywhere?
As root and Brad Parks pointed out in their comments, Selenium, PhantomJS, and Puppeteer are fancy tools designed to emulate the behavior of a browsing user, and they therefore let you download the rendered source of a single-page app (SPA) in an easily configurable manner.
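To make the idea concrete, here is a minimal sketch of a curl-like wrapper around a headless browser. It is only an illustration under stated assumptions: it uses Playwright for Python (a Puppeteer-style library that is not one of the tools named above), the function names parse_curl_style_args, render, and main are hypothetical, only the URL and repeated -H options are understood, and headers that a browser manages itself (such as Connection) are assumed safe to drop.

```python
import argparse

# Assumption: the browser sets these itself, so they are not forwarded.
SKIPPED_HEADERS = {"connection", "host", "content-length"}


def parse_curl_style_args(argv):
    """Parse a curl-like command line: one URL plus repeated -H 'Name: value'."""
    parser = argparse.ArgumentParser(prog="downloadGeneratedSource")
    parser.add_argument("url")
    parser.add_argument("-H", "--header", action="append", default=[])
    args = parser.parse_args(argv)

    headers = {}
    for raw in args.header:
        # Split on the first colon only, so values may contain colons.
        name, _, value = raw.partition(":")
        if name.strip().lower() not in SKIPPED_HEADERS:
            headers[name.strip()] = value.strip()
    return args.url, headers


def render(url, headers):
    """Load the page in headless Chromium and return the DOM after scripts ran."""
    # Imported lazily so the argument parsing works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # The extra headers are sent with every request made by this context.
        context = browser.new_context(extra_http_headers=headers)
        page = context.new_page()
        # "networkidle" is a heuristic for "the SPA has finished loading".
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


def main(argv):
    url, headers = parse_curl_style_args(argv)
    print(render(url, headers))

# To use this as a command-line tool, call main(sys.argv[1:]) at the end of the file.
```

Note that the Cookie header is sent with every request the page makes, including requests to third-party hosts, so a real tool would probably want to set cookies per domain instead.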
On the other hand, you are right that cURL can do similar things if used in a script. In the early 2000s I used wget in combination with grep, awk, sed, and perl to automate the regular download of access-controlled pages with dynamic URLs created using CGI. That scenario is indeed very comparable to today's SPAs.
I chose wget over curl because pipe-processing its output was easier, but such a script had to be tailored to the specific use case. If you are fluent in regular expressions, that is a job of a couple of minutes, since the target URLs had a syntax I could look for. Maybe you could do the same?
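As a rough illustration of that kind of pattern matching, the sketch below pulls dynamic links out of downloaded markup with a regular expression. The pattern and the function name find_dynamic_urls are hypothetical; they assume the interesting URLs appear in double-quoted href attributes and carry a CGI-style query string, which you would need to adapt to your own page.

```python
import re

# Hypothetical pattern: double-quoted href values that contain a query string.
LINK_PATTERN = re.compile(r'href="([^"]+\?[^"]*)"')


def find_dynamic_urls(markup):
    """Return every href value in the markup that contains a '?'."""
    return LINK_PATTERN.findall(markup)
```

Each URL found this way could then be fed back into curl with the same headers as the original request.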