I know of wget -mkp http://example.com to mirror a site and all of its internally-linked files.
But I need to back up a site where all the images are stored on a separate domain. How can I download those images as well with wget, and update the src attributes accordingly?
Thank you!
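For reference, wget may be able to do this in a single pass: -H allows recursion to cross hosts, -D restricts that crossing to a listed set of domains, and -k converts the links in the downloaded pages for local viewing. A sketch, with both domain names as placeholders for your site and its image host:

# Sketch only: -m mirrors, -k converts links, -p grabs page requisites,
# -H spans hosts, -D limits the spanning to the listed domains.
wget -mkp -H -D example.com,images.example.net http://example.com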
A slightly modified version of @PatrickHorn's answer:
First, cd into the top-level directory containing the downloaded files.
"first wget to find pages recursively, albeit only from that one domain"
wget --recursive --timestamping -l inf --no-remove-listing --page-requisites http://site.com
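These long options are essentially what -m expands to (--recursive --timestamping -l inf --no-remove-listing), with --page-requisites added so each page's images and stylesheets come down too; -l inf lifts wget's default recursion depth limit of five.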
"second wget which spans hosts but does not retrieve pages recursively"
find site.com -name '*.htm*' -exec wget --no-clobber --span-hosts --timestamping --page-requisites http://{} \;
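This works because wget saves each page under a directory named after its host, so a file such as site.com/path/page.html turns back into the URL http://site.com/path/page.html once find's output is prefixed with http://; the second wget then fetches each page's requisites from any host and stores them under their own host-named directories alongside site.com.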
I've tried this, and it seems to have mostly worked: I get all the .htm(l) pages from just the site I'm after, and then the external files. I haven't yet managed to rewrite the links so they point at the local copies of those external files.
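One way to finish that last step, assuming the pages will be served from the top-level download directory and the external host's files sit under a directory named after that host (wget's default layout): rewrite the absolute URLs to root-relative paths with sed. The image host name below is a placeholder for the real external domain.

# Placeholder host name; adjust to the actual image domain.
# GNU sed shown; on BSD/macOS use: sed -i '' ...
find site.com -name '*.htm*' -exec \
  sed -i 's|http://images\.example\.net/|/images.example.net/|g' {} +

Adding --convert-links to the second wget pass is another thing worth experimenting with, though it may interact badly with --no-clobber, so test it on a copy first.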