 

How do you use wget (with mk option) to mirror a site and its externally-linked images?

I know of wget -mkp http://example.com to mirror a site and all of its internally-linked files.
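For reference, `-mkp` is shorthand; the long-option equivalent (using `example.com` as in the question) is:

```shell
# wget -mkp expanded to long options:
wget --mirror --convert-links --page-requisites http://example.com
# --mirror          = -r -N -l inf --no-remove-listing (recursion + timestamping)
# --convert-links   rewrites links in the saved pages so they work locally
# --page-requisites also fetches the images/CSS/JS each page needs to render
```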

But I need to back up a site where all the images are stored on a separate domain. How can I download those images as well with wget, and update the src attributes accordingly?

Thank you!

Britney A asked Nov 28 '25 03:11
1 Answer

A slightly modified version of @PatrickHorn's answer:

First, cd into the top-level directory that will contain the downloaded files.

"first wget to find pages recursively, albeit only from that one domain"

wget --recursive --timestamping -l inf --no-remove-listing --page-requisites http://site.com

"second wget which spans hosts but does not retrieve pages recursively"

find site.com -name '*.htm*' -exec wget --no-clobber --span-hosts --timestamping --page-requisites http://{} \;

I've tried this, and it mostly works: I get all the .htm(l) pages from just the site I'm after, followed by the external files. I haven't yet found a way to rewrite the links so they point at the local copies of the external files.
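The two passes above can be sketched as one small script. As an untested assumption, adding `--convert-links` to the second pass may handle the link-rewriting step left open above, since it rewrites links in each saved page to point at local copies; note that wget ignores `--no-clobber` when `--convert-links` is given (it warns and prefers the latter), so that flag is dropped here. `site.com` is a placeholder.

```shell
#!/bin/sh
# Sketch of the two-pass mirror described above (site.com is a
# placeholder). Run from the directory that should hold the mirror.

# Pass 1: recursively fetch pages, but only from the one domain.
wget --recursive --timestamping -l inf --no-remove-listing \
     --page-requisites http://site.com

# Pass 2: for each saved HTML page, fetch its page requisites from
# any host (--span-hosts). --convert-links asks wget to rewrite the
# links in each page to point at the local copies; --no-clobber is
# omitted because wget ignores it when --convert-links is given.
find site.com -name '*.htm*' -exec wget --span-hosts --timestamping \
     --page-requisites --convert-links http://{} \;
```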

cofiem answered Dec 02 '25 04:12