Saturday, April 6, 2019

archiving - What is the best way to archive (spider) a site that is going to be removed?












Three different blogs that I read have recently announced that they are going to be discontinued and removed from the web. The pages will probably stay in Google's cache for a few weeks after they're gone, and some of them will be in the Wayback Machine, but I'd like to archive those sites to my hard disk for future reference.



What is the best way to do this? Is there any software that transforms a blog (e.g. Blogspot) into a chronological PDF?


Answer



I would start by using wget to archive the sites as they are (in HTML). Converting the saved pages to PDF afterwards is simple.



See http://www.tufat.com/s_html2ps_html2pdf.htm and http://www.gnu.org/software/wget/
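As a rough sketch of the wget step, something like the following Python wrapper could build and run a mirroring command. The helper names (`build_wget_command`, `mirror_site`), the example URL, and the output directory are made up for illustration, and the one-second wait is just an assumed polite default. The wget options themselves are standard ones, but check them against your installed version.

```python
import subprocess


def build_wget_command(url, dest_dir, wait_seconds=1):
    """Build a wget command line that mirrors a site for offline reading.

    The function name and defaults are illustrative assumptions.
    """
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so they work locally
        "--adjust-extension",  # save pages with an .html extension
        "--page-requisites",   # also fetch images, CSS and scripts
        "--no-parent",         # do not climb above the start URL
        f"--wait={wait_seconds}",          # pause between requests
        f"--directory-prefix={dest_dir}",  # where to store the copy
        url,
    ]


def mirror_site(url, dest_dir):
    """Run wget and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_wget_command(url, dest_dir), check=True)


# Example (hypothetical blog URL and folder name):
# mirror_site("https://example.blogspot.com/", "blog-archive")
```

The result is a folder of HTML files that you can open in a browser or feed to an HTML-to-PDF converter afterwards.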


