Friday, December 1, 2017

Download and automatically rename to the hyperlink text all PDF files on a page



I use Chrono Download Manager for bulk downloading all files of X type on a given page and it works very well.



I am interested in downloading many PDF files from a website, all of which have non-descriptive filenames. The hyperlinked text for each file, however, is perfectly descriptive.



Is there any reasonable way for a non-coder to pull off downloading all of those files and automatically renaming each one so the filename is the same as the hyperlinked text for the download?




If it makes a difference, this is the page.



Thanks!


Answer



The following procedure with aria2 is not fully automatic. You have to copy and paste all the download links into a plain text file by hand, but aria2 can then download and rename the files automatically according to that text file.



So how do you make that text file? Create a new text file in any text editor and name it aria2-script.txt, or any name you want. Put the direct download links in it. Use direct download links only; otherwise aria2 downloads the webpage instead of the PDF. Here is the syntax of the aria2-script.txt file:



http://example-link.com/direct-link/fileA.pdf
  out=fileA.pdf
  checksum=sha-1=sha-goes-here


You may skip the checksum line. Add as many links as you want. Remember to indent out= and checksum= (and any other option) with TWO spaces; aria2 treats any line that does not start with whitespace as a URL. For example, your text file will be:



https://www.csb.gov/assets/Record/Board_Action_Report_-_Notation_Item_2018-57.pdf
  out=Recommendation 2012-03-I-CA-R14, from the Chevron Refinery Fire investigation.pdf
https://www.csb.gov/assets/Record/Board_Action_Report_-_Notation_Item_2018-56.pdf
  out=Recommendation 2012-03-I-CA-R13, from the Chevron Refinery Fire investigation.pdf
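If typing the input file by hand becomes tedious, the format above can also be generated by a short script. The following is only a minimal sketch in Python, assuming you already have a list of (link, link text) pairs; the function names safe_filename and build_aria2_input are made up for this illustration and are not part of aria2. It also replaces characters that Windows does not allow in filenames, which is an extra precaution not mentioned in the answer.

```python
import re


def safe_filename(title, extension=".pdf"):
    """Turn hyperlink text into a filename that Windows will accept."""
    # Replace characters that Windows forbids in filenames.
    name = re.sub(r'[\\/:*?"<>|]', "_", title)
    # Collapse runs of whitespace and trim both ends.
    name = re.sub(r"\s+", " ", name).strip()
    # Append the extension unless the text already ends with it.
    if not name.lower().endswith(extension):
        name += extension
    return name


def build_aria2_input(entries):
    """Build aria2 input-file text from (url, link_text) pairs."""
    lines = []
    for url, title in entries:
        # The URL goes on its own line with no indentation.
        lines.append(url)
        # Option lines are indented with two spaces, as described above.
        lines.append("  out=" + safe_filename(title))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example pairs taken from the sample input file above.
    entries = [
        ("https://www.csb.gov/assets/Record/Board_Action_Report_-_Notation_Item_2018-57.pdf",
         "Recommendation 2012-03-I-CA-R14, from the Chevron Refinery Fire investigation"),
        ("https://www.csb.gov/assets/Record/Board_Action_Report_-_Notation_Item_2018-56.pdf",
         "Recommendation 2012-03-I-CA-R13, from the Chevron Refinery Fire investigation"),
    ]
    # Print the result; redirect it into aria2-script.txt to save it.
    print(build_aria2_input(entries), end="")
```

Running the script and redirecting its output to aria2-script.txt should give a file with the same layout as the example above.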


Copy each direct download link by right-clicking the file link on the webpage. Now download aria2 from its GitHub releases page, open a command prompt in the folder containing aria2c.exe, and run the command:



aria2c.exe --check-certificate=false --dir="Folder" --input-file="aria2-script.txt"


The --check-certificate=false option turns off verification of the server's HTTPS certificate, so you do not have to configure certificates. aria2 has many other options for speeding up downloads. aria2 will name each file according to its out= line. Remember to give the full paths of aria2c.exe and aria2-script.txt if they are not in the current folder. For further details, read the aria2 options list and the aria2 input file documentation.
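The manual copying step could in principle be scripted as well. The sketch below uses only Python's standard library and assumes that you have saved the page's HTML and that the PDF links end in .pdf; the class name PdfLinkCollector, the helper collect_pdf_links, and the example.com addresses are hypothetical names chosen for this illustration. It collects each PDF link together with its hyperlink text, which are the two pieces of information the input file needs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs for links ending in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the PDF link currently being read
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Ignore any query string when checking the extension.
            if href and href.lower().split("?")[0].endswith(".pdf"):
                # Resolve relative links against the page address.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a PDF link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and normalise whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def collect_pdf_links(html, base_url):
    """Return a list of (url, link_text) pairs found in the HTML."""
    parser = PdfLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # A tiny stand-in for a saved page; real pages will be larger.
    sample = '<p><a href="/files/a1.pdf">Annual report</a></p>'
    for url, text in collect_pdf_links(sample, "https://www.example.com/docs/"):
        print(url, "->", text)
```

Each resulting pair corresponds to one URL line and one out= line in aria2-script.txt.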

