Tuesday, June 12, 2018

`no such file or directory` when mirroring with wget



I'm trying to save a local version of the clojure docs with the wget command:



wget --user-agent=firefox --ignore-length -c -km "http://clojuredocs.org/quickref/Clojure Core"



but I keep getting a "no such file or directory" error when it reaches some links, such as http://clojuredocs.org/clojure_core/clojure.core/rem, which obviously exist because you can reach them with a browser. I'm guessing this is a problem with the way wget builds or concatenates the path. How do you fix this? I've tried other options like --user-agent and --ignore-length, but I keep getting the same results, as seen below (you'll have to open the image to see the messages properly).



[image: screenshot of the wget output]



This seems to be a problem with mirroring because the command:



wget http://clojuredocs.org/clojure_core/clojure.core/rem



works okay.


Answer



Your problem stems from how wget names local files: it saves the URL http://clojuredocs.org/clojure_core to a file named ./clojuredocs.org/clojure_core, but it saves the URL http://clojuredocs.org/clojure_core/ (note the trailing slash) to a file named ./clojuredocs.org/clojure_core/index.html.
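The naming rule described above can be sketched as a small shell function. This is only an illustration of that mapping, not wget's actual code; `local_path` is a hypothetical helper name, and it assumes plain http:// URLs.

```shell
#!/bin/sh
# Hypothetical helper that mimics the naming rule described above.
# It is NOT wget's implementation, only an illustration of the mapping.
local_path() {
    url=$1
    path=${url#http://}    # drop the scheme (assumes an http:// URL)
    case $path in
        */) printf './%sindex.html\n' "$path" ;;  # trailing slash: saved as a directory index
        *)  printf './%s\n' "$path" ;;            # no trailing slash: saved as a plain file
    esac
}

local_path "http://clojuredocs.org/clojure_core"
local_path "http://clojuredocs.org/clojure_core/"
```

The two calls print a file path and a path inside a directory of the same name, which is exactly the pair that cannot coexist on disk.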



Once the file ./clojuredocs.org/clojure_core has been created, subsequent downloads of, e.g., http://clojuredocs.org/clojure_core/something are doomed to fail, because a regular file with that name already exists and wget can no longer create a directory ./clojuredocs.org/clojure_core.
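The collision can be reproduced without wget or any network access. The following is a minimal sketch that runs in a scratch directory; the `demo` and `result` variable names are made up for the illustration, and the empty file merely stands in for the page wget would have saved.

```shell
#!/bin/sh
# Reproduce the file-versus-directory collision in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/clojuredocs.org"

# Step 1: the page without a trailing slash is saved as a regular file
# (an empty stand-in here).
: > "$demo/clojuredocs.org/clojure_core"

# Step 2: a deeper URL needs a directory with the same name,
# which cannot be created while the file exists.
if mkdir "$demo/clojuredocs.org/clojure_core" 2>/dev/null; then
    result="directory created"
else
    result="mkdir failed"
fi
echo "$result"

# Clean up the scratch directory.
rm -rf "$demo"
```

Because the name is already taken by a regular file, the `mkdir` call fails, and nothing can be written below that path.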



This was reported as bug #29647 on the GNU Wget Bugtracker.



With the patch provided there (which apparently never made it into the official source code), the problem vanishes: wget is forced to create the directory first, so the download can continue.




However, with the patch applied, http://clojuredocs.org/clojure_core gets saved as ./clojuredocs.org/clojure_core.1, not as ./clojuredocs.org/clojure_core/index.html.



I cannot judge whether the link converter (-k) is smart enough to make the links in this mirrored local copy work. I stopped the download after a few minutes. (I'm too impatient ;))

