I'm trying to save a local version of the Clojure docs with the wget command:
wget --user-agent=firefox --ignore-length -c -km "http://clojuredocs.org/quickref/Clojure Core"
but keep getting a "No such file or directory" error
when it reaches some links like http://clojuredocs.org/clojure_core/clojure.core/rem, which obviously exist because you can get there with a browser. I'm guessing this is a problem with the way wget is building/concatenating the path. How do you fix this? I've tried other options like --user-agent and --ignore-length, but I keep getting the same results as seen below (you'll have to open the image to see the messages properly).
This seems to be a problem with mirroring because the command:
wget http://clojuredocs.org/clojure_core/clojure.core/rem
works okay.
Answer
Your problem originates from the way wget names local files: it saves the URL http://clojuredocs.org/clojure_core to a file named ./clojuredocs.org/clojure_core, but the URL http://clojuredocs.org/clojure_core/ (notice the trailing slash) to a file named ./clojuredocs.org/clojure_core/index.html.
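The naming rule can be sketched in a few lines of shell. This is only an approximation for illustration, not wget's actual code: local_name is a hypothetical helper, and it assumes plain http:// URLs without query strings.

```shell
# local_name: hypothetical helper that mimics the naming rule described above.
# Assumption: the URL starts with http:// and has no query string.
local_name() {
    url=${1#http://}                                # drop the scheme
    case "$url" in
        */) printf './%sindex.html\n' "$url" ;;     # trailing slash: a directory plus index.html
        *)  printf './%s\n' "$url" ;;               # no trailing slash: a plain file
    esac
}

local_name "http://clojuredocs.org/clojure_core"    # prints ./clojuredocs.org/clojure_core
local_name "http://clojuredocs.org/clojure_core/"   # prints ./clojuredocs.org/clojure_core/index.html
```

The two results differ only in whether clojure_core ends up as a regular file or as a directory, and that is where the trouble starts.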
Once the file ./clojuredocs.org/clojure_core is created, subsequent downloads of e.g. http://clojuredocs.org/clojure_core/something are doomed to fail, because wget can no longer create a directory named ./clojuredocs.org/clojure_core.
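You can reproduce the same clash without wget or any network access. The following is a minimal sketch that uses a throwaway temporary directory; the variable names workdir and result are only for illustration.

```shell
# Reproduce the file-vs-directory clash locally, without touching the network.
workdir=$(mktemp -d)
mkdir -p "$workdir/clojuredocs.org"

# Step 1: the URL without a trailing slash has been saved as a regular file.
: > "$workdir/clojuredocs.org/clojure_core"

# Step 2: a later URL below it needs a directory with exactly the same name.
if mkdir -p "$workdir/clojuredocs.org/clojure_core" 2>/dev/null; then
    result="directory created"
else
    result="cannot create directory"
fi
echo "$result"

# Clean up the temporary directory.
rm -rf "$workdir"
```

The mkdir call fails because a regular file already occupies that name, which is the same situation wget runs into during the mirror.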
This was reported as bug #29647 on the GNU Wget Bugtracker.
With the provided patch (which obviously didn't make it into the official source code) this problem vanishes and wget is forced to create the directory first, so the download can continue.
However, http://clojuredocs.org/clojure_core gets saved as ./clojuredocs.org/clojure_core.1, not as ./clojuredocs.org/clojure_core/index.html.
I cannot judge whether the link converter (-k) is smart enough to make the links in this mirrored local copy work... I stopped the download after a few minutes. (I'm too impatient ;) )