I have a URL checker that I use in Perl, and I was wondering how something like it would be done in Clojure. I have a file with thousands of URLs, and I'd like the output file to contain each URL (minus the http:// or https:// prefix) followed by :1 if it is valid or :0 if it is not. Ideally, I could check the sites concurrently, since concurrency is one of Clojure's strengths.
Input:
http://www.google.com
http://www.cnn.com
http://www.msnbc.com
http://www.abadurlisnotgood.com
Output:
www.google.com:1
www.cnn.com:1
www.msnbc.com:1
www.abadurlisnotgood.com:0
I assume that by "valid URL" you mean one that returns HTTP status 200. Something like this might work; it requires clojure-contrib. To attempt to run the checks in parallel, change map to pmap, as Arthur Ulfeldt mentioned.
(use '(clojure.contrib duck-streams
                       java-utils
                       str-utils))
(import '(java.net URL
                   URLConnection
                   HttpURLConnection
                   UnknownHostException))
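(defn check-url [url]
  ;; Strip a leading http:// or https:// (case-insensitive) for the output.
  (str (re-sub #"^(?i)https?:/+" "" url)
       ":"
       (try
         (let [c (cast HttpURLConnection
                       (.openConnection (URL. url)))]
           ;; Only an HTTP 200 response counts as valid.
           (if (= 200 (.getResponseCode c))
             1
             0))
         ;; A host that does not resolve is reported as invalid.
         (catch UnknownHostException _
           0))))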
(defn check-urls-from-file [filename]
  (doseq [line (map check-url
                    (read-lines (as-file filename)))]
    (println line)))
Given your example as input:
user> (check-urls-from-file "urls.txt")
www.google.com:1
www.cnn.com:1
www.msnbc.com:1
www.abadurlisnotgood.com:0
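For what it's worth, here is a rough sketch of the pmap variant that also writes the results to an output file, as the question asked. It uses clojure.java.io and clojure.string instead of clojure-contrib. The helper names (strip-scheme, status-ok?, format-result, check-urls), the 5-second timeouts, and the choice to treat any exception as :0 are my own assumptions, not part of the code above.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])
(import '(java.net URL HttpURLConnection))

(defn strip-scheme
  "Removes a leading http:// or https:// (case-insensitive)."
  [url]
  (str/replace url #"(?i)^https?:/+" ""))

(defn status-ok?
  "True if a GET request to url answers with HTTP 200.
  Assumption: any exception (unknown host, timeout, malformed URL)
  is treated as an invalid URL."
  [url]
  (try
    (let [^HttpURLConnection c (.openConnection (URL. url))]
      ;; Assumed timeouts, so one slow host cannot stall the whole run.
      (.setConnectTimeout c 5000)
      (.setReadTimeout c 5000)
      (= 200 (.getResponseCode c)))
    (catch Exception _ false)))

(defn format-result
  "Builds one output line, e.g. www.example.com:1"
  [url ok?]
  (str (strip-scheme url) ":" (if ok? 1 0)))

(defn check-urls
  "Reads URLs from in-file, checks them with pmap, and writes one
  result line per URL to out-file, in input order."
  [in-file out-file]
  (with-open [rdr (io/reader in-file)
              wtr (io/writer out-file)]
    (doseq [line (pmap #(format-result % (status-ok? %))
                       (remove str/blank? (line-seq rdr)))]
      (.write wtr (str line "\n")))))
```

You would call it as (check-urls "urls.txt" "results.txt"). Note that pmap only runs a small number of checks at once (roughly the number of cores plus two), so for thousands of slow hosts a dedicated thread pool may do better. If you run this as a script, call (shutdown-agents) at the end, otherwise the JVM can linger for a while before exiting.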