
Download a list of URLs in Python

Note: This post is more than a year old. Some information may no longer be accurate.

This article covers how to download URLs in Python.

There are two possibilities:

  • wget
  • urllib

wget

To download a file, you can run the wget command-line tool of your Linux system from Python through the os module (os.popen or os.system). This won't work on Windows out of the box; there you can install a Windows build of wget or use it through Cygwin.

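import os

# -q: quiet mode; -O foo1.txt: save the download under that name
h = os.popen('wget -q -O foo1.txt http://foo.html')
h.close()  # waits for wget to finish
with open('foo1.txt') as f:
    s = f.read()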
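The os.popen snippet above discards wget's exit status, so a failed download goes unnoticed. Below is a minimal sketch of the same call with the subprocess module, assuming Python 3 and wget on the PATH; wget_command and fetch_with_wget are hypothetical helper names, not part of any library. Passing an argument list instead of a shell string also means characters such as "&" in a URL reach wget unchanged.

```python
import subprocess


def wget_command(url, output_file, quiet=True):
    # Build the argument list for wget (hypothetical helper)
    cmd = ["wget"]
    if quiet:
        cmd.append("-q")  # quiet mode: turn off wget's output
    cmd += ["-O", output_file, url]  # -O: write the download to output_file
    return cmd


def fetch_with_wget(url, output_file):
    # subprocess.run waits for wget to finish; wget exits with 0 on success
    result = subprocess.run(wget_command(url, output_file))
    return result.returncode == 0
```

For example, call fetch_with_wget("http://foo.html", "foo1.txt") and read foo1.txt only if it returns True. If wget is not installed, subprocess.run raises FileNotFoundError.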

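The -q option puts wget in quiet mode, i.e. it turns off wget's output; use it if you don't want to see progress messages. The -O option saves the download under the given name, so the script can read it back afterwards.

For example, suppose you have a text file download.txt with one link per line: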

http://media.cinhtau.net/01.jpg
http://media.cinhtau.net/02-03.jpg
http://media.cinhtau.net/04.jpg
http://media.cinhtau.net/05.jpg
http://media.cinhtau.net/06-07.jpg
http://media.cinhtau.net/08.jpg
http://media.cinhtau.net/09.jpg
http://media.cinhtau.net/10-11.jpg
http://media.cinhtau.net/12.jpg
http://media.cinhtau.net/13.jpg
http://media.cinhtau.net/14.jpg
http://media.cinhtau.net/15.jpg
http://media.cinhtau.net/16.jpg
http://media.cinhtau.net/17.jpg

Now you want to download each link in this file. You can write a small Python program that reads the file line by line and lets wget do the work for you:

__author__ = "tan"
__date__ = "$Jul 05, 2009 9:38:04 AM$"

import argparse
import os
import shlex


def main():
    parser = argparse.ArgumentParser()
    # -f/--file: text file with one URL per line
    parser.add_argument("-f", "--file", dest="file")
    options = parser.parse_args()
    if options.file is None:
        parser.error("We need a download list!")
    print("Download")
    # reading contents
    with open(options.file) as url_list:
        for line in url_list:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            # now download the link; shlex.quote protects shell characters
            # such as "&" in the URL, and close() waits for wget to finish
            h = os.popen('wget ' + shlex.quote(url))
            h.close()


if __name__ == "__main__":
    main()

Now invoke the Python program with the -f option and let it do the work:

python download.py -f download.txt
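wget can also read the list by itself: its -i (--input-file) option downloads every URL listed in a file, so the Python loop is optional. A minimal sketch, assuming Python 3 and wget on the PATH; wget_list_command and download_with_wget are hypothetical helper names, and the -P (--directory-prefix) target directory is an optional extra not used in the program above.

```python
import subprocess


def wget_list_command(list_file, target_dir=None, quiet=False):
    # Build the argument list for one wget call over the whole list (hypothetical helper)
    cmd = ["wget"]
    if quiet:
        cmd.append("-q")  # suppress wget's output
    if target_dir is not None:
        cmd += ["-P", target_dir]  # -P / --directory-prefix: where to save files
    cmd += ["-i", list_file]  # -i / --input-file: read the URLs from list_file
    return cmd


def download_with_wget(list_file, target_dir=None):
    # A single wget process handles every URL; exit status 0 means no problems occurred
    return subprocess.run(wget_list_command(list_file, target_dir)).returncode
```

Calling download_with_wget("download.txt") is then equivalent to running wget -i download.txt in a shell.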

urllib

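Another possibility is to use Python's built-in urllib module, which provides the equivalent of wget without calling an external program. The script below takes the URLs as command-line arguments, saves each one under the part of the URL after the last "/", and prints progress through reporthook. In Python 3, urlretrieve lives in urllib.request.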

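import sys
import urllib.request

def reporthook(block_num, block_size, total_size):
    # progress callback: blocks transferred so far, block size in bytes, total size (-1 if unknown)
    print(block_num, block_size, total_size)

for url in sys.argv[1:]:
    filename = url[url.rfind('/') + 1:]  # everything after the last "/"
    print(url, "->", filename)
    urllib.request.urlretrieve(url, filename, reporthook)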
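The two approaches can be combined into a version of download.py that reads download.txt but needs no external program. The sketch below assumes Python 3; filename_from_url and download_list are hypothetical names, and the index.html fallback for URLs that end in "/" is an arbitrary choice. Unlike url.rfind('/'), urllib.parse.urlsplit drops query strings and fragments before the file name is taken, and percent-escapes such as %20 are decoded.

```python
import os
import posixpath
import sys
import urllib.parse
import urllib.request


def filename_from_url(url, default="index.html"):
    # Use only the decoded path, so "?x=1" query strings and "#top" fragments are dropped
    path = urllib.parse.unquote(urllib.parse.urlsplit(url).path)
    name = posixpath.basename(path)  # last path segment, e.g. "01.jpg"
    # "default" is an assumed fallback for URLs without a file name
    if name in ("", ".", ".."):
        return default
    return name


def download_list(list_file, target_dir="."):
    """Download every URL listed in list_file (one per line); return the saved paths."""
    saved = []
    with open(list_file) as urls:
        for line in urls:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            target = os.path.join(target_dir, filename_from_url(url))
            print(url, "->", target)
            urllib.request.urlretrieve(url, target)
            saved.append(target)
    return saved


if __name__ == "__main__":
    # e.g. python download_urllib.py download.txt (each argument is a list file)
    for list_file in sys.argv[1:]:
        download_list(list_file)
```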