Coder Perfect

What is the best way to download a file over HTTP?

Problem

I have a small utility that downloads an MP3 file from a website on a schedule and then builds/updates a podcast XML file that I’ve added to iTunes.

The text processing that creates/updates the XML file is written in Python. To download the actual MP3 file, however, I use wget inside a Windows .bat file. I’d prefer to have the entire utility written in Python.

I couldn’t find a way to actually download the file in Python, so I used wget instead.

So, how can I use Python to get the file?

Asked by Owen

Solution #1

One more, using urlretrieve:

import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

(Use import urllib.request and urllib.request.urlretrieve for Python 3+)
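For Python 3, the same call might look like the sketch below. The wrapper name download is a hypothetical helper, not part of the original answer, and the URL in the comment is the placeholder used above. urlretrieve returns the local path together with the response headers.

```python
import urllib.request


def download(url, dest_path):
    """Save the resource at `url` to `dest_path` and return the local path."""
    # urlretrieve returns a (local_filename, headers) tuple.
    local_path, _headers = urllib.request.urlretrieve(url, dest_path)
    return local_path


# Example call (placeholder URL from the answer above):
# download("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
```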

Another one, this time with a “progress bar”:

import urllib2

url = "http://download.thinkbroadband.com/10MB.zip"

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()
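The listing above is Python 2 only (urllib2, print statements). A rough Python 3 equivalent, offered as a sketch rather than as the answerer's code, could look like this. The function name download_with_progress and its parameters are assumptions made for illustration; it also tolerates a missing Content-Length header, which the original does not.

```python
import urllib.request


def download_with_progress(url, file_name, block_size=8192):
    """Download `url` to `file_name` in blocks; return the bytes written."""
    with urllib.request.urlopen(url) as response, open(file_name, "wb") as f:
        # Content-Length may be absent (e.g. chunked responses), so allow None.
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        print("Downloading: %s Bytes: %s" % (file_name, total))

        downloaded = 0
        while True:
            block = response.read(block_size)
            if not block:
                break
            f.write(block)
            downloaded += len(block)
            if total:
                status = "%10d  [%6.2f%%]" % (downloaded, downloaded * 100.0 / total)
            else:
                status = "%10d" % downloaded
            # "\r" returns to the start of the line so the status overwrites itself.
            print(status, end="\r", flush=True)
    print()  # finish the status line
    return downloaded


# Example call (URL taken from the answer above):
# download_with_progress("http://download.thinkbroadband.com/10MB.zip", "10MB.zip")
```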

Answered by PabloG

Solution #2

Use urllib.request.urlopen():

import urllib.request
with urllib.request.urlopen('http://www.example.com/') as f:
    html = f.read().decode('utf-8')

This is the most basic way to use the library, without any error handling. You can also do more complex things, such as changing headers.
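As a sketch of what changing headers and adding error handling might look like in Python 3: the function name fetch, the User-Agent string, and the choice to re-raise failures as RuntimeError are all assumptions made for this example, not something the answer prescribes.

```python
import urllib.error
import urllib.request


def fetch(url, user_agent="example-downloader/0.1", timeout=10):
    """Return the response body as bytes, or raise RuntimeError on failure."""
    # Request lets you attach custom headers before opening the URL.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        raise RuntimeError("HTTP error %s for %s" % (err.code, url)) from err
    except urllib.error.URLError as err:
        # No valid response at all (DNS failure, refused connection, ...).
        raise RuntimeError("Could not reach %s: %s" % (url, err.reason)) from err
```

HTTPError is a subclass of URLError, so it has to be caught first.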

On Python 2, the method is in urllib2:

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()

Answered by Corey

Solution #3

As of 2012, use the Python requests library.

>>> import requests
>>> 
>>> url = "http://download.thinkbroadband.com/10MB.zip"
>>> r = requests.get(url)
>>> print(len(r.content))
10485760

You can install it by running pip install requests.

Requests has many advantages over the alternatives because its API is much simpler. This is especially true if you need authentication; urllib and urllib2 are unintuitive and painful to use in that case.

2015-12-30

People have expressed admiration for the progress bar. It’s cool, sure. There are several off-the-shelf solutions now, including tqdm:

from tqdm import tqdm
import requests

url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)

with open("10MB", "wb") as handle:
    for data in tqdm(response.iter_content()):
        handle.write(data)

This is pretty much the same implementation that @kvance outlined 30 months ago.
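One caveat about the loop above: iter_content() defaults to a chunk size of one byte, so it iterates once per byte. A variant with a larger chunk size and a status check might look like the following sketch. The function name download_file, the 8192-byte chunk size, and the timeout are assumptions chosen for illustration, and the tqdm wrapper is left out to keep the example small.

```python
import requests


def download_file(url, dest_path, chunk_size=8192, timeout=10):
    """Stream `url` to `dest_path` and return the number of bytes written."""
    written = 0
    with requests.get(url, stream=True, timeout=timeout) as response:
        # Raise requests.HTTPError for 4xx/5xx instead of saving an error page.
        response.raise_for_status()
        with open(dest_path, "wb") as handle:
            for chunk in response.iter_content(chunk_size=chunk_size):
                handle.write(chunk)
                written += len(chunk)
    return written


# Example call (URL taken from the answer above):
# download_file("http://download.thinkbroadband.com/10MB.zip", "10MB.zip")
```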

Answered by hughdbrown

Solution #4

import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open('test.mp3','wb') as output:
  output.write(mp3file.read())

The wb in open('test.mp3', 'wb') opens the file in binary mode (and overwrites any existing file of that name) so you can save raw bytes instead of just text.
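Note that mp3file.read() loads the whole response into memory before writing it. For large files, a Python 3 variant that copies the response to disk in chunks might look like this sketch; the function name save_url is a hypothetical helper, not part of the answer.

```python
import shutil
import urllib.request


def save_url(url, dest_path):
    """Stream the response body for `url` straight into `dest_path`."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as output:
        # copyfileobj copies in chunks instead of reading everything at once.
        shutil.copyfileobj(response, output)


# Example call (placeholder URL from the answer above):
# save_url("http://www.example.com/songs/mp3.mp3", "test.mp3")
```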

Answered by Grant

Solution #5

Answered by bmaupin

Post is based on https://stackoverflow.com/questions/22676/how-to-download-a-file-over-http