Coder Perfect

In Python, decode UTF-8 urls.


As a Python beginner, I’ve spent a lot of time learning the language. I’m not sure how I’d ever decipher such a URL:

to the following in Python 2.7: title==прaвовая+

url=urllib.unquote(url.encode(“utf8”)) returns a very unattractive result.

There is still no solution; any assistance would be greatly appreciated.

Asked by swordholder

Solution #1

You want to decode the data, which is UTF-8 encoded bytes escaped with URL quoting, using urllib.parse.unquote(), which transparently handles decoding from percent-encoded data to UTF-8 bytes and subsequently to text:

from urllib.parse import unquote

url = unquote(url)


>>> from urllib.parse import unquote
>>> url = ''
>>> unquote(url)

urllib.unquote() is the Python 2 counterpart, however it returns a bytestring, which you’d have to decode manually:

from urllib import unquote

url = unquote(url).decode('utf8')

Answered by Martijn Pieters

Solution #2

You can use urllib.parse if you’re running Python 3.

url = """"""

import urllib.parse



Answered by pavan

Solution #3

With the requests library, you can also achieve the desired result:

import requests

url = ""

print(f"Before: {url}")
print(f"After:  {requests.utils.unquote(url)}")


$ python3


If you’re currently using requests and don’t want to use another library, this could be useful.

Answered by ivanleoncz

Solution #4

HTML entities can be included in URLs. This also replaces them.

#from urllib import unquote #earlier python version
from urllib.request import unquote
from html import unescape

Answered by Roland Puntaier

Post is based on