Python

Question

I'm involved in a web project. I have to choose the best ways to represent the code, so that other people can read it without problems/headaches/whatever.

The "problem" I've tackled now is to show a nice formatted url (will be taken from a "title" string).

So, let's suppose we have a title, fetched from the form:

title = request.form['title'] # 'Hello World, Hello Cat! Hello?'

then we need a function to format it for inclusion in the url (it needs to become 'hello_world_hello_cat_hello'), so for the moment I'm using this one which I think sucks for readability:

str.replace(title, ' ', '-').str.replace(title, '!', '').str.replace(title, '?', '').str.replace(string, ',' '').lower()

What would be a good way to compact it? Is there already a function for doing what I'm doing?

I'd also like to know which characters/symbols I should strip from the url.

Paulo Bu · Accepted Answer

You can use urlencode() which is the way for url-encode strings in Python.

If otherwise you want a personalized encoding as your expected output and all you want to do is leave the words in the final string you can use the re.findall function to grab them and later join them with and underscore:

>>>s = 'Hello World, Hello Cat! Hello?'
>>>'_'.join(re.findall(r'\w+',s)).lower()
'hello_world_hello_cat_hello'

What this does is:

g = re.findall(r'\w+',s) # ['Hello', 'World', 'Hello', 'Cat', 'Hello']
s1 = '_'.join(g) # 'Hello_World_Hello_Cat_Hello'
s1.lower() # 'hello_world_hello_cat_hello'

This technique also works well with numbers in the string:

>>>s = 'Hello World, Hello Cat! H123ello? 123'
>>>'_'.join(re.findall(r'\w+',s)).lower()
'hello_world_hello_cat_h123ello_123'

Another way which I think should be faster is to actually replace non alphanumeric chars. This can be accomplished with re.sub by grabbing all the non alphanumerics toghether and replace them with _ like this:

>>>re.sub(r'\W+','_',s).lower()
'hello_world_hello_cat_h123ello_123'

Well... not really, speed tests:

$python -mtimeit -s "import re" -s "s='Hello World, Hello Cat! Hello?'" "'_'.join(re.findall(r'\w+',s)).lower()"
100000 loops, best of 3: 5.08 usec per loop


$python -mtimeit -s "import re" -s "s='Hello World, Hello Cat! Hello?'" "re.sub(r'\W+','_',s).lower()"
100000 loops, best of 3: 6.55 usec per loop

jshanley · Answer

You could use urlencode() from the urllib module in python2 or urllib.parse module in python3.

This will work assuming you're trying to use the text in the query string of your URL.

title = {'title': 'Hello World, Hello Cat! Hello?'} # or get it programmatically as you did
encoded = urllib.urlencode(title)
print encoded # title=Hello+World%2C+Hello+Cat%21+Hello%3F

Python - Shortest way to format a string to an url

Tags:

replace

slug

2 Answers

Paulo Bu

jshanley

Recent Activity

Donate For Us