
What's the fastest way to loop through a list and create a single string?

For example:

list = [{"title_url": "joe_white", "id": 1, "title": "Joe White"},
        {"title_url": "peter_black", "id": 2, "title": "Peter Black"}]

How can I efficiently loop through this to create:

Joe White, Peter Black
<a href="/u/joe_white">Joe White</a>,<a href="/u/peter_black">Peter Black</a>

Thank you.

ensnare asked Nov 22 '25 09:11


2 Answers

The first is pretty simple:

', '.join(item['title'] for item in list)

The second requires something more complicated, but is essentially the same:

','.join('<a href="/u/%(title_url)s">%(title)s</a>' % item for item in list)

Both use generator expressions, which are similar to list comprehensions but don't require you to build an extra list first.
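As a minimal sketch, here are both expressions run against the sample data from the question. The names `people`, `titles` and `links` are assumptions for illustration (the question calls its data `list`, which shadows the built-in of the same name).

```python
# Sample data from the question; `people` is an illustrative name that
# avoids shadowing the built-in `list`.
people = [{"title_url": "joe_white", "id": 1, "title": "Joe White"},
          {"title_url": "peter_black", "id": 2, "title": "Peter Black"}]

# Plain titles, separated by a comma and a space.
titles = ', '.join(item['title'] for item in people)

# Links, separated by a bare comma; %(key)s looks each value up in the dict.
links = ','.join('<a href="/u/%(title_url)s">%(title)s</a>' % item
                 for item in people)

print(titles)  # Joe White, Peter Black
print(links)
```

Neither expression escapes the values, so this is only appropriate when the titles and URLs are already known to be safe to embed in HTML.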

Michael Mrozek answered Nov 24 '25 23:11


Here are some speed comparisons to check these two methods that you've been given.

First, we create the list of 100000 entries; boring and perhaps not a genuine sample due to having shorter strings, but I'm not worried about that now.

>>> items = [{"title_url": "abc", "id": i, "title": "def"} for i in xrange(100000)]

Next, Michael Mrozek's answer:

>>> def michael():
...     ', '.join(item['title'] for item in items)
...     ','.join('<a href="/u/%(title_url)s">%(title)s</a>' % item for item in items)
... 

Nice and simple. Then systempuntoout's answer (note that at this stage I'm just comparing the iteration performance, and so I've switched the %s and tuple formatting for %()s dict formatting; I'll time the other method later):

>>> def systempuntoout():
...     titles = []
...     urls = []
...     for item in items:
...             titles.append(item['title'])
...             urls.append('<a href="/u/%(title_url)s">%(title)s</a>' % item)
...     ', '.join(titles)
...     ','.join(urls)
... 

Very well. Now to time them:

>>> import timeit
>>> timeit.timeit(michael, number=100)
9.6959049701690674
>>> timeit.timeit(systempuntoout, number=100)
11.306489944458008

Summary: don't worry about going over the list twice. Combined with generator expressions, it's less expensive than the overhead of list.append; Michael's solution is about 15% faster on 100000 entries.

Secondly, there's the question of whether you should use '%(...)s' % dict() or '%s' % tuple(). Taking Michael's answer as the faster and simpler of the two, here's michael2:

>>> def michael2():
...     ', '.join(item['title'] for item in items)
...     ','.join('<a href="/u/%s">%s</a>' % (item['title_url'], item['title']) for item in items)
... 
>>> timeit.timeit(michael2, number=100)
7.8054699897766113

The clear conclusion is that string formatting is faster with a tuple than with a dict: almost 25% faster here (9.70 against 7.81). So if performance is an issue and you're dealing with large quantities of data, use the michael2 method.
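Before swapping one formatting style for the other, it may be worth confirming that they build the same string. A small sketch, using one entry shaped like those in the benchmark list (`by_dict` and `by_tuple` are illustrative names):

```python
# One entry shaped like those in the benchmark list.
item = {"title_url": "abc", "id": 0, "title": "def"}

# Dict-based formatting: values are looked up by key.
by_dict = '<a href="/u/%(title_url)s">%(title)s</a>' % item

# Tuple-based formatting: values are supplied positionally.
by_tuple = '<a href="/u/%s">%s</a>' % (item['title_url'], item['title'])

assert by_dict == by_tuple
print(by_tuple)  # <a href="/u/abc">def</a>
```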

And if you want to see something really scary, take systempuntoout's original answer with class intact:

>>> def systempuntoout0():
...     class node():
...             titles = []
...             urls = []
...             def add_name(self, a_title):
...                     self.titles.append(a_title)
...             def add_link(self, a_title_url, a_title):
...                     self.urls.append('<a href="/u/%s">%s</a>' % (a_title_url, a_title))
...     node = node()
...     for entry in items:
...             node.add_name(entry["title"])
...             node.add_link(entry["title_url"], entry["title"])
...     ', '.join(node.titles)
...     ','.join(node.urls)
... 
>>> timeit.timeit(systempuntoout0, number=100)
15.253098011016846

A shade under twice as slow as michael2.


One final addition, to benchmark str.format as introduced in Python 2.6, "the future of string formatting" (though I still don't understand why, I like my %, thank you very much; especially as it's faster).

>>> def michael_format():
...     ', '.join(item['title'] for item in items)
...     ','.join('<a href="/u/{title_url}">{title}</a>'.format(**item) for item in items)
... 
>>> timeit.timeit(michael_format, number=100)
11.809207916259766
>>> def michael2_format():
...     ', '.join(item['title'] for item in items)
...     ','.join('<a href="/u/{0}">{1}</a>'.format(item['title_url'], item['title']) for item in items)
... 
>>> timeit.timeit(michael2_format, number=100)
9.8876869678497314

11.81 instead of 9.70, 9.89 instead of 7.81: str.format is roughly 22-27% slower (bear in mind, too, that only the second expression in each function uses it).
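The timings above come from a Python 2 session (note the `xrange`), so the ratios may well differ on other interpreters. A rough sketch of how the comparison could be repeated on Python 3 follows; the function names, the smaller list size and the repeat count are all assumptions chosen so that it runs quickly, and only the link-building expression is timed. No particular result is implied.

```python
import timeit

# Assumption: 1000 entries instead of 100000, to keep the run short.
items = [{"title_url": "abc", "id": i, "title": "def"} for i in range(1000)]

def percent_dict():
    return ','.join('<a href="/u/%(title_url)s">%(title)s</a>' % item
                    for item in items)

def percent_tuple():
    return ','.join('<a href="/u/%s">%s</a>' % (item['title_url'], item['title'])
                    for item in items)

def format_keywords():
    return ','.join('<a href="/u/{title_url}">{title}</a>'.format(**item)
                    for item in items)

def format_positional():
    return ','.join('<a href="/u/{0}">{1}</a>'.format(item['title_url'], item['title'])
                    for item in items)

candidates = [percent_dict, percent_tuple, format_keywords, format_positional]

# Every variant must build the same string, or the timings are not comparable.
expected = percent_dict()
assert all(func() == expected for func in candidates)

# Time each variant and print the results, fastest first.
results = {func.__name__: timeit.timeit(func, number=20) for func in candidates}
for name, seconds in sorted(results.items(), key=lambda pair: pair[1]):
    print('%-18s %.4f s' % (name, seconds))
```

For steadier numbers, `timeit.repeat` can be used in place of `timeit.timeit`, taking the minimum of the runs.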

Chris Morgan answered Nov 24 '25 23:11


