Coder Perfect

Lambda + filter vs. list comprehension


I have a basic filtering requirement: I have a list and I need to filter it by one of the items’ attributes.

My code was as follows:

my_list = [x for x in my_list if x.attribute == value]

But then I thought to myself, “Wouldn’t it be better if I wrote it this way?”

my_list = filter(lambda x: x.attribute == value, my_list)

It’s more understandable, and the lambda may be removed to improve performance if necessary.

Is there any disadvantage to using the second method? Is there a difference in performance? Am I missing the Pythonic Way™ entirely, and should I do it some other way (such as using itemgetter instead of the lambda)?

Asked by Agos

Solution #1

It’s surprising how much the sense of beauty differs from person to person. I find the list comprehension much clearer than filter+lambda, but use whichever you find easier.

There are two things that could slow down your use of filter.

The first is function call overhead: as soon as you use a Python function (whether created by def or lambda), filter is likely to be slower than the list comprehension. It’s almost certainly not enough to matter, and you shouldn’t worry about performance until you’ve timed your code and found a bottleneck, but the difference will be there.
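If you do want to measure this on your own data, a rough sketch with the standard timeit module could look like the following. The Item type, the list size, and the run count are assumptions made up for illustration; no timings are quoted here because they depend on the interpreter and the machine.

```python
import timeit
from collections import namedtuple

# Hypothetical record type standing in for the objects in the question.
Item = namedtuple("Item", ["attribute"])

items = [Item(attribute=i % 10) for i in range(10_000)]
value = 3


def with_comprehension():
    return [x for x in items if x.attribute == value]


def with_filter():
    # list() forces evaluation on Python 3, where filter() is lazy.
    return list(filter(lambda x: x.attribute == value, items))


if __name__ == "__main__":
    for fn in (with_comprehension, with_filter):
        seconds = timeit.timeit(fn, number=200)
        print(f"{fn.__name__}: {seconds:.4f} s for 200 runs")
```

Both functions return the same list, so whatever gap the timings show comes from how the filtering is expressed.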

The other potential overhead is that the lambda is forced to access a scoped variable (value). That is slower than accessing a local variable, and in Python 2.x the list comprehension only accesses local variables. If you’re using Python 3.x, the list comprehension runs in a separate function, so it will also be accessing value through a closure, and this difference won’t apply.
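To see what “accessing a scoped variable” means in practice, here is a small sketch that assumes the filtering happens inside a function; make_predicate is a hypothetical helper, not part of the original code. The lambda does not hold value as a local: it reaches it through a closure, which Python exposes as a free variable.

```python
def make_predicate(value):
    # The lambda does not own `value`; it reads it from the enclosing scope.
    return lambda x: x == value


predicate = make_predicate(3)

# Names resolved through the closure are listed as free variables.
print(predicate.__code__.co_freevars)         # ('value',)
print(list(filter(predicate, [1, 3, 5, 3])))  # [3, 3]
```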

Another alternative is to use a generator rather than a list comprehension:

def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el

Then, in your main code (which is where readability really matters), you’ve replaced both the list comprehension and filter with a hopefully meaningful function name.
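As a sketch of how the call site might then read, here is the generator used on some made-up data. The Item type and the sample values are assumptions for illustration only.

```python
from collections import namedtuple

# Hypothetical record type; any object with an `attribute` field works.
Item = namedtuple("Item", ["name", "attribute"])


def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el


my_list = [Item("a", 1), Item("b", 2), Item("c", 1)]

# The generator is lazy, so wrap it in list() if a list is needed.
matches = list(filterbyvalue(my_list, 1))
print([item.name for item in matches])  # ['a', 'c']
```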

Answered by Duncan

Solution #2

In Python, this is something of a religious issue. Although Guido considered removing map, filter, and reduce from Python 3, there was enough of a backlash that in the end only reduce was moved from the built-ins to functools.reduce.

I find list comprehensions easier to read. The expression [i for i in list if i.attribute == value] makes it clearer what is going on, because all of the behaviour is on the surface rather than inside the filter function.

The performance difference between the two ways is minor, so I wouldn’t be too concerned about it. This is something I would only optimize if it was the bottleneck in your application, which is improbable.

Also, since the BDFL wanted filter gone from the language, that surely makes list comprehensions automatically more Pythonic ;-)

Answered by Tendayi Mawushe

Solution #3

Because any speed difference is bound to be tiny, whether to use filter or a list comprehension comes down to personal preference. In general, I prefer comprehensions (as do most of the other answers here), but there is one case in which I prefer filter.

A very frequent use case is pulling out the values of some iterable X subject to a predicate P(x):

[x for x in X if P(x)]

However, there are situations when you wish to apply a function to the values first:

[f(x) for x in X if P(f(x))]

Consider the following scenario.

primes_cubed = [x*x*x for x in range(1000) if prime(x)]

This, I believe, is significantly more appealing than employing a filter. But consider this:

prime_cubes = [x*x*x for x in range(1000) if prime(x*x*x)]

In this scenario we want to filter against the post-computed value. Aside from the cost of computing the cube twice (imagine a more expensive calculation), there’s also the issue of violating the DRY aesthetic by writing the expression twice. In this scenario, I’d probably use

prime_cubes = filter(prime, [x*x*x for x in range(1000)])
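For readers on Python 3, where filter() returns a lazy iterator, the same compute-once idea might be sketched as follows. The helper f (here x*x + 1) is a hypothetical stand-in for an expensive transformation, swapped in for the cube so the output is not empty, and prime is a simple trial-division helper written for this example rather than the one assumed in the answer.

```python
def prime(n):
    """Trial-division primality test (illustrative helper)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def f(x):
    # Hypothetical stand-in for an expensive transformation.
    return x * x + 1


# Python 3: filter() is lazy, so list() materializes the result.
via_filter = list(filter(prime, (f(x) for x in range(12))))

# Same idea as a comprehension over a generator expression.
via_nested = [y for y in (f(x) for x in range(12)) if prime(y)]

print(via_filter)  # [2, 5, 17, 37, 101]
```

In both forms f is called once per element and the predicate sees the already-computed value.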

Answered by I. J. Kennedy

Solution #4

Although filter may be the “faster way,” the “Pythonic way” is not to care about such things unless performance is absolutely critical (in which case you wouldn’t be using Python!).

Answered by Umang

Solution #5

I just wanted to point out that in Python 3, filter() returns an iterator object, so you have to pass the filter call to list() to get the filtered list. Here’s how it looks in Python 2:

lst_a = range(25)  # arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = filter(lambda num: num % 2 == 0, lst_a)

Lists b and c hold the same values and were built in about the same time, since in Python 2 filter() was equivalent to [x for x in y if z]. In Python 3, however, the identical code would leave lst_c holding a filter object rather than a filtered list. To get the same results in 3:

lst_a = range(25)  # arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = list(filter(lambda num: num % 2 == 0, lst_a))

The issue is that list() takes an iterable as an input and uses it to generate a new list. Because you have to iterate through the output from filter() as well as the original list, using filter in this fashion in Python 3 takes up to twice as long as using [x for x in y if z].
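One practical consequence of filter() returning an iterator in Python 3 is that the result can only be consumed once. A minimal sketch, using made-up variable names:

```python
evens = filter(lambda num: num % 2 == 0, range(10))

first = next(evens)    # pulls one item; nothing else is computed yet
rest = list(evens)     # consumes whatever is left
again = list(evens)    # the iterator is now exhausted

print(first, rest, again)  # 0 [2, 4, 6, 8] []
```

If the filtered values are needed more than once, store the list() result rather than the filter object.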

Answered by Jim50
