Coder Perfect

Subtraction of a list in Python


I’d like to do something like this:

>>> x = [1,2,3,4,5,6,7,8,9,0]  
>>> x  
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0]  
>>> y = [1,3,5,7,9]  
>>> y  
[1, 3, 5, 7, 9]  
>>> x - y   # (should return [2,4,6,8,0])

Python lists, however, do not support this. What’s the best way to do it?

Asked by daydreamer

Solution #1

Use a list comprehension:

[item for item in x if item not in y]
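
Applied to the lists from the question, the comprehension keeps the items of x in their original order. This is only a small sketch; the name `difference` is illustrative and not part of the original answer:

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]

# Keep every item of x that does not appear in y; the order of x is preserved.
difference = [item for item in x if item not in y]
print(difference)  # [2, 4, 6, 8, 0]
```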

If you want to use the - infix syntax, you can define a list subclass:

class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(args)

    def __sub__(self, other):
        return self.__class__(*[item for item in self if item not in other])

Then you may use it like this:

x = MyList(1, 2, 3, 4)
y = MyList(2, 5, 2)
z = x - y   # z is a MyList equal to [1, 3, 4]

If you don’t need list properties (for example, ordering), just use sets, as the other answers recommend.

Answered by aaronasterling

Solution #2

Use set difference

>>> z = list(set(x) - set(y))
>>> z
[0, 8, 2, 4, 6]

Or you might just have x and y be sets so you don’t have to do any conversions.
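
One caveat worth illustrating: converting to sets drops repeated items as well as order. A small sketch with made-up input values:

```python
x = [1, 2, 2, 3, 4, 4]
y = [3]

z = list(set(x) - set(y))
# The order of z is unspecified, so sort it before inspecting.
print(sorted(z))  # [1, 2, 4]  (the repeated 2 and 4 are gone)
```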

Answered by quantumSoup

Solution #3

If duplicates and the order of items matter:

[i for i in a if not i in b or b.remove(i)]

a = [1,2,3,3,3,3,4]
b = [1,3]
result: [2, 3, 3, 3, 4]
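
Note that the comprehension removes matched items from b as a side effect, and each `in` / `remove` call scans b. If that is a concern, one possible alternative (not from the original answer) is to count the pending removals with `collections.Counter`. The helper name `subtract_once` is hypothetical, and this sketch assumes the items are hashable:

```python
from collections import Counter

def subtract_once(a, b):
    """Remove one occurrence from a for each occurrence in b, keeping order."""
    remaining = Counter(b)  # how many copies of each value still need removing
    result = []
    for item in a:
        if remaining[item] > 0:
            remaining[item] -= 1  # consume one pending removal
        else:
            result.append(item)
    return result

a = [1, 2, 3, 3, 3, 3, 4]
b = [1, 3]
print(subtract_once(a, b))  # [2, 3, 3, 3, 4]
print(b)                    # [1, 3] (b is left unchanged)
```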

Answered by 2 revs

Solution #4

This is a “set subtraction” operation. Use the set data structure for it.

In Python 2.7:

x = {1,2,3,4,5,6,7,8,9,0}
y = {1,3,5,7,9}
print x - y

Output:

>>> print x - y
set([0, 8, 2, 4, 6])
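
For readers on Python 3, the same idea might look like the sketch below, where `print` is a function. The result is sorted only to make the printed order predictable:

```python
x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0}
y = {1, 3, 5, 7, 9}

print(sorted(x - y))  # [0, 2, 4, 6, 8]

# set.difference() accepts any iterable, so the right-hand side may stay a list.
print(sorted(x.difference([1, 3, 5, 7, 9])))  # [0, 2, 4, 6, 8]
```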

Answered by Santa

Solution #5

In many cases, the answer you want is:

ys = set(y)

[item for item in x if item not in ys]

This is a hybrid of aaronasterling’s and quantumSoup’s answers.

aaronasterling’s version does len(y) item comparisons for each element in x, so it takes quadratic time. quantumSoup’s version uses sets, so it does a single constant-time set lookup for each element in x, but because it converts both x and y into sets, it loses the order of your elements.

By converting only y into a set and iterating x in order, you get the best of both worlds: linear time and order preservation.*
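
Wrapped in a function, the hybrid might look like this. The name `subtract` is hypothetical, chosen only for this sketch:

```python
def subtract(x, y):
    """Return the items of x that are not in y, preserving the order of x."""
    ys = set(y)  # built once, in O(len(y))
    return [item for item in x if item not in ys]  # O(1) average per lookup

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]
print(subtract(x, y))             # [2, 4, 6, 8, 0]
print(subtract([1, 2, 2, 3], [3]))  # [1, 2, 2]  (duplicates in x survive)
```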

However, this still has one problem inherited from quantumSoup’s version: it requires your elements to be hashable. That is pretty much built into the nature of sets.** So if you are trying to, say, subtract a list of dicts from another list of dicts, and the list to subtract is large, what do you do?

If you can decorate your values in some way that makes them hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:

ys = {tuple(item.items()) for item in y}

[item for item in x if tuple(item.items()) not in ys]
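
One thing to watch with `tuple(item.items())`: it follows the dict's insertion order, so two equal dicts built in a different key order produce different tuples. A possible variation (an assumption of this sketch, not part of the original answer) is to use `frozenset(item.items())`, which ignores key order:

```python
x = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
y = [{"b": 2, "a": 1}]  # same content as x[0], different key order

# tuple(d.items()) follows insertion order, so these two equal dicts differ:
print(tuple(x[0].items()) == tuple(y[0].items()))  # False

# frozenset(d.items()) ignores key order and is hashable when the values are.
ys = {frozenset(item.items()) for item in y}
result = [item for item in x if frozenset(item.items()) not in ys]
print(result)  # [{'a': 3, 'b': 4}]
```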

If your types are a bit more complicated (for example, you are often dealing with JSON-compatible values, which are either hashable or are lists or dicts whose values are recursively the same type), you can still use this technique. But some types just cannot be converted into anything hashable.
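
For JSON-like data, one way to do the recursive conversion is sketched below. The helper name `freeze` and the `"dict"` / `"list"` tags are hypothetical choices made for this example:

```python
def freeze(value):
    """Recursively convert JSON-like data into a hashable equivalent."""
    if isinstance(value, dict):
        # Tag the result so a dict never collides with a list of pairs.
        return ("dict", frozenset((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, list):
        return ("list", tuple(freeze(v) for v in value))
    return value  # str, int, float, bool and None are already hashable

x = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]
y = [{"tags": ["a", "b"], "id": 1}]

ys = {freeze(item) for item in y}
result = [item for item in x if freeze(item) not in ys]
print(result)  # [{'id': 2, 'tags': []}]
```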

If your items aren’t hashable and can’t be made so, but they are comparable, you can at least get log-linear time by sorting and using bisect. That is O(N*log M), which is a lot better than the O(N*M) time of the list solution, but not as good as the O(N+M) time of the set solution:

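import bisect

ys = sorted(y)
def bisect_contains(seq, item):
    # bisect_left returns the index of the first element >= item
    index = bisect.bisect_left(seq, item)
    return index < len(seq) and seq[index] == item

# keep only the items of x that are not found in ys
[item for item in x if not bisect_contains(ys, item)]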
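
As a worked example, here is the bisect approach on items that are comparable but not hashable (lists of ints, chosen only for illustration). This sketch uses `bisect.bisect_left`, which returns the position of the first element not less than the item, so an equal element, if present, sits exactly at that index:

```python
import bisect

def bisect_contains(seq, item):
    """Return True if item is in the sorted sequence seq."""
    index = bisect.bisect_left(seq, item)  # leftmost slot where item could sit
    return index < len(seq) and seq[index] == item

# Lists are unhashable but can be ordered lexicographically.
x = [[1, 2], [3, 4], [5, 6], [3, 4]]
y = [[5, 6], [1, 2]]

ys = sorted(y)
result = [item for item in x if not bisect_contains(ys, item)]
print(result)  # [[3, 4], [3, 4]]
```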

If your items are neither hashable nor comparable, you’re stuck with the quadratic solution.

* You could also do this with a pair of OrderedSet objects, for which you can find recipes and third-party modules. But I think this approach is simpler.

** Set lookups are constant time because all they have to do is hash the value and see if that hash has an entry. This won’t work if it can’t hash the value.

Answered by abarnert
