Coder Perfect

The quickest way to see if a value in a list exists

Problem

What is the quickest way to determine whether a value exists in a list (a list with millions of elements) and its index?

As in this example, I know that all of the values in the list are unique.

The first method I try is (in my real code, it takes 3.8 seconds):

a = [4,2,3,1,5,6]

if a.count(7) == 1:
    b=a.index(7)
    "Do something with variable b"

The second way I try is (2x faster than my real code: 1.9 seconds):

a = [4,2,3,1,5,6]

try:
    b=a.index(7)
except ValueError:
    "Do nothing"
else:
    "Do something with variable b"

Methods suggested by a Stack Overflow member (2.74 seconds for my actual code):

a = [4,2,3,1,5,6]
if 7 in a:
    a.index(7)

The first method takes 3.81 seconds and the second method takes 1.88 seconds in my real code. It’s a step forward, but:

I’m new to Python and programming; is there a faster way to accomplish the same tasks and save time in the process?

A more detailed explanation of my application is as follows:

I can get a list of particles using the Blender API:

particles = [1, 2, 3, 4, etc.]

I can get the position of a particle from there:

particles[x].location = [x,y,z]

And for each particle, I check if it has a neighbor by scanning each particle’s location as follows:

if [x+1,y,z] in particles.location
    "Find the identity of this neighbour particle in x:the particle's index
    in the array"
    particles.index([x+1,y,z])

Asked by Jean-Francois Gallant

Solution #1

7 in a

Things’s the clearest and quickest method to do it.

You may alternatively use a set, but creating one from your list may take longer than the time saved by speedier membership testing. The only way to be positive is to conduct thorough benchmarking. (This, of course, is dependent on the operations you require.)

Answered by Rafe Kettler

Solution #2

For huge lists, as others have said, it can be very slow. Here are some performance comparisons for in, set, and bisect. It’s worth noting that the time (in seconds) is in log scale.

Code for testing:

import random
import bisect
import matplotlib.pyplot as plt
import math
import time


def method_in(a, b, c):
    start_time = time.time()
    for i, x in enumerate(a):
        if x in b:
            c[i] = 1
    return time.time() - start_time


def method_set_in(a, b, c):
    start_time = time.time()
    s = set(b)
    for i, x in enumerate(a):
        if x in s:
            c[i] = 1
    return time.time() - start_time


def method_bisect(a, b, c):
    start_time = time.time()
    b.sort()
    for i, x in enumerate(a):
        index = bisect.bisect_left(b, x)
        if index < len(a):
            if x == b[index]:
                c[i] = 1
    return time.time() - start_time


def profile():
    time_method_in = []
    time_method_set_in = []
    time_method_bisect = []

    # adjust range down if runtime is too long or up if there are too many zero entries in any of the time_method lists
    Nls = [x for x in range(10000, 30000, 1000)]
    for N in Nls:
        a = [x for x in range(0, N)]
        random.shuffle(a)
        b = [x for x in range(0, N)]
        random.shuffle(b)
        c = [0 for x in range(0, N)]

        time_method_in.append(method_in(a, b, c))
        time_method_set_in.append(method_set_in(a, b, c))
        time_method_bisect.append(method_bisect(a, b, c))

    plt.plot(Nls, time_method_in, marker='o', color='r', linestyle='-', label='in')
    plt.plot(Nls, time_method_set_in, marker='o', color='b', linestyle='-', label='set')
    plt.plot(Nls, time_method_bisect, marker='o', color='g', linestyle='-', label='bisect')
    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc='upper left')
    plt.yscale('log')
    plt.show()


profile()

Answered by xslittlegrass

Solution #3

You may make a set out of your things. Set lookups are extremely time-saving.

Try:

s = set(a)
if 7 in s:
  # do stuff

edit You mention in a comment that you’d like to know the element’s index. Sets, on the other hand, have no concept of element position. Pre-sorting your list and then using binary search every time you need to find an entry is another option.

Answered by NPE

Solution #4

def check_availability(element, collection: iter):
    return element in collection

Usage

check_availability('a', [1,2,3,4,'a','b','c'])

This, I believe, is the quickest way to see if a given value is in an array.

Answered by Tiago Moutinho

Solution #5

The question that started it all was:

As a result, there are two things to look for:

To do this, I changed the @xslittlegrass code to always compute indexes and added a new function.

Results

Methods are:

Method 5 is the fastest, according to the results.

Intriguingly, the try and set approaches take the same amount of time.

Test Code

import random
import bisect
import matplotlib.pyplot as plt
import math
import timeit
import itertools

def wrapper(func, *args, **kwargs):
    " Use to produced 0 argument function for call it"
    # Reference https://www.pythoncentral.io/time-a-python-function/
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

def method_in(a,b,c):
    for i,x in enumerate(a):
        if x in b:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_try(a,b,c):
    for i, x in enumerate(a):
        try:
            c[i] = b.index(x)
        except ValueError:
            c[i] = -1

def method_set_in(a,b,c):
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_bisect(a,b,c):
    " Finds indexes using bisection "

    # Create a sorted b with its index
    bsorted = sorted([(x, i) for i, x in enumerate(b)], key = lambda t: t[0])

    for i,x in enumerate(a):
        index = bisect.bisect_left(bsorted,(x, ))
        c[i] = -1
        if index < len(a):
            if x == bsorted[index][0]:
                c[i] = bsorted[index][1]  # index in the b array

    return c

def method_reverse_lookup(a, b, c):
    reverse_lookup = {x:i for i, x in enumerate(b)}
    for i, x in enumerate(a):
        c[i] = reverse_lookup.get(x, -1)
    return c

def profile():
    Nls = [x for x in range(1000,20000,1000)]
    number_iterations = 10
    methods = [method_in, method_try, method_set_in, method_bisect, method_reverse_lookup]
    time_methods = [[] for _ in range(len(methods))]

    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        for i, func in enumerate(methods):
            wrapped = wrapper(func, a, b, c)
            time_methods[i].append(math.log(timeit.timeit(wrapped, number=number_iterations)))

    markers = itertools.cycle(('o', '+', '.', '>', '2'))
    colors = itertools.cycle(('r', 'b', 'g', 'y', 'c'))
    labels = itertools.cycle(('in', 'try', 'set', 'bisect', 'reverse'))

    for i in range(len(time_methods)):
        plt.plot(Nls,time_methods[i],marker = next(markers),color=next(colors),linestyle='-',label=next(labels))

    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

profile()

Answered by DarrylG

Post is based on https://stackoverflow.com/questions/7571635/fastest-way-to-check-if-a-value-exists-in-a-list