Problem
What is the quickest way to determine whether a value exists in a list (a list with millions of elements) and its index?
As in this example, I know that all of the values in the list are unique.
The first method I try is (in my real code, it takes 3.8 seconds):
a = [4,2,3,1,5,6]
if a.count(7) == 1:
b=a.index(7)
"Do something with variable b"
The second way I try is (2x faster than my real code: 1.9 seconds):
a = [4,2,3,1,5,6]
try:
b=a.index(7)
except ValueError:
"Do nothing"
else:
"Do something with variable b"
Methods suggested by a Stack Overflow member (2.74 seconds for my actual code):
a = [4,2,3,1,5,6]
if 7 in a:
a.index(7)
The first method takes 3.81 seconds and the second method takes 1.88 seconds in my real code. It’s a step forward, but:
I’m new to Python and programming; is there a faster way to accomplish the same tasks and save time in the process?
A more detailed explanation of my application is as follows:
I can get a list of particles using the Blender API:
particles = [1, 2, 3, 4, etc.]
I can get the position of a particle from there:
particles[x].location = [x,y,z]
And for each particle, I check if it has a neighbor by scanning each particle’s location as follows:
if [x+1,y,z] in particles.location
"Find the identity of this neighbour particle in x:the particle's index
in the array"
particles.index([x+1,y,z])
Asked by Jean-Francois Gallant
Solution #1
7 in a
Things’s the clearest and quickest method to do it.
You may alternatively use a set, but creating one from your list may take longer than the time saved by speedier membership testing. The only way to be positive is to conduct thorough benchmarking. (This, of course, is dependent on the operations you require.)
Answered by Rafe Kettler
Solution #2
For huge lists, as others have said, it can be very slow. Here are some performance comparisons for in, set, and bisect. It’s worth noting that the time (in seconds) is in log scale.
Code for testing:
import random
import bisect
import matplotlib.pyplot as plt
import math
import time
def method_in(a, b, c):
start_time = time.time()
for i, x in enumerate(a):
if x in b:
c[i] = 1
return time.time() - start_time
def method_set_in(a, b, c):
start_time = time.time()
s = set(b)
for i, x in enumerate(a):
if x in s:
c[i] = 1
return time.time() - start_time
def method_bisect(a, b, c):
start_time = time.time()
b.sort()
for i, x in enumerate(a):
index = bisect.bisect_left(b, x)
if index < len(a):
if x == b[index]:
c[i] = 1
return time.time() - start_time
def profile():
time_method_in = []
time_method_set_in = []
time_method_bisect = []
# adjust range down if runtime is too long or up if there are too many zero entries in any of the time_method lists
Nls = [x for x in range(10000, 30000, 1000)]
for N in Nls:
a = [x for x in range(0, N)]
random.shuffle(a)
b = [x for x in range(0, N)]
random.shuffle(b)
c = [0 for x in range(0, N)]
time_method_in.append(method_in(a, b, c))
time_method_set_in.append(method_set_in(a, b, c))
time_method_bisect.append(method_bisect(a, b, c))
plt.plot(Nls, time_method_in, marker='o', color='r', linestyle='-', label='in')
plt.plot(Nls, time_method_set_in, marker='o', color='b', linestyle='-', label='set')
plt.plot(Nls, time_method_bisect, marker='o', color='g', linestyle='-', label='bisect')
plt.xlabel('list size', fontsize=18)
plt.ylabel('log(time)', fontsize=18)
plt.legend(loc='upper left')
plt.yscale('log')
plt.show()
profile()
Answered by xslittlegrass
Solution #3
You may make a set out of your things. Set lookups are extremely time-saving.
Try:
s = set(a)
if 7 in s:
# do stuff
edit You mention in a comment that you’d like to know the element’s index. Sets, on the other hand, have no concept of element position. Pre-sorting your list and then using binary search every time you need to find an entry is another option.
Answered by NPE
Solution #4
def check_availability(element, collection: iter):
return element in collection
Usage
check_availability('a', [1,2,3,4,'a','b','c'])
This, I believe, is the quickest way to see if a given value is in an array.
Answered by Tiago Moutinho
Solution #5
The question that started it all was:
As a result, there are two things to look for:
To do this, I changed the @xslittlegrass code to always compute indexes and added a new function.
Results
Methods are:
Method 5 is the fastest, according to the results.
Intriguingly, the try and set approaches take the same amount of time.
Test Code
import random
import bisect
import matplotlib.pyplot as plt
import math
import timeit
import itertools
def wrapper(func, *args, **kwargs):
" Use to produced 0 argument function for call it"
# Reference https://www.pythoncentral.io/time-a-python-function/
def wrapped():
return func(*args, **kwargs)
return wrapped
def method_in(a,b,c):
for i,x in enumerate(a):
if x in b:
c[i] = b.index(x)
else:
c[i] = -1
return c
def method_try(a,b,c):
for i, x in enumerate(a):
try:
c[i] = b.index(x)
except ValueError:
c[i] = -1
def method_set_in(a,b,c):
s = set(b)
for i,x in enumerate(a):
if x in s:
c[i] = b.index(x)
else:
c[i] = -1
return c
def method_bisect(a,b,c):
" Finds indexes using bisection "
# Create a sorted b with its index
bsorted = sorted([(x, i) for i, x in enumerate(b)], key = lambda t: t[0])
for i,x in enumerate(a):
index = bisect.bisect_left(bsorted,(x, ))
c[i] = -1
if index < len(a):
if x == bsorted[index][0]:
c[i] = bsorted[index][1] # index in the b array
return c
def method_reverse_lookup(a, b, c):
reverse_lookup = {x:i for i, x in enumerate(b)}
for i, x in enumerate(a):
c[i] = reverse_lookup.get(x, -1)
return c
def profile():
Nls = [x for x in range(1000,20000,1000)]
number_iterations = 10
methods = [method_in, method_try, method_set_in, method_bisect, method_reverse_lookup]
time_methods = [[] for _ in range(len(methods))]
for N in Nls:
a = [x for x in range(0,N)]
random.shuffle(a)
b = [x for x in range(0,N)]
random.shuffle(b)
c = [0 for x in range(0,N)]
for i, func in enumerate(methods):
wrapped = wrapper(func, a, b, c)
time_methods[i].append(math.log(timeit.timeit(wrapped, number=number_iterations)))
markers = itertools.cycle(('o', '+', '.', '>', '2'))
colors = itertools.cycle(('r', 'b', 'g', 'y', 'c'))
labels = itertools.cycle(('in', 'try', 'set', 'bisect', 'reverse'))
for i in range(len(time_methods)):
plt.plot(Nls,time_methods[i],marker = next(markers),color=next(colors),linestyle='-',label=next(labels))
plt.xlabel('list size', fontsize=18)
plt.ylabel('log(time)', fontsize=18)
plt.legend(loc = 'upper left')
plt.show()
profile()
Answered by DarrylG
Post is based on https://stackoverflow.com/questions/7571635/fastest-way-to-check-if-a-value-exists-in-a-list