Problem
I needed to develop a version of random.choice that was weighted (each element in the list has a different probability for being selected). I came up with the following:
def weightedChoice(choices):
"""Like random.choice, but each element can have a different chance of
being selected.
choices can be any iterable containing iterables with two items each.
Technically, they can have more than two items, the rest will just be
ignored. The first item is the thing being chosen, the second item is
its weight. The weights can be any numeric values, what matters is the
relative differences between them.
"""
space = {}
current = 0
for choice, weight in choices:
if weight > 0:
space[current] = choice
current += weight
rand = random.uniform(0, current)
for key in sorted(space.keys() + [current]):
if rand < key:
return choice
choice = space[key]
return None
This function appears to me to be unnecessarily complicated and unattractive. I’m hoping that everyone here can make some thoughts for how to improve it or alternative approaches. Cleanliness and readability of code are more essential to me than efficiency.
Asked by Colin
Solution #1
Since version 1.7.0, NumPy has a choice function that supports probability distributions.
from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
p=probability_distribution)
It’s worth noting that probability distribution is a series that follows list of candidates. You can also adjust the behavior by using the replace=False keyword to prevent drawn elements from being replaced.
Answered by Ronan Paixão
Solution #2
Since Python 3.6, the random module has a function called choices.
In [1]: import random
In [2]: random.choices(
...: population=[['a','b'], ['b','a'], ['c','b']],
...: weights=[0.2, 0.2, 0.6],
...: k=10
...: )
Out[2]:
[['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b']]
According to the documentation, random.choices will sample with replacement:
For the sake of completeness, here’s a note:
If you need to sample without replacement, you can use numpy.choice, which has a replace option that regulates this behavior, as @ronan-great paixo’s answer shows.
Answered by vishes_shell
Solution #3
def weighted_choice(choices):
total = sum(w for c, w in choices)
r = random.uniform(0, total)
upto = 0
for c, w in choices:
if upto + w >= r:
return c
upto += w
assert False, "Shouldn't get here"
Answered by Ned Batchelder
Solution #4
from random import random
from bisect import bisect
def weighted_choice(choices):
values, weights = zip(*choices)
total = 0
cum_weights = []
for w in weights:
total += w
cum_weights.append(total)
x = random() * total
i = bisect(cum_weights, x)
return values[i]
>>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
'WHITE'
Split this into two functions, one to construct the cumulative weights and the other to bisect to a random point, if you need to make more than one option.
Answered by Raymond Hettinger
Solution #5
You can use numpy.random.choice if you don’t mind using numpy.
For example:
import numpy
items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]
elems = [i[0] for i in items]
probs = [i[1] for i in items]
trials = 1000
results = [0] * len(items)
for i in range(trials):
res = numpy.random.choice(items, p=probs) #This is where the item is selected!
results[items.index(res)] += 1
results = [r / float(trials) for r in results]
print "item\texpected\tactual"
for i in range(len(probs)):
print "%s\t%0.4f\t%0.4f" % (items[i], probs[i], results[i])
You can do it without a loop if you know how many choices you need to make ahead of time:
numpy.random.choice(items, trials, p=probs)
Answered by pweitzman
Post is based on https://stackoverflow.com/questions/3679694/a-weighted-version-of-random-choice