Coder Perfect

Random.choice with a weighted version

Problem

I needed to develop a version of random.choice that was weighted (each element in the list has a different probability for being selected). I came up with the following:

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    for key in sorted(space.keys() + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

This function appears to me to be unnecessarily complicated and unattractive. I’m hoping that everyone here can make some thoughts for how to improve it or alternative approaches. Cleanliness and readability of code are more essential to me than efficiency.

Asked by Colin

Solution #1

NumPy features a choose function that supports probability distributions since version 1.7.0.

from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)

It’s worth noting that probability distribution is a series that follows list of candidates. You can also adjust the behavior by using the replace=False keyword to prevent drawn elements from being replaced.

Answered by Ronan Paixão

Solution #2

Since Python 3.6, the random module has a function called choices.

In [1]: import random

In [2]: random.choices(
...:     population=[['a','b'], ['b','a'], ['c','b']],
...:     weights=[0.2, 0.2, 0.6],
...:     k=10
...: )

Out[2]:
[['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b']]

According to the documentation, random.choices will sample with replacement:

For the sake of completeness, here’s a note:

If you need to sample without replacement, you can use numpy.choice, which has a replace option that regulates this behavior, as @ronan-great paixo’s answer shows.

Answered by vishes_shell

Solution #3

def weighted_choice(choices):
   total = sum(w for c, w in choices)
   r = random.uniform(0, total)
   upto = 0
   for c, w in choices:
      if upto + w >= r:
         return c
      upto += w
   assert False, "Shouldn't get here"

Answered by Ned Batchelder

Solution #4

from random import random
from bisect import bisect

def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]

>>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
'WHITE'

Split this into two functions, one to construct the cumulative weights and the other to bisect to a random point, if you need to make more than one option.

Answered by Raymond Hettinger

Solution #5

You can use numpy.random.choice if you don’t mind using numpy.

For example:

import numpy

items  = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]
elems = [i[0] for i in items]
probs = [i[1] for i in items]

trials = 1000
results = [0] * len(items)
for i in range(trials):
    res = numpy.random.choice(items, p=probs)  #This is where the item is selected!
    results[items.index(res)] += 1
results = [r / float(trials) for r in results]
print "item\texpected\tactual"
for i in range(len(probs)):
    print "%s\t%0.4f\t%0.4f" % (items[i], probs[i], results[i])

You can do it without a loop if you know how many choices you need to make ahead of time:

numpy.random.choice(items, trials, p=probs)

Answered by pweitzman

Post is based on https://stackoverflow.com/questions/3679694/a-weighted-version-of-random-choice