Problem
I’m looking for a simple algorithm to pick five random entries from a generic list. I’d like to get 5 random elements from a Liststring>, for example.
Asked by JC Grubbs
Solution #1
Using linq:
YourList.OrderBy(x => rnd.Next()).Take(5)
Answered by Ers
Solution #2
Iterate through the list, changing the chance of selection for each element to (number required)/ (number left)
So, if there are 40 items, the first has a 5/40 chance of getting chosen. If it is, the next has a 4/39 chance, and if it is not, the following has a 5/39 chance. You’ll have your five items by the time you reach to the end, and you’ll probably have them all before then.
This method is known as selection sampling, and it is a subset of reservoir sampling. It performs similarly to shuffled input, except it allows the sample to be generated without changing the original data.
Answered by Kyle Cronin
Solution #3
public static List<T> GetRandomElements<T>(this IEnumerable<T> list, int elementsCount)
{
return list.OrderBy(arg => Guid.NewGuid()).Take(elementsCount).ToList();
}
Answered by vag
Solution #4
This is a more difficult problem than it appears, mostly because many mathematically valid solutions will not allow you to hit all of the options (more on this below).
First, here are three simple-to-implement generators that are right if you have a truly random number:
Kyle’s solution, which is O, is a zero (n).
(1) Create a list of n pairs [(0, rand), (1, rand), (2, rand),…], sort them by the second coordinate, and retrieve your random subset using the first k (for you, k=5) indices. Although it takes O(n log n) time, I believe it is simple to implement.
(2) Create an empty list s = [] that will become the indices for k random elements as it grows. Choose a number r at random from 0 to 1, 2,…, n-1, r = rand percent n, then add it to s. Next, put r = rand percent (n-1) in s; to avoid collisions, add to r the number of items less than it in s. Then, using r = rand percent (n-2), repeat the process until you have k distinct elements in s. The worst-case execution time is O(k2). As a result, for k n, this may be faster. You can implement it in O(k log k) if you keep s sorted and track which contiguous intervals it contains, but it takes more work.
@Kyle – Thanks for your comment. You are correct; nevertheless, on second thinking, I agree with your response. I read it quickly at first and mistookly assumed you meant to choose each element sequentially with fixed probability k/n, which would have been incorrect – but your adaptive approach appears to be accurate. Sorry for the inconvenience.
Now for the kicker: there are nk/k! choices of k element subset out of n elements asymptotically (for fixed k, n increasing) [this is an estimate of (n select k)]. These figures are enormous if n is large and k is not very small. In any typical 32 bit random number generator, the best cycle length you can expect for is 232 = 2564. So, if we have a list of 1000 elements and want to pick 5 at random, a conventional random number generator isn’t going to hit all of them. These techniques should be OK as long as you’re fine with an option that works well for smaller collections and always “looks” random.
Addendum: After writing this, I discovered how difficult it is to effectively execute notion (2), therefore I wanted to explain this response. You’ll need an array-like structure that supports O(log m) searches and inserts to get O(k log k) time, which a balanced binary tree can provide. Here’s some pseudopython using such a structure to create an array called s:
# Returns a container s with k distinct random numbers from {0, 1, ..., n-1}
def ChooseRandomSubset(n, k):
for i in range(k):
r = UniformRandom(0, n-i) # May be 0, must be < n-i
q = s.FirstIndexSuchThat( s[q] - q > r ) # This is the search.
s.InsertInOrder(q ? r + q : r + len(s)) # Inserts right before q.
return s
I recommend going over a few sample scenarios to check how well this translates the above English description.
Answered by Tyler
Solution #5
I believe the chosen response is correct and enjoyable. But I did it a little differently because I also wanted the results in random order.
static IEnumerable<SomeType> PickSomeInRandomOrder<SomeType>(
IEnumerable<SomeType> someTypes,
int maxCount)
{
Random random = new Random(DateTime.Now.Millisecond);
Dictionary<double, SomeType> randomSortTable = new Dictionary<double,SomeType>();
foreach(SomeType someType in someTypes)
randomSortTable[random.NextDouble()] = someType;
return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value);
}
Answered by Frank Schwieterman
Post is based on https://stackoverflow.com/questions/48087/select-n-random-elements-from-a-listt-in-c-sharp