Coder Perfect

Parallel.ForEach vs Task.Factory.StartNew

Problem

What’s the difference between the code examples below? Isn’t it true that they’ll both be using threadpool threads?

If I want to invoke a function for each item in a collection, for example,

Parallel.ForEach<Item>(items, item => DoSomething(item));

vs

foreach(var item in items)
{
  Task.Factory.StartNew(() => DoSomething(item));
}

Asked by stackoverflowuser

Solution #1

The first choice is far superior.

Internally, Parallel.ForEach uses a Partitioner<T> to divide your collection into work items. It will not create one task per item, but rather batch items together to reduce overhead.

The second option schedules a separate Task for each item in your collection. The results will be (nearly) the same, but this adds far more overhead than necessary, especially for large collections, and makes the overall runtime slower.

FYI: the Partitioner that is used can be controlled, if desired, through the appropriate overloads of Parallel.ForEach. See Custom Partitioners on MSDN for further information.
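As a rough illustration of those overloads, the sketch below passes a range partitioner to Parallel.ForEach so that the delegate runs once per chunk of indices rather than once per item. The Square helper, the class name, and the chunk size of 100 are assumptions made for the example; they do not come from the original answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangePartitionSketch
{
    // Hypothetical stand-in for DoSomething(item): squares a number.
    private static long Square(int value) => (long)value * value;

    // Sums the squares of 0..count-1, handing each worker a contiguous chunk.
    // Partitioner.Create requires count > 0 and rangeSize > 0.
    public static long SumOfSquares(int count, int rangeSize)
    {
        long total = 0;

        // Yields Tuple<int, int> ranges [Item1, Item2), each at most rangeSize long.
        var ranges = Partitioner.Create(0, count, rangeSize);

        Parallel.ForEach(ranges, range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                local += Square(i);
            }
            // One synchronized update per chunk instead of one per item.
            Interlocked.Add(ref total, local);
        });

        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(1000, 100));
    }
}
```

A larger chunk size means fewer delegate invocations and fewer synchronized updates, at the cost of coarser load balancing when items take uneven amounts of time.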

At runtime, the main difference is that the second version acts asynchronously: the loop returns before the work is finished. You can get the same behavior from Parallel.ForEach by wrapping it in a task:

Task.Factory.StartNew( () => Parallel.ForEach<Item>(items, item => DoSomething(item)));

This way you still take advantage of the partitioner, but the calling thread does not block while the operation runs.
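A minimal sketch of how the caller might use that wrapped loop: keep the returned Task, carry on with other work, and wait on it only when the results are needed. The method and class names here are hypothetical, and counting processed items stands in for DoSomething.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundLoopSketch
{
    // Starts the parallel loop on a thread-pool thread and returns immediately.
    // The returned Task completes once every item has been processed.
    public static Task ProcessInBackground<T>(IEnumerable<T> items, Action<T> doSomething)
    {
        return Task.Factory.StartNew(() => Parallel.ForEach(items, doSomething));
    }

    // Runs the loop in the background, then waits and reports how many items ran.
    public static int RunAndCount(int itemCount)
    {
        List<int> items = Enumerable.Range(1, itemCount).ToList();
        int processed = 0;

        Task loop = ProcessInBackground(items, item => Interlocked.Increment(ref processed));

        // The caller is free to do unrelated work here while the loop runs.

        loop.Wait(); // Block only at the point where completion is required.
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(RunAndCount(100));
    }
}
```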

Answered by Reed Copsey

Solution #2

I ran a small test that used Parallel.For and Task objects to run a method 1,000,000,000 (one billion) times.

Looking at processor time, Parallel.For was more efficient. Parallel.For divides the work into small chunks and runs them in parallel across all cores. Creating a large number of Task objects instead (FYI, TPL uses the thread pool internally) schedules every single execution as its own task, which puts extra stress on the machine, as the experiments below illustrate.

I also made a short video that explains the basics of TPL and shows how Parallel.For uses your cores more efficiently than ordinary tasks and threads: http://www.youtube.com/watch?v=No7QqSc5cl8

Experiment 1

Parallel.For(0, 1000000000, x => Method1());

Experiment 2

for (int i = 0; i < 1000000000; i++)
{
    Task o = new Task(Method1);
    o.Start();
}

Answered by Shivprasad Koirala

Solution #3

Parallel.ForEach will optimize the work (and may not even start additional threads) and block until the loop is finished, whereas Task.Factory.StartNew will explicitly create a new task instance for each item and return before they are finished (asynchronous tasks). Parallel.ForEach is significantly more efficient.
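The blocking difference can be sketched as follows. Both methods add up the same list; the only assumption beyond the answer is the summing workload itself, chosen so the result is easy to check. With Parallel.ForEach the total is complete as soon as the call returns, while the StartNew version has to wait on its tasks explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingVsAsyncSketch
{
    public static int SumWithParallelForEach(IList<int> items)
    {
        int sum = 0;
        Parallel.ForEach(items, item => Interlocked.Add(ref sum, item));
        // Parallel.ForEach has already blocked until every item was handled.
        return sum;
    }

    public static int SumWithStartNew(IList<int> items)
    {
        int sum = 0;
        var tasks = new List<Task>();
        foreach (int item in items)
        {
            // One Task object is created and scheduled per item.
            tasks.Add(Task.Factory.StartNew(() => Interlocked.Add(ref sum, item)));
        }
        // The loop above returns while work may still be running,
        // so the total is only reliable after waiting for every task.
        Task.WaitAll(tasks.ToArray());
        return sum;
    }

    public static void Main()
    {
        List<int> items = Enumerable.Range(1, 100).ToList();
        Console.WriteLine(SumWithParallelForEach(items));
        Console.WriteLine(SumWithStartNew(items));
    }
}
```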

Answered by Sogger

Solution #4

A more realistic scenario is one where each task has a heavy operation to complete. Shivprasad’s approach measures object creation and memory allocation more than computation itself. I ran a test that calls the following method:

public static double SumRootN(int root)
{
    double result = 0;
    for (int i = 1; i < 10000000; i++)
    {
        result += Math.Exp(Math.Log(i) / root);
    }
    return result;
}

This method takes roughly 0.5 seconds to execute.

I called it 200 times using Parallel.For:

Parallel.For(0, 200, (int i) =>
{
    SumRootN(10);
});

Then I called it 200 times the old-fashioned way, with explicit Task objects:

int loopCounter = 200; // matches the 200 calls used in the Parallel.For run
List<Task> tasks = new List<Task>();
for (int i = 0; i < loopCounter; i++)
{
    Task t = new Task(() => SumRootN(10));
    t.Start();
    tasks.Add(t);
}

Task.WaitAll(tasks.ToArray()); 

The first case took 26656 milliseconds to finish, whereas the second took 24478 milliseconds. I repeated the test several times. Every time, the second approach was marginally faster.
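The answer does not show how the timings were taken. One way such a comparison might be set up is sketched below, assuming System.Diagnostics.Stopwatch for wall-clock timing. The Measure helper, the class name, and the warm-up run are assumptions added for illustration, and the numbers it prints will vary by machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TimingSketch
{
    // Same CPU-bound method as in the answer above.
    public static double SumRootN(int root)
    {
        double result = 0;
        for (int i = 1; i < 10000000; i++)
        {
            result += Math.Exp(Math.Log(i) / root);
        }
        return result;
    }

    public static void RunWithParallelFor(int calls)
    {
        Parallel.For(0, calls, i => { SumRootN(10); });
    }

    public static void RunWithTasks(int calls)
    {
        var tasks = new List<Task>();
        for (int i = 0; i < calls; i++)
        {
            Task t = new Task(() => SumRootN(10));
            t.Start();
            tasks.Add(t);
        }
        Task.WaitAll(tasks.ToArray());
    }

    // Hypothetical helper: wall-clock time for one run of the given action.
    public static TimeSpan Measure(Action work)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        const int calls = 200;

        // Warm-up so JIT compilation and thread-pool start-up
        // are not charged to the first measured run.
        RunWithParallelFor(Environment.ProcessorCount);

        Console.WriteLine("Parallel.For: {0} ms",
            Measure(() => RunWithParallelFor(calls)).TotalMilliseconds);
        Console.WriteLine("Tasks:        {0} ms",
            Measure(() => RunWithTasks(calls)).TotalMilliseconds);
    }
}
```

With only 200 long-running calls, per-task overhead is small relative to the computation, which is consistent with the two approaches finishing within a few seconds of each other.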

Answered by user1089583

Post is based on https://stackoverflow.com/questions/5009181/parallel-foreach-vs-task-factory-startnew