Coder Perfect

Multiprocessing vs Threading in Python [duplicate]


I’m trying to understand the advantages of multiprocessing over threading. I know that multiprocessing gets around the Global Interpreter Lock, but what other advantages does it have, and can threading not do the same thing?

Asked by John

Solution #1

Here are some of the advantages and disadvantages I came up with.

Answered by Jeremy Brown

Solution #2

The threading module uses threads, while the multiprocessing module uses processes. The difference is that threads run in the same memory space, whereas processes have separate memory. This makes it a little harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.
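As a rough illustration of the memory difference, the sketch below increments a module-level counter once in a thread and once in a child process. The names `counter`, `bump`, `bump_in_thread` and `bump_in_process` are made up for this example. The thread's increment should be visible to the caller, while the child process only changes its own copy.

```python
import multiprocessing
import threading

counter = 0  # module-level state, used to show who can see a change


def bump():
    """Increment the module-level counter in whichever worker runs this."""
    global counter
    counter += 1


def bump_in_thread():
    """Run bump() in a thread; the change is visible to the caller."""
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter


def bump_in_process():
    """Run bump() in a child process; the parent's counter is untouched."""
    worker = multiprocessing.Process(target=bump)
    worker.start()
    worker.join()
    return counter


if __name__ == "__main__":
    print("after thread: ", bump_in_thread())   # the increment is visible here
    print("after process:", bump_in_process())  # unchanged in the parent
```

To get a value back from a process you would need an explicit channel such as a `multiprocessing.Queue` or `multiprocessing.Value`, which is the extra effort the answer refers to.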

It takes a little longer to spawn processes than it does to spawn threads.
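One way to check this on your own machine is to time how long it takes to start and join a batch of do-nothing workers. This is only a sketch: `noop` and `time_workers` are hypothetical helpers, and the absolute numbers depend heavily on the operating system and the process start method.

```python
import multiprocessing
import threading
import time


def noop():
    """Do nothing; only the start/join overhead is measured."""


def time_workers(worker_class, count=20):
    """Return the seconds taken to start and join `count` no-op workers."""
    start = time.perf_counter()
    workers = [worker_class(target=noop) for _ in range(count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("threads:  ", time_workers(threading.Thread))
    print("processes:", time_workers(multiprocessing.Process))
```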

Answered by Sjoerd

Solution #3

The purpose of threading is to make applications more responsive. Consider the following scenario: you have a database connection and you need to respond to user input. Without threading, if the database connection is busy the application will not be able to respond to the user. By splitting off the database connection into a separate thread you can make the application more responsive. Also because both threads are in the same process, they can access the same data structures – good performance, plus a flexible software design.

Note that the program isn’t actually executing two things at once because of the GIL; instead, the blocking wait on the database has been moved to a separate thread so that CPU time can be switched between it and the user interaction. CPU time is rationed out between the threads.
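A minimal sketch of that scenario, assuming the database call can be simulated with `time.sleep` (the names `slow_query` and `run_responsive` are invented for illustration). The main thread keeps looping, standing in for handling user input, while the worker thread waits on the simulated query.

```python
import threading
import time


def slow_query(results, delay=0.2):
    """Hypothetical stand-in for a blocking database call (it just sleeps)."""
    time.sleep(delay)          # a blocking wait releases the GIL
    results.append("rows")


def run_responsive(delay=0.2, tick=0.02):
    """Start the query in a thread and keep 'handling input' meanwhile."""
    results = []               # shared with the worker: same process, same memory
    worker = threading.Thread(target=slow_query, args=(results, delay))
    worker.start()
    ticks = 0
    while worker.is_alive():   # the main thread stays free to respond
        ticks += 1             # stand-in for processing one user event
        worker.join(timeout=tick)
    return ticks, results


if __name__ == "__main__":
    ticks, results = run_responsive()
    print(f"handled {ticks} events while waiting; query returned {results}")
```

The worker hands its result back by appending to a plain list, which works only because both threads live in the same process.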

Multiprocessing is useful when you need to get more than one thing done at the same time. Consider the following scenario: your program must connect to six databases and perform a complex matrix transformation on each dataset. Putting each job in a separate thread might help a little because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel because the GIL means that you’re only ever using the resources of one CPU. If each job is placed in a multiprocessing process, it can run on its own CPU at full efficiency.
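The six-dataset scenario could be sketched with `multiprocessing.Pool` as follows. The `transform` function here is a made-up CPU-bound placeholder (a sum of squares) rather than a real matrix transformation, and `transform_all` is a hypothetical helper.

```python
import multiprocessing


def transform(dataset):
    """Hypothetical CPU-bound job: a sum of squares stands in for matrix work."""
    return sum(x * x for x in dataset)


def transform_all(datasets, workers=None):
    """Run transform() on each dataset in a pool of worker processes."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(transform, datasets)


if __name__ == "__main__":
    # Six fake "databases", mirroring the scenario in the text.
    datasets = [range(n * 100_000) for n in range(1, 7)]
    print(transform_all(datasets))
```

With `workers=None` the pool uses one worker per CPU reported by the system, so on a machine with at least six cores each dataset can be processed on its own core.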

Answered by Simon Hibbs

Solution #4

Python documentation quotes

The canonical version of this answer is now at the duplicate question: What are the differences between the threading and multiprocessing modules?

I have highlighted the key Python documentation quotes about processes vs. threads and the GIL at: What is the global interpreter lock (GIL) in CPython?

Experiments with process vs. thread

I ran some benchmarks to demonstrate the difference more clearly.

The benchmark timed CPU-bound and IO-bound work for various numbers of threads on an 8-hyperthread CPU. The amount of work given to each thread is always the same, so more threads means more total work.

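The results were:

[Figure: execution time against number of threads for CPU-bound and IO-bound work, using threads and processes. Plot and plot data not reproduced here.]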


Test code:

#!/usr/bin/env python3

import multiprocessing
import threading
import time
import sys

def cpu_func(result, niters):
    """
    A useless CPU bound function.
    """
    for i in range(niters):
        result = (result * result * i + 2 * result * i * i + 3) % 10000000
    return result

class CpuThread(threading.Thread):
    def __init__(self, niters):
        # The base class must be initialised before start() can be called.
        super().__init__()
        self.niters = niters
        self.result = 1
    def run(self):
        self.result = cpu_func(self.result, self.niters)

class CpuProcess(multiprocessing.Process):
    def __init__(self, niters):
        super().__init__()
        self.niters = niters
        self.result = 1
    def run(self):
        self.result = cpu_func(self.result, self.niters)

class IoThread(threading.Thread):
    def __init__(self, sleep):
        super().__init__()
        self.sleep = sleep
        self.result = self.sleep
    def run(self):
        # Sleeping stands in for IO-bound work.
        time.sleep(self.sleep)

class IoProcess(multiprocessing.Process):
    def __init__(self, sleep):
        super().__init__()
        self.sleep = sleep
        self.result = self.sleep
    def run(self):
        time.sleep(self.sleep)

if __name__ == '__main__':
    # Usage: pass the number of CPU iterations per worker as the first argument.
    cpu_n_iters = int(sys.argv[1])
    sleep = 1
    cpu_count = multiprocessing.cpu_count()
    input_params = [
        (CpuThread, cpu_n_iters),
        (CpuProcess, cpu_n_iters),
        (IoThread, sleep),
        (IoProcess, sleep),
    ]
    # One output column per worker class.
    header = ['nthreads']
    for thread_class, _ in input_params:
        header.append(thread_class.__name__)
    print(' '.join(header))
    for nthreads in range(1, 2 * cpu_count):
        results = [nthreads]
        for thread_class, work_size in input_params:
            start_time = time.time()
            threads = []
            # Start all workers, then wait for every one of them to finish.
            for i in range(nthreads):
                thread = thread_class(work_size)
                threads.append(thread)
                thread.start()
            for i, thread in enumerate(threads):
                thread.join()
            results.append(time.time() - start_time)
        print(' '.join('{:.6e}'.format(result) for result in results))

Upstream on GitHub + charting code in the same directory

Tested on Ubuntu 18.10 with Python 3.6.7, on a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ (4 cores / 8 threads), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB), SSD: Samsung MZVLB512HAJQ-000L7 (3,000 MB/s).

Visualizing which threads are running at a given time

According to this post, you can run a callback whenever a thread is scheduled by using the target= argument of threading.Thread, and the same applies to multiprocessing.Process.

This lets us see exactly which thread is running at each point in time. Once that is done, we would see something like this (I made this particular graph up):

            + Active threads / processes           +
|Thread   1 |********     ************             |
|         2 |        *****            *************|
|Process  1 |***  ************** ******  ****      |
|         2 |** **** ****** ** ********* **********|
            + Time -->                             +

which would show that threads are fully serialized by the GIL, whereas processes can run in parallel.

Answered by Ciro Santilli 新疆再教育营六四事件法轮功郝海东

Solution #5

Isolation is the main benefit. A crashing process will not bring down other processes, whereas a crashing thread will probably wreak havoc on the other threads.
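A small sketch of that isolation, with a simulated crash: the child process exits abruptly through `os._exit(1)`, and the parent carries on and reads the exit code. The function names `crash` and `survive_child_crash` are made up for this example.

```python
import multiprocessing
import os


def crash():
    """Simulate a hard crash: exit immediately with a non-zero status."""
    os._exit(1)


def survive_child_crash():
    """Start a child that crashes and report its exit code from the parent."""
    child = multiprocessing.Process(target=crash)
    child.start()
    child.join()
    return child.exitcode      # the parent is still running to read this


if __name__ == "__main__":
    print("child exit code:", survive_child_crash())
    print("parent still alive")
```

Running `crash` in a thread instead would end the whole program, because `os._exit` terminates the process that the thread belongs to.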

Answered by Marcelo Cantos
