Problem
I'm trying to understand the advantages of multiprocessing over threading. I know that multiprocessing gets around the Global Interpreter Lock, but what other advantages does it offer, and can threading not do the same thing?
Asked by John
Solution #1
Here are some of the advantages and disadvantages I came up with.
Answered by Jeremy Brown
Solution #2
The threading module uses threads, while the multiprocessing module uses processes. The difference is that threads run in the same memory space, whereas processes have separate memory. This makes it a little harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.
It takes a little longer to spawn processes than it does to spawn threads.
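The memory difference described above can be sketched as follows. This is a minimal illustration, not code from the answer: the names `shared`, `append_marker`, `run_in_thread`, `run_in_process`, `square_to_queue` and `run_with_queue` are made up for the example. A thread's write to a module-level list is visible to the parent, a child process's write is not, and a `multiprocessing.Queue` is one way to send a result back.

```python
import multiprocessing
import threading

# Module-level list; every process gets its own copy of it.
shared = []


def append_marker(marker):
    """Append a marker to the module-level list of the current process."""
    shared.append(marker)


def run_in_thread():
    """Run append_marker in a thread and return the parent's view of the list."""
    shared.clear()
    worker = threading.Thread(target=append_marker, args=("thread",))
    worker.start()
    worker.join()
    return list(shared)


def run_in_process():
    """Run append_marker in a child process and return the parent's view."""
    shared.clear()
    worker = multiprocessing.Process(target=append_marker, args=("process",))
    worker.start()
    worker.join()
    return list(shared)


def square_to_queue(queue, value):
    """Send a result to the parent explicitly, since memory is not shared."""
    queue.put(value * value)


def run_with_queue(value):
    """Compute value squared in a child process and fetch it via a queue."""
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=square_to_queue, args=(queue, value))
    worker.start()
    result = queue.get()  # read before join so the child can flush the queue
    worker.join()
    return result


if __name__ == "__main__":
    print(run_in_thread())    # the thread's write is visible: ['thread']
    print(run_in_process())   # the child's write stays in the child: []
    print(run_with_queue(7))  # 49
```

The `if __name__ == "__main__":` guard matters on platforms where child processes are started by re-importing the main module.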
Answered by Sjoerd
Solution #3
The purpose of threading is to make applications more responsive. Consider the following scenario: you have a database connection and you need to respond to user input. Without threading, if the database connection is busy the application will not be able to respond to the user. By splitting off the database connection into a separate thread you can make the application more responsive. Also because both threads are in the same process, they can access the same data structures – good performance, plus a flexible software design.
Note, however, that the program isn't actually doing two things at once because of the GIL; we have simply moved the wait on the database into a separate thread so that CPU time can be switched between it and the user interaction. CPU time is rationed out between the threads.
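A minimal sketch of the responsiveness scenario, under stated assumptions: `slow_query` is a hypothetical stand-in for the blocking database call (it only sleeps), and the loop that counts iterations stands in for reacting to user input.

```python
import threading
import time


def slow_query(results, delay):
    """Hypothetical stand-in for a blocking database call; it only sleeps."""
    time.sleep(delay)
    results.append("rows")


def handle_input_while_waiting(delay=0.2):
    """Start the slow call in a thread and keep 'handling input' meanwhile."""
    results = []  # shared with the worker: both threads live in one process
    handled = 0
    worker = threading.Thread(target=slow_query, args=(results, delay))
    worker.start()
    while worker.is_alive():
        handled += 1      # stand-in for reacting to one user event
        time.sleep(0.01)  # time.sleep releases the GIL while waiting
    worker.join()
    return handled, results


if __name__ == "__main__":
    handled, results = handle_input_while_waiting()
    print("events handled while waiting:", handled)
    print("query result:", results)
```

The main thread keeps looping while the worker waits, and the result arrives through an ordinary list because both threads share the same memory.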
Multiprocessing is useful when you really do need more than one thing done at the same time. Consider the following scenario: your program must connect to six databases and perform a complex matrix transformation on each dataset. Putting each job in a separate thread might help a little, because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel because the GIL means that you're only ever using the resources of one CPU. If each job is put in a multiprocessing process, each can run on its own CPU at full efficiency.
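The six-dataset scenario could be sketched with `multiprocessing.Pool` as below. This is an illustration only: `transform` is a hypothetical CPU-bound stand-in for the matrix transformation, and the datasets are plain ranges rather than database results.

```python
import multiprocessing


def transform(dataset):
    """Hypothetical CPU-bound stand-in for the matrix transformation."""
    total = 0
    for value in dataset:
        total = (total * 31 + value * value) % 1_000_003
    return total


def transform_all(datasets, workers=None):
    """Run transform over each dataset in a pool of worker processes.

    workers=None lets the pool pick a size based on the CPU count.
    """
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(transform, datasets)


if __name__ == "__main__":
    datasets = [range(i, i + 200_000) for i in range(6)]
    print(transform_all(datasets))
```

`pool.map` returns results in the same order as the inputs, so the output lines up with the list of datasets.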
Answered by Simon Hibbs
Solution #4
Python documentation quotes
The canonical version of this answer is now at the duplicate question: What are the differences between the threading and multiprocessing modules?
I've highlighted the key Python documentation quotes about processes vs. threads and the GIL at: What is the global interpreter lock (GIL) in CPython?
Experiments with process vs. thread
I ran some benchmarks to demonstrate the difference more clearly.
The benchmark timed CPU-bound and IO-bound work for various numbers of threads on an 8-hyperthread CPU. The work given to each thread is always the same, so more threads means more total work.
The results were:
Plot data.
Conclusions:
Test code:
#!/usr/bin/env python3

import multiprocessing
import threading
import time
import sys

def cpu_func(result, niters):
    '''
    A useless CPU bound function.
    '''
    for i in range(niters):
        result = (result * result * i + 2 * result * i * i + 3) % 10000000
    return result

class CpuThread(threading.Thread):
    def __init__(self, niters):
        super().__init__()
        self.niters = niters
        self.result = 1
    def run(self):
        self.result = cpu_func(self.result, self.niters)

class CpuProcess(multiprocessing.Process):
    def __init__(self, niters):
        super().__init__()
        self.niters = niters
        self.result = 1
    def run(self):
        self.result = cpu_func(self.result, self.niters)

class IoThread(threading.Thread):
    def __init__(self, sleep):
        super().__init__()
        self.sleep = sleep
        self.result = self.sleep
    def run(self):
        time.sleep(self.sleep)

class IoProcess(multiprocessing.Process):
    def __init__(self, sleep):
        super().__init__()
        self.sleep = sleep
        self.result = self.sleep
    def run(self):
        time.sleep(self.sleep)

if __name__ == '__main__':
    cpu_n_iters = int(sys.argv[1])
    sleep = 1
    cpu_count = multiprocessing.cpu_count()
    input_params = [
        (CpuThread, cpu_n_iters),
        (CpuProcess, cpu_n_iters),
        (IoThread, sleep),
        (IoProcess, sleep),
    ]
    header = ['nthreads']
    for thread_class, _ in input_params:
        header.append(thread_class.__name__)
    print(' '.join(header))
    for nthreads in range(1, 2 * cpu_count):
        results = [nthreads]
        for thread_class, work_size in input_params:
            start_time = time.time()
            threads = []
            for i in range(nthreads):
                thread = thread_class(work_size)
                threads.append(thread)
                thread.start()
            for i, thread in enumerate(threads):
                thread.join()
            results.append(time.time() - start_time)
        print(' '.join('{:.6e}'.format(result) for result in results))
Upstream on GitHub + charting code in the same directory
Tested on Ubuntu 18.10 with Python 3.6.7, on a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ (4 cores / 8 threads), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB), SSD: Samsung MZVLB512HAJQ-000L7 (3,000 MB/s).
Visualize which threads are running at a given time
According to this post https://rohanvarma.me/GIL/, the callable passed as the target= argument of threading.Thread runs when the thread is scheduled, and the same applies to multiprocessing.Process.
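A minimal sketch of passing a callable via target= to both classes, under stated assumptions: `record` and `run_workers` are hypothetical names, and the sketch logs only one timestamp per worker (plus the process ID it ran in), not the scheduler's individual time slices.

```python
import multiprocessing
import os
import threading
import time


def record(label, log):
    """Callable passed as target=; notes which worker ran, where and when."""
    log.put((label, os.getpid(), time.time()))


def run_workers(worker_class, count=2):
    """Start `count` workers of the given class, each running record()."""
    log = multiprocessing.Queue()  # usable from both threads and processes
    workers = [
        worker_class(
            target=record,
            args=("{} {}".format(worker_class.__name__, i), log),
        )
        for i in range(count)
    ]
    for worker in workers:
        worker.start()
    entries = [log.get() for _ in workers]  # drain the queue before joining
    for worker in workers:
        worker.join()
    return sorted(entries, key=lambda entry: entry[2])


if __name__ == "__main__":
    for worker_class in (threading.Thread, multiprocessing.Process):
        for label, pid, stamp in run_workers(worker_class):
            print(label, pid, "{:.6f}".format(stamp))
```

Threads report the parent's process ID, while each process reports its own.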
This allows us to see which thread is active at any one time. When we’re done, we’ll see something like this (I made this graph up):
            +--------------------------------------+
            + Active threads / processes           +
+-----------+--------------------------------------+
|Thread   1 |********     ************             |
|         2 |        *****            *************|
+-----------+--------------------------------------+
|Process  1 |***  ************** ******  ****      |
|         2 |** **** ****** ** ********* **********|
+-----------+--------------------------------------+
            + Time -->                             +
            +--------------------------------------+
which would show that:
Answered by Ciro Santilli 新疆再教育营六四事件法轮功郝海东
Solution #5
Isolation is the main benefit. A crashing process will not bring down other processes, whereas a crashing thread will almost certainly cause chaos.
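A minimal sketch of that isolation, with assumptions labeled: `crash` and `run_and_report` are hypothetical names, and the crash is simulated with `os._exit`, which terminates the calling process immediately. The parent keeps running and can read the child's exit code. Calling `os._exit` from a thread instead would terminate the whole program.

```python
import multiprocessing
import os


def crash():
    """Simulate a hard crash: exit immediately, skipping all cleanup."""
    os._exit(3)


def run_and_report():
    """Run crash() in a child process and return the child's exit code."""
    child = multiprocessing.Process(target=crash)
    child.start()
    child.join()
    return child.exitcode


if __name__ == "__main__":
    print("child exit code:", run_and_report())
    print("parent still running")
```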
Answered by Marcelo Cantos
Post is based on https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python