Best Way to Learn Python in 2020 : Episode 3
In this episode, you are going to learn concurrent and parallel programming.
The days of single-core processors are far gone.
Nowadays, whether you are buying an off-the-shelf laptop or a high-end server for your business, your processor will almost certainly have multiple cores.
And sometimes, your program needs to take advantage of these multiple cores to run things in parallel.
This can potentially lead to increased throughput, higher performance, and better responsiveness.
But let me be clear about one thing here.
If high performance and increased throughput are absolutely crucial, Python wouldn’t be the best language to support parallel programming.
In this situation, I would personally prefer golang instead (or good old C).
But since this is an article about Python, let’s keep our focus on Python.
Before you dive in and write your first parallel program, there are some parallel processing concepts that you should learn about first.
Here are some of these concepts.
Mutual Exclusion
When you have some data that is shared across multiple threads or processes, it is important to synchronize access to these shared resources.
If you don’t, a race condition can happen which might lead to unexpected and sometimes disastrous consequences. I will talk more about race conditions later.
Mutual exclusion means that one thread blocks the further progress of other concurrent threads that require the use of the shared resource.
Locks
Locks are one of the various implementations of mutual exclusion.
To understand what locks are, you can think about them from a conceptual perspective.
If a thread wants to access a shared resource, this thread must grab a lock before it’s granted access to that resource.
And after it’s done with the resource, it releases this lock.
If the lock is not available because it is grabbed by another thread, then the thread has to wait for the lock to be released first.
This simple concept guarantees that at most one thread can have access to a shared resource at a time.
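To make the grab-and-release idea concrete, here is a minimal sketch using threading.Lock from the standard library. The names balance, balance_lock, and deposit are made up for illustration; they are not part of any library.

```python
import threading

balance = 0                      # shared resource (hypothetical example)
balance_lock = threading.Lock()  # guards every access to balance


def deposit(amount, times):
    global balance
    for _ in range(times):
        balance_lock.acquire()       # wait here if another thread holds the lock
        try:
            balance += amount        # only one thread at a time runs this line
        finally:
            balance_lock.release()   # always release, even if an error occurs


threads = [threading.Thread(target=deposit, args=(1, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # 40000
```

In everyday code you would usually write `with balance_lock:` instead of the explicit acquire/try/finally/release, which does the same thing in one line.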
Deadlocks
A deadlock is when your program comes to a complete halt because some of its threads can’t make progress: each one is waiting for a lock that it can never acquire.
For example, imagine Thread A is waiting on Thread B to release a lock. At the same time, Thread B is waiting on Thread A to release another lock that Thread A is currently holding.
In this dire situation, neither Thread A nor Thread B can progress any further so your program is hosed!
This is what a deadlock is.
And it happens more often than you think.
To make the situation worse, it’s also one of the hardest problems to debug.
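The Thread A / Thread B scenario can be reproduced safely with a small sketch. I am assuming two hypothetical locks, lock_a and lock_b, and I use a timeout on the second acquire so the demo reports the problem instead of hanging forever. The two barriers are only there to force the unlucky timing on every run.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
both_hold_first = threading.Barrier(2)     # both threads hold their first lock
both_tried_second = threading.Barrier(2)   # both threads finished their attempt
results = {}


def worker(name, first, second):
    with first:
        both_hold_first.wait()
        # Without the timeout, this call would block forever: a deadlock.
        got_second = second.acquire(timeout=0.2)
        results[name] = got_second
        both_tried_second.wait()
        if got_second:
            second.release()


# Thread A takes lock_a then lock_b; Thread B takes them in the opposite order.
thread_a = threading.Thread(target=worker, args=("A", lock_a, lock_b))
thread_b = threading.Thread(target=worker, args=("B", lock_b, lock_a))
thread_a.start()
thread_b.start()
thread_a.join()
thread_b.join()

print(sorted(results.items()))  # [('A', False), ('B', False)]
```

Neither thread gets its second lock. A common way to avoid this situation is to make every thread acquire the locks in the same order, for example always lock_a before lock_b.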
Race conditions
As I mentioned earlier, a race condition is a situation that arises when accessing a shared resource isn’t protected (for example, by locks).
This can lead to disastrous unexpected outcomes.
Take a look at this example.
import threading

# x is a shared value
x = 0
COUNT = 1000000

def inc():
    global x
    for _ in range(COUNT):
        x += 1

def dec():
    global x
    for _ in range(COUNT):
        x -= 1

t1 = threading.Thread(target=inc)
t2 = threading.Thread(target=dec)
t1.start()
t2.start()
t1.join()
t2.join()
print(x)
Here is what the code above does. There is a shared global variable x that is initialized to 0.
Two functions, inc and dec, run concurrently in separate threads. inc() increments the value of x 1 million times, whereas dec() decrements the value of x 1 million times.
By quickly going through the code, it can be concluded that the final value of x should be 0… but is it?
Here is what I get when I run the above code.
$ python3 race.py
158120
$ python3 race.py
137791
$ python3 race.py
-150265
$ python3 race.py
715644
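This happens because the shared resource x is not protected (by a lock, for example). The statement x += 1 is not atomic: a thread reads x, computes the new value, and writes it back, and the other thread can run between those steps, so one of the two updates gets overwritten.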
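One way to fix the race is to guard every update of x with a lock. The sketch below is the same program with a lock added; x_lock is a name I introduced, and I lowered COUNT so it finishes quickly, since taking a lock on every iteration is noticeably slower.

```python
import threading

# x is a shared value, now protected by x_lock
x = 0
COUNT = 100000
x_lock = threading.Lock()


def inc():
    global x
    for _ in range(COUNT):
        with x_lock:   # the read-modify-write below runs as one unit
            x += 1


def dec():
    global x
    for _ in range(COUNT):
        with x_lock:
            x -= 1


t1 = threading.Thread(target=inc)
t2 = threading.Thread(target=dec)
t1.start()
t2.start()
t1.join()
t2.join()
print(x)  # 0
```

With the lock in place, the increments and decrements can no longer interleave in the middle of an update, so the final value is 0 on every run.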
Python’s Parallel Programming
Only once you’re comfortable with the concepts discussed above are you ready to learn how to write concurrent programs in Python.
First, you should learn how multiprocessing differs from multithreading in Python. (By the way, Python threads and processes are ordinary OS threads and processes; what is specific to Python is how they behave in practice.)
To understand this distinction between multiprocessing and multithreading from Python’s view, you will need to learn and understand the global interpreter lock (GIL).
You will also need to learn about the threading, queue, and multiprocessing Python modules.
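To give you a first taste of how the threading and queue modules fit together, here is a minimal sketch of a worker pool. The functions square_worker and square_all are hypothetical names of my own, and using None as a "stop" signal is just one common convention, not the only way to shut workers down.

```python
import queue
import threading


def square_worker(tasks, results):
    # Pull numbers from the task queue until the stop signal (None) arrives.
    while True:
        n = tasks.get()
        if n is None:
            break
        results.put(n * n)


def square_all(numbers, num_workers=3):
    tasks = queue.Queue()     # queue.Queue is thread-safe, no extra lock needed
    results = queue.Queue()
    workers = [
        threading.Thread(target=square_worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(None)       # one stop signal per worker
    for w in workers:
        w.join()
    squares = []
    while not results.empty():  # safe here: all workers have finished
        squares.append(results.get())
    return sorted(squares)


print(square_all([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

Because of the GIL, threads like these help mostly with I/O-bound work. For CPU-bound work, the multiprocessing module is usually the better starting point.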