Author: Ben James / Source: Hackaday

The ability to execute code in parallel is crucial in a wide variety of scenarios. Concurrent programming is a key asset for web servers, producer/consumer models, batch number-crunching and pretty much any time an application is bottlenecked by a resource.
It’s sadly the case that writing quality concurrent code can be a real headache, but this article aims to demonstrate how easy it is to get started writing threaded programs in Python.
Because the standard library includes several modules designed to help with this kind of thing, simple concurrent tasks are often surprisingly quick to implement. We’ll walk through the difference between threads and processes in a Python context, before reviewing some of the different approaches you can take and what they’re best suited for.
(Python 3 is used for the duration of the article.)
The Global Interpreter Lock
It’s impossible to talk about concurrent programming in Python without mentioning the Global Interpreter Lock, or GIL, because it has a large impact on which approach you select when writing concurrent Python. The most important thing to note is that it is a feature only of CPython (the widely used “reference” Python implementation), not of the language itself. Jython and IronPython, among other implementations, have no GIL.
The GIL is controversial because it only allows one thread at a time to access the Python interpreter. This means that it’s often not possible for threads to take advantage of multi-core systems. Note that if there are blocking operations which happen outside Python, long-wait tasks like I/O for instance, then the GIL is not a bottleneck and writing a threaded program will still be a benefit. However, if the blocking operations are largely crunching through CPython bytecode, then the GIL becomes a bottleneck.
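To make the I/O-bound case concrete, here is a minimal sketch that uses time.sleep as a stand-in for a blocking network or disk call. The helper names fake_io and run_threaded are hypothetical, chosen only for illustration. Because CPython releases the GIL while a thread sleeps or waits on I/O, the waits should overlap instead of running back to back.

```python
"""Sketch: threads overlap blocking waits even with the GIL."""
import threading
import time


def fake_io(delay):
    """Hypothetical stand-in for a blocking I/O call such as a download."""
    time.sleep(delay)  # the GIL is released while this thread sleeps


def run_threaded(n_tasks=4, delay=0.2):
    """Run n_tasks blocking waits in separate threads; return elapsed seconds."""
    threads = [
        threading.Thread(target=fake_io, args=(delay,))
        for _ in range(n_tasks)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return time.perf_counter() - start


if __name__ == '__main__':
    elapsed = run_threaded()
    print(f"4 waits of 0.2 s took {elapsed:.2f} s in threads")
```

Run one after another, the same four waits would take roughly 0.8 seconds; in threads the total should be close to the length of a single wait.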
Why was the GIL introduced at all? It makes memory management much simpler, since the interpreter’s internal state (such as object reference counts) can never be modified by two threads at once, and it makes C extensions easier to write and easier to wrap.
The upshot of all this is that if you need true parallelism and need to leverage multi-core CPUs, threads won’t cut it and you need to use processes. A separate process means a separate interpreter with separate memory, its own GIL, and true parallelism. This guide will give examples of both thread and process architectures.
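As a rough sketch of the process-based approach, the example below spreads a CPU-bound job across worker processes using ProcessPoolExecutor from the standard library's concurrent.futures module. The count_primes function and the workload sizes are hypothetical, picked only to give each worker some pure-Python number-crunching to do.

```python
"""Sketch: CPU-bound work spread across processes."""
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == '__main__':
    limits = [20_000, 30_000, 40_000, 50_000]  # arbitrary example workloads
    # Each worker is a separate interpreter process with its own GIL.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(count_primes, limits))
    for limit, total in zip(limits, results):
        print(f"{total} primes below {limit}")
```

The main-module guard matters here: on platforms that start workers by importing the main module afresh, code outside the guard would run again in every child process.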
The concurrent.futures module
The concurrent.futures module is a well-kept secret in Python, but provides a uniquely simple way to implement threads and processes. For many basic applications, the easy-to-use Pool interface offered here is sufficient.
Here’s an example where we want to download some webpages, which will be much quicker if done in parallel.
```python
"""Download webpages in threads."""
import requests
from concurrent.futures import ThreadPoolExecutor

download_list = [
    {'name': 'google', 'url': "http://google.com"},
    {'name': 'reddit', 'url': "http://reddit.com"},
    {'name': 'ebay', 'url': "http://ebay.com"},
    {'name': 'bbc', 'url': "http://bbc.co.uk"}
]


def download_page(page_info):
    """Download and save webpage."""
    r = requests.get(page_info['url'])
    with open(page_info['name'] + '.html', 'w') as save_file:
        save_file.write(r.text)


if __name__ == '__main__':
    pool = ThreadPoolExecutor(max_workers=10)
    for download in download_list:
        pool.submit(download_page, download)
```
Most of the code is just setting up our downloader example; it’s only the last block which contains the threading-specific code. Note how easy it is to create a dynamic pool of workers using ThreadPoolExecutor and submit a task. We could even simplify the last two lines to one using map:
```python
pool.map(download_page, download_list)
```
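One detail worth knowing about submit: an exception raised inside a worker is stored on the returned Future and only surfaces when the result is requested, so a failed task can otherwise go unnoticed. The sketch below shows one possible way to collect outcomes with as_completed. The flaky_task function and its inputs are hypothetical stand-ins for the downloader, used so the example runs without network access.

```python
"""Sketch: collecting results and errors from submitted tasks."""
from concurrent.futures import ThreadPoolExecutor, as_completed


def flaky_task(name):
    """Hypothetical task: fails for one input to show error handling."""
    if name == 'bad':
        raise ValueError(f"cannot process {name!r}")
    return name.upper()


def run_all(names):
    """Submit every name and return (results, errors), both keyed by name."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Map each Future back to the input that produced it.
        futures = {pool.submit(flaky_task, n): n for n in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                # result() re-raises any exception from the worker thread.
                results[name] = future.result()
            except ValueError as exc:
                errors[name] = str(exc)
    return results, errors


if __name__ == '__main__':
    ok, failed = run_all(['google', 'bad', 'bbc'])
    print(ok)
    print(failed)
```

Using the executor as a context manager also waits for all submitted tasks to finish before the block exits.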
Using threads works well in this case since the blocking operation that benefits from concurrency is the act of fetching the webpage. This means that the GIL is not an issue and threading is an ideal solution. However, if the operation in question were CPU-intensive within Python, processes would likely be more appropriate because of the…