@@ -226,13 +226,212 @@ Numba
  -----
  .. todo:: Write about Numba and the autojit compiler for NumPy

- Threading
- :::::::::
+ Concurrency
+ :::::::::::
+
+
+ Concurrent.futures
+ ------------------
+
+ The `concurrent.futures`_ module in the standard library provides a
+ "high-level interface for asynchronously executing callables". It abstracts
+ away many of the more complicated details of using multiple threads or
+ processes for concurrency, and allows the user to focus on accomplishing the
+ task at hand.
+
+ The `concurrent.futures`_ module exposes two main classes, the
+ `ThreadPoolExecutor` and the `ProcessPoolExecutor`. The ThreadPoolExecutor
+ creates a pool of worker threads that a user can submit jobs to. Each job is
+ then executed in another thread as soon as a worker thread becomes
+ available.
+
+ The ProcessPoolExecutor works in the same way, except that it uses multiple
+ processes instead of multiple threads for its workers. This makes it possible
+ to side-step the GIL. However, because jobs and results are pickled to pass
+ them between processes, only picklable objects can be submitted and returned.
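
As a rough illustration of the picklability constraint, the sketch below uses the `pickle` module directly instead of starting a process pool. The `can_pickle` helper is a hypothetical name introduced for this example only; it is not part of `concurrent.futures`.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps(), False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A function that can be imported by name is pickled as a reference.
importable_ok = can_pickle(math.sqrt)

# A lambda has no importable name, so pickling it fails.
lambda_ok = can_pickle(lambda x: x * 2)
```

Under these assumptions, `importable_ok` is true and `lambda_ok` is false, which is why a lambda cannot be submitted to a ProcessPoolExecutor while a module-level function can.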
+
+ Because of the way the GIL works, a good rule of thumb is to use a
+ ThreadPoolExecutor when the task being executed involves a lot of blocking
+ (e.g. making requests over the network) and to use a ProcessPoolExecutor
+ when the task is computationally expensive.
+
+ There are two main ways of executing things in parallel using the two
+ Executors. One way is with the `map(func, *iterables)` method. This works
+ almost exactly like the builtin `map()` function, except that the calls are
+ executed in parallel:
+
+ .. code-block:: python
+
+     from concurrent.futures import ThreadPoolExecutor
+     import requests
+
+     def get_webpage(url):
+         page = requests.get(url)
+         return page
+
+     pool = ThreadPoolExecutor(max_workers=5)
+
+     my_urls = ['http://google.com/'] * 10  # Create a list of urls

+     for page in pool.map(get_webpage, my_urls):
+         # Do something with the result
+         print(page.text)
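
A small sketch of one detail of `map()` that the network example cannot show reliably: results come back in the order of the inputs, not in the order the calls finish. The `slow_square` function and its sleep times are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_square(n):
    # Smaller inputs sleep longer, so they finish last.
    time.sleep((3 - n) * 0.05)
    return n * n


with ThreadPoolExecutor(max_workers=3) as pool:
    # map() yields results in input order, even though the call
    # for 2 completes before the call for 0.
    squares = list(pool.map(slow_square, [0, 1, 2]))
```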
+
+ For even more control, the `submit(func, *args, **kwargs)` method schedules
+ a callable to be executed (as `func(*args, **kwargs)`) and returns a `Future`_
+ object that represents the execution of the callable.
+
+ The Future object provides various methods that can be used to check on the
+ progress of the scheduled callable. These include:
+
+ cancel()
+     Attempt to cancel the call.
+ cancelled()
+     Return True if the call was successfully cancelled.
+ running()
+     Return True if the call is currently being executed and cannot be
+     cancelled.
+ done()
+     Return True if the call was successfully cancelled or finished running.
+ result()
+     Return the value returned by the call. Note that, by default, this call
+     blocks until the scheduled callable returns.
+ exception()
+     Return the exception raised by the call. If no exception was raised then
+     this returns `None`. Note that this will block just like `result()`.
+ add_done_callback(fn)
+     Attach a callback function that will be executed (as `fn(future)`) when
+     the future is cancelled or finishes running.
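
The methods listed above can be sketched with a thread pool and two tiny jobs, one of which fails on purpose. The names `divide` and `record` are illustrative and not part of the module.

```python
from concurrent.futures import ThreadPoolExecutor


def divide(a, b):
    return a / b


finished = []


def record(future):
    # Invoked as record(future) once the future is done.
    finished.append(future)


with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(divide, 10, 2)
    bad = pool.submit(divide, 1, 0)
    ok.add_done_callback(record)
    bad.add_done_callback(record)

# Leaving the with block waits for both jobs to finish.
quotient = ok.result()      # value returned by divide(10, 2)
error = bad.exception()     # exception raised inside divide(1, 0)
```

Calling `bad.result()` instead of `bad.exception()` would re-raise the `ZeroDivisionError` in the calling thread.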
+
+
+ .. code-block:: python
+
+     from concurrent.futures import ProcessPoolExecutor, as_completed
+
+     def is_prime(n):
+         if n < 2:
+             return n, False
+         if n == 2:
+             return n, True
+         if n % 2 == 0:
+             return n, False
+
+         sqrt_n = int(n ** 0.5)
+         for i in range(3, sqrt_n + 1, 2):
+             if n % i == 0:
+                 return n, False
+         return n, True
+
+     PRIMES = [
+         112272535095293,
+         112582705942171,
+         112272535095293,
+         115280095190773,
+         115797848077099,
+         1099726899285419]
+
+     # The guard keeps worker processes from re-running this block when
+     # they import the module (needed where processes are spawned).
+     if __name__ == '__main__':
+         futures = []
+         with ProcessPoolExecutor(max_workers=4) as pool:
+             # Schedule the ProcessPoolExecutor to check if a number is prime
+             # and add the returned Future to our list of futures
+             for p in PRIMES:
+                 fut = pool.submit(is_prime, p)
+                 futures.append(fut)
+
+             # As the jobs are completed, print out the results.
+             # as_completed() yields Futures, so unpack each result.
+             for fut in as_completed(futures):
+                 number, result = fut.result()
+                 if result:
+                     print("{} is prime".format(number))
+                 else:
+                     print("{} is not prime".format(number))
+
+ The `concurrent.futures`_ module contains two helper functions for working with
+ Futures. The `as_completed(futures)` function returns an iterator over the
+ given futures, yielding each future as it completes.
+
+ The `wait(futures)` function blocks until, by default, all of the given
+ futures have completed. It returns two sets: the futures that are done and
+ the futures that are not.
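
A minimal sketch of `wait()`, assuming a thread pool and a trivial `square` job chosen only for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, wait


def square(x):
    return x * x


with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(square, n) for n in range(5)]
    # Blocks until every future has finished (the default behaviour),
    # then returns the sets of done and not-done futures.
    done, not_done = wait(futures)

results = sorted(f.result() for f in done)
```

Because `done` is a set, the results are sorted here to get a predictable order.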
+
+ For more information on using the `concurrent.futures`_ module, consult the
+ official documentation.

  Threading
  ---------

+ The standard library comes with a `threading`_ module that allows a user to
+ work with multiple threads manually.
+
+ Running a function in another thread is as simple as passing a callable and
+ its arguments to `Thread`'s constructor (as `target` and `args`) and then
+ calling `start()`:
+
+ .. code-block:: python
+
+     from threading import Thread
+     import requests
+
+     def get_webpage(url):
+         page = requests.get(url)
+         return page
+
+     # The callable goes in target, its positional arguments in args
+     some_thread = Thread(target=get_webpage, args=('http://google.com/',))
+     some_thread.start()
+
+ To wait until the thread has terminated, call `join()`:
+
+ .. code-block:: python
+
+     some_thread.join()
+
+ When `join()` is called with a timeout, it is a good idea to check afterwards
+ whether the thread is still alive (which means the join call timed out):
+
+ .. code-block:: python
+
+     if some_thread.is_alive():
+         print("join() must have timed out.")
+     else:
+         print("Our thread has terminated.")
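
A self-contained sketch of the timeout case, with no network access. The `wait_for_signal` worker and the `Event` used to release it are assumptions made for this example.

```python
import threading


def wait_for_signal(stop_event):
    # Stay alive until told to stop (or for 5 seconds at most).
    stop_event.wait(timeout=5)


stop = threading.Event()
worker = threading.Thread(target=wait_for_signal, args=(stop,))
worker.start()

# join() returns None whether or not it timed out, so the only way
# to tell is to ask the thread afterwards.
worker.join(timeout=0.1)
timed_out = worker.is_alive()

stop.set()        # let the worker finish
worker.join()     # no timeout: blocks until the thread terminates
finished = not worker.is_alive()
```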
+
+ Because multiple threads have access to the same section of memory, two or
+ more threads may try to write to the same resource at the same time, or the
+ output may depend on the sequence or timing of certain events. This is called
+ a `data race`_ or race condition. When this happens, the output may be garbled
+ or you may encounter problems which are difficult to debug. A good example is
+ this `stackoverflow post`_.
+
+ This can be avoided by using a `Lock`_ that each thread needs to acquire
+ before writing to a shared resource. Locks can be acquired and released
+ through either the context manager protocol (`with` statement), or by using
+ `acquire()` and `release()` directly. Here is a (rather contrived) example:
+
+
+ .. code-block:: python
+
+     from threading import Lock, Thread
+
+     file_lock = Lock()
+
+     def log(msg):
+         with file_lock:
+             # Append, so that earlier entries are not overwritten
+             with open('website_changes.log', 'a') as f:
+                 f.write(msg)
+
+     def monitor_website(some_website):
+         """
+         Monitor a website and then if there are any changes,
+         log them to disk.
+         """
+         while True:
+             # check_for_changes() is a placeholder that is not defined here
+             changes = check_for_changes(some_website)
+             if changes:
+                 log(changes)
+
+     websites = ['http://google.com/', ... ]
+     for website in websites:
+         t = Thread(target=monitor_website, args=(website,))
+         t.start()
+
+ Here, we have a bunch of threads checking for changes on a list of sites and
+ whenever there are any changes, they attempt to write those changes to a file
+ by calling `log(changes)`. When `log()` is called, it will wait to acquire
+ the lock with `with file_lock:`. This ensures that at any one time, only one
+ thread is writing to the file.
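
The same idea can be sketched with explicit `acquire()` and `release()` calls on a shared counter instead of a file. The `Counter` class is a hypothetical helper written for this example.

```python
from threading import Lock, Thread


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = Lock()

    def increment(self):
        self._lock.acquire()
        try:
            # Only one thread at a time can run this read-modify-write.
            self.value += 1
        finally:
            # Release even if the update raises, so that other
            # threads are not blocked forever.
            self._lock.release()


def add_many(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [Thread(target=add_many, args=(counter, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The `try`/`finally` pair is what the `with` statement does automatically, which is why the `with` form is usually preferred.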

  Spawning Processes
  ------------------
@@ -248,3 +447,8 @@ Multiprocessing
  .. _`New GIL`: http://www.dabeaz.com/python/NewGIL.pdf
  .. _`Special care`: http://docs.python.org/c-api/init.html#threads
  .. _`David Beazley's`: http://www.dabeaz.com/GIL/gilvis/measure2.py
+ .. _`concurrent.futures`: https://docs.python.org/3/library/concurrent.futures.html
+ .. _`Future`: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future
+ .. _`threading`: https://docs.python.org/3/library/threading.html
+ .. _`stackoverflow post`: http://stackoverflow.com/questions/26688424/python-threads-are-printing-at-the-same-time-messing-up-the-text-output
+ .. _`data race`: https://en.wikipedia.org/wiki/Race_condition