Performance penalty of using imap() vs map() in Python multiprocessing

To check out Python's multiprocessing functionality, and inspired by Benjamin Scherrey's presentation on Test Driven Development, I wrote a pi calculator that uses the Monte Carlo method together with the Python multiprocessing module.

For those who are not aware: roughly speaking, the Monte Carlo method here is like blindly throwing darts at a circle framed perfectly in a square, where the side length of the square equals the diameter of the circle. The fraction of darts that lands inside the circle approaches the ratio of the two areas, pi r^2 / (2r)^2 = pi/4, so multiplying that fraction by four gives us pi. For further information you can take a look here.
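To make that concrete, here is a minimal single-process sketch of the estimator (my own illustration, not code from the original post). It samples points in the unit square and counts those falling inside the quarter-circle of radius 1:

import math
import random

def estimate_pi(samples):
	# Count samples that land inside the unit quarter-circle
	hits = 0
	for _ in xrange(samples):
		x, y = random.random(), random.random()
		if math.hypot(x, y) < 1:
			hits += 1
	# hits / samples approximates pi / 4
	return 4.0 * hits / samples

print estimate_pi(10 ** 6)  # prints something close to 3.141...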

Initially the code used the map() function, which needed to be passed a list containing an item for every coordinate I wanted to process. When the list got really large (2^24 items) I realized that the Python interpreter was taking up a lot of RAM, so I decided to use iterators instead of passing around a really large list.
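Part of the cost is the input sequence itself: in Python 2, range() builds the entire list up front, while xrange() yields values lazily. And even with a lazy input, Pool.map must hold the complete list of 2**24 results in memory before it returns, which is why the streaming imap variants below matter. A quick way to see the input-side difference:

import sys

big = range(2 ** 24)     # materializes 2**24 int objects plus a pointer array
lazy = xrange(2 ** 24)   # small constant-size object; values made on demand

# The list's pointer array alone is over 100 MB on a 64-bit build,
# while the xrange object stays a few dozen bytes regardless of length.
print sys.getsizeof(big)
print sys.getsizeof(lazy)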

My initial concern with the iterator was that, because an item has to be requested on every iteration, there would be a significant performance penalty: each worker in the pool would have to wait for the next item to be handed out before it could perform its next calculation.
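In practice, Pool blunts exactly this cost with its chunksize argument: the input is cut into batches, and each worker receives a whole batch per round trip rather than a single item. Here is a rough sketch of the batching idea (my own illustration, not multiprocessing's actual implementation):

from itertools import islice

def chunks(iterable, chunksize):
	# Yield successive lists of up to chunksize items from iterable,
	# roughly what Pool does before shipping work to its workers.
	it = iter(iterable)
	while True:
		batch = list(islice(it, chunksize))
		if not batch:
			break
		yield batch

for batch in chunks(xrange(10), 4):
	print batch  # [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]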

My first version of the code was as follows:

import math
import random
from multiprocessing import Pool

# Keep the parent's seed constant. Note: as far as I can tell, multiprocessing
# reseeds the random module in every worker process it starts, so this does
# not make the workers' random streams reproducible.
random.seed(1)

def do_work(deleteme):
	'''
	Main work function.

	The deleteme argument is unused: map() requires a function of one
	argument, so each call receives an index it can simply ignore.
	'''

	# Not in circle by default
	returnValue = 0

	coordinate = (random.random(), random.random())

	# Set return value to 1 if the coordinate falls inside the quarter-circle
	if math.hypot(coordinate[0], coordinate[1]) < 1:
		returnValue = 1

	return returnValue

if __name__ == "__main__":

	# Total number of items we're going through
	TOTAL = 2**24

	# start a pool with 4 processes
	pool = Pool(processes=4)

	# hand out the work in chunks of 65536 items
	result = pool.map(do_work, xrange(TOTAL), 65536)

	# each result is 0 or 1, so hits / total approximates pi / 4
	print ( float(sum(result)) / float(TOTAL) ) * 4

When run, this code takes up 140 MB of RAM and 34.87 user seconds to execute. I then replaced the __main__ block with the following:
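The post doesn't say how these figures were gathered. One common way to take such measurements on Linux (an assumption on my part, not the author's stated method) is GNU time. Assuming the script is saved as montecarlo.py:

$ /usr/bin/time -v python montecarlo.py

Its output includes "User time (seconds)" for CPU time and "Maximum resident set size" for peak memory.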

if __name__ == "__main__":

	# Total number of items we're going through
	TOTAL = 2**24

	# start a pool with 4 processes
	pool = Pool(processes=4)

	# hand out the work in chunks of 65536 items
	result = pool.imap_unordered(do_work, xrange(TOTAL), 65536)

	# imap_unordered returns an iterator, so results are consumed as they
	# arrive instead of first being collected into one enormous list
	totalResult = sum(result)

	print ( float(totalResult) / float(TOTAL) ) * 4

The results were 12 MB of RAM and 39.01 user seconds: a penalty of roughly four seconds of user time (about 12%), which is insignificant next to the drop from 140 MB to 12 MB of RAM.
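One caveat of my own, consistent with the documented defaults: the large chunk size is doing a lot of the work here. Both imap() and imap_unordered() default to chunksize=1, meaning one round trip between processes per item; with 2**24 items of work this cheap, that overhead would dwarf the computation itself.

	# chunksize defaults to 1 for imap/imap_unordered; with cheap per-item
	# work and 2**24 items, always pass an explicit chunk size:
	result = pool.imap_unordered(do_work, xrange(TOTAL))         # chunksize=1, slow
	result = pool.imap_unordered(do_work, xrange(TOTAL), 65536)  # batched, fast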

Conclusion:

It seems that my concerns were unfounded: Python does a good job of making sure there are always results waiting to be processed, and any overhead involved is minuscule.
