viernes, 3 de septiembre de 2010

Parallel::Iterator. Independent tasks are independent

OH HAI!

Today's module is Andy Armstrong's Parallel::Iterator. I discovered it through dagolden's blog, and tried it out myself that same day.

The idea is simple: you may have many slow tasks to do that have no dependencies between each other. A typical example is fetching web pages with LWP, but I can think of lots of processes doing similar things, for example, ssh-ing a command to many different servers.

So it's like an autothreading map. Well, sort of.

Since tasks aren't guaranteed to finish in order, and you probably want to know which input produced each result, the function you apply to each element has to take not one but two parameters, the first being an index that you can throw away. At least it seems so; I'm not sure I understand it fully, but for the moment that's what I've gathered.

The module provides two kinds of functions: iterate, and iterate_as_(array|hash). iterate returns ($index, $result) pairs, while the others return the desired structure directly.
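To make that concrete, here's a small self-contained sketch of how I read the interface. The doubling worker and the toy job list are mine, not from the docs:

```perl
use strict;
use warnings;
use Parallel::Iterator qw( iterate iterate_as_array );

# Worker: Parallel::Iterator always passes ( $index, $value ).
my $worker = sub {
    my ( $index, $value ) = @_;
    return $value * 2;
};

# iterate() hands back an iterator of ( $index, $result ) pairs,
# not necessarily in input order.
my $iter = iterate( $worker, [ 10, 20, 30 ] );
while ( my ( $index, $result ) = $iter->() ) {
    print "job $index -> $result\n";
}

# iterate_as_array() collects the results, placing each result at
# its original index, so input order is restored.
my @doubled = iterate_as_array( $worker, [ 10, 20, 30 ] );
print "@doubled\n";    # 20 40 60
```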

But what puzzles me is that I can't easily migrate a normal map to Parallel::Iterator, because the functions you pass to it have to accept that extra index parameter (just to throw it away?). Two days ago I read another post about it, but it didn't say anything about that 'strange' aspect of its use.

So I hacked up a higher-order function that mimics map's signature but uses Parallel::Iterator underneath. I'm probably missing something, because it's strange that the author didn't provide something like this in the module. Anyway, here it is:
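The snippet below is a minimal sketch of such a wrapper; the name parallel_map and the (&@) prototype are my choices, not anything from the module:

```perl
use strict;
use warnings;
use Parallel::Iterator qw( iterate_as_array );

# parallel_map: a map-alike built on iterate_as_array. We wrap the
# caller's block so Parallel::Iterator's ( $index, $value ) calling
# convention stays hidden, and localize $_ so the block reads like map.
sub parallel_map (&@) {
    my ( $code, @list ) = @_;
    return iterate_as_array(
        sub {
            my ( $index, $value ) = @_;
            local $_ = $value;    # mimic map: the block sees $_
            return $code->();     # the index is thrown away here
        },
        \@list,
    );
}

my @squares = parallel_map { $_ * $_ } 1 .. 5;
print "@squares\n";    # 1 4 9 16 25
```

The (&@) prototype just lets callers use the familiar block syntax, parallel_map { ... } @list, like map itself.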



I'm using iterate_as_array because I want to mimic map's signature, so the input list can't be built lazily. That's another feature of Parallel::Iterator: not only can the evaluation of the function be lazy, but so can the generation of the input list.
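As I understand it, that lazy generation works by passing a code ref as the input instead of an array ref: Parallel::Iterator calls it to pull ($index, $value) pairs on demand, and an empty return signals exhaustion. A sketch under that assumption (the job generator here is invented for illustration):

```perl
use strict;
use warnings;
use Parallel::Iterator qw( iterate_as_array );

# Lazy input: jobs are generated on demand, one ( $index, $value )
# pair per call, instead of materializing the whole list up front.
my $next = 0;
my $lazy_input = sub {
    return if $next >= 5;            # empty list => no more jobs
    my $index = $next++;
    return ( $index, $index * 10 );  # generate the job on the fly
};

my @results = iterate_as_array(
    sub {
        my ( $index, $value ) = @_;
        return $value + 1;
    },
    $lazy_input,
);
print "@results\n";    # 1 11 21 31 41
```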

Ideas? Suggestions? Insults? Go on and comment :)