In this example, the work can be unevenly divided. The sequence of heavy computations at the end of the list may require an adjustment to the default tasksize.
node 2 computing 78125
node 1 computing 625
node 3 computing 9765625
node 0 computing 5
node 0 computing 25
node 1 computing 3125
node 0 computing 125
node 1 computing 15625
node 2 computing 390625
node 2 computing 1953125
node 3 computing 9765625
node 3 computing 9765625
With only a few numbers in the computation, the default is to divide the operation into equal portions. Node 0 gets the first three data points, node 1 gets the next three, and so on. Because, in this case, the work is unevenly distributed, node 3 ends up computing the three hardest cases and becomes the bottleneck.
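As a rough illustration of that static, block-style division, here is a minimal Python sketch (Python and the names data, nodes, and tasksize are our own choices for illustration, not the tool's actual API):

# Static division: split the list into equal consecutive blocks, one per node.
data = [5 ** k for k in range(1, 10)] + [5 ** 10] * 3  # 12 items, heaviest cases last

nodes = 4
tasksize = len(data) // nodes  # 3 consecutive items per node
for node in range(nodes):
    block = data[node * tasksize:(node + 1) * tasksize]
    print(f"node {node} gets {block}")
# node 3 receives the three largest values and becomes the bottleneck.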
To optimize the timing on this kind of example, you can set a smaller tasksize. With tasksize=1, the algorithm splits the data into chunks of one element instead of the three-element chunks of the previous run. That gives 12 tasks for only 4 compute nodes, so each node finishes one task before asking for another.
node 0 computing 5
node 1 computing 25
node 3 computing 625
node 2 computing 125
node 0 computing 3125
node 1 computing 15625
node 2 computing 78125
node 3 computing 390625
node 0 computing 1953125
node 1 computing 9765625
node 2 computing 9765625
node 3 computing 9765625
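For comparison, the following is a runnable Python sketch of the tasksize=1 behaviour, using the chunksize argument of multiprocessing.Pool.map as a stand-in for tasksize (an assumption for illustration only; the printed worker ids are process ids rather than node numbers):

import os
from multiprocessing import Pool

def heavy(n):
    # Stand-in for an expensive computation whose cost grows with n.
    print(f"worker {os.getpid()} computing {n}", flush=True)
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    data = [5 ** k for k in range(1, 10)] + [5 ** 10] * 3  # heaviest cases last

    with Pool(processes=4) as pool:
        # chunksize=1: each worker takes one element at a time and asks for
        # the next only when it finishes, so the three hardest cases land on
        # different workers instead of all on the last one.
        pool.map(heavy, data, chunksize=1)

With chunksize=3 instead, each worker typically claims one block of three consecutive items, reproducing the uneven split shown in the first run.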