Loops and Iterations
The repeated execution of a similar sequence of instruction over a set of data is one of the most common yet also most powerful elements of digital computers. It is the moment when the effort of programming pays off, exceeding the cost of creating the program by using it more than once.
ANKHOR supports various types of repeated execution and I will cover five of them in this article: implicit, loop, recursive, while and shard. The article uses a little fractal generator as a sample to provide some eye candy.
The Fractal Algorithm
I will use the well- known Mandelbrot set as a fractal image. The algorithm is based on the repeated execution of a simple formula Mn+1=Mn2+R where M and R are numbers taken from the complex plane. The sequence of Ms can take one of three routes; it can explode into infinity, get into a repeating sequence or dwindle down to zero.
The traditional way of presenting this set, is to repeatedly evaluate it for a sub section of the complex plane until either an iteration limit is exceeded or the absolute value of M has exceeded a boundary value and is known to explode towards infinity. Values of R that lead to an exploding M are coloured using a palette based on the first iteration that the corresponding value of M has exceeded the boundary. Values that are still unknown after the number of allowed iterations has been executed or have reached zero are coloured black.
The corresponding sheet looks like this:
The first operator generates a sub section of the 2D complex planet, the second operator performs the iterations and the third renders the final set as a 3D image. We will cover the workings of the second operator in this article.
Implicit iteration is the most common form of repeated execution in ANKHOR and we can find it in the innermost core of the executed graph. It is used whenever two or more tables are combined using e.g. an arithmetic operator. The operator iterates in parallel over all cells of the incoming tables and generates a result for each of it.
This iteration is not visible to the sheets user and does not need to be explicitly invoked, therefore I refer to it as implicit repetition or implicit iteration.
Most table operations can be handled using implicit iteration, but it will not be sufficient if as a third dimension or a recursive dependency is needed.
A loop operator splits all tables that are entered via the data inputs into their individual rows and executes its body graph individually for each of these rows. Loop dependent variables can be carried from one iteration to the next using the state connectors of the operator.
In our case the loop is running over a list of iteration indices from one to the maximum number of allowed iterations. The state connectors carry the following values:
- m : The current state of the set
- r : The position in the complex plane for the set element
- n : The current / final number of iterations for the set element
- d : The final absolute value of a set element when exceeding the boundary
- limit : The boundary value
The inner graph of the loop consists of three sections:
The iteration step calculates the next value of the sequence. This value is then checked for exceeding the boundary condition (or having exceeded before). The final step selects the correct values to forward into the next iteration.
The first highlighted value path follows the iteration value through the iteration step and into the next iteration. The value is not forwarded any further once it exceeds the boundary to avoid numerical overflow.
The second path highlights the calculation of the boundary condition. It evaluates to true, if the absolute value of the iteration exceeds the boundary and the condition was not evaluated to true before.
There is a clear drawback of this version. The loop is executed the maximum number of iterations for all elements of the set regardless of their boundary checks.
The idea of this version is to split the set into two subsets after each iteration, one for the elements that have exceeded the boundary and one for the elements that have not. The second set is the evaluated recursively until the set is empty or a maximum recursion depth has been reached.
This time we have a macro operator with three sections, the iteration step that calculates the next iteration of the set elements and the index, the check of the boundary condition and the recursion. The conditional operator includes the lambda operator that performs the actual recursion step. The operator receives the unwound list of M and R values, and thus evaluates the boundary condition to a list of Boolean values. The conditional operator splits the set based on this Boolean list and executes its body only for those values that are in the TRUE set. The sets are merged back by the operator after the conditional operator is finished.
The recursive version only executes set element iterations that have not yet reached the boundary, and it can stop early if all elements of the tested set tend towards infinity. Unfortunately recursion tends to be more expensive than iteration computation wise, so we have penalty here.
We can combine the flexibility of the recursive evaluation with the simplicity of the loop by using a while loop operator. This operator executes its body graph as long as the “while” input receives a logic true value. There is one downside to the recursive evaluation. The while operator does not split our set into a finished and a still working set, instead we have to perform this ourselves. So in each iteration the set is split into two, the elements that have exceeded the boundary condition leave the while loop through the state output – the elements that don’t continue into the next iteration via the state output. To recombine the results we have to output the index (or starting position) of each set elements together with the values.
The “rowindex” operator generates the indices and the “findrow” together with the “segrow” operators bring them back into sequence.
The inner workings of the loop body graph is more complex than the previous because we have to calculate the terminating condition as well as the loop split.
The three sections are again colour coded, and it looks like a bastard of the original loop and the recursive evaluation. The bank of “select” operators has grown a bit, because it has to cover the set index now as well.
We can now advance to a grid scale, distributing the evaluation of the loop to several machines of an ANKHOR compute grid. This is done by embedding our “while” loop version into a “shard” loop.
The shard loop operator works similar to a normal loop, splitting the rows of the data inputs. The rows are then distributed to all nodes in the grid and the body graph executed on all machines. The results are sent back to the ANKHOR workstation and re-assembled into tables. Values that are passed via the state inputs are distributed as one data block to all participating nodes.
The row split does not happen on a per- row- level but on groups of rows called shards. When the “shard” operator is selected, then the Property Editor shows its parameters:
The “shard” operator has some additional configuration parameters that control how the rows are distributed.
- Min Rows : Minimum number of rows per shard. Some algorithms use multiple rows and thus work better when executed on groups of rows.
- Max Shards : Maximum number of shards per table. Splitting the data in too many small shards causes more communication overhead than benefit, so this is a way to limit the overhead of distributing the job.
- Num Parallel : Number of shards executed in parallel on the local machine.
The distributed execution can be monitored using the ANKHORGridAdmin sheet:
I have executed the four variants of the iteration loop with six different sub sets of the complex planes. The corresponding images can be seen at the top of the article. The compute grid for the sharded version consisted of two additional computers on the same local network. These were the running times:
It is clearly visible how performance improves with each variation.
The Power of Loops
Loops are a powerful element of programming languages but usually not found in either Spreadsheets or basic SQL. ANKHOR offers a variety of repetitive execution mechanisms, starting from simple implicit repetitions and going up todu distributed evaluation on a grid.
Loops can be embedded into other loops to e.g. generate animations.
Download FlowSheet and sample data (ZIP archive)