Dataflow Graphs – Or: Visual Programming for Non-Programmers

A dataflow graph, as implemented in FlowSheets, visualises a program’s execution as a graph and thereby focuses on the data rather than on the program structure. Directed edges in the dataflow graph represent the data values (e.g. numbers, strings or even tables), and nodes correspond to the operations that are applied to these data values:

Figure: Simple Dataflow Graph Example (ANKHOR FlowSheet)

A node’s operation is executed when all data values on the incoming edges become available, which happens when all upstream operations have finished. If one of the input values changes, the operation must be re-evaluated, generating a new value on its output. The graph enters a steady state when all output values of all nodes have been calculated. It will not change any further unless one of the inputs changes again.
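
The evaluation rule described above can be sketched in a few lines of Python. This is only an illustration of the principle, not how FlowSheets is implemented: the `Graph` class, its method names and the example nodes are all hypothetical, and the sketch relies on the standard-library `graphlib` module (Python 3.9 or later) to order the nodes so that each one runs after its upstream operations.

```python
from graphlib import TopologicalSorter
import operator


class Graph:
    """Toy dataflow graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}       # node name -> (operation, names of incoming values)
        self.values = {}      # current value on every edge (inputs and outputs)
        self.changed = set()  # values that changed since the last steady state

    def set_input(self, name, value):
        # Only a value that actually differs triggers re-evaluation downstream.
        if name not in self.values or self.values[name] != value:
            self.values[name] = value
            self.changed.add(name)

    def add_node(self, name, op, *sources):
        self.nodes[name] = (op, sources)

    def evaluate(self):
        """Run every node whose inputs changed; return the nodes executed."""
        deps = {name: set(srcs) for name, (_, srcs) in self.nodes.items()}
        executed = []
        # static_order() yields a node only after all of its upstream nodes.
        for name in TopologicalSorter(deps).static_order():
            if name not in self.nodes:
                continue  # a plain input, nothing to execute
            op, srcs = self.nodes[name]
            if name in self.values and not self.changed.intersection(srcs):
                continue  # all incoming values unchanged, keep the old output
            new_value = op(*(self.values[s] for s in srcs))
            if name not in self.values or self.values[name] != new_value:
                self.changed.add(name)
            self.values[name] = new_value
            executed.append(name)
        self.changed.clear()  # steady state reached
        return executed


g = Graph()
g.set_input("a", 2)
g.set_input("b", 3)
g.set_input("c", 10)
g.add_node("sum", operator.add, "a", "b")
g.add_node("scaled", operator.mul, "sum", "c")
g.add_node("negated", operator.neg, "c")

first_run = g.evaluate()    # every node runs once
g.set_input("a", 5)
second_run = g.evaluate()   # only "sum" and "scaled" depend on "a"
print(second_run, g.values["scaled"])
```

After the second call only the nodes downstream of the changed input are executed again; `negated` keeps its earlier output because none of its incoming values changed.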

Dataflow graphs are closely related to functional programming: each variable (an edge of the graph) is the target of exactly one assignment operation (the output of one node). With FlowSheets, you therefore get immediate access to the advantages of this powerful paradigm – without having to write or read a single line of code in textual form:

  • All intermediate results are visible at all times and do not have to be “imagined” by the user. A graph’s execution can be inspected for any step of the processing, including loops.
  • Nothing in the graph changes any more once it has been computed, so there is no need to walk the execution in one’s mind or to guess a “before” step that might have caused the current state.
  • Final and all intermediate results are immediately calculated whenever one of the inputs or the structure of the graph changes, thus giving direct feedback for all editing operations.
  • As all intermediate results are present in the graph, one can incrementally build a graph by simply connecting previous results with new operations. Programming thus mirrors the usual way of solving a problem: attacking it step by step.
  • Dataflow graphs are acyclic: the result of an operation has no effect on the operation itself or on any of its inputs. Therefore, one can easily find the values that cause the result of an operation by walking the graph backwards. This is a big advantage in the case of an error: one can find the root cause by following the error’s path backwards.
  • Graph evaluation runs in parallel on all available CPU cores without requiring special consideration by the programmer. The dataflow structure implicitly contains all dependencies and therefore yields the parallel steps as part of its execution.
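
The last two points can be made concrete with a small Python sketch. It is a hedged illustration under stated assumptions rather than a description of the FlowSheets engine: the graph `DEPS`, the operations in `OPS` and the helper functions are hypothetical. `upstream` walks the edges backwards from a node, and `parallel_batches` uses the standard-library `graphlib.TopologicalSorter` to group nodes that do not depend on each other and can therefore run at the same time.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical graph: node name -> names of the values it reads, in order.
DEPS = {
    "clean": ["raw"],
    "mean": ["clean"],
    "maximum": ["clean"],
    "report": ["mean", "maximum"],
}

# Hypothetical operations attached to the nodes above.
OPS = {
    "clean": lambda raw: [x for x in raw if x is not None],
    "mean": lambda xs: sum(xs) / len(xs),
    "maximum": max,
    "report": lambda mean, maximum: f"mean={mean}, max={maximum}",
}


def upstream(node, deps):
    """Walk the edges backwards and collect everything `node` depends on."""
    seen = set()
    stack = [node]
    while stack:
        for src in deps.get(stack.pop(), []):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


def parallel_batches(deps):
    """Group nodes into batches whose members are independent of each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()   # all nodes whose inputs are available
        batches.append(sorted(ready))
        sorter.done(*ready)
    return batches


def run(deps, ops, inputs):
    """Evaluate the graph batch by batch, running each batch on a thread pool."""
    values = dict(inputs)

    def execute(name):
        return ops[name](*(values[src] for src in deps[name]))

    with ThreadPoolExecutor() as pool:
        for batch in parallel_batches(deps):
            todo = [name for name in batch if name in ops]
            # Collect all results first, then publish them to downstream nodes.
            results = list(pool.map(execute, todo))
            values.update(zip(todo, results))
    return values


batches = parallel_batches(DEPS)
causes_of_mean = upstream("mean", DEPS)
values = run(DEPS, OPS, {"raw": [1, None, 3, 8]})
print(batches)
print(values["report"])
```

Here `mean` and `maximum` land in the same batch because neither reads the other's output, and a wrong value in `mean` can only come from `clean` or `raw`. Note that threads in standard CPython do not speed up CPU-bound work; the pool is used only to show where the independent steps are, and a real engine would schedule them across cores.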