- 03-01-2011: an additional scheme variant has been added.
- 03-02-2011: minor modifications
The archetypal imperative iterators are Java classes that implement the Iterator interface. Everyone who has ever implemented an Iterator should acknowledge the pain of explicitly maintaining state between calls. In particular there are two different concepts of state: the iterator "low level" state and the iteration state. In the iterator we have to save "what we have done", and we have to do so explicitly.
Then every time next() or hasNext() is called, we have to do some more computation, save our internal state and explicitly return control to our caller. This is the implementation of a Javish iterator in Clojure that visits a tree-like structure in pre-order.
I decided to use Clojure instead of Java because the example structure was easier to write, the example could just be run in a REPL, I prefer Clojure to Java, and other irrelevant details. However, that is essentially the way iterators are written in object-oriented languages (including nice ones, like Python) or in languages that have to interoperate with OO languages.
The principal issue is that the "value returning" logic has to be mingled with the iterator bookkeeping logic; I believe that this (although more versatile) is far more complicated than simply adding all the values to a (flat) structure and returning the structure. The reason why the flat-structure approach is not the dominant one is that it is less flexible and less efficient, since all the values have to be computed and stored before any of them is used. We also notice that iterators can be infinite, which a flat structure cannot represent.
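To make the bookkeeping concrete, here is a minimal Python sketch of the same kind of iterator. It is not the Clojure listing discussed above: the class name PreorderIterator, the has_next/next method names and the nested-list tree representation ([value, child, child, ...]) are assumptions made for illustration.

```python
class PreorderIterator:
    """Java-style iterator: the caller drives it through has_next()/next()."""

    def __init__(self, tree):
        # Hand-maintained state: a stack of subtrees still to be visited.
        # A tree is assumed to be a list [value, child, child, ...].
        self._stack = [tree] if tree else []

    def has_next(self):
        return bool(self._stack)

    def next(self):
        if not self._stack:
            raise StopIteration("no more nodes")
        node = self._stack.pop()
        value, children = node[0], node[1:]
        # Push children right-to-left so the leftmost child is popped first.
        self._stack.extend(reversed(children))
        return value


tree = [1, [2, [3], [4]], [5, [6]]]
it = PreorderIterator(tree)
visited = []
while it.has_next():
    visited.append(it.next())
```

Notice how the traversal logic (pop a node, push its children) and the protocol logic (what to remember between calls) end up in the same method.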
Ah, if you like OO mumbo-jumbo, this is of course the Iterator pattern.
In this section we briefly examine Python's yield statement. The corresponding yield expression (which essentially allows limited coroutines) is not considered here. If a Python function contains the yield keyword, its semantics change.
The yield statement may only be used inside functions. A function that contains a yield statement is called a generator function.
When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol, so in particular can be used in for-loops in a natural way.
Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached.
If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call.
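A small sketch may help to see the "no code is executed" and "frozen" parts in action. The countdown function and the log list are hypothetical names used only here. Note that the text above speaks of a .next() method; in Python 3 the method is called __next__() and is normally invoked through the built-in next().

```python
log = []  # records when the generator body actually runs


def countdown(n):
    log.append("body started")
    while n > 0:
        yield n
        # Execution resumes here on the following next() call,
        # with the local variable n still bound to its old value.
        log.append("resumed after yielding %d" % n)
        n -= 1


gen = countdown(2)
log_after_call = list(log)    # calling the generator function runs no body code

first = next(gen)             # runs the body up to the first yield
log_after_first = list(log)

second = next(gen)            # resumes right after the yield
log_after_second = list(log)
```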
Essentially instead of using the structure of the previous example, we could have written:
The idea is that when yield is encountered the generator "returns" the value and when next() is subsequently invoked, it resumes execution from that spot. Essentially it is a neat syntactical facility that "saves" somewhere the continuation of the computation in order to be ready to resume from that point.
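A sketch of what such a generator might look like, assuming a tree is a nested list of the form [value, child, child, ...]. The function name preorder and the tree representation are my assumptions, not necessarily those of the original listing.

```python
def preorder(tree):
    """Yield the values of a nested-list tree in pre-order."""
    # Explicit stack of subtrees still to visit.
    stack = [tree] if tree else []
    while stack:
        node = stack.pop()
        yield node[0]  # hand the value to the caller and freeze here
        # Push children right-to-left so the leftmost child is visited first.
        stack.extend(reversed(node[1:]))


tree = [1, [2, [3], [4]], [5, [6]]]
visited = list(preorder(tree))
```

The traversal reads like ordinary code that "prints" each node; the bookkeeping of where to resume is left to the language.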
Ok. I said it. Continuation. That is. When the goal is to experiment with control, Scheme comes to mind. This is our representation of the tree:
Now I provide an easy example for doing the pre-order visit "functionally". The idea is to provide a function taking a callback function that is called on each element of the tree when it is visited. The function also returns the return values of the callback function (in reverse order); anyway, use display as a callback function and we have our familiar example:
Here we did not use continuations. In the following example, I use one continuation to save the state of the computation so that it can be resumed (after the yield), and another continuation to jump out (simulating Python's yield statement). I think this code should be written more clearly; I'm no expert with continuations (I usually find them too general), so suggestions are welcome.
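For readers more at home in Python than in Scheme, the callback idea can be sketched as follows. This is a transliteration of the idea rather than of the Scheme listing: the names preorder_each and record and the nested-list tree representation are assumptions.

```python
def preorder_each(tree, callback):
    """Call callback on every value in pre-order.

    Returns the callback results in reverse order of visit, mimicking
    an accumulator that is built by consing onto the front of a list.
    """
    results = []
    stack = [tree] if tree else []
    while stack:
        node = stack.pop()
        results.insert(0, callback(node[0]))  # prepend, hence reverse order
        stack.extend(reversed(node[1:]))
    return results


visited = []


def record(value):
    visited.append(value)
    return value * 10


tree = [1, [2, [3], [4]], [5, [6]]]
results = preorder_each(tree, record)
```

Passing print as the callback plays the role that display plays in the Scheme version.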
When generator is called, it returns a function that, every time it is called, returns the next element. It behaves almost like an iterator and has the structure of Python generator functions. Boilerplate code could be avoided using a macro. Essentially, lines 2-6 are boilerplate and so are lines 11-13 and 18. The difference with a "java like" iterator is that there is no way to test whether there is a "next" element (no hasNext method, closures are objects with just one method! ;)) and, once the tree is exhausted, it returns null forever. So the implicit protocol is "do not put null objects in the tree", and if you get null, then you are done. Other strategies could be used, e.g., instead of returning the value, a list containing the value is returned; an empty list means that the iteration completed; a list containing an empty list means that the tree contained a null valued node.
We could have used continuation passing style, and that would perhaps have been more similar (even though far more complex to understand and write) to the corresponding Clojure way to do it.
In the comments a different Scheme continuation based solution has been suggested by Marco. It is far more elegant than my original solution, but is structurally different from the Python code of the sections above. In fact, it does not use an explicit stack to manage the tree traversal (as both the Python version and my Scheme version do), but relies on more recursive calls, essentially using the implicit call stack.
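The alternative protocol (wrap each value in a list) can be sketched in Python. Python has no call/cc, so this is a plain closure over an explicit stack and only illustrates the calling protocol, not the continuation machinery; make_stepper and step are hypothetical names, and None stands in for null.

```python
def make_stepper(tree):
    """Return a zero-argument function that yields one wrapped value per call."""
    stack = [tree] if tree else []

    def step():
        if not stack:
            return []          # empty list: the iteration is complete
        node = stack.pop()
        stack.extend(reversed(node[1:]))
        return [node[0]]       # one-element list: a value, possibly None

    return step


tree = [1, [None], [3]]        # the second node deliberately holds None
step = make_stepper(tree)
results = [step() for _ in range(5)]
```

With this protocol a None stored in the tree ([None]) can be told apart from the end of the iteration ([]).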
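The contrast between "explicit stack" and "implicit call stack" can also be shown in Python. The sketch below is not Marco's code; it is a recursive generator (the name preorder_recursive and the nested-list tree are assumptions) in which the pending work lives in the chain of suspended generator frames instead of a list.

```python
def preorder_recursive(tree):
    """Pre-order traversal that relies on recursion instead of a stack."""
    if not tree:
        return
    yield tree[0]
    for child in tree[1:]:
        # Each child gets its own generator; yield from forwards its values.
        yield from preorder_recursive(child)


tree = [1, [2, [3], [4]], [5, [6]]]
visited = list(preorder_recursive(tree))
```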
What about lazy-seq?
Essentially we argued that iterators are needed because they are efficient and more general. Indeed in languages such as Python, generators are an easier way to write iterators. What about Clojure? The example at the very beginning is not the way things should be done in Clojure. On the other hand the iterator in Scheme would be a good API for Clojure as well. In fact it is the very same pattern that we have with map (Haskell programmers here may want to extend the discussion on "mapping" on different algebraic types, such as Trees, and not only Lists).
Imperative iterators are just a design pattern. Call/cc is a feature so powerful you can use it to build almost every other control abstraction. Yield can be used to implement coroutines (which could also be implemented with call/cc). And what about lazy-seq? Lazy-seq is nothing like that. Many Lisp books show a simple example where macros are used to implement lazy evaluation (or at least lazy evaluation of lists, which is what matters most). lazy-seq basically does that. The difference is that other Lisps are usually not built around the idea of lazy sequences the way Clojure is.
Consequently, most libraries and APIs do not work (or do not play well) with lazy evaluation. They could be made to work, but it is simply outside Common Lisp pragmatics, and there is not anything wrong with that.
But back to lazy-seq... why is it here? Consider the code below.
I want to point out the structure of the tree-seq-aux function. Indeed, the function is not recursive. It simply returns a lazy-seq. The second argument of the cons cell we put in the lazy-seq is another call to tree-seq-aux. However, this is not recursion: when the lazy-seq is evaluated, we have already returned from the previous call.
But things are even more interesting than this. That cons cell is somewhat similar to the iterator we saw at the very beginning. Its car (first) is the current value. It is "next". And what is its cdr (rest)? Suppose that our iterator was not stateful. Suppose instead that calling next() returns both the current value and another iterator which will provide the successive values. Well... basically it is our cons cell.
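The "not really recursive" structure can be imitated in Python with a thunk. This is a rough sketch, not the Clojure code: lazy_cons and take_all are hypothetical helpers, tree_seq_aux here takes a tuple of pending subtrees (an assumed signature), and unlike Clojure's lazy-seq nothing is cached.

```python
def lazy_cons(first, rest_thunk):
    """A cons cell whose rest is a zero-argument function, run on demand."""
    return (first, rest_thunk)


def tree_seq_aux(stack):
    # stack: immutable tuple of subtrees still to visit ([value, child, ...]).
    if not stack:
        return None  # the empty sequence
    node, remaining = stack[0], stack[1:]
    new_stack = tuple(node[1:]) + remaining
    # The inner call to tree_seq_aux is only wrapped here, not executed:
    # by the time the thunk runs, this call has already returned.
    return lazy_cons(node[0], lambda: tree_seq_aux(new_stack))


def take_all(cell):
    """Walk the lazy sequence, forcing one cell at a time."""
    out = []
    while cell is not None:
        first, rest_thunk = cell
        out.append(first)
        cell = rest_thunk()
    return out


tree = [1, [2, [3], [4]], [5, [6]]]
head = tree_seq_aux((tree,))
visited = take_all(head)
```

Since each thunk is forced from the loop in take_all, the call depth stays constant no matter how long the sequence is.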
Pair<Value, ConstIterator<Value>> v = it.next();
That is essentially our cons cell. But in fact we somehow have the same pattern we had in the continuation based example. We "save" what to do next (akin to the jump continuation) in the cdr of the cons cell; however, we do not have to evaluate it right now, thanks to lazy evaluation. As I already said, if we used continuation passing style, we would have placed the recursive call which we put in the lazy-seq cons cdr in the explicit continuation.
The advantage is that it is easier to write, easier to understand, easier to use. Unfortunately it is much more specific and less powerful. But I don't mind practical tools, when they have the elegance lazy-seq has.
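The stateless iterator hinted at by the Java line can be sketched in Python as well. The class name ConstIterator is taken from that line; everything else (the has_next/next methods, the tuple of pending subtrees, the nested-list tree) is an assumption made for illustration.

```python
class ConstIterator:
    """An iterator that is never mutated: next() returns a new iterator."""

    def __init__(self, stack):
        self._stack = tuple(stack)  # pending subtrees, never modified

    def has_next(self):
        return bool(self._stack)

    def next(self):
        # Returns the pair (current value, iterator over the remaining values).
        node, remaining = self._stack[0], self._stack[1:]
        return node[0], ConstIterator(tuple(node[1:]) + remaining)


tree = [1, [2, [3], [4]], [5, [6]]]
it = ConstIterator([tree])
value_a, rest_a = it.next()
value_b, rest_b = it.next()   # asking the same iterator again gives the same answer
second, _ = rest_a.next()

# Draining the whole tree by threading the returned iterators along.
visited = []
cursor = it
while cursor.has_next():
    value, cursor = cursor.next()
    visited.append(value)
```

Because it is never modified, any intermediate iterator can be kept and replayed later, which is exactly what holding on to a cons cell allows.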