Wednesday, December 19, 2007

iterate revisited

I am going to talk a bit about the iterate statement in this post. If you don't know what it is, look at the web page behind the hyperlink.

The iterate statement was derived from the FOR statement from PASCAL and BASIC (and their derivatives). I had to generalize it in order to allow it being used with user-defined types instead of just numbers; my intention is to have it support the behavior of 'foreach' in some other languages. So I came up with some scheme to support the use of iterators. I defined some operators that need to be supported by the types used in an iterate statement. For example:

The variable should be initializable by the from-expression. The variable should support increment when using "step ++". The variable must be able to be compared to the to-expression. In an "iterate v : t from x to y step z" the compiler would first compare x and y, to see which one is the largest value. If x <= y then the step would be considered to be an incremental value, and otherwise a decrementing value. And there are operators needed that can decide when the to-value is reached for both cases; so you would need comparisons v <= y (for an incrementing step) and y<= v (for a decrementing step) to be available. But this scheme is not good enough. When I came up with this I assumed the use of a "begin" and "end" iterator, where the "end" iterator should point at the last element of the collection. In C++ the 'end' iterator points one past the last element. I thought that my change would not be a problem, but I forgot to think of what would happen when iterating over an empty collection. Such a collection does not have a first or last element iterator!

So I will have to change the current way of doing things. One possible way is having the 'end' iterator should be done as in C++, and point to one past the last element. This means an iterate will look as follows:
iterate v : c.iterator from c.begin to --c.end
# do something
It would require the programmer to remember using a "--" in front of the "end" iterator, because the "end" iterator does not point to a valid element in the collection. The iterator also must support the predecessor operator, but what does it return when used on the "end" iterator in an empty collection?

Another possibility is having iterate extract some other info from the collection before starting the loop. It could first query the iterators for the instance of the collection class they belong to (and both iterators must return the same value) and then ask the collection class whether it's non-empty and skip the loop if it's not. But this would require allowing invalid iterators returned from the collection in order to return their collection object when the collection is empty.

And a third option would be to specify the collection object explicitly in the iterate statement itself. Something like:
iterate i in c
# do something for each element

# - or -

iterate i in c from someStartIterator # optional "to" or "step" omitted here
# do something for each element, starting from some given iterator
This would need an extra keyword "in" (already silently supported by the latest compilers), and some predefined methods in the collection, required by the iterate statement for determining the non-emptyness of the collection and its start and end values. It would not change the existing iterate behaviour but only extend it.

I don't know yet what I'll choose as the solution. Maybe you'll see the change as it appears in a new compiler release, or maybe I'll write about it before I implement it.

No comments: