Saturday, February 07, 2009

Having "pure" functions

I have been thinking on this new feature for some time, but I'm not sure if it's a good idea. Just in case, the keyword for it has already been reserved in the 0.4.2 release of the compiler front end.

The idea is to introduce "pure" functions. Any procedure that is qualified as "pure" would guarantee to have no side effects and to have it's result only depend on the values of its parameters. The main purpose of this is to enable extra optimizations by the compiler, and to guarantee that some functions that do not need side-effects will not have them. So if you have some container class, its "count()" procedure (that returns the number of elements in the container) can be qualified as "pure" because it doesn't (and shouldn't) alter the state of the container, and because as long as the container is not modified, it will keep returning the same value over and over again. This implies that we consider the "this" pointer an implicit parameter for each non-static procedure, and that "pure" is a more restrictive version of the "const" qualifier for procedures. If you need a "pure" procedure that doesn't depend on its hidden "this" parameter, then you should also qualify it as "static".

Now the implications of this new feature:
  1. Because "pure" is actually a more restrictive version of "const", you can have a normal and a "pure" version of the same procedure, just like you can put a normal and a "const" procedure with the same parameters in a class.
  2. You can override a "const" procedure with a "pure" one.
  3. You can only override a "pure" procedure with a "pure" one.
  4. The compiler needs to check if a "pure" procedure really has no side-effects and if it only uses its parameters to calculate the result (!).
Of those four implications, the last two have a large impact.

The third one could make things a bit more difficult than they are now. If you design a class that will be inherited from, you will have to be careful that any procedure you mark as "pure" will never have to be overridden with one that isn't pure. Could be trivial for functions like "count()", but for other cases it might be a (very) difficult decision.

The fourth one might even be a show-stopper for including the feature. If the compiler needs to check that a procedure is really "pure", then it would need to put restrictions on each statement and expression used in that procedure. You would not be allowed to access any class fields, from the same class or from any other class. All expressions used should be pure as well. Any assignment statements should only modify local variables. If you declare variables in the procedure, their constructors should also be pure (no access to outside data and no side-effects elsewhere). It would be possible to have the compiler check those things, but it would put a major burden on the programmer. He would need to have "pure" in his mind everytime he writes a constructor for a class, because it might need to be called at some point somewhere in a "pure" procedure. He would need to mark all (or most) of his user-defined operators as "pure", unless I make "pure" the default or even a requirement for all binary (and unary?) operators. All these things would cause the programmer's code to be filled with things marked as "pure", just in case. Most programmers would probably just avoid "pure" entirely because it makes things too difficult.

So I'm unsure wether or not to include the feature. It could certainly be useful. But would it be worth the effort?

Another possibility is to use "pure" just as an attribute that informs the compiler about potential optimisations. But in that case someone could mark a procedure as "pure" while it does have side-effects.

At this time, I am still thinking about the feature. Feedback is welcome, as usual.

No comments: