Wednesday, February 25, 2009

changing the 'iterate' statement

I have been thinking about changing the iterate statement for some time now, and I've written about it before. Well, I have decided to finally do something about it, because the current 'iterate' is useless to me at this time (I cannot implement it in my llvm-new branch of the compiler if I know that the statement will change completely anyway).

To be honest, I still haven't completely figured out a complete solution yet. So I will provide a temporary solution to get a usable 'iterate' statement for now, in a way that will (hopefully) be compatible with the final solution. What kind of final solution am I looking for? It will involve some kind of range type, likely combined with an iterator concept. Iterators will be needed soon for the new interface of the 'string' type.

The new syntax is as follows:
iterate variable in [begin .. end]
This syntax is a hardcoded version of something that the new range feature will provide. I have intentionally kept it very simple; it does not provide stepping or combining multiple iterates into one. A simple example of how it can be used:
procedure fac(n : nat) : nat
var f : nat = 1

iterate i in [2 .. ++n]
f *= i

return f
As you can see, there is an important semantical difference with the old syntax: the end specification is no longer inclusive. Another important difference is that the iteration process will be strictly incremental, so the body will not be executed at all if begin is not smaller than end. When iterating an index variable through the elements of a container, you will need to use the size of the container in end:
iterate i in [0 .. v.size]
I will try to implement the new syntax and semantics as soon as possible in the front end (trunk), and then add codegen for it in the llvm-new branch.

Sunday, February 15, 2009

Website moved again

My website has moved again. The new location:

And the pages about Hyper are here:

I hope it's the last time I need to move my website. Having to update all links to it is not that much fun really.

Saturday, February 07, 2009

Having "pure" functions

I have been thinking on this new feature for some time, but I'm not sure if it's a good idea. Just in case, the keyword for it has already been reserved in the 0.4.2 release of the compiler front end.

The idea is to introduce "pure" functions. Any procedure that is qualified as "pure" would guarantee to have no side effects and to have it's result only depend on the values of its parameters. The main purpose of this is to enable extra optimizations by the compiler, and to guarantee that some functions that do not need side-effects will not have them. So if you have some container class, its "count()" procedure (that returns the number of elements in the container) can be qualified as "pure" because it doesn't (and shouldn't) alter the state of the container, and because as long as the container is not modified, it will keep returning the same value over and over again. This implies that we consider the "this" pointer an implicit parameter for each non-static procedure, and that "pure" is a more restrictive version of the "const" qualifier for procedures. If you need a "pure" procedure that doesn't depend on its hidden "this" parameter, then you should also qualify it as "static".

Now the implications of this new feature:
  1. Because "pure" is actually a more restrictive version of "const", you can have a normal and a "pure" version of the same procedure, just like you can put a normal and a "const" procedure with the same parameters in a class.
  2. You can override a "const" procedure with a "pure" one.
  3. You can only override a "pure" procedure with a "pure" one.
  4. The compiler needs to check if a "pure" procedure really has no side-effects and if it only uses its parameters to calculate the result (!).
Of those four implications, the last two have a large impact.

The third one could make things a bit more difficult than they are now. If you design a class that will be inherited from, you will have to be careful that any procedure you mark as "pure" will never have to be overridden with one that isn't pure. Could be trivial for functions like "count()", but for other cases it might be a (very) difficult decision.

The fourth one might even be a show-stopper for including the feature. If the compiler needs to check that a procedure is really "pure", then it would need to put restrictions on each statement and expression used in that procedure. You would not be allowed to access any class fields, from the same class or from any other class. All expressions used should be pure as well. Any assignment statements should only modify local variables. If you declare variables in the procedure, their constructors should also be pure (no access to outside data and no side-effects elsewhere). It would be possible to have the compiler check those things, but it would put a major burden on the programmer. He would need to have "pure" in his mind everytime he writes a constructor for a class, because it might need to be called at some point somewhere in a "pure" procedure. He would need to mark all (or most) of his user-defined operators as "pure", unless I make "pure" the default or even a requirement for all binary (and unary?) operators. All these things would cause the programmer's code to be filled with things marked as "pure", just in case. Most programmers would probably just avoid "pure" entirely because it makes things too difficult.

So I'm unsure wether or not to include the feature. It could certainly be useful. But would it be worth the effort?

Another possibility is to use "pure" just as an attribute that informs the compiler about potential optimisations. But in that case someone could mark a procedure as "pure" while it does have side-effects.

At this time, I am still thinking about the feature. Feedback is welcome, as usual.

Monday, February 02, 2009

Compiler development update

So here's an update on how things went after my previous post ("Compiler development roadmap").

After the "typerefactor" branch was more or less finished, I tried to merge it into (a copy of) the "llvm" branch. But the merge didn't work, so I decided to do things in a different way.

I created a completely new branch called "llvm-new", starting from the code of the "typerefactor" branch. Then I added the LLVM 2.4 sources to it, using the CMake build system from LLVM itself. To get the thing completely compiled and working I had to update to a SVN version of the LLVM code, though. And then I started writing my LLVM back end from scratch.

Wanting to avoid the mistakes of the first "llvm" branch, I started immediately on the implementation of a complete mechanism to manage types and to do type conversions. This means dealing with real class types, that are passed as a (this-)pointer, with primitive types that are passed as a value directly, passing parameters by reference or by value, passing values to "inout" parameters, referencing/dereferencing, indirectly returning values (using a extra hidden function parameter), etc... Eventually this implementation turned out pretty good, so it's no longer a PITA to write back end code like passing a value to a parameter.

Now I'm working on the implementation of codegen for all types (classes actually), expressions and statements. For most things implementation is easy, because I can look at the old implementation in the "llvm" branch and port that code to the new way of doing things. This new way is not just the typepassing/-conversion infrastructure; I've also changed the way I emit instructions. Previously I created the LLVM IR directly, but now I use the IRBuilder utility class from LLVM. And I now have a utility class that makes it a lot easier to emit basic blocks and branches to them.

So things are going forward slowly. I think that my "llvm-new" branch now has about 75% of the codegen functionality that was in the "llvm" branch. But things are still primitive; the compiler spits out lot of debug output followed by LLVM IR. You can try it if you want to, the code is available on the Launchpad project page:

So most of my time is currently spent on the "llvm-new" branch. But that doesn't mean that the other branches are dead, however. The "typerefactor" branch now acts as a merge bridge between "llvm-new" and "main" (the 'official' front end branch). All front end related changes in "llvm-new" are regularly merged into "typerefactor". And those improvements are then merged into the "main" branch. The reason for doing things this way is that I like to keep the differences between "llvm-new" and "typerefactor" (in their front end code, at least) as small as possible. And now I can do some more or less invasive changes in the "main" branch without worrying about breaking the back end code.

As you might have noticed, progress has been going rather slowly these days. That's because I found a job as a software developer, using .NET (C# and VB). But at home it's still only Linux (Gentoo) for me. I'd prefer not to depend on Microsoft personally, but I need something to pay the bills of course.