Sunday, October 26, 2008

Compiler development roadmap

You probably already know that there are two important versions of the compiler in development: the "main" branch with only a front-end (called "trunk" on Launchpad), and the "llvm" branch with an experimental LLVM-based back-end. Their source code can be found on the Launchpad project page:

https://code.launchpad.net/hyper

That page contains another development branch as well, called "typerefactor". This branch is a heavily modified version of the main branch with the purpose of refactoring the handling of types and type conversions. I started the typerefactor branch because the code of the llvm branch has become a bit difficult and fragile; the typerefactor branch will improve the front-end part of the compiler so that developing the LLVM back-end will be much easier.

Another thing on my TO-DO list is upgrading the back-end to LLVM 2.4. This means that I will have to replace my custom CMake build system for LLVM with the official one. Yes, you read that correctly: LLVM now has an (experimental) CMake-based build system of its own. I posted my code for building LLVM with CMake on the LLVM mailing list quite a while ago, and someone has since used that code as a starting point for a real CMake build system for LLVM (mine was very Unix-oriented and did just enough to use LLVM with my compiler front-end).

So currently I'm planning to do the following:
  1. Finish the typerefactor code.
  2. Create another branch, "llvm-experimental", based on the llvm branch, and merge the typerefactor code into it. It's possible that this will require some extra changes to the front-end code, and those will be merged into the typerefactor branch again.
  3. Upgrade one of the llvm branches to LLVM 2.4. Which branch I use will depend on how long it takes to finish item 2 and on how difficult the build system transition turns out to be.
  4. When item 2 is done, merge the typerefactor branch into the main branch.
  5. When items 2 and 3 are done, merge the llvm-experimental branch into the llvm branch.
  6. Implement codegen for everything the front-end currently supports (in the llvm branch).
  7. Merge the llvm branch into the main branch.
This is what I have in mind for the future, but there is no guarantee that development will actually follow this roadmap. It also doesn't account for any front-end-only changes that I might make in the meantime.
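
The ordering constraints in the roadmap can be written down as a small dependency graph. The Python sketch below is only an illustration and not part of the compiler: the names `DEPENDS_ON`, `roadmap_order` and `respects_dependencies` are made up for this example. The dependencies of items 2, 4 and 5 are taken from the list above; those of items 6 and 7 are my reading of the list order and should be treated as assumptions. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Roadmap item -> items that must be finished first.
DEPENDS_ON = {
    1: set(),    # finish the typerefactor code
    2: {1},      # create llvm-experimental and merge typerefactor into it
    3: set(),    # upgrade to LLVM 2.4 (can start on either llvm branch)
    4: {2},      # merge typerefactor into main ("when item 2 is done")
    5: {2, 3},   # merge llvm-experimental into llvm ("when items 2 and 3 are done")
    6: {5},      # codegen in the llvm branch (assumption: after item 5)
    7: {4, 6},   # merge llvm into main (assumption: after items 4 and 6)
}


def roadmap_order(depends_on):
    """Return one valid order in which the roadmap items can be carried out."""
    return list(TopologicalSorter(depends_on).static_order())


def respects_dependencies(order, depends_on):
    """Check that every item comes after all the items it depends on."""
    position = {item: index for index, item in enumerate(order)}
    return all(
        position[dep] < position[item]
        for item, deps in depends_on.items()
        for dep in deps
    )


if __name__ == "__main__":
    order = roadmap_order(DEPENDS_ON)
    print("One possible order:", order)
    print("Valid:", respects_dependencies(order, DEPENDS_ON))
```

Several orders are valid. Item 3, for example, has no prerequisites here, which matches the remark that the LLVM 2.4 upgrade may happen on whichever llvm branch is convenient at the time.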

Some things that will need to be done to the front-end:
  • Add support for 'references' (or restricted pointers)
  • Create a Unicode interface for the "string" type (no more random access)
  • Change the handling of operator overloading for binary operators (no more global list)
Other things are on my possible TO-DO list as well, such as adding support for "pure" procedures, but that feature will need to be thought out (and published here) first.