Sunday, October 26, 2008

Compiler development roadmap

You probably already know that there are two important versions of the compiler in development: the "main" branch with only a front-end (called "trunk" on Launchpad), and the "llvm" branch with an experimental LLVM-based back-end. Their source code can be found on the Launchpad project page:

https://code.launchpad.net/hyper

That page contains another development branch as well, called "typerefactor". This branch is a heavily modified version of the main branch with the purpose of refactoring the handling of types and type conversions. I started the typerefactor branch because the code of the llvm branch has become a bit difficult and fragile; the typerefactor branch will improve the front-end part of the compiler so that developing the LLVM back-end will be much easier.

Another thing on my TO-DO list is upgrading the back-end to LLVM 2.4. This means that I will have to replace my custom CMake build system for LLVM with the official one. Yes, you read that correctly: LLVM now has an (experimental) CMake-based build system of its own. I posted my code for building LLVM with CMake on the LLVM mailing list quite a while ago, and someone has now used that code as the starting point for a real CMake build system for LLVM (mine was very Unix-oriented and did just enough to use LLVM with my compiler front-end).

So currently I'm planning to do the following:
  1. Finish the typerefactor code.
  2. Create another branch, "llvm-experimental", based on the llvm branch, and merge the typerefactor code into it. It's possible that this will require some extra changes to the front-end code, and those will be merged into the typerefactor branch again.
  3. Upgrade one of the llvm branches to LLVM 2.4. Which branch I use will depend on how long it takes to finish item 2 and on how difficult the build system transition turns out to be.
  4. When item 2 is done, merge the typerefactor branch into the main branch.
  5. When items 2 and 3 are done, merge the llvm-experimental branch into the llvm branch.
  6. Implement codegen for everything the front-end currently supports (in the llvm branch).
  7. Merge the llvm branch into the main branch.
This is what I have in mind for the future, but there is no guarantee that development will actually follow this roadmap. And it doesn't account for any front-end-only changes that I might make in the meantime.

Some things that will need to be done to the front-end:
  • Add support for 'references' (or restricted pointers)
  • Create a Unicode interface for the "string" type (no more random access)
  • Change the handling of operator overloading for binary operators (no more global list)
Other things are on my possible TO-DO list as well, such as adding support for "pure" procedures, but that feature will need to be thought out (and published here) first.

Friday, September 19, 2008

Launchpad

As the title of this posting says, I have registered a project on Launchpad for my programming language Hyper:

https://launchpad.net/hyper

The main website for the language is still here:

http://users.edpnet.be/hyperquantum/hyper/

Why Launchpad instead of, say, SourceForge? Because Launchpad is one of the few websites that support the Bazaar version control system. So I can easily upload the code to Launchpad with a simple "bzr push", and anyone can easily branch from it and make their own changes. And it's always nice to have an extra backup of my code :)

The project will probably slow down in the near future, as I have now graduated (master's degree in computer science) and am looking for a job.

Sunday, June 15, 2008

About the 0.4.0 release and the back end

The compiler had a new release last month, with some big improvements. So the version number was not increased to 0.3.39, but to 0.4.0.

I did a major internal restructuring of the compiler's semantic checking, as I wrote earlier. These changes allow me to detect circular dependencies between classes, and they fixed some nasty bugs. I no longer keep a separate symbol table; the symbol names and symbol lookup are now embedded directly in the AST data structures.
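Detecting circular dependencies between classes is, in general terms, cycle detection in a directed graph. The post doesn't describe how the Hyper compiler does it, so the following is only a minimal Python sketch of one common technique (depth-first search with three node states), assuming each class simply maps to a list of the class names it depends on. The name `find_cycle` and the input format are hypothetical, and which relationships count as a "dependency" is left open here.

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    UNVISITED = 0
    IN_PROGRESS = 1  # node is on the current DFS path
    DONE = 2         # node and everything reachable from it are cycle-free


def find_cycle(deps: dict[str, list[str]]) -> list[str] | None:
    """Return one dependency cycle as a list of class names, or None.

    Hypothetical sketch: `deps` maps a class name to the names it depends on.
    Names that appear only as dependencies are treated as having no deps.
    """
    state = {name: State.UNVISITED for name in deps}
    path: list[str] = []

    def visit(name: str) -> list[str] | None:
        state[name] = State.IN_PROGRESS
        path.append(name)
        for dep in deps.get(name, []):
            s = state.get(dep, State.UNVISITED)
            if s is State.IN_PROGRESS:
                # Back edge: the cycle is the part of the path starting at dep.
                return path[path.index(dep):] + [dep]
            if s is State.UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[name] = State.DONE
        return None

    for name in deps:
        if state[name] is State.UNVISITED:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


# Usage: A depends on B, B on C, and C back on A.
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
```

The three states matter: a dependency that is already DONE was fully explored without finding a cycle, so reaching it again (for example through two different paths) is not an error, while reaching a node that is still IN_PROGRESS means the search has looped back onto its own path.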

And the compiler now finally has full Unicode support! Source files are assumed to be in UTF-8, because that is the only encoding supported at this time. UTF-8 also has the advantage of being a superset of ASCII, so you can open a source file in a dumb text editor that knows nothing about Unicode and still get readable source; only the special non-ASCII characters won't be displayed correctly. Other encodings like UTF-16 are on my TO-DO list, but do not have a very high priority. And though the compiler now supports Unicode, the language itself still needs to be partially adapted to it. The string type, for instance, will need a new interface because it will use UTF-8, where a character takes one to four bytes, so indexing by position no longer gives you the n-th character. The random access (array) operator needs to go, and a replacement mechanism must be provided for iterating over the characters in the string.
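To illustrate why byte indexing breaks down in UTF-8 and what sequential iteration looks like instead, here is a minimal Python sketch. It is not Hyper's planned string interface (which isn't designed yet); the function name `utf8_code_points` is hypothetical, and the decoder assumes well-formed input apart from a couple of basic checks.

```python
def utf8_code_points(data: bytes):
    """Yield each character (code point) of a UTF-8 byte string in order.

    Hypothetical sketch: rejects bad lead bytes and truncated sequences,
    but does not check continuation bytes, overlong forms, or surrogates.
    """
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: 1 byte (plain ASCII)
            length, cp = 1, lead
        elif lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
            length, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)  # add 6 payload bits per continuation byte
        yield chr(cp)
        i += length


text = "naïve €".encode("utf-8")
print(len(text))                      # 10 bytes, but only 7 characters
print(hex(text[2]))                   # 0xc3: first byte of "ï", not a whole character
print(list(utf8_code_points(text)))   # the 7 characters, one at a time
```

Finding the n-th character this way means walking from the start of the string, which is why an iterator is a more honest interface for UTF-8 strings than an O(1)-looking array operator.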

The compiler with back end has gotten some of my attention as well. It has inherited the new features from the 0.4.0 release and can do LLVM code generation for most expressions and statements, as long as they involve only the built-in types. User-defined classes, strings, floating-point types, and the 'iterate' statement are not supported yet. I have added support for printing single characters to stdout, so I wouldn't have to wait for full string support before I could see simple test programs working. The next program I would like to be able to compile is the eight-queens program.

My efforts will now go mostly into the compiler back end. I would like to have complete code generation for everything the front end currently supports, and then I can merge the llvm branch into the main branch so that the main release is no longer front-end only. But I guess I will need to implement a solution for the iterate problem first.

The compiler sources are getting pretty big now. The main branch now has almost 44000 lines of code, and the llvm branch has about 49000 lines (not counting the LLVM sources, of course). That's a simple count, including header files, blank lines and comments. It's still quite impressive to me, though. My biggest project ever :-)