Monday, February 02, 2009

Compiler development update

So here's an update on how things went after my previous post ("Compiler development roadmap").

After the "typerefactor" branch was more or less finished, I tried to merge it into (a copy of) the "llvm" branch. But the merge didn't work, so I decided to do things in a different way.

I created a completely new branch called "llvm-new", starting from the code of the "typerefactor" branch. Then I added the LLVM 2.4 sources to it, using the CMake build system from LLVM itself. To get everything to compile and work I had to update to an SVN version of the LLVM code, though. And then I started writing my LLVM back end from scratch.

Wanting to avoid the mistakes of the first "llvm" branch, I started immediately on the implementation of a complete mechanism to manage types and to do type conversions. This means dealing with real class types, which are passed as a (this-)pointer, with primitive types, which are passed directly as a value, passing parameters by reference or by value, passing values to "inout" parameters, referencing/dereferencing, indirectly returning values (using an extra hidden function parameter), etc. Eventually this implementation turned out pretty good, so it's no longer a PITA to write back end code for things like passing a value to a parameter.
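To make the passing rules above a bit more concrete, here is a minimal, self-contained C++ sketch of how a lowered function signature could be derived. It is not the code from the "llvm-new" branch: the names `Type`, `Param`, `lowerParam` and `lowerSignature` are hypothetical, the output is plain text rather than LLVM types, and the rules are simplified (class types and "inout" parameters become pointers, primitives stay values, and a class-typed return value becomes a hidden first parameter).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical front-end view of a type; all names here are illustrative.
enum class TypeKind { Primitive, Class };

struct Type {
    std::string name;  // e.g. "int" or "Foo"
    TypeKind kind;
};

enum class ParamMode { In, InOut };

struct Param {
    Type type;
    ParamMode mode;
};

// Lowered spelling of one parameter: class types and "inout"
// parameters are passed as a pointer, everything else by value.
std::string lowerParam(const Param& p) {
    bool byPointer = p.type.kind == TypeKind::Class ||
                     p.mode == ParamMode::InOut;
    return byPointer ? p.type.name + "*" : p.type.name;
}

// Lowered signature as text. A class-typed result is returned indirectly:
// the function becomes void and gets a hidden first pointer parameter.
std::string lowerSignature(const std::string& fn, const Type& ret,
                           const std::vector<Param>& params) {
    std::vector<std::string> lowered;
    std::string retName = ret.name;
    if (ret.kind == TypeKind::Class) {
        lowered.push_back(ret.name + "*");  // hidden result parameter
        retName = "void";
    }
    for (const Param& p : params) {
        lowered.push_back(lowerParam(p));
    }
    std::string out = retName + " " + fn + "(";
    for (std::size_t i = 0; i < lowered.size(); ++i) {
        if (i != 0) out += ", ";
        out += lowered[i];
    }
    return out + ")";
}
```

With these assumed rules, a function `make` that takes an `int` and returns a class `Foo` would be lowered to `void make(Foo*, int)`. Keeping the decision in one place like this is the point: the rest of the back end can ask how a value is passed instead of re-deriving it at every call site.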

Now I'm working on the implementation of codegen for all types (classes actually), expressions and statements. For most things implementation is easy, because I can look at the old implementation in the "llvm" branch and port that code to the new way of doing things. This new way is not just the typepassing/-conversion infrastructure; I've also changed the way I emit instructions. Previously I created the LLVM IR directly, but now I use the IRBuilder utility class from LLVM. And I now have a utility class that makes it a lot easier to emit basic blocks and branches to them.
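As a rough illustration of what a helper for emitting basic blocks and branches might look like, here is a toy C++ sketch. It is an assumption-laden stand-in, not the utility class from the branch: `BlockEmitter` and its methods are hypothetical names, and it writes textual IR lines into a vector so that it stays self-contained. A real back end would hold LLVM's IRBuilder and basic block objects instead of strings.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a block-emitting helper (hypothetical design).
class BlockEmitter {
public:
    // Reserves a unique label; nothing is emitted yet.
    std::string newBlock(const std::string& hint) {
        return hint + std::to_string(counter_++);
    }

    // Opens the block with the given label for emission.
    void begin(const std::string& label) {
        lines_.push_back(label + ":");
        terminated_ = false;
    }

    // Appends an instruction; ignored if no block is open or the
    // current block already ended with a branch.
    void emit(const std::string& instr) {
        if (!terminated_) lines_.push_back("  " + instr);
    }

    // Emits an unconditional branch, unless the block is already terminated.
    void branchTo(const std::string& label) {
        if (terminated_) return;
        lines_.push_back("  br label %" + label);
        terminated_ = true;
    }

    // Emits a conditional branch on an i1 value, with the same guard.
    void branchIf(const std::string& cond, const std::string& thenLabel,
                  const std::string& elseLabel) {
        if (terminated_) return;
        lines_.push_back("  br i1 " + cond + ", label %" + thenLabel +
                         ", label %" + elseLabel);
        terminated_ = true;
    }

    const std::vector<std::string>& lines() const { return lines_; }

private:
    std::vector<std::string> lines_;
    int counter_ = 0;
    bool terminated_ = true;  // true while no block is open
};
```

The guard on `terminated_` is the design choice worth noting: codegen for statements like `if` or `return` can then call `branchTo` unconditionally at the end of a block without first checking whether an earlier statement already ended it.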

So things are going forward slowly. I think that my "llvm-new" branch now has about 75% of the codegen functionality that was in the "llvm" branch. But things are still primitive; the compiler spits out a lot of debug output followed by LLVM IR. You can try it if you want to; the code is available on the Launchpad project page.

So most of my time is currently spent on the "llvm-new" branch. But that doesn't mean that the other branches are dead. The "typerefactor" branch now acts as a merge bridge between "llvm-new" and "main" (the 'official' front end branch). All front end related changes in "llvm-new" are regularly merged into "typerefactor", and those improvements are then merged into the "main" branch. The reason for doing things this way is that I like to keep the differences between "llvm-new" and "typerefactor" (in their front end code, at least) as small as possible. And now I can make more or less invasive changes in the "main" branch without worrying about breaking the back end code.

As you might have noticed, progress has been going rather slowly these days. That's because I found a job as a software developer, using .NET (C# and VB). But at home it's still only Linux (Gentoo) for me. I'd prefer not to depend on Microsoft personally, but I need something to pay the bills of course.
