Saturday, September 30, 2006

Restricted pointers and heap classes

I think I have found a solution for the pointer issues I described in my last writing. This solution is called 'restricted pointers'. The idea is to introduce a new kind of pointer type. The syntax is not fixed yet, but I think I will use an ampersand ('&'), as it resembles the 'reference' from C++.

The 'normal' pointer type that Hyper already has resembles the pointers from C++; they can point to things just about anywhere. The new restricted pointer type will be used primarily to point to stack objects. It should be impossible to copy such a restricted pointer outside of the stack that carries the object it points to. So you will not be able to use a restricted pointer in a class field, or pass it to code on another thread. Also, a restricted pointer will be guaranteed not to be null. If you know C++ you will see that this looks a lot like C++ references. But restricted pointers are not references: a variable of a restricted pointer type can be changed to point to another thing, and that is not possible with references in C++.

So what's the use of this new pointer type? It will be used to guard things on the stack. You will get a restricted pointer to a stack variable and you will not be able to carry that pointer outside of the lifespan of that stack variable. If the compiler auto-references a value on the stack (or a literal), the result will have a restricted pointer type:

var x : & const byte = 29 # 'x' is a restricted pointer

Of course a normal pointer will be convertible to a restricted pointer; the conversion just restricts what can be done with the pointer (it will not be storable outside the stack). But this conversion will require the insertion of a runtime check for null, because normal pointers can be null and restricted ones cannot. A conversion from a restricted to a normal pointer will not be allowed.
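For readers who think in C++, the conversion rules above can be roughly modelled with a small wrapper class. This is only an analogy, not how the Hyper compiler works: the names Restricted, from_pointer, rebind and conversion_fails are made up for this sketch, and C++ cannot enforce the 'may not leave the stack' rule, so only the non-null and rebinding properties are shown.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical C++ stand-in for Hyper's '& T': never null, but rebindable.
template <typename T>
class Restricted {
public:
    // Binding to an object that exists can never produce a null pointer.
    explicit Restricted(T& target) : ptr_(&target) {}

    // Conversion from a normal pointer: this is where the runtime null check goes.
    static Restricted from_pointer(T* p) {
        if (p == nullptr) {
            throw std::invalid_argument("null converted to restricted pointer");
        }
        return Restricted(*p);
    }

    // Unlike a C++ reference, it can be pointed at another object later.
    void rebind(T& target) { ptr_ = &target; }

    T& get() const { return *ptr_; }

    // Deliberately no conversion back to T*: restricted -> normal is not allowed.

private:
    T* ptr_;
};

// Returns true if converting p to a restricted pointer fails the null check.
inline bool conversion_fails(int* p) {
    try {
        Restricted<int>::from_pointer(p);
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

In this model, Restricted<int>::from_pointer(nullptr) throws, while binding to an existing variable never needs a check.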

Another protection measure we have to take is to forbid procedures from returning just any restricted pointer. After all, allowing that would let a procedure return a pointer to one of its own local variables, which ceases to exist once the procedure returns. But returning a restricted pointer that was received as a parameter of the procedure will be allowed. Another important issue is the this pointer. It will no longer have a normal pointer type, but a restricted pointer type. The reason for this is that a procedure cannot know at compile time (for all uses) whether the object it belongs to is on the stack or not. Examples:
class C
public:
    # some code here ...

    procedure p1() : * C
        return this # ERROR: this object could be on the stack
    end procedure

    static procedure p2() : & C
        var c : C
        return c # ERROR: c is on the stack and will be destroyed
    end procedure

    static procedure p3(c : & C) : & C
        return c # OK : c still exists when pointer is returned
    end

end class
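As a comparison, here is roughly how the p2 and p3 cases look in C++ with references. This is a sketch for illustration only; the function names pass_through and make_local are invented. The point is that C++ accepts the p3 pattern just like Hyper would, but for the p2 pattern compilers typically only give a warning, where Hyper would give an error.

```cpp
#include <cassert>

struct C {
    int value = 0;
};

// Counterpart of p3: the object belongs to the caller, so it outlives the call.
inline C& pass_through(C& c) {
    return c;
}

// Counterpart of p2, left commented out on purpose: the returned reference
// would refer to a local variable that is destroyed when the function returns.
//
// C& make_local() {
//     C c;
//     return c;   // dangling reference
// }
```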
The fact that a class cannot use its this pointer anymore in any way it would want could sometimes be very annoying. This inspired me to create 'heap classes'. These classes are declared to be always on the heap and never on the stack. This could be useful for some types with a long lifetime, like a GUI widget or a ServerSocket. For these classes the this pointer would be of a normal pointer type, instead of a restricted type, because such classes can never be on the stack anyway. The compiler also would not auto-generate an assignment operator, because a copy-by-value operation will in general be unneeded or unwanted for heap objects. A possible syntax for declaring a heap class would be the addition of an asterisk to indicate the use of a normal pointer for this. Example:
class * Widget
# this is a heap class
end
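C++ has no direct equivalent of a heap class, but the idea can be approximated. The sketch below is an assumption about how one might mimic it, not something Hyper generates: the constructor is private, a factory (here called create, a made-up name) is the only way to get an instance, and copying is disabled to mirror the missing assignment operator.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical C++ counterpart of 'class * Widget': instances only live on the heap.
class Widget {
public:
    // The only way to obtain a Widget is through this factory.
    static std::unique_ptr<Widget> create(std::string title) {
        return std::unique_ptr<Widget>(new Widget(std::move(title)));
    }

    // No copy-by-value, mirroring the missing auto-generated assignment operator.
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;

    // 'this' never points into a stack frame, so handing it out as a
    // normal pointer cannot create a pointer to a dead stack object.
    Widget* self() { return this; }

    const std::string& title() const { return title_; }

private:
    explicit Widget(std::string title) : title_(std::move(title)) {}

    std::string title_;
};
```

Writing 'Widget w("x");' as a local variable does not compile here, because the constructor is private. That is the property a heap class declaration would give directly.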

So this is my proposed solution for the pointer issues. It would be a simple solution for the copy-constructor-called-with-null-argument problem: from now on, copy constructors will take a restricted pointer as their argument type, and the compiler will prevent a null value from being assigned to a restricted pointer. Restricted pointers will also be useful for memory management: a procedure that takes a restricted pointer actually guarantees that it will not store the pointer anywhere. Instead, it will probably use the info in that object (or copy the object entirely) and then discard the pointer. This means the caller does not have to make an extra copy just to be safe.

Comments are of course welcome.

Friday, September 22, 2006

Types and pointer issues

I will write a bit about type issues in this article (can I actually call this writing an article?). I will provide a brief introduction on types here, but for a detailed reference on types in Hyper see this page.

Hyper has three kinds of types: class types, pointers and arrays. A class type is a user-defined class or a built-in primitive type, like int. A pointer type is what it says: it points to a value of another type (which cannot itself be a pointer, by the way). An array type is also obvious to anyone who has programmed before. Types are written from left to right, for simplicity. A pointer type is written as an asterisk followed by the type it points to. The notion 'points to nothing' is represented by the simple expression (and keyword) null. A class type is written by using its name. And arrays are written as [size] followed by the content type. An array of an array is allowed, but the two arrays are merged together into a true multidimensional array. Every array can be used in the same way as a class type. For example, every array has a member size that returns the size of the first or another dimension. Types also support the notion of being constant. An array itself cannot be declared constant; you declare its base type as const instead.

Some examples of variable declarations:
var a : int # variable a has type int
var b : * int = null # b has type pointer to int
var c : * const int # c is pointer to constant int
The character # starts a comment by the way.

Arrays are not discussed further here; I will now focus on pointer types. Hyper does not have the same pointer system as C++. It also doesn't have reference types. The difference lies in how pointer types are used in expressions and statements. There are no reference (&) and dereference (*) operators. Referencing and dereferencing are done automatically by the compiler. This requires separate pointer operators. For example, normal (non-pointer) assignment is done with '=', and pointer assignment is done with '$='. Dereferencing is never a problem, but I am still wondering when and how I should disallow some automatic referencing, like using a literal number for a pointer to a number, or using a number returned from a function for this.

var x : * const byte = 12 # disallow??
var y : * const double = someFunc() # disallow??
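To make the difference between '=' and '$=' concrete, here is the same distinction written in C++, where the programmer has to spell out the dereference and address-of operators. The Hyper lines in the comments are an assumed reading of the rules described above, not verified compiler input, and the function name assignment_demo is made up.

```cpp
#include <cassert>

// Returns the final value of 'a' after both kinds of assignment.
inline int assignment_demo() {
    int a = 1;
    int b = 2;
    int* p = &a;   // Hyper (assumed): a variable of type '* int' pointing at a

    *p = 10;       // value assignment through the pointer; Hyper (assumed): p = 10
    p = &b;        // pointer assignment;                   Hyper (assumed): p $= b
    *p = 20;       // now changes b, not a

    assert(b == 20);
    return a;      // a keeps the value 10 written through the first target
}
```

In C++ the operator on the left-hand side decides what gets assigned. In Hyper that choice moves into the assignment operator itself, because the compiler inserts the referencing and dereferencing.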

Another issue: Hyper does not have reference types, because referencing/dereferencing happens automatically anyway. So copy constructors use a parameter that has a pointer type. But what if that pointer is null? Since any pointer parameter can be null, the following is currently allowed:

var i : int = null # currently allowed :-S

The copy constructor of class int is called for this variable with a pointer that points to nothing. Run-time failure guaranteed. Of course it was not my intention to allow this! So I need to find a way to forbid it. I will think about it, and in future writings I will suggest some possible solutions.
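The problem can be reproduced in C++ by modelling a copy constructor that takes a pointer instead of a reference. This is a hypothetical model (the type Int and the helpers checked and copy_from_null_fails are invented for the sketch). It also adds an explicit check so that the failure shows up as an exception; without such a check the null pointer would simply be dereferenced at run time.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of a class whose copy constructor takes a normal pointer.
struct Int {
    int value;

    explicit Int(int v) : value(v) {}

    // Pointer-taking "copy constructor": the type system allows src to be null.
    explicit Int(const Int* src) : value(checked(src).value) {}

private:
    // Explicit check added for this sketch so the failure is observable.
    static const Int& checked(const Int* src) {
        if (src == nullptr) {
            throw std::invalid_argument("copy from null");
        }
        return *src;
    }
};

// Returns true if copy-constructing from a null pointer is rejected at run time.
inline bool copy_from_null_fails() {
    try {
        Int broken(static_cast<const Int*>(nullptr));
        (void)broken;
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

The compiler accepts the null argument without complaint, and the mistake only surfaces when the code runs. That is exactly the situation that should be ruled out at compile time.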

Saturday, September 16, 2006

Compiler issues

I have been working on a compiler for Hyper since February 2004. It's difficult and a lot of work. But if I ever want a usable language I will need one. And of course no one else will write it for me.

The language I am using to write the compiler is C++. C++ is one of the languages I like most, and writing the compiler in Hyper is unfortunately not possible. It would be great if it were, but it isn't. This is the bootstrapping problem, if you know what that is: you cannot write the very first compiler for language X in language X. Unless, of course, you already have an interpreter for language X. But for Hyper that's not the case.

The compiler is already open source, under the GPL. It is not publicly downloadable yet, though; for now it is only available to fellow students and people I know personally. I will make it available on my website in the near future. Using the GPL as the single license for everything will have to change: I will need the LGPL for the runtime library, and some other, less restrictive license (maybe the BSD license?) for the class library. The GPL is OK for the compiler; any modified version of the compiler (or anything derived from it) will also need to be released under the GPL. The runtime library needs the LGPL because it must be linkable into proprietary programs. And the class library needs a less restrictive license, to allow proprietary classes to be derived from classes in the library. Because inheritance counts as 'making a derivative work', the GPL and LGPL are not an option here. (Someone correct me if I am wrong about this.)

The compiler for Hyper is still only a front end: it checks its input file(s) for errors, but does not yet generate an executable for valid source files. The task of writing a back end for code generation still lies ahead of me. I will not write my own code generators for every machine architecture that exists. This leaves me two options: (1) let the compiler generate source code for another language, most likely C++; (2) use a library or source from another project for code generation. Of these two, I have a slight preference for the second option. Within that option there are again two possibilities. The first is writing my compiler as yet another front end for GCC. The other is to use LLVM, an open-source compiler infrastructure. I strongly prefer LLVM. An important reason is that LLVM is also written in C++ (as is the front end of my compiler), but GCC is written in C. I am more familiar with C++ than I am with C, and I consider C++ a better choice. Also, in my opinion, the GCC source code is difficult to understand, and contains lots of macros.

Another issue I will have to deal with: some time ago I switched to CMake for building the compiler. LLVM uses the GNU autotools for building. I will need to find a way to let those two build systems cooperate.

The compiler development is progressing well. The compiler accepts a simple subset of the language (so without inheritance, interfaces, generic programming, etc.) and it already does most of the semantic checking that needs to be done. But this does not come without some complexity; I do a regular line count on the compiler sources, and today the number of lines of code (headers + implementation) exceeded 30,000. It surprises even me that I already have such a large codebase. Some refactoring could make the number drop a bit, but many language features still need to be added, so you can expect the number to rise even more.

Saturday, September 09, 2006

Introduction to Hyper

Before I can talk about my programming language Hyper in depth, I first need to provide a background for readers. I will do that here.

First, why would I try to create my own programming language? A couple of reasons: because I was looking for features not present in existing languages, because I wanted to 'fix' certain issues from other languages and last but not least for the fun of it.

As explained in the 'introduction' on my website (here), I started implementing a compiler for the language I wanted to create early in 2004. Back then, I had not really thought out the language yet, so I based it largely on C++, except for its C-style syntax. I already had ideas for the basic types and some statements, but I didn't have a type system yet, so I got stuck for a while. After some time I settled on using semi-implicit pointers (not reference types as they are in Java). Since then I have made a lot of progress, and the basics of the language are there.

Why the name "Hyper"? Well, back in 2004 I was thinking of a cool name to give to the language and a fellow student proposed the name "Hyper". I was OK with it, so Hyper it was. So the name has no actual meaning or relation to the language specification. The only disadvantage I see is the resemblance of Hyper to the full name of HTML: HyperText Markup Language.

This writing is getting rather long now, so I'm going for a short list of language properties and features: The language is statically typed and object-oriented like Java. The type system uses semi-implicit pointers, which means that pointers are declared explicitly but are used implicitly in expressions (no reference/dereference operations). Types are written from left to right. Arrays are real distinct types (not simple pointers) that remember their length. All types implicitly inherit from the built-in 'object' type. Built-in primitive types are real objects; they are derived from 'object' and they (can) have member functions. The language is a compiled one, like C++. An important aim is to provide an easy way for programmers to create platform-independent programs. My buzzword for this is 'write-once-compile-anywhere'. The language will provide a large standard library that offers many commonly needed things, together with platform abstraction mechanisms. If it is possible I would like to also provide a GUI library in it. Graphical user interfaces are needed for most programs, so having this would be great.

I am currently writing a compiler for the part of the language I have already figured out. The compiler is written in C++ and currently only performs checks on its Hyper input file; it doesn't generate executables yet. The compiler comes with a simple tool to convert a Hyper source file into an HTML file with syntax highlighting. It is already open source (GPL) but it isn't available to the general public yet. I will wait a bit longer before making it public; for now I use a limited testing audience to get the first feedback. The public release (still in alpha stage of course) will come later.

As I wrote in previous messages, you can always get more info about Hyper on my website. The website already provides a limited amount of documentation on the current state of the language. See you next time.

Tuesday, September 05, 2006

Change of purpose

The purpose of this blog has changed. I originally created it for my master's thesis, but I changed my mind. I've created another blog in a more private place for that. Anyone who should know the new location should have been informed.

Anyway, I am going to use this blog now for other things. The most important thing to write about will be my programming language called Hyper. I already have some documentation about it on my website (http://users.telenet.be/hyperquantum). So in this blog I will probably write about the new language features I'm still figuring out. I used to do that on a Google Group, but that group no longer exists, and it had a very limited audience, because the articles were written in Dutch. I will soon start to write an introduction to the language here. More to come!