Wednesday, December 27, 2006

progress...

It's been a while since I last wrote something, but I have made progress since then.

First of all, the Hello world example has changed again. Here is the new (and hopefully final) version:
import system.stdio

class Hello
static procedure main()
system.Out.line("Hello, World!")
end
end
The 'StdIO' class has been replaced by 'Out', which will provide simple console output. The 'printLn' procedures have been renamed to 'line'.

The implementation of the compiler has come much further. The compiler on SVN trunk now supports the new namespace system, and supports importing 'system.stdio'. This means that all sample programs in the directory tests/programs now compile successfully. A fun new feature is that the compiler detects TODO and FIXME in comments and emits warnings for them. Of course new compiler options are provided to turn these warnings off, but they are enabled by default.
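A minimal C++ sketch of how such a TODO/FIXME check could work. It is not the compiler's actual code: the names `MarkerWarning` and `findCommentMarkers` are made up for illustration, and it assumes that a comment runs from a `#` to the end of the line (a `#` inside a string literal is not handled).

```cpp
#include <cstddef>
#include <initializer_list>
#include <sstream>
#include <string>
#include <vector>

// One warning produced by the scan: the 1-based line number and the marker found.
struct MarkerWarning {
    std::size_t line;
    std::string marker;
};

// Scans source text for TODO and FIXME inside '#' comments.
// Simplification (assumption): a '#' inside a string literal also starts a comment.
std::vector<MarkerWarning> findCommentMarkers(const std::string& source) {
    std::vector<MarkerWarning> warnings;
    std::istringstream in(source);
    std::string text;
    std::size_t lineNo = 0;
    while (std::getline(in, text)) {
        ++lineNo;
        const std::size_t hash = text.find('#');
        if (hash == std::string::npos) continue;  // no comment on this line
        for (const char* marker : {"TODO", "FIXME"}) {
            // Only look at the part of the line after the comment start.
            if (text.find(marker, hash) != std::string::npos) {
                warnings.push_back(MarkerWarning{lineNo, marker});
            }
        }
    }
    return warnings;
}
```

A real implementation would more likely hook into the lexer, which already knows where comments begin and end.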

The compiler has been restructured internally as well. I have completed 3 major refactorings. But this is not the end of the road; many other improvements will be made in the future.

There is a good chance that the next release of the compiler will have version number 0.4.0 because of all the improvements that have been done. The milestone for 0.4.0 will then be "front-end mature enough to start implementing the back-end".

I have also created a new SVN branch where I will start to work on the compiler back-end. As said earlier I will use LLVM for this. I have imported LLVM 1.9 into the branch. The first thing to do is to get LLVM compiled with CMake, as the LLVM developers use GNU autotools to build it. But I don't, I use CMake for the front-end. Then I will try to get the Hello World program compiled with it.

Today I have written a bunch of docs again. You can find class references for the built-in types on the website.

I currently use Subversion, but I am thinking about switching to Bazaar.

Thursday, December 14, 2006

Hyper compiler 0.3.31 released

I have released a new version of the compiler. This release is considerably bigger than the previous one. It contains more new features and has undergone internal structural improvements.

The compiler now checks for the presence of return statements in procedures that return something. Another very important new feature is public/private access checking. Some small things are not checked yet, like for example the usage of private conversion constructors for passing arguments to a procedure. And const checking is now complete (at least to my knowledge), which means that the compiler does additional checks for const procedures and calling procedures on a const object.

Some things that are not listed in the changelog: a new test program was added, the Hello World test program. This already uses the new import and namespace semantics so the compiler currently rejects it. And the compiler now allows import directives but currently ignores them.

Monday, December 11, 2006

dynamic arrays and array sizes

Arrays have been supported for some time, but dynamically creating an array wasn't possible yet. Time to change that. The syntax is simple:
procedure xxxx(a & b : nat)
var x : * [5] int = new [5] int()
var y : * [][] real = new [a][b] real
end
As you can see, the syntax is "new" followed by the type of the array and an optional empty pair of parentheses. Dynamic arrays initialize their elements with their default value. The size of the new array must be fully specified (but for its elements this is not required):
new [] int        # illegal
new [][10] int    # illegal
new [10] * [] int # allowed
This brings us to the compatibility of array sizes. When pointing an array variable to an array, the sizes that ARE specified must be evaluable at compile time and be equal. But you don't HAVE to specify the sizes of course. Open arrays accept any size; this means no size specified or a size that isn't known at compile time.
const globalC : nat = 17  # constant field

procedure test(i : * [] int, j : * [17] int, n : nat)
var a : * [9] int = j # ERROR: 9 != 17
var b : * [17] int = j # OK
var c : * [17] int = i # ERROR: unknown size of i
var d : * [globalC] int = j # OK, globalC = 17
var e : * [n] int = j # ERROR: value of n unknown
var f : * [] int = i # OK
var g : * [] int = j # OK
end
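The size rules from this example can be modelled in a few lines of C++ (C++17). This is only a sketch of the rule as described here, not the compiler's implementation; `DeclaredSize`, `SizeCheck`, `checkPointTo` and the three helper functions are hypothetical names.

```cpp
#include <cstddef>
#include <optional>

// A declared array dimension as a compiler might model it (hypothetical).
struct DeclaredSize {
    bool specified;                    // false for an open array: []
    std::optional<std::size_t> value;  // set only when known at compile time
};

DeclaredSize openSize() { return {false, std::nullopt}; }      // []
DeclaredSize fixedSize(std::size_t n) { return {true, n}; }    // [17], [globalC]
DeclaredSize runtimeSize() { return {true, std::nullopt}; }    // [n], n a parameter

enum class SizeCheck { Ok, SizeMismatch, UnknownTargetSize, UnknownSourceSize };

// Checks "var target = source" for one array dimension.
SizeCheck checkPointTo(const DeclaredSize& target, const DeclaredSize& source) {
    if (!target.specified) return SizeCheck::Ok;             // open arrays accept any size
    if (!target.value) return SizeCheck::UnknownTargetSize;  // like 'e' above
    if (!source.specified || !source.value) {
        return SizeCheck::UnknownSourceSize;                 // like 'c' above
    }
    return *target.value == *source.value ? SizeCheck::Ok
                                          : SizeCheck::SizeMismatch;
}
```

With `i` modelled as `openSize()` and `j` as `fixedSize(17)`, the seven declarations `a` to `g` map to SizeMismatch, Ok, UnknownSourceSize, Ok, UnknownTargetSize, Ok and Ok respectively.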
For now you cannot specify an initializer for an array. That's why you don't specify arguments between the parentheses when creating a dynamic array. And that's why an array variable or field can't have an initializer part.

Monday, December 04, 2006

sourcefiles and namespaces again

I was a little brief in my previous post about sourcefiles, namespaces and imports. I'll try to explain it a bit more here. So here's an example of multiple sourcefiles working together.

# File "someDir/MyApp/GUI/mainwindow.hyp"
namespace MyApp.GUI

class MainWindow
# (...)
end


# File "someDir/MyApp/Data/store.hyp"
namespace MyApp.Data

class DBStorage
# (...)
end


# File "someDir/MyApp/Core/main.hyp"
namespace MyApp.Core

import MyApp.GUI.mainwindow
import MyApp.Data.store

static class Main
static procedure main()
# main program's entry point
var dbs : MyApp.Data.DBStorage
dbs.open()
var win : MyApp.GUI.MainWindow
win.show()
# (...)
end
end
I hope this clears it up. Every file is in a namespace. When you import a file you need to specify the namespace AND the name of the file you want to use. The classes inside a file are in the namespace of that file, so that's why the "main()" code uses the full names like "MyApp.Data.DBStorage" instead of just "DBStorage". To get rid of the long names I will support 'using' declarations (in fact aliases), but that's for later.

Something we don't support now is having public/private members in a sourcefile. The Main class of the example above does not have to be public. For now all direct source file members are simply public.

A program often needs to use libraries outside of its own codebase. Therefore Hyper will support some variation of the 'class path' concept from Java but with a different name. I suggest something like 'codebase path' or 'code path'. The default 'code path' will be empty and this means the compiler will only look at the sources you are compiling now (including the imported files from the same codebase). How about closed source libraries? I am thinking of using a concept similar to D's interface files (see the end of this page). This means having a second type of sourcefile that only contains the interface parts of each class (the procedure headers etc...).

Off-topic:
* I am tempted to have the compiler generate C++. This would be somewhat easier than using LLVM, but it would require an extra compilation to get a working program. It could be an acceptable temporary solution.
* The next compiler release will support public/private access checking, full 'const' checking and maybe also 'static' checking. But I am unsure about going for a small or a major release. A major release takes much longer to be released, but allows for large internal improvements in the compiler. A minor release would be version 0.3.31, and a major release would be 0.4.0.
* Restricted pointers will definitely be part of the language. I just don't know yet when to start implementing them and whether or not to wait until after the next major release (0.4.0). I would like to have a better name for them. "References" is a candidate, but it could be too confusing for C++ users that don't know yet what they really are, since they differ a lot from the 'references' from C++.
* Strict in/inout parameters will probably be added when restricted pointers are already implemented.

Sunday, December 03, 2006

importing other sourcefiles

I think I have finally found a way to have multiple sourcefiles working together. I have based it mostly on the packages system from Java, but I don't call it packages anymore. I have decided to keep the 'namespace' keyword for this purpose. In Java each file that is not in the default package is in some specified package, in Hyper each file is in some namespace. This means that namespaces are no longer declared in blocks like classes are, but they are declared in one line on top of the sourcefile. It will also be possible to have a sourcefile that is not in a namespace; this will be useful for one-file test programs. More about that later. Each sourcefile is in a directory structure that corresponds to the namespace of that file (such as for packages in Java). So a file in namespace "Foo.Bar.Baz" could be named "Foo/Bar/Baz/filename.hyp". A file can import other files by specifying the namespace and name of that file, without the extension. So this will be something like:

namespace Abc.Defg.Stuv.Xy
import Foo.Bar.Baz.filename
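The mapping from an import to a file name is mechanical. A small C++ sketch, assuming the ".hyp" extension used in the example above; the function name `importToRelativePath` is made up, and the result is relative to wherever the compiler starts looking for sources.

```cpp
#include <algorithm>
#include <string>

// Maps an import such as "Foo.Bar.Baz.filename" to "Foo/Bar/Baz/filename.hyp".
// The argument is taken by value so that it can be modified in place.
std::string importToRelativePath(std::string import) {
    std::replace(import.begin(), import.end(), '.', '/');
    return import + ".hyp";
}
```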

There are public and private imports, and an import is private by default. If file 1 is publicly imported in file 2, then any file that imports file 2 will have file 1 imported with it. This is not the case if file 1 is privately imported in file 2. For private imports the compiler will have to check that there are no things from the private import exposed to the outside.

Imports are allowed to be circular; this means that file 1 can import file 2 while file 2 also imports file 1. Such things are of course to be used as little as possible. Disallowing circularity is not feasible because these things are not always avoidable, and the language currently does not allow for forward declarations as C++ does.
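To make the public/private rule and the circular case concrete, here is a C++ sketch that computes which files become visible to an importing file. It only illustrates the rule as stated above; `Import`, `ImportTable`, `addWithPublicImports` and `visibleFiles` are hypothetical names, and the real compiler may organise this very differently.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

struct Import {
    std::string file;
    bool isPublic;  // imports are private by default
};

// For each file, the list of imports it declares.
using ImportTable = std::map<std::string, std::vector<Import>>;

// Adds `file` and everything it publicly re-exports to `visible`.
void addWithPublicImports(const ImportTable& table, const std::string& file,
                          std::set<std::string>& visible) {
    if (!visible.insert(file).second) return;  // already seen: ends circular chains
    const auto it = table.find(file);
    if (it == table.end()) return;
    for (const Import& imp : it->second) {
        if (imp.isPublic) addWithPublicImports(table, imp.file, visible);
    }
}

// Files whose contents are visible inside `importer`.
std::set<std::string> visibleFiles(const ImportTable& table,
                                   const std::string& importer) {
    std::set<std::string> visible{importer};  // blocks cycles back to the importer
    const auto it = table.find(importer);
    if (it != table.end()) {
        for (const Import& imp : it->second) {
            addWithPublicImports(table, imp.file, visible);
        }
    }
    visible.erase(importer);
    return visible;
}
```

If file 2 imports file 1 publicly, a file that imports file 2 sees both. If that import is private, it sees only file 2.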

A sourcefile that is not in a namespace will not be able to import things from other sourcefiles but only from the standard libraries (system.****). And it cannot be imported by any other sourcefiles. This is to minimize its usage. Files not in a namespace are not in some 'default' namespace as Java does it, but aren't in any namespace at all. So there would be no relation to the directory such a file resides in.

The standard library will use the 'system' keyword as the root namespace. Standard input/output will be available with "import system.stdio". (I think I will use the convention of using a lowercase identifier for the name of a sourcefile) This file contains a static class "StdIO" with a number of procedures for stdout printing. There are "print" procedures for literal printing and "printLn" procedures for printing with an additional newline. This would make the "Hello World"-example look like this:

import system.stdio

class Hello
static procedure main()
system.StdIO.printLn("Hello, World!")
end
end

It sure looks better than the current version.

Tuesday, November 28, 2006

scopes and RAII

Classes in Hyper cannot have a destructor. I wanted it to be this way because the language uses a garbage collector, and then the execution of a destructor cannot always be guaranteed. So destructors would not be reliable anyway if you want them to release acquired resources. And without destructors there is currently no way to do RAII. This is not acceptable in my opinion. So we need at least one way to do it.

This is my proposition. I would like to introduce a new block statement, "scope". This would take care of resource acquisition and disposal. You would give it an object that supports 2 methods: "enterScope()" and "leaveScope()". The scope block would call the first method upon entering the scope, and would make sure that the second one is called whenever execution leaves the scope (i.e. normal exit at the end, plus exceptions and jump statements like "return", "break", etc.). The first form of "scope" would support the declaration of a variable, with or without an initializer. But I think it would also be useful to have an anonymous variable (i.e. to give it no name) and/or to have no variable at all and simply use a temporary object on the stack.

Examples:

procedure test(x : * Xyz)
# indented output
scope i : Indenter = x.getIndenter() # increase indentation
i.printLn("Hello world.")
# ...
end # get indentation back to the previous level

scope this.getMutex() # lock mutex
# ...
end # unlock mutex
end test

The first example uses an explicit variable and the second uses no variable at all. In the second example the mutex returned by "getMutex()" is used as the scope object. In this case the result of the scope expression is probably a pointer (or a reference) to a field, but it does not necessarily have to be a pointer.
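For comparison, this is roughly what the proposed "scope" block corresponds to in C++, where a destructor does the work of "leaveScope()". It is a sketch only: `ScopeGuard`, the `Indenter` stand-in and `indentedWork` are invented for the illustration.

```cpp
#include <string>
#include <vector>

// RAII guard that mimics the proposed 'scope' block for any object that
// provides enterScope() and leaveScope().
template <typename T>
class ScopeGuard {
public:
    explicit ScopeGuard(T& object) : object_(object) { object_.enterScope(); }
    ~ScopeGuard() { object_.leaveScope(); }  // runs on every exit path
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    T& object_;
};

// Hypothetical indenter that records the order of the calls.
struct Indenter {
    int level = 0;
    std::vector<std::string> log;
    void enterScope() { ++level; log.push_back("enter"); }
    void leaveScope() { --level; log.push_back("leave"); }
};

// Returns the indentation level seen inside the guarded region.
// With leaveEarly set, the function returns from the middle of the region
// and leaveScope() is still called.
int indentedWork(Indenter& indenter, bool leaveEarly) {
    ScopeGuard<Indenter> guard(indenter);
    if (leaveEarly) return indenter.level;
    indenter.log.push_back("work");
    return indenter.level;
}
```

The difference with the Hyper proposal is that the object itself needs no destructor: the block statement, not the class, decides when "leaveScope()" runs.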

This reminds me about how to treat temporaries. I suggest letting them exist until the function they appear in returns. This would be ideal for use with restricted pointers. Any temporary could then be pointed to by a restricted pointer. And those could be stored in variables, so it would be necessary to keep the temporaries alive until the function returns.

Monday, November 27, 2006

operations on numeric types

First a bit of info about the built-in numeric types, in case you never saw them before or in case you have forgotten. The floating point types: 'single', 'double' and 'real' (not the subject of this article). The integral types: 'byte', 'nat16', 'nat32', 'nat64', 'int16', 'int32', 'int64' and 'int'. A 'byte' is unsigned and 8 bits. The 'int*' types are signed integer types (specified size or native int type), 'nat*' are unsigned integer types (again, of the specified size or else the native size). The native ones, 'int' and 'nat' are equal in size and have the size of the target platform (32 or 64 bit).

I am still somewhat unsure about the operations on those types. For example: what type should the result of a unary minus on a 'nat16' be? I would say 'int32', because the result can require 17 bits and that would not fit into an 'int16'. And I want to avoid unexpected overflows as much as I can. But this leaves me with some unpleasant consequences. I have no type to use for the unary minus on a 'nat64'. Nor do I have one for the native 'nat', because it would require one more bit than the 'int' has. So at this time there is no unary minus available for those two types. This problem of course doesn't exist for the 'int*' types; for example the unary minus operation on an 'int16' returns an 'int16' again.

I propose the following 'solution': I give the 'nat*' types a member function 'truncateToSigned', or something like that, which discards the most significant bit and then returns a signed type of the same size as the original. Like this:

class nat16
public:
# (...)
const procedure truncateToSigned() : int16
# truncate and convert to signed
end
end

This allows the programmer to do what he/she wants, but allows for data loss. But at least the 'corruption' is visible by looking at the name of the function!
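The same two operations, written out in C++ with fixed-width types as stand-ins for 'nat16', 'int16' and 'int32'. This is a sketch of the behaviour described above, and `negateNat16` is a made-up name for the widening unary minus.

```cpp
#include <cstdint>

// Unary minus on a 16-bit unsigned value, widened to 32 bits so that the
// result always fits (down to -65535).
constexpr std::int32_t negateNat16(std::uint16_t value) {
    return -static_cast<std::int32_t>(value);
}

// Discards the most significant bit and returns a signed value of the same
// size. Values of 32768 and above lose information.
constexpr std::int16_t truncateToSigned(std::uint16_t value) {
    return static_cast<std::int16_t>(value & 0x7FFFu);
}
```

For example, `truncateToSigned(40000)` gives 7232 because bit 15 is dropped, while `truncateToSigned(100)` stays 100.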

I already provided the 'int*' types with a function to turn the sign. This is a way to write assignments like 'delta = -delta' like this: 'delta.turnSign' (will be available in the next compiler version). This function doesn't return a result and changes the value of the object itself. (So it is different from the unary minus, which does not change its object and instead returns a value.) I wasn't completely sure about the name, maybe I could have used something like 'negate' instead but I am not sure if that means what's intended (I am not a native English speaker).

I am thinking about a signed byte type. At this time the literal -1 has type int16. That looks like a bit of a waste for such a small number. So it would maybe be nice to add a type (e.g. 'sbyte') for signed 8-bit values. Then I could also add a member 'truncateToSigned' to 'byte' to return a signed equivalent. But to avoid loss the unary minus of 'byte' would still return an 'int16'.

Another proposition. We now have a way to make a signed from an unsigned but not the other way around. So I would like to add a member 'abs' (= absolute value) that gives the unsigned value. I don't really like 'abs' because it looks too short. But on the other hand, something like 'absoluteValue' looks too long. Suggestions and arguments for a name are of course welcome :-)


P.S.: I converted my blog to the new Blogger Beta, and it looks like the RSS of the old articles is a bit messed up now

Sunday, October 29, 2006

Hyper compiler 0.3.30 released

I've just released a new version of the compiler for Hyper. You can download it from here.

A short list of changes in this release:
  • highlighter now uses the same style as the Hyper website
  • copy constructors are now checked and auto-generated if needed
  • 'inout' is now used for parameters that used 'var'
  • the 'this' keyword is now supported in expressions
  • access specifiers in a namespace are no longer supported
And next to these changes, the compiler has been restructured internally a bit as well.

Friday, October 27, 2006

begin specification

Something that is not documented yet on the website is the begin directive. It appears on top of the source file, to indicate the entry point for that file if it is compiled as an executable. You specify the class that should be 'started', the class that contains the static procedure called 'main'. This procedure will be called when the executable is run.

Its use was mandatory, but I am now making it optional in one case: when the file contains only one class, because in that case the user's intention is obvious. This simplifies the Hello World example:
namespace Example
class Hello
static procedure main()
system.out.printLn("Hello, World!")
end
end
end
This Hello World program will change even more later. Namespaces will be removed and some module system, or a system like Java's packages will be adopted. And the function (or 'procedure') to output text to the console will probably be renamed.

Wednesday, October 18, 2006

Concatenation & pass by reference

I am wondering how I will allow strings to be concatenated. The string type currently uses the + operator for this. But I am considering whether to introduce an operator especially for this purpose. I suggest the ~ (tilde) as an operator (as it's not used yet and D also uses it for concatenation). An accompanying ~= operator can be introduced for appending to the end of an existing string.
var s : string = "How" ~ " are " ~ "you"
# s = "How are you"
s ~= " today?"
# s = "How are you today?"
This would make an interesting syntax possible for output, like C++ has the << operator. You could have an output stream that has the operators ~ and ~=. This can make the following possible:
procedure writePoint(inout s : FileOutputStream, x & y & z : real)
s ~= s ~ '(' ~ x ~ ", " ~ y ~ ", " ~ z ~ ')'
end
It's not ideal because s is still used twice. But maybe it's a start :-)

I am also thinking about pass by value or by reference for 'in' parameters. As I wrote previously, 'in' parameters cannot be changed at all by the callee. So in my opinion it would be a nice idea to use pass by reference for this. It would make 'in' parameters fit in more, because 'inout' parameters are also passed by reference and 'out' parameters use a reference to write the result directly to the location of the caller.

Sunday, October 08, 2006

Compiler goes public!

Yesterday I put the compiler sources on my website. The compiler is finally available to everyone who wants to take a look. You can download the sources from here. The released version is 0.3.29, a release that already dates from mid-September, but was only available to a limited test audience. I got almost no feedback, so I figured I could make it available on my website already to reach a larger pool of potential testers.

As said on the download page this is a development release so it probably has lots of bugs, and on top of that it doesn't support all 'official' language features yet. It also requires CMake to build. You will have to build it yourself from source because no precompiled binaries are available.
Another thing you need to know is that the compiler only accepts filenames in Unix format. So Windows users will have to use the compiler under Cygwin until Windows filenames are supported in some future release.

If you are a programmer, I hope you will try it and give some feedback! For information about the language, see the language reference on the website. The documentation you find there is not really complete at this time and is sometimes very brief but it will give you a start.

Thursday, October 05, 2006

Various new features

On the language and compiler status page, I still have some unimplemented features left that I did not explain here yet. I will explain as much as I can in this writing.

First, 'initializer lists'. I will assume that you know initializer lists from C++. In Hyper they are not very different. The only (semantic) difference is that Hyper allows fields to have an initializer in their declaration and C++ does not. In a constructor, to initialize a field the value in the initializer list is used if it's present. If it is not, then the initializer in the declaration of the field is used. If there is no initializer there either, the default constructor is used to initialize the field. The syntax consists of lines starting with a colon. It resembles the way it looks in C++, although C++ uses a single colon followed by one comma-separated list, whereas Hyper allows the list to be spread across several colon-prefixed lines for readability. An example to illustrate all this (unfortunately this blog thing eliminates all indentation from my example):
class Test
private:
var a : string = "SSS"
var b : real = 9.81
var c : int
var d : bool
var e : char
var f : * byte

public:
procedure new()
: a("Cookie")
: c(16), f(new byte(13))
# Other code
end
end Test
In this example a gets the value "Cookie", b is initialized to 9.81, c becomes 16, d is initialized with its default constructor (which initializes it to false), f is initialized to a pointer to a byte with value 13, and e gives a (compile time) error because char doesn't have a default constructor (yet?).

Another feature: comparison operator chains. This allows the mathematical notation of comparison operators that are used together: 1 <= 20 = (19 + 1) < 75 etc. And this notation supports the meaning that is obvious to someone who has never programmed before; a = b = c is not the same as (a = b) = c, but is equivalent to a = b && b = c. The difference is that in a = b = c, all expressions are evaluated once (b is not evaluated twice as in a = b && b = c) and the order of evaluation is undefined. a is not necessarily evaluated before c, and the comparison of b and c is not necessarily after the comparison of a and b.
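The "evaluated once" part can be shown with a small C++ sketch of a chain like lo <= mid < hi, where the middle operand is an expression with a side effect. The function name `inHalfOpenRange` is invented for the illustration. Note that this sketch fixes an evaluation order, which the Hyper rule deliberately leaves undefined.

```cpp
// Evaluates lo <= mid() < hi with mid() called exactly once, the way a
// comparison chain would treat its middle operand.
template <typename F>
bool inHalfOpenRange(int lo, F mid, int hi) {
    const int m = mid();  // single evaluation, reused by both comparisons
    return lo <= m && m < hi;
}
```

Writing the same test by hand as `lo <= mid() && mid() < hi` would call `mid()` twice whenever the first comparison succeeds.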

Hyper will also provide two kinds of enums: nominal and ordinal ones. For now only nominal ones are explained as I haven't really figured out a syntax yet for the other type. Enums are strongly typed in Hyper. They are not a named alias of constants of some numeric type. Nominal enums represent named, uncomparable values. They are in their own namespace so you need to explicitly refer to the enum values by the name of the enum. Enum values are just listed, separated by commas or newlines. An example of an enum declaration:
enum Transportation
Car, Bus,
Airplane, Subway
Boat
end
An enum can be declared anywhere a class declaration is allowed. The first member is considered the default value for variable/field declarations. I am also looking for a different approach, that there is no default and that if you want one you will have to specify which one it is. Remember to use Transportation.Bus instead of just Bus, because explicit qualification is required.

That's it for this time. New features I still have to explain (later):
  • single inheritance
  • modules & imports (still requires some thinking)
  • copy procedures (not on the website yet)

Sunday, October 01, 2006

'in', 'inout' and 'out' parameters

This is an idea that I have been thinking about for some time. Hyper currently uses a parameter system that was borrowed from Oberon (I am not sure if its ancestors Modula-2 and Pascal already supported it) and Visual Basic. There are now 2 kinds of parameters: normal ones and variable parameters. Normal ones are passed by value and variable ones are passed by reference. Now this would be a perfect parameter system if it was used for a language without pointers. An example from the Hyper docs on my website illustrates the problem:
procedure isCellEmpty(x & y : nat, m : * [ ] const * [ ] const * const int) : bool
return m[x, y] =$ null
end
So many const keywords just to make sure that no contents of the matrix are modified! Hyper needs a parameter to be declarable as 'input only'. Such a parameter would not need any const keywords, because for an input parameter it is obvious that no content can be modified:
procedure isCellEmpty(in x & y : nat, in m : * [ ] * [ ] * int) : bool
return m[x, y] =$ null
end
So a keyword in could be used for input parameters. The safest solution would make this behaviour the default for all parameters that don't specify another option. This would make the keyword in redundant. The other option would be an input/output parameter. It would pass by reference:
procedure makePositive(inout x : int)
if x < 0 then
x = -x
end if
end
And the const keyword can be used for data that is not supposed to be changeable:
# make sure that x points to the string to come first
procedure tinyAscendingSort(inout x & y : * const string)
if y < x then
var t : * const string = x
x =$ y
y =$ t
end if
end
Another useful feature would be 'output only' parameters. I am not sure whether or not to include them, because they would behave very differently from other parameter types. They would also use an implicit reference to pass through their changes. In my opinion an output parameter should be initialized by the function that assigns it a value. This requires a new statement, an 'out' statement. Such a statement assigns a value to an out parameter. Every return path of a procedure must have an out statement for each out parameter. Example:
procedure selectBestCandidate(c1 & c2 : * Candidate, out best : * const Candidate)
if c2.betterThan(c1) then
out best c2
else
out best c1
end if
end
Of course output parameters are best used for multiple output parameters, because otherwise you could just use the function's return value. Another example:
procedure getMinMaxAvg(i & j : int, out min & max : int, out average : real)
out min = (i < j) ? i : j
out max = (j < i) ? i : j
out average = (real(i) + real(j)) / 2
end
An out statement initializes its output parameter by using its constructor, so the specified value is like an initializer for a variable. We also need a special function call syntax for output parameters. This is because an output argument is not an expression but a variable declaration:
procedure p(x & y : int)
getMinMaxAvg(x, y, out smallest, out largest, out middle)
var diff : int = largest - smallest
end
The declaration of an output argument happens in an expression. To avoid dependencies on expression evaluation order, an output argument can only be used after the entire expression (so that it isn't a subexpression of something else) is evaluated.
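The closest thing C++ offers to declaring the results at the call site is returning a struct and unpacking it with structured bindings (C++17). The sketch below is for comparison only; the struct `MinMaxAvg` and the helper `spread` are made up, and 'real' is approximated with `double`.

```cpp
// Result bundle standing in for the three 'out' parameters.
struct MinMaxAvg {
    int min;
    int max;
    double average;
};

MinMaxAvg getMinMaxAvg(int i, int j) {
    MinMaxAvg result{};
    result.min = (i < j) ? i : j;
    result.max = (j < i) ? i : j;
    result.average = (static_cast<double>(i) + static_cast<double>(j)) / 2;
    return result;
}

// Mirrors procedure p: the three names are declared by the call itself.
int spread(int x, int y) {
    const auto [smallest, largest, middle] = getMinMaxAvg(x, y);
    (void)middle;  // not needed here
    return largest - smallest;
}
```

As with the proposed 'out' arguments, the new names only become usable after the whole call expression has been evaluated.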

As said earlier, I am not really convinced about the current idea of output only parameters, but the input and input/output parameter ideas are good enough for me.

Saturday, September 30, 2006

Restricted pointers and heap classes

I think I have found a solution for the pointer issues I described in my last writing. And this solution is called 'restricted pointers'. The idea is to introduce a new kind of pointer type. The syntax is not fixed yet, but I think I will use an ampersand ('&') as it resembles the 'reference' from C++.

The 'normal' pointer type that Hyper already has resembles the pointers from C++; they can point to things just anywhere. The new restricted pointer type will be primarily used to point to stack objects. Such a restricted pointer should be impossible to copy outside of the stack that carries the object it points to. So you will not be able to use a restricted pointer in a class field, or pass it to code on another thread. Also, a restricted pointer will be guaranteed to be not null. If you know C++ you will see that this looks very similar to C++ references. But C++ references are references and restricted pointers are not references; variables of a restricted pointer type will be changeable to point to another thing, and that is not possible in C++ with references.

So what's the use of this new pointer type? It will be used to guard things on the stack. You will get a restricted pointer to a stack variable and you will not be able to carry that pointer outside of the lifespan of that stack variable. If the compiler auto-references a value on the stack (or a literal) then it will have a restricted pointer type:

var x : & const byte = 29 # 'x' is a restricted pointer

Of course a normal pointer will be convertible to a restricted pointer; it will just restrict its use of the thing pointed to (it will not be storable outside the stack). But this conversion will require the insertion of a runtime check for null, because normal pointers can be null and restricted ones cannot. And a conversion from restricted to normal pointer will not be allowed.
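Part of this can be imitated in C++ with a small wrapper class. The sketch below covers only the null check on conversion, the rebinding, and the missing conversion back to a normal pointer; the "not storable outside the stack" rule is something only a compiler can enforce. The class name `Restricted` is invented for the illustration.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a restricted pointer: never null, but rebindable.
template <typename T>
class Restricted {
public:
    // Conversion from a normal pointer: this is where the runtime null check goes.
    explicit Restricted(T* pointer) : pointer_(pointer) {
        if (pointer_ == nullptr) {
            throw std::invalid_argument("null converted to restricted pointer");
        }
    }
    // Pointing at an existing object needs no check.
    explicit Restricted(T& object) : pointer_(&object) {}

    T& operator*() const { return *pointer_; }
    T* operator->() const { return pointer_; }

private:
    T* pointer_;  // deliberately no conversion back to T*
};
```

Assigning one `Restricted<T>` to another makes it point to a different object, which is what distinguishes it from a C++ reference.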

Another protection measure we have to take is to forbid procedures to return just any restricted pointer. After all, allowing that would allow a procedure to return a pointer to a variable in that procedure, which ceases to exist when the procedure returns. But returning a restricted pointer that was received as a parameter of the procedure will be allowed. Another important issue is the this pointer. It will no longer have a pointer type, but a restricted pointer type. The reason for this is that a procedure cannot know at compile time (for all uses) if the object it belongs to is on the stack or not. Examples:
class C
public:
# some code here ...

procedure p1() : * C
return this # ERROR: this object could be on the stack
end procedure

static procedure p2() : & C
var c : C
return c # ERROR: c is on the stack and will be destroyed
end procedure

static procedure p3(c : & C) : & C
return c # OK : c still exists when pointer is returned
end

end class
The fact that a class cannot use its this pointer anymore in any way it would want could sometimes be very annoying. This inspired me to create 'heap classes'. These classes are declared to be always on the heap and never on the stack. This could be useful for some types with a long lifetime, like a GUI widget or a ServerSocket. For these classes the this pointer would be of a normal pointer type, instead of a restricted type, because such classes can never be on the stack anyway. The compiler also would not auto-generate an assignment operator, because a copy-by-value operation will in general be unneeded or unwanted for heap objects. A possible syntax for declaring a heap class would be the addition of an asterisk to indicate the use of a normal pointer for this. Example:
class * Widget
# this is a heap class
end

So this is my proposed solution for the pointer issues. It would be a simple solution for the copy-constructor-called-with-null-argument problem: copy constructors will use as argument type a restricted pointer from now on. The compiler will prevent a null value being assigned to a restricted pointer. And restricted pointers will also be useful for memory management: a procedure that takes a restricted pointer actually guarantees that it will not store the pointer anywhere. Instead, it will probably use the info on that object (or copy the object entirely) and then discard the pointer. This will avoid an unnecessary copy for safety purposes.

Comments are of course welcome.

Friday, September 22, 2006

Types and pointer issues

I will write a bit about type issues in this article (can I actually call this writing an article?). I will provide a brief introduction on types here, but for a detailed reference on types in Hyper see this page.

Hyper has 3 kinds of types: class types, pointers and arrays. A class type is a user-defined class or a built-in primitive type, like int. A pointer type is what it says, it points to another type (that cannot be another pointer btw). An array type is also obvious to anyone who has programmed before. Types are written from left to right, for simplicity. A pointer type is written as an asterisk followed by the type it points to. The notion 'points to nothing' is represented by the simple expression (and keyword) null. A class type is written by using its name. And arrays are written as [size] followed by the content type. An array of an array is allowed, but the two arrays are merged together into a true multidimensional array. Every array can be used in the same way as a class type. For example, every array has a member size that returns the size of the first or another dimension. Types also support the notion of being constant. An array itself cannot be declared as being constant, but you declare its base type as const instead.

Some examples of variable declarations:
var a : int # variable a has type int
var b : * int = null # b has type pointer to int
var c : * const int # c is pointer to constant int
The character # starts a comment by the way.
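
For readers who think in C++ (the language the compiler is written in), here is a small sketch of how class types and pointer types could be modelled and spelled out from left to right. It is only an illustration: the names Type, classType, pointerTo and spell are made up and are not the compiler's real classes. Arrays are left out to keep it short.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical model of two of Hyper's kinds of types (arrays omitted).
struct Type {
    enum class Kind { Class, Pointer };
    Kind kind = Kind::Class;
    std::string name;                    // class name (Kind::Class only)
    bool isConst = false;                // constness (Kind::Class only)
    std::shared_ptr<const Type> target;  // pointee (Kind::Pointer only)
};
using TypePtr = std::shared_ptr<const Type>;

TypePtr classType(const std::string& name, bool isConst = false) {
    auto t = std::make_shared<Type>();
    t->kind = Type::Kind::Class;
    t->name = name;
    t->isConst = isConst;
    return t;
}

TypePtr pointerTo(TypePtr target) {
    // A pointer cannot point to another pointer.
    if (target->kind == Type::Kind::Pointer) {
        throw std::invalid_argument("a pointer cannot point to another pointer");
    }
    auto t = std::make_shared<Type>();
    t->kind = Type::Kind::Pointer;
    t->target = std::move(target);
    return t;
}

// Spell a type from left to right, as in the declarations above.
std::string spell(const Type& t) {
    if (t.kind == Type::Kind::Pointer) {
        return "* " + spell(*t.target);
    }
    return (t.isConst ? "const " : "") + t.name;
}

// Small usage example covering the three declarations above.
void spellExample() {
    assert(spell(*classType("int")) == "int");
    assert(spell(*pointerTo(classType("int"))) == "* int");
    assert(spell(*pointerTo(classType("int", true))) == "* const int");
}
```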

Arrays are not discussed further here; I will now focus on pointer types. Hyper does not have the same pointer system as C++. It also doesn't have reference types. The difference lies in how pointer types are used in expressions and statements. There are no reference (&) and dereference (*) operators. Referencing and dereferencing are done automatically by the compiler. This requires separate pointer operators. For example, normal (non-pointer) assignment is done with '=', and pointer assignment is done with '$='. Dereferencing is never a problem, but I am still wondering when and how I will disallow some automatic referencing, such as using a literal number for a pointer to a number, or using a number returned from a function for this.

var x : * const byte = 12 # disallow??
var y : * const double = someFunc() # disallow??
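
To show what the two assignment operators mean, here is a rough C++ translation. The Hyper spellings in the comments are my assumption of how the statements would look, based on the description above; the function and struct names are made up for the example.

```cpp
#include <cassert>

// Final state after the assignments, so the effect of each one is visible.
struct AssignmentResult {
    int x;
    int y;
    bool pointsToY;
};

AssignmentResult assignmentExample() {
    int x = 1;
    int y = 2;

    int* p = &x;  // Hyper (assumed): p $= x   pointer assignment, p now points to x
    *p = 10;      // Hyper (assumed): p = 10   normal assignment, writes through to x
    p = &y;       // Hyper (assumed): p $= y   pointer assignment, x and y are untouched

    assert(x == 10);
    assert(y == 2);
    return AssignmentResult{x, y, p == &y};
}
```

In C++ the programmer has to write the & and * by hand; in Hyper the compiler inserts them, which is why the two kinds of assignment need different operators.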

Another issue: Hyper does not have reference types, because referencing/dereferencing happens automatically anyway. So copy constructors use a parameter that has a pointer type. But what if that pointer is null? Since any pointer parameter can be null, the following is currently allowed:

var i : int = null # currently allowed :-S

The copy constructor of class int is called for this variable with a pointer that points to nothing. Run-time failure guaranteed. Of course it was not my intention to allow this! So I need to find a way to forbid it. I will think about it, and in future writings I will suggest some possible solutions.
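
In C++ terms the problem looks roughly like the sketch below. The struct Int and the function copyFrom are made up for the illustration. The null check is exactly the part that is missing today: without it, the copy would read through a pointer that points to nothing.

```cpp
#include <cassert>

// Hypothetical stand-in for Hyper's built-in int class.
struct Int {
    int value = 0;
};

// Pointer-based copy, similar to how a copy constructor receives its source.
// Returns false instead of dereferencing when the source is null.
bool copyFrom(Int& target, const Int* source) {
    if (source == nullptr) {
        return false;  // 'var i : int = null' would end up here
    }
    target.value = source->value;
    return true;
}

// Small usage example.
void copyExample() {
    Int a;
    a.value = 5;
    Int b;
    assert(copyFrom(b, &a));
    assert(b.value == 5);
    assert(!copyFrom(b, nullptr));  // rejected, b keeps its old value
    assert(b.value == 5);
}
```

A run-time check like this is only a workaround; the real fix has to make the compiler reject the null argument in the first place.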

Saturday, September 16, 2006

Compiler issues

I have been working on a compiler for Hyper since February 2004. It's difficult and lots of work. But if I ever want a usable language I will need one. And of course no one else will write it for me.

The language I am using to write the compiler is C++. C++ is one of the languages I like most, and writing the compiler in Hyper is unfortunately not possible. It would be great if it were, but it isn't. This is the bootstrapping problem, if you know what that is: you cannot write the very first compiler for language X in language X. Unless, of course, you already have an interpreter for language X. But for Hyper that's not the case.

The compiler is already open-source, under the GPL license. It is not publicly downloadable yet though; for now it is only available to fellow students and people I know personally. Making it available on my website will happen in the near future. Using the GPL as the single license for everything will have to change; I will need the LGPL license for the runtime library, and some other not very restrictive license (maybe the BSD license?) for the class library. The GPL for the compiler is OK: any modified version of the compiler (or anything derived from it) will also need to be released under the GPL. The runtime library needs the LGPL because it will need to be linkable to proprietary programs. And the class library needs a not very restrictive license, to allow proprietary classes to be derived from classes in the library. And because inheritance counts as 'making a derivative work', the GPL and LGPL are not an option here. (Someone correct me if I am wrong about this.)

The compiler for Hyper is still only a front end. So the compiler only checks its input file(s) for errors, but does not yet generate an executable for valid input source files. The task of writing a back end for code generation still lies ahead of me. I will not write my own code generators for all machine architectures that exist. This leaves me two options: (1) let the compiler generate source code for another language, most likely C++; (2) use a library or source from another project for code generation. Of these two, I have a slight preference for the second option. Again, two options exist for this. The first is writing my compiler as yet another front end for GCC. The other is to use LLVM, an open-source compiler infrastructure. I strongly prefer LLVM. An important reason is that LLVM is also written in C++ (as is the front end of my compiler), but GCC is written in C. I am more familiar with C++ than I am with C, and I consider C++ a better choice. Also, in my opinion, the GCC source code is difficult to understand, and contains lots of macros.

Another issue I will have to deal with: some time ago I switched to CMake for building the compiler. LLVM uses the GNU autotools for building. I will need to find a way to let those two build systems cooperate.

The compiler development is progressing well. The compiler accepts a simple subset of the language (so without inheritance, interfaces, generic programming, etc.) and it already does most of the semantic checking that needs to be done. But this does not come without some complexity; I do a regular line count on the compiler sources, and today the number of lines of code (headers + implementation) exceeded 30,000. It surprises even me that I already have such a large codebase. Some refactoring could make the number drop a bit, but many language features still need to be added, so you can expect the number to rise even more.

Saturday, September 09, 2006

Introduction to Hyper

Before I can talk about my programming language Hyper in depth, I first need to provide a background for readers. I will do that here.

First, why would I try to create my own programming language? A couple of reasons: because I was looking for features not present in existing languages, because I wanted to 'fix' certain issues from other languages and last but not least for the fun of it.

As explained in the 'introduction' on my website (here), I started implementing a compiler for the language I wanted to create early in 2004. Back then, I had not really thought out the language yet, so I based it largely on C++ except for its C-style syntax. I already had ideas for the basic types and some statements, but I didn't have a type system yet so I got stuck for a while. After some time I decided to use semi-implicit pointers (not reference types as they are in Java). Since then I have made a lot of progress, and the basics of the language are there.

Why the name "Hyper"? Well, back in 2004 I was trying to think of a cool name for the language and a fellow student proposed the name "Hyper". I was OK with it, so Hyper it was. So the name has no actual meaning or relation to the language specification. The only disadvantage I see is the resemblance of Hyper to the full name of HTML: HyperText Markup Language.

This writing is getting rather long now, so I'm going for a short list of language properties and features: The language is statically typed and object-oriented, like Java. The type system uses semi-implicit pointers, which means that pointers are declared explicitly but are used implicitly in expressions (no reference/dereference operations). Types are written from left to right. Arrays are real distinct types (not simple pointers) that remember their length. All types implicitly inherit from the built-in 'object' type. Built-in primitive types are real objects; they are derived from 'object' and they (can) have member functions. The language is a compiled one, like C++. An important aim is to provide an easy way for programmers to create platform-independent programs. My buzzword for this is 'write-once-compile-anywhere'. The language will come with a large standard library that provides many commonly needed things, together with platform abstraction mechanisms. If possible, I would also like to provide a GUI library in it. Graphical user interfaces are needed for most programs, so having this would be great.

I am currently writing a compiler for the part of the language I have already figured out. The compiler is written in C++ and currently only performs checks on its Hyper input file; it doesn't generate executables yet. The compiler comes with a simple tool to convert a Hyper source file into an HTML file with syntax highlighting. It is already open-source (GPL license) but it isn't available to the general public yet. I will wait a bit longer before making it public; for now I use a limited testing audience to get the first feedback. The public release (still in alpha stage of course) will come later.

As I wrote in previous messages, you can always get more info about Hyper on my website. The website already provides a limited amount of documentation on the current state of the language. See you next time.

Tuesday, September 05, 2006

Change of purpose

The purpose of this blog has now changed. I originally created it for my master's thesis, but I changed my mind. I've created another blog in a more private place for that. Anyone who needs to know the new location should have been informed.

Anyway, I am going to use this blog now for other things. The most important thing to write about will be my programming language called Hyper. I already have some documentation about it on my website (http://users.telenet.be/hyperquantum). So in this blog I will probably write about the new language features I'm still figuring out. I used to do that on a Google Group, but that group no longer exists, and it had a very limited audience, because the articles were written in Dutch. I will soon start to write an introduction to the language here. More to come!

Friday, June 09, 2006

Hello world

Hello everyone.

I'm a computer science student from the University of Antwerp (Belgium) and I will be using this blog for my master's thesis next year. My main interests are programming and compilers. The thesis will be about monitoring the university's wireless network.

In case you're interested, my personal website is: http://users.telenet.be/hyperquantum
It describes the programming language I'm creating in my free time.