What Gont is meant to be?
=========================

Largly imperative composition of C-like lexical layer, control structures
and speed with ML's typesystem, functions as first class citizens,
and safty. Plus possibly few more things, like objects.


Delta
=====

What is Gont diffrent from C? C is reffered to as a C89 ANSI standard
(see K&R's The C Language, second edition). Notes about implementation
are given in [[]], grep '\[\[' README can serve as TODO list.

General, conceptual, diffrences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*  Gont is safe. This means that compiler shouldn't produce any code,
   execution of which would cause SEGV.  Parse: it should not be possible
   to overflow buffers, mismatch types of arguments in calls and so on.
   However note, that SEGV can be abtained from playing with extern_c
   declaration, compiler has no means of validating.
   
*  Gont is much higher level. It generally operates at much higher level 
   of abstraction.

   Warning, evangelisation follows.
   
   C is, in principle, assembly language, therefore it is almost as fast
   as real assembly, but is still easy to write and quite portable. It
   is great for programming things like kernels, libraries that need
   high speed operation (like MPEG decoding/encoding), generally most of
   things that need to be real time. However it is very easy to cut your
   finger with such a sharp knife. Typesystem provided by C isn't very
   powerfull, most of people familiar with functional programming, would
   say that typesystem of C is no more then a toy. Similary C doesn't
   provide much support for memory managment, exceptions, functional and
   objective programming.  Of course, lack of restricion from language
   about how you organise your memory managment, objects, and error
   recovery, can be advantage, especially with respect to operating
   system programming and so on. However in most cases, writing xmalloc()
   100th time, is annoying.  I think that's the main reason why Java is
   so popular -- it takes burden of thinking at the very low level away
   from the programmer, and also it is much harder to make a mistake
   (at the cost of more typing, and performance regression).

   Evangelisation ends.

*  Gont provides transparent memory managment, along with garbage
   collection (Boehm conservative garbage collector). It generally means
   you can allocate as much as you want, forget the pointers, and the Gont
   runtime system with get rid of unused memory.  [[not yet, but this is
   very simple]]

*  Gont is not compatibile with C (this is diffrence more against
   C++ then C). It is simply impossible to maintain such compatibility
   in safe language. What a braindamage might result, one may easy tell
   from the C++ example.

Concrete diffrences
~~~~~~~~~~~~~~~~~~~

*  There is no empty instruction (written as ; in C). skip keyword is
   introduced instead. For example:
   
   C:
   	while (f(b++));
   
   Gont:
   	while (f(b++))
		skip;

   This is to disallow common C's mistake:

	while (cond);
		do_sth;
   
   (note ';' after while).
   
*  Control structures. 

   if COND STMT
   if COND BLOCK else BLOCK
   while COND STMT
   do STMT while COND
   
   COND is expression of type bool. (therfore int i=10; while (i--) f(); is
   not allowed, one needs to write while (i-- != 0)). STMT might be single
   statement or a BLOCK. BLOCK is zero or more statements surrounded with 
   {}. One might note that if's version with `else' keyword is required to
   have {} arround each part. This is to solve dangling-else problem,
   for example:

   C:
   	if (b1)
   		if (b2)
			f1();
	else
		f2();
   
   This if of course parsed as:
   
   	if (b1) {
   		if (b2) {
			f1();
		} else {
			f2();
		}
	}

   In C `else' is associated with nearest else-free `if'. As you have seen
   this can be source of problems.

*  There is no explicit notation of pointers. Structures and unions are
   always treated as pointers (as in Java). Strings are built in type.
   Arrays are implemented as abstract type. Pointers to functions are
   discussed below.

*  Functions are first class citizens. This functional programming slogan
   means that you can write a function, that takes function as argument and
   returns function. For example:

	opt_struct int_list {
		int data;
		int_list next;
	}
	
	*(int_list) -> void mapper(*(int) -> void f) 
	{
		void for_all(int_list lst) 
		{
			while (lst != null) {
				f(lst.data);
				lst = lst.next;
			}
		}

		return for_all;
	}

   This function takes function f operating on items, and returns function
   that applaies f to all elements of list.

   The key thing to note is functional type notation. `*(int) -> void f' means
   roughly `void (*f)(int)' -- so it is function that takes single int
   parameter and returns no value. Similary *(int, int_list) -> string is
   function, that takes integer and int_list parameter, and returns string.
   int_list is passed as pointer (in C's sense). 
   
   Functional values are not simply passed as pointers to functions. That
   wouldn't work for `for_all' function from our example, since it needs
   `f' paramater that would disappear from the stack after `mapper'
   has returned. So functional values are passed as pairs of pointers
   to functions, and pointers to special closure structures.

   Nested function definitions (for example the `for_all' function) are
   shortcut to defining functional variables and initializing them, so our
   example could look as:

	*(int_list) -> void mapper(*(int) -> void f) 
	{
		*(int_list) -> void for_all;
		
		for_all = fun (int_list lst) : void -> {
			while (lst != null) {
				f(lst.data);
				lst = lst.next;
			}
		};

		return for_all;
	}
   
   Or even:
   
	*(int_list) -> void mapper(*(int) -> void f) 
	{
		return fun [int_list lst] : void -> {
			while (lst != null) {
				f(lst.data);
				lst = lst.next;
			}
		};
	}
   
*  Special extension is supported, usefull when dealing with functional
   values. Whenever function body (sequence of statements within {}) should 
   appear, single expression within () can be supplaied. It is passed to
   return statement, which is only statement in function body. So:

	fun (int a, int b) : int -> (a + b)

   is equivalent of

	fun (int a, int b) : int -> { return a + b; }

*  Structures. They are defined with struct and opt_struct keywords.

	struct s {
		int a;
		string b;
	};

   is roughly eqivalent of C's:

  	typedef struct {
		int a;
		char *b;
	} *s;

   So, name `s' can be used (only) without any struct or opt_struct,
   as type. You should note, that structures are passed by pointer (or by
   reference if you prefer C++ naming style), therefore changes made to
   struct inside function are reflected in state of it on thr caller side.

   You can access fields of structure with `.' operator.
 
   Now, what opt_struct is all about. struct value is always valid, i.e.
   it has to be initialized with object instance, and you cannot assign
   null pointer to it (btw: null is keyword). opt_struct's are not. They can
   be initialized with null, as well as assigned null. This involves runtime
   check on each access to opt_struct value. When you try to access
   opt_struct value, that is null, Null_access exception is raised.
 
   [[structures are not yet implemented at all]]

*  Polimorphism. This is probably the neatest thing in Gont. Structures, as
   well as functions can be parametrized over types. For example to define
   list of anything datatype you write:

	opt_struct <'a>list {
		'a data;
		<'a>list next;
	}

   Then you use it as:

   	<int>list l;
	<<int>list>list ll;

   The second example is list of lists of int. (If you are familar with C++
   you can note `<<' that has to be written as '< <' there. This is not
   the case in Gont. It is handled specially).

   Functions can be written that operate on lists:

	void iter(*('a) -> void f, <'a>list l)
	{
		while (l != null) {
			f(l);
			l = l.next;
		}
	}

	<'b>list map(*('a) -> 'b f, <'a>list l)
	{
		<'b>list o = null;

		while (l != null) {
			o = {data = f(l.data), next = o};
			l = l.next;
		}

		return o;
	}

   [[This also ain't implemented yet]]
  
*  Module system. Modules provide way of separate compilation along with
   possibility of avoiding namespace conflicts. Current module system is
   based on ideas from OCaml. However it is very limited, we only support
   one level of modules, there are no functors and so on.  Module Foo
   consist of two files: foo.g and foo.gi (interface). Whatever names
   are placed in foo.* files, they are all prefixed with `Foo::'. gontc
   compiles foo.gi file to foo.gio, and foo.g to foo.o. foo.gio is needed
   whenever you access Foo:: symbols from other modules.
  
   Example:
   
   list.gi:
	// this is to output information,
	// that we implment type `t', but not to
	// disclosure what it is.
	type <'a>t;
	'a hd(<'a>t x);
	<'a>t tl(<'a>t x);
	int cnt;
	<'a>t create();
	<'b>t map(*('a) -> 'b f, <'a>t l);
	void iter(*('a) -> void f, <'a>t l);

   list.g:
	// this is local datatype.
	opt_struct <'a>t {
		'a data;
		<'a>opt_struct next;
	}
	
	// this will be exported out (`public')
	'a hd(<'a>t x) { return x.data; }
	<'a>t tl(<'a>t x) { return x.next; }
	
	// this is local, can't be called from outside the module
	void helper(<'a>t x) { ... }

	// and more publics
	int cnt;
	<'a>t create() { cnt++; return null; }
	<'b>t map(*('a) -> 'b f, <'a>t l) { ... }
	void iter(*('a) -> void f, <'a>t l) { ... }
	// ...

   Then if you want to use the module, it can be done with :: notation, like
   this:
  	
	<int>List::t l = List::create();
	...
	int k = List::hd(l);

   In case of some modules it might be usufull to open them, i.e. import
   all symbols from module intro current namespace, so you no longer have to
   use :: notation (but you still can, it is often suggested for readability):

  	open List;
	...
	<int>List::t = List::create();
	...
	int k = hd(l);
	<int>List rest = tl(l);

*  Tuples. Tuple is datatype, that consits of two or more components, which
   are anonymous. Tuples are defined as:

	*[int, int] t1;
	*[int, string, bool] t2;
	*[ctx, ty] t3;

   And used as:

	t1 = [1, 2];
	int a, b;
	[a, b] = t;
	// swap
	[a, b] = [b, a];
	t3 = [1, "hello", true];
	// true and false are keywords

   More on tuples in `Patterns' below.

   [[Yes, you guessed, there are no tuples yet]]

*  Unions. Union are more closly related to ML's datatypes then to C's
   unions. The basic diffrence is that Gont compiler remebers which member 
   is currently stored in union.

   Unions are defined as:

   	union exp {
		int Const;
		string Var;
		*[exp, exp] Add;
		*[exp, exp] Sub;
		*[exp, exp] Mul;
		*[exp, exp] Div;
	}

    This union can be later on used for processing symbolic expressions.
    You can access union components only using pattern matching (there is
    no `.' notation).

    [[Unions? Nope... not yet]]

*  Pattern matching. It is especially usefull with conjunction with tuples
   and unions. Patterns are used in switch and let statements, like this:

   	int compute(exp e)
	{
		switch e {
		case Const[x] 	 : return x;
		case Var[x]	 : return lookup_var(x);
		case Add[e1, e2] : return compute(e1) + compute(e2);
		case Mul[e1, e2] : return compute(e1) * compute(e2);
		case Div[e1, e2] : return compute(e1) / compute(e2);
		case Sub[e1, e2] : return compute(e1) - compute(e2);
		}
	}

	exp diff(exp e)
	{
		switch e {
		case Const[x] 	 : return Const[0];
		case Var[x]	 : return Const[1];
		case Add[e1, e2] : return Add[diff(e1), diff(e2)];
		case Sub[e1, e2] : return Sub[diff(e1), diff(e2)];
		case Div[e1, e2] :
			exp up = Sub[Mul[diff(e1), e2], Mul[e1, diff(e2])];
			exp down = Mul[e2, e2];
			return Div[up, down];
		case Mul[e1, e2] : 
			return Add[Mul[diff(e1), e2], Mul[e1, diff(e2])];
		}
	}

   To convince you, we're not so far from C:

   	union color {
		void Red;
		void Green;
		void Blue;
	}

	int spy007;
	
	string color_name(color c)
	{
		string r;
		
		switch (c) {
		case Red: 
			spy007++;
			r = "red";
		case Green: 
			r = "green";
	        // [] also allowed
		case Blue[]: 
			r = "blue";
		}

		return r;
	}

   You can note lack of break at the end of case clouses. They are not
   required, as there is no fall through (becouse case clouse has to
   introduce new scope).

   In previous examples we have checked for each possibility, but we don't
   have to:
   
	string var_name1(exp e) 
	{
		switch e {
		case Var[x]: return x;
   		// matches any x
		case x: return "not variable";
		}
	}

	string var_name2(exp e) 
	{
		switch e {
		case Var[x]: return x;
		}
	}

   var_name2 would raise Match_failure exception if any pattern didn't
   match.

   Patterns can be used to decomposite tuples, like this:

   	*[int, int] t = [1, 2];

	...
	switch t {
	case [i1, i2]: return i1 + i2;
	}
  
   This can be abbreviated to:

   	let [i1, i2] = t in { return i1 + i2; }

   There can be more then one assigment, like this:

        let [t1, t1] = t,
	    [s1, s2] = s {
	    	// ...
	}

   [[This would be nice, but... not yet]]

*  Exceptions. They are used to inform about unusual situation in a
   program, in order to transfer control to block, that can handle
   it. They can be thought of as member of a huge union:

 	union exn {
		// system defined
		void Null_access;
		void Match_failure;
		void Not_found;
		
		// user defined
		string Syntax_error;
		void ICE;
	}

   New members are added with exception keyword:

   	exception string Syntax_error;
	exception void ICE;

   In order to signal unusual situation, you use raise keyword:

   	raise Syntax_error["parse error"];
	raise ICE;
	raise ICE[];

   At the place, where you know how to handle it, you use try:

   	try {
		open_file();
		...
		parse();
		...
	} with e {
	case Syntax_error[s]:
		print_msg(s);
	case ICE:
		print_msg("Internal Compiler Error");
	} finally {
		close_file();
	}

   First we open some file, then we call parse(), that can raise
   Syntax_error or ICE in which case we print error message,
   or something else, but in then control is transfered to
   upper-level try block. No matter how control leaves try {} block
   instructions in finally { .. } arre excuted. So we always close
   the file.

   [[Nope... but they are expected right after modules, which means
     Real Soon Now. But I guess there will be only very partial support,
     just for system exceptions, until pattern matching is done]]
	

Typesystem
~~~~~~~~~~

ti and t are type expressions. vi are type variables ('a, 'b, 'foo, ...).

Basic types: int, float, string, bool, void, written as is.

Function type is written as follows:

	*(t1, t2, ..., tn) -> t
  
where n >= 0. One might note that this is somewhat diffrent from ML where
all functions take just one parameter (which can be tuple or another
function, but this is not the point). This is closer to C's function
notation.

Tuples:

	*[t1, t2, ... tn]

where n >= 2.

Structures are defined with:

	struct <v1, v2, ..., vn> NAME {
		t1 f1;
		t2 f2;
		...
		tm fm;
	}

Structures that can be invalid (i.e. null):

	opt_struct <v1, v2, ..., vn> NAME {
		t1 f1;
		t2 f2;
		...
		tm fm;
	}

Unions (datatypes) are defined with:

	union <v1, v2, ..., vn> NAME {
		t1 f1;
		t2 f2;
		...
		tm fm;
	}

where n >= 0, m >= 1. If n == 0, <> can be omitted. fi are field names.

They can be later on reffered with:

	<t1, t2, ..., tn> NAME

where n >= 0. If n == 0, <> can be omitted. 

`*' in front of tuple and function types is (crude) way to convince ocamlyacc
(Bison has the same problem, though), that there are none reduce/reduce
conflicts in garmmar. I guess there aren't even without `*', but tell it
to yacc... (this problem is reffered to in Bison manual, there are some
workarounds, however I was unable to make use of it).


Constructing functional values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two ways to build functional value in Gont. First comes from C:

  int add(int a, int b) { return a + b; }
  
and the second from ML:

  *(int, int) -> int add = fun (int a, int b) : int -> { return a + b; };

The second way might seem odd to C programmer. Of course, becouse it is
odd when used to define simple C-like function. But it is not when you
need to pass function to another function, let's say:

  void fputs(file f, string s);
  ...
  void do_sth(int foo, *(string) -> void err_report);
  ...
  do_sth(16, fun (string s) : void -> { fputs(f, s); } );

People familiar with ML might recognise that despite the type information
for (fun ...) statement can be obtained from itself, it still has to be given
(even twice). I guess it is place for experiments.

I also would like to have some support for named arguments, so functions
can be called as:

   Window.create(width => 20, height => 23, color => red);

This should come with default values.



Implementation Notes (aka hacker guide)
=======================================

The compiler is written in OCaml. This ensures high portablity, reasonable
speed, along with modularization and type safty. 

Code generation
~~~~~~~~~~~~~~~

It is done in module Gontcodegen.

Compiler outputs intermediate language, called Ksi, that is later on 
compiled by gcc front end. Ksi itself is rather strike through interface
to trees used by gcc internally. It uses lispy syntax.

Ksi output is done through Ksi module. It exports datype for Ksi
expressions and declaration, and function to convert this data into output
stream (that is indented, to allow easier debugging of compiler output).


Argument passing conventions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Values passed as polimorphic parameters has to be all of the same size.
Otherwise we wouldn't be able to generate efficient code. Specially for
this purpouse `int' Gont type is choosen with the same as size as machine
pointer. (This is not the choice of some C compilers on some 64 bit
architectures, for example on alpha-*-linux with GCC sizeof(int) = 4, 
sizeof(long) = sizeof(void*) = 8). 

They are few basic types of passing arguments in Gont. 

1. by value. int and bool are passed this way.
2. by pointer to some location on the heap. structures, unions and tuples 
   are passed this way.
3. by pointer to location, that is not guaranted to exists after function
   terminates (read: this can be pointer to the stack of
   function that called us). If such value needs to be saved after function
   return, is has to be copied to the heap. It also has to be copied if
   calling raise with it. Special functions are provided by the runtime to
   copy such values. Floats and functional values are passed this way.

[[Functional values are currently always passed as pointers to the heap.
I guess some sementic analisis will be done to pass it as pointers to
stack when it is safe. Simiallry for function closures.]]

Functional values cannot be passed simply as pointers to functions,
since they might need clousre in some (most?) cases. Here is layout of
datastructure describing functional value:

  void *fnc;
  void *closure;

closure has to be passed as the first argument to fnc. Closure musn't 
point into the stack.

One might note that it might be impossible to pass regular function this
way. We need auxilary function to be genrated.

# $Id: README,v 1.7 2001/11/18 23:25:17 malekith Exp $
# vim: tw=75
