
   next_inactive up previous

                           Gont for C programmers

   $Id: c2gont.tex,v 1.2 2001/11/24 19:41:12 malekith Exp $

Contents

     * [1]What Gont is meant to be?
     * [2]General, conceptual, differences
          + [3]Safty
          + [4]Level of abstraction
          + [5]Memory Management
          + [6]C compatibility
          + [7]Diffrences against ML
          + [8]Diffrences against Popcorn
     * [9]Concrete differences
          + [10]Empty instruction
          + [11]Control structures
          + [12]Pointers
          + [13]Functions
          + [14]Structures
          + [15]Polimorphism
          + [16]Module system
          + [17]Tuples
          + [18]Unions
          + [19]Pattern matching
          + [20]Exceptions
     * [21]Constructing functional values
     * [22]Typesystem
     * [23]About this document ...

                           What Gont is meant to be?

   Largely imperative composition of C-like lexical layer, control
   structures and speed with ML's typesystem, functions as first class
   citizens, and safety. Plus possibly few more things, like objects.

   What is Gont different from C? C is referred to as a C89 ANSI standard
   (see K&R's The C Language, second edition). Notes about implementation
   are given in [[]].

                       General, conceptual, differences

Safty

   Gont is safe. This means that compiler shouldn't produce any code,
   execution of which would cause SEGV. Parse: it should not be possible
   to overflow buffers, mismatch types of arguments in calls and so on.
   However note, that SEGV can be obtained from playing with extern_c
   declaration, compiler has no means of validating.

Level of abstraction

   Gont is much higher level. It generally operates at much higher level
   of abstraction.

   <evangelisation> C is, in principle, assembly language, therefore it
   is almost as fast as real assembly, but is still easy to write and
   quite portable. It is great for programming things like kernels,
   libraries that need high speed operation (like MPEG
   decoding/encoding), generally most of things that need to be real
   time. However it is very easy to cut your finger with such a sharp
   knife. Typesystem provided by C isn't very powerful, most of people
   familiar with functional programming, would say that typesystem of C
   is no more then a toy. Similarly C doesn't provide much support for
   memory management, exceptions, functional and objective programming.
   Of course, lack of restriction from language about how you organize
   your memory management, objects, and error recovery, can be advantage,
   especially with respect to operating system programming and so on.
   However in most cases, writing xmalloc() 100th time, is annoying. I
   think that's the main reason why Java is so popular - it takes burden
   of thinking at the very low level away from the programmer, and also
   it is much harder to make a mistake (at the cost of more typing, and
   performance regression). </evangelisation>.

Memory Management

   Gont provides transparent memory management, along with garbage
   collection (the particular kind of garbage collector used is Boehm
   conservative garbage collector, used also by GCJ and Popcorn). It
   generally means you can allocate as much as you want, forget the
   pointers, and the Gont runtime system with get rid of unused memory.
   [[not yet, but this is very simple]]

C compatibility

   Gont is not compatible with C (this is difference more against C++
   then C). It is simply impossible to maintain such compatibility in
   safe language. What a braindamage might result, one may easy tell from
   the C++ example.

Diffrences against ML

   How is Gont different from Caml or SML? (Caml and SML are functional
   languages, with imperative features from ML family) Hm... generally
   all languages are interchangeable, what can be written in one, can be
   written in all other. What is different is: how easy can you get the
   code to work, how much bugs will compiler detect, and how much will be
   left for the programmer and how fast will it run. Gont places accents
   on these things somewhere between Caml and C. Generally it does not
   provide as much support for functional programming as Caml does,
   similar can be told about Gonts module system (which is a toy,
   compared to functors and other ML machinery) and restricted
   polimorphism. On the other hand, linking Gont code with C is very
   easy, the only thing you need to remember, is not to put pointers from
   Gont, in malloc()'ed area - save it on stack, or in GC_malloc()'ed
   area. Interfacing OCaml is... ghm... nightmare, mainly because of its
   precise garbage collector. Also Gont code will probably run faster, as
   it uses highly optimizing back end (gcc), and because of restrictions
   put on the language itself (this is probably not true yet).

Diffrences against Popcorn

   How is Gont different from Popcorn? (Popcorn is safe C subset,
   compiler is available to TALx86 (Typed Assembly Language)). Popcorn
   was inspiration for Gont :) However, it is somewhat limited
   (especially to x86) and not currently under development (AFAIK).

                             Concrete differences

Empty instruction

   There is no empty instruction (written as `;' in C). skip keyword is
   introduced instead. For example:

   C:
        while (f(b++));

   Gont:
        while (f(b++))
                skip;

   This is to disallow common C's mistake:
        while (cond);
                do_sth;

   (note `;' after while).

Control structures

     * if cond stmt
     * if cond block else block
     * while cond stmt
     * do stmt while cond

   cond is expression of type bool. (therefore int i=10; while (i--) f();
   is not allowed, one needs to write while (i-- != 0)). stmt might be
   single statement or a block. block is zero or more stmt's surrounded
   with { }. One might note that if's version with else keyword is
   required to have { } around each part. This is to solve dangling-else
   problem, for example:

   C:
        if (b1)
                if (b2)
                        f1();
        else
                f2();

   This if of course parsed as:
        if (b1) {
                if (b2) {
                        f1();
                } else {
                        f2();
                }
        }

   In C else is associated with nearest else-free if. As you have seen
   this can be source of problems.

   However, as there is no danger with using if without { } and else, (as
   in if (x == 0) x++;), it can be used without { }.

Pointers

   There is no explicit notation of pointers. Structures and unions are
   always treated as pointers (as in Java). Strings are built in type.
   Arrays are implemented as abstract type. Pointers to functions are
   discussed below.

Functions

   Functions are first class citizens. This functional programming slogan
   means that you can write a function, that takes function as argument
   and returns function. For example:

        opt_struct int_list {
                int data;
                int_list next;
        }

        *(int_list) -> void mapper(*(int) -> void f)
        {
                void for_all(int_list lst)
                {
                        while (lst != null) {
                                f(lst.data);
                                lst = lst.next;
                        }
                }

                return for_all;
        }

   This function takes function f operating on items, and returns
   function that applies f to all elements of list.

   The key thing to note is functional type notation. *(int) -> void f
   means roughly void (*f)(int) - so it is function that takes single int
   parameter and returns no value. Similarly *(int, int_list) -> string
   is function, that takes int and int_list parameter, and returns
   string. int_list is passed as pointer (in C's sense).

   Functional values are not simply passed as pointers to functions. That
   wouldn't work for for_all function from our example, since it needs f
   parameter that would disappear from the stack after mapper has
   returned. So functional values are passed as pairs of pointers to
   functions, and pointers to special closure structures.

   Nested function definitions (for example the for_all function) are
   shortcut to defining functional variables and initializing them, so
   our example could look as:

        *(int_list) -> void mapper(*(int) -> void f)
        {
                *(int_list) -> void for_all;

                for_all = fun (int_list lst) -> void is {
                        while (lst != null) {
                                f(lst.data);
                                lst = lst.next;
                        }
                };

                return for_all;
        }

   Or even:

        *(int_list) -> void mapper(*(int) -> void f)
        {
                return fun (int_list lst) -> void is {
                        while (lst != null) {
                                f(lst.data);
                                lst = lst.next;
                        }
                };
        }

   Special extension is supported, useful when dealing with functional
   values. Whenever function body (sequence of statements within { })
   should appear, single expression within ( ) can be supplied. It is
   passed to return statement, which is only statement in function body.
   So:

        fun (int a, int b) -> int is (a + b)

   is equivalent of

        fun (int a, int b) -> int is { return a + b; }

Structures

   They are defined with struct and opt_struct keywords.

        struct s {
                int a;
                string b;
        };

   is roughly equivalent of C's:

        typedef struct {
                int a;
                char *b;
        } *s;

   So, name s can be used (only) without any struct or opt_struct, as
   type. You should note, that structures are passed by pointer (or by
   reference if you prefer C++ naming style), therefore changes made to
   struct inside function are reflected in state of it on the caller
   side.

   You can access fields of structure with `.' operator.

   Now, what opt_struct is all about. struct value is always valid, i.e.
   it has to be initialized with object instance, and you cannot assign
   null pointer to it (BTW: null is keyword). opt_struct's are not. They
   can be initialized with null, as well as assigned null. This involves
   runtime check on each access to opt_struct value. When you try to
   access opt_struct value, that is null, Null_access exception is
   raised.

   If you do:

        void f(s1 x)
        {
                s1 y;
                y = x;
                y.fld = 0;  // here you also modify x.fld
        }

   In general assigning values other then int's, bool's and float's
   copies pointer, not content, i.e. makes an alias of object.

   [[structures are not yet implemented at all]]

Polimorphism

   This is probably the neatest thing in Gont. Structures, as well as
   functions can be parameterized over types. For example to define list
   of anything data type you write:

        opt_struct <'a>list {
                'a data;
                <'a>list next;
        }

   'a is alpha. It is type variable, any type could be substituted in
   place of it. (similarly 'b is beta, but I really don't know what 'foo
   is... :)

   Then you use it as:

        <int>list l;            // list of ints
        <<int>list>list ll;     // list of lists of ints

   If you are familiar with C++ you can note `' that has to be written
   as '< <' there. This is not the case in Gont. It is handled specially.

   Functions can be written that operate on lists:

        void iter(*('a) -> void f, <'a>list l)
        {
                while (l != null) {
                        f(l);
                        l = l.next;
                }
        }

        <'b>list map(*('a) -> 'b f, <'a>list l)
        {
                <'b>list o = null;

                while (l != null) {
                        o = {data = f(l.data), next = o};
                        l = l.next;
                }

                return o;
        }

   Later on you can use defined functions.

   Suppose you have defined:

        string int2string(int);
        void print_string(string);

   somewhere, then:

        <int>list il = { data = 1, next = { data = 2, next = null}};
        <string>list sl = map(int2string, il);
        iter(print_string, sl);

   [[This also ain't implemented yet]]

Module system

   Modules provide way of separate compilation along with possibility of
   avoiding namespace conflicts. Current module system is based on ideas
   from OCaml. However it is very limited, we only support one level of
   modules, there are no functors and so on. Module Foo consist of two
   files: foo.g and foo.gi (interface). Whatever names are placed in
   foo.* files, they are all prefixed with `Foo::'. gontc compiles foo.gi
   file to foo.gio, and foo.g to foo.o. foo.gio is needed whenever you
   access Foo:: symbols from other modules.

   Example:

   list.gi:
        // this is to output information,
        // that we implement type `t', but not to
        // disclosure what it is.
        type <'a>t;
        'a hd(<'a>t x);
        <'a>t tl(<'a>t x);
        int cnt;
        <'a>t create();
        <'b>t map(*('a) -> 'b f, <'a>t l);
        void iter(*('a) -> void f, <'a>t l);

   list.g:
        // this is local datatype.
        opt_struct <'a>t {
                'a data;
                <'a>opt_struct next;
        }

        // this will be exported out (`public')
        'a hd(<'a>t x) { return x.data; }
        <'a>t tl(<'a>t x) { return x.next; }

        // this is local, can't be called from outside the module
        void helper(<'a>t x) { ... }

        // and more publics
        int cnt;
        <'a>t create() { cnt++; return null; }
        <'b>t map(*('a) -> 'b f, <'a>t l) { ... }
        void iter(*('a) -> void f, <'a>t l) { ... }
        // ...

   Then if you want to use the module, it can be done with :: notation,
   like this:

        <int>List::t l = List::create();
        ...
        int k = List::hd(l);

   In case of some modules it might be useful to open them, i.e. import
   all symbols from module intro current namespace, so you no longer have
   to use :: notation (but you still can, it is often suggested for
   readability):

        open List;
        ...
        <int>List::t l = List::create();
        ...
        int k = hd(l);
        <int>List rest = tl(l);

Tuples

   Tuple is datatype, that consists of two or more components, which are
   anonymous. Tuples are defined as:

        *[int, int] t1;
        *[int, string, bool] t2;
        *[ctx, ty] t3;

   And used as:

        t1 = [1, 2];
        int a, b;
        [a, b] = t;
        // swap
        [a, b] = [b, a];
        t3 = [1, "hello", true];
        // true and false are keywords

   More on tuples in `Patterns' below.

   [[Yes, you guessed, there are no tuples yet]]

Unions

   Union are more closely related to ML's datatypes then to C's unions.
   The basic difference is that Gont compiler remembers which member is
   currently stored in union.

   Unions are defined as:

        union exp {
                int Const;
                string Var;
                *[exp, exp] Add;
                *[exp, exp] Sub;
                *[exp, exp] Mul;
                *[exp, exp] Div;
        }

   This union can be later on used for processing symbolic expressions.
   You can access union components only using pattern matching (there is
   no `.' notation).

   [[Unions? Nope... not yet]]

Pattern matching

   It is especially useful with conjunction with tuples and unions.
   Patterns are used in switch and let statements, like this:

        int compute(exp e)
        {
                switch e {
                case Const[x]    : return x;
                case Var[x]      : return lookup_var(x);
                case Add[e1, e2] : return compute(e1) + compute(e2);
                case Mul[e1, e2] : return compute(e1) * compute(e2);
                case Div[e1, e2] : return compute(e1) / compute(e2);
                case Sub[e1, e2] : return compute(e1) - compute(e2);
                }
        }

        exp diff(exp e)
        {
                switch e {
                case Const[x]    : return Const[0];
                case Var[x]      : return Const[1];
                case Add[e1, e2] : return Add[diff(e1), diff(e2)];
                case Sub[e1, e2] : return Sub[diff(e1), diff(e2)];
                case Div[e1, e2] :
                        exp up = Sub[Mul[diff(e1), e2], Mul[e1, diff(e2])];
                        exp down = Mul[e2, e2];
                        return Div[up, down];
                case Mul[e1, e2] :
                        return Add[Mul[diff(e1), e2], Mul[e1, diff(e2])];
                }
        }

   To convince you, we're not so far from C:

        union color {
                void Red;
                void Green;
                void Blue;
        }

   This is roughly equivalent of C's: `typedef enum Red, Green, Blue
   color'. Then we do:

        int spy007;

        string color_name(color c)
        {
                string r;

                switch (c) {
                case Red:
                        spy007++;
                        r = "red";
                case Green:
                        r = "green";
                // [] also allowed, if you don't know why, just don't use
                // it :)
                case Blue[]:
                        r = "blue";
                }

                return r;
        }

   You can note lack of break at the end of case clauses. They are not
   required, as there is no fall through (because case clause has to
   introduce new scope).

   In previous examples we have checked for each possibility, but we
   don't have to:

        string var_name1(exp e)
        {
                switch e {
                case Var[x]: return x;
                // matches any x
                case x: return "not variable";
                }
        }

        string var_name2(exp e)
        {
                switch e {
                case Var[x]: return x;
                }
        }

   var_name2 would raise Match_failure exception if any pattern didn't
   match.

   Patterns can be used to decomposite tuples, like this:

        *[int, int] t = [1, 2];

        ...
        switch t {
        case [i1, i2]: return i1 + i2;
        }

   This can be abbreviated to:

        let [i1, i2] = t in { return i1 + i2; }

   There can be more then one assignment, like this:

        let [t1, t1] = t,
            [s1, s2] = s {
                // ...
        }

   The let assignment and binding names with case just creates new name
   for an object. Specificly it means that assigning values to names
   bound with let/case changes object itself. Example:

        *[int, string] t = [1, "one"];

        switch t {
        case [i, s]: i = 2;
        }

        let [i, s] = t in { s = "two"; }

        // here t == [2, "two"]

   One can note that you can also decomposite t with:

        string s;
        int i;
        [i, s] = t;
        // here i = 2, s = "two"
        // however:
        i = 3; s = "three";
        // here i = 3, s = "three", but t == [2, "two"]

   You can also pattern-match structures [[describe this]]

   [[This would be nice, but... not yet]]

Exceptions

   They are used to inform about unusual situation in a program, in order
   to transfer control to block, that can handle it. They can be thought
   of as member of a huge union:

        union exn {
                // system defined
                void Null_access;
                void Match_failure;
                void Not_found;

                // user defined
                string Syntax_error;
                void ICE;
        }

   New members are added with exception keyword:

        exception string Syntax_error;
        exception void ICE;

   In order to signal unusual situation, you use raise keyword:

        raise Syntax_error["parse error"];
        raise ICE;
        raise ICE[];

   At the place, where you know how to handle it, you use try:

        try {
                open_file();
                ...
                parse();
                ...
        } with e {
        case Syntax_error[s]:
                print_msg(s);
        case ICE:
                print_msg("Internal Compiler Error");
        } finally {
                close_file();
        }

   First we open some file, then we call parse(), that can raise
   Syntax_error or ICE in which case we print error message, or something
   else, but in then control is transfered to upper-level try block. No
   matter how control leaves try {} block instructions in finally { ... }
   are executed. So we always close the file.

   [[Nope... but they are expected right after modules, which means Real
   Soon Now. But I guess there will be only very partial support, just
   for system exceptions, until pattern matching is done]]

                        Constructing functional values

   There are two ways to build functional value in Gont. First comes from
   C:

        int add(int a, int b) { return a + b; }

   and the second from ML:

        *(int, int) -> int add =
                fun (int a, int b) -> int is { return a + b; };

   The second way might seem odd to C programmer. Of course, because it
   is odd when used to define simple C-like function. But it is not when
   you need to pass function to another function, let's say:

        void fputs(file f, string s);
        ...
        void do_sth(int foo, *(string) -> void err_report);
        ...
        do_sth(16, fun (string s) -> void is { fputs(f, s); } );

   People familiar with ML might recognize that despite the type
   information for (fun ...) statement can be obtained from itself, it
   still has to be given (even twice). I guess it is place for
   experiments.

   I also would like to have some support for named arguments, so
   functions can be called as:

        Window::create(width => 20, height => 23, color => red);

   This should come with default values.

                                  Typesystem

   ti and t are type expressions. vi are type variables ('a, 'b, 'foo,
   ...).

   Basic types: int, float, string, bool, void, written as is.

   Function type is written as follows:

        *(t1, t2, ..., tn) -> t

   where n >= 0. One might note that this is somewhat different from ML
   where all functions take just one parameter (which can be tuple or
   another function, but this is not the point). This is closer to C's
   function notation.

   Tuples:

        *[t1, t2, ..., tn]

   where n >= 2.

   Structures are defined with:

        struct <v1, v2, ..., vn> NAME {
                t1 f1;
                t2 f2;
                ...
                tm fm;
        }

   Structures that can be invalid (i.e. null):

        opt_struct <v1, v2, ..., vn> NAME {
                t1 f1;
                t2 f2;
                ...
                tm fm;
        }

   Unions (datatypes) are defined with:

        union <v1, v2, ..., vn> NAME {
                t1 f1;
                t2 f2;
                ...
                tm fm;
        }

   where n >= 0, m >= 1. If n == 0, <> can be omitted. fi are field
   names.

   They can be later on referred with:

        <t1, t2, ..., tn> NAME

   where n >= 0. If n == 0, <> can be omitted.

   `*' in front of tuple and function types is (crude) way to convince
   ocamlyacc (Bison has the same problem, though), that there are none
   reduce/reduce conflicts in grammar. I guess there aren't even without
   `*', but tell it to yacc... (this problem is referred to in Bison
   manual, there are some workarounds, however I was unable to make use
   of it).

                            About this document ...

   Gont for C programmers

   This document was generated using the [24]LaTeX2HTML translator
   Version 99.2beta8 (1.46)

   Copyright  1993, 1994, 1995, 1996, [25]Nikos Drakos, Computer Based
   Learning Unit, University of Leeds.
   Copyright  1997, 1998, 1999, [26]Ross Moore, Mathematics Department,
   Macquarie University, Sydney.

   The command line arguments were:
   latex2html -split 0 c2gont.tex

   The translation was initiated by Michal/ Moskal on 2001-12-03
     _________________________________________________________________

   next_inactive up previous


    Michal/ Moskal 2001-12-03

References

   1. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/c2gont.html
   2. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00030000000000000000
   3. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00031000000000000000
   4. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00032000000000000000
   5. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00033000000000000000
   6. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00034000000000000000
   7. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00035000000000000000
   8. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00036000000000000000
   9. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00040000000000000000
  10. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00041000000000000000
  11. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00042000000000000000
  12. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00043000000000000000
  13. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00044000000000000000
  14. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00045000000000000000
  15. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00046000000000000000
  16. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00047000000000000000
  17. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00048000000000000000
  18. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00049000000000000000
  19. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION000410000000000000000
  20. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION000411000000000000000
  21. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00050000000000000000
  22. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00060000000000000000
  23. file://localhost/home/users/malekith/gont/cvs/gont/doc/c2gont/index.html#SECTION00070000000000000000
  24. http://www-dsed.llnl.gov/files/programs/unix/latex2html/manual/
  25. http://cbl.leeds.ac.uk/nikos/personal.html
  26. http://www.maths.mq.edu.au/~ross/
