Re: Easing language bindings (was Re: GtkPlot widget)

> To solve the language binding problem once and for all what we need is a
> tool that parses the C headers and can build bindings from them. Otherwise
> it will always take forever to sync the bindings with the C library, and
> stuff like this will cause pain.
> The only bindings author to use this approach so far is Manuel with his
> Haskell bindings. He has a cool tool that even autogenerates the functions
> to access struct members. That tool is written in Haskell - maybe not a
> big deal since only the binding maintainers would have to install the
> Haskell compiler - but Perl is likely more convenient and Owen points out
> that the gtk-doc codebase is a good place to start writing a Perl version.
> In any case Manuel has a paper about his approach on his web site that
> ought to be interesting.
> If all the language binding authors worked together you could probably
> have some code to extract all the type and function information from
> GTK/Gnome in a fairly short period of time; then writing a binding is just
> a matter of walking these data structures and spitting out code in the
> proper language, and maybe adding some special cases.

I would like to add that my tool, called C->Haskell, is a
generic interface generator for bindings from Haskell to C
libraries (i.e., it doesn't exploit any special structure of
the GTK+/Gnome headers).  It implements a good part of the
front end of an ANSI C compiler (not that much work in
Haskell).  In
addition to the C header file, it reads a Haskell module
skeleton, which says which C objects to bind, and from this
generates the Haskell-land API for the C library.  You can
get the thing from

(in the `Documentation' section is a reference to the paper
that Havoc mentioned).

As the Gtk+Haskell binding is probably the most complicated
of the existing bindings - just because Haskell is
semantically about as far away from C as you can get - I
would like to add some comments regarding a "unified"
infrastructure for GTK+ bindings.

Re IDL: IDL is overkill for bindings to C-like languages
  and untyped scripting languages, and it is itself too
  low-level for the more interesting problems in a binding
  to a language like Haskell.  (So, not much is gained apart
  from points in the buzzword department.)

Re SWIG: SWIG is cool, but I don't like the idea of manually
  adding annotations to the C header files - especially as
  languages based on different paradigms will require
  different annotations (and you really don't want to
  clutter the standard GTK+/Gnome headers with the
  annotations required for Haskell, believe me ;-).  See the
  paper about C->Haskell for why I think SWIG wouldn't work
  well for languages like Haskell.

Conceptually, generation of any language binding requires
three categories of information:

(1) Information already contained in the GTK+/Gnome C
    headers that is needed for any binding.  This is most
    conveniently extracted directly from the C source, as
    Havoc has pointed out.

(2) Information needed in every binding, but not contained
    in the C header - eg, which functions or struct members
    are private and should not be bound.  It would be nice
    to share this information between the different
    GTK+/Gnome bindings.

(3) Information specific to the host language for which the
    binding is generated.  That's the real culprit, and the
    amount of this information largely decides how much work
    it is to implement the language binding.

Some information lies in between (1) and (2): It cannot be
deduced from the C header by reasoning about C alone, but
only by also taking into account the conventions that were
followed when writing the C code (most prominently, storage
(de)allocation policies).
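
To make the three categories a bit more concrete, here is a
minimal sketch in Python (chosen only for brevity; it is not
how C->Haskell represents anything).  The record layout, the
field names, the `toy_internal_helper' function and the
Haskell hint are all assumptions made up for illustration;
only the prototype of gtk_editable_get_chars is taken from
GTK+.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionInfo:
    # Category (1): facts that can be read directly from the C header.
    name: str
    return_type: str
    param_types: list
    # Category (2): needed by every binding, but not in the header.
    private: bool = False
    # "In between" (1) and (2): follows from coding conventions,
    # e.g. who has to free a returned string.
    caller_frees_result: bool = False
    # Category (3): hints keyed by host language (hypothetical format).
    host_hints: dict = field(default_factory=dict)


def bindable(functions):
    """Return the functions that a binding should expose."""
    return [f for f in functions if not f.private]


get_chars = FunctionInfo(
    name="gtk_editable_get_chars",
    return_type="gchar*",
    param_types=["GtkEditable*", "gint", "gint"],
    caller_frees_result=True,
    host_hints={"haskell": {"result": "IO String"}},  # made-up hint
)

# Hypothetical private helper, not a real GTK+ function.
internal = FunctionInfo(
    name="toy_internal_helper",
    return_type="void",
    param_types=[],
    private=True,
)

public_names = [f.name for f in bindable([get_chars, internal])]
```

Only the `host_hints' part would have to differ from one
language binding to the next; everything else could, in
principle, be shared.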

In some language bindings, there is very little of Category
(3), because the information provided by the host language
API is very close to that in the C API.  (This is exactly
the case where SWIG works very well.)  In other bindings,
there is a lot of stuff in Category (3) - in the case of
Haskell, it is so much that it doesn't make much difference
whether the information from Category (2) can be obtained
and integrated automatically or whether it is manually
encoded together with that from Category (3).

C->Haskell is designed to exploit that structure: it reads
and analyses the C header file to obtain the information in
Category (1), and it reads the Haskell module skeleton to
get the information from Categories (2) and (3).  A Grand
Unified GTK+/Gnome Binding Tool could either read three
input files (corresponding to the three categories) or
exploit coding conventions used in the libraries to get the
information for both Category (1) and (2) from the C
headers, and then read only the host language-dependent file
in addition.

Personally, I would tend to handle Category (1) and (2)
together by exploiting conventions and maybe by
"standardising" binding hints in C comments to indicate, for
example, which declarations are private and should not be
bound.  The information from Category (3) would be contained
in a host language-specific `binding file' written by the
binding author.  Then, those parts of the Grand Unified
GTK+/Gnome Binding Tool that read the C header files could
be shared between language bindings - the parts reading the
host language-specific `binding file' and the backend would,
of course, be language-dependent (this is not unlike the
structure used in ORBit, I think, but with the addition of
the host language-specific binding files).
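
As a rough illustration of that split, the following Python
sketch has a shared front end that reads prototypes and a
comment hint from header text, and a toy language-specific
backend driven by a `binding file' (here just a dict).  The
hint syntax `/* binding: private */', the binding file
format, and the function `toy_internal_reset' are
assumptions invented for this example; the line-based
matching is deliberately naive and only copes with simple
one-line prototypes.

```python
import re

# Hypothetical hint: a comment line directly before a prototype.
HINT = re.compile(r"/\*\s*binding:\s*(\w+)\s*\*/")
# Naive pattern for simple one-line prototypes only.
PROTO = re.compile(r"^\s*(.+?)\s*\b(\w+)\s*\(([^()]*)\)\s*;\s*$")

HEADER = """
void gtk_main (void);
/* binding: private */
void toy_internal_reset (GtkWidget *widget);
GtkWidget* gtk_label_new (const gchar *str);
"""


def front_end(header_text):
    """Shared part: extract Category (1) and (2) information."""
    decls = []
    pending_hint = None
    for line in header_text.splitlines():
        hint = HINT.search(line)
        if hint:
            pending_hint = hint.group(1)
            continue
        proto = PROTO.match(line)
        if proto:
            ret, name, params = proto.groups()
            decls.append({
                "name": name,
                "ret": ret.strip(),
                "params": [p.strip() for p in params.split(",")],
                "private": pending_hint == "private",
            })
        pending_hint = None  # a hint applies to the next line only
    return decls


def camel_case_backend(decls, binding_file):
    """Language-specific part: uses Category (3) information."""
    prefix = binding_file["strip_prefix"]
    lines = []
    for d in decls:
        if d["private"]:
            continue  # private declarations are never bound
        stem = d["name"]
        if stem.startswith(prefix):
            stem = stem[len(prefix):]
        first, *rest = stem.split("_")
        host_name = first + "".join(w.capitalize() for w in rest)
        lines.append(f"{host_name} wraps {d['name']}")
    return lines


decls = front_end(HEADER)
output = camel_case_backend(decls, {"strip_prefix": "gtk_"})
```

Here `front_end' is what all bindings could share, while
each binding author would supply a replacement for
`camel_case_backend' and its binding file.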


PS: From my experience with C->Haskell, I am not convinced
    that I would want to implement this in Perl.  Analysing
    C's arcane declarator syntax and semantics gets quite
    tricky in some places, and while GTK+'s header files are
    quite well-behaved, there seems to be more stylistic
    variation in the Gnome headers.
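
    A small illustration of the declarator problem, as a
    Python sketch: a naive pattern of the kind one might
    write in a scripting language handles a plain prototype,
    but finds nothing in a function-pointer member such as
    `clicked' in GtkButtonClass or in the ANSI C prototype
    of signal().  The pattern itself is an assumption made
    up for this example, not code from any existing tool.

```python
import re

# Naive "type name (params);" pattern, invented for illustration.
NAIVE = re.compile(r"(\w[\w\s\*]*?)\s*\b(\w+)\s*\(([^()]*)\)\s*;")

simple = "void gtk_main (void);"
member = "void (* clicked) (GtkButton *button);"
nested = "void (*signal(int sig, void (*func)(int)))(int);"


def declared_name(decl):
    """Return the name the naive pattern finds, or None."""
    match = NAIVE.search(decl)
    return match.group(2) if match else None


results = {d: declared_name(d) for d in (simple, member, nested)}
```

    The plain prototype yields `gtk_main'; for the other two
    the pattern does not match at all, so a real tool needs
    a proper declarator parser rather than more patterns.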
