Re: GtkEtext final design [really OT]



On Sat, 11 Mar 2000, Emmanuel DELOGET wrote:

>     I did some fairly interesting stuff in the past for a school
>     project (reimplementing a yacc-like tool without any use
>     of global and/or static vars - not very difficult but a very
>     interesting task) and therefore I know what a scanner can
>     do. But I chose to describe it in other terms: what a
>     scanner should actually do :)
> 
>     Moreover, if you want to have fully reusable scanner code,
>     I think it's better not to deal with the parser's 'reserved'
>     areas. Actually, a scanner does not need to know what
>     nested comments are. It should know the '/*' and
>     '*/' words, but it really doesn't care about their meaning.
>     Lexical analysis is not syntactic analysis. The lexical
>     pass just deals with the 'word A is correctly spelled'
>     issue, while the syntactic pass deals with 'word A is at
>     the correct place in the sentence'.
> 
>     That's how I learned English (well, my personal scanner
>     and parser still have some bugs and memory leaks :)
> 
>     Now, a word about the gscanner comments feature:
>     it is clear that a scanner which knows what a comment
>     is makes implementation easier from the parser's point
>     of view. But there is another way to do it: using a
>     tool like cpp does the trick - and if the only goal of
>     such a tool is to get rid of comments in the
>     source, it's a trivial routine to write (even if
>     you want to allow multi-line nested comments).

good observation, basically you and Christopher both present valid points.
while a really clean line can be drawn between parser and lexer in
theory, for practical purposes you often make specific tradeoffs.
gscanner in particular was meant as a powerful lexer for things like gtk's
or gimp's rc files, as well as for parsing lisp syntax.

as such, gscanner also provides convenience features for keywords, that
is, it allows an <identifier> (which characters that may consist of
is configurable, e.g. [_A-Za-z][_A-Za-z0-9]* for C) to be automatically
translated into predefined tokens through a hashtable provided by the
scanner.
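
to illustrate the keyword idea, here is a minimal sketch in plain C. it is
not gscanner's actual code or API: the token ids, the keywords[] table and
scan_word() are made up for illustration, and a small linear table stands
in for the hashtable.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* hypothetical token ids, chosen above the plain character range */
enum { TOK_IDENTIFIER = 256, TOK_IF, TOK_WHILE };

/* keyword table; a real scanner would use a hashtable here */
static const struct { const char *name; int token; } keywords[] = {
    { "if",    TOK_IF },
    { "while", TOK_WHILE },
};

/* scan [_A-Za-z][_A-Za-z0-9]* at *p and advance *p past it.
 * returns a keyword token, TOK_IDENTIFIER, or 0 if no identifier starts here */
static int scan_word(const char **p)
{
    const char *start = *p;
    size_t len, i;

    if (!(isalpha((unsigned char)*start) || *start == '_'))
        return 0;
    while (isalnum((unsigned char)**p) || **p == '_')
        (*p)++;
    len = (size_t)(*p - start);

    /* translate known identifiers into their predefined tokens */
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i].name) == len &&
            memcmp(keywords[i].name, start, len) == 0)
            return keywords[i].token;
    return TOK_IDENTIFIER;
}
```

the parser then only ever sees token ids and never has to compare strings
itself.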

that i had to parse C comments as well was actually the reason that some
preprocessor magic "sneaked" in, i.e. the /* .... */ stuff ;)

for instance, an often requested feature for number parsing, i.e. automatic
evaluation of '-' as a unary prefix of a number, was intentionally
*not* added, since how a minus is to be interpreted is definitely
a parser issue (is it a unary operator, a binary operator, or part of
e.g. a C member reference '->', ...)
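
a rough sketch of that split, again in plain C with made-up names (this is
not how gscanner is written): the lexer side only decides how the
characters are spelled, and a separate parser-side helper decides what a
lone '-' means from what came before it.

```c
#include <stdbool.h>

/* hypothetical token ids */
enum { TOK_MINUS = '-', TOK_ARROW = 300 };

/* what the parser saw just before the current token */
enum prev_kind { PREV_NONE, PREV_OPERAND, PREV_OPERATOR };

/* lexer side: recognize "-" or "->" at *p and advance *p past it.
 * returns 0 if *p does not start with '-' */
static int scan_minus(const char **p)
{
    if (**p != '-')
        return 0;
    (*p)++;
    if (**p == '>') {
        (*p)++;
        return TOK_ARROW;
    }
    return TOK_MINUS;
}

/* parser side: a '-' directly after an operand is binary,
 * anywhere else (start of expression, after an operator) it is unary */
static bool minus_is_unary(enum prev_kind prev)
{
    return prev != PREV_OPERAND;
}
```

with this split, "3 - 5" and "3 * -5" give the same TOK_MINUS from the
lexer; only the parser tells the two apart.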

so far, one interesting thing has been brought up that should definitely
make its way into gscanner: a character pair to be scanned as
line-continuation, i.e. "\\\n". i think that would also be incredibly
easy to implement ;)
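
one possible shape for this, sketched in plain C on a nul-terminated
buffer (next_char() is a hypothetical helper, not an existing gscanner
function): the character fetch simply skips every backslash-newline pair,
so the rest of the scanner never sees it.

```c
/* return the next character at *p and advance *p,
 * silently skipping backslash-newline pairs (line continuations).
 * returns '\0' at the end of the buffer without advancing */
static int next_char(const char **p)
{
    while ((*p)[0] == '\\' && (*p)[1] == '\n')
        *p += 2;
    if (**p == '\0')
        return '\0';
    return (unsigned char)*(*p)++;
}
```

a backslash that is not directly followed by a newline is returned
unchanged, so escapes inside strings are not affected.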


> 
>     Yours,
> 
>     Emmanuel
> 

---
ciaoTJ


