Re: GtkEtext final design [really OT]
- From: "Emmanuel DELOGET" <logout free fr>
- To: <gtk-devel-list redhat com>
- Subject: Re: GtkEtext final design [really OT]
- Date: Sat, 11 Mar 2000 20:57:45 +0100
From: Christopher Kohnert <cjkohner@brain.uccs.edu>
> Emmanuel DELOGET wrote:
> >
> > From: Tim Janik <timk@gtk.org>
> > > the main reason i didn't make those configurable in the first place in
> > > GScanner, was that there are issues on whether and how you allow nesting
> > > of multi line comments. so for the time being i made them like C comments.
> >
> > I don't think it's up to the scanner to say whether some lines
> > are a multi-line comment or not - it's more a parser task
> > [so you should not have to deal with this issue in gscanner]
> >
> [snip]
> >
> > A scanner is just a tokenizer - i.e. it reads input from
> > a buffer (possibly the keyboard buffer if you do interactive
> > scanning), breaks the input into words and tries to match each
> > word to a larger group of tokens.
> >
> [large snip]
>
> Erm, a scanner is all relative. Often things that theoretically should
> go into a parser are put into a lexer for convenience. As a lexer only
> identifies regular grammars, you need to add things to it to identify
> non-regular strings. Such as the capability for a nested comment by
> adding a single integer to the scanner. Symbols are often added at lex
> time not parse time for convenience as well, though symbols apply more
> to a compiler than to generic scanning. There is not such a distinction
> that you draw, and often things that are theoretically supposed to go
> into a parser are much more suited to go into the lexer both for speed
> and convenience.
Symbols are available as tokens. For example, the 'if' symbol
can be represented simply as a keyword token - otherwise you will
have to deal with a lot of problems in your parser implementation.
Determining symbols is really not a parser issue - but checking
their correct use is.
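
To illustrate the point about keywords, here is a minimal sketch (in Python, for brevity) of a scanner that classifies 'if' as a keyword token before the parser ever sees it. The KEYWORDS table, the token kind names and the scan() function are assumptions made up for this example; they are not part of GScanner.

```python
import re

# Hypothetical keyword table; a real language would define its own.
KEYWORDS = {"if", "else", "while"}

# One token per match: an identifier-like word, a number, or any other character.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(.))")


def scan(text):
    """Return (kind, lexeme) pairs; keywords are recognised here, not in the parser."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        word, number, other = match.groups()
        if word is not None:
            kind = "KEYWORD" if word in KEYWORDS else "IDENT"
            tokens.append((kind, word))
        elif number is not None:
            tokens.append(("NUMBER", number))
        else:
            tokens.append(("PUNCT", other))
        pos = match.end()
    return tokens
```

The parser then only has to check that a KEYWORD token appears in a legal position; it never compares strings itself.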
> And as far as some of your examples of multi-line tokens... there's no
> reason you couldn't identify a multi-line #define (as it still matches a
> regular expression).
[yes... figured out that '\\\n' is still acceptable for a scanner :)]
>
> I think you're being a bit too formal as to what goes into a scanner, and
> if you do that, it becomes significantly less powerful.
>
> Christopher
When you try to explain the difference between two different
things, it is a bad idea to begin your speech with 'they are
not the same, but they can do the same thing'. Of course,
a regular grammar is equivalent to a set of regular expressions
(this means that you do not need a parser to handle such a
grammar), but regular grammars are not as useful as LALR(1) ones.
I did some fairly interesting work in the past for a school
project (reimplementing a yacc-like tool without any use
of global and/or static vars - not a very difficult task, but
a very interesting one) and therefore I know what a scanner can
do. But I chose to describe it in other terms: what a
scanner should actually do :)
Moreover, if you want fully reusable scanner code,
I think it's better not to deal with the parser's 'reserved'
areas. Actually, a scanner does not need to know what
nested comments are. It should know the words '/*' and
'*/', but it really doesn't care about their meaning.
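
As a rough sketch of that split (again in Python, with invented names): the scanner below only reports '/*' and '*/' as words, and a separate parser-side pass decides what they mean, including nesting. The token kinds COMMENT_OPEN, COMMENT_CLOSE and WORD and both functions are assumptions for illustration only.

```python
import re

# The scanner recognises the two-character words but gives them no meaning.
TOKEN_RE = re.compile(r"/\*|\*/|[A-Za-z_]\w*|\d+|\S")

KINDS = {"/*": "COMMENT_OPEN", "*/": "COMMENT_CLOSE"}


def scan(text):
    """Split text into (kind, lexeme) pairs without interpreting comments."""
    return [(KINDS.get(lexeme, "WORD"), lexeme) for lexeme in TOKEN_RE.findall(text)]


def drop_comments(tokens):
    """Parser-side pass: interpret the open/close tokens, allowing nesting."""
    depth = 0
    kept = []
    for kind, lexeme in tokens:
        if kind == "COMMENT_OPEN":
            depth += 1
        elif kind == "COMMENT_CLOSE":
            if depth == 0:
                raise SyntaxError("unmatched */")
            depth -= 1
        elif depth == 0:
            kept.append((kind, lexeme))
    if depth:
        raise SyntaxError("unterminated comment")
    return kept
```

Whether comments may nest is then a decision made in drop_comments() alone; the scanner stays the same either way.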
Lexical analysis is not syntactic analysis. The lexical
pass just deals with the 'word A is correctly spelled'
issue, while the syntactic pass deals with 'word A is at the
correct place in the sentence'.
That's how I learned English (well, my personal scanner
and parser still have some bugs and memory leaks :)
Now, a word about the gscanner comments feature:
it is clear that a scanner which knows what a comment
is makes implementation easier from the parser's point
of view. But there is another way to do it: a separate
tool like cpp does the trick - and if the only goal of
such a tool is to get rid of comments in the
source, it's a trivial routine to write (even if
you want to allow nested multi-line comments).
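
A minimal sketch of such a routine, written in Python and assuming the input uses only '/*' and '*/' as comment delimiters. The function name strip_comments() is made up for this example, and the sketch deliberately ignores string literals, which a real pre-pass would have to skip.

```python
def strip_comments(source):
    """Remove /* ... */ comments, allowing nesting, before any scanning happens."""
    out = []
    depth = 0  # a single integer is enough to track nesting
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep characters only when we are outside every comment.
            if depth == 0:
                out.append(source[i])
            i += 1
    if depth:
        raise ValueError("unterminated comment")
    return "".join(out)
```

The output can then be fed to a scanner that knows nothing about comments at all. A stray '*/' outside any comment is passed through unchanged, so that the later passes can report it.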
Yours,
Emmanuel