Re: GtkEtext final design [OT?]
- From: "Emmanuel DELOGET" <logout free fr>
- To: <gtk-devel-list redhat com>
- Subject: Re: GtkEtext final design [OT?]
- Date: Sat, 11 Mar 2000 12:43:27 +0100
From: Tim janik <timk@gtk.org>
> the main reason i didn't make those configurable in the first place in
> GScanner, was that there are issues on whether and how you allow nesting
> of multi line comments. so for the time being i made them like C comments.
I don't think it's up to the scanner to decide whether some lines
form a multi-line comment or not - that is more of a parser task
[so you should not have to deal with this issue in gscanner].
BTW, what about adding a new gparser object to glib? (It should be
a simple LALR or even a simple SLR parser - not that hard to
implement.)
From: Derek Simkowiak <dereks@kd-dev.com>
>
> > > # There are other config options that are cool, too. These are for HTML:
> > > CharStart = &
> > > CharEnd = ;
> >
> > since the characters enclosed in & and ; in xml are actually arbitrary
> > keywords as well, this again looks more like a parser item.
>
> This is one example where my understanding of scanner vs. parser
> falls apart.
Well (I hope I'm clear).
A scanner is just a tokenizer - i.e. it reads input from
a buffer (possibly the keyboard buffer if you do interactive
scanning), breaks the input into words and tries to match each
word to a larger group of tokens (a token type).
Let's define a lex-like scanner (take care: this is not
pure lex/flex syntax):
/* the tokens */
STRING    \"[^\"\n]*\"   /* begins with ", ends with ", contains no */
                         /* newline or " in between */
KEYWORDS  "option"|"true"|"false"
EOL       ";"
/* the rules */
{STRING}    return TOK_STRING;
{KEYWORDS}  return TOK_KEYWORD;
{EOL}       return TOK_EOL;
/* eat up whitespace */
[ \n\t]+    ;
/* catch-all rule which matches anything not already matched */
.           return TOK_ERROR;
The scanner reads one token at a time (it does not rely on any
kind of separator to read the words - it uses a state machine
which is rather simple [more information about how a scanner
state machine is built can be found in the Aho/Sethi/Ullman
compiler book]) and returns the type of this token.
With the following input (in a config file, for example):
option "ShowTooltips" true;
option "MiseryLovesCompany" = true;
It will return:
TOK_KEYWORD - on the first iter
TOK_STRING - on the 2nd iter
TOK_KEYWORD - ...
TOK_EOL
TOK_KEYWORD
TOK_STRING
TOK_ERROR - '=' is not matched by anything
TOK_KEYWORD
TOK_EOL
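To make the token listing above concrete, here is a rough hand-written C equivalent of the lex-like rules. It is only a sketch: next_token(), scan_all() and TOK_EOF are names made up for this example (they are not GScanner API), and it assumes both strings in the input are double-quoted, since that is the only form the STRING rule accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token types named after the lex-like rules; TOK_EOF is an addition of
 * this sketch to signal the end of the input. */
typedef enum { TOK_STRING, TOK_KEYWORD, TOK_EOL, TOK_ERROR, TOK_EOF } TokType;

/* Return the type of the next token at *p and advance *p past it. */
static TokType next_token(const char **p)
{
    static const char *keywords[] = { "option", "true", "false" };
    const char *s;
    size_t i;

    assert(p != NULL && *p != NULL);
    s = *p;

    while (*s == ' ' || *s == '\n' || *s == '\t')   /* eat up whitespace */
        s++;
    if (*s == '\0') { *p = s; return TOK_EOF; }

    if (*s == ';') { *p = s + 1; return TOK_EOL; }  /* EOL rule */

    if (*s == '"') {                                /* STRING rule */
        const char *q = s + 1;
        while (*q != '\0' && *q != '"' && *q != '\n')
            q++;
        if (*q == '"') { *p = q + 1; return TOK_STRING; }
        /* unterminated string: fall through, the lone quote is an error */
    }

    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        size_t n = strlen(keywords[i]);             /* KEYWORDS rule */
        if (strncmp(s, keywords[i], n) == 0) { *p = s + n; return TOK_KEYWORD; }
    }

    *p = s + 1;                                     /* "." rule: one character */
    return TOK_ERROR;
}

/* Scan the whole input, storing at most max token types in out.
 * Returns the number of tokens stored (TOK_EOF is not stored). */
static int scan_all(const char *input, TokType *out, int max)
{
    int n = 0;
    TokType t;

    while (n < max && (t = next_token(&input)) != TOK_EOF)
        out[n++] = t;
    return n;
}
```

Calling scan_all() on the two option lines yields the nine token types listed above, with TOK_ERROR in the seventh position for the '='. Like the lex rules, the keyword test does not look at word boundaries, so it is up to the parser to reject odd input.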
That's it. It is the only thing it does. Consider a syntax
highlighter: by using a scanner, it is able to get the
type of a word and thus to find the correct color for this
word. But this is probably not enough to do correct syntax
highlighting. Consider C and a multi-line #define statement:
as we do not know where our #define ends, we can only handle
single-line #defines with the following rule:
DEFINE "#define"[[:blank:]]+[^\n]*
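For comparison, here is a minimal sketch of what it takes to find the real end of a #define when backslash-newline continuations are followed by hand. define_span() is a hypothetical helper written for this example; it only knows about the "\" + newline convention and nothing else about the C preprocessor.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the #define directive starting at s, following
 * backslash-newline continuations, or 0 if s does not start with
 * "#define" followed by a blank. The terminating newline is not counted. */
static size_t define_span(const char *s)
{
    const char *p = s;

    if (strncmp(p, "#define", 7) != 0 || (p[7] != ' ' && p[7] != '\t'))
        return 0;
    p += 7;
    while (*p != '\0') {
        if (p[0] == '\\' && p[1] == '\n') {   /* continuation: keep going */
            p += 2;
            continue;
        }
        if (*p == '\n')                        /* unescaped newline: done */
            break;
        p++;
    }
    return (size_t)(p - s);
}
```

A highlighter could color the first define_span(s) characters as one preprocessor statement, however many physical lines they cover.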
So we need a parser. The parser works with token types
(and not tokens). It is able to say whether a list of
tokens represents a syntactically valid sentence.
For example, let's define a small parser for the example
above:
%token TOK_KEYWORD TOK_STRING TOK_ERROR TOK_EOL
start:
    list-of-options;
list-of-options:
    option-line | option-line list-of-options;
option-line:
    TOK_KEYWORD TOK_STRING TOK_KEYWORD TOK_EOL;
The parser relies on an underlying scanner and uses
it in a loop (real ones are more complicated than this one, but
this one should do the trick):
void *tok_value;              /* semantic value of the token (unused here) */
int tok;                      /* type of the current token */
int current_rule;
ParserState *parser_state;
ParserRule *rule;

current_rule = RULE_start;
scanner_init();
tok = scanner_get_token();
while (!end_of_input)         /* pseudo-condition: stop when input is exhausted */
{
    /* look up what to do for this rule and this token type */
    parser_state = parser_get_next_state(current_rule, tok);
    switch (parser_state->stack_action)
    {
    case PUSH_RULE:           /* enter a new rule */
        parser_push(parser_state->state, NULL);
        break;
    case POP_RULE:            /* a rule is complete: pop it and run its action */
        rule = parser_pop(parser_state->pos_size);
        if (parser_state->action(rule) == PARSE_ERROR)
        {
            parser_display_error(rule);
            return PARSE_ERROR;
        }
        break;
    case PUSH_TOK:            /* push the token and read the next one */
        parser_push(parser_state->state, tok);
        tok = scanner_get_token();
        break;
    }
}
return PARSE_SUCCESSFUL;
[I have not worked with parser internals for a while now,
so there are probably some design mistakes. A more
correct loop can be found in any compiler book.]
Given the scanner results from above, it will produce an
error on line 2 (since the sentence TOK_KEYWORD
TOK_STRING TOK_ERROR is not matched by its rules).
Again, more information about parsers can be found
in the Aho/Sethi/Ullman compiler book.
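The grammar here is small enough that its check can be sketched without any LALR tables. The code below is not the table-driven loop shown earlier, just a hand-written matcher for the single option-line rule; parse_options() and its return convention are assumptions made for this sketch.

```c
#include <stddef.h>

typedef enum { TOK_STRING, TOK_KEYWORD, TOK_EOL, TOK_ERROR } TokType;

/* Check toks[0..n-1] against
 *     list-of-options: option-line | option-line list-of-options;
 *     option-line:     TOK_KEYWORD TOK_STRING TOK_KEYWORD TOK_EOL;
 * Return 0 if the sequence is valid, otherwise the 1-based number of the
 * first option line that fails to match. */
static int parse_options(const TokType *toks, size_t n)
{
    static const TokType option_line[] = {
        TOK_KEYWORD, TOK_STRING, TOK_KEYWORD, TOK_EOL
    };
    const size_t len = sizeof option_line / sizeof option_line[0];
    size_t pos = 0;
    int line = 1;

    if (n == 0)
        return 1;                 /* the grammar needs at least one line */
    while (pos < n) {
        size_t i;
        for (i = 0; i < len; i++) {
            /* ran out of tokens, or wrong token type for this position */
            if (pos + i >= n || toks[pos + i] != option_line[i])
                return line;
        }
        pos += len;
        line++;
    }
    return 0;
}
```

Fed the nine token types from the scanner example, it accepts the first option line and returns 2 for the second, because TOK_ERROR appears where the rule expects TOK_KEYWORD.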
I hope I made the distinction between parser and scanner
clearer for you.
Yours,
Emmanuel
>
> Thanks,
> Derek Simkowiak
> dereks@kd-dev.com
>