Re: GtkEtext final design [OT?]



From: Tim Janik <timk@gtk.org>
> the main reason i didn't make those configurable in the first place in
> GScanner, was that there are issues on whether and how you allow nesting
> of multi line comments. so for the time being i made them like C comments.

    I don't think it's up to the scanner to say whether some lines
    are a multi-line comment or not - it's more of a parser task
    [so you should not have to deal with this issue in gscanner]

    BTW, what about adding a new gparser object to glib (it should be
    a simple LALR or even a simple SLR parser - not that hard to
    implement)?

From: Derek Simkowiak <dereks@kd-dev.com>
> 
> > > # There are other config options that are cool, too.  These are for HTML:
> > > CharStart = &
> > > CharEnd = ;
> > 
> > since the characters enclosed in & and ; in xml are actually arbitrary
> > keywords as well, this again looks more like a parser item.
> 
> This is one example where my understanding of scanner vs. parser
> falls apart.

    Well (I hope I'm clear).
    
    A scanner is just a tokenizer - i.e. it reads input from
    a buffer (possibly the keyboard buffer if you do interactive
    scanning), breaks the input into words and tries to match each
    word to a larger group of tokens (a token type).

    Let's define a lex-like scanner (take care: this is not actually
    pure lex/flex syntax):

    /* the tokens */
    STRING         \"[^\"\n]*\"   /* begins with ", ends with ", does not contain */
                                 /* newlines or " */
    KEYWORDS       "option"|"true"|"false"
    EOL            ";"
    
    /* the additional rules */
    {STRING}         return TOK_STRING;
    {KEYWORDS}       return TOK_KEYWORD;
    {EOL}            return TOK_EOL;
    /* eat up whitespace */
    [ \n\t]          ;
    /* additional rule which matches anything not already matched */
    .                return TOK_ERROR;

    The scanner reads one token at a time (it does not rely on any
    kind of separator to read the words - it uses a state machine
    which is rather simple [more information about how a scanner
    state machine is built can be found in the Aho/Sethi/Ullman
    compiler book]) and returns the type of this token.
    With the following input (in a config file for example):

    option "ShowTooltips" true;
    option "MiseryLovesCompany" = true;

    It will return:
    TOK_KEYWORD - on the first iter
    TOK_STRING - on the 2nd iter
    TOK_KEYWORD - ...
    TOK_EOL
    TOK_KEYWORD
    TOK_STRING
    TOK_ERROR - '=' is not matched by anything
    TOK_KEYWORD
    TOK_EOL
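
    The rules above can also be written out by hand. What follows is
    only a minimal sketch in C, not GScanner code and not what lex
    would generate: the Scanner struct, the function signatures and
    the TOK_EOF value are assumptions made for the illustration. It
    also simplifies one point: a whole alphabetic word that is not a
    keyword is returned as a single TOK_ERROR, whereas the catch-all
    "." rule would report it one character at a time.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { TOK_STRING, TOK_KEYWORD, TOK_EOL, TOK_ERROR, TOK_EOF } TokenType;

/* Hypothetical scanner state: just a cursor into a NUL-terminated buffer. */
typedef struct { const char *pos; } Scanner;

static void scanner_init(Scanner *s, const char *input)
{
  assert(input != NULL);
  s->pos = input;
}

/* Returns 1 if the len bytes at p spell one of the three keywords. */
static int is_keyword(const char *p, size_t len)
{
  static const char *const keywords[] = { "option", "true", "false" };
  size_t i;

  for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
    if (strlen(keywords[i]) == len && memcmp(p, keywords[i], len) == 0)
      return 1;
  return 0;
}

/* Consumes one token from the input and returns its type. */
static TokenType scanner_get_token(Scanner *s)
{
  const char *start;

  /* eat up whitespace */
  while (*s->pos == ' ' || *s->pos == '\n' || *s->pos == '\t')
    s->pos++;

  if (*s->pos == '\0')
    return TOK_EOF;

  if (*s->pos == ';')
    {
      s->pos++;
      return TOK_EOL;
    }

  if (*s->pos == '"')
    {
      /* STRING: opening ", anything but " or newline, closing " */
      const char *p = s->pos + 1;

      while (*p != '\0' && *p != '"' && *p != '\n')
        p++;
      if (*p == '"')
        {
          s->pos = p + 1;
          return TOK_STRING;
        }
      /* unterminated string: report the lone quote as an error */
      s->pos++;
      return TOK_ERROR;
    }

  if (isalpha((unsigned char) *s->pos))
    {
      /* read a whole word, then check it against the keyword list */
      start = s->pos;
      while (isalpha((unsigned char) *s->pos))
        s->pos++;
      return is_keyword(start, (size_t) (s->pos - start)) ? TOK_KEYWORD
                                                          : TOK_ERROR;
    }

  /* anything not already matched: consume one character */
  s->pos++;
  return TOK_ERROR;
}
```

    Calling scanner_get_token() repeatedly on the two option lines
    (with both names in double quotes) yields the token types listed
    above, followed by TOK_EOF once the buffer is exhausted.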

    That's it. It is the only thing it does. Consider a syntax
    highlighter: by using a scanner, it is able to grab the
    type of a word and thus to find the correct color for this
    word. But this is probably not enough to do correct syntax
    highlighting. Consider C and a multiple-line #define statement:
    as we do not know where our #define ends, we can only handle
    one-line #defines with the following rule:
    DEFINE        "#define"[[:blank:]]+[^\n]*
    
    So we need a parser. The parser works with token types
    (and not tokens). It is able to say whether a list of
    tokens represents a syntactically valid sentence.

    For example, let's define a small parser for the example
    above:

    %token TOK_KEYWORD TOK_STRING TOK_ERROR TOK_EOL

    start:
        list-of-options;

    list-of-options:
        option-line | option-line list-of-options;

    option-line:
        TOK_KEYWORD TOK_STRING TOK_KEYWORD TOK_EOL;

    The parser relies on an underlying scanner and uses
    it in a loop (a real one is more complicated than this,
    but this one should do the trick):

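    void         *tok_value;     /* semantic value of the token (unused here) */
    int          tok;            /* type of the current token */
    int          current_rule;
    ParserState  *parser_state;
    ParserRule   *rule;

    current_rule = RULE_start;
    scanner_init();
    tok = scanner_get_token();
    /* TOK_EOF is assumed to be what the scanner returns at end of input */
    while (tok != TOK_EOF)
      {
        /* look up what to do for this (rule, token type) pair */
        parser_state = parser_get_next_state(current_rule,
                                             tok);
        switch (parser_state->stack_action)
          {
          case PUSH_RULE:
            /* start matching a new rule */
            parser_push(parser_state->state, NULL);
            break;
          case POP_RULE:
            /* a rule is complete: pop it and run its action */
            rule = parser_pop(parser_state->pos_size);
            if (parser_state->action(rule) == PARSE_ERROR)
              {
                parser_display_error(rule);
                return PARSE_ERROR;
              }
            break;
          case PUSH_TOK:
            /* accept the token and read the next one */
            parser_push(parser_state->state, tok);
            tok = scanner_get_token();
            break;
          }
      }
    return PARSE_SUCCESSFUL;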

    [I have not worked with parser internals for a while now
    and there are probably some design mistakes. A more
    correct loop can be found in any compiler book]

    Given the scanner results from above, it will produce an
    error on line 2 (since the sentence TOK_KEYWORD
    TOK_STRING TOK_ERROR is not matched by its rules).
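
    To make that result concrete, here is a small hand-written
    top-down recognizer for the same grammar. It is a sketch only:
    it is not the table-driven LALR/SLR loop, and the function names,
    the bad_line parameter and the token array input are assumptions
    made for the illustration. It only sees token types, never the
    text of the tokens.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOK_STRING, TOK_KEYWORD, TOK_EOL, TOK_ERROR } TokenType;
enum { PARSE_SUCCESSFUL = 0, PARSE_ERROR = -1 };

/* option-line: TOK_KEYWORD TOK_STRING TOK_KEYWORD TOK_EOL
   Returns the number of token types consumed (4), or 0 if no match. */
static size_t parse_option_line(const TokenType *toks, size_t n)
{
  static const TokenType shape[] = { TOK_KEYWORD, TOK_STRING,
                                     TOK_KEYWORD, TOK_EOL };
  size_t i;

  if (n < 4)
    return 0;
  for (i = 0; i < 4; i++)
    if (toks[i] != shape[i])
      return 0;
  return 4;
}

/* list-of-options: option-line | option-line list-of-options
   At least one option-line is required. On failure, *bad_line (if not
   NULL) receives the 1-based index of the option line that failed. */
static int parse_list_of_options(const TokenType *toks, size_t n,
                                 size_t *bad_line)
{
  size_t pos = 0;
  size_t line = 1;

  assert(toks != NULL);
  do
    {
      size_t used = parse_option_line(toks + pos, n - pos);

      if (used == 0)
        {
          if (bad_line != NULL)
            *bad_line = line;
          return PARSE_ERROR;
        }
      pos += used;
      line++;
    }
  while (pos < n);
  return PARSE_SUCCESSFUL;
}
```

    Fed the nine token types from the scanner run above, it accepts
    the first option line and fails on the second, because TOK_ERROR
    appears where a TOK_KEYWORD is expected.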
 
    Again, more information about parsers can be found
    in the Aho/Sethi/Ullman compiler book.

    I hope I made the distinction between parser and scanner
    clearer for you.

    Yours,

    Emmanuel

> 
> Thanks,
> Derek Simkowiak
> dereks@kd-dev.com
> 



