Re: GtkEtext final design



On Fri, 10 Mar 2000, Derek Simkowiak wrote:

> > > I will also be writing a new lexical scanner (called the gescanner
> > > :) which will be more configurable than the current GScanner.
> 
> >     I do not understand why you plan to create a new
> >     tokenizer. I mean : gscanner is very flexible and
> >     it should be easy to embed it in any kind of 
> >     project.
> 
>      Emmanuel,
> 	It has been several months since I did the research for this, so
> I'm a bit rusty on all of the specific features.  I'll try to dig up my
> notes, but I'll be posting detailed messages to this list when I get to 
> that stage of development.
> 
> 	In short, I saw some things in the "syntax definition" files of
> the win32 program called "TextPad" (http://www.textpad.com/) that implied
> a lexical scanner slightly superior to the GScanner.  I wanted those
> features in my text editor application.
> 
> 	For example, here are some things that are in the TextPad scanner
> config options that are not (AFAIK) in the GScanner (please note I have
> never used the GScanner, and this is only from my reading of the
> documentation):

none of the example config options you outlined below would be particularly
hard to implement on top of the GScanner code. note that its config option
structure can easily be extended in new binary-incompatible Gtk+ versions
while maintaining source compatibility, because compilers zero-pad the
remainder of partly initialized structures.
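
to illustrate the zero padding point, here is a minimal sketch. the struct,
its fields and the helper are made up for this example (they are not the real
GScannerConfig); the only thing relied upon is the C rule that members left
out of an aggregate initializer are set to 0/NULL:

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical config struct for illustration, NOT the real GScannerConfig */
typedef struct {
  const char  *skip_characters;
  unsigned int case_sensitive;
  /* members below are imagined additions in a later, binary-incompatible version */
  const char  *comment_start;
  const char  *comment_end;
  unsigned int nest_comments;
} DemoScannerConfig;

/* application code written against the old layout: only the first two
 * members are listed, the remaining ones are implicitly 0/NULL
 */
static const DemoScannerConfig old_style_config = {
  " \t\r\n",
  1,
};

/* library side: a NULL member means "keep the historic default" */
static const char *
demo_config_comment_start (const DemoScannerConfig *config)
{
  assert (config != NULL);
  return config->comment_start ? config->comment_start : "/*";
}
```

old sources keep compiling against the grown struct after a recompile, and
the library can treat the zeroed members as "use the old behaviour".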

> # Supports different grammars, such as markup languages:
> # This turns on keyword recognition (< and > would be for HTML/SGML)
> SyntaxStart = <   
> SyntaxEnd = >     

what exactly are these used for? the < and > chars are really only important
information for the parser, so it knows when to look for keywords. a lexical
scanner should still just identify them as simple tokens.
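
a rough sketch of that split, in plain C rather than on top of GScanner so it
stays self-contained. all names here are hypothetical: the scanner function
knows nothing about markup and hands out < and > as ordinary one-character
tokens, while the parser-level function decides that words between them are
keyword candidates:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { DEMO_TOKEN_EOF, DEMO_TOKEN_CHAR, DEMO_TOKEN_WORD } DemoTokenType;

typedef struct {
  DemoTokenType type;
  const char   *start;   /* first character of the token */
  size_t        length;  /* number of characters in the token */
} DemoToken;

/* lexical layer: '<' and '>' are just single-character tokens */
static DemoToken
demo_next_token (const char **cursor)
{
  DemoToken token = { DEMO_TOKEN_EOF, NULL, 0 };
  const char *p;

  assert (cursor != NULL && *cursor != NULL);
  p = *cursor;
  while (isspace ((unsigned char) *p))
    p++;
  token.start = p;
  if (isalpha ((unsigned char) *p))
    {
      token.type = DEMO_TOKEN_WORD;
      while (isalnum ((unsigned char) *p))
        p++;
    }
  else if (*p != '\0')
    {
      token.type = DEMO_TOKEN_CHAR;
      p++;
    }
  token.length = (size_t) (p - token.start);
  *cursor = p;
  return token;
}

/* parser layer: counts the words seen between '<' and '>' */
static unsigned int
demo_count_keyword_candidates (const char *text)
{
  unsigned int count = 0;
  int inside_markup = 0;
  DemoToken token;

  assert (text != NULL);
  token = demo_next_token (&text);
  while (token.type != DEMO_TOKEN_EOF)
    {
      if (token.type == DEMO_TOKEN_CHAR && *token.start == '<')
        inside_markup = 1;
      else if (token.type == DEMO_TOKEN_CHAR && *token.start == '>')
        inside_markup = 0;
      else if (token.type == DEMO_TOKEN_WORD && inside_markup)
        count++;
      token = demo_next_token (&text);
    }
  return count;
}
```

the "when to look for keywords" state lives entirely in the second function,
so the scanner needs no SyntaxStart/SyntaxEnd option of its own.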

> # GScanner has multi-line comments hardcoded to start with '/*' and end
> # with '*/'.  In TextPad, it's configurable:
> 
> # For example, this is for C highlighting:
> CommentStart = /*
> CommentEnd = */

the main reason i didn't make those configurable in GScanner in the first
place was that there are issues with whether and how you allow nesting
of multi-line comments. so for the time being i made them like C comments.
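
the nesting question can be shown with a small standalone sketch. this is not
GScanner code; the function name and the allow_nesting flag are assumptions
made up for the example. the same input ends at a different place depending
on whether nested comment starts are honoured:

```c
#include <assert.h>
#include <string.h>

/* returns a pointer just past the comment beginning at text, or NULL if
 * text does not begin with start or the comment is never closed
 */
static const char *
demo_skip_comment (const char *text,
                   const char *start,
                   const char *end,
                   int         allow_nesting)
{
  size_t start_len, end_len;
  unsigned int depth = 1;

  assert (text != NULL && start != NULL && end != NULL);
  start_len = strlen (start);
  end_len = strlen (end);
  assert (start_len > 0 && end_len > 0);

  if (strncmp (text, start, start_len) != 0)
    return NULL;
  text += start_len;
  while (*text != '\0')
    {
      if (strncmp (text, end, end_len) == 0)
        {
          text += end_len;
          if (--depth == 0)
            return text;
        }
      else if (allow_nesting && strncmp (text, start, start_len) == 0)
        {
          text += start_len;
          depth++;
        }
      else
        text++;
    }
  return NULL;  /* unterminated comment */
}
```

for the input "/* a /* b */ c */ rest", the non-nesting variant stops after
the first */ (C behaviour), the nesting variant only after the second one.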

> # ...and this is for HTML:
> CommentStart = !--
> CommentEnd = -->

shouldn't that be:
CommentStart = <!--
CommentEnd = -->
?

> # Not only that, but it supports more than one kind of comment:
> CommentStartAlt = 
> CommentEndAlt =

is that for single line comments? those are generically featured by
GScanner as well.


> # There are other config options that are cool, too.  These are for HTML:
> CharStart = &
> CharEnd = ;

since the characters enclosed in & and ; in xml are actually arbitrary
keywords as well, this again looks more like a parser item.

> # ...but these are for C:
> CharStart = '
> CharEnd = '

whereas in C you may not enclose arbitrary strings within ' and '.
GScanner does support 'c' though; you'd just have to check whether it
contains more than one character.

> # Here is more cool stuff (example settings used to highlight C):
> BracketChars = {[()]}
> 
> StringStart = "
> StringEnd = "
> StringsSpanLines = Yes
> StringEsc = \

this is very easy to get going in GScanner, basically it already has
the functionality, and you'd just have to make it configurable.
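
as a sketch of what "making it configurable" could look like: the struct
below simply mirrors the four quoted string settings. it is a hypothetical
standalone example, not existing GScanner API, and it treats an escaped
newline as part of the string even when strings may not span lines:

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical settings mirroring StringStart, StringEnd,
 * StringsSpanLines and StringEsc from the quoted config
 */
typedef struct {
  char string_start;
  char string_end;
  char string_escape;       /* '\0' disables escaping */
  int  strings_span_lines;
} DemoStringConfig;

/* the quoted example settings for C highlighting */
static const DemoStringConfig demo_c_strings = { '"', '"', '\\', 1 };

/* returns the length of the string literal beginning at text, delimiters
 * included, or 0 if text does not begin with a complete literal
 */
static size_t
demo_scan_string (const DemoStringConfig *config,
                  const char             *text)
{
  const char *p;

  assert (config != NULL && text != NULL);
  if (*text == '\0' || *text != config->string_start)
    return 0;
  for (p = text + 1; *p != '\0'; p++)
    {
      if (config->string_escape != '\0' && *p == config->string_escape)
        {
          if (p[1] == '\0')
            return 0;       /* dangling escape character */
          p++;              /* skip the escaped character */
        }
      else if (*p == config->string_end)
        return (size_t) (p - text) + 1;
      else if (*p == '\n' && !config->strings_span_lines)
        return 0;           /* line ended before the string did */
    }
  return 0;                 /* unterminated string */
}
```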

> # There are many other options which I have omitted...

would probably be interesting to evaluate.

> 	These features make it much, much easier to support syntax
> highlighting of different languages, such as Java or Perl.  Furthermore,
> if I implement a scanner which has all of the TextPad scanner's features,
> I can examine their syntax definition files to immediately support
> highlighting of many different languages in my text editor.
> 
> 	If you need to use win32 for *anything*, you really need TextPad.  
> Personally, I consider it to be the best text/hex editor on the planet
> (but I don't come from an Emacs or vi background).
> 
> 	I hope this answers your question.

well, it seems like the config file for TextPad defines items beyond
the pure lexical layer, which is ok and probably necessary to support
syntax highlighting. but that doesn't mean GScanner would fail here;
rather, you should strive for a parser implementation based on GScanner
to perform the parsing-related parts of the TextPad features.

at least so far i haven't seen a compelling reason in your mail to
implement a *new* lexical scanner to surpass GScanner, since at the
lexical level it is easy enough to expand to suit your needs.

and beware, implementing a reliable lexical scanner that's still
compact and fast enough to outperform lex and co is not a trivial
task. it's also very hard to resist the temptation of implementing
parser functionality at the wrong layer.

that being said, i'm willing to provide you with any help you require
to extend GScanner beyond its current functionality to adapt it to
your requirements. think about it, do you really believe that
extending the existing GScanner code that already handles a variety
of lexical tokens would be harder than reimplementing a scanner from
scratch?

> 
> Thanks,
> Derek Simkowiak
> dereks@kd-dev.com
> 

---
ciaoTJ


