Re: word processor document format: what parts?



Todd,

Let me qualify everything in this letter by saying that I've only been
reading up on XML/XSL etc. for a few days (I started looking at it
after the great WP discussions of last weekend, with an eye towards
using it to build a WP). So much of what's in here may not be the whole
story. Those of you who are more experienced, please feel free to correct
me.

First off, I think the basic editing structure of Bob should be XML. This
is hardly radical. However, based on my four-day study of XML, I think
that really 'being XML' implies some crucial things about the design.

Second, I think that Bob should support XSL as much as possible. (I'll
also go along with CSS or DSSSL; any of them would serve my purposes, I
suspect.) This is going to be an uphill battle, but once it is done you
have a word processor that can do just about any formatting you require.

I think the reason many freeware word processors have failed in the
past is that they rapidly reached a point of diminishing returns trying to
implement features piecemeal. Using the XSL + XML approach, you can
implement elements in a consistent, logical framework and then add
new elements without creating a hodge-podge of flaky features a la MS Word.
Take a look at the web pages of various Word document converters sometime:
apparently the file format is really bizarre.

I think the trick to making this work right will be implementing XML/XSL
in your main editing component. That is, don't interpret XML/XSL
somewhere else and then hand the result to a display component that
knows nothing about XML/XSL.

This sounds complicated, but it's not. Basically, the idea is this:
when a user inserts text on the screen, the insertion point should
always fall within SOME kind of XML element. To demonstrate:
	<header1>
	<title>Of Mice and Men</title>
	The quick brown fox jumped over the purple mushroom.
	</header1>
If I insert text after 'Mice', my context is 'header1/title'. On the
other hand, if I insert after 'brown', my context is simply 'header1'.
The point is that my component needs to know the context and display
the appropriate font without having to go back to some external XML
converter. Then, at save time, it can just dump the XML elements by
traversing the tree (more on this in a second).
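A minimal sketch of the context lookup in Python, assuming the document is held as an ElementTree and the cursor position is a character offset into the document's running text. The helper name `context_at` is hypothetical, and the element is spelled `header1` because XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

DOC = ("<header1><title>Of Mice and Men</title>\n"
       "The quick brown fox jumped over the purple mushroom.</header1>")

def context_at(root, offset):
    """Return the element path (e.g. 'header1/title') that encloses the
    character at `offset` in the document's running text."""
    pos = 0  # running character count, in document order

    def walk(elem, path):
        nonlocal pos
        path = path + [elem.tag]
        text = elem.text or ""
        if pos <= offset < pos + len(text):
            return "/".join(path)
        pos += len(text)
        for child in elem:
            found = walk(child, path)
            if found:
                return found
            # Text after a child's end tag (its "tail") belongs to the parent.
            tail = child.tail or ""
            if pos <= offset < pos + len(tail):
                return "/".join(path)
            pos += len(tail)
        return None

    return walk(root, [])

root = ET.fromstring(DOC)
text = "".join(root.itertext())
print(context_at(root, text.index("Mice")))   # header1/title
print(context_at(root, text.index("brown")))  # header1
```

The edit control would look up this path on each keystroke and pick the font from the style sheet, rather than asking an external converter.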

Where this gets cool is when you combine it with XSL. Using an XSL
style sheet, you could render the above (to HTML, for convenience)
as:
	<H1>
	<I>Of Mice and Men</I>
	
	The quick ...
	</H1>
Or, with a minor change to the XSL template, you could do:
	<B>
	<HR>
	<I>Of Mice and Men</I>
	
	The quick ...
	<HR>
	</B>
THIS is flexibility.
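XSL itself isn't reproduced here. As a stand-in, this sketch uses a hypothetical Python "style sheet" (a dict mapping element names to the HTML that wraps them) to show the same idea: the document never changes, only the sheet does. A real implementation would express these rules as `<xsl:template match="...">` elements run by an XSLT processor.

```python
import xml.etree.ElementTree as ET
from html import escape

DOC = "<header1><title>Of Mice and Men</title> The quick ...</header1>"

# Two hypothetical style sheets: element name -> (opening HTML, closing HTML).
PLAIN = {"header1": ("<H1>", "</H1>"), "title": ("<I>", "</I>")}
RULED = {"header1": ("<B><HR>", "<HR></B>"), "title": ("<I>", "</I>")}

def render(elem, sheet):
    """Recursively render `elem` to HTML using the wrappers in `sheet`.
    Elements the sheet doesn't mention are rendered as bare content."""
    open_tag, close_tag = sheet.get(elem.tag, ("", ""))
    parts = [open_tag, escape(elem.text or "", quote=False)]
    for child in elem:
        parts.append(render(child, sheet))
        parts.append(escape(child.tail or "", quote=False))
    parts.append(close_tag)
    return "".join(parts)

root = ET.fromstring(DOC)
print(render(root, PLAIN))  # <H1><I>Of Mice and Men</I> The quick ...</H1>
print(render(root, RULED))  # <B><HR><I>Of Mice and Men</I> The quick ...<HR></B>
```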

Which brings me to my answer to your question about tags: don't. I
would recommend against defining a limited set of _XML_ tags that you
support. Instead, I would concentrate on which XSL elements and
attributes you support. Then, when the time comes, write a fairly
comprehensive XML DTD that exercises that XSL support fully, plus a few
decent default style sheets. The cool part is that, when you're done,
someone else can come along behind you and implement another DTD, another
XSL style sheet, etc., with very little hassle.

For the record, there is absolutely no reason you couldn't have BOTH
context-based markup and font manipulation with this kind of scheme. You
would just define a "Bold" element in your XML DTD to go along with the
'header1' elements.

Third, I think you should look really hard at exporting a DOM API from
your edit control via CORBA. This lets you do a couple of neat things
right off the bat.

You can load an XML document just by pointing the gnome-xml libs
at the DOM implementation you're using (once gnome-xml supports DOM;
this is planned functionality). You can also easily populate the document
from another process using DOM via CORBA. Hence, converters and such get
easier to write.
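As a rough illustration of a converter that talks only DOM, here is a sketch using Python's `xml.dom.minidom`, which implements the standard DOM calls (`createElement`, `createTextNode`, `appendChild`). The CORBA layer is not shown; the assumption is that a CORBA-exported DOM would expose this same interface to an out-of-process converter. `convert_plain_text` is a hypothetical name.

```python
from xml.dom.minidom import getDOMImplementation

def convert_plain_text(title, body):
    """Hypothetical converter: builds a document purely through DOM calls,
    so in principle it could target any DOM implementation."""
    impl = getDOMImplementation()
    doc = impl.createDocument(None, "header1", None)  # root element <header1>
    root = doc.documentElement

    title_elem = doc.createElement("title")
    title_elem.appendChild(doc.createTextNode(title))
    root.appendChild(title_elem)

    root.appendChild(doc.createTextNode(body))  # body text sits under header1
    return doc

doc = convert_plain_text("Of Mice and Men", "The quick brown fox ...")
print(doc.documentElement.toxml())
```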

When it comes time to save, you just traverse your own DOM tree,
writing tags as you go.  This is very easy coding.
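The save step can be sketched as a depth-first walk: a start tag on the way down, escaped text in between, an end tag on the way up. This assumes an `xml.dom.minidom` tree standing in for the edit control's own DOM; `save` is a hypothetical helper, and comments and processing instructions are skipped for brevity.

```python
from xml.dom import Node
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape, quoteattr

def save(node, out):
    """Append the XML for `node` and its subtree to the list `out`."""
    if node.nodeType == Node.TEXT_NODE:
        out.append(escape(node.data))
    elif node.nodeType == Node.ELEMENT_NODE:
        attrs = "".join(f" {name}={quoteattr(value)}"
                        for name, value in node.attributes.items())
        out.append(f"<{node.tagName}{attrs}>")   # start tag on the way down
        for child in node.childNodes:
            save(child, out)
        out.append(f"</{node.tagName}>")         # end tag on the way up
    # Other node types (comments, PIs) are ignored in this sketch.

doc = parseString("<header1><title>Of Mice and Men</title> The quick ...</header1>")
out = []
save(doc.documentElement, out)
print("".join(out))
```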

Also, DOM gives you an almost ready-made in-memory structure for your
document: just implement DOM in your edit control, then mirror the tree
in memory. This is pretty close to my understanding of the 'piece table'
used by most modern word processors.
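For comparison, a minimal piece table sketch, assuming the textbook design: the original text is never modified, inserted text is appended to a separate "add" buffer, and the document is a list of (buffer, start, length) spans. Unlike a DOM, this is a flat sequence over text rather than a tree, so how the two would be combined in Bob is left open here. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Piece:
    source: str  # which buffer: "orig" or "add"
    start: int
    length: int

class PieceTable:
    def __init__(self, text):
        self.buffers = {"orig": text, "add": ""}
        self.pieces = [Piece("orig", 0, len(text))] if text else []

    def insert(self, offset, text):
        """Insert `text` at `offset` by splitting one piece; no buffer
        contents are ever overwritten."""
        new = Piece("add", len(self.buffers["add"]), len(text))
        self.buffers["add"] += text
        pos = 0
        for i, p in enumerate(self.pieces):
            if offset <= pos + p.length:
                split = offset - pos
                left = Piece(p.source, p.start, split)
                right = Piece(p.source, p.start + split, p.length - split)
                # Drop zero-length pieces produced by edge splits.
                self.pieces[i:i + 1] = [x for x in (left, new, right) if x.length]
                return
            pos += p.length
        self.pieces.append(new)  # offset at (or past) the end

    def text(self):
        return "".join(self.buffers[p.source][p.start:p.start + p.length]
                       for p in self.pieces)

t = PieceTable("The quick fox")
t.insert(10, "brown ")
print(t.text())  # The quick brown fox
```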

In any case, I really think that some kind of standardized style sheet
language is the solution.

My $0.02.

Patrick
  
----------------------------------------------------------------------
If we're to have any luck stanching the vain drain, we just have to 
let nerds be nerds...  Owen Edwards, Forbes Magazine
----------------------------------------------------------------------



