Re: HarfBuzz API design

From: Behdad Esfahbod <behdad behdad org>
To: Carl Worth <cworth cworth org>
Cc: Ralph Giles <giles ghostscript com>, Chris Wilson <chris chris-wilson co uk>, "gtk-i18n-list gnome org" <gtk-i18n-list gnome org>, Harfbuzz <harfbuzz lists freedesktop org>
Subject: Re: HarfBuzz API design
Date: Wed, 19 Aug 2009 15:53:19 -0400

On 08/19/2009 02:45 PM, Carl Worth wrote:

Excerpts from Behdad Esfahbod's message of Tue Aug 18 16:23:50 -0700 2009:

[Warning: long email ahead]


My reply might bounce to unsubscribed-from lists. Feel free to forward
if you think there's anything important in here for others to read.


Thanks Carl.

_reference(), _destory(), and _get_reference_count().  At some point we may
want to add _[gs]et_user_data() also which is useful for language bindings.


That should be "destroy" not "destory", of course. I normally wouldn't
point out a typo in an email message, but I did see the same error in
a comment that might be copied from actual code:


Ouch.  Fixed the typo.

/* calls destory() when not needing user_data anymore */
hb_face_t *
hb_face_create_for_tables (hb_get_table_func_t  get_table,
                             hb_destroy_func_t    destroy,
                             void                *user_data);


So don't forget to spell-check your header files. :-)


Yeah, postponed that to when I add actual docs.

Humm, seems like s/writeable/writable/g is also needed. Stupid me. The geek inside me prefers writeable though.

typedef enum {
    HB_MEMORY_MODE_DUPLICATE,
    HB_MEMORY_MODE_READONLY,
    HB_MEMORY_MODE_WRITEABLE,
    HB_MEMORY_MODE_READONLY_NEVER_DUPLICATE,
    HB_MEMORY_MODE_READONLY_MAY_MAKE_WRITEABLE,
} hb_memory_mode_t;

...

    DUPLICATE: copy data right away and own it.

    READONLY: the data passed in can be kept for later use, but should not be
modified.  If modification is needed, the blob will duplicate the data lazily.

    WRITEABLE: data is writeable, use it freely.

    READONLY_NEVER_DUPLICATE: data is readonly and should never be duplicated.
   This disables operations needing write access to data.

    READONLY_MAY_MAKE_WRITEABLE: data is readonly but may be made writeable
using mprotect() or equivalent win32 calls.  It's up to the user to make sure
calling mprotect() or system-specific equivalents on the data is safe.  In
practice, that's never an issue on Linux and (according to Tor) on win32.
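To make the five descriptions above concrete, here is a minimal sketch of how a blob might honor each mode when write access is requested. Everything prefixed sketch_ or SKETCH_ is hypothetical and not part of the proposed HarfBuzz API. The sketch also assumes that READONLY_MAY_MAKE_WRITEABLE simply falls back to copying, whereas a real implementation would presumably try mprotect() or an equivalent first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the enum in the message. */
typedef enum {
    SKETCH_MODE_DUPLICATE,
    SKETCH_MODE_READONLY,
    SKETCH_MODE_WRITEABLE,
    SKETCH_MODE_READONLY_NEVER_DUPLICATE,
    SKETCH_MODE_READONLY_MAY_MAKE_WRITEABLE
} sketch_memory_mode_t;

typedef struct {
    char                *data;
    size_t               length;
    sketch_memory_mode_t mode;
    bool                 owns_data; /* true once the blob holds its own copy */
} sketch_blob_t;

/* Replace the referenced data with a private, writable copy. */
bool sketch_blob_copy(sketch_blob_t *blob)
{
    char *copy = malloc(blob->length ? blob->length : 1);
    if (!copy)
        return false;
    memcpy(copy, blob->data, blob->length);
    blob->data = copy;
    blob->owns_data = true;
    blob->mode = SKETCH_MODE_WRITEABLE;
    return true;
}

/* DUPLICATE copies right away; every other mode only references the data. */
bool sketch_blob_init(sketch_blob_t *blob, char *data, size_t length,
                      sketch_memory_mode_t mode)
{
    assert(blob != NULL && data != NULL);
    blob->data = data;
    blob->length = length;
    blob->mode = mode;
    blob->owns_data = false;
    if (mode == SKETCH_MODE_DUPLICATE)
        return sketch_blob_copy(blob);
    return true;
}

/* Called by any operation that needs write access to the data. */
bool sketch_blob_make_writable(sketch_blob_t *blob)
{
    switch (blob->mode) {
    case SKETCH_MODE_WRITEABLE:
        return true;                  /* use the data freely */
    case SKETCH_MODE_READONLY_NEVER_DUPLICATE:
        return false;                 /* write operations are disabled */
    case SKETCH_MODE_READONLY_MAY_MAKE_WRITEABLE: /* assumption: no mprotect() here */
    case SKETCH_MODE_READONLY:
    case SKETCH_MODE_DUPLICATE:
        return sketch_blob_copy(blob); /* lazy duplication */
    }
    return false;
}

void sketch_blob_finish(sketch_blob_t *blob)
{
    if (blob->owns_data)
        free(blob->data);
    blob->data = NULL;
    blob->owns_data = false;
}
```

With this sketch, a blob created in READONLY mode keeps pointing at the caller's data until the first write request, at which point it switches to a private copy and the caller's data stays untouched.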


I don't think these names are quite right yet.

That's definitely one of the most arbitrary parts of the code, thanks for catching! I do think there's a slight confusion though, and if I clarify it (as the docs eventually will), the values make more sense. Look at it this way: the mode parameter describes the characteristics of the input data only. It's not a property of the blob itself, and for that reason there is no getter for it.


Looking at it that way, these values are accurate:

>>     HB_MEMORY_MODE_DUPLICATE,
>>     HB_MEMORY_MODE_READONLY,
>>     HB_MEMORY_MODE_WRITEABLE,
>>     HB_MEMORY_MODE_READONLY_MAY_MAKE_WRITEABLE,

This one, not so much:

>>     HB_MEMORY_MODE_READONLY_NEVER_DUPLICATE,

As it also describes a behavior of the blob itself. That value however is for very corner-case uses (when you're just interested in knowing whether the data is sane and you want to avoid the cost of copying).

I think the thing that strikes me first as wrong is that if you create
a READONLY blob then it's still perfectly valid to write to it, (such
that you have to have a separate READONLY_NEVER_DUPLICATE for
"readonly---and I mean it").


More clear in the above context?

I also did some Huffman encoding here. I initially had READONLY and READONLY_MAY_DUPLICATE, but figured that READONLY_MAY_DUPLICATE is what I want most users to use, and one should think twice before using READONLY. So I gave the short name to the common one and made the corner case use a longer name.

One difficulty is that you're capturing separate notions here,
(whether the data buffer is writeable vs. whether the created blob
should be writeable).


Yes and no indeed.

And there's some missing orthogonality too. For example, if I have
data that I don't want the blob to reference, (hence DUPLICATE), might
I not still want to create a blob in which all operations requiring
write access are disabled? That combination is not possible to express
from what I can see.

Initially I had that design in mind, with separate writeable. But I figured the extra control is not needed, no.

So, here's pseudocode trying to capture the decision-tree for choosing
one of the modes currently:

if (the data cannot be referenced by the blob)
	DUPLICATE
else if (the data buffer can be written to)
	if (writing requires calling mprotect)
		READONLY_MAY_MAKE_WRITEABLE
	else
		WRITEABLE
else if (the resulting blob should be readonly)
	READONLY_NEVER_DUPLICATE
else
	READONLY;


Again, let's forget about READONLY_NEVER_DUPLICATE; my mental model for this was:

if (I don't own the data)
  DUPLICATE;
else if (I malloced it)
  WRITEABLE;
else if (it's mmapped)
  READONLY_MAY_MAKE_WRITEABLE;
else /* ie.  it's shared with someone else, I don't have perms to modify it */
  READONLY;
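The mental model above translates almost directly into a small helper. This is only a sketch: the sketch_data_origin_t struct, its field names, and the SKETCH_MODEL_ constants are hypothetical and exist just to make the decision order explicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the four modes covered by the mental model. */
typedef enum {
    SKETCH_MODEL_DUPLICATE,
    SKETCH_MODEL_READONLY,
    SKETCH_MODEL_WRITEABLE,
    SKETCH_MODEL_READONLY_MAY_MAKE_WRITEABLE
} sketch_model_mode_t;

/* What the caller knows about the input data (hypothetical fields). */
typedef struct {
    bool caller_owns_data; /* false: the data may go away under us */
    bool is_malloced;      /* true: plain heap memory we may scribble on */
    bool is_mmapped;       /* true: a mapping that could be made writable */
} sketch_data_origin_t;

/* The checks are ordered exactly as in the pseudocode above. */
sketch_model_mode_t sketch_choose_mode(const sketch_data_origin_t *origin)
{
    assert(origin != NULL);
    if (!origin->caller_owns_data)
        return SKETCH_MODEL_DUPLICATE;
    if (origin->is_malloced)
        return SKETCH_MODEL_WRITEABLE;
    if (origin->is_mmapped)
        return SKETCH_MODEL_READONLY_MAY_MAKE_WRITEABLE;
    /* Shared with someone else, no permission to modify it. */
    return SKETCH_MODEL_READONLY;
}
```

Note that the mode is derived purely from properties of the input data; nothing in the helper asks what the resulting blob should be allowed to do.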

So perhaps what's actually desired here is a set of flags to express
each of those conditions independently? I don't have a good proposal
for what those flag names might be.

I found the flag approach even more confusing, as one would then think about (and the docs would have to explain) all the different kinds of combinations.

Although, having a name for the entire behavior is helpful, (rather
than just a conjunction of flags). So maybe we just need simpler names
to capture the behaviors of interest:

COPY: Data is immediately copied by the blob. Resulting blob supports
	write operations. [New name for DUPLICATE]

COPY_ON_WRITE: Data is referenced by the blob. Resulting blob supports
	write operations which are implemented by lazily copying the
	data, (the original data is never modified). [New name for
	READONLY]

READ_WRITE: Data is referenced by the blob. Resulting blob supports
	write operations which will directly modify original data.
	[New name for WRITEABLE. Of the new names I'm proposing, this
	is the one I like the least.]

READONLY: Data is referenced by the blob. Resulting blob does not
	support write operations. [New name for READONLY_NEVER_DUPLICATE]

MPROTECT: Data is referenced by the blob. Resulting blob supports
	write operations which are implemented by first calling
	mprotect or equivalent. [New name for
	READONLY_MAY_MAKE_WRITEABLE. Obviously this isn't an ideal
	name since it references an operating-system-specific
	feature. Suggestions welcome.]

I wanted to keep MPROTECT out of the name. Again, please think about it once more without the mode determining whether the blob will be writeable or not, just as describing the access mode of the input data only. Do you think the current enum makes sense that way?

I don't have much to say about the actual font-related parts of the
proposal. But I do think it's perhaps a bad idea to reuse the same
buffer for text input and glyph output. Is memory that tight? Might a
user not want to be able to query the text back out even after
shaping?

Memory is not tight, no. Note that this is only about what the input/output to the hb_shape() function will be. Users already keep the original text in some location anyway, and the cluster mapping is what's used to map the glyphs back. All I'm doing here essentially is to let the user populate a hb-buffer and pass that to hb_shape(), instead of hb_shape() doing it.

The benefit of course is having one hb_shape() API, instead of hb_shape_utf8(), hb_shape_utf16(), and hb_shape_utf32().
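As a rough illustration of that split, the sketch below keeps the encoding-specific work in the functions that populate the buffer, so that a single shaping entry point only ever sees code points and cluster indices. All sketch_ names are hypothetical, the decoders assume well-formed input, and sketch_shape() is a stub that does no real shaping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUFFER_CAPACITY 64

typedef struct {
    uint32_t codepoint; /* a real shaper would replace this with a glyph index */
    uint32_t cluster;   /* offset into the caller's original text */
} sketch_item_t;

typedef struct {
    sketch_item_t items[SKETCH_BUFFER_CAPACITY];
    size_t        length;
} sketch_buffer_t;

void sketch_buffer_reset(sketch_buffer_t *buf)
{
    buf->length = 0;
}

void sketch_buffer_add(sketch_buffer_t *buf, uint32_t codepoint, uint32_t cluster)
{
    assert(buf->length < SKETCH_BUFFER_CAPACITY);
    buf->items[buf->length].codepoint = codepoint;
    buf->items[buf->length].cluster = cluster;
    buf->length++;
}

/* Decode well-formed UTF-8; the cluster is the byte offset of the lead byte. */
void sketch_buffer_add_utf8(sketch_buffer_t *buf, const char *text, size_t len)
{
    const unsigned char *s = (const unsigned char *) text;
    size_t i = 0;
    while (i < len) {
        size_t start = i;
        uint32_t cp = s[i++];
        size_t extra = cp >= 0xF0 ? 3 : cp >= 0xE0 ? 2 : cp >= 0xC0 ? 1 : 0;
        if (extra)
            cp &= 0x3Fu >> extra; /* keep only the payload bits of the lead byte */
        for (; extra > 0 && i < len; extra--)
            cp = (cp << 6) | (s[i++] & 0x3Fu);
        sketch_buffer_add(buf, cp, (uint32_t) start);
    }
}

/* Decode UTF-16; the cluster is the index of the first code unit. */
void sketch_buffer_add_utf16(sketch_buffer_t *buf, const uint16_t *text, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t start = i;
        uint32_t cp = text[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF && i < len) {
            uint32_t low = text[i];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i++;
            }
        }
        sketch_buffer_add(buf, cp, (uint32_t) start);
    }
}

/* Stub for the single entry point: it never needs to know the source encoding.
 * It leaves the items untouched and returns how many there are. */
size_t sketch_shape(sketch_buffer_t *buf)
{
    return buf->length;
}
```

A caller holding UTF-8 text and a caller holding UTF-16 text end up passing the same kind of buffer to sketch_shape(), and each can map results back to its own text through the cluster values.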


Makes more sense?

Thanks again,
behdad

-Carl

