Re: GVariant for prez!



On Sun, 2009-04-12 at 14:55 -0400, Havoc Pennington wrote:
> Hi,
> 
> On Sun, Apr 12, 2009 at 9:14 AM, Christian Dywan <christian imendio com> wrote:
> > You are asserting that something like a "gint" or "guint" is not
> > something that can be saved to disk.
> 
> I'm not saying that; I'm saying they can only be saved to disk by
> converting them to a fixed-size integer.
> 
> "int" is a bad example because for all machines glib runs on, int is
> equal to int32.
> 
> But let's look at "long"; "long" means different things on different
> machines. This makes it totally broken to save "long" to disk or push
> it over the network. It's OK to autoconvert "long" to "int32" ... but
> there's no meaningful "long" type once we're serialized, imo.
> 
> > That said, it is all too common, for better or worse, to work with
> > numbers of unknown physical size, including storage to disk.
> 
> It's impossible to do this, imo. Once you save to disk the physical
> size is "locked in".
> 
> Yes you can write a super-broken program that saves to disk at a
> different size depending on the cpu it's running on, but that
> super-broken program is still going to be picking either int32 or
> int64 when it goes to save the long.
> 
> Basically, "typedefs" don't mean anything once you're serialized. A
> serialized format could support time_t, size_t, long, ssize_t, pid_t,
> etc. etc. (this goes on forever); or it could support int32, int64,
> uint32, uint64 and that's it. I would advocate the second; have the
> type system of a serialized format describe the actual binary data
> types, not every way those binary data types can be interpreted or
> mapped into language types.
> 
> This is a pretty fundamental decision; I went into it more in my other
> mail, with two slightly different angles on it or ways to frame it.
> 
> a) union of all type systems (GType, python, JavaScript, Java, etc.)
> or rough intersection of primitive types they have in common
> b) round-trip full "native" (GType, python, JS, Java, etc.) type
> knowledge through the serialization, or preserve the binary storage
> format only
> 
> For disk formats and ipc formats, I think intersection / binary format
> only is the right way to lean, rather than
> union/full-type-annotation-knowledge.
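
To make the fixed-size point concrete, here is a minimal Python sketch.
The helper names save_long and load_long are made up for illustration,
and picking a little-endian int64 as the on-disk type is my assumption,
not anything GVariant prescribes.  The size of a native long depends on
the platform, while the on-disk width is chosen once and stays put:

```python
import struct

# Size of a native C "long" on this machine; typically 4 or 8 depending on
# the platform, which is why it cannot serve as the on-disk type.
NATIVE_LONG_SIZE = struct.calcsize("@l")

# Assumed on-disk type for this sketch: little-endian signed 64-bit integer.
DISK_FORMAT = "<q"


def save_long(value: int) -> bytes:
    """Serialize an integer as a fixed 8-byte field, whatever the host CPU."""
    return struct.pack(DISK_FORMAT, value)


def load_long(raw: bytes) -> int:
    """Read the 8-byte field back into a native integer."""
    (value,) = struct.unpack(DISK_FORMAT, raw)
    return value
```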

What about a format like this (the numbers are somewhat arbitrary; they
may need to be adjusted for performance in the particular use case):

Blocks of Data:
The 1st bit tells whether there is a label.
If the 1st bit is 1:
	the next byte gives us the length of the label, L = ceiling(log[2](length
	in bytes)) (which gives us up to 256 characters, so we can have quite
	large namespaces, but without wasting space)
	the next L bytes contain the label (in some chosen encoding, but that's
	a different discussion)
	the next bit tells whether the label is continued; if it is, the next
	byte is again the length (so we can have labels as long as we want...
	maybe this isn't necessary, but it's only one bit).  Rinse, repeat.

Once the label is finished (or the first bit of the block is 0):
	the next byte gives the length of the data, again L = ceiling(log[2]
	(length of data in bytes))
	the next L bytes are the data
	the next bit after the data is the "continue" bit

Once the data is finished, we start the next block.


Perhaps we would want to steal one bit from the length field for the
"continue" bit.  It cuts the possible size of our data/label in half,
but keeps us lined up on bytes.
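
Here is a rough Python sketch of that byte-aligned variant, just to show
the idea hangs together.  Several details are my own assumptions for the
sketch: the "has a label" flag takes a whole byte rather than one bit,
the top bit of each length byte is the "continue" bit (so one chunk
carries at most 127 bytes), labels are UTF-8, and all of the function
names are made up.

```python
from typing import List, Optional, Tuple

CONTINUE_BIT = 0x80   # top bit of a length byte: another chunk follows
MAX_CHUNK = 0x7F      # the remaining 7 bits hold the chunk length


def _write_chunks(payload: bytes) -> bytes:
    """Split payload into length-prefixed chunks of at most 127 bytes."""
    out = bytearray()
    offset = 0
    while True:
        chunk = payload[offset:offset + MAX_CHUNK]
        offset += len(chunk)
        more = offset < len(payload)
        out.append((CONTINUE_BIT if more else 0) | len(chunk))
        out += chunk
        if not more:
            return bytes(out)


def _read_chunks(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Reassemble one chunked field; return it and the next read position."""
    payload = bytearray()
    while True:
        header = buf[pos]
        pos += 1
        length = header & MAX_CHUNK
        payload += buf[pos:pos + length]
        pos += length
        if not header & CONTINUE_BIT:
            return bytes(payload), pos


def encode_block(data: bytes, label: Optional[str] = None) -> bytes:
    """One block: label flag, optional chunked label, then chunked data."""
    if label is None:
        return b"\x00" + _write_chunks(data)
    return b"\x01" + _write_chunks(label.encode("utf-8")) + _write_chunks(data)


def decode_blocks(buf: bytes) -> List[Tuple[Optional[str], bytes]]:
    """Walk the stream and return (label, data) pairs in order."""
    blocks = []
    pos = 0
    while pos < len(buf):
        has_label = buf[pos]
        pos += 1
        label = None
        if has_label:
            raw, pos = _read_chunks(buf, pos)
            label = raw.decode("utf-8")
        data, pos = _read_chunks(buf, pos)
        blocks.append((label, data))
    return blocks
```

With these assumptions, encode_block(b"hi", "x") yields the six bytes
01 01 78 02 68 69, and anything longer than 127 bytes simply spills into
further chunks.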

Maybe there's something fundamentally wrong with this for this
particular use case, but it would allow the application and library
developers a lot of flexibility in how they store the data, including
whether they tie it to types or names, or tag it in some other way.  With
this, structs without pointers could be pretty trivially serialized.
Even the code to serialize/deserialize structs with pointers doesn't
seem like it would be too bad... all the pointers could be stored
separately and tagged as such, and simply point to the position in the
file/datastream.  Of course, we would have to calculate the size of the
pointers, but that would be a function of the size of the nonpointer
data and the total number of pointers, which shouldn't be too hard to
calculate.
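
As a sketch of that calculation in Python: I am assuming here that a
stored pointer is a byte offset from the start of the stream, that all
pointers share one width, and that the width is one of 1, 2, 4 or 8
bytes.  The function name is made up.

```python
def pointer_width(data_size: int, pointer_count: int) -> int:
    """Smallest offset width (in bytes) able to address the whole stream.

    The stream holds data_size bytes of non-pointer data plus
    pointer_count stored offsets, so its total size depends on the
    width being chosen; try each candidate width in turn.
    """
    for width in (1, 2, 4, 8):
        total = data_size + pointer_count * width
        # Valid offsets run from 0 to total - 1 and must fit in `width` bytes.
        if total <= 256 ** width:
            return width
    raise ValueError("stream too large for 8-byte offsets")
```

For example, pointer_width(200, 10) is 1, since 210 bytes fit within
one-byte offsets, while pointer_width(250, 10) is 2, because 260 bytes
no longer do.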


Like I said, maybe I'm way off in left field... I honestly don't really
know what exactly the use cases for this are.  But I figured I'd throw
my 2 cents into the ring.


-Larry
<larry yrral net>


