Re: UTF-16 support?



(Answering lots of things at once, and not in order.)

On 4 April 2013 00:06, Nick <nospam codesniffer com> wrote:
I think I found a solution.  It required 3 pieces:

1.  Set Meld's Encodings to:
       utf8, utf-16le, iso8859

2.  Set SVN's mime-type property on the UTF-16 files to
    text/plain;encoding=UTF-16LE.

3.  Placed a BOM in the UTF-16 files.

With this configuration I am able to view UTF-8 and UTF-16 files in Meld
without changing the configuration.  The files can be opened directly from
the filesystem (i.e. meld file1 file2) or via the SVN hook within Meld.
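
For step 3, a minimal sketch of adding a BOM to a UTF-16LE file's bytes, assuming Python and its standard codecs module. The helper name add_utf16le_bom is made up for illustration and is not part of Meld or SVN.

```python
import codecs

def add_utf16le_bom(data):
    """Return data with a UTF-16LE BOM (FF FE) prepended, unless already present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data
    return codecs.BOM_UTF16_LE + data

# "utf-16-le" encodes without a BOM, so this mimics a BOM-less file.
raw = "<a/>".encode("utf-16-le")
with_bom = add_utf16le_bom(raw)

# The generic "utf-16" codec reads the BOM to choose the byte order
# and strips it from the decoded text.
decoded = with_bom.decode("utf-16")
```

To apply this to a real file, read it in binary mode, pass the bytes through the helper, and write the result back in binary mode.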


While experimenting with this, I think I found a bug in Meld, which may
also have contributed to the problem.  It seems that once an attempt to
view/diff a file in SVN fails, other files which normally work also fail.
Here's a breakdown of the steps under which I observe this (using
Meld 1.7.0):

(1)  Open Meld for a directory inside an SVN working copy, which contains
3 files:  a.xml (a UTF-16LE file without a BOM), b.xml (a UTF-16LE file
with a BOM), c.txt (a UTF-8 file).
(2)  Set Meld's Encodings configuration to "utf8, utf-16le"
(3)  Open/View b.xml.  This should work.
(4)  Open/View c.txt.  This should work.
(5)  Attempt to open a.xml.  This should yield an error that the file is
binary (as expected).
(6)  Now attempt to open/view b.xml again.  It fails with the same
error.

The only way I've found to get it out of this stuck state is to refresh
the listing.

I can try creating a screen recording of this behavior if it helps.

A screen recording wouldn't really help - those are pretty clear
instructions - but if there's any way you could provide an SVN working
copy to reproduce the problem, then that would be great.

On Wed, 2013-04-03 at 09:41 -0400, Nick wrote:
Looks like if I change the order of the codecs such that utf16 is listed
first, then Meld displays the file fine.  But then I lose the ability to
view UTF-8 files.  So it seems like it's one or the other, but not both.

If this is true, I don't understand the purpose of being able to specify
more than one encoding in the Preferences dialog.

Could Meld try each specified encoding in turn until the file is
displayable (treating the finding that it's a 'binary' file as a failure
too)?  That would let me specify "utf8, utf16" for the encodings and use
both UTF-8 and UTF-16 files in Meld without changing the configuration.

That's exactly what we do... except that the binary file check is
unrelated to the rest. Having said that, reordering those really
shouldn't avoid the binary file check.
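
As an illustration of trying each configured encoding in order, here is a minimal Python sketch. It is not Meld's actual code; decode_with_fallback is a hypothetical name, and the encoding lists are just examples. It also shows why the order matters: a strict codec such as UTF-8 usually rejects bytes it cannot handle, while UTF-16 will often accept arbitrary even-length input and produce garbage.

```python
def decode_with_fallback(data, encodings):
    """Try each encoding in order; return (text, encoding) for the first that works."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            continue  # this codec failed or is unknown; try the next one
    raise ValueError("no listed encoding could decode the data")

# UTF-16LE data containing a non-ASCII character is invalid as UTF-8,
# so the loop falls through to the second codec.
utf16_xml = "<a>\u00e9</a>".encode("utf-16-le")
text, used = decode_with_fallback(utf16_xml, ["utf8", "utf-16-le"])

# With UTF-16 listed first, even-length UTF-8 data can decode "successfully"
# into the wrong characters, so the UTF-8 codec is never reached.
utf8_text = "hello!".encode("utf8")
garbled, used_first = decode_with_fallback(utf8_text, ["utf-16-le", "utf8"])
```

One caveat with this approach: ASCII-only UTF-16LE data without a BOM is also valid UTF-8 (the zero bytes decode as NUL characters), whereas the BOM bytes FF FE are never valid UTF-8. Whether that is what makes the BOM matter in Meld is a guess on my part.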

On Wed, 2013-04-03 at 08:48 -0400, Nick wrote:
Hi,

First and foremost, thanks for a great diff & merge tool!

My project involves XML files which need to be encoded in UTF-16 Little
Endian.  I cannot seem to view or diff UTF-16 files with Meld.

In the Encoding tab of the Preferences dialog I have this for the
codecs:

    utf8, iso8859, utf16, utf-16, utf16le, utf-16le

When I try to open a UTF-16LE file that's in SVN, Meld displays a yellow
error bar on top which reads, "Error fetching original comparison file".
I've confirmed UTF-8 files in the repo open fine--it's only an issue w/
UTF-16 files.

It behaves the same even for files which are marked for addition in the
repo but not yet added (so in this case, there's nothing to diff
against, but normally Meld will display the contents of the file
alongside a blank pane).

I've tried UTF-16 files that contain a BOM and files which do not; no
difference.

I notice that SVN sets the mime-type on these files as binary
(application/octet-stream).  If I manually change it to UTF-16LE
(text/plain;encoding=UTF-16LE), Meld displays a yellow error bar on top
which reads, "Could not read file" "test.xml appears to be a binary
file."--but it still doesn't display the contents of the file.

I had no idea the mime-type behaviour would be different... we
certainly don't do anything on the SVN end with regards to that. I
guess that's a possibly-interesting issue with the new SVN support.

What version of Subversion are you using here? We fetch files in very
different ways for <=1.6 and 1.7.

If I call meld and pass it 2 UTF-16 files on the file system (i.e. not
trying to open a file from the SVN listing), I still get a yellow error
bar on top which reports "Could not read file" "test.xml appears to be a
binary file."

Is there something else I need to do?

Has anyone used Meld to diff UTF-16 files?

No, and it's known not to work. In fact, it shouldn't be possible to
view UTF-16 files in Meld. Or at least, this is what I would have said
if I'd seen this email before I saw your follow-ups.

The problem is that in FileDiff._load_files, we check for null bytes
in the file we're reading in, and throw up our hands and declare a
file to be binary if there are any. This works shockingly well,
considering how wrong it is. Obviously it falls over pretty badly for
UTF-16. What I'm actually more puzzled by is that you've somehow
managed to find a way around this!
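
A rough Python sketch of that kind of null-byte heuristic, to show why it misfires on UTF-16. This is not the code in FileDiff._load_files; looks_binary is a hypothetical name used only for illustration.

```python
def looks_binary(data):
    """Naive binary check: any zero byte means 'binary'."""
    return b"\x00" in data

sample = "<a/>"

# UTF-8 encodes ASCII characters as single non-zero bytes.
utf8_flagged = looks_binary(sample.encode("utf-8"))

# UTF-16LE encodes each ASCII character as its byte followed by 0x00,
# so any ordinary text trips the check.
utf16_flagged = looks_binary(sample.encode("utf-16-le"))
```

Here utf8_flagged is False and utf16_flagged is True, even though both byte strings hold the same text.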

Also, this is bug 632540:
    https://bugzilla.gnome.org/show_bug.cgi?id=632540

cheers,
Kai

