[PATCH] Myers matcher output improvements



Hi,

I have noticed that raw output of Myers diff algorithm may be a bit confusing in some cases, most notably when highlighting inline changes. The issue may be easily reproduced and I am attaching a simple test case (see e.g. first g in gnomeglade) and a screenshot showing the results. The included patch tries to address the issue by adding a post-processing clean-up step and reducing the number of small matching chunks.
Comments and testing welcome.

Cheers,
Piotr
>From d6feec0227b3b3ae67c23aba6650edd26a64b0f7 Mon Sep 17 00:00:00 2001
From: Piotr Piastucki <the_leech users berlios de>
Date: Fri, 28 Jan 2011 10:57:05 +0100
Subject: [PATCH] Add post-processing cleanup algoritm to MyersSequenceMatcher
 While there may exist several optimal edit scripts some of them are
 more suitable for human use. The goal of a post-processing cleanup
 algorithm is to provide more meaningful results by reducing 'chaff' in
 diff algorithm output. This commit adds a simple cleanup algorithm
 that combines small commonalities together in order to yield more
 human-friendly yet optimal results.

---
 meld/matchers.py |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/meld/matchers.py b/meld/matchers.py
index 5393de0..9540b90 100644
--- a/meld/matchers.py
+++ b/meld/matchers.py
@@ -133,6 +133,32 @@ class MyersSequenceMatcher(difflib.SequenceMatcher):
                 b = b2
         return (a, b)
 
+    def postprocess(self):
+        mb = []
+        i = len(self.matching_blocks) - 2
+        mb.append(self.matching_blocks[i + 1])
+        while i >= 0:
+            cur_block = self.matching_blocks[i]
+            cur_a = cur_block[0]
+            cur_b = cur_block[1]
+            cur_len = cur_block[2]
+            i -= 1
+            while i >= 0:
+                prev_block = self.matching_blocks[i]
+                prev_a = prev_block[0]
+                prev_b = prev_block[1]
+                prev_len = prev_block[2]
+                if prev_b + prev_len == cur_b or prev_a + prev_len == cur_a:
+                    if self.a[cur_a - prev_len : cur_a] == self.b[cur_b - prev_len : cur_b]:
+                        cur_b -= prev_len
+                        cur_a -= prev_len
+                        cur_len += prev_len
+                        i -= 1
+                        continue
+                break
+            mb.insert(0, (cur_a,cur_b,cur_len))
+        self.matching_blocks = mb
+
     def build_matching_blocks(self, lastsnake, snakes):
         """
         Build list of matching blocks based on snakes taking into consideration all preprocessing
@@ -263,4 +289,5 @@ class MyersSequenceMatcher(difflib.SequenceMatcher):
                     lastsnake = node
                     break
         self.build_matching_blocks(lastsnake, snakes)
+        self.postprocess()
         yield 1
-- 
1.7.1

Attachment: diff_chaff.tar.gz
Description: GNU Zip compressed data



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]