[meld/meld-3-16] misc: Avoid string copies during filtering (bgo#768300)
- From: Kai Willadsen <kaiw src gnome org>
- To: commits-list gnome org
- Date: Sat, 2 Jul 2016 00:38:01 +0000 (UTC)
commit de6061ad7aab445812636447e52258eec5d3dc77
Author: Kai Willadsen <kai willadsen gmail com>
Date: Sat Jul 2 10:20:56 2016 +1000
misc: Avoid string copies during filtering (bgo#768300)
When we switched over to doing better regex filtering and highlighting
of ignored regions, we changed the way we were applying filters from a
simple multiple-regex approach to a merged-span based approach. This is
fine, except that this also changed the way we sliced the existing text
to produce the filtered version.
Prior to this commit, we removed each matching span by concatenating
two string slices. This is extremely slow in Python for large texts:
every concatenation allocates a new string and copies nearly the whole
text, so the cost grows with the number of matches times the text
length. With this patch, we use the more idiomatic approach of
collecting the text sections that we want to keep and concatenating
them in a single join operation at the end.
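As a rough illustration of the two approaches described above, here is a minimal sketch. The function names and sample data are hypothetical and not taken from Meld; the spans are assumed to be sorted and non-overlapping, as they would be after interval merging.

```python
def remove_spans_by_slicing(txt, spans):
    """Old approach: rebuild the whole string once per span."""
    # Work backwards so earlier offsets stay valid after each cut.
    for start, end in reversed(spans):
        txt = txt[:start] + txt[end:]
    return txt


def remove_spans_by_join(txt, spans):
    """New approach: collect the kept pieces, then join them once."""
    offset = 0
    kept = []
    for start, end in spans:
        kept.append(txt[offset:start])
        offset = end
    kept.append(txt[offset:])
    return "".join(kept)


# Hypothetical sample: both "[drop]" regions are filtered out.
text = "keep [drop] keep [drop] end"
spans = [(5, 11), (17, 23)]
```

Both functions return the same filtered text. The first copies almost the entire string for every span, while the second copies each kept character once into the list of pieces and once more in the final join.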
The test case in bgo#768300 was previously extremely slow (I gave up
waiting), but with this change takes a few seconds.
This commit also changes the role of the "cutter" function, which now
only applies side effects for each filtered interval (highlighting, in
FileDiff) rather than being expected to modify and return the text.
Text modification is carried out by apply_text_filters itself, since it
can do so much more efficiently.
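The split between the callback and the text handling might look like the following simplified sketch. It is a stand-in rather than the code from meld/misc.py: filter_text and merge_spans are hypothetical names, only whole matches are filtered (regex groups are ignored), and the callback here just records offsets instead of tagging a text buffer.

```python
import re


def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans (hypothetical helper)."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def filter_text(txt, regexes, apply_fn=None):
    """Simplified stand-in for apply_text_filters (whole matches only)."""
    spans = [m.span() for r in regexes for m in r.finditer(txt)
             if m.start() != m.end()]
    spans = merge_spans(spans)

    # The callback receives offsets only; it never sees or returns text.
    if apply_fn:
        for start, end in reversed(spans):
            apply_fn(start, end)

    # The text itself is rebuilt here with a single join.
    offset = 0
    kept = []
    for start, end in spans:
        kept.append(txt[offset:start])
        offset = end
    kept.append(txt[offset:])
    return "".join(kept)


seen = []
result = filter_text(
    "x = 1 # note\ny = 2",
    [re.compile(r"#.*"), re.compile(r"note")],
    apply_fn=lambda start, end: seen.append((start, end)),
)
```

In this sketch the two overlapping matches collapse into one interval, so the callback runs once, and the returned text has the comment removed.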
meld/filediff.py | 7 ++-----
meld/misc.py | 22 ++++++++++++++--------
2 files changed, 16 insertions(+), 13 deletions(-)
---
diff --git a/meld/filediff.py b/meld/filediff.py
index b1e3c59..e44049f 100644
--- a/meld/filediff.py
+++ b/meld/filediff.py
@@ -755,19 +755,16 @@ class FileDiff(melddoc.MeldDoc, gnomeglade.Component):
         dimmed_tag = buf.get_tag_table().lookup("dimmed")
         buf.remove_tag(dimmed_tag, txt_start_iter, txt_end_iter)
 
-        def cutter(txt, start, end):
-            assert txt[start:end].count("\n") == 0
-            txt = txt[:start] + txt[end:]
+        def highlighter(start, end):
             start_iter = txt_start_iter.copy()
             start_iter.forward_chars(start)
             end_iter = txt_start_iter.copy()
             end_iter.forward_chars(end)
             buf.apply_tag(dimmed_tag, start_iter, end_iter)
-            return txt
 
         try:
             regexes = [f.filter for f in self.text_filters if f.active]
-            txt = misc.apply_text_filters(txt, regexes, cutter)
+            txt = misc.apply_text_filters(txt, regexes, apply_fn=highlighter)
         except AssertionError:
             if not self.warned_bad_comparison:
                 misc.error_dialog(
diff --git a/meld/misc.py b/meld/misc.py
index 60b5eef..aea2b3e 100644
--- a/meld/misc.py
+++ b/meld/misc.py
@@ -503,15 +503,13 @@ def merge_intervals(interval_list):
     return merged_intervals
 
 
-def apply_text_filters(txt, regexes, cutter=lambda txt, start, end:
-                       txt[:start] + txt[end:]):
+def apply_text_filters(txt, regexes, apply_fn=None):
     """Apply text filters
 
     Text filters "regexes", resolved as regular expressions are applied
     to "txt".
 
-    "cutter" defines the way how to apply them. Default is to just cut
-    out the matches.
+    "apply_fn" is a callable run for each filtered interval
     """
     filter_ranges = []
     for r in regexes:
@@ -533,7 +531,15 @@ def apply_text_filters(txt, regexes, cutter=lambda txt, start, end:
     filter_ranges = merge_intervals(filter_ranges)
 
-    for (start, end) in reversed(filter_ranges):
-        txt = cutter(txt, start, end)
-
-    return txt
+    if apply_fn:
+        for (start, end) in reversed(filter_ranges):
+            apply_fn(start, end)
+
+    offset = 0
+    result_txts = []
+    for (start, end) in filter_ranges:
+        assert txt[start:end].count("\n") == 0
+        result_txts.append(txt[offset:start])
+        offset = end
+    result_txts.append(txt[offset:])
+    return "".join(result_txts)