[gnome-software: 16/20] appstream: Comment on stemming approach for keyword searches
- From: Milan Crha <mcrha src gnome org>
- To: commits-list gnome org
- Cc:
- Subject: [gnome-software: 16/20] appstream: Comment on stemming approach for keyword searches
- Date: Mon, 11 Jul 2022 08:41:48 +0000 (UTC)
commit 3729968ee0629f98d5f338d64798ccb835ef15f3
Author: Philip Withnall <pwithnall endlessos org>
Date: Tue Jul 5 14:00:49 2022 +0100
appstream: Comment on stemming approach for keyword searches
The previous approach to tokenisation/stemming was for search queries
(`helper->tokens` for a `GS_PLUGIN_ACTION_SEARCH` job) to be tokenised
and stemmed at the time of building the job, using
`as_pool_build_search_tokens()`. This ensured that everything was
tokenised and stemmed consistently.
Unfortunately, some backends did their own tokenisation/stemming of the
data to match against, and this didn’t necessarily match what
`as_pool_build_search_tokens()` did, leading to broken searches (see
issue #1193).
The new approach is for the search keywords to be passed through
`GsAppQuery` unmodified, and for it to be the responsibility of plugins
to tokenise/stem them before doing the search matching. Plugins which
use custom tokenisation/stemming can apply that — other plugins can call
`as_pool_build_search_tokens()` themselves if they want. (Although in
reality, all plugins either don’t need tokenisation/stemming, or
implement it themselves. None of them currently seem to need to call
`as_pool_build_search_tokens()`.)
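As a rough illustration of the approach described above (not the actual gnome-software or libappstream code), the sketch below shows a plugin receiving an unmodified query string and applying the same tokenise/stem routine to both the query and the data it matches against. All names here (`toy_stem`, `next_token`, `text_contains_stem`, `plugin_search_matches`) are hypothetical, and the stemmer is a deliberately naive stand-in for whatever a real backend would use.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 64

/* Hypothetical toy stemmer: lowercases the word and strips one trailing
 * "ing" or "s". A real plugin would use a proper stemming library. */
static void
toy_stem (const char *word, size_t len, char out[MAX_TOKEN_LEN])
{
	assert (word != NULL && out != NULL);
	if (len >= MAX_TOKEN_LEN)
		len = MAX_TOKEN_LEN - 1;
	for (size_t i = 0; i < len; i++)
		out[i] = (char) tolower ((unsigned char) word[i]);
	out[len] = '\0';
	if (len > 5 && strcmp (out + len - 3, "ing") == 0)
		out[len - 3] = '\0';
	else if (len > 3 && out[len - 1] == 's')
		out[len - 1] = '\0';
}

/* Finds the next alphanumeric run at or after *pos. Returns its length
 * (0 if there is none), sets *start to it and advances *pos past it. */
static size_t
next_token (const char *text, size_t *pos, const char **start)
{
	size_t i = *pos;
	while (text[i] != '\0' && !isalnum ((unsigned char) text[i]))
		i++;
	size_t begin = i;
	while (text[i] != '\0' && isalnum ((unsigned char) text[i]))
		i++;
	*start = text + begin;
	*pos = i;
	return i - begin;
}

/* TRUE if any token of @text, once stemmed, equals @stem exactly. */
static bool
text_contains_stem (const char *text, const char *stem)
{
	size_t pos = 0, len;
	const char *start;
	char candidate[MAX_TOKEN_LEN];

	while ((len = next_token (text, &pos, &start)) > 0) {
		toy_stem (start, len, candidate);
		if (strcmp (candidate, stem) == 0)
			return true;
	}
	return false;
}

/* The plugin receives the raw query and stems it itself, using the same
 * routine as for the data, so both sides are always consistent. */
bool
plugin_search_matches (const char *raw_query, const char *app_text)
{
	size_t pos = 0, len;
	const char *start;
	char stem[MAX_TOKEN_LEN];
	bool any = false;

	while ((len = next_token (raw_query, &pos, &start)) > 0) {
		toy_stem (start, len, stem);
		if (!text_contains_stem (app_text, stem))
			return false;
		any = true;
	}
	return any;
}
```

With this sketch, `plugin_search_matches ("Editing Images", "Edit your image collection")` matches because both sides go through `toy_stem()`, whereas comparing a differently processed token such as `"images"` against the stemmed data with `text_contains_stem()` finds nothing. That is the kind of mismatch the refactoring is meant to avoid.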
This commit doesn’t actually fix #1193 – the refactoring done in the
previous commits does – but this wraps up the issue.
Signed-off-by: Philip Withnall <pwithnall endlessos org>
Fixes: #1193
lib/gs-appstream.c | 2 ++
1 file changed, 2 insertions(+)
---
diff --git a/lib/gs-appstream.c b/lib/gs-appstream.c
index 730926284..d11a82ed9 100644
--- a/lib/gs-appstream.c
+++ b/lib/gs-appstream.c
@@ -1490,6 +1490,8 @@ gs_appstream_do_search (GsPlugin *plugin,
 	return TRUE;
 }
 
+/* This tokenises and stems @values internally for comparison against the
+ * already-stemmed tokens in the libxmlb silo */
 gboolean
 gs_appstream_search (GsPlugin *plugin,
 		     XbSilo *silo,