[gnome-software: 16/20] appstream: Comment on stemming approach for keyword searches




commit 3729968ee0629f98d5f338d64798ccb835ef15f3
Author: Philip Withnall <pwithnall endlessos org>
Date:   Tue Jul 5 14:00:49 2022 +0100

    appstream: Comment on stemming approach for keyword searches
    
    The previous approach to tokenisation/stemming was for search queries
    (`helper->tokens` for a `GS_PLUGIN_ACTION_SEARCH` job) to be tokenised
    and stemmed at the time of building the job, using
    `as_pool_build_search_tokens()`. This ensured that everything was
    tokenised and stemmed consistently.
    
    Unfortunately, some backends did their own tokenisation/stemming of the
    data to match against, and this didn’t necessarily match what
    `as_pool_build_search_tokens()` did, leading to broken searches (see
    issue #1193).
    
    The new approach is for the search keywords to be passed through
    `GsAppQuery` unmodified, and for it to be the responsibility of plugins
    to tokenise/stem them before doing the search matching. Plugins which
    use custom tokenisation/stemming can apply that — other plugins can call
    `as_pool_build_search_tokens()` themselves if they want. (Although in
    reality, all plugins either don’t need tokenisation/stemming, or
    implement it themselves. None of them currently seem to need to call
    `as_pool_build_search_tokens()`.)
    
    This commit doesn’t actually fix #1193 – the refactoring done in the
    previous commits does – but this wraps up the issue.
    
    Signed-off-by: Philip Withnall <pwithnall endlessos org>
    
    Fixes: #1193

 lib/gs-appstream.c | 2 ++
 1 file changed, 2 insertions(+)
---
diff --git a/lib/gs-appstream.c b/lib/gs-appstream.c
index 730926284..d11a82ed9 100644
--- a/lib/gs-appstream.c
+++ b/lib/gs-appstream.c
@@ -1490,6 +1490,8 @@ gs_appstream_do_search (GsPlugin *plugin,
        return TRUE;
 }
 
+/* This tokenises and stems @values internally for comparison against the
+ * already-stemmed tokens in the libxmlb silo */
 gboolean
 gs_appstream_search (GsPlugin *plugin,
                     XbSilo *silo,
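
The mismatch described in the commit message can be sketched in plain C. This is only an illustration, not code from gnome-software: `core_stem()`, `backend_stem()`, `prestemmed_matches()` and `plugin_matches()` are hypothetical names, and the two toy "stemmers" stand in for `as_pool_build_search_tokens()` and a backend's own stemming. It assumes single-word ASCII keywords.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Copy @word into @out, lowercased and truncated to fit. Returns the length. */
static size_t
lowercase_copy (const char *word, char *out, size_t out_len)
{
	size_t len = strlen (word);

	assert (out_len > 0);
	if (len >= out_len)
		len = out_len - 1;
	for (size_t i = 0; i < len; i++)
		out[i] = (char) tolower ((unsigned char) word[i]);
	out[len] = '\0';
	return len;
}

/* Hypothetical stemmer used when building the job: strips a trailing "es". */
static void
core_stem (const char *word, char *out, size_t out_len)
{
	size_t len = lowercase_copy (word, out, out_len);

	if (len > 2 && strcmp (out + len - 2, "es") == 0)
		out[len - 2] = '\0';
}

/* Hypothetical stemmer used by a backend for its data: strips a trailing "s". */
static void
backend_stem (const char *word, char *out, size_t out_len)
{
	size_t len = lowercase_copy (word, out, out_len);

	if (len > 1 && out[len - 1] == 's')
		out[len - 1] = '\0';
}

/* Previous approach: the keyword is stemmed up front with core_stem() and
 * compared directly against a token the backend stemmed its own way. */
static bool
prestemmed_matches (const char *raw_keyword, const char *backend_token)
{
	char stemmed[64];

	core_stem (raw_keyword, stemmed, sizeof stemmed);
	return strcmp (stemmed, backend_token) == 0;
}

/* New approach: the raw keyword reaches the plugin, which stems it with the
 * same function it used for its data. */
static bool
plugin_matches (const char *raw_keyword, const char *backend_token)
{
	char stemmed[64];

	backend_stem (raw_keyword, stemmed, sizeof stemmed);
	return strcmp (stemmed, backend_token) == 0;
}
```

With a backend token of "game" (what `backend_stem()` produces for "games"), `prestemmed_matches ("Games", "game")` fails because the pre-stemmed keyword is "gam", whereas `plugin_matches ("Games", "game")` succeeds because both sides went through the same stemmer.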

