Skip to content

Commit

Permalink
Fix and improve sorting full matches first under some conditions.
Browse files Browse the repository at this point in the history
- Let `prescient-filter-regexps` return flat list of regexps.

  We need to be able to test the individual regexps produced by the
  filter methods to see if they fully match. If we use the `or`-ed
  combined regexps, which we use for filtering, then it's possible for
  Emacs to only find a matching substring using a partially matching
  sub-regexp before testing the fully matching sub-regexp, depending
  on the order in which the sub-regexps are `or`-ed together.

- Change the property list to use normal keys.

- Change `prescient-regexps` to `:prescient-match-regexps`.

- Add `:prescient-all-regexps`, which we need for proper sorting.

- Test the each regexp produced by a filter method separately.  Do not
  just re-use the regexps that were used for filtering.

- Fix `prefix` with `:with-group` when the first character of the
  query is a word character.  Make sure to not accidentally match the
  middle of a word.

- Tweak the `prefix` filter method to better work with
  `prescient-sort-full-matches-first` and avoid conflicts when trying
  to use an initialism.  Don't use greedy matching when there are no
  non-word characters.

  1. "re" only matches "re" at the start of the word "repeat", not the
     entire word.  Otherwise, we run into conflicts when we want "re"
     to be an initialism for a longer command name, such as
     "restart-emacs".

  2. ".g" fully matches ".git".

  3. "s-r" fully matches "string-rectangle", not just "string-r".
     This can't conflict with `initialism`, so there is no harm in
     assuming that the "r" is meant to match an entire word, just like
     the "s" does.
  • Loading branch information
okamsn committed Jul 15, 2023
1 parent f5e8a1f commit b43e334
Show file tree
Hide file tree
Showing 3 changed files with 162 additions and 106 deletions.
23 changes: 21 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,34 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog].

## Unreleased

### Internal Changes

* `prescient-filter` now only propertizes the first returned candidate
for use with `prescient-sort-full-matches-first` ([#148]). Custom
sorting functions using this data should be changed to search the
candidates for the properties, as in `prescient--get-sort-info`.

### Bugs fixed
* Fix highlighting of the first prefix in the `prefix` filter method
([#149]).

### Enhancements
* Improve filter situation when using `prefix`, `initialism`, and
`prescient-sort-full-matches-first` by tweaking `prefix` ([#149]).
* Previously, with the input "re", `prefix` would match all of the
string "repeat", which would sort it higher than "restart-emacs"
when attempting to use "re" as an initialism.
* Now, the `prefix` "re" only matches the "re" in "repeat", but does
not match the entire word. If there is no non-word character in
the subquery, `prefix` matching is now done non-greedily.
* Continuing without change, "str-re" still fully matches
"string-rectangle" and ".g" still fully matches ".gitignore".
The inclusion of a non-word character means that these inputs
couldn't be used as initialisms anyway, so there is no conflict
when matching greedily for these inputs.

[#148]: https://github.com/radian-software/prescient.el/pull/148
[#149]: https://github.com/radian-software/prescient.el/pull/149


## 6.1 (released 2022-12-16)
### New features
Expand Down
226 changes: 126 additions & 100 deletions prescient.el
Original file line number Diff line number Diff line change
Expand Up @@ -453,43 +453,37 @@ INPUT contains uppercase letters."
(not (string-match-p "[[:upper:]]" input)))
prescient-use-case-folding))

(defun prescient--add-sort-info (candidates regexps ignore-case)
(defun prescient--add-sort-info (candidates &rest properties)
"Propertize the first candidate in CANDIDATES to save data.
REGEXPS are the regexps used by filtering. IGNORE-CASE is whether
case was ignored. These are stored as the text property
`prescient-regexps' and `prescient-ignore-case', respectively.
Currently recognized PROPERTIES are:
- `:prescient-match-regexps': The regexps used for filtering.
- `:prescient-all-regexps': All regexps outputted by the filter
methods, which were used to make PRESCIENT-MATCH-REGEXPS. A
regexp in this list does not necessarily match the candidates
by itself.
- `:prescient-ignore-case': Whether prescient ignored case.
This information is used by the function
`prescient-sort-full-matches-first'."
(if (null candidates)
nil
;; Some UIs might rearrange candidates before we can sort them.
;; For example, Company will sort CAPF candidates that don't have
;; a specified sorting function, which moves them around before
;; passing them to the Company Prescient transformer that applies
;; our own sorting.
(cons (propertize (car candidates)
'prescient-regexps regexps
'prescient-ignore-case ignore-case)
(cons (apply #'propertize (car candidates) properties)
(cdr candidates))))

(defun prescient--get-sort-info (candidates)
"Return a property list of properties added by `prescient-filter'.
`prescient-filter' adds properties to the CANDIDATES that it
filtered for use by the function `prescient-sort-full-matches-first'.
If the properties aren't found, such as when the candidates were
filtered using a different completion method, then the values in
the property list are nil."
filtered for use by the function `prescient-sort-full-matches-first'."
(cl-loop for cand in candidates
until (get-text-property 0 'prescient-regexps cand)
finally return
`(prescient-regexps
,(get-text-property 0 'prescient-regexps cand)
prescient-ignore-case
,(get-text-property 0 'prescient-ignore-case cand))))
until (get-text-property 0 :prescient-match-regexps cand)
;; Since we allow other keys in `prescient--add-sort-info',
;; just return all properties here.
finally return (text-properties-at 0 cand)))

(defun prescient--highlight-matches (input candidates)
"According to INPUT, highlight the matched sections in CANDIDATES.
Expand Down Expand Up @@ -644,70 +638,95 @@ data can be used to highlight the matched substrings."
"Return a regexp for matching the beginnings of words in QUERY.
This is similar to the `partial-completion' completion style
provided by Emacs, except that non-word characters are taken
literally \(i.e., one can't glob using \"*\"). Prescient already
literally (i.e., one can't glob using \"*\"). Prescient already
covers that case by separating queries with a space.
If QUERY contains non-word characters, then this matches
greedily. Otherwise, it matches non-greedily. For example,
- \"str-r\" fully matches \"string-repeat\"
- \"re\" does not fully match the word \"repertoire\", only the
\"re\" at the beginning of the word
- \".g\" fully matches \".git\"
This behavior is meant to work better with the function
`prescient-sort-full-matches-first' and to avoid interpreting an
initialism as a prefix.
If WITH-GROUP is non-nil, enclose the parts of the regexp that
match the QUERY characters in capture groups, so that the match
data can be used to highlight the matched substrings."
(let ((str (replace-regexp-in-string
"[[:word:]]+"
;; Choose whether to wrap sequences of word characters.
(if with-group
(lambda (s) (concat "\\(" s "\\)[[:word:]]*"))
"\\&[[:word:]]*")
;; Quote non-word characters so that they're taken
;; literally.
(replace-regexp-in-string "[^[:word:]]"
(lambda (s) (regexp-quote s))
query 'fixed-case 'literal)
'fixed-case with-group)))
;; If regexp begins with a word character, make sure regexp
;; doesn't start matching in the middle of a word.
(if (eql 0 (string-match-p "[[:word:]]" str))
(concat "\\<" str)
str)))
(concat (when (eql 0 (string-match-p "[[:word:]]" query))
;; If QUERY begins with a word character, make sure the
;; returned regexp doesn't start matching in the middle of
;; a word.
"\\<")
(replace-regexp-in-string
"[[:word:]]+"
;; Choose whether to wrap sequences of word characters.
(concat (if with-group
"\\\\(\\&\\\\)[[:word:]]*"
"\\&[[:word:]]*")
(unless (string-match-p "[^[:word:]]" query)
"?"))
;; Quote non-word characters so that they're taken
;; literally.
(replace-regexp-in-string "[^[:word:]]"
#'regexp-quote
query 'fixed-case 'literal)
'fixed-case)))

;;;; Sorting and filtering

(defun prescient-filter-regexps (query &optional with-group)
(defun prescient-filter-regexps (query &optional with-group separated)
"Convert QUERY to list of regexps.
Each regexp must match the candidate in order for a candidate to
match the QUERY.
If WITH-GROUP is non-nil, enclose the initials in initialisms
with capture groups. If it is the symbol `all', additionally
enclose literal substrings with capture groups."
(let ((subquery-number 0))
(mapcar
(lambda (subquery)
(prog1 (string-join
(cl-remove
nil
(mapcar
(lambda (method)
(if-let ((func (alist-get method prescient-filter-alist)))
(funcall func subquery
:with-group with-group
:subquery-number subquery-number)
;; Don't throw error if function doesn't exist, but do
;; warn user.
(message
"No function in `prescient-filter-alist' for method: %s"
method)))
(pcase
(if (functionp prescient-filter-method)
(funcall prescient-filter-method)
prescient-filter-method)
;; We support `literal+initialism' for backwards
;; compatibility.
(`literal+initialism '(literal initialism))
((and (pred listp) x) x)
(x (list x))))
:test #'eq)
"\\|")
(cl-incf subquery-number)))
(prescient-split-query query))))
enclose literal substrings with capture groups.
By default, this function returns a list containing a regexp for
each query that combines the regexps produced by the filter method
for each query. If SEPARATED is non-nil, this function instead
returns a list of all regexps returned by the filter methods,
without combining them."
(let ((list-of-lists
(cl-loop
with filter-methods
= (pcase (if (functionp prescient-filter-method)
(funcall prescient-filter-method)
prescient-filter-method)
;; We support `literal+initialism' for backwards
;; compatibility.
(`literal+initialism '(literal initialism))
((and (pred listp) x) x)
(x (list x)))
for subquery in (prescient-split-query query)
for subquery-number from 0
collect
(cl-loop with temp-regexp = nil
for method in filter-methods
for func = (alist-get method prescient-filter-alist)
if (null func)
do (message
"No function in `prescient-filter-alist' for method: %s"
method)
else
;; Can't use "for =" here.
do (setq temp-regexp
(funcall func subquery
:with-group with-group
:subquery-number subquery-number))
and if temp-regexp collect temp-regexp end
end))))
(if separated
(apply #'append list-of-lists)
(mapcar (lambda (list) (string-join list "\\|"))
list-of-lists))))

;;;###autoload
(defun prescient-filter (query candidates &optional pred)
Expand Down Expand Up @@ -740,28 +759,35 @@ copy of the list."
;; metadata, since a user might be able to configure their
;; completion UI with a custom sorting function that would use
;; this info.
(prescient--add-sort-info
(all-completions prefix candidates pred)
;; There is a question of how to handle prefixes for identifying
;; fully matched candidates. Prefixes are used in:
;;
;; - `completing-read-multiple', as the candidates that have
;; already been selected
;;
;; - file-name completion, as the directory preceeding the file
;; name, though this seems to only happen when there is no
;; match in some UIs (Icomplete, but not Selectrum)
;;
;; It might turn out that for file names we need to adjust the
;; regexps to be "\(?:QUOTED-PREFIX\)METHOD-REGEXP", but this
;; isn't evident yet. We just do the below to be proactive.
(if (and (not (string-empty-p prefix))
minibuffer-completing-file-name)
(cl-loop for regexp in completion-regexp-list
collect (concat "\\(?:" (regexp-quote prefix) "\\)"
regexp))
completion-regexp-list)
completion-ignore-case)))
;;
;; There is a question of how to handle prefixes for identifying
;; fully matched candidates. Prefixes are used in:
;;
;; - `completing-read-multiple', as the candidates that have
;; already been selected
;;
;; - file-name completion, as the directory preceeding the file
;; name, though this seems to only happen when there is no
;; match in some UIs (Icomplete, but not Selectrum)
;;
;; It might turn out that for file names we need to adjust the
;; regexps to be "\(?:QUOTED-PREFIX\)METHOD-REGEXP", but this
;; isn't evident yet. We just do the below to be proactive.
(cl-flet ((maybe-add-prefix (regexps)
(if (and (not (string-empty-p prefix))
minibuffer-completing-file-name)
(cl-loop for regexp in regexps
collect (concat "\\(?:"
(regexp-quote prefix)
"\\)"
regexp))
regexps)))
(prescient--add-sort-info
(all-completions prefix candidates pred)
:prescient-match-regexps completion-regexp-list
:prescient-all-regexps (maybe-add-prefix
(prescient-filter-regexps pattern nil t))
:prescient-ignore-case completion-ignore-case))))

(defmacro prescient--sort-compare ()
"Hack used to cause the byte-compiler to produce faster code.
Expand Down Expand Up @@ -840,9 +866,9 @@ REGEXPS."
and case-fold-search = ignore-case
for cand in candidates
;; By this point, after filtering, we know that a candidate
;; matches all regexps, so we don't have to check that. So
;; long as a candidate is fully matched by even one regexp,
;; we move it to the front of the list.
;; matches all of the filtering regexps, so we don't have to
;; check that. So long as a candidate is fully matched by
;; even one regexp, we move it to the front of the list.
;;
;; If a regexp didn't match, then we wouldn't be able to use
;; `match-beginning' and `match-end', as they wouldn't
Expand Down Expand Up @@ -938,8 +964,8 @@ the user option `prescient-sort-full-matches-first'."
(ignore-case))
(when prescient-sort-full-matches-first
(let ((props (prescient--get-sort-info candidates)))
(setq regexps (plist-get props 'prescient-regexps)
ignore-case (plist-get props 'prescient-ignore-case))))
(setq regexps (plist-get props :prescient-all-regexps)
ignore-case (plist-get props :prescient-ignore-case))))
(thread-first
candidates
(prescient-sort)
Expand Down
19 changes: 15 additions & 4 deletions test/prescient-test.el
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,7 @@ for BINDINGS."
;; Simplify the regexes in the test cases by disabling char
;; folding
(prescient-use-char-folding nil))
(should (equal ,result (prescient-filter-regexps ,query ,with-group))))
(should (equal ,result (prescient-filter-regexps ,query ,with-group ,separated))))
(methods with-group separated query result)
'(literal) nil nil "foo" '("foo")
'(literal) nil nil "foo bar" '("foo" "bar")
Expand All @@ -122,9 +122,16 @@ for BINDINGS."
'(literal initialism) nil nil "hai" '("hai\\|\\bh\\w*\\W*\\ba\\w*\\W*\\bi\\w*")
'(literal initialism) t nil "hai" '("hai\\|\\b\\(h\\)\\w*\\W*\\b\\(a\\)\\w*\\W*\\b\\(i\\)\\w*")
'(literal initialism) 'all nil "hai" '("\\(hai\\)\\|\\b\\(h\\)\\w*\\W*\\b\\(a\\)\\w*\\W*\\b\\(i\\)\\w*")
;; '(literal initialism) nil t "hai" '("hai" "\\bh\\w*\\W*\\ba\\w*\\W*\\bi\\w*")
'(literal initialism) nil t "hai" '("hai" "\\bh\\w*\\W*\\ba\\w*\\W*\\bi\\w*")
'(literal regexp) nil nil "f.*d b[o]*t" '("f\\.\\*d\\|f.*d" "b\\[o]\\*t\\|b[o]*t")
;; '(literal regexp) nil t "f.*d b[o]*t" '("f\\.\\*d" "f.*d" "b\\[o]\\*t" "b[o]*t")
'(literal regexp) nil t "f.*d b[o]*t" '("f\\.\\*d" "f.*d" "b\\[o]\\*t" "b[o]*t")
'(prefix) nil nil "re" '("\\<re[[:word:]]*?")
'(prefix) t nil "re" '("\\<\\(re\\)[[:word:]]*?")
'(prefix) nil nil "r-e" '("\\<r[[:word:]]*-e[[:word:]]*")
'(prefix) t nil "r-e" '("\\<\\(r\\)[[:word:]]*-\\(e\\)[[:word:]]*")
'(prefix) nil nil "r.*e" '("\\<r[[:word:]]*\\.\\*e[[:word:]]*")
'(prefix) nil nil ".re" '("\\.re[[:word:]]*")
'(prefix) t nil ".re" '("\\.\\(re\\)[[:word:]]*")
)

(prescient-deftest prescient-sort-full-matches-first ()
Expand All @@ -136,5 +143,9 @@ for BINDINGS."
(candidates methods full-match-first query result)
'("rebar" "restart-emacs" "barracuda") '(literal initialism) nil "bar" '("rebar" "barracuda")
'("rebar" "restart-emacs" "barracuda") '(literal initialism) nil "re" '("rebar" "restart-emacs")
;; '("rebar" "restart-emacs" "barracuda") '(literal initialism) t "re" '("restart-emacs" "rebar")
'("rebar" "restart-emacs" "barracuda") '(literal initialism) t "re" '("restart-emacs" "rebar")
'("rebar" "restart-emacs" "barracuda") '(prefix initialism) t "re" '("restart-emacs" "rebar")
'("string-rectangle-word" "string-rectangle" "term-string-rectangle") '(prefix) t "s-r" '("string-rectangle" "string-rectangle-word" "term-string-rectangle")
'("test.el" ".elpaignore" "barracuda") '(prefix) t ".e" '(".elpaignore" "test.el")
'("test.el" ".elpaignore" "barracuda") '(prefix) nil ".e" '("test.el" ".elpaignore")
)

0 comments on commit b43e334

Please sign in to comment.