Searching for links fails for domain parts which aren't the first (sub-)domain #6686

Open
2 tasks done
t-nil opened this issue Nov 25, 2023 · 7 comments
@t-nil

t-nil commented Nov 25, 2023

  • I have searched open and closed issues for duplicates
    • (searched for 'search link' and looked at first couple pages)
  • I am using Signal-Desktop as provided by the Signal team, not a 3rd-party package.

Bug Description

When searching for a part of a link's domain other than the first part, Signal Desktop reports no matches although there clearly are some. This happens in both global and per-chat search. Parts of the link that don't belong to the domain seem unaffected.

Steps to Reproduce

  1. Send a recognized link (with or without preview)
  2. Search for the second (or a later) part of the link's domain in either global or per-chat search

Actual Result:

The link is not reported in matches.

Expected Result:

The link is reported in matches.

Screenshots

[three screenshots] -> Works

[four screenshots] -> Doesn't work

Platform Info

Disclaimer: I am using the official Manjaro/Arch package (not from the AUR, as far as I can see), but a friend has confirmed the bug on Mint with the official repo.

Signal Version: 6.39.0

Operating System: Linux ra1n-desktop-manjaro 6.6.2-1-MANJARO #1 SMP PREEMPT_DYNAMIC Mon Nov 20 13:00:47 UTC 2023 x86_64 GNU/Linux

Linked Device Version: 6.41.2, Android

Debug log: https://debuglogs.org/desktop/6.39.0/783b3986d4d1010ad5af9bd5946ba13beeca79041d89d1a98079191ee089500e.gz

@t-nil
Author

t-nil commented Nov 25, 2023

Update: It seems to go even deeper. If I send a link https://example.com/foo_bar, foo works, but bar and foo_bar don't. Something seems up with word splitting? Whitespace splitting? IDK how Signal search works.

Update 2: Highlighting seems to be unaffected. In the message

If I send a link https://example.com/foo_bar, foo works, but bar and foo_bar don't.

when I search for foo_bar, the URL part gets highlighted, presumably because foo and bar also match. But if I send only the URL, no match is displayed in the sidebar.
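
For what it's worth, the foo/bar behaviour above would be consistent with the index keeping foo_bar as a single token and the search matching only token prefixes. A minimal Python sketch of that hypothesis follows. The regex tokenizer and the names `tokenize` and `prefix_match` are made up for illustration (loosely modeled on Unicode word-boundary rules, where "." and "_" between letters do not end a word) and are not Signal's actual code.

```python
import re

# Assumption: "." and "_" between letters/digits do NOT end a token;
# every other non-alphanumeric character (":", "/", spaces, ...) does.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[._][A-Za-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    """Split a message into lowercase index tokens (hypothetical tokenizer)."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def prefix_match(term: str, text: str) -> bool:
    """Return True if any indexed token starts with the search term."""
    term = term.lower()
    return any(tok.startswith(term) for tok in tokenize(text))


message = "https://example.com/foo_bar"
print(tokenize(message))             # ['https', 'example.com', 'foo_bar']
print(prefix_match("foo", message))  # True
print(prefix_match("bar", message))  # False
```

Under these assumptions `foo` matches because it is a prefix of the token `foo_bar`, while `bar` matches nothing because no token starts with it. The sketch only models the index side; it says nothing about how the typed query itself is parsed.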


@togamid

togamid commented Nov 25, 2023

The same bug is affecting me.
Signal Desktop Version: 6.30.2
OS: Linux Mint 21 Cinnamon

@norstbox
Contributor

Something seems up with word splitting? Whitespace splitting? IDK how Signal search works.

Yes, you can see additional details and the Signal developer's conclusion in #6460.

@indutny-signal it's worth noting that, according to that design decision, a message containing https://pad.stuve.fau.de/ should be found when searching for pad.stuve, but it isn't. Simply put, it is impossible to find www.example.com in Signal, even by searching for www.example.com.

@scottnonnenberg-signal
Contributor

Related: #5964

@togamid

togamid commented Nov 29, 2023

In my opinion, the bug might be caused by the search string being split at the period somewhere, converting "e. g." to "e g" and "www.example.com" to "www example com". However, because of the custom FTS5 extension mentioned in #6460 (comment), "www.example.com." isn't actually split in the message during the search. Thus, a search for "www.example.com" looks for messages containing "www% AND example% AND com%", which doesn't match the message "www.example.com".

This is also consistent with this comment: #6460 (comment) and with the related issue that was just linked.
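
The mismatch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message index is assumed to hold `www.example.com` as one token, the query is assumed to be split on every non-alphanumeric character, and `split_query` and `matches` are hypothetical names, not functions from Signal Desktop.

```python
import re


def split_query(query: str) -> list[str]:
    """Assumed query-side parsing: break on anything that is not a letter or digit."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", query) if t]


def matches(query_terms: list[str], indexed_tokens: list[str]) -> bool:
    """Model 'term1% AND term2% AND ...': every term must prefix some indexed token."""
    return all(
        any(tok.startswith(term) for tok in indexed_tokens)
        for term in query_terms
    )


# Assumed index content for the message "https://www.example.com":
# the tokenizer kept the domain whole instead of splitting at the periods.
indexed = ["https", "www.example.com"]

print(split_query("www.example.com"))                    # ['www', 'example', 'com']
print(matches(split_query("www.example.com"), indexed))  # False
print(matches(["www.example.com"], indexed))             # True
```

If the query were tokenized the same way as the message, the single term `www.example.com` would match, as the last line shows.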

@indutny-signal
Contributor

Thank you for this information. On a second look there is indeed a mismatch between the tokenizer and the query parsing. We will look into it.

@jumper444

jumper444 commented Jul 26, 2024

Had a link in one of my chats to a simple CNN story:
https://www.cnn.com/2024/07/10/tech/samsung-z-fold-flip-6/index.html

I typed 'cnn' in the search box to find it and nothing came up.
"hmm...that's weird"
Came here and found this bug (and #6460).
(6460 said it was desired/planned behavior due to Unicode spec/processing/etc. I didn't understand that commentary.)

Just mentioning that this is still happening: Signal 7.17.0 (Win10, 64-bit)
