unfound links #8
Hey again, I have investigated a bit and found the source of the problem, inside page.ml:
let tag_selector tag = function
| "" -> tag
| s when s.[0]='*' -> s
| s when is_identifier_char s.[0] -> s
| s -> tag^s
(* ... *)
let links_with selector p =
p.soup
|> Soup.select (tag_selector "a" selector)
|> Soup.filter (tag_filter "a")
|> seq_from_nodes (fun elt -> Link.( {resolver = resolver p; elt} ))
I see that the use case in mind when this was developed was passing links_with a CSS attribute selector (i.e. one surrounded by "[]"). Unfortunately, it does not work at all when we want to specify that the links we are looking for are descendants of elements with a given class: such a selector starts with '.', so it falls through to the last case and gets "a" prepended to it.
A workaround is to prefix the CSS selector given to links_with (and the other _with functions) with "html ", for example.
To fix this issue, I suggest either:
| s when s.[0]='#' -> s
| s when s.[0]='.' -> s
I have a preference for the second solution because it is more predictable for users, and easy to fix. The first is even simpler, but may not be exhaustive (now or in the future). Please tell me which one you prefer so that I can make a PR accordingly ;)
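To make the behaviour concrete, here is a minimal standalone sketch that compares the current tag_selector with the variant from the first suggestion. It does not depend on the library. The body of is_identifier_char is an assumption (the real one in page.ml is not quoted in this thread), the name tag_selector_fixed is made up for the comparison, and the selectors "[href]" and ".content a" are hypothetical examples.

```ocaml
(* Assumption: stand-in for the real is_identifier_char from page.ml,
   which is not shown in this thread. Accepts letters, digits, '-' and '_'. *)
let is_identifier_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' | '_' -> true
  | _ -> false

(* tag_selector as quoted above: anything that does not start with '*'
   or an identifier character gets the tag prepended. *)
let tag_selector tag = function
  | "" -> tag
  | s when s.[0] = '*' -> s
  | s when is_identifier_char s.[0] -> s
  | s -> tag ^ s

(* First suggestion: also leave selectors starting with '#' or '.' untouched.
   The name tag_selector_fixed is hypothetical. *)
let tag_selector_fixed tag = function
  | "" -> tag
  | s when s.[0] = '*' -> s
  | s when s.[0] = '#' -> s
  | s when s.[0] = '.' -> s
  | s when is_identifier_char s.[0] -> s
  | s -> tag ^ s

(* Print what each version turns a few selectors into. *)
let () =
  List.iter
    (fun sel ->
      Printf.printf "%S -> current %S, fixed %S\n" sel
        (tag_selector "a" sel)
        (tag_selector_fixed "a" sel))
    [ ""; "[href]"; ".content a"; "html .content a" ]
```

Under these assumptions, the current version rewrites ".content a" to "a.content a", which no longer means "links inside .content", while the variant leaves it alone. The attribute selector "[href]" still becomes "a[href]" in both versions, and "html .content a" passes through unchanged, which is why the "html " prefix works as a workaround.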
Hi @RuyBlast, thanks for the investigation! I believe the second solution is unfortunately backward incompatible. So, for now, I would vote for the first one, which is simple and doesn't change current behavior. This gives us time to think about the second one 🙂 (maybe we can just add an alternative function to do so,
Hi, I had trouble with links not being found, so I built a minimal example to reproduce the bug:
ancors.ml
dune
My version of Mechaml is 1.2.1.
I do not know what the problem is; maybe relative links? I hope I did not make a trivial mistake in using the library, though. As additional information, if I try to select all links of this page, I get a list of length ~500.
Best,