Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blacklist certain subdomains from matching URL classes that otherwise allow subdomains to match #1623

Open
roachcord3 opened this issue Nov 10, 2024 · 6 comments

Comments

@roachcord3
Copy link
Contributor

Fanbox is a good example of this. They allow you to go to https://username.fanbox.cc or https://www.fanbox.cc/@username to get to the same page. Their API is hosted at https://api.fanbox.cc. This means that a URL class for fanbox.cc that matches subdomains will match api.fanbox.cc, which is not desirable, since that's not the gallery URL for an artist. This can lead to situations where it's hard to define a URL class that contains a redirect to a different subdomain of the same domain, since the class will match its own redirect URL.

So, what I'm requesting is either a subdomain blacklist, or, alternatively, allowing users to use regex in the domain part of the URL class (so that they can define something like (?!api\.)([a-z0-9][a-z0-9\-]*\.)?fanbox.cc.

@floogulinc
Copy link
Collaborator

More specific url classes should already take precedence over less specific ones. It should be the case that a url class with domain specifically api.fanbox.cc will be matched before one for fanbox.cc.

@roachcord3
Copy link
Contributor Author

@floogulinc that can be true in some cases, but not if the path components differ, which is actually the case as I'm trying to fix the latest community fanbox downloader (which is lacking gallery URLs for artists' pages and has a few other problems.) The API URLs have a required path component, while the gallery URLs don't (adding /posts is optional.) I ran into this error message about matching the URL's own redirect, and had to define the URL classes in an unnatural way, as a result.

I also think it would be beneficial for any other hypothetical site in which there are plenty of reserved subdomains for things like the API or settings pages or whatever, but they still do usernames-as-subdomains. It would be weird to have to define discrete URL classes for all the things you don't want. Of course, either way, you have to say what you don't want, but I think keeping it coupled with the main URL class makes sense both from a cleanliness perspective and for the principle of least surprise. And I guess to make it a bit harder to forget to include the URL classes if you export the downloader, too.

With that said, since there are workarounds for many cases, I wouldn't call this an urgent feature request by any means. It would just be nice to not have to define janky extra URL classes.

@floogulinc
Copy link
Collaborator

If it's not properly prioritizing the one with a defined subdomain I'd consider that a bug then

@roachcord3
Copy link
Contributor Author

@floogulinc is it? I thought required path components would make it not match.

I'll be more clear in the specifics of the problem here. See, I wanted to add a gallery URL class for a fanbox artist page. Ideally, I wanted it to be just one URL class, but I had to make two, one for the username-as-subdomain and one for the username-in-path pattern. So, here's what ended up being awkward about the username-as-subdomain URL class: I had to specify that it had a path component that matched ^(posts)?$, rather than just allowing extra path components with the option checkbox. See here:

edit_url_class
edit_url_class-2
edit_conversion

This ensured it did not conflict with the community downloader's URL class for an artist page via the API (shown below,) nor with its own API redirect URL.

edit_url_class-3

Maybe it was just a skill issue on my part, but it took me longer than I'd like to finagle the part about matching its own API redirect URL. I didn't really have a problem with avoiding collision with the existing artist page via API URL class. And I assume that even if I allowed extra path components, the post page URL classes (shown below) would take precedence due to their higher specificity. (I haven't been able to test, of course, because I'm not able to define the URL classes the way I want.)

edit_url_class-4
edit_url_class-5

All told, I don't think there's a bug here, but it's possible that I'm just misunderstanding something; what do you think?

@floogulinc
Copy link
Collaborator

floogulinc commented Nov 12, 2024

I don't think this is an issue but why do you have "should subdomains also match this class" enabled for the api.fanbox.cc class?

Also the url class for posts that has the username in the url probably shouldnt have subdomains checked either...

@roachcord3
Copy link
Contributor Author

Both of those are unchanged from upstream, but you're right, it doesn't make sense, so I'll fix that.

Zweibach pushed a commit to CuddleBear92/Hydrus-Presets-and-Scripts that referenced this issue Nov 21, 2024
Also includes an much more helpful README with tips about how to deal
with Cloudflare.

The URL classes are a bit awkward, but if
hydrusnetwork/hydrus#1623 is ever resolved, we
can consolidate the URL classes with `(subdomain)` in the names into the
main URL classes for posts/artist pages.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants