Skip to content

Remove ssr.paths configuration and replace with ssr.excludePathPatterns which excludes specific paths from SSR #4227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 12, 2025

Conversation

FrancescoMolinaro
Copy link
Contributor

@FrancescoMolinaro FrancescoMolinaro commented Apr 22, 2025

References

Fixes #4132
Fixes #4042

Description

Adapted filtering of SSR pages blocking the rendering on specific pages instead of allowing it only on some pages.

Instructions for Reviewers

Navigating on pages 500, 404 or 403, or any invalid URL I should see the correct error status in the request.
In general the response status should be always correct for pages rendered in SSR.

List of changes in this PR:

Removed ssr.paths configuration and replaced with ssr.excludePathPatterns.
Add patterns to exclude community and collection browse, global browse, global search, community list, and statistics and various administrative tools.

For example:

ssr:
    excludePathPatterns:
    # Excludes community browse pages/tabs
    - pattern: "^/communities/[a-f0-9-]{36}/browse(/.*)?$",
      flag: "i"
    # Excludes collection browse pages/tabs
    - pattern: "^/collections/[a-f0-9-]{36}/browse(/.*)?$"
      flag: "i"
    # Excludes site-wide browse-by pages
    - pattern: "^/browse/"
    # Excludes site-wide search
    - pattern: "^/search$"
    # Exclude community-list page
    - pattern: "^/community-list$"
    # Exclude various tools that are under the /admin path
    - pattern: "^/admin/"
    # Exclude processes
    - pattern: "^/processes/?"
    # Exclude QA Notifications
    - pattern: "^/notifications/"
    # Exclude all statistics pages (site-wide and for individual objects)
    - pattern: "^/statistics/?"
    # Exclude access-control tools
    - pattern: "^/access-control/"
    # Exclude health page
    - pattern: "^/health$"

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You do not need to complete this checklist prior creating your PR (draft PRs are always welcome).
However, reviewers may request that you complete any actions in this list if you have not done so. If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is created against the main branch of code (unless it is a backport or is fixing an issue specific to an older branch).
  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes ESLint validation using npm run lint
  • My PR doesn't introduce circular dependencies (verified via npm run check-circ-deps)
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • My PR aligns with Accessibility guidelines if it makes changes to the user interface.
  • My PR uses i18n (internationalization) keys instead of hardcoded English text, to allow for translations.
  • My PR includes details on how to test it. I've provided clear instructions to reviewers on how to successfully test this fix or feature.
  • If my PR includes new libraries/dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR includes new features or configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

@alanorth alanorth added component: SEO Search Engine Optimization high priority port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release port to dspace-8_x This PR needs to be ported to `dspace-8_x` branch for next bug-fix release labels Apr 22, 2025
@alanorth alanorth added this to the 9.0 milestone Apr 22, 2025
@alanorth
Copy link
Contributor

Thanks @FrancescoMolinaro. I tested this on DSpace 7.6.4-SNAPSHOT (patch applies cleanly) and saw that the 403, 404, and 500 error pages were returning the correct codes. I also tested a few other paths for existing and non-existing items (for example) to make sure the codes didn't change.

@tdonohue tdonohue changed the title [DURACOM-344] set error pages status in response Set error pages status code in SSR response Apr 22, 2025
@tdonohue tdonohue changed the title Set error pages status code in SSR response Set error pages status code in response Apr 22, 2025
@tdonohue tdonohue self-requested a review April 23, 2025 14:02
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoMolinaro : The code looks good overall now, but my tests are not all successful. Here's what I'm seeing:

  1. Running this PR in production mode (npm run build:prod; npm run serve:ssr)
  2. Test http://localhost:4000/500 -> Returns 500 error code (SUCCESS)
  3. Test http://localhost:4000/403 -> Returns 403 error code (SUCCESS)
  4. Test http://localhost:4000/404 -> Returns 404 error code (SUCCESS)
  5. Test http://localhost:4000/invalid-url -> Shows the /404 page, but returns 200 OK (FAILURE)

So, this works for the pages that it's checking for. But, it doesn't seem to work in scenarios where a different path results in a 404 error. It also therefore might not work if a different path resulted in a 500 or 403 error.

Overall, this PR provides better behavior than the current main. But, it doesn't seem to fully fix the problem of returning a proper error code when an error occurs from the backend. Instead, it just makes sure that if you access these specific error pages directly in your browser that you'll get a proper error code.

@tdonohue tdonohue moved this to 🙋 Needs Reviewers Assigned in DSpace 9.0 Release Apr 23, 2025
@FrancescoMolinaro
Copy link
Contributor Author

Hi @tdonohue , thanks for the feedback!

I agree that there are some scenario where this solution might not work.
I digged a bit more into the problem and found out the issue stems from this change we made #3682 ; so I reverted my changes and adapted the code accordingly.
The problem with unrecognized urls, e.g. http://localhost:4000/invalid-url , is that we skip SSR completely and a blank document with a code of 200 is returned. when the redirect to the 404 page happens we are already in CSR therefore too late to change the status.

I don't think there is an easy solution to this, what comes to mind to make a bit more robust the handling of the requests' status while keeping some pages out of SSR, could be importing the APP_ROUTES in the server.ts and checking the starting of the request's path so we can set a 404 in case none of the app's paths matches the starting of the request's one.

This would still leave a lot of options uncovered, for instance home-invalid, would still result as a known route with a 200 code.
Anyway I would really appreaciate a second opinion on this as it seems to me that we have a wider issue.

@tdonohue
Copy link
Member

Assigning @artlowel as a secondary reviewer since he offered to help take a look at this today.

@tdonohue tdonohue requested a review from artlowel April 24, 2025 19:24
@AndreaBarbasso
Copy link
Contributor

@tdonohue, @artlowel, the same situation (using a 200 error code for something that is not explicitly - eg. /404, /500 - a redirect/error) can be seen here: #4042.
I think @FrancescoMolinaro is on the right track, we might want to re-think the solution applied here and use a black-list approach instead of a white-list approach.

@tdonohue
Copy link
Member

tdonohue commented Apr 28, 2025

@AndreaBarbasso and @FrancescoMolinaro : If we can find a way to solve this by changing our settings to exclude specific paths from SSR, instead of including only specific paths in SSR, then I'd be OK with that approach. That does seem like it may provide better behavior for unexpected errors / redirects.

If we want to go that route, I'd recommend renaming the existing ssr.paths setting in config.*.yml to make the new behavior clear. Perhaps calling it something like ssr.excludePaths or similar. So, something like this:

ssr:
  excludePaths: [ '/browse/', '/community-list', '/search/', '/statistics']

The challenge may be that we may need to also support regex values if we also want to exclude browse pages for communities & collections which appear under /communities/[uuid]/browse/ and /collections/[uuid]/browse/ respectively.

Whatever solution is created, we'll have to make sure to also backport it to 7.6.x and 8.x.

@FrancescoMolinaro
Copy link
Contributor Author

Hi @tdonohue , thanks for the feedback!

I agree that a blacklist approach would make the solution more stable for error codes and redirects, so I have refactored the implementation following your suggestion.
I still think there will be some edge cases where we can't show a correct error code, e.g. redirect to login page if a user access a restricted collection or community, but I believe this change will anyway reduce considerably the number of these cases.

I will look forward to your feedback and thanks again!

Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoMolinaro : I haven't had a chance to test this yet, but I had some feedback on how we could clean up the new code. It may make since to just have one excludePaths setting, if you agree. See more inline below.

@github-project-automation github-project-automation bot moved this from 🙋 Needs Reviewers Assigned to 👀 Under Review in DSpace 9.0 Release Apr 29, 2025
@FrancescoMolinaro
Copy link
Contributor Author

Hi @tdonohue , thanks for the feedback and sorry for the poor naming choice, I appreciated a lot the article you shared.

I have refactored the solution so to have a single parameter based on regexes as you proposed, I agree that this give a better flexibility and power in handling the various paths compositiions.

I will look forward to your next feedback and thanks again.

Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoMolinaro : Thanks for the updates. Overall, this now looks good to me. I only have minor suggestions inline below.

I'd also recommend we update the Title & Description of this PR, as it's changed completely from its original purpose. If possible, we may also want to remove the earlier commits in this PR (by rebasing the PR & squashing commits). But, if you aren't able to easily do that, I can always merge squash this (reducing the entire PR to a single commit).

(UPDATE: I've changed the PR title. But, I haven't yet updated the description)

Overall, though, I'm +1. I also tested this today and verified that this approach works. I can verify that the excluded paths are not triggering SSR. I can also verify that the error pages & invalid URLs are returning correct error codes again.

@tdonohue tdonohue changed the title Set error pages status code in response Remove "ssr.paths" configuration and replace with "ssr.excludePathRegexes" which excludes specific paths from SSR Apr 30, 2025
@tdonohue tdonohue changed the title Remove "ssr.paths" configuration and replace with "ssr.excludePathRegexes" which excludes specific paths from SSR Remove ssr.paths configuration and replace with ssr.excludePathRegexes which excludes specific paths from SSR Apr 30, 2025
@FrancescoMolinaro FrancescoMolinaro force-pushed the task/main/DURACOM-344 branch from 4af4306 to c399976 Compare May 2, 2025 09:31
@FrancescoMolinaro FrancescoMolinaro force-pushed the task/main/DURACOM-344 branch from c399976 to b1c5460 Compare May 2, 2025 09:37
@FrancescoMolinaro
Copy link
Contributor Author

FrancescoMolinaro commented May 2, 2025

Hi @tdonohue , thanks again for the feedback.

I did implement the last changes you requested and I have updated the description and squashed all the commits so that we now have a single one.
The PR should be now in a cleaner state but please let me know if there is something else that needs improvement.

@tdonohue tdonohue self-requested a review May 6, 2025 17:45
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoMolinaro : Apologies for not getting back to this PR for a while. I re-tested it today, and it's mostly looking good & working well. However, I've run in a few more small bugs described inline below. Basically, we need to make a small update to the "/statistics" regex, and I'm also having issues with overriding the default settings in my config.prod.yml.

Beyond that, I'm still basically a +1. I just want to ensure we get these final issues fixed before merging.

@atarix83
Copy link
Contributor

atarix83 commented May 7, 2025

@FrancescoMolinaro
I'd exclude also the following paths:

  • /admin
  • /processes
  • /notifications
  • /health

@FrancescoMolinaro
Copy link
Contributor Author

Hi @tdonohue , I checked and actually YAML doesn't support RegEx so they are treated as strings.
I have improved the solution so that the patterns can be directly RegEx or strings, I have tested your YAML configuration and seems to work.
In addition I added to the patterns also the ones suggested by @atarix83 .

@tdonohue tdonohue changed the title Remove ssr.paths configuration and replace with ssr.excludePathRegexes which excludes specific paths from SSR Remove ssr.paths configuration and replace with ssr.excludePathPatterns which excludes specific paths from SSR May 7, 2025
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoMolinaro : Thanks for the updates. The basics appear to work, but I'm still not able to properly override settings using my config.*.yml.

I'm testing in Production Mode (npm run build:prod; npm run serve:ssr), and I'm turning off Javascript in my browser and attempting to access various paths which should be excluded.

Here's what I'm trying:

Add the following YAML settings (this is just a very basic example of overriding the defaults and only using one pattern)

ssr:
  excludePathPatterns:
    - /^\/community-list$/

Now, attempt to access http://localhost:4000/community-list and it wrongly is served via SSR.

However, if I comment out those YAML settings and use the defaults, then http://localhost:4000/community-list is correctly blocked from SSR.

So, it appears overrides in YAML are not working properly still. The default settings do work, but I cannot figure out how to override defaults.

I've also noticed a few of the new patterns appear to be incorrect. I believe these all need to be corrected

# There's no "/admin" path, but lots of paths like "/admin/[subpath]"
/^\/admin\//
# This needs to match both "/processes" and "/processes/[id]"
/^\/processes\/?/
# There's no "/notifications" path, but a few paths like "/notifications/[subpath]"
/^\/notifications\//
# I gave the wrong pattern for this one, as we also need to match "/statistics"
/^\/statistics\/?/
# For completion, we may also want to add this as other admin tools use "/access-control/[subpath]".
/^\/access-control\//

(Please double check my regex patterns are correct. I think these are correct, but I also might have made a mistake. I wasn't able to easily test these because the YAML overrides are not working.)

Finally, one note inline below.

@philipprumpf
Copy link
Contributor

Some more details about #4227 (comment):

The example config.yml contains this:

  excludePathPatterns:
    - /^\/communities\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\/browse(\/.*)?$/i

I think this pattern doesn't work, which means we would incorrectly attempt SSR for this path.

We're in a yml file, and yml does not do anything special for regexp syntax: the entire string is treated as an unquoted yml string and written to config.json as:

    "excludePathPatterns": [
      "/^\\/communities\\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\\/browse(\\/.*)?$/i",

The string is then passed to isExcludedFromSsr:

function isExcludedFromSsr(path: string, excludePathPattern: (string | RegExp)[]): boolean {
  return excludePathPattern.some((pattern) => {
    const regex = new RegExp(pattern);
    return regex.test(path)
  });
}

This calls the RegExp constructor, effectively, as:

    const regex = new RegExp("/^\\/communities\\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\\/browse(\\/.*)?$/i");

This results in the RegExp object printed as:

    /\/^\/communities\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\/browse(\/.*)?$\/i/

This is an impossible RegExp which does not match any string, so we'll incorrectly attempt to use SSR for this page.

Unfortunately, the fix isn't quite trivial: JavaScript does not provide a good way of parsing a RegExp specified as "/.../i". Traditionally, one might use eval, but that is obviously no longer an option.

  1. We could rewrite the regular expressions not to use extra slashes or the "i" flag. I'm not even sure whether uppercase hex digits ought to be allowed in the UUID: it's unusual to write it like that. On the other hand, it's quite conceivable the UUID is obtained, in some use case, by rewriting a DOI, and DOIs are case-insensitive and usually normalize to the capitalized variant. In any case, using [0-9a-fA-F] in the regexp would allow us to omit the "i" flag.

  2. We could parse the slash-pattern-slash-flags syntax. Either find a library which does this (https://www.npmjs.com/package/regex-parser), or roll our own parser like this:

    if (typeof pattern === "string") {
      const m = pattern.match(/^\/(.*)\/([dgimsuvy]*)$/);
      if (m) {
        pattern = new RegExp(m[1], m[2]);
      } else {
        pattern = new RegExp(pattern);
      }
    }

This seems simple enough, and it would make things work for regular expressions with or without slashes.

@tdonohue
Copy link
Member

@philipprumpf : Thanks for digging in here & figuring out why this PR is no longer working. I see similar behavior.

@FrancescoMolinaro : Could you update this PR to no longer use the /i regex (I prefer @philipprumpf 's first option listed above), and see if you can solve the issues I've mentioned above about overriding the default values? We will need to move quickly on this PR if we're going to get it merged for 9.0 (merger deadline is Friday)

@FrancescoMolinaro
Copy link
Contributor Author

Hi @tdonohue, @philipprumpf , thanks for the feedback and sorry for the delay, I wasn't working the past few days.

I have refactored the solution following the suggestions of @philipprumpf to prevent the slashes from the strings in the YAML and adapted a bit the logic so that we can pass the flag as an additional parameter, therefore adding a bit more flexibility in building the RegExp.

After running some tests it seem working to me, I tried accessing every path that should be covered but it would be great if you could double check them.

Regarding the issue with YAML override I believe it was connected with the fact that the patterns would have a different value when passed from the YAML, I tried the patterns you suggested and they seem correct and working when applied from the YAML.

I will look forward to your next feedback and thanks again.

@tdonohue tdonohue self-requested a review May 12, 2025 17:27
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks @FrancescoMolinaro ! This looks good to me now. I've retested today, and all the prior issues are fixed. The error codes are working properly now for the /404, /403 and /500 error pages (and any invalid URLs). I've also verified that overriding the default settings in your config.*.yml now works properly.

@github-project-automation github-project-automation bot moved this from 👀 Under Review to 👍 Reviewer Approved in DSpace 9.0 Release May 12, 2025
@tdonohue tdonohue merged commit 6a138ec into DSpace:main May 12, 2025
15 checks passed
@github-project-automation github-project-automation bot moved this from 👍 Reviewer Approved to ✅ Done in DSpace 9.0 Release May 12, 2025
@dspace-bot
Copy link
Contributor

Backport failed for dspace-7_x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin dspace-7_x
git worktree add -d .worktree/backport-4227-to-dspace-7_x origin/dspace-7_x
cd .worktree/backport-4227-to-dspace-7_x
git switch --create backport-4227-to-dspace-7_x
git cherry-pick -x b1c5460bbba131d4f430f7d415000bc9a9809546 4c9638150ab994834ec6e732dc0045fad5166127 c442d355057adc6b6bcf27dde9138019c9e15d55

@dspace-bot
Copy link
Contributor

@tdonohue
Copy link
Member

@FrancescoMolinaro : Could you manually backport this to dspace-7_x please? It was only able to be automatically backported to 8.x.

@FrancescoMolinaro
Copy link
Contributor Author

Hi @tdonohue , I have opened a PR for the backporting towards dspace-7_x here #4332.

@tdonohue tdonohue removed the port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release label May 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: SEO Search Engine Optimization high priority
Projects
Status: ✅ Done
7 participants