Add GitHub Action to Check for Dead Links in Documentation #433

Closed
wants to merge 28 commits into from

Conversation

Khushalsarode
Contributor

Contributor checklist


Description

  • Installs the necessary dependencies from requirements.txt
  • Runs the Sphinx linkcheck builder to identify broken links.
  • The results are displayed, and if broken links are found, the action will fail.
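The steps in this description can also be reproduced outside the workflow. The following is a minimal sketch, not the code from this PR: it assumes Sphinx is already installed (for example from requirements.txt), and the directory names docs/source and docs/build/linkcheck are placeholders rather than the project's confirmed layout. The helper names are hypothetical.

```python
import subprocess
import sys


def build_linkcheck_command(source_dir, output_dir):
    """Return the command that runs the Sphinx linkcheck builder."""
    return [sys.executable, "-m", "sphinx", "-b", "linkcheck", source_dir, output_dir]


def run_linkcheck(source_dir="docs/source", output_dir="docs/build/linkcheck"):
    """Run linkcheck and return (exit code, combined output).

    The default directories are assumptions for illustration only.
    """
    result = subprocess.run(
        build_linkcheck_command(source_dir, output_dir),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

A non-zero exit code from run_linkcheck can be passed to sys.exit so that the calling job fails when the builder reports a problem.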

Related issue


github-actions bot commented Oct 18, 2024

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

  • The linting and formatting workflow within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

@andrewtavis andrewtavis added the hacktoberfest-accepted Accepted as a part of Hacktoberfest label Oct 18, 2024
@andrewtavis
Member

I'm wondering why the workflow didn't run on this, @Khushalsarode 🤔

@Khushalsarode
Contributor Author

I'm wondering why the workflow didn't run on this, @Khushalsarode 🤔

I was thinking the same.

@andrewtavis
Member

Can you take a look at this and edit the on section to look like:

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
    types: [opened, reopened, synchronize]

Then from there we can look into why it's not running :)

@Khushalsarode
Contributor Author

Sure

@Khushalsarode
Contributor Author

Can you take a look at this and edit the on section to look like:

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
    types: [opened, reopened, synchronize]

Then from there we can look into why it's not running :)

Thanks for the suggestion and help!
It did run, but the build failed with errors and warnings!
Could there be a problem on the docs side?
@andrewtavis

@andrewtavis
Member

Ok so this is generally ok, @Khushalsarode, but then the error output from the check is kind of messy 🤔 Can you see if there's a way to clean it up a bit? Maybe capture the output and return only the broken links? The thing is that it's hard to find them throughout the entire output that also includes the ones that aren't broken.

@Khushalsarode
Contributor Author

Khushalsarode commented Oct 20, 2024

@andrewtavis ok? It's running and showing as you asked!

@andrewtavis
Member

Ideally at the end of this @Khushalsarode what we'd get is only those items that are broken because of Not Found for url, and then from there the workflow would fail as well. Can you make those adjustments, and maybe we can also do this for the markdown files as well, not just the docs?

@Khushalsarode
Contributor Author

@andrewtavis

  • It will now show only links that are broken because of Not Found for url, and if any are found it will exit and mark the workflow as failed.
    OK?

@Khushalsarode
Contributor Author

Ideally at the end of this @Khushalsarode what we'd get is only those items that are broken because of Not Found for url, and then from there the workflow would fail as well. Can you make those adjustments, and maybe we can also do this for the markdown files as well, not just the docs?

If you'd like, shall I work on a workflow for checking the links in the markdown files?

@andrewtavis
Member

Hey @Khushalsarode 👋 I still see the whole error output at this link that includes links that are ok as well as the ones we want to see. Ideally it would only be the broken links and no others so it's easy for us to maintain.

@Khushalsarode
Contributor Author

Khushalsarode commented Oct 26, 2024

@andrewtavis
What I understand is:

  • you only want output containing links that are broken due to Not Found for url
  • anything else, like ok or redirecting, should not be shown in the output
    Correct?
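The filtering rule summarized above can be sketched as follows. This is a hypothetical illustration, not the code from this PR: it assumes the link checker's output has already been captured as a list of text lines, and that a broken-link line contains both the word "broken" and the phrase "Not Found for url". The function names are made up for this example.

```python
def select_not_found(lines):
    """Keep only lines that report a broken link with a Not Found response."""
    return [
        line.strip()
        for line in lines
        if "broken" in line and "Not Found for url" in line
    ]


def report(lines):
    """Print the matching lines and return an exit code (1 if any were found)."""
    failures = select_not_found(lines)
    for failure in failures:
        print(failure)
    return 1 if failures else 0
```

Passing the return value of report to sys.exit would mark the workflow run as failed whenever at least one matching line is present, while ok and redirected links stay out of the output.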

@andrewtavis
Member

Exactly, @Khushalsarode :)

@andrewtavis
Member

And something similar for markdown files :)

@Khushalsarode
Contributor Author

Yes this is great, @Khushalsarode :) Do you want to add in a check for the Markdown files as well? There should be a GitHub action that you can use for this 😊

@andrewtavis
Yes! I was waiting for your response!
Do you want me to create a new issue for this, or continue here?

@andrewtavis
Member

You can continue in here, @Khushalsarode :)

@Khushalsarode
Contributor Author

Khushalsarode commented Oct 27, 2024

Ok, I will do it!

@Khushalsarode
Contributor Author

@andrewtavis
#499 for markdown files link check workflow

@andrewtavis
Member

I'll check these out soon, @Khushalsarode :) Thanks for them!

@andrewtavis
Member

Can you take a quick look at why it's failing now, @Khushalsarode? All I did was remove some comments and now the check isn't being run properly 🤔

@andrewtavis
Member

andrewtavis commented Oct 29, 2024

An idea on this, @Khushalsarode: Could you maybe make a Python check script that we could also run locally? Bring in all of the URLs that are in the project even and see if the link is valid via Python requests, and then from there we'll do the check for the docs and the markdown in one go? And to boot we could check it locally and not trial and error the PR tests?

What I'm thinking:

  • Get all files that are live under /docs (not /docs/build) and /src
  • Convert the file contents to strings
  • Use regex to get all the links
  • Use requests to check the response for each of the links
  • Update a dictionary with the file path and a link if it's broken
  • Display the file path and link to the user if it's a bad response

How does this sound? :)
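The steps listed above could be sketched roughly as follows. This is a hedged illustration rather than the script that was eventually used: the function names, the URL regex, and the default roots (docs and src, skipping docs/build) are assumptions, and the requests library is imported lazily so that a different checker can be swapped in for offline runs.

```python
import os
import re

# Simplified URL pattern (an assumption); real-world links may need a stricter one.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]'\"`]+")


def iter_files(roots, excluded):
    """Yield file paths under the roots, skipping the excluded directories."""
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune excluded directories in place so os.walk does not descend into them.
            dirnames[:] = [
                d for d in dirnames
                if os.path.normpath(os.path.join(dirpath, d)) not in excluded
            ]
            for name in filenames:
                yield os.path.join(dirpath, name)


def extract_links(text):
    """Return the unique URLs found in the text, without trailing punctuation."""
    return sorted({match.rstrip(".,;:") for match in URL_PATTERN.findall(text)})


def is_reachable(url, timeout=10):
    """Check a URL with requests; treat errors and 4xx/5xx responses as broken."""
    import requests  # third-party dependency, imported lazily

    try:
        response = requests.head(url, allow_redirects=True, timeout=timeout)
        if response.status_code >= 400:
            # Some servers reject HEAD requests, so retry with GET.
            response = requests.get(url, allow_redirects=True, timeout=timeout)
        return response.status_code < 400
    except requests.RequestException:
        return False


def find_broken_links(roots=("docs", "src"),
                      excluded_dirs=(os.path.join("docs", "build"),),
                      check=is_reachable):
    """Return a dictionary mapping each file path to its broken links."""
    excluded = {os.path.normpath(path) for path in excluded_dirs}
    results = {}  # cache so each URL is checked only once
    broken = {}
    for path in iter_files(roots, excluded):
        try:
            with open(path, encoding="utf-8") as handle:
                text = handle.read()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for url in extract_links(text):
            if url not in results:
                results[url] = check(url)
            if not results[url]:
                broken.setdefault(path, []).append(url)
    return broken


def main():
    """Print each broken link with its file path and return an exit code."""
    broken = find_broken_links()
    for path, urls in broken.items():
        for url in urls:
            print(f"{path}: {url}")
    return 1 if broken else 0
```

Calling sys.exit(main()) from the repository root would let the same script run locally and in a workflow, failing the run whenever a broken link is reported.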

@Khushalsarode
Contributor Author

An idea on this, @Khushalsarode: Could you maybe make a Python check script that we could also run locally? Bring in all of the URLs that are in the project even and see if the link is valid via Python requests, and then from there we'll do the check for the docs and the markdown in one go? And to boot we could check it locally and not trial and error the PR tests?

What I'm thinking:

  • Get all files that are live under /docs (not /src/build) and /src
  • Convert the file contents to strings
  • Use regex to get all the links
  • Use requests to check the response for each of the links
  • Update a dictionary with the file path and a link if it's broken
  • Display the file path and link to the user if it's a bad response

How does this sound? :)

I will check this one as well and will work on your suggestion too!

@Khushalsarode
Contributor Author

I don't know what happened to the workflow. It was running fine and the script is the same, but now it's suddenly throwing an error.
I will work on this one again and will also do the one you suggested.

@Khushalsarode
Contributor Author

An idea on this, @Khushalsarode: Could you maybe make a Python check script that we could also run locally? Bring in all of the URLs that are in the project even and see if the link is valid via Python requests, and then from there we'll do the check for the docs and the markdown in one go? And to boot we could check it locally and not trial and error the PR tests?

What I'm thinking:

  • Get all files that are live under /docs (not /src/build) and /src
  • Convert the file contents to strings
  • Use regex to get all the links
  • Use requests to check the response for each of the links
  • Update a dictionary with the file path and a link if it's broken
  • Display the file path and link to the user if it's a bad response

How does this sound? :)

Sure @andrewtavis, that's also a good, valid approach compared to the workflow!
I will make sure to address this issue soon!
Thank you!
Also, I will try to figure out why this workflow is going rogue on us! 😅🤣

@andrewtavis
Member

Ultimately once this is done, we'll just run the workflow as the others are run, @Khushalsarode, so it won't be as much of a problem :)

@Khushalsarode
Contributor Author

@andrewtavis no problem, leave this to me! I've clearly understood how we are planning to implement this.

@andrewtavis
Member

Note @Khushalsarode that there was a typo: we don't want urls in /docs/build included :) I'd written /src/build.

@Khushalsarode
Contributor Author

Note @Khushalsarode that there was a typo: we don't want urls in /docs/build included :) I'd written /src/build.

OK, so we want to check all the links under the src/ and docs/ folders, correct?
@andrewtavis

@andrewtavis
Member

Yes, except not under /docs/build :)

@andrewtavis
Member

Hey @Khushalsarode 👋 In 2bc53e8 I added a pre-commit hook to check the markdown links. I think that this should be good for the discussed functionality. Appreciate you talking through the potential process with me!

@andrewtavis andrewtavis closed this Nov 6, 2024