Broken API links in the user guide (404 page not found) + stale documentation example (fetch function) #18852

Closed
npielawski opened this issue Sep 23, 2024 · 4 comments · Fixed by #18872
Labels
documentation Improvements or additions to documentation

Comments

@npielawski
Contributor

Description

I noticed a dead link in the user guide, so I wrote a script to probe all links in docs/source/_build/API_REFERENCE_LINKS.yml. It turns out that there are 22 dead links (HTTP response != 200) in the user guide. Many links are stale and need to be updated, and there are a few typos, too. The issue concerns both Python and Rust links.

For example, on the Expressions / Aggregation page, the API Categorical link for the Python code example is broken.

I wrote another script to look for stale tags that are not referenced in the API, and there are 17 such instances. This assumes that the links are only used in docs/ and that the referencing line contains code_block. I am double-checking the positives manually to make sure there are no false positives.

Finally, the fetch link (which gives a 404) no longer has an API documentation page, likely because the function is deprecated. It would be best to rewrite the section in docs/source/user-guide/lazy/execution.md (L52-79) to use head + collect instead, since this is what the source code recommends.

If this issue is accepted, I can submit a PR that updates the links (I have already done the work), and I can start writing a new "Execution on a partial dataset" section as well. I am not sure whether the stale tags should be removed at all (their links all return HTTP 200), and I am not 100% certain that removing them won't break something.

Here is the list of links:

https://docs.pola.rs/api/python/stable/reference/api/polars.Categorical.html
https://docs.pola.rs/api/python/stable/lazyframe/api/polars.lazyframe.engine_config.GPUEngine.html
https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.col.html
https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.prefix.html
https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.suffix.html
https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.map_alias.html
https://docs.pola.rs/api/python/stable/reference/lazyframe/api/polars.LazyFrame.fetch.html
https://docs.pola.rs/api/python/stable/reference/sql
https://docs.pola.rs/api/python/stable/reference/api/polars.SQLContext.register.html#polars.SQLContext.register
https://docs.pola.rs/api/python/stable/reference/api/polars.SQLContext.register_many.html
https://docs.pola.rs/api/python/stable/reference/api/polars.SQLContext.query.html
https://docs.pola.rs/api/python/stable/reference/api/polars.SQLContext.execute.html
https://docs.pola.rs/api/python/stable/reference/api/polars.date_range.html
https://docs.pola.rs/api/python/stable/reference/api/polars.Array.html
https://docs.pola.rs/api/rust/dev/polars_core/frame/hash_join/index.html
https://docs.pola.rs/api/python/stable/reference/sql.html
https://docs.pola.rs/api/rust/dev/polars_io/csv/struct.CsvReader.html
https://docs.pola.rs/api/rust/dev/polars_io/csv/struct.CsvWriter.html
https://docs.pola.rs/api/rust/dev/polars_io/parquet/struct.ParquetReader.html
https://docs.pola.rs/api/rust/dev/polars_io/parquet/struct.ParquetWriter.html
https://docs.pola.rs/api/rust/dev/polars_io/prelude/struct.IpcReader.html
https://docs.pola.rs/api/rust/dev/polars_lazy/dsl/fn.concat_lst.html

Here is the list of unused tags:

GPUEngine
Config
min
max
prefix
suffix
map_alias
concat_list
implode
read_database_connectorx
Series.dt.day
min
max
implode
arr.eval
concat_list
Series.dt.day

The code to find all broken links (run in docs/source/_build):

# Print every URL in API_REFERENCE_LINKS.yml whose response is not "HTTP/2 200"
for url in $(grep -o "https://.*$" API_REFERENCE_LINKS.yml)
do
  if ! curl -s -i "$url" | grep -q "HTTP/2 200"
  then
    echo "$url"
  fi
done
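
The loop above matches the literal status line "HTTP/2 200", so a server that answers over HTTP/1.1 would be reported as broken even when the page exists. A minimal sketch of a variant that compares the numeric status code instead is shown here. The helper names is_ok and http_status are hypothetical and only used for illustration; the URL pattern stops at the first whitespace character, which is an assumption about how the YAML file is laid out. Redirects are not followed, as in the original script.

```shell
#!/bin/bash
# Sketch: probe links by numeric status code rather than by the "HTTP/2 200" line.
# is_ok and http_status are hypothetical helper names, not part of the Polars repo.

# Succeeds only when the given status code is exactly 200.
is_ok() {
  [ "$1" = "200" ]
}

# Prints the status code curl received for a URL ("000" if the request failed).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Only probe when the links file is present in the current directory.
if [ -f API_REFERENCE_LINKS.yml ]
then
  grep -o 'https://[^[:space:]]*' API_REFERENCE_LINKS.yml | while IFS= read -r url
  do
    is_ok "$(http_status "$url")" || echo "$url"
  done
fi
```

As with the original, this is meant to be run from docs/source/_build. Adding -L to the curl call would follow redirects, which changes what counts as a dead link.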

The code to find stale tags (run in docs/source/_build, using ripgrep):

#!/bin/bash
for tag in $(yq '.[] | keys | .[]' API_REFERENCE_LINKS.yml)
do
  # A tag counts as used if some line under ../.. mentions it inside a code_block call
  if ! rg -q "code_block.*'$tag'" ../..
  then
    echo "$tag"
    # Uncomment to go manually through the hits and avoid false positives
    # rg "$tag" --iglob "!API_REFERENCE_LINKS.yml" ../..
  fi
done
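
For readers without ripgrep installed, the same check can be sketched with plain grep. This is an illustration under assumptions, not the script used for the counts above: unused_tags is a hypothetical helper name, tags are read from standard input one per line so the yq step stays unchanged, and the tag is still interpolated into a regular expression, so a dot in a tag such as Series.dt.day matches any character, as it does in the rg version.

```shell
#!/bin/bash
# Sketch: report tags that no code_block line references, using grep instead of rg.
# unused_tags is a hypothetical helper name, not part of the Polars repo.

# Usage: <list of tags on stdin> | unused_tags <directory to search>
unused_tags() {
  local search_dir=$1 tag
  while IFS= read -r tag
  do
    # -r searches the directory recursively, -q suppresses output and only sets the exit status
    if ! grep -rq "code_block.*'$tag'" "$search_dir"
    then
      printf '%s\n' "$tag"
    fi
  done
}
```

Run from docs/source/_build, it could be combined with the tag extraction above as `yq '.[] | keys | .[]' API_REFERENCE_LINKS.yml | unused_tags ../..`.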

Link

https://docs.pola.rs/user-guide/expressions/aggregation/

@npielawski npielawski added the documentation Improvements or additions to documentation label Sep 23, 2024
@rodrigogiraoserrao
Collaborator

Hey there, thanks for this!

Regarding the link for fetch / the section “Execution on a partial dataset”, see #18033. My suggestion is that you share with the OP of that PR what you intended to do with head + collect.

As for the broken links, please do submit the corrected links.

When you talk about “stale tags”, I assume you are talking about entries in the YAML file that are not referenced in code blocks, is that it?

@npielawski
Contributor Author

When you talk about “stale tags”, I assume you are talking about entries in the YAML file that are not referenced in code blocks, is that it?

Yes exactly

@rodrigogiraoserrao
Collaborator

Yes exactly

Ok, I see. To be honest with you, I am not 100% sure if those tags are relevant elsewhere, so if those links are all just working fine, I'd recommend we keep them for now.

Fixing the broken & used links seems more useful in the short term, and since you were kind enough to share the scripts you used to check the links, we can always go through the tags again later.

@rodrigogiraoserrao
Collaborator

Thanks for the PR, @npielawski
