Docs for new release (#57)
- Fix docstrings to properly appear in the docs
- Add release notes for new version.
- Add multiprocessing for broken urls decreasing execution time by 50%.
- Add Advanced Usage Documentation
- Add docs for reports
john0isaac authored Aug 6, 2024
1 parent b2bd651 commit 15b9992
Showing 21 changed files with 183 additions and 35 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,17 @@ All notable changes to this project will be documented in this file.

### Other Changes

## [v0.2.0] 3 Aug 2024
- Redesign the package.
- Port the CLI from `argparse` to Click.
- Expose options for external users to allow for more customization.
- Increase coverage for paths by including paths that start with `/` or nothing.
- Add retries for URLs before flagging them as broken.
- Perform a HEAD request on each URL and fall back to GET; if both fail, flag the URL as broken once the retry count is exhausted.
- Analyze all web URLs except the ones in skip_domains list.
- Change Syntax of terminal comments to improve readability.
- Add Spinner to indicate that the tool is working (Not compatible with all terminals)
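
The HEAD-then-GET check with retries described in the v0.2.0 notes can be sketched as follows. This is a minimal illustration and not the package's implementation: `fetch` is a hypothetical stand-in for an HTTP call, and `delay` is an assumed parameter.

```python
import time
from typing import Callable

# Hypothetical stand-in for an HTTP call: takes (method, url) and returns a
# status code, or raises OSError on a network failure.
Fetch = Callable[[str, str], int]


def is_alive(url: str, fetch: Fetch, retries: int = 3, delay: float = 1.0) -> bool:
    """Return True if a HEAD or GET request to `url` answers with HTTP 200."""
    for attempt in range(retries):
        for method in ("HEAD", "GET"):  # try HEAD first, then fall back to GET
            try:
                if fetch(method, url) == 200:
                    return True
            except OSError:
                continue  # a network error counts as a failed attempt
        if attempt < retries - 1:
            time.sleep(delay)  # brief pause before the next retry
    return False
```

Passing the request function in as an argument keeps the retry logic testable without network access.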

## [v0.1.5] 8 Jul 2024
- Increase timeout for requests to check web urls alive or not. https://github.com/john0isaac/markdown-checker/pull/52

2 changes: 2 additions & 0 deletions README.md
@@ -25,6 +25,8 @@ pip install markdown-checker
2. Run `markdown-checker -d {src} -f {func} -gu {url}`. Replace `{src}` with the directory you want to analyze, `{func}` with one of the available functions such as `check_broken_paths`, and `{url}` with the full URL of your contributing guide.
3. The output will be displayed in the terminal and in a `comments.md` file.

For more customization options, read the docs.

## Using `markdown-checker` in GitHub Actions

You can run this tool within a GitHub workflow using the [action-check-markdown](https://github.com/marketplace/actions/check-markdown) GitHub action.
9 changes: 9 additions & 0 deletions docs/mkdocs.yml
@@ -34,4 +34,13 @@ plugins:
nav:
  - About: index.md
  - Usage: usage.md
  - Advanced Usage: advanced.md
  - API Reference:
    - API Reference: api.md
    - Main: ./api/main.md
    - Urls: ./api/urls.md
    - Paths: ./api/paths.md
    - Markdown Link Base: ./api/markdown_link_base.md
    - Utilities: ./api/utils.md
    - Reports: ./api/reports.md
  - Automate: automate.md
87 changes: 87 additions & 0 deletions docs/source/advanced.md
@@ -0,0 +1,87 @@
## Advanced Usage

To further customize your experience with the Markdown Checker, you can utilize additional command-line interface (CLI) options.

## Command Line Options

### `-d`, `--dir`
- **Type**: `click.Path`
- **Description**: Path to the root directory to check.
- **Required**: Yes

### `-ext`, `--extensions`
- **Type**: `list[str]`
- **Description**: File extensions to filter the files.
- **Default**:
    - `.md`
    - `.ipynb`
- **Required**: Yes

### `-td`, `--tracking-domains`
- **Type**: `list[str]`
- **Description**: Domains whose URLs are checked for a tracking ID.
- **Default**:
    - `github.com`
    - `microsoft.com`
    - `visualstudio.com`
    - `aka.ms`
    - `azure.com`
- **Required**: Yes

### `-sf`, `--skip-files`
- **Type**: `list[str]`
- **Description**: File names to exclude from all checks.
- **Default**:
    - `CODE_OF_CONDUCT.md`
    - `SECURITY.md`
- **Required**: Yes

### `-sd`, `--skip-domains`
- **Type**: `list[str]`
- **Description**: Domains whose URLs are not checked for being alive.
- **Default**: `[]`
- **Required**: No

### `-suc`, `--skip-urls-containing`
- **Type**: `list[str]`
- **Description**: Strings that exclude a URL from the liveness check when they match it.
- **Default**:
    - `https://www.microsoft.com/en-us/security/blog`
    - `video-embed.html`
- **Required**: No

### `-gu`, `--guide-url`
- **Type**: `str`
- **Description**: Full URL of your contributing guide.
- **Required**: Yes

### `-to`, `--timeout`
- **Type**: `int`
- **Description**: Timeout in seconds for the requests before retrying.
- **Default**: `10`
- **Required**: No

### `-rt`, `--retries`
- **Type**: `int`
- **Description**: Number of retries for the requests before flagging a url as broken.
- **Default**: `3`
- **Required**: No

### `-o`, `--output-file-name`
- **Type**: `str`
- **Description**: Name of the output file.
- **Default**: `comments`
- **Required**: Yes


## Other Options

### `--version`
- **Type**: `bool`
- **Description**: Show the version and exit.
- **Required**: No

### `--help`
- **Type**: `bool`
- **Description**: Show the help message and exit.
- **Required**: No
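
When the checker is driven from a script, the options above can be assembled into an argument list. The sketch below uses only flags named on this page (`-d`, `-f`, `-gu`, `-to`, `-rt`, `-o`). The helper `build_command` is hypothetical, and list-valued options are left out because how multiple values are passed is not described here.

```python
def build_command(
    directory: str,
    func: str,
    guide_url: str,
    timeout: int = 10,
    retries: int = 3,
    output_file_name: str = "comments",
) -> list[str]:
    """Assemble the argument list for one `markdown-checker` run."""
    return [
        "markdown-checker",
        "-d", directory,
        "-f", func,
        "-gu", guide_url,
        "-to", str(timeout),
        "-rt", str(retries),
        "-o", output_file_name,
    ]


command = build_command(
    ".",
    "check_broken_urls",
    "https://github.com/john0isaac/markdown-checker/blob/main/CONTRIBUTING.md",
    timeout=20,
    retries=5,
)
print(" ".join(command))
# To execute it (requires markdown-checker to be installed), pass `command`
# to subprocess.run from the standard library.
```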
5 changes: 5 additions & 0 deletions docs/source/api.md
@@ -0,0 +1,5 @@
- [Main](./api/main.md)
- [Urls](./api/urls.md)
- [Paths](./api/paths.md)
- [Markdown Link Base](./api/markdown_link_base.md)
- [Utilities](./api/utils.md)
1 change: 1 addition & 0 deletions docs/source/api/main.md
@@ -0,0 +1 @@
::: markdown_checker
1 change: 1 addition & 0 deletions docs/source/api/markdown_link_base.md
@@ -0,0 +1 @@
::: markdown_checker.markdown_link_base
1 change: 1 addition & 0 deletions docs/source/api/paths.md
@@ -0,0 +1 @@
::: markdown_checker.paths
3 changes: 3 additions & 0 deletions docs/source/api/reports.md
@@ -0,0 +1,3 @@
::: markdown_checker.reports.md_reports.generator

::: markdown_checker.reports.generator_base
1 change: 1 addition & 0 deletions docs/source/api/urls.md
@@ -0,0 +1 @@
::: markdown_checker.urls
7 changes: 7 additions & 0 deletions docs/source/api/utils.md
@@ -0,0 +1,7 @@
::: markdown_checker.utils.extract_links

::: markdown_checker.utils.format_output

::: markdown_checker.utils.list_files

::: markdown_checker.utils.spinner
5 changes: 4 additions & 1 deletion docs/source/usage.md
@@ -2,7 +2,8 @@

The library provides the following functions:

- [Usage](#usage)
[Usage](#usage):

- [`check_broken_paths`](#check_broken_paths)
- [`check_broken_urls`](#check_broken_urls)
- [`check_urls_locale`](#check_urls_locale)
@@ -58,3 +59,5 @@ Example:
```bash
markdown-checker -d . -f check_urls_tracking -gu https://github.com/john0isaac/markdown-checker/blob/main/CONTRIBUTING.md
```

## Want to do more? Check out the [Advanced Usage](./advanced.md) page.
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -1,7 +1,7 @@
[project]
name = "markdown-checker"
description= "A markdown link validation reporting tool"
version = "0.1.5"
description= "A markdown link validation reporting tool."
version = "0.2.0"
authors = [{ name = "John Aziz", email = "[email protected]" }]
maintainers = [{ name = "John Aziz", email = "[email protected]" }]
license = {file = "LICENSE"}
38 changes: 27 additions & 11 deletions src/markdown_checker/__init__.py
@@ -4,6 +4,7 @@
following some Guidelines
"""

import concurrent.futures
import platform
import sys
from pathlib import Path
@@ -20,6 +21,17 @@
from markdown_checker.utils.spinner import spinner


def check_url(url: MarkdownURL, skip_domains: list[str], skip_urls_containing: list[str], timeout: int, retries: int):
    if any(url.host_name().lower() in domain.lower() for domain in skip_domains) or any(
        url.link in substring for substring in skip_urls_containing
    ):
        return None
    if not url.is_alive(timeout=timeout, retries=retries):
        url.issue = "is broken"
        return url
    return None
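
The `--skip-domains` and `--skip-urls-containing` descriptions can be read as "skip a URL whose host belongs to a listed domain, or whose text contains a listed string". Under that reading, the membership tests would look like the sketch below. It is an interpretation of the option descriptions and not the committed logic. `should_skip` is a hypothetical helper, and the subdomain rule is an assumption.

```python
from urllib.parse import urlparse


def should_skip(link: str, skip_domains: list[str], skip_urls_containing: list[str]) -> bool:
    """Return True when `link` matches a skipped domain or contains a skipped string."""
    host = (urlparse(link).hostname or "").lower()
    # Skip when the host equals a listed domain or is a subdomain of it.
    if any(host == d.lower() or host.endswith("." + d.lower()) for d in skip_domains):
        return True
    # Skip when the link contains any of the listed substrings.
    return any(substring in link for substring in skip_urls_containing)
```

Note the direction of the second test: `substring in link` matches a long URL against a short fragment such as `video-embed.html`.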


def detect_issues(
func: str,
file_path: Path,
@@ -42,7 +54,7 @@ def detect_issues(
retries (int): Number of retries for the requests
Returns:
tuple[list[Union[MarkdownPath, MarkdownURL]], int]: Detected issues and links count
Detected issues and links count.
"""
detected_issues: list[Union[MarkdownPath, MarkdownURL]] = []
links_count = 0
@@ -83,14 +95,18 @@ def detect_issues(
"upload.wikimedia.org",
]
)
        for url in all_links.urls:
            if any(url.host_name().lower() in domain.lower() for domain in skip_domains) or any(
                url.link in substring for substring in skip_urls_containing
            ):
                continue
            if not url.is_alive(timeout=timeout, retries=retries):
                url.issue = "is broken"
                detected_issues.append(url)
        with concurrent.futures.ProcessPoolExecutor() as executor:
            results = list(
                executor.map(
                    check_url,
                    all_links.urls,
                    [skip_domains] * len(all_links.urls),
                    [skip_urls_containing] * len(all_links.urls),
                    [timeout] * len(all_links.urls),
                    [retries] * len(all_links.urls),
                )
            )
        detected_issues.extend(filter(None, results))
        links_count += len(all_links.urls)
elif func == "check_urls_tracking":
for url in all_links.urls:
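
The hunk above passes constant arguments to `executor.map` by repeating each one in a list. An alternative is to bind them once with `functools.partial`, as in this minimal sketch. It deliberately swaps in `ThreadPoolExecutor` so the example runs anywhere without a `__main__` guard, and `check` is a hypothetical stand-in for a real URL check.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Optional


def check(link: str, broken: set[str], retries: int) -> Optional[str]:
    """Hypothetical checker: return the link if it is 'broken', else None."""
    return link if link in broken else None


links = ["https://a.example", "https://b.example", "https://c.example"]
# partial() binds the arguments that are the same for every link,
# so map() only needs the one iterable that varies.
worker = partial(check, broken={"https://b.example"}, retries=3)

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(worker, links))  # results keep input order

issues = [r for r in results if r is not None]
print(issues)  # ['https://b.example']
```

Because `executor.map` returns results in input order, filtering out `None` afterwards gives the same list a sequential loop would build.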
@@ -255,7 +271,7 @@ def main(
tracking_domains: list[str],
output_file_name: str,
) -> None:
"""Markdown Link Checker"""
"""A markdown link validation reporting tool."""
_ = tuple(Path(item) for item in src) or (Path("./"),) # default to current directory

_, files_paths = get_files_paths_list(Path(dir), extensions)
@@ -280,7 +296,7 @@ def main(
)
links_checked_count += links_count
if len(detected_issues) > 0:
formatted_output += f"| `{file_path}` |" + format_links(detected_issues)
formatted_output += f"| [`{file_path}`]({file_path}) |" + format_links(detected_issues)
all_files_issues.extend(detected_issues)
click.echo(
click.style(f"\n🔍 Checked {links_checked_count} links in {len(files_paths)} files.", fg="blue"), err=False
4 changes: 2 additions & 2 deletions src/markdown_checker/markdown_link_base.py
@@ -18,7 +18,7 @@ def has_locale(self) -> bool:
Check if the link has a locale
Returns:
bool: True if the link has a locale, False otherwise
True if the link has a locale, False otherwise
"""
locale_pattern = re.compile(r"\/[a-z]{2}-[a-z]{2}\/")
matches = re.findall(locale_pattern, self.link)
@@ -29,7 +29,7 @@ def has_tracking(self) -> bool:
Check if the link has a tracking ID
Returns:
bool: True if the link has a tracking ID, False otherwise
True if the link has a tracking ID, False otherwise
"""
tracking_pattern = re.compile(r"(\?|\&)(WT|wt)\.mc_id=")
matches = re.findall(tracking_pattern, self.link)
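
The two patterns in this file can be tried in isolation. The sketch below reuses the regular expressions shown above in standalone functions. It uses `search` instead of `findall`, and the sample URLs are illustrative assumptions.

```python
import re

LOCALE_PATTERN = re.compile(r"\/[a-z]{2}-[a-z]{2}\/")
TRACKING_PATTERN = re.compile(r"(\?|\&)(WT|wt)\.mc_id=")


def has_locale(link: str) -> bool:
    """True if the link contains a lowercase locale segment such as /en-us/."""
    return LOCALE_PATTERN.search(link) is not None


def has_tracking(link: str) -> bool:
    """True if the link carries a WT.mc_id (or wt.mc_id) query parameter."""
    return TRACKING_PATTERN.search(link) is not None


print(has_locale("https://learn.microsoft.com/en-us/azure/"))           # True
print(has_tracking("https://learn.microsoft.com/azure/?WT.mc_id=abc"))  # True
print(has_tracking("https://learn.microsoft.com/azure/"))               # False
```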
10 changes: 5 additions & 5 deletions src/markdown_checker/paths.py
@@ -16,7 +16,7 @@ def path_without_fragments(self) -> Path:
Get the path without the fragment
Returns:
Path: The path without the fragment
The path without the fragment
"""
return Path(self.remove_fragments())

@@ -25,7 +25,7 @@ def remove_fragments(self) -> str:
Remove the fragments from a path
Returns:
str: The path without the fragment
The path without the fragment
"""
# Find the last occurrence of the dot
dot_index = self.link.rfind(".")
@@ -51,7 +51,7 @@ def get_full_path(self) -> Path:
Get the full path of the file by resolving the path without fragments
Returns:
Path: The full path of the file
The full path of the file
"""
try:
return self.path_without_fragments.resolve()
@@ -63,7 +63,7 @@ def get_full_path_relative(self) -> str:
Get the full path of the file by resolving the path without fragments
Returns:
str: The full path of the file
The full path of the file
"""
return os.path.normpath(os.path.join(os.path.dirname(self.file_path), self.remove_fragments()))
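
The expression in `get_full_path_relative` resolves a link against the directory of the file that contains it. A minimal standalone sketch follows, assuming the link has already had any fragment removed. `resolve_relative` and the sample paths are hypothetical.

```python
import os


def resolve_relative(file_path: str, link: str) -> str:
    """Resolve `link` relative to the directory of the file that contains it."""
    return os.path.normpath(os.path.join(os.path.dirname(file_path), link))


print(resolve_relative("docs/guide/setup.md", "../images/logo.png"))
# On POSIX systems this prints: docs/images/logo.png
```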

@@ -72,7 +72,7 @@ def exists(self) -> bool:
Check if the path exists
Returns:
bool: True if the path exists, False otherwise
True if the path exists, False otherwise
"""
# Paths starting with / are considered absolute and not resolved
# so we need to remove the / and check if the path exists
2 changes: 2 additions & 0 deletions src/markdown_checker/urls.py
@@ -1,3 +1,4 @@
import time
from dataclasses import dataclass
from urllib.parse import ParseResult, urlparse

@@ -47,4 +48,5 @@ def is_alive(self, timeout: int = 10, retries: int = 3) -> bool:
return response.status_code == 200
except requests.RequestException:
continue
time.sleep(1)
return False
2 changes: 1 addition & 1 deletion src/markdown_checker/utils/extract_links.py
@@ -22,7 +22,7 @@ def get_links_from_md_file(file_path: Path) -> MarkdownLinks:
file_path (Path): The file path to check.
Returns:
MarkdownLinks: Dataclass with urls and paths
markdown_links (MarkdownLinks): Dataclass with urls and paths
"""
markdown_links = MarkdownLinks(urls=[], paths=[])
link_pattern = re.compile(r"\]\((.*?)\)| \)")
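
To see what `link_pattern` captures, the sketch below applies the same regular expression to a sample string and splits the targets into URLs and paths. The `http://`/`https://` prefix rule and the `extract_links` helper are assumptions for illustration and may differ from the package's own classification.

```python
import re

LINK_PATTERN = re.compile(r"\]\((.*?)\)| \)")


def extract_links(text: str) -> tuple[list[str], list[str]]:
    """Split Markdown link targets into (urls, paths)."""
    urls: list[str] = []
    paths: list[str] = []
    for target in LINK_PATTERN.findall(text):
        if not target:
            continue  # the second alternative of the pattern captures nothing
        if target.startswith(("http://", "https://")):
            urls.append(target)
        else:
            paths.append(target)
    return urls, paths


sample = "See the [docs](https://example.com/docs) and the [guide](./guide.md)."
print(extract_links(sample))
# (['https://example.com/docs'], ['./guide.md'])
```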
4 changes: 2 additions & 2 deletions src/markdown_checker/utils/format_output.py
@@ -13,10 +13,10 @@ def format_links(links: list[Union[MarkdownPath, MarkdownURL]]) -> str:
Formats a List of links into a string with numbered bullets.
Args:
links (List): A List of links.
links (list[Union[MarkdownPath, MarkdownURL]]): The list of links to format.
Returns:
str: The formatted string with numbered bullets.
formatted_links (str): The formatted string with numbered bullets.
"""
formatted_links = ""
i = 1
6 changes: 3 additions & 3 deletions src/markdown_checker/utils/list_files.py
@@ -12,11 +12,11 @@ def get_files_paths_list(root_path: Path, extensions: list[str] = []) -> tuple[l
Get a list of file paths from a root directory and its subdirectories, filtered by file extension.
Args:
- root_path (Path): The root directory to start the search.
- extensions (list[str]): A list of file extensions to filter the search.
root_path (Path): The root directory to start the search.
extensions (list[str]): A list of file extensions to filter the search.
Returns:
- tuple[list[Path], list[Path]]: A tuple containing a list of subdirectories and a list of file paths.
A tuple containing a list of subdirectories and a list of file paths.
"""

sub_folders: list[Path] = []
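
A directory walk filtered by extension, as the docstring above describes, can be sketched with `pathlib`. This is not the package's implementation: `list_files` is a hypothetical name, the `None` default replaces the mutable `[]` default, and returning every file when no extensions are given is an assumption.

```python
from pathlib import Path
from typing import Optional


def list_files(root: Path, extensions: Optional[list[str]] = None) -> tuple[list[Path], list[Path]]:
    """Return (sub_folders, files) under `root`, keeping files whose suffix is in `extensions`."""
    wanted = {ext.lower() for ext in (extensions or [])}
    sub_folders: list[Path] = []
    files: list[Path] = []
    for entry in sorted(root.rglob("*")):  # recursive walk in a stable order
        if entry.is_dir():
            sub_folders.append(entry)
        elif not wanted or entry.suffix.lower() in wanted:
            files.append(entry)
    return sub_folders, files
```

Calling `list_files(Path("."), [".md", ".ipynb"])` would return every subdirectory plus only the Markdown and notebook files.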
15 changes: 7 additions & 8 deletions src/markdown_checker/utils/spinner.py
@@ -63,14 +63,13 @@ def spinner(
The spinner is created only if stdout is not redirected, or if the spinner
is forced using the `force` parameter.
Parameters
----------
beep : bool
Beep when spinner finishes.
disable : bool
Hide spinner.
force : bool
Force creation of spinner even when stdout is redirected.
Args:
beep (bool):
Beep when spinner finishes.
disable (bool):
Hide spinner.
force (bool):
Force creation of spinner even when stdout is redirected.
Example
-------