Skip to content

Commit

Permalink
Add initial vale configuration and apply editorial changes to documen…
Browse files Browse the repository at this point in the history
…tation (#3567)

* Fix editorial issues raised by Vale

* Add vale configuration

* Prepare vale pre-commit hook

* Add .vale/justfile to help local development of styles

* Generate DAG docs for Vale editorial change

* Configure easier to use just recipes

* Revert unnecessary lint recipe changes

* Update Vale config documentation

* Always run using local Vale docker image and pin Vale package versions

* Prevent CI docker issue

* Check all markdown files, not just documentation site

* Prevent false positives for term casing

* Check admonition blocks

* Include mdx as markdown

* Add .vale to CODEOWNERS

* Address vale errors for files added during rebase
  • Loading branch information
sarayourfriend authored Jan 29, 2024
1 parent 6d8bf25 commit e12fd14
Show file tree
Hide file tree
Showing 42 changed files with 298 additions and 90 deletions.
1 change: 1 addition & 0 deletions .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ tsconfig.base.json @WordPress/openverse-frontend

# Specific assignments for the 'openverse-maintainers' group
.codespell/ @WordPress/openverse-maintainers
.vale/ @WordPress/openverse-maintainers
.devcontainer/ @WordPress/openverse-maintainers
.github/ @WordPress/openverse-maintainers
automations/ @WordPress/openverse-maintainers
Expand Down
1 change: 1 addition & 0 deletions .github/filters.yml
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ lint:
- .pre-commit-config.yaml
- tsconfig.base.json
- .codespell
- .vale
mgmt:
- .github/**
- .devcontainer/**
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/ci_cd.yml
Original file line number Diff line number Diff line change
Expand Up @@ -219,7 +219,7 @@ jobs:
uses: docker/build-push-action@v5
with:
context: ${{ matrix.context }}
target: ${{ matrix.target }}
target: ${{ matrix.target || '' }}
push: false
tags: openverse-${{ matrix.image }}
file: ${{ matrix.file }}
Expand Down
10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -110,13 +110,15 @@ repos:
entry: bash -c 'pnpm run -r types'
language: system
pass_filenames: false

- id: eslint
name: eslint
files: (frontend|automations|packages).*?\.(js|ts|vue|json5|json)$
"types": [file] # ESLint only accepts [javascript] by default.
language: system
pass_filenames: false
entry: bash -c 'pnpm run eslint --fix'

- id: test:unit
name: test:unit
files: ^(frontend|packages)/.*$
Expand All @@ -125,13 +127,21 @@ repos:
pass_filenames: false
stages:
- push

- id: render-release-drafter
name: render-release-drafter
files: ^templates/.*$
entry: bash -c 'just automations/js/render-release-drafter'
language: system
pass_filenames: false

- id: vale
name: vale
language: system
entry: bash -c 'just .vale/run'
pass_filesnames: false
files: (.vale/.*|.mdx?)$

- repo: https://github.com/renovatebot/pre-commit-hooks
rev: 37.116.0
hooks:
Expand Down
3 changes: 3 additions & 0 deletions .vale/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
styles/*
!styles/Vocab/
!styles/Openverse/
43 changes: 43 additions & 0 deletions .vale/.vale.ini
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
StylesPath = styles

MinAlertLevel = suggestion
Vocab = Openverse
# Using an explicit reference to downloads for package zips rather than the shorthand allows us to
# pin the package to a specific version. Otherwise, changes in the upstream style configuration can
# cause sudden and unexpected failures during linting.
Packages = https://github.com/errata-ai/proselint/releases/download/v0.3.3/proselint.zip

# The default settings also ignore `pre` and `code`, which includes admonition code blocks.
# We may encounter false positives by including code blocks, but we use admonitions
# so frequently in Openverse documentation that it is probably worth it to include them.
SkippedScopes = script, style, figure
IgnoredScopes = tt

[formats]
# Treat mdx as markdown (for Storybook support)
mdx = md

# Only configure Markdown, because that's the only language we use to write documentation in
# (Except MDX, but if we want to add Vale there it will have to be at a later date to avoid complexity
# in the initial setup)
[*.md]
BasedOnStyles = proselint, Vale, Openverse

# Avoid Vale.Spelling for now, but we do want Vale.Terms for the vocab at least
Vale.Spelling = NO

# The suggested terms are obscure
proselint.AnimalLabels = NO

# These are rarely necessary changes and can be more fiddly than they are helpful
proselint.Typography = NO

# proselint.Needless considers a host of terms "needless" which are natural and wide-spread changes in the English language,
# and is clearly favouring US changes rather than others (the US isn't the only or even the largest English speaking country).
# You don't have to look far to find examples of this rule preferring variants of terms in line with 18th century elite preferences:
# https://www.merriam-webster.com/grammar/preventive-or-preventative
# In other words, it's a linguistic prescriptive holdover to insist on some of these terms being "needless" when they're in
# fact widespread and well understood. Not all the terms have this issue, but we'd have to rewrite
# practically the entire rule to sort out which ones we cared about. If we find any we _do_ care about, we can add them to our
# own substitution rule.
proselint.Needless = NO
9 changes: 9 additions & 0 deletions .vale/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
FROM jdkato/vale:v2.30.0

WORKDIR /vale
COPY .vale.ini .
COPY styles styles

RUN vale sync

ENTRYPOINT ["vale", "--config=/vale/.vale.ini"]
29 changes: 29 additions & 0 deletions .vale/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Openverse Vale configuration

Openverse runs Vale using Docker. This bypasses the need for contributors to
install the Vale binary on their computers. It also prevents Vale styles getting
downloaded into the repository in clear text, which is critical to avoid lists
of sensitive terms being accidentally too-easily available.

For more information about this motivation to avoid lists of sensitive terms in
the Openverse monorepo, refer to the README at
[WordPress/openverse-sensitive-terms](https://github.com/WordPress/openverse-sensitive-terms).

To run Vale with Openverse's configuration, use the just recipe:

```
$ just .vale/run
```

This recipe _always_ builds Vale. The Openverse Vale docker image is fast to
build, and the most expensive steps are cached. Docker will automatically reuse
the pre-existing image unless there are changes to the Vale configuration.

Typically it is unnecessary to run Vale directly, as pre-commit automatically
runs Vale on each commit. You only need to run Vale directly when iterating on
changes to Openverse's Vale configuration.

Refer to the `VALE_FILES` variable [in the justfile](./justfile) to identify
which files we currently check with Vale. A comment on the variable explains the
rationale for that choice. The list of files will ideally expand in the future
to include all textual content in the repository.
64 changes: 64 additions & 0 deletions .vale/justfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
COLOR := "\\033[0;34m"
NO_COLOR := "\\033[0m"


# Show all available recipes
@_default:
printf "\n{{ COLOR }}# Vale (path: \`.vale/\`)\n"
printf "===================={{ NO_COLOR }}\n"
just --list --unsorted


# Build a local version of `openverse-vale:local` for testing
build:
docker build . -t openverse-vale:local


# The --glob setting excludes things that should not be linted, including
# build artefacts, dependencies, and automatically generated files like
# the changelogs (without excluding `changelogs/index.md`). Project proposals
# are also excluded for want of a good way to ignore existing proposals and only
# lint _new_ proposals. Project proposals aren't "living documents" in the way
# the rest of our documentation is, so it doesn't seem right to retroactively
# edit them for mere editorial purposes (rather than, for example, to correct
# some grave inacurracy).
# Leading `,` compensates for lack of trailing comma support
# Note the lack of space before the trailing \ on each line, this is to prevent
# the addition of a space between each pattern, which isn't supported by Vale's glob
# If you change this and Vale takes a very long time to run (more than 30 seconds)
# then chances are the change is breaking the glob pattern. Unfortunately the only
# feedback you get when the glob pattern is not working as intended is a very long
# run time for Vale as it checks everything that should have otherwise been ignored,
# but wasn't due to some minor issue in the pattern syntax.
# This is fiddly, but I can't find a good way around it.
_IGNORE_PATTERNS := """
_build\
,_static\
,venv\
,.venv\
,.nuxt\
,.pnpm-store\
,node_modules\
,test-results\
,storebook-static\
,.ruff-cache\
,projects/proposals\
,changelogs/api\
,changelogs/frontend\
,changelogs/catalog\
,changelogs/ingestion_server\
"""

VALE_GLOB := "--glob='!*{" + _IGNORE_PATTERNS + "}*'"


# Run Vale configured for Openverse in Docker.
# Using Docker avoids the need for contributors to install the Vale binary.
#Configuration defaults to what is used for CI.
run +files=".": build
docker run --rm \
-v $PWD/..:/src:rw,Z \
--workdir=/src \
openverse-vale:local \
{{ files }} \
{{ VALE_GLOB }}
8 changes: 8 additions & 0 deletions .vale/styles/Openverse/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Openverse vale style

The Openverse vale style is manually curated by Openverse maintainers. These
rules are primarily meant to cater to exceptions in the Openverse documentation
style that do not fit into pre-existing style guides, whether because of
disagreements in style or different institutional or contextual requirements
based on Openverse's domain and nature as a FOSS project within the WordPress
community.
21 changes: 21 additions & 0 deletions .vale/styles/Openverse/TermCasing.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# While Vale.Terms could be used for these, it is too inflexible, and thus incapable
# of ignoring common false-positives like GitHub usernames or GitHub team names
extends: substitution
message: "Incorrect casing. Use '%s' instead of '%s'."
level: error
ignorecase: false
scope:
- paragraph
swap:
# [^/\.] prevents matching things that look like URLs, file paths, or GitHub team mentions
# For example: @WordPress/openverse-maintainers
'[^/\.]openverse[^.\.]': Openverse
# OpenVerse should never be used, except as an example of something that is always wrong,
# in which case we'll tell Vale to ignore that line.
"OpenVerse": Openverse
'[^/\.]wordpress[^.\.]': WordPress
# Wordpress is the same as OpenVerse
"Wordpress": WordPress
'[^/\.]github[^.\.]': GitHub
# Github is the same as Wordpress and OpenVerse
"Github": GitHub
3 changes: 3 additions & 0 deletions .vale/styles/Vocab/Openverse/accept.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# This matches "cliche" rulesets, but we actually use this term in earnest and with clear explanation of what we mean by it
Decision-Making Process
decision-making process
1 change: 1 addition & 0 deletions automations/python/workflows/set_matrix_images.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ def ser_set(x):
build_matrix["image"] |= {"frontend", "frontend_nginx"}
publish_matrix["image"] |= {"frontend", "frontend_nginx"}


build_matrix["include"] = [includes[item] for item in build_matrix["image"]]

for item in build_matrix["include"]:
Expand Down
2 changes: 1 addition & 1 deletion catalog/dags/providers/provider_api_scripts/inaturalist.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
https://api.inaturalist.org/v1/docs/
But there is a full dump intended for sharing on S3.
https://github.com/inaturalist/inaturalist-open-data/tree/documentation/Metadata
Because these are very large normalized tables, as opposed to more document
Because these are exceptionally large normalized tables, as opposed to more document
oriented API responses, we found that bringing the data into postgres first
was the most effective approach. More detail in slack here:
https://wordpress.slack.com/archives/C02012JB00N/p1653145643080479?thread_ts=1653082292.714469&cid=C02012JB00N
Expand Down
8 changes: 5 additions & 3 deletions catalog/docs/data_models.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
_<TODO: This documentation is temporary and should be replaced by more thorough
documentation of our DB fields in
https://github.com/WordPress/openverse-catalog/issues/783>_
> **Note**
>
> This documentation is temporary and should be replaced by more thorough
> documentation of our DB fields in
> https://github.com/WordPress/openverse/issues/412.
# Data Models

Expand Down
6 changes: 4 additions & 2 deletions catalog/docs/provider_data_ingester_faq.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,8 +63,10 @@ def get_record_data(self, data: dict) -> dict | list[dict] | None:
...
```

**NOTE**: When doing this, keep in mind that adding too many requests may slow
down ingestion. Be aware of rate limits from your provider API as well.
> **Note**
>
> When doing this, keep in mind that adding too many requests may slow down
> ingestion. Be aware of rate limits from your provider API as well.
## What if my API endpoint isn't static and needs to change from one request to another?

Expand Down
15 changes: 8 additions & 7 deletions documentation/api/reference/search_algorithm.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
# Search Algorithm

Openverse currently uses a relatively simple and naïve search algorithm with
very limited options. The documentation on this page was written by referencing
the code in Openverse as well as parts of Openverse's historical development.
Parts of the story for how Openverse's indexes came to be configured as they are
today are likely missing. Future improvements to Openverse's indexing and search
will be more carefully documented here and in the code to ensure there is
greater longevity of understanding.
restricted modifications to the default Elasticsearch behaviour. The
documentation on this page was written by referencing the code in Openverse as
well as parts of Openverse's historical development. Parts of the story for how
Openverse's indexes came to be configured as they are today are likely missing.
Future improvements to Openverse's indexing and search will be more carefully
documented here and in the code to ensure there is greater longevitiy of
understanding.

> **Note**: This document avoids covering details covered in the
> [Openverse Search Guide](https://wordpress.org/openverse/search-help).
Expand Down Expand Up @@ -176,7 +177,7 @@ aspects of a document:
> to potentially change this fact.
Of these, title is weighted 10000 times more heavily than the description and
tags. This makes searches that match a title very closely rise to the "top" of
tags. This makes works whose titles closely match the query rise to the "top" of
the results, even if the same text is present word-for-word in a description. It
also breaks ties between documents, if, for example, two documents are returned,
one because the title matches and one because a tag matches, the title-matched
Expand Down
5 changes: 5 additions & 0 deletions documentation/catalog/guides/adding_a_new_provider.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,9 @@ that can be used to generate the files you'll need and get you started:

You should see output similar to this:

<!-- Ignore "NOTE" in the codeblock below. It is verbatim output from a script so should not be changed. -->
<!-- vale proselint.Annotations = NO -->

```
Creating files in /Users/staci/projects/openverse-projects/openverse
API script: openverse/catalog/dags/providers/provider_api_scripts/foobar_museum.py
Expand All @@ -86,6 +89,8 @@ API script test: openverse/catalog/tests/dags/providers/provider_api_scripts/t
NOTE: You will also need to add a new ProviderWorkflow dataclass configuration to the PROVIDER_WORKFLOWS list in `openverse-catalog/dags/providers/provider_workflows.py`.
```

<!-- vale proselint.Annotations = YES -->

This generates a provider script with a templated `ProviderDataIngester` for you
in the
[`provider_api_scripts` folder](https://github.com/WordPress/openverse/tree/main/catalog/dags/providers/provider_api_scripts),
Expand Down
7 changes: 6 additions & 1 deletion documentation/catalog/guides/quickstart.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,12 +49,17 @@ command runner installed.

To set up the local python environment along with the pre-commit hook, run:

<!-- Ignore `venv` repetition below as it is correct -->
<!-- vale Vale.Repetition = NO -->

```shell
python3 -m venv venv
source venv/bin/activate
just catalog/install
```

<!-- vale Vale.Repetition = YES -->

The containers will be built when starting the stack up for the first time. If
you'd like to build them prior to that, run:

Expand Down Expand Up @@ -171,7 +176,7 @@ just down -v

`docker volume prune` can also be useful if you've already stopped the running
containers, but be warned that it will remove all volumes associated with
stopped containers, not just openverse-catalog ones.
stopped containers, not just catalog ones.

To fully recreate everything from the ground up, you can use:

Expand Down
6 changes: 3 additions & 3 deletions documentation/catalog/reference/DAGs.md
Original file line number Diff line number Diff line change
Expand Up @@ -872,9 +872,9 @@ Notes: The iNaturalist API is not intended for data scraping.
https://api.inaturalist.org/v1/docs/ But there is a full dump intended for
sharing on S3.
https://github.com/inaturalist/inaturalist-open-data/tree/documentation/Metadata
Because these are very large normalized tables, as opposed to more document
oriented API responses, we found that bringing the data into postgres first was
the most effective approach. More detail in slack here:
Because these are exceptionally large normalized tables, as opposed to more
document oriented API responses, we found that bringing the data into postgres
first was the most effective approach. More detail in slack here:
https://wordpress.slack.com/archives/C02012JB00N/p1653145643080479?thread_ts=1653082292.714469&cid=C02012JB00N
We use the table structure defined here,
https://github.com/inaturalist/inaturalist-open-data/blob/main/Metadata/structure.sql
Expand Down
8 changes: 4 additions & 4 deletions documentation/frontend/reference/playwright_tests.md
Original file line number Diff line number Diff line change
Expand Up @@ -178,10 +178,10 @@ automatically by Playwright and will be placed in the `test-results` folder
under the fully qualified name of the test that failed (with every parent
describe block included).

Additionally, you can run run the tests in debug mode. This will run the tests
with a headed browser as opposed to a headless (invisible) one and allow you to
watch the test happen in real time. It's not possible for a headed browser to
run inside the docker container, however, so be aware that when debugging the
Additionally, you can run the tests in debug mode. This will run the tests with
a headed browser as opposed to a headless (invisible) one and allow you to watch
the test happen in real time. It's not possible for a headed browser to run
inside the docker container, however, so be aware that when debugging the
environment will be slightly different. For example, if you're on any OS other
than Linux, the browser you're running will have small differences in how it
renders the page compared to the docker container.
Expand Down
Loading

0 comments on commit e12fd14

Please sign in to comment.