Add Heetch Terms of Service #57

Merged: 1 commit, Aug 18, 2022

Conversation

OTA-Bot
Contributor

@OTA-Bot OTA-Bot commented Aug 5, 2022

This suggestion has been created with the Contribution Tool, which enables graphical declaration of documents. You can see this declaration suggestion online or on your local instance if you have one set up.

Bots should take care of checking the formatting and the validity of the declaration. As a human reviewer, here are the things you should check:

  • The suggested document matches the scope of this instance: it targets a service in the language, jurisdiction, and industry that are part of those described for this instance.
  • The service name matches what you see on the web page, and it complies with the guidelines.
  • The service ID (i.e. the name of the file) is derived from the service name according to the guidelines.
  • The document type is appropriate for this document: if you read out loud the document type triptych, you can say that “this document describes how the writer commits to handle the object for its audience”.
  • The selectors seem to be stable: as much as possible, the CSS selectors are meaningful and specific (e.g. .tos-content rather than .ab23 .cK_drop > div).
  • The selectors are as simple as they can be: the CSS selectors do not have unnecessary specificity (e.g. if there is an ID, do not add a class).
  • The document content is relevant: it is not just a series of links, for example.
  • The generated version is readable: it is complete and not mangled.
  • The generated version is clean: it does not contain navigation links, unnecessary images, or extra content.

If there seems to be no appropriate document type for this document, yet it is relevant to track for this instance, please check whether there is already an open discussion about such a type and reference your case there, or open a new discussion if not.

Thanks to your work and attention, Open Terms Archive will ensure that high quality data is available for all reusers, enabling them to do their part in shifting the balance of power towards end users and regulators instead of spending time collecting and cleaning documents 👏💪

@martinratinaud
Member

The version gathered contains some weird characters.

I created a PR for that on our markdown transformer: accordproject/markdown-transform#508

I suggest we wait a week for the bug to be fixed, so that we do not capture data that is difficult to process.
If it's not merged by then, we can go ahead with it.

What do you think, @MattiSG?

@MattiSG
Member

MattiSG commented Aug 10, 2022

As described in accordproject/markdown-transform#508 (comment), I believe the issue lies with the document itself, not with the transformer. All ti are replaced by (. Let's wait a bit indeed, but I doubt this can be recovered. The best would probably be to let the publisher know.

@MattiSG MattiSG added and removed the blocked label (The document cannot be tracked because of copy or bot protection) Aug 17, 2022
@MattiSG
Member

MattiSG commented Aug 17, 2022

I'm in favour of merging as it is. We could add a filter to replace the ( in \w+\( with ti, which would solve all cases where the ti is not at the beginning of a word (otherwise it could conflict with legit parens), but the reality of that document is that it does have this encoding glitch: anyone copy-pasting from it would get the same.
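
For illustration only, the replacement idea in the comment above could be sketched roughly as follows. This is a standalone Python sketch, not an Open Terms Archive filter and not what was merged; the function name `repair_ti` is hypothetical. It assumes that a ( directly preceded by a word character is a mangled ti, which is the heuristic described in the comment.

```python
import re

# Assumption: a "(" immediately preceded by a word character stands for a
# mangled "ti". A lookbehind keeps the preceding characters intact, so only
# the "(" itself is replaced.
MANGLED_TI = re.compile(r"(?<=\w)\(")


def repair_ti(text: str) -> str:
    """Replace every mid-word "(" with "ti" (hypothetical helper)."""
    return MANGLED_TI.sub("ti", text)


if __name__ == "__main__":
    print(repair_ti("condi(ons of u(lisa(on"))
    # A "(" at the start of a word is left alone, so legitimate
    # parentheses after a space survive, but a word-initial "ti"
    # is not restored either.
    print(repair_ti("terms (see above)"))
```

The heuristic has a visible cost: a legitimate parenthesis attached to a word, as in service(s), would also be rewritten, which is one more reason to keep the document as published.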

Member

@MattiSG MattiSG left a comment

🚀

@martinratinaud martinratinaud merged commit 2923bc3 into main Aug 18, 2022
@martinratinaud martinratinaud deleted the add_heetch_terms_of_service branch August 18, 2022 11:10
3 participants