This repository has been archived by the owner on Oct 16, 2024. It is now read-only.

Semi-autogenerated docs #171

Open
mike0sv opened this issue Sep 3, 2022 · 13 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions

Comments

@mike0sv
Contributor

mike0sv commented Sep 3, 2022

There are a number of inconsistencies between the mlem docs and the actual mlem code. Sometimes it's because of new features that we forgot to document; sometimes it's fixes in the docs that are not reflected in the mlem code. To make everything as consistent as possible, I suggest auto-generating everything we can.
Of course, a big chunk of the docs will remain hand-crafted. I am talking about parts of the reference pages for API, CLI and the upcoming Objects.

Ideal process:

  • for a specific part of the docs (cli/api/etc.) a specification is generated from the mlem codebase in the form of a JSON file. It contains all docs-related stuff from the code (docstrings, help messages, etc.). It's generated from the latest mlem version in CI
  • in .md files a special "generate" expression is used (kind of like this)
  • in CI (or maybe even in real time) those expressions are substituted with the actual docs generated from the spec

For now:

  • same as above, but manually (not in CI)
  • .md files stay the same
  • a special script finds the parts of .md files that should be autogenerated and replaces their contents with text generated from the spec
  • the script runs locally, and the final .md is committed
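
A minimal sketch of what such a local script could look like. The `<!-- autogen:NAME -->` / `<!-- /autogen -->` markers, the spec layout (region name mapped to rendered markdown) and the `mlem cmd` page are assumptions made up for illustration; they are not taken from mlem or from the PR.

```python
import json
import re

# Hypothetical marker syntax: <!-- autogen:NAME --> ... <!-- /autogen -->
MARKER = re.compile(
    r"(<!-- autogen:(?P<name>[\w.-]+) -->\n)(?P<body>.*?)(<!-- /autogen -->)",
    re.DOTALL,
)


def regenerate(markdown: str, spec: dict) -> str:
    """Replace the body of every marked region with the text stored in spec."""

    def substitute(match: re.Match) -> str:
        name = match.group("name")
        if name not in spec:
            # Leave regions the spec does not know about untouched.
            return match.group(0)
        # Group 1 is the opening marker line, group 4 the closing marker.
        return f"{match.group(1)}{spec[name].rstrip()}\n{match.group(4)}"

    return MARKER.sub(substitute, markdown)


# In practice the spec would be read from the JSON file generated from the code.
spec = json.loads('{"cli.cmd.options": "- `--help`: Show this message and exit."}')
page = (
    "# mlem cmd\n\n"
    "Hand-written introduction.\n\n"
    "## Options\n\n"
    "<!-- autogen:cli.cmd.options -->\n"
    "- `--hlep`: outdated text\n"
    "<!-- /autogen -->\n\n"
    "## Examples\n"
)
updated = regenerate(page, spec)
print(updated)
```

Only the text between the markers changes, so hand-written sections around it survive, and running the script twice gives the same result.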

I will start with CLI for iterative/mlem#363 and create a PR shortly with examples

@mike0sv mike0sv added the type: enhancement Something is not clear, small updates, improvement suggestions label Sep 3, 2022
@shcheklein
Member

This has been discussed a few times before (iterative/dvc.org#2770).

My take: this approach either creates pretty bad docs (if you could run mlem something --help and get the same result, you don't need docs at all) or requires significant maintenance (proper docstrings, which are actually hard to write so that they satisfy all the requirements we have for docs, e.g. admonitions, code blocks, etc.).

@shcheklein
Member

Even for the Python API (which makes more sense to me to generate), AFAIR the team decided to keep things very simple in the code and put longer descriptions / examples in the docs.

@mike0sv
Contributor Author

mike0sv commented Sep 5, 2022

Yes, the idea is to force the simple part to be the same in docs and in code. Longer descriptions and examples will be only in the docs, with all those fancy md things.
We already have tests in code that require all classes, options and fields to have docstrings.
Just doing the PR showed that 1) we had a couple of commands left out of the docs because we forgot to add them, 2) we had a couple of CLI options in the docs that we had deleted from the code, and 3) the docs team fixed a lot of wording/spelling/punctuation in the docs that was not backported to the code (I did the backporting manually in iterative/mlem#363).
And you can see that besides those discrepancies, the PR didn't actually change anything else, like formatting. So it should be the best of both worlds: handcrafted docs with automation that checks whether they are up to date.
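
For readers unfamiliar with this kind of test, here is a rough sketch of a docstring-presence check. It is not mlem's actual test code; the `missing_docstrings` helper and the `demo` module are hypothetical and only illustrate the idea.

```python
import inspect
import types


def missing_docstrings(module: types.ModuleType) -> list:
    """Return names of public functions and classes in module that lack a docstring."""
    missing = []
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        # Skip objects that were merely imported into the module.
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        if not (obj.__doc__ or "").strip():
            missing.append(name)
    return sorted(missing)


# Hypothetical module built in memory so the example is self-contained.
demo = types.ModuleType("demo")
exec(
    'def documented():\n    """Has a docstring."""\n\n'
    "def undocumented():\n    pass\n",
    demo.__dict__,
)
print(missing_docstrings(demo))
```

A test suite would assert that the returned list is empty for every module that feeds the docs.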

@mike0sv
Contributor Author

mike0sv commented Sep 5, 2022

From iterative/dvc.org#2770 (comment): my intention is exactly what @casperdcl wrote at the end: generate a subset of (2) from (1).

@shcheklein
Member

> 1. had couple of commands left out of the docs because we forgot to add them
> 2. had a couple of cli options in docs that we deleted from code

This can be solved by introducing a check. No need to generate or keep source code as a source for docs. I'm not sure how valuable everything else is. Tbh, from my experience it's still quite rare that we would even benefit from checks like those.

> docs team fixed a lot of wording/spelling/punctuation in docs that was not backported to code (I did the backporting manually in iterative/mlem#363)

This is minor. Usually the major work is writing a proper description of those options. The point here is: if they are the same as --help and auto-generated, you don't need them at all. In DVC they are far from being the same.

> generate subset of (2) from (1)

I'm not sure it's possible tbh, unless I'm missing something. Usually 2 looks quite different from 1.

@casperdcl

casperdcl commented Sep 5, 2022

I think it could be helpful to have a basic CI check that e.g. cml <command> --help lists the same options as show up in the bullet points in https://cml.dev/doc/ref/<command>#options for example...
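
A rough sketch of such a check, assuming the help output and the options section of the reference page are already available as strings (in CI the former could come from running the command with --help). The sample help text, the markdown bullets and the `option_drift` helper are invented for illustration.

```python
import re

# Matches long flags such as --dry-run, but not the "--" inside other tokens.
FLAG = re.compile(r"(?<![\w-])--[a-z][\w-]*")


def flags_in(text: str) -> set:
    """Collect every long option flag mentioned in text."""
    return set(FLAG.findall(text))


def option_drift(help_text: str, docs_options_md: str) -> dict:
    """Compare the flags listed by --help with the flags documented in markdown."""
    in_help = flags_in(help_text)
    in_docs = flags_in(docs_options_md)
    return {
        "missing_from_docs": sorted(in_help - in_docs),
        "stale_in_docs": sorted(in_docs - in_help),
    }


# Hypothetical inputs, not the output of any real command.
help_text = """Usage: tool cmd [OPTIONS]

Options:
  --target TEXT  Where to write output.
  --dry-run      Do not change anything.
  --help         Show this message and exit.
"""
docs_options_md = """- `--target <path>` - where to write output.
- `--verbose` - print more details.
- `--help` - show help.
"""
drift = option_drift(help_text, docs_options_md)
print(drift)
```

The check only compares which flags exist; descriptions and wording stay entirely hand-written.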

@mike0sv
Contributor Author

mike0sv commented Sep 5, 2022

The check you are talking about is almost the same as what I propose. There are two ways to implement it: parse the existing options section, find which options are there and compare them with what --help has (extracting them from the typer (click) API is even easier), or generate this section from code and compare it with the existing text. The second approach allows us to reuse the same code and avoid rewriting all of this manually. If you are not happy with what was generated, you can always fix the text in the docstrings or the formatting in the generator code.
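
The second approach (generate, then compare) might look roughly like the sketch below. The option list is a plain hypothetical structure standing in for whatever would be extracted from the CLI definitions, and the bullet format is an assumption, not the format used on the mlem site.

```python
import difflib


def render_options(options: list) -> str:
    """Render one markdown bullet per option (the format is an assumption)."""
    lines = [f"- `{opt['flag']}`: {opt['help']}" for opt in options]
    return "\n".join(lines) + "\n"


def check_section(options: list, existing: str) -> str:
    """Return a unified diff; an empty string means the docs are up to date."""
    generated = render_options(options)
    diff = difflib.unified_diff(
        existing.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile="docs",
        tofile="generated",
    )
    return "".join(diff)


# Hypothetical options, as they might be extracted from the code.
options = [
    {"flag": "--target", "help": "Where to write output."},
    {"flag": "--help", "help": "Show this message and exit."},
]
existing = (
    "- `--target`: Where to write ouput.\n"
    "- `--help`: Show this message and exit.\n"
)
report = check_section(options, existing)
print(report or "docs are up to date")
```

The same `render_options` function can serve both the check and the regeneration step, which is the reuse argued for above.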

@mike0sv
Contributor Author

mike0sv commented Sep 5, 2022

> I'm not sure it's possible tbh, unless I'm missing something.

Mmm probably you are. I'm not talking about something like mlem cmd --help > cli-reference/cmd.md
Please take a look at #172

@shcheklein
Member

@mike0sv how do you envision the workflow for this, though? It should be a check that you run regularly anyway, and then either you generate the boilerplate automatically as a PR or fix it manually. If you don't automate this, then who is responsible for running it?

Anyway, my point is that, in my experience, this takes time to automate and time to maintain, and in the case of DVC it was not solving much. Most of the work goes into writing meaningful option descriptions (neither --help nor docstrings provide them).

> Mmm probably you are. I'm not talking about something like mlem cmd --help > cli-reference/cmd.md

I understand that it doesn't generate the whole md file; it generates some parts of it, right? (Not sure whether it keeps options that already exist.) And that's exactly what I was talking about: to my mind, it leads to bad docs.


For every PR in code that changes the API / CLI we should be creating a proper PR with a docs update. It should have examples and a proper description (--help doesn't give it). This process guarantees that we have meaningful docs. Automation can help to check for discrepancies (e.g. run by cron) or bootstrap it the first time (similar to #172).

@jorgeorpinel jorgeorpinel added the A: docs Area: user documentation (gatsby-theme-iterative) label Sep 6, 2022
@omesser omesser changed the title from "Semi-autogerated docs" to "Semi-autogenerated docs" Sep 13, 2022
@omesser
Contributor

omesser commented Sep 14, 2022

@shcheklein I understand the concern about docstrings contents restricting the doc site content 🙏 we discussed this offline as well. But it doesn't have to be this way imo. So it's very possible to achieve some automation here without any "new" workflow that would reduce quality.
The current alternative is that things become obsolete or are just plain dropped and forgotten, which I think is undoubtedly worse 😄

> And that's exactly what I was talking about- It drives bad docs to my mind.

I think it doesn't have to. The generated docs are definitely better than nothing, even if they are just a skeleton for more examples / fleshed-out content, which requires time and attention. So this isn't against that; it's about automating at least the repetitive content.

This is the way I see it, at least. So I do suggest we give this a try. The dvc docs are more stable and 99% of the effort goes to handcrafted content, but mlem is at a different stage and things are more dynamic, so this can potentially help guard us against drift between the docsite and the tool.

For this to be effective, I also think we want to automate this somehow: run it in a cronjob and generate a suggestion PR every week or so. It would be a good reminder, and even if it's not mergeable and we need a man-in-the-loop, it can provide the skeleton for the changes.

@shcheklein
Member

TL;DR: I'm fine to automate and try (but keep in mind we are spending time on this :) ).

> The generated docs are definitely better than nothing, even if they are just a skeleton for more examples / fleshed out content which requires time and attention. So it's not against that, but automating the repetitive content at least

Yes, but this is pretty much about bootstrapping? Once a project is more or less stable, I've found it hard to justify this level of automation (I mean making more and more sophisticated scripts to merge / embed, etc.). Everything can be done, but it has its own cost, while 99% of the time in docs goes into writing content. Manually creating a PR that just copy-pastes things when you change a command is not painful at all unless you change something every day (I doubt that will be happening).


A bit of reflection on my approach / my thoughts.

  • Personal perception. I'm quite annoyed when I come to docs and the only thing I see is a copy-paste of some existing content (which I already get in an IDE or in the CLI). My feeling is exactly "folks automated this and forgot about it since it's good enough", and usually that the creators don't try to make my life easier.
  • It creates a false feeling of completeness / existence of docs, and there will be less incentive to allocate time to improving them. I hope an alternative can be really lightweight (e.g. you do one option per week, one small document per week, etc.) and it can get us very far.
  • Writing is an essential and super important skill for every engineer.

@casperdcl

casperdcl commented Sep 15, 2022

I agree the automation might make more sense only for unique tools like CML, where most people don't download the tool to run --help locally. But even CML's online command ref goes a bit further than the CLI output: it has better markdown formatting, hyperlinks & URLs to more info, etc.

I'd only automate checking that all subcommands and --options exist in the command ref, but not checking the descriptions/wording.

@ryanjdillon

ryanjdillon commented Jul 3, 2023

As a potential new user, I find the Python API docs on mlem.ai difficult to work with, as they are not up to date and could benefit from further type hinting.

For example:
The docs in the code have corrected typos, which makes them more intelligible, and only by looking there could I find that fs is defined by fsspec and see which filesystems are supported.

mlem.api.save on mlem.ai
mlem.api.save in code

While I understand this requires some additional dev work, it may be worth the prioritization. In my case, I am evaluating using mlem/dvc/gto for a model registry, after which I'd like to evaluate Interactive Studio, but I need to get through the docs first ;)
