Automatically generate function documentation from comments / code #12432

Open
alamb opened this issue Sep 11, 2024 · 3 comments · May be fixed by #12668

@alamb
Contributor

alamb commented Sep 11, 2024

Is your feature request related to a problem or challenge?

When we add a new function to DataFusion's library, we have to remember to document it in the user guide, for example in https://datafusion.apache.org/user-guide/sql/scalar_functions.html

I observed this recently in #12429 (review). This likely means we have forgotten to document some functions or that the documentation has drifted over time.

This also means the help text for functions can only be found on the DataFusion website and not, for example, within the function itself.

It would be awesome if you could do something like this from SQL:

> DESCRIBE sqrt;

Returns the square root of a number.

sqrt(numeric_expression)

Arguments
* numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
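
As a rough illustration of the idea, the sketch below looks up help text by function name and renders it in the layout shown above. It is not DataFusion code: `FunctionHelp`, `example_registry`, and `describe` are hypothetical names invented for this example, and DataFusion does not currently expose such an API.

```rust
use std::collections::BTreeMap;

/// Hypothetical help record for one SQL function (not a DataFusion type).
struct FunctionHelp {
    description: &'static str,
    syntax: &'static str,
    /// (argument name, argument description) pairs.
    arguments: Vec<(&'static str, &'static str)>,
}

/// Builds a tiny in-memory registry keyed by function name.
fn example_registry() -> BTreeMap<&'static str, FunctionHelp> {
    let mut registry = BTreeMap::new();
    registry.insert(
        "sqrt",
        FunctionHelp {
            description: "Returns the square root of a number.",
            syntax: "sqrt(numeric_expression)",
            arguments: vec![("numeric_expression", "Numeric expression to operate on.")],
        },
    );
    registry
}

/// Renders the text a `DESCRIBE <function>` statement could print,
/// or returns `None` when the function is unknown.
fn describe(registry: &BTreeMap<&'static str, FunctionHelp>, name: &str) -> Option<String> {
    let help = registry.get(name)?;
    let mut out = format!("{}\n\n{}\n\nArguments\n", help.description, help.syntax);
    for (arg, doc) in &help.arguments {
        out.push_str(&format!("* {}: {}\n", arg, doc));
    }
    Some(out)
}
```

Calling `describe(&example_registry(), "sqrt")` yields the description, the syntax line, and one bullet per argument; an unknown name yields `None`.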

Describe the solution you'd like

I would like:

  1. The help text / description of a ScalarUDFImpl (and AggregateUDFImpl) to be available programmatically (see "add examples and description to scalar/aggregate functions?" #8366)
  2. The contents of the SQL reference guide to be automatically generated from the source code

DataFusion already does something like this for ConfigOptions.

For example, the comments in https://docs.rs/datafusion/latest/datafusion/config/struct.SqlParserOptions.html are automatically added to the documentation programmatically.

Describe alternatives you've considered

I suggest the following high-level approach:

  1. Add methods to the ScalarUDFImpl trait, as proposed by @universalmind303 in "add examples and description to scalar/aggregate functions?" #8366: ScalarUDFImpl::description and ScalarUDFImpl::sql_example
  2. Create a script and extraction program, similar to how it is done for ConfigOptions, to generate the SQL reference from those functions.
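
The two steps above could look roughly like the following minimal sketch. It uses a hypothetical `DocumentedFunction` trait as a stand-in for `ScalarUDFImpl`; the method names `description` and `sql_example` come from the proposal, but their signatures, the `Option` return types, the default implementations, and the `render_reference` helper are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for the proposed additions to `ScalarUDFImpl`.
trait DocumentedFunction {
    fn name(&self) -> &str;

    /// Defaults to `None` so existing implementations keep compiling.
    fn description(&self) -> Option<&str> {
        None
    }

    /// Defaults to `None` for the same reason.
    fn sql_example(&self) -> Option<&str> {
        None
    }
}

/// A function whose documentation has been moved into the code.
struct Sqrt;

impl DocumentedFunction for Sqrt {
    fn name(&self) -> &str {
        "sqrt"
    }
    fn description(&self) -> Option<&str> {
        Some("Returns the square root of a number.")
    }
    fn sql_example(&self) -> Option<&str> {
        Some("SELECT sqrt(4);")
    }
}

/// A function that has not been ported yet and relies on the defaults.
struct Undocumented;

impl DocumentedFunction for Undocumented {
    fn name(&self) -> &str {
        "mystery"
    }
}

/// Renders one reference section per function, in the order given.
fn render_reference(functions: &[&dyn DocumentedFunction]) -> String {
    let mut out = String::new();
    for f in functions {
        out.push_str(&format!("### {}\n\n", f.name()));
        out.push_str(f.description().unwrap_or("(no description yet)"));
        out.push_str("\n\n");
        if let Some(example) = f.sql_example() {
            out.push_str(&format!("Example: {}\n\n", example));
        }
    }
    out
}
```

Defaulting both methods to `None` is one way to let functions be ported one at a time: unported functions still appear on the generated page, with a placeholder instead of a description.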

In terms of implementation order, I would personally suggest breaking this project into smaller parts:

A first PR that does:

  1. Add the methods to the trait
  2. Move the documentation for one or two of the functions into the traits / code
  3. A script that creates some of the documentation (perhaps we could start by creating a new temporary page like https://datafusion.apache.org/user-guide/sql/scalar_functions_new.html that has only the auto generated documentation)

Then we can work in multiple PRs to port the remaining documentation over to the code (which will automatically result in the new page getting updated).

Finally, we can remove the old page once all functions are ported.

If we start working on this project, I can file follow-on tickets to track porting the remaining functions, doing the same thing for aggregate functions, etc.

Additional context

Similarly, GlareDB has a way to automatically annotate functions with documentation, and @universalmind303 proposed something similar in #8366.

Also, @findepi is considering implementing SHOW FUNCTIONS as part of #12144, which could likely take advantage of this documentation if it were present.

alamb added the "enhancement" (New feature or request) and "help wanted" (Extra attention is needed) labels on Sep 11, 2024
@Omega359
Contributor

I took a quick look at this and I believe we need to add a doc_category for each UDF to be able to slot it into the appropriate section in the documentation. For example, for scalar UDFs that could be math, conditional, string, etc.

The doc_category fn could return either a simple string or, more properly, an enum, with one enum for each type of UDF (scalar, aggregate, window, ...).

I don't like the fn name 'doc_category' but I couldn't come up with something better.
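
The enum variant of the category idea could be sketched as follows. This is only an assumption about how it might look: `ScalarDocCategory`, its variants, the section headings, and `group_by_category` are hypothetical names, not part of DataFusion.

```rust
use std::collections::BTreeMap;

/// Hypothetical category enum for scalar functions.
/// Declaration order doubles as the section order in the generated page.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ScalarDocCategory {
    Math,
    Conditional,
    String,
}

impl ScalarDocCategory {
    /// Section heading to use in the generated reference (illustrative text).
    fn heading(self) -> &'static str {
        match self {
            ScalarDocCategory::Math => "Math Functions",
            ScalarDocCategory::Conditional => "Conditional Functions",
            ScalarDocCategory::String => "String Functions",
        }
    }
}

/// Groups (function name, category) pairs into sections.
/// Sections follow the enum's declaration order; names are sorted within each section.
fn group_by_category(
    functions: &[(&'static str, ScalarDocCategory)],
) -> BTreeMap<ScalarDocCategory, Vec<&'static str>> {
    let mut sections: BTreeMap<ScalarDocCategory, Vec<&'static str>> = BTreeMap::new();
    for (name, category) in functions {
        sections.entry(*category).or_default().push(*name);
    }
    for names in sections.values_mut() {
        names.sort();
    }
    sections
}
```

Compared with a plain string, an enum makes a misspelled category a compile error, and the set of valid sections is visible in one place.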

@alamb
Contributor Author

alamb commented Sep 17, 2024

I don't like the fn name 'doc_category' but I couldn't come up with something better.

How about "doc_description_category" or "doc_type" 🤔

@Omega359
Contributor

take

Omega359 added a commit to Omega359/arrow-datafusion that referenced this issue Sep 28, 2024
Omega359 linked a pull request on Sep 28, 2024 that will close this issue