Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Generate docs from macros. #12822

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from
Draft

WIP: Generate docs from macros. #12822

wants to merge 11 commits into from

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Oct 8, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added common Related to common crate functions labels Oct 8, 2024
@comphead
Copy link
Contributor Author

comphead commented Oct 8, 2024

@Omega359 @alamb I tried to play with custom attributes to wrap up the documentation on top of the what @Omega359 already built. I'm experimenting with just 2 fields(description and examples) and the attribute has to be placed on top of the struct.

Cargo.toml Outdated
@@ -48,6 +48,8 @@ members = [
"datafusion-examples",
"test-utils",
"benchmarks",
"datafusion/macros",
"datafusion/pre-macros"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre macros needed to have Documentation structure already precreated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we go with this approach, maybe we can put the doc structure in a crate named specifically such as datafusion/documentation rather than something generic like pre-macros

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Named it doc-gen as documentation sort of misleading, users gonna think this is a crate user documentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proc-macros would also be a standard rust name for this kind of hting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to make it proc-macro as a crate with all potential procedural macros in DF. Afaik we dont have other custom procedural macros?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any custom procedural macros, but I was thinking we could / should probably vendor (copy) the recursive one to reduce our dependency chain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite following.... you mean copy stacker/recurisive macros introduced recently but this change not related to this PR?

@@ -37,6 +38,7 @@ use datafusion_expr::{
};
use datafusion_expr::{ScalarUDFImpl, Signature, Volatility};

#[udf_doc(description = "log_description", example = "log_example")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we put the documentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite cool ❤️

One thing I would be interested in seeing is if/how a multi line description and example would look. I have had trouble with multi-line macro / docs in the past

@@ -472,4 +471,16 @@ mod tests {
SortProperties::Unordered
);
}

#[test]
fn test_doc() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the test shows it applied

@github-actions github-actions bot removed the common Related to common crate label Oct 8, 2024
@comphead
Copy link
Contributor Author

comphead commented Oct 8, 2024

WDYT should we move in this direction?

@Omega359
Copy link
Contributor

Omega359 commented Oct 9, 2024

I find this intriguing and technically very interesting! I do have a few questions:

  • What are you goals with this? I am assuming cleaner looking/easier to create docs. If so I would wonder how it would help in cases were there is significant amounts of docs vs boilerplate such as in the to_date udf.
  • Can it be made to handle n arguments?
  • Is it possible to generate rustdoc with this approach?
  • I noticed that this approach creates a Documentation object on every call. It's obviously pretty minor but it would be nice if it could be static.

To me in general while I find this quite interesting I wonder if the benefit outweighs the added complexity of the code generation. To be clear I am absolutely not against this approach - I just have a few concerns.

@comphead
Copy link
Contributor Author

comphead commented Oct 9, 2024

I find this intriguing and technically very interesting! I do have a few questions:

  • What are you goals with this? I am assuming cleaner looking/easier to create docs. If so I would wonder how it would help in cases were there is significant amounts of docs vs boilerplate such as in the to_date udf.
  • Can it be made to handle n arguments?
  • Is it possible to generate rustdoc with this approach?
  • I noticed that this approach creates a Documentation object on every call. It's obviously pretty minor but it would be nice if it could be static.

To me in general while I find this quite interesting I wonder if the benefit outweighs the added complexity of the code generation. To be clear I am absolutely not against this approach - I just have a few concerns.

Thanks @Omega359
The idea is to reuse your current approach but make it on the level of custom Rust attribute instead of having the documentation directly in the code. So the steps can look like:

  • Move Documentation structure to pre-macros crate
  • generate the code from macros instead of having it statically
  • we can have it static everything will be the same as its now, the only difference is the code will be created through the macros.

I'm planning to play with to_date function as it covers also multiline comments as @alamb mentioned

@comphead comphead mentioned this pull request Oct 10, 2024
@comphead
Copy link
Contributor Author

comphead commented Oct 10, 2024

@alamb @Omega359 please have a look on real example to_date (I still need to include argements to be called with the builder). As you can see it is the same approach as before, the only difference the documentation is set by attributes rather than code, but it generates documentation builders as before.

Btw multiline seems to be working

// under the License.

#[derive(Debug, Clone)]
pub struct DocumentationTest {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied all Documentation structures here, added Test so as not to break everything

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great personally

As long as the macro has enough documentation itself, I think this would help reduce a bunch of the boiler plate (though perhaps not all)

Cargo.toml Outdated
@@ -48,6 +48,8 @@ members = [
"datafusion-examples",
"test-utils",
"benchmarks",
"datafusion/macros",
"datafusion/pre-macros"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we go with this approach, maybe we can put the doc structure in a crate named specifically such as datafusion/documentation rather than something generic like pre-macros

use std::any::Any;
use std::sync::OnceLock;

#[udf_doc(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very cool

@Omega359 what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dumped the example #[udf_doc ...] into my editor to do a direct comparison to the existing solution and to be honest it definitely is nicer to visually parse.

There are situations currently where this approach will not work, at least as it currently seems to be implemented. For example: bit_and_or_xor, some of the regex udfs, etc.

datafusion/macros/Cargo.toml Outdated Show resolved Hide resolved
@comphead
Copy link
Contributor Author

comphead commented Oct 11, 2024

Thanks @alamb I'll consider the changes, @Omega359 is it good to have a documentation method on struct level instead of impl?

@Omega359
Copy link
Contributor

@Omega359 is it good to have a documentation method on struct level instead of impl?

I can't think of a reason why not off the top of my head. I'm hoping to have time to look over your latest updates in this PR later tonight

use std::any::Any;
use std::sync::OnceLock;

#[udf_doc(
doc_section(include = "true", label = "Time and Date Functions"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry about this since it's almost certainly going to drift at some point in the future via typos, etc. Is it possible to reference the const's in scalar_doc_sections/window_doc_sections/aggregate_doc_sections ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if enum/constant can be used directly within the attribute, however there is a possibility to check the value in macro. So if the value is not satisfied to existing enum variants the macro can throw an exception with explaining what section names are supported

+---------------------------------------------------------------+\n\
```\n\n\
Additional examples can be found [here](https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/to_date.rs)",
standard_argument(name = "expression", expression_type = "String"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely better as expression_type -> prefix

let mut syntax_example: Option<LitStr> = None;
let mut sql_example: Option<LitStr> = None;
let mut standard_args: Vec<(Option<LitStr>, Option<LitStr>)> = vec![];
let mut udf_args: Vec<(Option<LitStr>, Option<LitStr>)> = vec![];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be missing related udfs here.

@comphead
Copy link
Contributor Author

Thanks @Omega359 there is bunch of things needed to be added, its not a final PR. I'll add some minor missing pieces and the PR will be ready for the review, most likely next week. Because I need to make a cleanup, remove test structures, polish crates and everything.

@comphead comphead marked this pull request as draft October 11, 2024 19:44
@comphead
Copy link
Contributor Author

I'll get back on it little later as need to finish some more urgent work with joins

@comphead
Copy link
Contributor Author

I'm closer to this

@comphead
Copy link
Contributor Author

We should also take care on handing things like #13367

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants