More examples in docstrings #321
We should improve docstrings, but in the meantime maybe this https://bkamins.github.io/julialang/2021/11/19/dfm.html would help you? |
Ah yes, these are good, thank you @bkamins. To be fair the referred I guess the docstring for
"""
    @rtransform(x, args...)

Row-wise version of @transform, i.e. all operations use @byrow by default. See @transform for details.

### Example
```jldoctest
julia> df = DataFrame(x=1:5, y=11:15)
5×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12
   3 │     3     13
   4 │     4     14
   5 │     5     15

julia> @rtransform(df, :a = 2 * :x, :b = :x * :y ^ 2)
5×4 DataFrame
 Row │ x      y      a      b
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1     11      2    121
   2 │     2     12      4    288
   3 │     3     13      6    507
   4 │     4     14      8    784
   5 │     5     15     10   1125
```
"""
|
I should definitely submit a PR. This is something I run into all the time, and it would be fantastic to get into the habit of making contributions that people in the data engineering (DE) community would enjoy. |
Looks good. Thank you! |
Yes, this would be appreciated. But to clarify the problem some more, did you do |
Even if |
@pdeffebach I did So, yes, I couldn't be bothered, but at least it was for work reasons. A little more about my workflow: I come from a SQL and Scala Spark background in my work, and a couple of years ago I decided to 1) expunge all usage of Excel and 2) incorporate Julia into my work. This is swimming against the tide in a big way, since colleagues and the industry are I've had success in 2021 incorporating Julia at my job, developing workflows in production with containerized environments. It's really a pleasure, especially the DataFrames syntax but also the package management. It comes with a lot of up-front cost, like this process of learning DataFramesMeta commands, but it is totally worth it in my view. It would have been so convenient to see an example right in the REPL docs, so I'll contribute by filling in those gaps where I hit them. |
Thanks for the background. Yes, makes sense. No use making people play Zork for docs. Please submit a PR adding examples! |
PR created! |
I don't understand the cause of the doctest failure; I'll read up on this. Probably a bit of missing syntax? |
I'm very happy to see the REPL examples in there. I think they are more effective than googling, reading doc pages, etc., because those divert attention. Now that I've spent some time figuring things out, I made personal notes of the following equivalent dataframe transformations. I needed column value assignments conditional on other values in the same row. This shows how convenient and readable DataFramesMeta can be:
```julia
using DataFrames, DataFramesMeta, Dates  # Dates provides Date

df = DataFrame(flag = [0, 1, 0, 1, 0, 1]
    , amt = [19.00, 11.00, 35.50, 32.50, 5.99, 5.99]
    , qty = [1, 4, 1, 3, 21, 109]
    , item = ["B001", "B001", "B020", "B020", "BX00", "BX00"]
    , day = Date.(["2021-01-01", "2021-01-01", "2112-12-12", "2020-10-20", "2021-05-04", "1984-07-04"])
    )
```
```
6×5 DataFrame
 Row │ flag   amt      qty    item    day
     │ Int64  Float64  Int64  String  Date
─────┼───────────────────────────────────────────
   1 │     0    19.0       1  B001    2021-01-01
   2 │     1    11.0       4  B001    2021-01-01
   3 │     0    35.5       1  B020    2112-12-12
   4 │     1    32.5       3  B020    2020-10-20
   5 │     0     5.99     21  BX00    2021-05-04
   6 │     1     5.99    109  BX00    1984-07-04
```
```julia
# DataFramesMeta: row-wise assignment syntax
@rtransform(df
    , :Tax = :flag * 0.11 * :amt
    , :Discount = :item == "B020" ? -0.25 * :amt : 0
    )

# DataFrames: pairs syntax with ByRow and anonymous functions
transform(df
    , [:flag, :amt] => ByRow((x,y) -> x * 0.11 * y) => :Tax
    , [:item, :amt] => ByRow((x,y) -> x == "B020" ? -0.25 * y : 0) => :Discount
    )

# DataFrames: pairs syntax with whole-column (broadcast) functions
transform(df
    , [:flag, :amt] => ((x,y) -> x * 0.11 .* y) => :Tax
    , [:item, :amt] => ((x,y) -> (x .== "B020") * -0.25 .* y ) => :Discount
    )
```
```
6×7 DataFrame
 Row │ flag   amt      qty    item    day         Tax      Discount
     │ Int64  Float64  Int64  String  Date        Float64  Float64
─────┼──────────────────────────────────────────────────────────────
   1 │     0    19.0       1  B001    2021-01-01   0.0       -0.0
   2 │     1    11.0       4  B001    2021-01-01   1.21      -0.0
   3 │     0    35.5       1  B020    2112-12-12   0.0       -8.875
   4 │     1    32.5       3  B020    2020-10-20   3.575     -8.125
   5 │     0     5.99     21  BX00    2021-05-04   0.0       -0.0
   6 │     1     5.99    109  BX00    1984-07-04   0.6589    -0.0
```
OK, I haven't figured out the broadcast operation with the ternary operator; however, the dfs pass the `==` test. I wonder if this example of comparative constructions would be useful in the DataFramesMeta documentation page? I really struggled to figure this out, but it looks so obvious now. |
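On the open question about broadcasting with the ternary operator: `cond ? a : b` needs a single `Bool` condition and has no dotted form, so it only works inside a row-wise function. One possible whole-column equivalent is to broadcast Base's `ifelse` function instead. The sketch below assumes a cut-down, hypothetical three-row version of the table above and uses `0.0` rather than `0` in the "else" branch so that both forms produce a `Float64` column.

```julia
using DataFrames

# Hypothetical cut-down version of the table from the comment above.
df = DataFrame(item = ["B001", "B020", "BX00"], amt = [19.0, 35.5, 5.99])

# Row-wise form: the ternary sees one scalar `x` and one scalar `y` per row.
by_row = transform(df,
    [:item, :amt] => ByRow((x, y) -> x == "B020" ? -0.25 * y : 0.0) => :Discount)

# Whole-column form: `ifelse.` broadcasts over the condition and both branches.
by_col = transform(df,
    [:item, :amt] => ((x, y) -> ifelse.(x .== "B020", -0.25 .* y, 0.0)) => :Discount)

by_row == by_col
```

Note that `ifelse` evaluates both branches for every element, unlike the ternary, so it suits cheap, side-effect-free expressions like these.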
This is mentioned in certain places. Check out the first code block here. A PR on this section would be welcomed. I don't want to make the translations too prominent at the beginning because I don't want new users to get too intimidated. My ideal user is probably a first year masters student in the social sciences who is programming for the first time. It would be great to work on a PR for this in detail, but with those constraints in mind. Additionally, remember
|
This is great. The more time I spend in Julia the better I like it. I think what the DataFramesMeta docs pages are missing is a simple front page that shows clear examples of how easy the syntax is to formulate for common tasks. Also a clear message about the mission of the package. The difficulty from the new user's perspective:
Here's the first sentence of the Introduction on the repo README:
As a non-expert user, especially not knowing much about meta-programming, this already looks too advanced for me. I recommend something more immediately obvious by saying something like:
```julia
df = DataFrame(x=1:5, y=11:15)
# DataFramesMeta syntax via assignment
@rtransform(df, :y = :x == 1 ? true : false)
# DataFrames typical pairs selector syntax with the ByRow() helper and anonymous function
transform(df, :x => ByRow(x -> x == 1 ? true : false) => :y)
```
This would make it easy to see what the purpose of this package is, I think. |
Thanks! I'll take a look later today. |
It would be very helpful if each macro/command had an example in the docstring.
For example, I've been having a lot of trouble using `@rtransform` to make a new column based on conditional aspects of other columns at the row level. The docstring in the REPL is:
I didn't find the help I needed here; a good example would keep me moving along with my work.
I'm not a computer scientist, and it's time-intensive to unravel how these excellent tools are implemented. I've invested time in applying Julia in my work instead of the Python ecosystem because of its great qualities; however, the most consistent hurdle for me is the lack of examples.
Perhaps the usage is obvious to package developers, but for more pedestrian types like me nothing could be more illuminating than a good example.