-
-
Notifications
You must be signed in to change notification settings - Fork 878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 'rows affected' output when using sql engine #2051
base: master
Are you sure you want to change the base?
Conversation
also return data ("scalar numeric that specifies the number of rows affected") in case of an is_sql_update_query, and not just SELECT-like queries
and print by default a message with number of rows affected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a quick look. This looks fine overall but my main concern is about knitr writing text output. It is not the same to write a resulting table than writing text including a data result. I believe we must not presuppose what a user would like to have written as body text in its output document. Also, it seems the output text is not really relevant. Let's look into that.
First, I am all for saving data
for the result of dbExecute()
so that it can be saved in output_var
and then used later.
Regarding printing, I think it would be ok to have like a informative message()
printed, written in english and formatted as we want, but that would output like code result by default. knitr already give such messages in several places. However, it is not the same to return as body text (using results = 'asis'
), all the more a formatted text in English with no way to customize. See my inline comments for why.
If we want to stay with the idea of outputting body text (results = 'asis')
, we could offer configuration so that it can be customized. This is done already for the table when no sql.print
function is provided. options$tab.cap
can be specified to customize the table caption. Otherwise a default caption is shown.
See line 611 in the engine.R file for the SQL engine.
However, a table caption is not the same a text in a report. Personally, I would prefer we output a log-like message, like a message.
If we think customization, a user could save the result in output_var
, and format itself the text he would like to show, as he would do with any other chunk output I believe, using the variable created and inline r chunks. So is this important to print a formatted output asis ?
Also, regarding the textual output, I am thinking about your comment
it seems SQLite does not actually return the number of rows affected (apparently always returns 0). Yet, this is pretty standard in most RDBMS and it is also the default in DBI::dbExecute, so I guess this is just a quirk specific to SQLite.
If we are not sure that all RDBMS outputs a number of rows, I guess it would be puzzling to have written in your report
Number of affected rows: 0
after a update query... 🤔
From your RStudio projects (https://rstudio.cloud/project/2931829) (thanks for this by the way, that is awesome!), it seems that not all requests are returning a number of rows. For example,
DROP TABLE IF EXISTS toy;
after having created and computed on toy
also returns 0 which writes in the output as Markdown text that 0 zeros are affected. It could be puzzling to users.
I have published your HTML document into R Pubs to easily see it: https://rpubs.com/cderv/knitr-pr-2051
So overall, about the SQL engine, my thinking is :
- We should stay clause to what the DB specialized tools are doing.
DBI::DBexecute()
returns a value, and it seems fair that we allow it to be accessible to the userDBI::DBexecute()
does not print log message, and just return value. So we would in knitr do more than the tools offers. However, it makes sense to a publishing to tool to format some output (like we do with tables). But we must do this properly without assuming too much on what a user would like.
After this long review, does anyone of you two @edalfon or @yihui are sharing my concerns ?
@@ -1,3 +1,7 @@ | |||
- The `sql` engine now produces an output for update-like SQL queries, indicating the number of rows affected (as returned by DBI::dbExecute). So far, `knitr` produces a nicely formatted output for queries returning a result set (typically SELECT...), but gives no feedback at all for queries making changes to the DB (e.g. INSERT|UPDATE|DELETE|CREATE|DROP; see not exported function knitr:::is_sql_update_query). For such queries, `knitr` now shows the number of affected rows. You can also set the chunk option `output.var` to assign the number of affected rows to a variable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so you know, this should be under the last header 1 you see.
1.35 is the next version so this should be moved under.
I'll do it before merging.
if (is.numeric(data) && length(data) == 1 && is.null(varname)) { | ||
options$results = 'asis' | ||
# format = "fg" instead of "d". Row counts on DB may be greater than integer max value | ||
output = paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ',')) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To follow the style of the previous output assignment, and making it more clear about the different output possible, I would use a else
here for the previous if
if (is.numeric(data) && length(data) == 1 && is.null(varname)) { | |
options$results = 'asis' | |
# format = "fg" instead of "d". Row counts on DB may be greater than integer max value | |
output = paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ',')) | |
} | |
else if (is.numeric(data) && length(data) == 1 && is.null(varname)) { | |
options$results = 'asis' | |
# format = "fg" instead of "d". Row counts on DB may be greater than integer max value | |
paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ',')) | |
} |
@@ -633,6 +633,11 @@ eng_sql = function(options) { | |||
|
|||
} else print(display_data) # fallback to standard print | |||
}) | |||
if (is.numeric(data) && length(data) == 1 && is.null(varname)) { | |||
options$results = 'asis' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know if we want to force results = 'asis'
here. The user would have not control over it.
I understand why it makes sens to have it as markdown text in the output document, but it may not please everyone who would prefer to have it code output.
I don't really know but usually users have control over how the output is show. 🤔
@yihui what do you think ?
if (is.numeric(data) && length(data) == 1 && is.null(varname)) { | ||
options$results = 'asis' | ||
# format = "fg" instead of "d". Row counts on DB may be greater than integer max value | ||
output = paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ',')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to previous comment, just wondering about internationalization:
- We are enforcing a english sentence to be output in a report. This is not perfect for non English report. With
result = 'asis'
, this sentence would be shown as usual text. If I am writing a French report, that would be weird to have such English output in the middle of other text, wouldn't it ?
That is why at least, not enforcing asis output seems better to me - Also, about
big.mark = ","
, I am just wondering if this is necessary. It is not something usual worldwide, and could even be confusing for some international user. For example, we don't have that in Europe, we often have no big mark or a space. A comma is used as decimal too. However, the context of number of rows makes12,500
clear enough for a european even if it can be confusing. We would use12 500
or just12500
. Is it something common for such update query output ?
Maybe we could / should make at least all that configurable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree!
What I did was trying to keep it similar to how the caption is formatted (line 614 on the same file)
rows_formatted = formatC(rows, format = "d", big.mark = ',')
but you are right that in that case, the user has options to configure the output
Many thanks for reviewing this. I agree with pretty much everything you said. In particular, that saving And the reason I feel less strongly about those concerns is that I think the most common use cases for this output are:
But I did not think such an output would be mainly used in -let's say- a public facing document (e.g. final report or the like). In that case, as you said, the users can set But for the two use cases above, I thought a quick default without configuring anything else would be useful (the user simply keep using a I totally share your concerns. It's not cool forcing |
a note on this
My bad here!!! SQLite do also return the number of rows affected. I wrote that after a couple of failed attempts, but I cannot really reproduce them now (perhaps I just looked at the results of DROP TABLE, which are in fact usually 0). Having said that, |
Adds a default output for
sql
chunks whereis_sql_update_query
evaluates to TRUE, as proposed in #2050.It does it simply by:
DBI::DBI::dbExecute
's result todata
, so that users can setoutput.var
and get the number of affected rows assigned to a variable.I included a line in
NEWS.md
describing this. However, I did not include a test. I was not familiar withknitr
's testing procedures, but I learned from this recent PR (#1987) that there is one major test forknitr
's sql engine in another repository (https://raw.githubusercontent.com/yihui/knitr-examples/master/115-engine-sql.Rmd). This already covers the changes in this PR. I used it to test the changes and everything works. So I didn't really think it was necessary to include additional tests.A final note: as discussed in knitr's issue #1637, I think a similar change would be needed on RStudio's side, to make it also work when running chunks interactively (RStudio's Run current chunk, Run all, etc.). I haven't looked into that, though, and I have no idea who to poke for this, but I'd be happy to take a look.
Huge thanks for
knitr
and hope this helps a tiny bit.