Inconsistent code formatting / minor fixes to vignettes #6521

Merged
2 changes: 1 addition & 1 deletion vignettes/datatable-faq.Rmd
@@ -632,7 +632,7 @@ Yes, for both 32-bit and 64-bit on all platforms. Thanks to CRAN. There are no s
## I think it's great. What can I do?
Please file suggestions, bug reports and enhancement requests on our [issues tracker](https://github.com/Rdatatable/data.table/issues). This helps make the package better.

Please do star the package on [GitHub](https://github.com/Rdatatable/data.table/wiki). This helps encourage the developers and helps other R users find the package.
Please do star the package on [GitHub](https://github.com/Rdatatable/data.table). This helps encourage the developers and helps other R users find the package.

You can submit pull requests to change the code and/or documentation yourself; see our [Contribution Guidelines](https://github.com/Rdatatable/data.table/blob/master/.github/CONTRIBUTING.md).

10 changes: 5 additions & 5 deletions vignettes/datatable-importing.Rmd
@@ -15,7 +15,7 @@ h2 {
}
</style>

This document is focused on using `data.table` as a dependency in other R packages. If you are interested in using `data.table` C code from a non-R application, or in calling its C functions directly, jump to the [last section](#non-r-API) of this vignette.
This document is focused on using `data.table` as a dependency in other R packages. If you are interested in using `data.table` C code from a non-R application, or in calling its C functions directly, jump to the [last section](#non-r-api) of this vignette.

Importing `data.table` is no different from importing other R packages. This vignette is meant to answer the most common questions arising around that subject; the lessons presented here can be applied to other R packages.

@@ -27,11 +27,11 @@ One of the biggest features of `data.table` is its concise syntax which makes ex

It is very easy to use `data.table` as a dependency due to the fact that `data.table` does not have any of its own dependencies. This applies both to operating system and to R dependencies. It means that if you have R installed on your machine, it already has everything needed to install `data.table`. It also means that adding `data.table` as a dependency of your package will not result in a chain of other recursive dependencies to install, making it very convenient for offline installation.

## `DESCRIPTION` file {DESCRIPTION}
## `DESCRIPTION` file {#DESCRIPTION}

The first place to define a dependency in a package is the `DESCRIPTION` file. Most commonly, you will need to add `data.table` under the `Imports:` field. Doing so will necessitate an installation of `data.table` before your package can compile/install. As mentioned above, no other packages will be installed because `data.table` does not have any dependencies of its own. You can also specify the minimal required version of a dependency; for example, if your package is using the `fwrite` function, which was introduced in `data.table` in version 1.9.8, you should incorporate this as `Imports: data.table (>= 1.9.8)`. This way you can ensure that the version of `data.table` installed is 1.9.8 or later before your users will be able to install your package. Besides the `Imports:` field, you can also use `Depends: data.table` but we strongly discourage this approach (and may disallow it in future) because this loads `data.table` into your user's workspace; i.e. it enables `data.table` functionality in your user's scripts without them requesting that. `Imports:` is the proper way to use `data.table` within your package without inflicting `data.table` on your user. In fact, we hope the `Depends:` field is eventually deprecated in R since this is true for all packages.

## `NAMESPACE` file {NAMESPACE}
## `NAMESPACE` file {#NAMESPACE}

The next thing is to define what content of `data.table` your package is using. This needs to be done in the `NAMESPACE` file. Most commonly, package authors will want to use `import(data.table)` which will import all exported (i.e., listed in `data.table`'s own `NAMESPACE` file) functions from `data.table`.
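
As a rough sketch of the two import styles, the `NAMESPACE` of a hypothetical package might contain one of the directives below. The selective form and the choice of `fread` and `fwrite` are only an illustration; any function exported by `data.table` could be listed instead.

```r
# NAMESPACE of a hypothetical package (illustrative only)

# Option 1: import every function exported by data.table
import(data.table)

# Option 2: import only selected functions (fread/fwrite chosen as examples)
# importFrom(data.table, fread, fwrite)
```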

@@ -195,7 +195,7 @@ For more canonical documentation of defining packages dependency check the offic

Some internally used C routines are now exported at the C level and can therefore be used in R packages directly from their C code. See [`?cdt`](https://rdatatable.gitlab.io/data.table/reference/cdt.html) for details and the _Linking to native routines in other packages_ section of [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) for usage.

## Importing from non-R Applications {non-r-api}
## Importing from non-R Applications {#non-r-api}

Some tiny parts of `data.table` C code were isolated from the R C API and can now be used from non-R applications by linking to .so / .dll files. More concrete details about this will be provided later; for now you can study the C code that was isolated from the R C API in [src/fread.c](https://github.com/Rdatatable/data.table/blob/master/src/fread.c) and [src/fwrite.c](https://github.com/Rdatatable/data.table/blob/master/src/fwrite.c).

@@ -275,7 +275,7 @@ result <- merge(dt, other_dt, by = "x")
```

### Benefits of using `Imports`
- **User-Friendliness*: `Depends` alters your users' `search()` path, possibly without their wanting to do so.
- **User-Friendliness**: `Depends` alters your users' `search()` path, possibly without their wanting to do so.
- **Namespace Management**: Only the functions your package explicitly imports are available, reducing the risk of function name clashes.
- **Cleaner Package Loading**: Your package's dependencies are not attached to the search path, making the loading process cleaner and potentially faster.
- **Easier Maintenance**: It simplifies maintenance tasks as upstream dependencies' APIs evolve. Depending too much on `Depends` can lead to conflicts and compatibility issues over time.
2 changes: 1 addition & 1 deletion vignettes/datatable-intro.Rmd
@@ -665,7 +665,7 @@ We can do much more in `i` by keying a `data.table`, which allows for blazing fa

3. Compute on columns: `DT[, .(sum(colA), mean(colB))]`.

4. Provide names if necessary: `DT[, .(sA =sum(colA), mB = mean(colB))]`.
4. Provide names if necessary: `DT[, .(sA = sum(colA), mB = mean(colB))]`.

5. Combine with `i`: `DT[colA > value, sum(colB)]`.
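
The forms in the list above can be tried on a small table. This is a minimal sketch: the data and the threshold `2` are made up, and `colA`/`colB` simply mirror the placeholder names used in the list.

```r
library(data.table)

# Made-up data; colA and colB mirror the placeholder names in the list
DT <- data.table(colA = 1:4, colB = c(2, 4, 6, 8))

# Compute on columns; unnamed results get default names V1 and V2
unnamed <- DT[, .(sum(colA), mean(colB))]

# Provide names for the computed columns
named <- DT[, .(sA = sum(colA), mB = mean(colB))]

# Combine with `i`: restrict to rows where colA > 2, then sum colB
subset_sum <- DT[colA > 2, sum(colB)]
```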

4 changes: 2 additions & 2 deletions vignettes/datatable-reference-semantics.Rmd
@@ -68,7 +68,7 @@ DF$c <- 18:13 # (1) -- replace entire column
DF$c[DF$ID == "b"] <- 15:13 # (2) -- subassign in column 'c'
```

both (1) and (2) resulted in deep copy of the entire data.frame in versions of `R` versions `< 3.1`. [It copied more than once](https://stackoverflow.com/q/23898969/559784). To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](https://stackoverflow.com/q/7033106/559784).
both (1) and (2) resulted in deep copy of the entire data.frame in versions of `R < 3.1`. [It copied more than once](https://stackoverflow.com/q/23898969/559784). To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](https://stackoverflow.com/q/7033106/559784).

Great performance improvements were made in `R v3.1`, as a result of which only a *shallow* copy is made for (1) rather than a *deep* copy. For (2), however, the entire column is still *deep* copied, even in `R v3.1+`. This means the more columns one subassigns to in the *same query*, the more *deep* copies R does.
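
For contrast, here is a minimal sketch of the same kind of sub-assignment done by reference with `:=`. The small table is made up for illustration, and `address()` (exported by `data.table`) is used only to show that the `data.table` object itself is not reallocated.

```r
library(data.table)

# Made-up table for illustration
DT <- data.table(ID = c("b", "b", "b", "a", "a", "c"),
                 a = 1:6, b = 7:12, c = 13:18)

# Record where the object lives before modifying it
addr_before <- address(DT)

# Sub-assign by reference: only the rows where ID == "b" in column `c` change
DT[ID == "b", c := 15:13]

# The object is the same one as before the update
same_object <- identical(address(DT), addr_before)
```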

@@ -247,7 +247,7 @@ head(flights)

* We use the `LHS := RHS` form. We store the input column names and the new columns to add in separate variables and provide them to `.SDcols` and for `LHS` (for better readability).

* Note that since we allow assignment by reference without quoting column names when there is only one column as explained in [Section 2c](#delete-convenience), we can not do `out_cols := lapply(.SD, max)`. That would result in adding one new column named `out_col`. Instead we should do either `c(out_cols)` or simply `(out_cols)`. Wrapping the variable name with `(` is enough to differentiate between the two cases.
* Note that since we allow assignment by reference without quoting column names when there is only one column as explained in [Section 2c](#delete-convenience), we can not do `out_cols := lapply(.SD, max)`. That would result in adding one new column named `out_cols`. Instead we should do either `c(out_cols)` or simply `(out_cols)`. Wrapping the variable name with `(` is enough to differentiate between the two cases.

* The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the *"Introduction to data.table"* vignette. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group.
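
The points above can be sketched on a toy table. The data here is made up and merely stands in for `flights`; the names `in_cols` and `out_cols` follow the bullets, while the new column names `max_dep_delay` and `max_arr_delay` are assumptions chosen for this example.

```r
library(data.table)

# Toy data standing in for `flights` (values are made up)
DT <- data.table(
  month     = c(1L, 1L, 2L, 2L),
  dep_delay = c(5, 20, -3, 12),
  arr_delay = c(7, 15, -10, 30)
)

in_cols  <- c("dep_delay", "arr_delay")          # columns to compute on
out_cols <- c("max_dep_delay", "max_arr_delay")  # names of the new columns

# Wrapping `out_cols` in parentheses makes data.table treat it as a variable
# holding column names rather than as a single column called "out_cols"
DT[, (out_cols) := lapply(.SD, max), by = month, .SDcols = in_cols]
```

Writing `c(out_cols)` on the left-hand side instead of `(out_cols)` is the other form the text mentions.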

4 changes: 2 additions & 2 deletions vignettes/datatable-reshape.Rmd
@@ -157,7 +157,7 @@ DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed = TRUE)]
DT.c1 = dcast(DT.m1, family_id + age_mother + child ~ variable, value.var = "value")
DT.c1

str(DT.c1) ## gender column is character type now!
str(DT.c1) ## gender column is class IDate now!
```

#### Issues
@@ -241,7 +241,7 @@ melt(two.iris, measure.vars = measure(value.name, dim, sep="."))
```

Using the code above we get one value column per flower part. If we
instead want a value column for each measurement dimension, we can do
instead want a value column for each measurement dimension, we can do:

```{r}
melt(two.iris, measure.vars = measure(part, value.name, sep="."))