Merge pull request #60 from JonasMoss/JOSSrevision2

Joss revision 2
JonasMoss · Aug 7, 2019 · 76bcb8d
2 parents 825d477 + 5747aa0
Showing 7 changed files with 55 additions and 43 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -16,15 +16,17 @@ Description: Handles univariate non-parametric density estimation with
     copula kernel of Jones & Henderson (2007) <doi:10.1093/biomet/asm068>.
     User-supplied kernels, parametric starts, and bandwidths are supported.
 License: MIT + file LICENSE
+URL: https://github.com/JonasMoss/kdensity
+BugReports: https://github.com/JonasMoss/kdensity/issues
 Encoding: UTF-8
 LazyData: true
 Suggests: extraDistr,
     SkewHyperbolic,
     testthat,
     covr,
-    EQL,
     knitr,
     rmarkdown
-Imports: assertthat
+Imports: assertthat,
+    EQL
 RoxygenNote: 6.1.1
 VignetteBuilder: knitr
diff --git a/R/builtin_bandwidths.R b/R/builtin_bandwidths.R
@@ -69,8 +69,6 @@ bw_environment$JH = function(x, kernel = NULL, start = NULL, support = NULL) {
 }
 
 bw_environment$RHE = function(x, kernel = NULL, start = NULL, support = NULL) {
-  assertthat::assert_that("EQL" %in% rownames(utils::installed.packages()), msg =
-                            "The bandwidth function 'RHE' requires the package 'EQL' to work.")
 
   max_degree = 5  # The maximum degree of the Hermite polynomials.
   n <- length(x)

diff --git a/R/kdensity.R b/R/kdensity.R
@@ -122,10 +122,10 @@ kdensity = function(x, bw = NULL, adjust = 1, kernel = NULL, start = NULL,
   data.name = deparse(substitute(x))
   has.na = anyNA(x)
 
-  if(has.na) {
-    if(!na.rm) stop("x contains NAs and na.rm = FALSE.")
-    x = x[!is.na(x)]
-  }
+  assertthat::assert_that(!has.na || na.rm,
+                          msg = "x contains NAs and na.rm = FALSE.")
+
+  x = x[!is.na(x)] # Reached only if x has no NAs or na.rm = TRUE.
 
   ## 'kernel', 'start' and 'bw' can be custom made: In this case, they must
   ## be added to their environments.
@@ -149,7 +149,7 @@ kdensity = function(x, bw = NULL, adjust = 1, kernel = NULL, start = NULL,
   ## The case of bw == Inf is special! In this case, we return the parametric
   ## start itself.
 
-  ## Now we massage and handle the combinations of kernel, start and support.
+  ## Now we handle the combinations of kernel, start and support.
   ## This is fancy defaults management.
   kss_list = get_kernel_start_support(kernel, start, support)
 

diff --git a/README.Rmd b/README.Rmd
@@ -32,10 +32,12 @@ knitr::opts_chunk$set(out.width='750px', dpi=200)
 ## Overview
 kdensity is an implementation of univariate kernel density estimation with support for parametric starts and asymmetric kernels. Its main function is `kdensity`, which is has approximately the same syntax as `stats::density`. Its new functionality is:
 
-* `kdensity` has built-in support for many *parametric starts*, such as `normal` and `gamma`, but you can also supply your own. 
-*  It supports several asymmetric kernels ones such as `gcopula` and `gamma` kernels, but also the common symmetric ones. In addition, you can also supply your own kernels. 
-* A selection of choices for the bandwidth function `bw`, again including an option to specify your own.
-* The returned value is callable: The density estimator returns a density function when called.
+* `kdensity` has built-in support for many *parametric starts*, such as `normal`
+  and `gamma`, but you can also supply your own.
+* It supports several asymmetric kernels, such as the `gcopula` and `gamma` kernels, but also the common symmetric ones. In addition, you can also supply your own kernels.
+* A selection of choices for the bandwidth function `bw`, again including an option to specify your own.
+* The returned value is a density function. This can be used for e.g. numerical
+  integration, numerical differentiation, and point evaluations.
 
 A reason to use `kdensity` is to avoid *boundary bias* when estimating densities on the unit interval or the positive half-line. Asymmetric kernels such as `gamma` and `gcopula` are designed for this purpose. The support for parametric starts allows you to easily use a method that is often superior to ordinary kernel density estimation.
 
@@ -61,15 +63,17 @@ install.packages("kdensity")
 devtools::install_github("JonasMoss/kdensity")
 ```
 
-Call the `library` function and use it just like `stats:density`, but with optional additional arguments.
+## Usage Example
+Call the `library` function and use it just like `stats::density`, but with optional additional arguments.
 ```{r simpleuse, echo = TRUE, eval = FALSE}
 library("kdensity")
 plot(kdensity(mtcars$mpg, start = "normal"))
 ```
 
 ## Description
 
-Kernel density estimation with a *parametric start* was introduced by Hjort and Glad in [Nonparametric Density Estimation with a Parametric Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea is to start out with a parametric density before you do your kernel density estimation, so that your actual kernel density estimation will be a correction to the original parametric estimate. This is a good idea because the resulting estimator will be better than an ordinary kernel density estimator whenever the true density is close to your suggestion; and the estimator can be superior to the ordinary kernel density estimator even when the suggestion is pretty far off.
+Kernel density estimation with a *parametric start* was introduced by Hjort and Glad in [Nonparametric Density Estimation with a Parametric Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea is to start out with a parametric density before you do your kernel density estimation, so that your actual kernel density estimation will be a correction to the original parametric estimate. The resulting estimator will outperform the ordinary kernel density estimator in terms of asymptotic 
+integrated mean squared error whenever the true density is close to your suggestion; and the estimator can be superior to the ordinary kernel density estimator even when the suggestion is pretty far off.
 
 In addition to parametric starts, the package implements some *asymmetric kernels*. These kernels are useful when modelling data with sharp boundaries, such as data supported on the positive half-line or the unit interval. Currently we support the following asymmetric kernels:
 
@@ -85,8 +89,8 @@ These features can be combined to make asymmetric kernel densities estimators wi
 
 The function `kdensity` takes some `data`, a kernel `kernel` and a parametric start `start`. You can optionally specify the `support` parameter, which is used to find the normalizing constant.
 
-The following example uses the \code{airquality} data set plots both a gamma-kernel density estimate with a gamma start (black), the fully parametric gamma density (red), 
-and an ordinary `density` estimate (blue). Notice the boundary bias of the ordinary 
+The following example uses the `datasets::airquality` data set. The black curve is a gamma-kernel density estimate with a gamma start, the red curve a fully parametric gamma density,
+and the blue curve an ordinary `density` estimate. Notice the boundary bias of the ordinary 
 `density` estimator. The underlying parameter estimates are always maximum likelihood.
 
 ```{r example, echo = TRUE}
@@ -98,10 +102,13 @@ lines(density(airquality$Wind, adjust = 2), col = "blue")
 rug(airquality$Wind)
 ```
 
-Since the return value of `kdensity` is a function, it is callable, as in:
+Since the return value of `kdensity` is a function, `kde` is callable and can be
+used like any density function in `R` (such as `stats::dnorm`). For example, you can
+do:
 
 ```{r callable, echo = TRUE}
 kde(10)
+integrate(kde, lower = 0, upper = 1) # The cumulative distribution up to 1.
 ```
 
 You can access the parameter estimates by using `coef`. You can also access the log likelihood (`logLik`), AIC and BIC of the parametric start distribution.
@@ -112,10 +119,8 @@ logLik(kde)
 AIC(kde)
 ```
 ## How to Contribute or Get Help
-If you encounter a bug, have a feature request or need some help, don't hesitate
-to open an [issue](https://github.com/JonasMoss/kdensity/issues). If you want to
-contribute, make a pull request. This project follows a 
-[Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct.md).
+If you encounter a bug, have a feature request or need some help, open a [GitHub issue](https://github.com/JonasMoss/kdensity/issues). Create a pull request
+to contribute. This project follows a [Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct.md).
 
 ## References
 

diff --git a/README.md b/README.md
@@ -32,8 +32,9 @@ function is `kdensity`, which is has approximately the same syntax as
     you can also supply your own kernels.
   - A selection of choices for the bandwidth function `bw`, again
     including an option to specify your own.
-  - The returned value is callable: The density estimator returns a
-    density function when called.
+  - The returned value is a density function. This can be used for
+    e.g. numerical integration, numerical differentiation, and point
+    evaluations.
 
 A reason to use `kdensity` is to avoid *boundary bias* when estimating
 densities on the unit interval or the positive half-line. Asymmetric
@@ -69,7 +70,9 @@ install.packages("kdensity")
 devtools::install_github("JonasMoss/kdensity")
 ```
 
-Call the `library` function and use it just like `stats:density`, but
+## Usage Example
+
+Call the `library` function and use it just like `stats::density`, but
 with optional additional arguments.
 
 ``` r
@@ -84,11 +87,12 @@ Hjort and Glad in [Nonparametric Density Estimation with a Parametric
 Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea
 is to start out with a parametric density before you do your kernel
 density estimation, so that your actual kernel density estimation will
-be a correction to the original parametric estimate. This is a good idea
-because the resulting estimator will be better than an ordinary kernel
-density estimator whenever the true density is close to your suggestion;
-and the estimator can be superior to the ordinary kernel density
-estimator even when the suggestion is pretty far off.
+be a correction to the original parametric estimate. The resulting
+estimator will outperform the ordinary kernel density estimator in terms
+of asymptotic integrated mean squared error whenever the true density is
+close to your suggestion; and the estimator can be superior to the
+ordinary kernel density estimator even when the suggestion is pretty far
+off.
 
 In addition to parametric starts, the package implements some
 *asymmetric kernels*. These kernels are useful when modelling data with
@@ -126,11 +130,11 @@ The function `kdensity` takes some `data`, a kernel `kernel` and a
 parametric start `start`. You can optionally specify the `support`
 parameter, which is used to find the normalizing constant.
 
-The following example uses the  data set plots both a gamma-kernel
-density estimate with a gamma start (black), the fully parametric gamma
-density (red), and an ordinary `density` estimate (blue). Notice the
-boundary bias of the ordinary `density` estimator. The underlying
-parameter estimates are always maximum likelilood.
+The following example uses the `datasets::airquality` data set. The black curve is a
+gamma-kernel density estimate with a gamma start, the red curve a fully
+parametric gamma density, and the blue curve an ordinary `density`
+estimate. Notice the boundary bias of the ordinary `density` estimator.
+The underlying parameter estimates are always maximum likelihood.
 
 ``` r
 library("kdensity")
@@ -143,12 +147,15 @@ rug(airquality$Wind)
 
 <img src="man/figures/README-example-1.png" width="750px" />
 
-Since the return value of `kdensity` is a function, it is callable, as
-in:
+Since the return value of `kdensity` is a function, `kde` is callable
+and can be used like any density function in `R` (such as `stats::dnorm`).
+For example, you can do:
 
 ``` r
 kde(10)
 #> [1] 0.09980471
+integrate(kde, lower = 0, upper = 1) # The cumulative distribution up to 1.
+#> 1.27532e-05 with absolute error < 2.2e-19
 ```
 
 You can access the parameter estimates by using `coef`. You can also
@@ -167,11 +174,9 @@ AIC(kde)
 
 ## How to Contribute or Get Help
 
-If you encounter a bug, have a feature request or need some help, don’t
-hesitate to open an
-[issue](https://github.com/JonasMoss/kdensity/issues). If you want to
-contribute, make a pull request. This project follows a [Contributor
-Code of
+If you encounter a bug, have a feature request or need some help, open a
+[GitHub issue](https://github.com/JonasMoss/kdensity/issues). Create a
+pull request to contribute. This project follows a [Contributor Code of
 Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct.md).
 
 ## References

diff --git a/tests/testthat/test_kdensity.R b/tests/testthat/test_kdensity.R
@@ -39,3 +39,5 @@ expect_error(kdensity(precip, bw = Inf))
 expect_equal(kdensity(precip, bw = Inf, start = "normal")(10), dnorm(10, mean = mean(precip), sd = sd(precip)))
 expect_equal(kdensity(precip, bw = 1)(10), kdensity(precip, bw = silly_width)(10))
 expect_error(kdensity(precip)())
+expect_error(kdensity(precip, start = "gumbel", kernel = "rectangular",
+                      bw = "ucv"))
diff --git a/vignettes/tutorial.Rmd b/vignettes/tutorial.Rmd
@@ -135,7 +135,7 @@ rug(LH)
 ```
 
 Since all the curves are in agreement, kernel density estimation appears to add
-unneccessary complexity without sufficient compensation in fit. We are justified 
+unnecessary complexity without sufficient compensation in fit. We are justified 
 in using the skew hyperbolic t-distribution if this simplifies our analysis down
 the line.