diff --git a/DESCRIPTION b/DESCRIPTION
index abcb1f1..5ab2119 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -16,15 +16,17 @@ Description: Handles univariate non-parametric density estimation with
     copula kernel of Jones & Henderson (2007) . User-supplied kernels,
     parametric starts, and bandwidths are supported.
 License: MIT + file LICENSE
+URL: https://github.com/JonasMoss/kdensity
+BugReports: https://github.com/JonasMoss/kdensity/issues
 Encoding: UTF-8
 LazyData: true
 Suggests: extraDistr,
     SkewHyperbolic,
     testthat,
     covr,
-    EQL,
     knitr,
     rmarkdown
-Imports: assertthat
+Imports: assertthat,
+    EQL
 RoxygenNote: 6.1.1
 VignetteBuilder: knitr
diff --git a/R/builtin_bandwidths.R b/R/builtin_bandwidths.R
index 2bb39fd..846efde 100644
--- a/R/builtin_bandwidths.R
+++ b/R/builtin_bandwidths.R
@@ -69,8 +69,6 @@ bw_environment$JH = function(x, kernel = NULL, start = NULL, support = NULL) {
 }
 
 bw_environment$RHE = function(x, kernel = NULL, start = NULL, support = NULL) {
-  assertthat::assert_that("EQL" %in% rownames(utils::installed.packages()), msg =
-  "The bandwidth function 'RHE' requires the package 'EQL' to work.")
 
   max_degree = 5 # The maximum degree of the Hermite polynomials.
   n <- length(x)
diff --git a/R/kdensity.R b/R/kdensity.R
index dda559a..a5086be 100644
--- a/R/kdensity.R
+++ b/R/kdensity.R
@@ -122,10 +122,10 @@ kdensity = function(x, bw = NULL, adjust = 1, kernel = NULL, start = NULL,
   data.name = deparse(substitute(x))
   has.na = anyNA(x)
 
-  if(has.na) {
-    if(!na.rm) stop("x contains NAs and na.rm = FALSE.")
-    x = x[!is.na(x)]
-  }
+  assertthat::assert_that(!(has.na & !na.rm),
+    msg = "x contains NAs and na.rm = FALSE.")
+
+  x = x[!is.na(x)] # This line is reached only if (has.na & !na.rm) is FALSE.
 
   ## 'kernel', 'start' and 'bw' can be custom made: In this case, they must
   ## be added to their environments.
@@ -149,7 +149,7 @@ kdensity = function(x, bw = NULL, adjust = 1, kernel = NULL, start = NULL,
   ## The case of bw == Inf is special! In this case, we return the parametric
   ## start itself.
 
-  ## Now we massage and handle the combinations of kernel, start and support.
+  ## Now we handle the combinations of kernel, start and support.
   ## This is fancy defaults management.
   kss_list = get_kernel_start_support(kernel, start, support)
 
diff --git a/README.Rmd b/README.Rmd
index 7d3512c..37098e7 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -32,10 +32,12 @@ knitr::opts_chunk$set(out.width='750px', dpi=200)
 ## Overview
 kdensity is an implementation of univariate kernel density estimation with support for parametric starts and asymmetric kernels. Its main function is `kdensity`, which is has approximately the same syntax as `stats::density`. Its new functionality is:
 
-* `kdensity` has built-in support for many *parametric starts*, such as `normal` and `gamma`, but you can also supply your own.
-* It supports several asymmetric kernels ones such as `gcopula` and `gamma` kernels, but also the common symmetric ones. In addition, you can also supply your own kernels.
-* A selection of choices for the bandwidth function `bw`, again including an option to specify your own.
-* The returned value is callable: The density estimator returns a density function when called.
+* `kdensity` has built-in support for many *parametric starts*, such as `normal`
+  and `gamma`, but you can also supply your own.
+* It supports several asymmetric kernels ones such as `gcopula` and `gamma` kernels, but also the common symmetric ones. In addition, you can also supply your own kernels.
+* A selection of choices for the bandwidth function `bw`, again including an option to specify your own.
+* The returned value is density function. This can be used for e.g. numerical
+  integration, numerical differentiation, and point evaluations.
 
 A reason to use `kdensity` is to avoid *boundary bias* when estimating densities on the unit interval or the positive half-line. Asymmetric kernels such as `gamma` and `gcopula` are designed for this purpose. The support for parametric starts allows you to easily use a method that is often superior to ordinary kernel density estimation.
 
@@ -61,7 +63,8 @@ install.packages("kdensity")
 devtools::install_github("JonasMoss/kdensity")
 ```
 
-Call the `library` function and use it just like `stats:density`, but with optional additional arguments.
+## Usage Example
+Call the `library` function and use it just like `stats::density`, but with optional additional arguments.
 ```{r simpleuse, echo = TRUE, eval = FALSE}
 library("kdensity")
 plot(kdensity(mtcars$mpg, start = "normal"))
@@ -69,7 +72,8 @@ plot(kdensity(mtcars$mpg, start = "normal"))
 
 ## Description
 
-Kernel density estimation with a *parametric start* was introduced by Hjort and Glad in [Nonparametric Density Estimation with a Parametric Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea is to start out with a parametric density before you do your kernel density estimation, so that your actual kernel density estimation will be a correction to the original parametric estimate. This is a good idea because the resulting estimator will be better than an ordinary kernel density estimator whenever the true density is close to your suggestion; and the estimator can be superior to the ordinary kernel density estimator even when the suggestion is pretty far off.
+Kernel density estimation with a *parametric start* was introduced by Hjort and Glad in [Nonparametric Density Estimation with a Parametric Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea is to start out with a parametric density before you do your kernel density estimation, so that your actual kernel density estimation will be a correction to the original parametric estimate. The resulting estimator will outperform the ordinary kernel density estimator in terms of asymptotic
+integrated mean squared error whenever the true density is close to your suggestion; and the estimator can be superior to the ordinary kernel density estimator even when the suggestion is pretty far off.
 
 In addition to parametric starts, the package implements some *asymmetric kernels*. These kernels are useful when modelling data with sharp boundaries, such as data supported on the positive half-line or the unit interval. Currently we support the following asymmetric kernels:
 
@@ -85,8 +89,8 @@ These features can be combined to make asymmetric kernel densities estimators wi
 
 The function `kdensity` takes some `data`, a kernel `kernel` and a parametric start `start`. You can optionally specify the `support` parameter, which is used to find the normalizing constant.
 
-The following example uses the \code{airquality} data set plots both a gamma-kernel density estimate with a gamma start (black), the fully parametric gamma density (red),
-and an ordinary `density` estimate (blue). Notice the boundary bias of the ordinary
+The following example uses the \code{datasets::airquality} data set. The black curve is a gamma-kernel density estimate with a gamma start, the red curve a fully parametric gamma density
+and and the blue curve an ordinary `density` estimate. Notice the boundary bias of the ordinary
 `density` estimator. The underlying parameter estimates are always maximum likelilood.
 
 ```{r example, echo = TRUE}
@@ -98,10 +102,13 @@ lines(density(airquality$Wind, adjust = 2), col = "blue")
 rug(airquality$Wind)
 ```
 
-Since the return value of `kdensity` is a function, it is callable, as in:
+Since the return value of `kdensity` is a function, `kde` is callable and can be
+used as any density function in `R` (such as `stats::dnorm`). For example, you can
+do:
 
 ```{r callable, echo = TRUE}
 kde(10)
+integrate(kde, lower = 0, upper = 1) # The cumulative distribution up to 1.
 ```
 
 You can access the parameter estimates by using `coef`. You can also access the log likelihood (`logLik`), AIC and BIC of the parametric start distribution.
@@ -112,10 +119,8 @@ logLik(kde)
 AIC(kde)
 ```
 
 ## How to Contribute or Get Help
-If you encounter a bug, have a feature request or need some help, don't hesitate
-to open an [issue](https://github.com/JonasMoss/kdensity/issues). If you want to
-contribute, make a pull request. This project follows a
-[Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct.md).
+If you encounter a bug, have a feature request or need some help, open a [Github issue](https://github.com/JonasMoss/kdensity/issues). Create a pull requests
+to contribute. This project follows a [Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct.md).
 
 ## References
diff --git a/README.md b/README.md
index 89b3ab5..02ead3d 100644
--- a/README.md
+++ b/README.md
@@ -32,8 +32,9 @@ function is `kdensity`, which is has approximately the same syntax as
     you can also supply your own kernels.
   - A selection of choices for the bandwidth function `bw`, again
     including an option to specify your own.
-  - The returned value is callable: The density estimator returns a
-    density function when called.
+  - The returned value is density function. This can be used for
+    e.g. numerical integration, numerical differentiation, and point
+    evaluations.
 
 A reason to use `kdensity` is to avoid *boundary bias* when estimating
 densities on the unit interval or the positive half-line. Asymmetric
@@ -69,7 +70,9 @@ install.packages("kdensity")
 devtools::install_github("JonasMoss/kdensity")
 ```
 
-Call the `library` function and use it just like `stats:density`, but
+## Usage Example
+
+Call the `library` function and use it just like `stats::density`, but
 with optional additional arguments.
 
 ``` r
@@ -84,11 +87,12 @@ Hjort and Glad in [Nonparametric Density Estimation with a Parametric
 Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea
 is to start out with a parametric density before you do your kernel
 density estimation, so that your actual kernel density estimation will
-be a correction to the original parametric estimate. This is a good idea
-because the resulting estimator will be better than an ordinary kernel
-density estimator whenever the true density is close to your suggestion;
-and the estimator can be superior to the ordinary kernel density
-estimator even when the suggestion is pretty far off.
+be a correction to the original parametric estimate. The resulting
+estimator will outperform the ordinary kernel density estimator in terms
+of asymptotic integrated mean squared error whenever the true density is
+close to your suggestion; and the estimator can be superior to the
+ordinary kernel density estimator even when the suggestion is pretty far
+off.
 
 In addition to parametric starts, the package implements some
 *asymmetric kernels*. These kernels are useful when modelling data with
@@ -126,11 +130,11 @@ The function `kdensity` takes some `data`, a kernel `kernel` and a
 parametric start `start`. You can optionally specify the `support`
 parameter, which is used to find the normalizing constant.
 
-The following example uses the data set plots both a gamma-kernel
-density estimate with a gamma start (black), the fully parametric gamma
-density (red), and an ordinary `density` estimate (blue). Notice the
-boundary bias of the ordinary `density` estimator. The underlying
-parameter estimates are always maximum likelilood.
+The following example uses the data set. The black curve is a
+gamma-kernel density estimate with a gamma start, the red curve a fully
+parametric gamma density and and the blue curve an ordinary `density`
+estimate. Notice the boundary bias of the ordinary `density` estimator.
+The underlying parameter estimates are always maximum likelilood.
 
 ``` r
 library("kdensity")
@@ -143,12 +147,15 @@ rug(airquality$Wind)
 
-Since the return value of `kdensity` is a function, it is callable, as
-in:
+Since the return value of `kdensity` is a function, `kde` is callable
+and can be used as any density function in `R` (such as `stats::dnorm`).
+For example, you can do:
 
 ``` r
 kde(10)
 #> [1] 0.09980471
+integrate(kde, lower = 0, upper = 1) # The cumulative distribution up to 1.
+#> 1.27532e-05 with absolute error < 2.2e-19
 ```
 
 You can access the parameter estimates by using `coef`. You can also
@@ -167,11 +174,9 @@ AIC(kde)
 
 ## How to Contribute or Get Help
 
-If you encounter a bug, have a feature request or need some help, don’t
-hesitate to open an
-[issue](https://github.com/JonasMoss/kdensity/issues). If you want to
-contribute, make a pull request. This project follows a [Contributor
-Code of
+If you encounter a bug, have a feature request or need some help, open a
+[Github issue](https://github.com/JonasMoss/kdensity/issues). Create a
+pull requests to contribute. This project follows a [Contributor Code of
 Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct.md).
 
 ## References
diff --git a/tests/testthat/test_kdensity.R b/tests/testthat/test_kdensity.R
index cb2eed0..b32383a 100644
--- a/tests/testthat/test_kdensity.R
+++ b/tests/testthat/test_kdensity.R
@@ -39,3 +39,5 @@ expect_error(kdensity(precip, bw = Inf))
 expect_equal(kdensity(precip, bw = Inf, start = "normal")(10), dnorm(10, mean = mean(precip), sd = sd(precip)))
 expect_equal(kdensity(precip, bw = 1)(10), kdensity(precip, bw = silly_width)(10))
 expect_error(kdensity(precip)())
+expect_error(kdensity(precip, start = "gumbel", kernel = "rectangular",
+             bw = "ucv"))
diff --git a/vignettes/tutorial.Rmd b/vignettes/tutorial.Rmd
index be7c077..6bc0b11 100644
--- a/vignettes/tutorial.Rmd
+++ b/vignettes/tutorial.Rmd
@@ -135,7 +135,7 @@ rug(LH)
 ```
 
 Since all the curves are in agreement, kernel density estimation appears to add
-unneccessary complexity without sufficient compensation in fit. We are justified
+unnecessary complexity without sufficient compensation in fit. We are justified
 in using the skew hyperbolic t-distribution if this simplifies our analysis
 down the line.
 