Merge pull request #60 from JonasMoss/JOSSrevision2
Joss revision 2
JonasMoss committed Aug 7, 2019
2 parents 825d477 + 5747aa0 commit 76bcb8d
Showing 7 changed files with 55 additions and 43 deletions.
6 changes: 4 additions & 2 deletions DESCRIPTION
@@ -16,15 +16,17 @@ Description: Handles univariate non-parametric density estimation with
     copula kernel of Jones & Henderson (2007) <doi:10.1093/biomet/asm068>.
     User-supplied kernels, parametric starts, and bandwidths are supported.
 License: MIT + file LICENSE
+URL: https://github.com/JonasMoss/kdensity
+BugReports: https://github.com/JonasMoss/kdensity/issues
 Encoding: UTF-8
 LazyData: true
 Suggests: extraDistr,
     SkewHyperbolic,
     testthat,
     covr,
-    EQL,
     knitr,
     rmarkdown
-Imports: assertthat
+Imports: assertthat,
+    EQL
 RoxygenNote: 6.1.1
 VignetteBuilder: knitr
2 changes: 0 additions & 2 deletions R/builtin_bandwidths.R
@@ -69,8 +69,6 @@ bw_environment$JH = function(x, kernel = NULL, start = NULL, support = NULL) {
 }

 bw_environment$RHE = function(x, kernel = NULL, start = NULL, support = NULL) {
-  assertthat::assert_that("EQL" %in% rownames(utils::installed.packages()), msg =
-    "The bandwidth function 'RHE' requires the package 'EQL' to work.")

   max_degree = 5 # The maximum degree of the Hermite polynomials.
   n <- length(x)
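
With `EQL` moved from `Suggests` to `Imports` in this commit, the runtime check removed above is no longer needed. For a dependency that stays optional, a common base R idiom is `requireNamespace()`, which avoids scanning the whole library with `installed.packages()`. A minimal sketch; `require_pkg` is a hypothetical helper and not part of kdensity:

```r
# Hypothetical helper (not part of kdensity): fail early with a clear
# message when an optional package is missing.
require_pkg <- function(pkg, what) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("%s requires the package '%s' to work.", what, pkg),
         call. = FALSE)
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this call succeeds silently.
require_pkg("stats", "The bandwidth function 'RHE'")
```
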
10 changes: 5 additions & 5 deletions R/kdensity.R
@@ -122,10 +122,10 @@ kdensity = function(x, bw = NULL, adjust = 1, kernel = NULL, start = NULL,
   data.name = deparse(substitute(x))
   has.na = anyNA(x)

-  if(has.na) {
-    if(!na.rm) stop("x contains NAs and na.rm = FALSE.")
-    x = x[!is.na(x)]
-  }
+  assertthat::assert_that(!(has.na & !na.rm),
+                          msg = "x contains NAs and na.rm = FALSE.")
+
+  x = x[!is.na(x)] # This line is reached only if (has.na & !na.rm) is FALSE.

   ## 'kernel', 'start' and 'bw' can be custom made: In this case, they must
   ## be added to their environments.
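
The rewritten NA handling can be exercised in isolation. A minimal sketch, assuming the `assertthat` package is installed; `drop_na` is a hypothetical stand-in for the first lines of `kdensity`, and it uses the scalar `&&` where the commit uses `&` (equivalent here because both operands have length one):

```r
# Hypothetical stand-in for the NA handling at the top of kdensity().
drop_na <- function(x, na.rm = FALSE) {
  has.na <- anyNA(x)
  # Errors with the given message when x has NAs and na.rm is FALSE.
  assertthat::assert_that(!(has.na && !na.rm),
                          msg = "x contains NAs and na.rm = FALSE.")
  x[!is.na(x)]  # A no-op when x has no NAs.
}

drop_na(c(1, NA, 3), na.rm = TRUE)   # NAs are dropped
try(drop_na(c(1, NA, 3)))            # errors: NAs present, na.rm = FALSE
```
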
@@ -149,7 +149,7 @@ kdensity = function(x, bw = NULL, adjust = 1, kernel = NULL, start = NULL,
   ## The case of bw == Inf is special! In this case, we return the parametric
   ## start itself.

-  ## Now we massage and handle the combinations of kernel, start and support.
+  ## Now we handle the combinations of kernel, start and support.
   ## This is fancy defaults management.
   kss_list = get_kernel_start_support(kernel, start, support)

31 changes: 18 additions & 13 deletions README.Rmd
@@ -32,10 +32,12 @@ knitr::opts_chunk$set(out.width='750px', dpi=200)
 ## Overview
 kdensity is an implementation of univariate kernel density estimation with support for parametric starts and asymmetric kernels. Its main function is `kdensity`, which has approximately the same syntax as `stats::density`. Its new functionality is:

-* `kdensity` has built-in support for many *parametric starts*, such as `normal` and `gamma`, but you can also supply your own.
-* It supports several asymmetric kernels ones such as `gcopula` and `gamma` kernels, but also the common symmetric ones. In addition, you can also supply your own kernels.
-* A selection of choices for the bandwidth function `bw`, again including an option to specify your own.
-* The returned value is callable: The density estimator returns a density function when called.
+* `kdensity` has built-in support for many *parametric starts*, such as `normal`
+and `gamma`, but you can also supply your own.
+* It supports several asymmetric kernels, such as the `gcopula` and `gamma` kernels, as well as the common symmetric ones. You can also supply your own kernels.
+* A selection of choices for the bandwidth function `bw`, again including an option to specify your own.
+* The returned value is a density function. This can be used for e.g. numerical
+integration, numerical differentiation, and point evaluations.

 A reason to use `kdensity` is to avoid *boundary bias* when estimating densities on the unit interval or the positive half-line. Asymmetric kernels such as `gamma` and `gcopula` are designed for this purpose. The support for parametric starts allows you to easily use a method that is often superior to ordinary kernel density estimation.

@@ -61,15 +63,17 @@ install.packages("kdensity")
 devtools::install_github("JonasMoss/kdensity")
 ```

-Call the `library` function and use it just like `stats:density`, but with optional additional arguments.
+## Usage Example
+Call the `library` function and use it just like `stats::density`, but with optional additional arguments.
 ```{r simpleuse, echo = TRUE, eval = FALSE}
 library("kdensity")
 plot(kdensity(mtcars$mpg, start = "normal"))
 ```

 ## Description

-Kernel density estimation with a *parametric start* was introduced by Hjort and Glad in [Nonparametric Density Estimation with a Parametric Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea is to start out with a parametric density before you do your kernel density estimation, so that your actual kernel density estimation will be a correction to the original parametric estimate. This is a good idea because the resulting estimator will be better than an ordinary kernel density estimator whenever the true density is close to your suggestion; and the estimator can be superior to the ordinary kernel density estimator even when the suggestion is pretty far off.
+Kernel density estimation with a *parametric start* was introduced by Hjort and Glad in [Nonparametric Density Estimation with a Parametric Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea is to start out with a parametric density before you do your kernel density estimation, so that your actual kernel density estimation will be a correction to the original parametric estimate. The resulting estimator will outperform the ordinary kernel density estimator in terms of asymptotic
+integrated mean squared error whenever the true density is close to your suggestion; and the estimator can be superior to the ordinary kernel density estimator even when the suggestion is pretty far off.
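
For reference, the estimator of Hjort and Glad described in this paragraph can be written as follows. The notation is an assumption made here and is not taken from the README: $f_0(\cdot;\hat{\theta})$ is the fitted parametric start, $K_h$ is the kernel scaled by the bandwidth $h$, and $x_1, \dots, x_n$ are the data.

```latex
\hat{f}(x) = f_0(x; \hat{\theta}) \, \frac{1}{n} \sum_{i=1}^{n}
             \frac{K_h(x_i - x)}{f_0(x_i; \hat{\theta})},
\qquad
K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right).
```

With a constant start, $f_0 \equiv 1$, the expression reduces to the ordinary kernel density estimator, which is why the kernel step can be read as a correction to the parametric fit.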

In addition to parametric starts, the package implements some *asymmetric kernels*. These kernels are useful when modelling data with sharp boundaries, such as data supported on the positive half-line or the unit interval. Currently we support the following asymmetric kernels:

@@ -85,8 +89,8 @@ These features can be combined to make asymmetric kernel densities estimators wi

 The function `kdensity` takes some `data`, a kernel `kernel` and a parametric start `start`. You can optionally specify the `support` parameter, which is used to find the normalizing constant.

-The following example uses the \code{airquality} data set plots both a gamma-kernel density estimate with a gamma start (black), the fully parametric gamma density (red),
-and an ordinary `density` estimate (blue). Notice the boundary bias of the ordinary
+The following example uses the `datasets::airquality` data set. The black curve is a gamma-kernel density estimate with a gamma start, the red curve a fully parametric gamma density,
+and the blue curve an ordinary `density` estimate. Notice the boundary bias of the ordinary
 `density` estimator. The underlying parameter estimates are always maximum likelihood.

 ```{r example, echo = TRUE}
@@ -98,10 +102,13 @@ lines(density(airquality$Wind, adjust = 2), col = "blue")
 rug(airquality$Wind)
 ```

-Since the return value of `kdensity` is a function, it is callable, as in:
+Since the return value of `kdensity` is a function, `kde` is callable and can be
+used like any density function in `R` (such as `stats::dnorm`). For example, you can
+do:

 ```{r callable, echo = TRUE}
 kde(10)
+integrate(kde, lower = 0, upper = 1) # The cumulative distribution up to 1.
 ```

 You can access the parameter estimates by using `coef`. You can also access the log likelihood (`logLik`), AIC and BIC of the parametric start distribution.
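
The uses of the callable return value listed in the overview (point evaluation, numerical integration, numerical differentiation) each take a line of base R. A minimal sketch, assuming the `kdensity` package is installed and that `kde` is fitted as a gamma-kernel estimate with a gamma start, as the README text describes; the fitting call itself is not visible in this diff, and the step size `h` is an arbitrary choice:

```r
library("kdensity")

# Assumed fit: gamma kernel with a gamma start (the diff elides the actual call).
kde <- kdensity(airquality$Wind, start = "gamma", kernel = "gamma")

kde(10)                                 # point evaluation
integrate(kde, lower = 0, upper = 10)   # estimated P(Wind <= 10)

# Central-difference approximation to the derivative of the density at 10.
h <- 1e-4
(kde(10 + h) - kde(10 - h)) / (2 * h)
```
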
@@ -112,10 +119,8 @@ logLik(kde)
 AIC(kde)
 ```
 ## How to Contribute or Get Help
-If you encounter a bug, have a feature request or need some help, don't hesitate
-to open an [issue](https://github.com/JonasMoss/kdensity/issues). If you want to
-contribute, make a pull request. This project follows a
-[Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct.md).
+If you encounter a bug, have a feature request or need some help, open a [GitHub issue](https://github.com/JonasMoss/kdensity/issues). Create a pull request
+to contribute. This project follows a [Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct.md).

 ## References

45 changes: 25 additions & 20 deletions README.md
@@ -32,8 +32,9 @@ function is `kdensity`, which has approximately the same syntax as
     you can also supply your own kernels.
   - A selection of choices for the bandwidth function `bw`, again
     including an option to specify your own.
-  - The returned value is callable: The density estimator returns a
-    density function when called.
+  - The returned value is a density function. This can be used for
+    e.g. numerical integration, numerical differentiation, and point
+    evaluations.

 A reason to use `kdensity` is to avoid *boundary bias* when estimating
 densities on the unit interval or the positive half-line. Asymmetric
@@ -69,7 +70,9 @@ install.packages("kdensity")
 devtools::install_github("JonasMoss/kdensity")
 ```

-Call the `library` function and use it just like `stats:density`, but
+## Usage Example
+
+Call the `library` function and use it just like `stats::density`, but
 with optional additional arguments.

 ``` r
@@ -84,11 +87,12 @@ Hjort and Glad in [Nonparametric Density Estimation with a Parametric
 Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea
 is to start out with a parametric density before you do your kernel
 density estimation, so that your actual kernel density estimation will
-be a correction to the original parametric estimate. This is a good idea
-because the resulting estimator will be better than an ordinary kernel
-density estimator whenever the true density is close to your suggestion;
-and the estimator can be superior to the ordinary kernel density
-estimator even when the suggestion is pretty far off.
+be a correction to the original parametric estimate. The resulting
+estimator will outperform the ordinary kernel density estimator in terms
+of asymptotic integrated mean squared error whenever the true density is
+close to your suggestion; and the estimator can be superior to the
+ordinary kernel density estimator even when the suggestion is pretty far
+off.

 In addition to parametric starts, the package implements some
 *asymmetric kernels*. These kernels are useful when modelling data with
@@ -126,11 +130,11 @@ The function `kdensity` takes some `data`, a kernel `kernel` and a
 parametric start `start`. You can optionally specify the `support`
 parameter, which is used to find the normalizing constant.

-The following example uses the data set plots both a gamma-kernel
-density estimate with a gamma start (black), the fully parametric gamma
-density (red), and an ordinary `density` estimate (blue). Notice the
-boundary bias of the ordinary `density` estimator. The underlying
-parameter estimates are always maximum likelilood.
+The following example uses the `datasets::airquality` data set. The black
+curve is a gamma-kernel density estimate with a gamma start, the red curve
+a fully parametric gamma density, and the blue curve an ordinary `density`
+estimate. Notice the boundary bias of the ordinary `density` estimator.
+The underlying parameter estimates are always maximum likelihood.

 ``` r
 library("kdensity")
@@ -143,12 +147,15 @@ rug(airquality$Wind)

 <img src="man/figures/README-example-1.png" width="750px" />

-Since the return value of `kdensity` is a function, it is callable, as
-in:
+Since the return value of `kdensity` is a function, `kde` is callable
+and can be used like any density function in `R` (such as `stats::dnorm`).
+For example, you can do:

 ``` r
 kde(10)
 #> [1] 0.09980471
+integrate(kde, lower = 0, upper = 1) # The cumulative distribution up to 1.
+#> 1.27532e-05 with absolute error < 2.2e-19
 ```

 You can access the parameter estimates by using `coef`. You can also
@@ -167,11 +174,9 @@ AIC(kde)

 ## How to Contribute or Get Help

-If you encounter a bug, have a feature request or need some help, don’t
-hesitate to open an
-[issue](https://github.com/JonasMoss/kdensity/issues). If you want to
-contribute, make a pull request. This project follows a [Contributor
-Code of
+If you encounter a bug, have a feature request or need some help, open a
+[GitHub issue](https://github.com/JonasMoss/kdensity/issues). Create a
+pull request to contribute. This project follows a [Contributor Code of
 Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct.md).

 ## References
2 changes: 2 additions & 0 deletions tests/testthat/test_kdensity.R
@@ -39,3 +39,5 @@ expect_error(kdensity(precip, bw = Inf))
 expect_equal(kdensity(precip, bw = Inf, start = "normal")(10), dnorm(10, mean = mean(precip), sd = sd(precip)))
 expect_equal(kdensity(precip, bw = 1)(10), kdensity(precip, bw = silly_width)(10))
 expect_error(kdensity(precip)())
+expect_error(kdensity(precip, start = "gumbel", kernel = "rectangular",
+                      bw = "ucv"))
2 changes: 1 addition & 1 deletion vignettes/tutorial.Rmd
@@ -135,7 +135,7 @@ rug(LH)
 ```

 Since all the curves are in agreement, kernel density estimation appears to add
-unneccessary complexity without sufficient compensation in fit. We are justified
+unnecessary complexity without sufficient compensation in fit. We are justified
 in using the skew hyperbolic t-distribution if this simplifies our analysis down
 the line.

