Commit

📚 Boundary bias.
JonasMoss committed Jul 11, 2019
1 parent d3cf592 commit 394fa37
Showing 2 changed files with 4 additions and 3 deletions.
2 changes: 1 addition & 1 deletion paper/paper.html
@@ -385,7 +385,7 @@ <h1>Summary</h1>
<p>Kernel density estimation <span class="citation">(Silverman 2018)</span> is a popular method for non-parametric density estimation based on placing kernels on each data point. <span class="citation">Hjort and Glad (1995)</span> extended kernel density estimation with <em>parametric starts</em>. The parametric start is a parametric density that is multiplied with the kernel estimate. When the data-generating density is reasonably close to the parametric start density, kernel density estimation with that parametric start will outperform ordinary kernel density estimation.</p>
<p>Asymmetric kernels are useful for estimating densities on the half-open interval <span class="math inline">\(\left[0,\infty\right)\)</span> and bounded intervals such as <span class="math inline">\(\left[0, 1\right]\)</span>. On such intervals symmetric kernels are prone to serious boundary bias that should be corrected <span class="citation">(Marron and Ruppert 1994)</span>. Asymmetric kernels are designed to avoid boundary bias.</p>
<p><code>kdensity</code> is an R package <span class="citation">(R Core Team 2019)</span> to calculate and display kernel density estimates using parametric starts and potentially asymmetric kernels. In addition to the classical symmetric kernels, <code>kdensity</code> supports the following asymmetric kernels: for the unit interval, the Gaussian copula kernel of <span class="citation">Jones and Henderson (2007)</span> and the beta kernels of <span class="citation">Chen (1999)</span>; for the half-open interval, the gamma kernel of <span class="citation">Chen (2000)</span>. The supported parametric starts include the normal, Laplace, Gumbel, exponential, gamma, log-normal, inverse Gaussian, Weibull, beta, and Kumaraswamy densities. The parameters of all parametric starts are estimated using maximum likelihood. The implemented bandwidth selectors are the classical bandwidth selectors from <code>stats</code>, unbiased cross-validation, the Hermite polynomial method from <span class="citation">Hjort and Glad (1995)</span>, and the tailored bandwidth selector for the Gaussian copula method of <span class="citation">Jones and Henderson (2007)</span>. User-defined parametric starts, kernels and bandwidth selectors are also supported.</p>
-<p>The following example uses the <code>airquality</code> data set from the built-in R package <code>datasets</code>. Since the data is positive, we use Chen’s gamma kernel. As the data is likely to be better approximated by a gamma distribution than a uniform distribution, we use the gamma parametric start. The plotted density is in figure 1, where the gamma distribution with parameters estimated by maximum likelihood is in red and the ordinary kernel density estimate in blue.</p>
+<p>The following example uses the <code>airquality</code> data set from the built-in R package <code>datasets</code>. Since the data is positive, we use Chen’s gamma kernel. As the data is likely to be better approximated by a gamma distribution than a uniform distribution, we use the gamma parametric start. The plotted density is in figure 1, where the gamma distribution with parameters estimated by maximum likelihood is in red and the ordinary kernel density estimate in blue. Notice the boundary bias of the ordinary kernel density estimator.</p>
<pre class="r"><code># install.packages(&quot;kdensity&quot;)
library(&quot;kdensity&quot;)
kde = kdensity(airquality$Wind, start = &quot;gamma&quot;, kernel = &quot;gamma&quot;)
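The summary describes a parametric start as a parametric density multiplied with a kernel estimate. In the Hjort and Glad (1995) formulation this reads f_hat(x) = f(x; theta_hat) * (1/n) * sum_i K_h(x - X_i) / f(X_i; theta_hat), where theta_hat is the maximum likelihood estimate. The base-R sketch below transcribes that formula. It is an illustration only: `hjort_glad_normal` is a hypothetical helper, not a kdensity function, and it uses a normal start with a Gaussian kernel rather than the gamma start and gamma kernel from the example. It makes no claim about how kdensity implements or normalizes the estimator.

```r
# Hypothetical helper, not part of the kdensity API: a direct transcription of
# the Hjort-Glad estimator with a normal parametric start and Gaussian kernel,
#   f_hat(t) = f(t; theta_hat) * (1/n) * sum_i K_h(t - X_i) / f(X_i; theta_hat).
hjort_glad_normal <- function(at, x, h = bw.nrd0(x)) {
  n <- length(x)
  # Maximum likelihood estimates of the normal start (divisor n, not n - 1).
  mu_hat <- mean(x)
  sigma_hat <- sqrt(sum((x - mu_hat)^2) / n)
  start_at_data <- dnorm(x, mean = mu_hat, sd = sigma_hat)
  vapply(at, function(t) {
    # Nonparametric correction: kernel average of K_h(t - X_i) / start(X_i).
    correction <- mean(dnorm((t - x) / h) / h / start_at_data)
    dnorm(t, mean = mu_hat, sd = sigma_hat) * correction
  }, numeric(1))
}

# Usage on simulated data.
set.seed(1)
x <- rnorm(500)
grid <- seq(-6, 6, length.out = 2001)
est <- hjort_glad_normal(grid, x)
# Riemann-sum approximation of the total mass; the estimator is not
# normalized by construction, but should integrate to roughly one here.
total_mass <- sum(est) * (grid[2] - grid[1])
print(total_mass)
```

If the start were constant over the range of the data, it would cancel out of the formula and the estimate would reduce to the ordinary kernel density estimate. This is why ordinary kernel density estimation can be viewed as the special case with a uniform start.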
5 changes: 3 additions & 2 deletions paper/paper.md
@@ -13,7 +13,7 @@ authors:
orcid: 0000-0002-6876-6964
affiliation: 1
- name: Martin Tveten
-orcid: 0000-0000-0000-0000
+orcid: 0000-0002-4236-633X
affiliation: 1
affiliations:
- name: University of Oslo
@@ -56,7 +56,8 @@ R package `datasets`. Since the data is positive we use Chen's gamma kernel.
As the data is likely to be better approximated by a gamma distribution than a
uniform distribution, we use the gamma parametric start. The plotted density is
in figure 1, where the gamma distribution with parameters estimated by maximum
-likelihood is in red and the ordinary kernel density estimate in blue.
+likelihood is in red and the ordinary kernel density estimate in blue.
+Notice the boundary bias of the ordinary kernel density estimator.

```r
# install.packages("kdensity")
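The sentence added in this commit points to the boundary bias of the ordinary kernel density estimator on positive data. The base-R sketch below illustrates the effect under several assumptions. `kde_gaussian` and `kde_gamma_chen` are hypothetical helpers, not kdensity functions. The data are simulated from a standard exponential distribution, whose true density at zero is 1, rather than taken from `airquality`. The gamma kernel is the first estimator of Chen (2000), with an arbitrarily chosen smoothing parameter b = 0.1. Chen (2000) also proposes a modified gamma kernel estimator, and this sketch does not claim to match whichever variant kdensity uses.

```r
# Hypothetical helpers, not part of the kdensity API.
# Ordinary kernel density estimate with a symmetric Gaussian kernel.
kde_gaussian <- function(at, x, h = bw.nrd0(x)) {
  vapply(at, function(t) mean(dnorm((t - x) / h)) / h, numeric(1))
}

# First gamma kernel estimator of Chen (2000): at point t, average the gamma
# density with shape t / b + 1 and scale b over the data. The kernel lives on
# [0, Inf), so no probability mass is placed below zero.
kde_gamma_chen <- function(at, x, b) {
  vapply(at, function(t) mean(dgamma(x, shape = t / b + 1, scale = b)),
         numeric(1))
}

# Exponential data: the true density is 1 at the boundary x = 0.
set.seed(1)
x <- rexp(5000)
gauss_at_0 <- kde_gaussian(0, x)
gamma_at_0 <- kde_gamma_chen(0, x, b = 0.1)  # b = 0.1 is an arbitrary choice
# The Gaussian estimate at 0 is roughly halved, since about half of each
# kernel centered near the boundary spills onto the negative half-line.
print(c(true = 1, gaussian = gauss_at_0, gamma = gamma_at_0))
```

A symmetric kernel centered close to zero places about half of its mass on the negative half-line, where the density is zero. As a result, the ordinary estimate at the boundary falls to roughly half the true value. The gamma kernel is supported on [0, ∞) and does not lose mass in this way.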
