
The definition of IRD in original paper and the codes in NOF are different. #1

Open
BlueBirdHouse opened this issue Jul 18, 2019 · 2 comments


@BlueBirdHouse

The DDOutlier package is great work. While searching for sophisticated outlier detection algorithms, I found DDOutlier by accident. It provides the code together with the associated papers, which is exactly what I need.

I chose one algorithm, NOF, and then read the two papers associated with it: 'A non-parameter outlier detection algorithm based on Natural Neighbor' and 'Natural neighbor: A self-adaptive neighborhood method without parameter K.' My only reason for choosing it is that it does not need the parameter K. This is my first time using a density-based algorithm, and I have no idea how to choose K, so it seems better to let the algorithm select it automatically.

To properly understand your code, I studied the R language last week!

Even though your code does not precisely reflect the idea of Huang et al., I agree with your code. However, I need your confirmation, because I lack experience with both the R language and outlier detection algorithms. The authors of 'A non-parameter outlier detection algorithm based on Natural Neighbor' try to upgrade the LOF algorithm with a set of concepts built on the natural neighbor, so I agree that everything in LOF should be replaced accordingly. The authors reference the definition of IRD in equation 2. In that definition, everything is expressed in terms of the k-distance neighborhood, so the sum runs over all 'o' in 'NNk(p)'. In definition 13 (equation 9), everything should instead be associated with NIS. I think this is why your code has 'k_dist <- as.vector(dist.obj$dist[NIS[[i]],max_nb])'. Note that the authors never update the definition of IRD, so formally it is still the old one. I think this is a typo on the authors' part. However, a set of experiments is needed to show that your code is right. Unfortunately, I am new to R and do not have the resources to compare the different interpretations.
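The two readings can be made concrete with a minimal sketch. It assumes that IRD here means the local reachability density from classic LOF, with reach-dist(p, o) = max(k-distance(o), d(p, o)). It is written in plain Python rather than R, and it is not the DDOutlier implementation. The function names, the toy points, and the hand-picked `hypothetical_nis` set are all assumptions for illustration; a real NIS would come from the natural neighbor search. The only thing that changes between the two readings is the set of neighbours the average runs over.

```python
import math

def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest other point."""
    dists = sorted(math.dist(points[i], q)
                   for j, q in enumerate(points) if j != i)
    return dists[k - 1]

def knn(points, i, k):
    """Indices of the k nearest other points (ties broken by index)."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: (math.dist(points[i], points[j]), j))
    return order[:k]

def lrd(points, i, neighbours, k):
    """Local reachability density of points[i], averaged over `neighbours`.

    Passing the k nearest neighbours gives the classic LOF reading;
    passing another set (e.g. a natural influence space) gives the other.
    Assumes no duplicate points, so the sum is never zero.
    """
    reach = [max(k_distance(points, o, k), math.dist(points[i], points[o]))
             for o in neighbours]
    return len(neighbours) / sum(reach)

# Toy data: a tight square plus one distant point (index 4).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
k = 2
p = 4

# Reading 1: sum over the k nearest neighbours of p.
lrd_knn = lrd(points, p, knn(points, p, k), k)

# Reading 2: sum over a neighbour set supplied from elsewhere.
# This set is hand-picked for illustration, not computed.
hypothetical_nis = [3]
lrd_nis = lrd(points, p, hypothetical_nis, k)

print(lrd_knn, lrd_nis)
```

On this toy data the two readings give different densities for the same point, which is why the choice of neighbour set matters for the final score.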

I would like your opinion on this discrepancy.

@jhmadsen
Owner

As I replied to you in private by mail, I'm posting the response here as well.

First, I need to mention that I had some difficulty understanding the natural neighborhood algorithm, specifically steps 16 and 17 in the original paper by Zhu et al. To resolve this, I contacted Zhu himself, but unfortunately he never replied.

Second, I think the paper by Zuang et al. is a bit confusing, and I'm also a bit confused by your argument.
Which part of the paper is wrong or a typo: definition 13 (equation 9) or definition 4 (equation 2)? Can you provide an example, preferably with the Iris dataset?

I would happily accept your help. If you'd like, you can make a pull request on GitHub.

@BlueBirdHouse
Author

The detailed bug report is coming slowly because of my limited research ability. I am working on it.
