
There's one part I don't quite understand? #1

Open
Iamshg opened this issue Dec 6, 2020 · 0 comments

Comments


Iamshg commented Dec 6, 2020

Hello, while reading the code in DBSCAN算法的Spark实现.ipynb there is one part I don't quite understand, and I hope you can clarify it.
In step 8 the notebook does:

/*================================================================================*/
//  Find the representative core point and the element count of each cluster
/*================================================================================*/
....
val rdd_result = rdd_cluster.reduceByKey((a,b)=>{
  val id_set = a._3 | b._3 // <- the part I don't understand
  val result = if(a._2>=b._2) (a._1,a._2,id_set)
  else (b._1,b._2,id_set)
  result
})
...

Here rdd_cluster is merged by key, but rdd_cluster is already the clustering result, which means its keys should all be unique with no duplicates. So I think this reduceByKey step is a no-op. Is that right?
Second, if my first question is mistaken and the reduceByKey logic does actually merge records, then inside reduceByKey the neighbour sets of records belonging to the same cluster are unioned with |. Why, then, is the neighbour count result._2 taken as max(a._2, b._2) rather than the size of id_set?
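To make the first question concrete, here is a minimal, self-contained sketch (my own code, not from the notebook; the value layout (corePointId, neighbourCount, neighbourIdSet) is only my assumption about what rdd_cluster holds). reduceByKey only invokes the merge function for keys that occur more than once, so if every key in rdd_cluster really is unique, the lambda above would never run:

import org.apache.spark.sql.SparkSession

object ReduceByKeyCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("reduceByKey-check").getOrCreate()
    val sc = spark.sparkContext

    // Assumed value layout, mirroring rdd_cluster: (corePointId, neighbourCount, neighbourIdSet)
    val data = Seq(
      (1L, (10L, 3, Set(10L, 11L, 12L))),
      (1L, (20L, 5, Set(20L, 21L, 22L, 23L, 24L))), // key 1 repeats, so the merge function runs for it
      (2L, (30L, 2, Set(30L, 31L)))                 // key 2 is unique, its value passes through unchanged
    )

    val merged = sc.parallelize(data).reduceByKey { (a, b) =>
      val idSet = a._3 | b._3                       // union of the two neighbour id sets
      if (a._2 >= b._2) (a._1, a._2, idSet) else (b._1, b._2, idSet)
    }

    merged.collect().foreach(println)
    // key 1 -> (20, 5, {10,...,24}): neighbourCount stays max(3, 5) = 5, not idSet.size = 8,
    //          which is exactly what my second question is about.
    // key 2 -> (30, 2, {30, 31}): the merge function was never invoked for this key.
    spark.stop()
  }
}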
