cache CIM classes as pairRDD #2

Open
derrickoswald opened this issue Feb 12, 2017 · 0 comments
Comments

@derrickoswald
Owner

One of the most common operations on a CIM class RDD is generating a pairRDD for join operations with:

XXX.keyBy (_.id)

It may be advantageous to formalize this use case by storing pre-keyed pairRDDs in the persistent RDD cache pool instead of plain CIM object RDDs, since the id (CIM rdf:ID = mRID) is the unique identifier for each CIM object.

Unfortunately, this has pervasive downstream consequences. Every operation that "gets" an RDD by name, which is done extensively in CIMScala and in dependent code like CIMApplication, would need to be modified to take advantage of this, or to work around it where keyBy (_.id) is not what is required.

For example:

val elements = get ("Elements").asInstanceOf[RDD[Element]].keyBy (_.id).join (...

becomes

val elements = get ("Elements").asInstanceOf[RDD[(String, Element)]].join (...

and

val terms = get ("Terminal").asInstanceOf[RDD[Terminal]].keyBy (_.ConductingEquipment).join (...

becomes

val terms = get ("Terminal").asInstanceOf[RDD[(String, Terminal)]].values.keyBy (_.ConductingEquipment).join (...
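
To make the two access patterns concrete, here is a minimal end-to-end sketch run against a local Spark context. Everything in it is an assumption for illustration: Terminal and Switch are cut-down stand-ins for the real CIM classes, and putKeyed and this get are hypothetical helpers, not the CIMScala API. It only shows that a cache holding (id, object) pairs can be joined directly on id and needs .values before re-keying on another field.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical, cut-down stand-ins for CIM classes (the real ones carry many more fields).
case class Terminal (id: String, ConductingEquipment: String)
case class Switch (id: String, open: Boolean)

object KeyedCacheSketch
{
    // Hypothetical helper: cache an RDD pre-keyed by id under the given name.
    def putKeyed[T] (name: String, rdd: RDD[T], id: T => String): RDD[(String, T)] =
        rdd.keyBy (id).setName (name).cache ()

    // Hypothetical lookup by name among the persistent RDDs; the caller asserts the element type.
    def get (sc: SparkContext, name: String): RDD[_] =
        sc.getPersistentRDDs.values.find (_.name == name).get

    def run (sc: SparkContext): Array[(String, String)] =
    {
        putKeyed[Switch] ("Switch", sc.parallelize (Seq (Switch ("SW1", false), Switch ("SW2", true))), _.id)
        putKeyed[Terminal] ("Terminal", sc.parallelize (Seq (
            Terminal ("T1", "SW1"), Terminal ("T2", "SW1"), Terminal ("T3", "SW2"))), _.id)

        // join on id: the cached RDD is already keyed, so no keyBy is needed
        val switches = get (sc, "Switch").asInstanceOf[RDD[(String, Switch)]]
        // join on another field: drop the cached key, then re-key
        val terms = get (sc, "Terminal").asInstanceOf[RDD[(String, Terminal)]]
            .values.keyBy (_.ConductingEquipment)

        // pair each switch id with the ids of the terminals attached to it
        switches.join (terms).map { case (sw, (_, t)) => (sw, t.id) }.collect ().sorted
    }

    def main (args: Array[String]): Unit =
    {
        val sc = new SparkContext (new SparkConf ().setMaster ("local[2]").setAppName ("KeyedCacheSketch"))
        try run (sc).foreach (println)
        finally sc.stop ()
    }
}
```

The second pattern is the "work around" cost: code that joins on anything other than id pays for an extra .values step on every use.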

This also has effects on partitioning. As I understand it, Spark's default HashPartitioner places each pair according to the hash code of its first element (the key). keyBy by itself does not move any data, so to benefit at join time the cached pairRDD would also need to be hash-partitioned, and that triggers a shuffle as objects are moved to the partition that "owns" them.
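
The partitioning behaviour can be checked with a small sketch like the one below, which only inspects partitioner metadata on a local Spark context. The data, the partition count of 4, and the object name are assumptions for illustration. It reports whether a plain keyBy result has a partitioner, whether partitionBy attaches one, and whether a join of two RDDs sharing that partitioner keeps it.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartitioningSketch
{
    // Returns (keyBy result has no partitioner,
    //          partitionBy result has the hash partitioner,
    //          join of co-partitioned RDDs keeps that partitioner)
    def run (sc: SparkContext): (Boolean, Boolean, Boolean) =
    {
        val ids = sc.parallelize (1 to 1000).map (i => "ID" + i)

        // keyBy is a plain map: it attaches a key but does not repartition
        val keyed = ids.keyBy (id => id)

        // partitionBy shuffles each pair to the partition chosen from its key's hash code
        val partitioner = new HashPartitioner (4)
        val owned = keyed.partitionBy (partitioner).cache ()

        // a second RDD partitioned the same way, standing in for the other side of a join
        val other = ids.map (id => (id, id.length)).partitionBy (partitioner)
        val joined = owned.join (other)

        (keyed.partitioner.isEmpty,
            owned.partitioner.contains (partitioner),
            joined.partitioner.contains (partitioner))
    }

    def main (args: Array[String]): Unit =
    {
        val sc = new SparkContext (new SparkConf ().setMaster ("local[2]").setAppName ("PartitioningSketch"))
        try println (run (sc))
        finally sc.stop ()
    }
}
```

If the cached pairRDD carries a partitioner, the shuffle is paid once when the cache is built rather than on that side of every join, which is where any speed-up would be expected to come from.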

Benchmarks should be performed before and after this change to determine whether there is an actual speed improvement in typical use-case scenarios.
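
A rough shape for such a benchmark is sketched below. It is not a measurement of CIMScala: Thing is a hypothetical stand-in class, the data is synthetic, and the sizes and repeat count are arbitrary assumptions. It times repeated joins that call keyBy each time against joins on a pre-keyed, hash-partitioned cache, with the caches materialized beforehand so that the one-off build cost is excluded from both timings.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical stand-in for a CIM object.
case class Thing (id: String, payload: Int)

object JoinBenchmarkSketch
{
    // Run the block and return its result with the elapsed wall-clock time in milliseconds.
    def time[T] (block: => T): (T, Double) =
    {
        val start = System.nanoTime
        val result = block
        (result, (System.nanoTime - start) / 1e6)
    }

    // Returns the join counts for the two variants; timings are printed.
    def run (sc: SparkContext, n: Int = 100000, repeats: Int = 5): (Long, Long) =
    {
        val objects = sc.parallelize (1 to n).map (i => Thing ("ID" + i, i)).cache ()
        val lookup = sc.parallelize (1 to n).map (i => ("ID" + i, i % 7)).cache ()
        val keyed = objects.keyBy (_.id)
            .partitionBy (new HashPartitioner (objects.getNumPartitions)).cache ()

        // materialize the caches so the timings exclude the build cost
        objects.count (); lookup.count (); keyed.count ()

        // current approach: key the object RDD at every join
        val (before, t1) = time { (1 to repeats).map (_ => objects.keyBy (_.id).join (lookup).count ()).last }
        // proposed approach: join the pre-keyed cache directly
        val (after, t2) = time { (1 to repeats).map (_ => keyed.join (lookup).count ()).last }

        println (f"keyBy per join: $t1%.1f ms, pre-keyed cache: $t2%.1f ms")
        (before, after)
    }

    def main (args: Array[String]): Unit =
    {
        val sc = new SparkContext (new SparkConf ().setMaster ("local[2]").setAppName ("JoinBenchmarkSketch"))
        try run (sc)
        finally sc.stop ()
    }
}
```

Numbers from a local run say little about a cluster, so a real comparison would need representative CIM files, the actual join mix (including joins on fields other than id), and the cost of building the keyed cache counted separately.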
