Mahmoud Hanafy edited this page Mar 5, 2016 · 7 revisions

RDDGenerator provides an easy way to generate arbitrary RDDs so that you can check properties against them. If you are not familiar with ScalaCheck, read about it first to understand the concepts of properties and generators.

To generate RDDs, use the following method:

genRDD[T: ClassTag](sc: SparkContext, minPartitions: Int = 1)(getGenerator: => Gen[T]): Gen[RDD[T]]

This generates an RDD of the desired type. It tries different numbers of partitions so as to catch problems with empty partitions and similar edge cases. minPartitions defaults to 1; choose a larger value when generating data too large for a single machine. getGenerator is a by-name parameter that creates the element generator, and it is evaluated as many times as required.
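To see why varying the partition count matters, here is a minimal plain-Scala sketch (not Spark code) that models partitions as nested sequences. The object and method names are hypothetical and exist only for illustration; each inner Seq stands in for one RDD partition.

```scala
import scala.util.Try

// Hypothetical model: each inner Seq stands in for one RDD partition.
object EmptyPartitionSketch {

  // Fragile: calls max on every partition, which throws on an empty one.
  def maxPerPartitionUnsafe(partitions: Seq[Seq[Int]]): Seq[Int] =
    partitions.map(_.max)

  // Robust: drops empty partitions before taking the max.
  def maxPerPartitionSafe(partitions: Seq[Seq[Int]]): Seq[Int] =
    partitions.filter(_.nonEmpty).map(_.max)

  def main(args: Array[String]): Unit = {
    // Three partitions, the middle one empty.
    val partitions = Seq(Seq(1, 3), Seq.empty[Int], Seq(2))

    println(Try(maxPerPartitionUnsafe(partitions)).isFailure) // prints "true"
    println(maxPerPartitionSafe(partitions))                  // prints "List(3, 2)"
  }
}
```

Per-partition logic in a real job can make the same non-empty assumption as the fragile version here. Generating RDDs with varying partition counts is one way to surface that kind of bug.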

Either create a generator for your required RDD element type or use one of the generators supported by default.

Example: (Use supported generator)

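// Imports assume spark-testing-base and the ScalaTest 2.x/3.0.x package layout.
import com.holdenkarau.spark.testing.{RDDGenerator, SharedSparkContext}
import org.scalacheck.Arbitrary
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class RDDsCheck extends FunSuite with SharedSparkContext with Checkers {
  test("map should not change number of elements") {
    val property =
      forAll(RDDGenerator.genRDD[String](sc)(Arbitrary.arbitrary[String])) {
        // map is one-to-one, so the element count must be preserved
        rdd => rdd.map(_.length).count() == rdd.count()
      }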

    check(property)
  }
}

Example: (Custom Generator)

class RDDsCheck extends FunSuite with SharedSparkContext with Checkers {
  test("custom generator") {

    val property =
      forAll(RDDGenerator.genRDD[Person](sc) {
        val generator: Gen[Person] = for {
          name <- Arbitrary.arbitrary[String]
          age <- Arbitrary.arbitrary[Int]
        } yield (Person(name, age))

        generator
      }) {
        rdd => rdd.map(_.age).count() == rdd.count()
      }

    check(property)
  }
}

case class Person(name: String, age: Int)

You can bound the size of the generated RDDs with an implicit PropertyCheckConfig.

Example:

class RDDsCheck extends FunSuite with SharedSparkContext with Checkers {

  test("generate rdd of specific size") {
    implicit val generatorDrivenConfig =
      PropertyCheckConfig(minSize = 10, maxSize = 20)
    val prop = forAll(RDDGenerator.genRDD[String](sc)(Arbitrary.arbitrary[String])){
      rdd => rdd.count() <= 20
    }

    check(prop)
  }
}