Mahmoud Hanafy edited this page Apr 21, 2016 · 7 revisions

RDDGenerator provides an easy way to generate arbitrary RDDs so that you can check properties against them. If you are not familiar with ScalaCheck, read about it first to understand the concepts of properties and generators.
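As a quick refresher on those two concepts, here is a minimal plain-ScalaCheck sketch with no Spark involved. The object name `ListProperties` and the property itself are illustrative assumptions, not part of this library.

```scala
import org.scalacheck.{Gen, Prop, Test}
import org.scalacheck.Prop.forAll

// Illustrative only: names here are hypothetical, not from spark-testing-base.
object ListProperties {
  // Generator: describes how to produce random test inputs.
  val smallInts: Gen[List[Int]] = Gen.listOf(Gen.choose(0, 100))

  // Property: a statement that should hold for every generated input.
  val reverseTwice: Prop = forAll(smallInts) { xs =>
    xs.reverse.reverse == xs
  }

  def main(args: Array[String]): Unit = {
    // Run the property against many generated lists and check that it passed.
    val result = Test.check(Test.Parameters.default, reverseTwice)
    assert(result.passed)
  }
}
```

RDDGenerator applies the same idea to RDDs: you supply a generator for the elements and write properties over the resulting RDD.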

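You can generate RDDs using the arbitraryRDD method, which generates arbitrary RDDs of the desired type. Just create a generator for your required element type, or use one of the generators that are supported by default. The examples below call RDDGenerator.genRDD, passing the SparkContext and the element generator.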
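To make the idea concrete, the following is a rough sketch of how an element generator could be lifted into an RDD generator. It is not the spark-testing-base implementation: `RDDGenSketch` and `listBackedRDD` are hypothetical names, and the sketch simply assumes that generating a list and parallelizing it is good enough.

```scala
import scala.reflect.ClassTag

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.scalacheck.Gen

// Hypothetical helper for illustration; not part of spark-testing-base.
object RDDGenSketch {
  // Generate a list of elements, then distribute it as an RDD.
  // The list length follows ScalaCheck's size parameter.
  def listBackedRDD[T: ClassTag](sc: SparkContext)(elements: Gen[T]): Gen[RDD[T]] =
    Gen.listOf(elements).map(values => sc.parallelize(values))
}
```

Because the list length is driven by ScalaCheck's size parameter, a generator built this way would respond to the minimum and maximum size settings of the test configuration.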

Example: (Use supported generator)

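import com.holdenkarau.spark.testing.{RDDGenerator, SharedSparkContext}
import org.scalacheck.Arbitrary
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class RDDsCheck extends FunSuite with SharedSparkContext with Checkers {
  test("map should not change number of elements") {
    // Generate RDD[String] values from ScalaCheck's built-in String generator.
    val property =
      forAll(RDDGenerator.genRDD[String](sc)(Arbitrary.arbitrary[String])) {
        rdd => rdd.map(_.length).count() == rdd.count()
      }

    check(property)
  }
}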

Example: (Custom Generator)

import org.scalacheck.Gen

class RDDsCheck extends FunSuite with SharedSparkContext with Checkers {
  test("custom generator") {
    // Build a Person from an arbitrary name and an arbitrary age.
    val personGenerator: Gen[Person] = for {
      name <- Arbitrary.arbitrary[String]
      age <- Arbitrary.arbitrary[Int]
    } yield Person(name, age)

    val property =
      forAll(RDDGenerator.genRDD[Person](sc)(personGenerator)) {
        rdd => rdd.map(_.age).count() == rdd.count()
      }

    check(property)
  }
}

case class Person(name: String, age: Int)

You can control the size of the generated RDDs by defining an implicit PropertyCheckConfig with minSize and maxSize.

Example:

class RDDsCheck extends FunSuite with SharedSparkContext with Checkers {

  test("generate rdd of specific size") {
    // Overrides the default generatorDrivenConfig for this test only.
    implicit val generatorDrivenConfig =
      PropertyCheckConfig(minSize = 10, maxSize = 20)

    val prop =
      forAll(RDDGenerator.genRDD[String](sc)(Arbitrary.arbitrary[String])) {
        rdd => rdd.count() <= 20
      }

    check(prop)
  }
}