Mahmoud Hanafy edited this page Feb 28, 2016 · 9 revisions

DataframeGenerator provides an easy way to generate arbitrary DataFrames so that you can check properties against them. If you are not familiar with ScalaCheck, read about it first to understand the concepts of properties and generators.
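For readers new to ScalaCheck, here is a minimal sketch of the two concepts, independent of Spark. The names `ScalaCheckPrimer`, `personGen`, and `ageInRange` are made up for illustration; only `Gen`, `Prop.forAll`, and `Test.check` come from ScalaCheck itself.

```scala
import org.scalacheck.{Gen, Prop, Test}
import org.scalacheck.Prop.forAll

// Hypothetical example object; not part of spark-testing-base.
object ScalaCheckPrimer {
  // Generator: describes how to produce random (name, age) pairs.
  val personGen: Gen[(String, Int)] = for {
    name <- Gen.alphaStr
    age  <- Gen.choose(0, 120)
  } yield (name, age)

  // Property: a statement expected to hold for every generated value.
  val ageInRange: Prop = forAll(personGen) { case (_, age) =>
    age >= 0 && age <= 120
  }

  def main(args: Array[String]): Unit = {
    // Evaluate the property against many random inputs.
    val result = Test.check(Test.Parameters.default, ageInRange)
    println(s"property passed: ${result.passed}")
  }
}
```

DataframeGenerator plays the role of `personGen` here: it supplies the random inputs, and you supply the property.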

To generate DataFrames, use the following method:

genDataFrame(sqlContext: SQLContext, schema: StructType, minPartitions: Int = 1): Arbitrary[DataFrame]

Provide the schema you want, and every generated DataFrame will have that schema.

Example:

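// Imports assume spark-testing-base, ScalaTest, and ScalaCheck are on the test classpath.
import com.holdenkarau.spark.testing.{DataframeGenerator, SharedSparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class DataFrameCheck extends FunSuite with SharedSparkContext with Checkers {
  test("assert dataframes generated correctly") {
    val schema = StructType(List(StructField("name", StringType), StructField("age", IntegerType)))
    val sqlContext = new SQLContext(sc)
    val dataframeGen = DataframeGenerator.genDataFrame(sqlContext, schema)

    // Every generated DataFrame must carry the requested schema.
    val property =
      forAll(dataframeGen.arbitrary) {
        dataframe => dataframe.schema === schema && dataframe.count >= 0
      }

    check(property)
  }
}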
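
The same generator can drive properties about your own transformations. The sketch below is illustrative only: the suite name `DataFrameFilterCheck`, the age threshold, and the property itself are hypothetical, and the `com.holdenkarau.spark.testing` import path is an assumption about how spark-testing-base is packaged. It also passes the optional `minPartitions` argument from the signature above.

```scala
import com.holdenkarau.spark.testing.{DataframeGenerator, SharedSparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

// Hypothetical suite: checks a property of a filter transformation.
class DataFrameFilterCheck extends FunSuite with SharedSparkContext with Checkers {
  test("filtering never increases the row count") {
    val schema = StructType(List(
      StructField("name", StringType),
      StructField("age", IntegerType)))
    val sqlContext = new SQLContext(sc)
    // minPartitions is the optional third parameter of genDataFrame.
    val dataframeGen = DataframeGenerator.genDataFrame(sqlContext, schema, minPartitions = 2)

    val property = forAll(dataframeGen.arbitrary) { dataframe =>
      val adults = dataframe.filter(dataframe("age") >= 18)
      // A filter can only drop rows and must leave the schema unchanged.
      adults.count <= dataframe.count && adults.schema === schema
    }

    check(property)
  }
}
```

Because the inputs are random, a property like this exercises empty DataFrames and unusual values that hand-written fixtures tend to miss.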