DataFrameGenerator
Mahmoud Hanafy edited this page Mar 5, 2016 · 9 revisions
DataframeGenerator provides an easy way to generate arbitrary DataFrames, so that you can check any property against them.
If you don't know ScalaCheck, I suggest you read about it first to understand the concepts of properties and generators.
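For readers new to these two concepts, here is a minimal sketch in plain ScalaCheck, with no Spark involved. The names ReverseExample, smallInts and reverseTwice are hypothetical and chosen only for illustration. A generator describes how to produce random inputs, and a property is a claim that must hold for every generated input.

```scala
import org.scalacheck.{Gen, Prop}
import org.scalacheck.Prop.forAll

object ReverseExample {
  // Generator: produces lists of integers between 0 and 100 (illustrative choice).
  val smallInts: Gen[List[Int]] = Gen.listOf(Gen.choose(0, 100))

  // Property: reversing a list twice gives back the original list.
  val reverseTwice: Prop = forAll(smallInts) { xs =>
    xs.reverse.reverse == xs
  }
}
```

DataframeGenerator plays the role of the generator here: it supplies random DataFrames instead of random lists.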
To generate DataFrames, use the following method:
genDataFrame(sqlContext: SQLContext, schema: StructType, minPartitions: Int = 1): Arbitrary[DataFrame]
Provide the schema you want, and every generated DataFrame will have that schema. minPartitions is optional and defaults to 1.
Example:
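// Imports assume spark-testing-base with the ScalaTest package names of that era.
import com.holdenkarau.spark.testing.{DataframeGenerator, SharedSparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class DataFrameCheck extends FunSuite with SharedSparkContext with Checkers {
  test("assert dataframes generated correctly") {
    // Every generated DataFrame should carry exactly this schema.
    val schema = StructType(List(StructField("name", StringType), StructField("age", IntegerType)))
    val sqlContext = new SQLContext(sc)
    val dataframeGen = DataframeGenerator.genDataFrame(sqlContext, schema)
    val property =
      forAll(dataframeGen.arbitrary) {
        dataframe => dataframe.schema === schema && dataframe.count >= 0
      }
    check(property)
  }
}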
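
The same pattern can check a property of a transformation rather than of the generator itself. The following is a sketch only: it assumes the genDataFrame signature exactly as given on this page (other versions of the library may name or shape it differently), and the class name DataFrameSelectCheck and the property being tested are hypothetical, chosen for illustration.

```scala
import com.holdenkarau.spark.testing.{DataframeGenerator, SharedSparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class DataFrameSelectCheck extends FunSuite with SharedSparkContext with Checkers {
  test("selecting a column preserves the row count") {
    val schema = StructType(List(
      StructField("name", StringType),
      StructField("age", IntegerType)))
    val sqlContext = new SQLContext(sc)
    // minPartitions is passed explicitly; the signature on this page defaults it to 1.
    val dataframeGen = DataframeGenerator.genDataFrame(sqlContext, schema, minPartitions = 2)

    // Projecting a single column must not add or drop rows.
    val property = forAll(dataframeGen.arbitrary) { dataframe =>
      dataframe.select("age").count() == dataframe.count()
    }
    check(property)
  }
}
```

If the property fails, ScalaCheck reports the generated DataFrame that falsified it, which is the main benefit over hand-written fixtures.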