[SPARK-48061][SQL][TESTS] Parameterize max limits of `spark.sql.test.randomDataGenerator`

### What changes were proposed in this pull request?

This PR parameterizes `MAX_ARR_SIZE`, `MAX_MAP_SIZE`, and `MAX_STR_LEN` of `spark.sql.test.randomDataGenerator` by supporting the following JVM system properties:
- `spark.sql.test.randomDataGenerator.maxArraySize`
- `spark.sql.test.randomDataGenerator.maxMapSize`
- `spark.sql.test.randomDataGenerator.maxStrLen`
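
As a rough illustration of how such a limit can be read, the sketch below wraps the `System.getProperty(key, default).toInt` pattern in a helper. The object `LimitProps`, both method names, and the `example.*` keys are hypothetical and not part of this PR; the checked variant is an assumption about what stricter parsing could look like, not something the patch does.

```scala
// Hypothetical helper, not part of the PR: reads integer limits from JVM system properties.
object LimitProps {

  /** Returns the property parsed as an Int, or `default` when the property is unset.
    * A non-numeric value throws NumberFormatException, as plain `.toInt` does. */
  def intProp(key: String, default: Int): Int =
    System.getProperty(key, default.toString).toInt

  /** Stricter variant (an assumption, not in the PR): rejects non-numeric
    * or negative values with a message that names the offending key. */
  def checkedIntProp(key: String, default: Int): Int = {
    val raw = System.getProperty(key, default.toString)
    val parsed = raw.trim.toIntOption.getOrElse(
      throw new IllegalArgumentException(s"$key must be an integer, got '$raw'"))
    require(parsed >= 0, s"$key must be non-negative, got $parsed")
    parsed
  }
}
```

With no `-D` flag set for a key, both helpers return the supplied default, which matches how the defaults `1024`, `128`, and `128` are preserved by this change.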

### Why are the changes needed?

Apache Spark already has code that needs these parameters. Supporting them lets developers adjust the limits without changing and recompiling the source code.

https://github.com/apache/spark/blob/0329479acb6758c4d3e53d514ea832a181d31065/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryHashPartitionVerifySuite.scala#L155-L156

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual.

**BEFORE (golden file size: `269M`)**
```
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite"

$ ls -alh ./sql/core/target/scala-2.13/test-classes/structured-streaming/partition-tests/rowsAndPartIds
-rw-r--r--  1 dongjoon  staff   269M Apr 30 09:55 ./sql/core/target/scala-2.13/test-classes/structured-streaming/partition-tests/rowsAndPartIds
```

**AFTER (golden file size: `5.8M`)**
```
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" \
-Dspark.sql.test.randomDataGenerator.maxStrLen=100 \
-Dspark.sql.test.randomDataGenerator.maxArraySize=4

$ ls -alh ./sql/core/target/scala-2.13/test-classes/structured-streaming/partition-tests/rowsAndPartIds
-rw-r--r--  1 dongjoon  staff   5.8M Apr 30 09:56 ./sql/core/target/scala-2.13/test-classes/structured-streaming/partition-tests/rowsAndPartIds
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46305 from dongjoon-hyun/SPARK-48061.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun committed Apr 30, 2024
1 parent 0329479 commit 9caa6f7
Showing 1 changed file with 6 additions and 3 deletions.
@@ -53,9 +53,12 @@ object RandomDataGenerator {
    */
   private val PROBABILITY_OF_NULL: Float = 0.1f

-  final val MAX_STR_LEN: Int = 1024
-  final val MAX_ARR_SIZE: Int = 128
-  final val MAX_MAP_SIZE: Int = 128
+  final val MAX_STR_LEN: Int =
+    System.getProperty("spark.sql.test.randomDataGenerator.maxStrLen", "1024").toInt
+  final val MAX_ARR_SIZE: Int =
+    System.getProperty("spark.sql.test.randomDataGenerator.maxArraySize", "128").toInt
+  final val MAX_MAP_SIZE: Int =
+    System.getProperty("spark.sql.test.randomDataGenerator.maxMapSize", "128").toInt

   /**
    * Helper function for constructing a biased random number generator which returns "interesting"
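
One consequence of assigning each limit to a `final val` inside an `object` is that the property is read only once, when the object is first initialized. The sketch below illustrates that behavior with a hypothetical stand-in object and a made-up `demo.*` property key; it does not touch `RandomDataGenerator` itself.

```scala
// Hypothetical stand-in for an object that reads a limit from a system property.
object DemoLimits {
  // Evaluated once, when DemoLimits is first accessed.
  final val MaxLen: Int =
    System.getProperty("demo.randomDataGenerator.maxStrLen", "1024").toInt
}

object InitOrderDemo {
  /** Returns the limit as seen before and after the property is changed. */
  def run(): (Int, Int) = {
    // Set the property before the first access so the initializer sees it.
    System.setProperty("demo.randomDataGenerator.maxStrLen", "100")
    val first = DemoLimits.MaxLen // triggers initialization of DemoLimits

    // Changing the property afterwards does not re-run the initializer.
    System.setProperty("demo.randomDataGenerator.maxStrLen", "7")
    val second = DemoLimits.MaxLen

    (first, second)
  }
}
```

In this sketch both reads return the value that was set before the first access, so an override has to be in place before the JVM first touches the object, for example through a `-D` flag as in the manual test above.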