Add parallel tests #192

ezra-varady · 2023-10-05T01:29:08Z

This PR adds support for a set of parallel tests to ensure that lantern behaves properly under a more realistic workload. At the moment they are not especially interesting, just a PoC, but the PR includes the changes needed to support the concept more generally. For the time being I have added these tests to make test to see how they interact with CI on this PR; they should eventually be restricted to make test-parallel. @var77, if you have ideas about how best to include them in the GitHub Actions workflow, any advice would be awesome.

Some notes:

To run the new tests alone, run make test-parallel (currently they also run with make test, but I'd like to change this before merging).
The schedule for the new tests is test/parallel_schedule.txt.
The tests themselves live in test/parallel; they have the same layout as the existing regression tests.
At the moment these tests only bring in sift10k, do a few inserts and selects, and check some obvious invariants at the end. If anyone has suggestions for additional tests or invariants to add, please let me know and I will include them.

Fixes #177

test/parallel/parallel_test_runner.sh

test/parallel/sql/select.sql

var77

Looks good, great changes! Added some small comments. For the CI, we may modify the run-tests-linux.sh and run-tests-mac.sh files to run the parallel tests as well.

…begin and end

Ngalstyan4

A couple of nits, otherwise ready to merge. Well done!

Ngalstyan4 · 2023-10-06T06:51:01Z

test/test_runner.sh

+# if tests are parallel we only do this for the begin tests as we won't be dropping the database until the end
+# begin will handle initialization specific to the tests but expects the database already exists
+if [ "$PARALLEL" -eq 0 ]; then
+    psql "$@" -U ${DB_USER} -d postgres -v ECHO=none -q -c "DROP DATABASE IF EXISTS ${TEST_CASE_DB};" 2>/dev/null


should this not happen in both branches of the if?

Ngalstyan4 · 2023-10-06T06:52:30Z

test/test_runner.sh

+if [ "$PARALLEL" -eq 0 ]; then
+    psql "$@" -U ${DB_USER} -d postgres -v ECHO=none -q -c "DROP DATABASE IF EXISTS ${TEST_CASE_DB};" 2>/dev/null
+    db_init
+    psql "$@" -U ${DB_USER} -d ${TEST_CASE_DB} -v ECHO=none -q -f utils/common.sql 2>/dev/null


should this not happen in both branches as well?

good catch, I'll rework this
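
A minimal sketch of the condition discussed in this thread, assuming the reset steps (drop, db_init, utils/common.sql) should run for every non-parallel test and, in parallel mode, only for the begin test. The function name should_reset_db and its arguments are hypothetical and not taken from test_runner.sh; the psql calls are left out so the logic can be checked without a database.

```shell
# Hypothetical helper, not part of test_runner.sh.
# $1: value of PARALLEL (0 or 1)
# $2: name of the test about to run (assumed to be "begin" for the setup test)
# Returns 0 (true) when the database should be dropped and re-initialized.
should_reset_db() {
    if [ "$1" -eq 0 ] || [ "$2" = "begin" ]; then
        return 0
    fi
    return 1
}
```

With a helper like this, the drop and init commands would sit in a single guarded block instead of being repeated in each branch of the if.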

Ngalstyan4 · 2023-10-06T07:00:33Z

scripts/run_all_tests.sh

    else
-        TEST_FILES=$(cat schedule.txt | grep '^test:' | sed -e 's/^test://' | tr " " "\n" | sed -e '/^$/d')
+	    if [[ "$pgvector_installed" == "1" ]]; then
+		TEST_FILES=$(cat $SCHEDULE | grep -E '^(test:|test_pgvector:)' | sed -E -e 's/^test:|test_pgvector://' | tr " " "\n" | sed -e '/^$/d')


Unrelated to this PR, but still important: do we currently run pgvector compat tests anywhere in the CI/CD or release pipeline?

Yes, we run them in the pipeline, since pgvector is installed there.
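
For reference, the schedule parsing in the hunk above can be exercised on its own. The sketch below is an assumption-laden rewrite, not the code in run_all_tests.sh: the function name list_schedule_tests and its two arguments are hypothetical, and it strips the line prefix with one anchored pattern instead of the alternation used in the diff.

```shell
# Hypothetical helper that mirrors the grep | sed | tr | sed pipeline in the diff.
# $1: path to a schedule file with "test:" and optionally "test_pgvector:" lines
# $2: "1" to include test_pgvector: lines, anything else to skip them
# Prints one test name per line.
list_schedule_tests() {
    if [ "$2" = "1" ]; then
        pattern='^(test|test_pgvector):'
    else
        pattern='^test:'
    fi
    grep -E "$pattern" "$1" |
        sed -E 's/^[a-z_]+://' |   # drop the "test:" or "test_pgvector:" prefix
        tr ' ' '\n' |              # one test name per line
        sed -e '/^$/d'             # remove blank lines left by extra spaces
}
```

Calling it with a second argument of 1 corresponds to the pgvector_installed branch; any other value keeps only the plain test: lines.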

Ngalstyan4 · 2023-10-06T07:02:33Z

test/parallel/sql/utils/random_array.sql

@@ -0,0 +1,11 @@
+CREATE OR REPLACE FUNCTION random_int_array(dim integer, min integer, max integer) RETURNS integer[] AS $BODY$


I feel like this would be useful in regular tests as well. Should be in the utils for regular tests.

… standard tests, cleanup logic in test runner

ezra-varady · 2023-10-07T04:58:33Z

merged the two branches in the test runner and made copies of a couple utilities

Uses pg_regress to run tests in parallel against the database. Allows custom DB initialization and finalization, which can be used to load relevant data at the beginning and check relevant invariants at the end.
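
In a pg_regress schedule, the tests listed on one test: line run concurrently, and the lines themselves run in order. The sketch below prints that grouping for a schedule file so the begin, parallel, and end phases are easy to see. The function name describe_schedule is hypothetical, and the sample schedule in the comment is illustrative, not the contents of test/parallel_schedule.txt.

```shell
# Hypothetical helper: summarize the parallel groups in a pg_regress schedule.
# $1: path to the schedule file
# Example input (illustrative only):
#   test: begin
#   test: insert select insert2
#   test: end
describe_schedule() {
    awk '/^test:/ {
        sub(/^test:[ \t]*/, "")   # strip the prefix; awk re-splits the fields
        printf "group %d: %d test(s): %s\n", ++g, NF, $0
    }' "$1"
}
```

A single-test group at the start and at the end is one way to get the initialization and finalization steps described above, since each line finishes before the next one starts.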

ezra-varady added 4 commits October 2, 2023 08:04

initial attempt at parallel test running in python

c0cae0d

reimplement in pg_regress

b666c4b

initial implementation of parallel test running

98d6e5e

add some more tests clean things up

d6c7da7