Integrated Boot into Proto-X #31

wangpatrick57 · 2024-04-15T14:55:34Z

Summary: Integrated Boot to accelerate the HPO and tuning loops of Proto-X

Demo:
On TPC-H SF10, using Boot (image 1), Proto-X finds a configuration with a "Best Seen Metric" of 69.67s after ~1hr of tuning. Without Boot (image 2), Proto-X only finds a configuration with a "Best Seen Metric" of 103.10s after ~1hr of tuning. Note that "Best Seen Metric" refers to the total runtime of the last run of all 22 TPC-H queries. "Best Metric" is set to "Best Seen Metric" whenever a run has no timed-out queries. Both runs used the same hyperparameters and the same per-query timeout.
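For reference, the gap between the two demo numbers can be worked out as below. This is only arithmetic on the two figures quoted above; the variable names are illustrative, and the Boot number is an estimated runtime (see Details), so the ratio is indicative rather than a measured speedup.

```python
# "Best Seen Metric" values quoted in the demo, in seconds.
with_boot_s = 69.67
without_boot_s = 103.10

# Ratio of the two totals, and the relative reduction in percent.
ratio = without_boot_s / with_boot_s
reduction_pct = (without_boot_s - with_boot_s) / without_boot_s * 100

print(f"ratio: {ratio:.2f}x, reduction: {reduction_pct:.1f}%")
```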

Details:

Boot is automatically installed when running postgres build.
Redis (needed by Boot) is automatically started when Boot is enabled. This Redis does not conflict with the Redis instance used by Ray.
Boot can be toggled on and off with the CLI arg --enable-boot-during-[hpo|tune]. Boot is enabled separately for HPO and for tuning.
Boot can be configured with a .yaml file passed in through the CLI.
Currently, when Boot is enabled, the results show the estimated runtime of the TPC-H workload rather than the actual runtime, so they should be interpreted with caution. A future PR will resolve this.
Boot could potentially be even more effective if the per-query timeout were also decreased to reflect how Boot accelerates queries. This may be investigated in the future.
Currently, Boot's entire intelligent cache is cleared every time the configuration changes. A future PR will implement logic to selectively clear parts of the cache when the configuration changes.
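To illustrate the separate HPO and tuning toggles, here is a minimal sketch using the standard-library `argparse`. It is not the project's actual CLI code: the two `--enable-boot-during-*` flags come from the description above, while `--boot-config-fpath`, `build_parser`, and `boot_enabled_for` are hypothetical names chosen for the example, and a single parser is used for both phases only for brevity.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one Boot toggle per phase (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of Boot-related CLI args")
    parser.add_argument("--enable-boot-during-hpo", action="store_true",
                        help="Use Boot while running HPO")
    parser.add_argument("--enable-boot-during-tune", action="store_true",
                        help="Use Boot while running the tuning loop")
    # Hypothetical flag name for the .yaml config path.
    parser.add_argument("--boot-config-fpath", default=None,
                        help="Path to a .yaml file with Boot settings")
    return parser


def boot_enabled_for(args: argparse.Namespace, phase: str) -> bool:
    """Return whether Boot is enabled for the given phase ('hpo' or 'tune')."""
    if phase not in ("hpo", "tune"):
        raise ValueError(f"unknown phase: {phase}")
    return bool(getattr(args, f"enable_boot_during_{phase}"))


args = build_parser().parse_args(["--enable-boot-during-hpo"])
print(boot_enabled_for(args, "hpo"), boot_enabled_for(args, "tune"))
```

Keeping the two flags independent means Boot can speed up HPO while the final tuning run still reports real runtimes, or the other way around.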

lmwnshn

LGTM, only comments are minor thoughts as I read it. Nice work independently integrating everything.

lmwnshn · 2024-04-17T15:25:01Z

experiments/protox_tpch_sf10/main.sh

+INTENDED_PGDATA_HARDWARE=ssd
+PGDATA_PARENT_DPATH=/mnt/nvme1n1/phw2/dbgym_tmp/
+
+# space for testing


I'm guessing this is temporary? Not sure if you wanted to remove this.

This is to run specific commands within main.sh. It's inconvenient to copy them directly because they rely on ENV_VARS, so this is my way of "copying" them. I'll add a comment better explaining this, but I imagine it'll be permanent.

lmwnshn · 2024-04-17T15:27:25Z

scripts/pat_test.sh

+INTENDED_PGDATA_HARDWARE=ssd
+PGDATA_PARENT_DPATH=/mnt/nvme1n1/phw2/dbgym_tmp/
+
+# space for testing


Similar comment, but doesn't matter much.

lmwnshn · 2024-04-17T15:33:52Z

tune/protox/env/util/pg_conn.py

@@ -208,12 +220,59 @@ def start_with_changes(
                "Waiting for postgres to bootup but it is not..."
            )

-        # Move the temporary over since we know the temporary can load.
+        # Set up Boot if we're told to do so


This leaks abstractions between modules somewhat (Proto-X knowing about Boot), but after briefly thinking about it, fixing the abstraction leak doesn't seem worth it.

I think in an ideal world, the connection handling logic would be independent of the tuner -- then stuff like Boot doesn't need to get pushed down into the tuner.

However, given that the tuner currently manages its own connections, this is fine. As long as there aren't more than 3-5 things that leak, it probably won't be too bad.

Good point, that's something I didn't realize. I think there is an argument for why Proto-X should know about Boot though, because when you use Proto-X to tune, you need to indicate whether Boot was used during tuning or not to know whether the numbers are real runtimes or estimated. We can revisit this later though.
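One way the "ideal world" separation discussed in this thread could look is sketched below: the connection layer runs caller-supplied hooks after startup, so Boot setup lives outside it. This is not the code in `pg_conn.py`; `FakeConnection`, `ConnectionManager`, `boot_hook`, and `build_manager` are hypothetical, `start_with_changes` is simplified to take no arguments, and the Boot statement is a placeholder because the real setup commands are not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeConnection:
    """Stand-in for a real database connection; it only records statements."""
    executed: List[str] = field(default_factory=list)

    def execute(self, sql: str) -> None:
        self.executed.append(sql)


# A hook receives the freshly started connection and may configure it.
PostStartHook = Callable[[FakeConnection], None]


class ConnectionManager:
    """Hypothetical connection layer that knows nothing about Boot."""

    def __init__(self, post_start_hooks: List[PostStartHook]) -> None:
        self._post_start_hooks = list(post_start_hooks)

    def start_with_changes(self) -> FakeConnection:
        conn = FakeConnection()  # a real version would (re)start Postgres here
        for hook in self._post_start_hooks:
            hook(conn)
        return conn


def boot_hook(conn: FakeConnection) -> None:
    # Placeholder statement; Boot's actual setup commands are an unknown here.
    conn.execute("-- placeholder for Boot setup")


def build_manager(enable_boot: bool) -> ConnectionManager:
    """The caller (e.g. the tuner) decides whether Boot is part of startup."""
    hooks: List[PostStartHook] = [boot_hook] if enable_boot else []
    return ConnectionManager(hooks)


print(build_manager(True).start_with_changes().executed)
```

With this shape the tuner can still record whether Boot was enabled, which addresses the point about distinguishing real from estimated runtimes, without the connection code naming Boot directly.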

wangpatrick57 added 28 commits April 6, 2024 13:32

now logging to artifacts/ instead of artifacts/artifacts/

d858d04

small change

e3fc4af

boot is now set up

a1a8ff3

set up boot

4f99c34

tpch sf1 exp

1341680

faster pgdata creation on ssd

aa7ce95

ray and boot redis port

75d0a2b

now starting redis correctly

70010e8

now using execute() instead of psql() to set up boot

0cc9401

now saving postgresql.auto.conf

276b180

centralized where shared preload libs were defined

6b6d7bf

added use boot option

6450b9f

added use boot during hpo

409dbd4

Merge branch 'integrate-boot' of github.com:wangpatrick57/dbgym into …

7fe2447

…integrate-boot

hpoed_params -> hpo_params in all but tune.py

643674f

Merge branch 'integrate-boot' of github.com:wangpatrick57/dbgym into …

4adc7fb

…integrate-boot

added boot config fpath

6c23215

Merge branch 'integrate-boot' of github.com:wangpatrick57/dbgym into …

0e8cd76

…integrate-boot

added use boot option

34c814b

Merge branch 'integrate-boot' of github.com:wangpatrick57/dbgym into …

4aba7de

…integrate-boot

now passing boot settings

00e8f38

added rebuild command for postgres build

11d7e41

fixed multiple shared preload libraries

3b50f74

now saving boot config path when opening

86b3699

fixed create_pgdata to use link_result and work if there's already pg…

2a3dc35

…data existing

added args for tune boot

9eaeafc

added enable boot option to tune

640b0d7

merge

334740c

lmwnshn self-requested a review April 17, 2024 15:18

lmwnshn approved these changes Apr 17, 2024

View reviewed changes

explained testing space more

41e82ca

wangpatrick57 merged commit 523ceae into cmu-db:main Apr 17, 2024
1 check passed
