Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the ability to separate out reads and writes into their own connection pools. #344

Merged
merged 15 commits into from
Oct 30, 2023

Conversation

pjoshi30
Copy link
Contributor

This PR adds the ability to separate out reads and writes into their own separate DB connection pools. The reader pool would point to a read replica endpoint and the write pool would point to a writer endpoint. More details on how to do this has been added to the environment.md doc.

We have tested this change with Amazon Aurora (with the Postgres DB Engine v13.7). With Aurora Postgres, we have a single writer and multiple read replicas. As a side note, Aurora MySQL supports multiple writers which could potentially work with this setup, however, we have not tested these changes with Aurora MySQL.

A brief overview of changes:

  • Creates separate read/write pools via configuration.
  • Refactored some redundant code in the AsyncPostgresTable class.
  • Added documentation to explain read scaling and configuration with Amazon Aurora as an example.
  • Pulls in a few set of commits made by @romain-intel to optimize certain paths that showed up as a bottleneck in our scale tests.


>```
> USE_SEPARATE_READER_POOL = 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking for something exactly like this PR. This variable seems a bit redundant. It could simply be:

USE_SEPARATE_READER_POOL = bool(MF_METADATA_DB_READ_REPLICA_HOST)

thus disabling the feature if no host is given and implicitly turning it on if one is given. This would also prevent the case where this is set to true and no host is given.

@pjoshi30 pjoshi30 force-pushed the pjoshi30_aurora_simple branch from 4cfbb2b to bf42469 Compare March 22, 2023 17:49
services/ui_backend_service/data/db/tables/run.py Outdated Show resolved Hide resolved
services/ui_backend_service/data/db/tables/run.py Outdated Show resolved Hide resolved
@@ -196,7 +206,8 @@ def __init__(self,
prefix="MF_METADATA_DB_",
pool_min: int = 1,
pool_max: int = 10,
timeout: int = 60):
timeout: int = 60,
read_replica_host: str = "localhost"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using localhost here as a default could lead to odd issues if the USE_SEPARATE_READER_POOL is set but the host has not been set. I would lean towards using the existence of [PREFIX]_READ_REPLICA_HOST as a feature gate directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed read_replica_host

@saikonen
Copy link
Collaborator

saikonen commented May 9, 2023

some testing notes:
I had a slight concern if this feature will have an effect on how the realtime updates for the UI work, but both notify.py and create_trigger_if_missing() are still using the "main" pool which should be writable. table triggers remain unaffected 👍

validated the PR by running a read replica locally. Basic operations seem to be working as intended.

Only pending concern is with the unrelated changes in the diff, which seem to stem from other open PR's not yet in master

@wangchy27 wangchy27 force-pushed the pjoshi30_aurora_simple branch 2 times, most recently from 16832a3 to 12209c9 Compare July 14, 2023 00:04
@romain-intel romain-intel force-pushed the pjoshi30_aurora_simple branch from 3019760 to 4c7c1b4 Compare August 14, 2023 21:48
services/data/postgres_async_db.py Outdated Show resolved Hide resolved
services/utils/__init__.py Outdated Show resolved Hide resolved
from functools import wraps
from typing import Dict
import logging
import psycopg2
from packaging.version import Version, parse

USE_SEPARATE_READER_POOL = os.environ.get("USE_SEPARATE_READER_POOL", "0")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be cast to a boolean instead to make using it a bit cleaner.

Copy link
Collaborator

@saikonen saikonen Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also echo the comment that @dhpollack made on the need for a separate toggle for the reader pool, vs. simply looking at the existence of the MF_METADATA_DB_READ_REPLICA_HOST? This would cut down on the environment variables required to configure the feature as well.

The reader pool has a fallback to the default host if a read replica host is not configured. Is it a valid use case to want a separate reader pool if we're targeting only one host? wouldn't increasing the pool size serve the same purpose in this case?

@saikonen saikonen added the in review Currently under review label Aug 21, 2023
This remediates this by considering a run 'failed' if the hb hasn't been
updated within heartbeat_cutoff time as opposed to the heartbeat_threshold time
@roofurmston
Copy link

Would be interested in the functionality in this PR

Any idea on the status of it? Seems there have been some open comments for a long time.

@pjoshi30 @wangchy27 You are still interested in this PR?

saikonen and others added 4 commits October 25, 2023 13:06
* Upgrade Github actions used in `dockerimage` action (#379)

* upgrade github actions used in dockerimage action

* remove setup-buildx-action and pin to hashes.

* change deprecated pkg_resources to importlib.metadata (#387)

* In a previous commit, the detection of a failure became too aggressive. (#386)

* In a previous commit, the detection of a failure became too aggressive.

This remediates this by considering a run 'failed' if the hb hasn't been
updated within heartbeat_cutoff time as opposed to the heartbeat_threshold time

* change run finished at query to heartbeat_cutoff from threshold

* clean up unused values from run query

---------

Co-authored-by: Sakari Ikonen <[email protected]>

* fix PATH_PREFIX handling in metadata service so it doesn't interfere with mfgui routes (#388)

* Configurable SSL Connection (#373)

* [TRIS-297] Configurable SSL Connection (#1)

* Configurable SSL connection

* Update services/utils/__init__.py

* no ssl unit testing (#3)

* ssl seperate test (#4)

* dsn generator sslmode none (#5)

* fix run_goose.py not working without SSL mode env variables. (#390)

* change run inactive cutoff default to 6 minutes. cleanup unused constant (#392)

* clarify comment on read replica hosts

* make USE_SEPARATE_READER_POOL a boolean

* remove unnecessary conditionals for pool choice in execute_sql

---------

Co-authored-by: Tom Furmston <[email protected]>
Co-authored-by: Romain <[email protected]>
Co-authored-by: Oleg Avdeev <[email protected]>
Co-authored-by: RikishK <[email protected]>
@saikonen
Copy link
Collaborator

still an issue to be solved; the changes to the queries broke the transactional tag mutation, as that performs multiple queries over a cursor, which now gets closed mid-operation

@saikonen saikonen merged commit 741b1a5 into master Oct 30, 2023
6 checks passed
@saikonen saikonen deleted the pjoshi30_aurora_simple branch November 3, 2023 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in review Currently under review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants