Skip to content

Commit

Permalink
Unrevert: feat: Remove query splitters from the API (#5581)
Browse files Browse the repository at this point in the history
* Original commit: "feat: Remove query splitters from the API (#5571)"

After testing how much the splitters were being used, and the improvement they
were providing, splitters were only being used on a very small percentage
of queries (up to 5% on the highest dataset). After pausing the splitters,
there was almost no impact on the performance of queries. Only a single
referrer had a noticeable performance degradation, and that was not enough to
cause any actual impact on the queries (rate limiting, timeouts etc.)

* Unrevert: feat: Remove query splitters from the API

This was reverted because it was causing a test failure in Sentry master.
  • Loading branch information
evanh authored Mar 15, 2024
1 parent 7ddd864 commit ae9d2a5
Show file tree
Hide file tree
Showing 19 changed files with 55 additions and 1,077 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -362,6 +362,7 @@ jobs:
tests/sentry/eventstream/kafka \
tests/sentry/post_process_forwarder \
tests/sentry/snuba \
tests/sentry/eventstore/snuba \
tests/sentry/search/events \
tests/sentry/event_manager \
tests/sentry/api/endpoints/test_organization_profiling_functions.py \
Expand Down
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ api-tests:
SNUBA_SETTINGS=test pytest -vv tests/*_api.py

backend-typing:
mypy snuba tests scripts --strict --config-file mypy.ini --exclude 'tests/datasets|tests/query|tests/test_split.py'
mypy snuba tests scripts --strict --config-file mypy.ini --exclude 'tests/datasets|tests/query'

install-python-dependencies:
pip uninstall -qqy uwsgi # pip doesn't do well with swapping drop-ins
Expand Down
18 changes: 0 additions & 18 deletions docs/source/architecture/queryprocessing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -122,24 +122,6 @@ finds equality conditions on tags and replace them with the equivalent condition
on a tags hashmap (where we have a bloom filter index) making the filtering
operation faster.

Query Splitter
--------------

Some queries can be executed in an optimized way by splitting them into multiple
individual Clickhouse queries and by assembling the results of each one of them.

Two examples are time splitting and column splitting. Both hare `in this file <https://github.com/getsentry/snuba/blob/master/snuba/web/split.py>`_.

Time splitting splits a query (that does not contain aggregations and is properly
sorted) into multiple ones over a variable time range that increases in size
progressively and executes them in sequence stopping as soon we have enough
results.

Column splitting splits filtering and column fetching. It executes the filtering
part of the query on a minimal number of columns so Clickhouse loads fewer columns,
then, through a second query, fetches the missing columns only for the rows
filtered by the first query.

Query Formatter
---------------

Expand Down
9 changes: 0 additions & 9 deletions snuba/datasets/configuration/discover/storages/discover.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -163,12 +163,3 @@ allocation_policies:
default_config_overrides:
is_enforced: 0
is_active: 0
query_splitters:
- splitter: ColumnSplitQueryStrategy
args:
id_column: event_id
project_column: project_id
timestamp_column: timestamp
- splitter: TimeSplitQueryStrategy
args:
timestamp_col: timestamp
9 changes: 0 additions & 9 deletions snuba/datasets/configuration/events/storages/errors.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -323,15 +323,6 @@ query_processors:
- environment
- project_id
- processor: TableRateLimit
query_splitters:
- splitter: ColumnSplitQueryStrategy
args:
id_column: event_id
project_column: project_id
timestamp_column: timestamp
- splitter: TimeSplitQueryStrategy
args:
timestamp_col: timestamp
mandatory_condition_checkers:
- condition: ProjectIdEnforcer
replacer_processor:
Expand Down
9 changes: 0 additions & 9 deletions snuba/datasets/configuration/events/storages/errors_ro.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -322,12 +322,3 @@ query_processors:
- environment
- project_id
- processor: TableRateLimit
query_splitters:
- splitter: ColumnSplitQueryStrategy
args:
id_column: event_id
project_column: project_id
timestamp_column: timestamp
- splitter: TimeSplitQueryStrategy
args:
timestamp_col: timestamp
9 changes: 1 addition & 8 deletions snuba/datasets/configuration/json_schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,7 @@ def string_with_description(description: str) -> dict[str, str]:
"description": "The stream loader for a writing to ClickHouse. This provides what is needed to start a Kafka consumer and fill in the ClickHouse table.",
}


######
# Column specific json schemas
def make_column_schema(
Expand Down Expand Up @@ -332,11 +333,6 @@ def registered_class_array_schema(
"QueryProcessor",
"Name of ClickhouseQueryProcessor class config key. Responsible for the transformation applied to a query.",
)
STORAGE_QUERY_SPLITTERS_SCHEMA = registered_class_array_schema(
"splitter",
"QuerySplitStrategy",
"Name of QuerySplitStrategy class config key. Responsible for splitting a query into two at runtime and combining the results.",
)
STORAGE_MANDATORY_CONDITION_CHECKERS_SCHEMA = registered_class_array_schema(
"condition",
"ConditionChecker",
Expand Down Expand Up @@ -542,7 +538,6 @@ def registered_class_array_schema(
"readiness_state": READINESS_STATE_SCHEMA,
"schema": SCHEMA_SCHEMA,
"query_processors": STORAGE_QUERY_PROCESSORS_SCHEMA,
"query_splitters": STORAGE_QUERY_SPLITTERS_SCHEMA,
"mandatory_condition_checkers": STORAGE_MANDATORY_CONDITION_CHECKERS_SCHEMA,
"allocation_policies": STORAGE_ALLOCATION_POLICIES_SCHEMA,
},
Expand All @@ -569,7 +564,6 @@ def registered_class_array_schema(
"schema": SCHEMA_SCHEMA,
"stream_loader": STREAM_LOADER_SCHEMA,
"query_processors": STORAGE_QUERY_PROCESSORS_SCHEMA,
"query_splitters": STORAGE_QUERY_SPLITTERS_SCHEMA,
"mandatory_condition_checkers": STORAGE_MANDATORY_CONDITION_CHECKERS_SCHEMA,
"allocation_policies": STORAGE_ALLOCATION_POLICIES_SCHEMA,
"replacer_processor": STORAGE_REPLACER_PROCESSOR_SCHEMA,
Expand Down Expand Up @@ -607,7 +601,6 @@ def registered_class_array_schema(
"postgres_table": TYPE_STRING,
"row_processor": CDC_STORAGE_ROW_PROCESSOR_SCHEMA,
"query_processors": STORAGE_QUERY_PROCESSORS_SCHEMA,
"query_splitters": STORAGE_QUERY_SPLITTERS_SCHEMA,
"mandatory_condition_checkers": STORAGE_MANDATORY_CONDITION_CHECKERS_SCHEMA,
"allocation_policies": STORAGE_ALLOCATION_POLICIES_SCHEMA,
"replacer_processor": STORAGE_REPLACER_PROCESSOR_SCHEMA,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -97,11 +97,6 @@ query_processors:
- processor: TableRateLimit
- processor: TupleUnaliaser

query_splitters:
- splitter: TimeSplitQueryStrategy
args:
timestamp_col: end_timestamp

mandatory_condition_checkers:
- condition: ProjectIdEnforcer

Expand Down
5 changes: 0 additions & 5 deletions snuba/datasets/configuration/spans/storages/spans.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -169,11 +169,6 @@ query_processors:
- processor: TableRateLimit
- processor: TupleUnaliaser

query_splitters:
- splitter: TimeSplitQueryStrategy
args:
timestamp_col: end_timestamp

mandatory_condition_checkers:
- condition: ProjectIdEnforcer

Expand Down
5 changes: 0 additions & 5 deletions snuba/datasets/configuration/storage_builder.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,6 @@
from snuba.datasets.configuration.utils import (
get_mandatory_condition_checkers,
get_query_processors,
get_query_splitters,
parse_columns,
)
from snuba.datasets.message_filters import StreamMessageFilter
Expand Down Expand Up @@ -43,7 +42,6 @@
STREAM_LOADER = "stream_loader"
PRE_FILTER = "pre_filter"
QUERY_PROCESSORS = "query_processors"
QUERY_SPLITTERS = "query_splitters"
MANDATORY_CONDITION_CHECKERS = "mandatory_condition_checkers"
WRITER_OPTIONS = "writer_options"
SUBCRIPTION_SCHEDULER_MODE = "subscription_scheduler_mode"
Expand Down Expand Up @@ -79,9 +77,6 @@ def __build_readable_storage_kwargs(config: dict[str, Any]) -> dict[str, Any]:
QUERY_PROCESSORS: get_query_processors(
config[QUERY_PROCESSORS] if QUERY_PROCESSORS in config else []
),
QUERY_SPLITTERS: get_query_splitters(
config[QUERY_SPLITTERS] if QUERY_SPLITTERS in config else []
),
MANDATORY_CONDITION_CHECKERS: get_mandatory_condition_checkers(
config[MANDATORY_CONDITION_CHECKERS]
if MANDATORY_CONDITION_CHECKERS in config
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -248,11 +248,6 @@ query_processors:
- processor: TableRateLimit
- processor: TupleUnaliaser

query_splitters:
- splitter: TimeSplitQueryStrategy
args:
timestamp_col: finish_ts

mandatory_condition_checkers:
- condition: ProjectIdEnforcer

Expand Down
17 changes: 0 additions & 17 deletions snuba/datasets/configuration/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@
String,
UInt,
)
from snuba.datasets.plans.splitters import QuerySplitStrategy
from snuba.query.processors.condition_checkers import ConditionChecker
from snuba.query.processors.physical import ClickhouseQueryProcessor
from snuba.utils.schemas import (
Expand All @@ -31,11 +30,6 @@ class QueryProcessorDefinition(TypedDict):
args: dict[str, Any]


class QuerySplitterDefinition(TypedDict):
splitter: str
args: dict[str, Any]


class MandatoryConditionCheckerDefinition(TypedDict):
condition: str
args: dict[str, Any]
Expand All @@ -52,17 +46,6 @@ def get_query_processors(
]


def get_query_splitters(
query_splitter_objects: list[QuerySplitterDefinition],
) -> list[QuerySplitStrategy]:
return [
QuerySplitStrategy.get_from_name(qs["splitter"]).from_kwargs(
**qs.get("args", {})
)
for qs in query_splitter_objects
]


def get_mandatory_condition_checkers(
mandatory_condition_checkers_objects: list[MandatoryConditionCheckerDefinition],
) -> list[ConditionChecker]:
Expand Down
64 changes: 0 additions & 64 deletions snuba/datasets/plans/splitters/__init__.py

This file was deleted.

Loading

0 comments on commit ae9d2a5

Please sign in to comment.