-
Notifications
You must be signed in to change notification settings - Fork 3
Fast Data Retrieval for Analytics Databases #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 16 commits
72ddc43
3d9194f
7788516
10cadb5
e64c943
6884f1a
56b0b12
13c39c1
14b8023
443e8f3
fd30f69
985578d
c4e9be9
a748077
764464c
fa1de3f
08fb114
9dafd5f
0144b72
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,6 +3,7 @@ channels: | |
- conda-forge | ||
- nodefaults | ||
dependencies: | ||
- pandas | ||
- python>=3.8 | ||
- pytest | ||
- pytest-cov | ||
|
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -3,6 +3,7 @@ | |||||||||
import functools | ||||||||||
import json | ||||||||||
import operator | ||||||||||
import warnings | ||||||||||
from abc import ABC, abstractmethod | ||||||||||
from collections import Counter | ||||||||||
from dataclasses import dataclass | ||||||||||
|
@@ -11,6 +12,17 @@ | |||||||||
import sqlalchemy as sa | ||||||||||
from sqlalchemy.sql.expression import FromClause | ||||||||||
|
||||||||||
from .utils import check_module_installed | ||||||||||
|
||||||||||
snowflake_available = check_module_installed("snowflake") | ||||||||||
pandas_available = check_module_installed("pandas") | ||||||||||
|
||||||||||
|
||||||||||
if snowflake_available and not pandas_available: | ||||||||||
warnings.warn( | ||||||||||
"For snowflake users: `pandas` is not installed, that means optimized data loading is not available." | ||||||||||
) | ||||||||||
|
||||||||||
|
||||||||||
def is_mssql(engine: sa.engine.Engine) -> bool: | ||||||||||
return engine.name == "mssql" | ||||||||||
|
@@ -648,7 +660,20 @@ def get_column( | |||||||||
|
||||||||||
if not aggregate_operator: | ||||||||||
selection = sa.select([column]) | ||||||||||
result = engine.connect().execute(selection).scalars().all() | ||||||||||
|
||||||||||
# snowflake-specific optimization iff pandas is installed | ||||||||||
if is_snowflake(engine) and pandas_available: | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Concerning the message, this is an option. You delay using the flag you created until here. I did split the if to make it clearer, but you can find a way to do this without splitting the if, you just have to first check if pandas is available or not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wanted to try that one as well, unfortunately, now the error message would be printed for each and every query. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair enough, then you can try to import snowflake, if it succeeds you can do the pandas check and inform in case it's not present. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have created a Now, we just do from .utils import check_module_installed
snowflake_available = check_module_installed("snowflake")
pandas_available = check_module_installed("pandas") There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ivergara are you satisfied with this solution? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure. |
||||||||||
snowflake_cursor = engine.connect().connection.cursor() | ||||||||||
|
||||||||||
# note: in addition to pyarrow, this currently requires pandas as well | ||||||||||
pa_table = snowflake_cursor.execute(str(selection)).fetch_arrow_all() | ||||||||||
if pa_table: # snowflake connector returns NoneType when the table is empty | ||||||||||
result = pa_table.column(0).to_numpy() | ||||||||||
YYYasin19 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
else: | ||||||||||
result = [] | ||||||||||
|
||||||||||
else: | ||||||||||
result = engine.connect().execute(selection).scalars().all() | ||||||||||
|
||||||||||
else: | ||||||||||
selection = sa.select([aggregate_operator(column)]) | ||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
def check_module_installed(module_name: str) -> bool: | ||
import importlib | ||
|
||
try: | ||
mod = importlib.import_module(module_name) | ||
return mod is not None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not just return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nope, just checked, it always throws an error :) |
||
except ModuleNotFoundError: | ||
return False |
Uh oh!
There was an error while loading. Please reload this page.