
[QST] Parallelize multiple cuDF query calls #7053

Closed
yuriy312 opened this issue Dec 29, 2020 · 5 comments
Labels: Python (Affects Python cuDF API), question (Further information is requested)

Comments

@yuriy312

yuriy312 commented Dec 29, 2020

Hi, is there a CUDA way to parallelize multiple query() calls on a single cuDF DataFrame, or query() calls across different cuDF DataFrames?

Here is an example code snippet. The following query doesn't take many GPU cycles, and I'd like to be able to launch multiple such queries on copies of the same DataFrame or on different cuDF DataFrames:


import cudf
import cupy as cp
import numpy as np

NUM_ELEMENTS = 100000

# Build a DataFrame of random values directly on the GPU
df = cudf.DataFrame()
df['value1'] = cp.random.sample(NUM_ELEMENTS)
df['value2'] = cp.random.sample(NUM_ELEMENTS)
df['value3'] = cp.random.sample(NUM_ELEMENTS)

# Scalar thresholds generated on the host
c1 = np.random.random()
c2 = np.random.random()
c3 = np.random.random()
res = df.query('((value1 < @c1) & (value2 > @c2) & (value3 < @c3))')

In Pandas I can run one query (or any Pandas function, for that matter) per CPU core, and I'd like to know what the equivalent model is in the CUDA world.

Thanks
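For reference, the "one query per CPU core" pattern mentioned above can be sketched with the standard library's thread pool. This is a minimal CPU-side sketch assuming pandas and numpy; the list of frames, the worker count, and `run_query` are illustrative and not part of the original snippet:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

NUM_ELEMENTS = 100_000

# Several independent DataFrames to query in parallel.
frames = [
    pd.DataFrame({
        "value1": np.random.random(NUM_ELEMENTS),
        "value2": np.random.random(NUM_ELEMENTS),
        "value3": np.random.random(NUM_ELEMENTS),
    })
    for _ in range(4)
]

c1, c2, c3 = np.random.random(3)

def run_query(df: pd.DataFrame) -> pd.DataFrame:
    # pandas query() resolves @-variables from the caller's scope;
    # with the numexpr engine installed, it can release the GIL,
    # letting the threads below overlap.
    return df.query("(value1 < @c1) & (value2 > @c2) & (value3 < @c3)")

# One query per worker thread, mirroring "one query per CPU core".
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_query, frames))
```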

@yuriy312 yuriy312 added Needs Triage Need team to review and classify question Further information is requested labels Dec 29, 2020
@yuriy312 yuriy312 changed the title Parallelize multiple cuDF query calls [QST] Parallelize multiple cuDF query calls Dec 29, 2020
@harrism
Member

harrism commented Jan 5, 2021

@kkraus14 @quasiben is there a dask approach that can be applied here?

@quasiben
Member

quasiben commented Jan 5, 2021

Dask could be helpful here. Dask's query is a blocked version of the pandas query (is this true for cuDF?), so you could write something like the following:

In [1]: import dask

In [2]: ddf = dask.datasets.timeseries()

In [3]: q1 = ddf.query('x > 0.5')

In [4]: q2 = ddf.query('x < 0')

In [5]: res = dask.compute([q1, q2])

In [6]: res
Out[6]:
([                       id     name         x         y
  timestamp
  2000-01-01 00:00:15   986    Sarah  0.577679 -0.792372
  2000-01-01 00:00:18   992   Xavier  0.757104 -0.926112
  2000-01-01 00:00:19  1005   Ursula  0.723600  0.459464
  2000-01-01 00:00:20   961   Ingrid  0.804942  0.001995
  2000-01-01 00:00:22   952    Alice  0.993793  0.806622
  ...                   ...      ...       ...       ...
  2000-01-30 23:59:45  1032    Quinn  0.919070 -0.472543
  2000-01-30 23:59:48  1009  Norbert  0.631444  0.779816
  2000-01-30 23:59:54   998      Tim  0.599235 -0.834612
  2000-01-30 23:59:55   967    Frank  0.769159  0.853424
  2000-01-30 23:59:58   966   Ursula  0.866536  0.793179

  [647943 rows x 4 columns],
                         id     name         x         y
  timestamp
  2000-01-01 00:00:01  1004  Michael -0.365752  0.700723
  2000-01-01 00:00:02   963   Yvonne -0.616555  0.462348
  2000-01-01 00:00:04  1022  Norbert -0.636567 -0.090011
  2000-01-01 00:00:05   976      Tim -0.305983 -0.539153
  2000-01-01 00:00:06   994   George -0.593397 -0.349443
  ...                   ...      ...       ...       ...
  2000-01-30 23:59:51   993    Kevin -0.988354  0.510483
  2000-01-30 23:59:52  1015    Quinn -0.686493  0.710052
  2000-01-30 23:59:56  1049   Hannah -0.569486  0.934169
  2000-01-30 23:59:57  1013    Jerry -0.281122 -0.786347
  2000-01-30 23:59:59   960      Ray -0.591232  0.570027

  [1296654 rows x 4 columns]],)

With N+1 workers we should see the benefits of parallelism. Looking at the task graph should give us a better idea:

[Screenshot: Dask task graph for the two query calls]

For GPUs, the graph above is somewhat limiting. Currently, with dask-cuda, we limit the number of threads/workers to 1 per GPU. This will hopefully change in the future as we work towards stream support: rapidsai/dask-cuda#96

Still, I would expect to see some gains, since the data will be chunked and Dask can leverage this chunking for execution across multiple GPUs.

@kkraus14 kkraus14 added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Jan 8, 2021
@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions github-actions bot added the stale label Feb 16, 2021
@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@vyasr
Contributor

vyasr commented Oct 17, 2022

I'm going to close this question as answered; feel free to reopen if necessary.

@vyasr vyasr closed this as completed Oct 17, 2022
5 participants