# [Do not merge] Spike upgrading comps algorithm with taichi #236

Open · wants to merge 13 commits into base: master
## Conversation

@jeancochrane commented Apr 29, 2024

⚠️ I don't plan to merge this PR, but am requesting review for the sake of knowledge transfer. ⚠️

This PR tests out the use of taichi for the comps algorithm rather than numba.

Connects #229.

### Findings

See Benchmarking below for detailed stats comparing this approach to our existing code on different architectures. Some high-level takeaways:

- CUDA doesn't seem to make much of a difference, and is counterproductive if anything. This makes me wonder whether the algorithm needs to be redesigned to make better use of the GPU (note that numba has an entirely separate interface for CUDA programming), but I'm considering that question out of scope for now.
- There are big performance gains to be had by simply bumping the instance type with the existing numba code. If the numbers below hold, we could speed up the comps code by 2x by switching to c5.24xlarge instances, which are about twice as expensive as the m4.10xlarge instances we use now (meaning we should expect to break even on the change).
- At small scales (20k observations/10k comparisons), taichi appears to outperform numba, but the improvement disappears as the data scales up: at a large scale (100k observations/50k comparisons), the two perform about the same.
- As evidenced by the code in this PR, the taichi interface is harder to work with than numba's. Taichi is stricter about types and doesn't support some basic operations that Python does (most notably, getting the shape of an array and returning an array from a function), making the code more confusing and un-Pythonic.

### Benchmarking

#### 20k observations, 10k comparisons

| framework | instance type | arch | time | logs |
| --- | --- | --- | --- | --- |
| taichi | g5.12xlarge | x86 | 2.36s | link |
| taichi | g5.12xlarge | CUDA | 4.33s | link |
| taichi | m4.10xlarge | x86 | 4.44s | link |
| numba | g5.12xlarge | x86 | 6.07s | link |
| numba | m4.10xlarge | x86 | 10.52s | link |

#### 100k observations, 50k comparisons

| framework | instance type | arch | time | logs |
| --- | --- | --- | --- | --- |
| numba | g5.12xlarge | x86 | 31.87s | link |
| taichi | c5.24xlarge | x86 | 31.93s | link |
| taichi | m4.10xlarge | x86 | 34.09s | link |
| numba | c5.24xlarge | x86 | 37.31s | link |
| taichi | g5.12xlarge | x86 | 37.75s | link |
| taichi | g5.12xlarge | CUDA | 43.58s | link |
| numba | m4.10xlarge | x86 | 64.19s | link |

```dockerfile
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

ENV RENV_CONFIG_SANDBOX_ENABLED FALSE
ENV RENV_PATHS_LIBRARY renv/library
ENV RENV_PATHS_CACHE /setup/cache
```

Installing CUDA appears to be a pain, so for the purposes of benchmarking this PR I just switched to a base image that comes with CUDA installed and refactored the Dockerfile to only install dependencies necessary for running python/comps.py.

Comment on lines +108 to +109
```python
num_observations = observation_matrix.shape[0]
num_price_bins = price_bin_matrix.shape[0]
```

The `shape` and `len` attributes are not available inside taichi kernels, so the recommended approach for this kind of array metadata is to compute it outside the kernel and then pass it into the kernel as an argument.
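The pattern looks roughly like the following. This is a plain-numpy sketch with hypothetical names, not the PR's actual kernel; a real taichi version would decorate `extract_ids` with `@ti.kernel` and take `ti.ndarray` arguments.

```python
import numpy as np

def extract_ids(observation_matrix, num_observations, out):
    # num_observations is computed by the CALLER and passed in, because
    # .shape and len() are unavailable inside a taichi kernel.
    for i in range(num_observations):
        out[i] = observation_matrix[i, 0]

obs = np.array([[1, 100], [2, 200], [3, 300]])
ids = np.empty(obs.shape[0], dtype=int)
# Shape metadata is read outside the kernel, then passed as an argument
extract_ids(obs, obs.shape[0], ids)
```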

Comment on lines +100 to +102
```python
observation_matrix = observation_df[["id", "predicted_value"]].values
taichi_obs_ndarray = ti.ndarray(dtype=int, shape=observation_matrix.shape)
taichi_obs_ndarray.from_numpy(observation_matrix)
```

Taichi cannot automatically convert numpy arrays to the data structures that it works with (called fields), so we have to do the conversion manually before passing them into the kernel.

Comment on lines +111 to +112
```python
# Output vector
binned_vector = ti.ndarray(dtype=int, shape=(num_observations, 1))
```

Taichi also cannot return fields from kernels, so we need to define the output data structure ahead of time and then mutate it in the kernel function.
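In plain-Python terms the pattern is roughly the following (hypothetical names and toy binning logic; the real version allocates with `ti.ndarray` and mutates it inside an `@ti.kernel` function):

```python
import numpy as np

prices = np.array([100, 200, 120, 300])

# Output buffer allocated OUTSIDE the kernel, since kernels cannot
# return arrays
binned_vector = np.zeros((len(prices), 1), dtype=int)

def assign_bins(prices, binned_vector):
    # Mutates binned_vector in place instead of returning a new array
    for i in range(len(prices)):
        binned_vector[i, 0] = 0 if prices[i] < 150 else 1

assign_bins(prices, binned_vector)
```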

```python
for bin in price_bin_matrix:
    for obs_idx in range(num_observations):
        observation_price = observation_matrix[obs_idx, observation_price_idx]
        bin_found = False
```

`for`/`else` constructs are not supported in taichi, so we have to use a flag variable instead.
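For reference, here are the two equivalent patterns side by side in plain Python (toy data, not the PR's actual bins; taichi kernels only support the second form):

```python
price_bins = [(0, 100), (100, 200), (200, 300)]
price = 150

# Python for/else: the else branch runs only if the loop never breaks.
# Not supported inside taichi kernels.
for lo, hi in price_bins:
    if lo <= price < hi:
        matched = (lo, hi)
        break
else:
    matched = None

# Flag-variable equivalent, which taichi does support
bin_found = False
matched_flag = None
for lo, hi in price_bins:
    if not bin_found and lo <= price < hi:
        matched_flag = (lo, hi)
        bin_found = True
```

Both versions select the same bin; the flag version simply trades the implicit `break`/`else` control flow for an explicit boolean.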

Comment on lines -249 to -250
```python
top_n_idxs = np.full(num_comps, -1, dtype=idx_dtype)
top_n_scores = np.zeros(num_comps, dtype=score_dtype)
```

Arrays cannot be allocated inside a taichi kernel; they must be defined outside its scope and passed in as arguments. We therefore refactored this function to remove the dynamic allocation of arrays inside the function body.
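A plain-numpy sketch of that refactor shape (hypothetical signature and toy top-n logic; the real kernel is more involved): the caller allocates the buffers, and the kernel-style function only fills them.

```python
import numpy as np

def find_top_n(scores, top_n_idxs, top_n_scores):
    # Fills caller-allocated buffers instead of allocating arrays here,
    # which taichi kernels do not allow.
    num_comps = top_n_idxs.shape[0]
    order = np.argsort(scores)[::-1][:num_comps]
    for rank, idx in enumerate(order):
        top_n_idxs[rank] = idx
        top_n_scores[rank] = scores[idx]

num_comps = 2
scores = np.array([0.1, 0.9, 0.5])
# Buffers allocated OUTSIDE the kernel and passed in as arguments
top_n_idxs = np.full(num_comps, -1, dtype=np.int64)
top_n_scores = np.zeros(num_comps, dtype=np.float64)
find_top_n(scores, top_n_idxs, top_n_scores)
```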

@jeancochrane changed the title Spike upgrading comps algorithm with taichi [Do not merge] Spike upgrading comps algorithm with taichi May 1, 2024
@jeancochrane marked this pull request as ready for review May 1, 2024 16:20
@jeancochrane requested a review from dfsnow as a code owner May 1, 2024 16:20
@wagnerlmichael (Member) left a comment:


Nice catch on the instance bump! Thanks for sharing your findings. I wonder if it would be helpful to mention the CUDA interface to the ex-intern who might take a look at this, although I suspect their class is teaching CUDA interface programming.

I would also be happy to try the CUDA interface; I worked with it a bit in graduate school. I think trying CuPy could also be worthwhile.

@dfsnow (Member) left a comment:


Super cool work. Sad that it didn't speed things up; sorry for sending you down the rabbit hole. Let's leave this up, as I'd like to test it with a local GPU as well, just to make sure we're using full GPU capacity.
