Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add latest cuGraph #458

Draft
wants to merge 23 commits into
base: main
Choose a base branch
from
Draft

Conversation

gitbuda
Copy link
Member

@gitbuda gitbuda commented Mar 29, 2024

  • Make sure all API changes produce correct results -> TESTING 🧪
    • cugraph.balanced_cut_clustering -> NO TESTS -> add at least the empty test
    • cugraph.betweenness_centrality -> unable to load symbol (SS below) -> tests are passing
    • cugraph.generator -> NO TESTS -> add at least the empty test
    • cugraph.hits
    • cugraph.katz_centrality
    • cugraph.leiden ⏳ -> unable to load symbol (SS below) -> free(): invalid pointer error
    • cugraph.louvain ⏳ -> invalid pointer error
    • cugraph.pagerank
    • cugraph.personalized_pagerank
    • cugraph.spectral_clustering -> NO TESTS -> add at least the empty test
  • Figure out undefined symbols under laiden and betweenness_centrality
    Screenshot 2024-05-12 at 3 06 22 PM
    • NOTE: The problem here was that even though /opt/conda/include/cugraph/algorithms.hpp contains functions that could be compiled, /opt/conda/lib/libcugraph.so only contains certain instantiations of these functions (NOTE: it's not just template arguments, also whole functions are missing) -> a useful command to figure stuff out is `nm -C /opt/conda/lib/libcugraph.so | grep "cugraph::betweenness"
  • Merge main and resolve conflicts
  • Minimize the image as much as possible in limited time
  • Make the pipeline to push image to Dockerhub
  • Push experimental image to Dockerhub
  • Figure out ML vs no-ML build (SLACK discussion)
  • --> DELIVERABLE#1 Follow up HERE (Discord) <--
  • Upgrade Python pip libs for the full mage with cugraph Docker image
  • Consider adding a template generator for Dockerfiles or some form of includes -> https://codeberg.org/devthefuture/dockerfile-x#include
  • Upgrade dgl -> any build (native/Docker) fails -> figure out how to compile
  • Extend mage/setup to pick individual libraries to install -> add that as a docker argument because usually, someone needs a specific binary (while the whole think doesn't compile); consider adding init bash script because usually, installation of deps fails (e.g. gdl is a very complex dependency)

Docker Build Commands

# make sure memgraph submodule under mage is initialized https://git-scm.com/book/en/v2/Git-Tools-Submodules

docker build -f Dockerfile.experiment_partial -t mage-cugraph-part --progress plain .
docker run -it --rm --gpus all mage-cugraph-part bash

docker build -f Dockerfile.experiment_full -t mage-cugraph-full --progress plain .

docker run -it --rm --gpus all mage-cugraph-full bash
cd /mage/cpp/build
cmake -DMAGE_CUGRAPH_ENABLE=ON -DMAGE_CUGRAPH_ROOT=/opt/conda ..
VERBOSE=1 make cugraph.pagerank
make cugraph.pagerank cugraph.personalized_pagerank cugraph.louvain cugraph.katz_centrality cugraph.leiden cugraph.betweenness_centrality cugraph.balanced_cut_clustering cugraph.spectral_clustering cugraph.hits cugraph.generator

python3 /mage/setup build --gpu --cpp-build-flags CMAKE_BUILD_TYPE=Release MAGE_CUGRAPH_ROOT=/opt/conda/ -p /usr/lib/memgraph/query_modules/

docker run -it --rm --gpus all -p 7687:7687 mage-cugraph-full --log-level=TRACE --also-log-to-stderr

# if building memgraph, do it outside the /mage/cpp/memgraph because that's copied to during Docker build
cd /tmp/mage/cpp/dist
cp ../build/cugraph_module/cugraph.pagerank.so ./
# where ever memgraph is built, cd there
./memgrpah --storage-properties-on-edges=True --query-modules-directory=/tmp/mage/cpp/dist --log-level=TRACE --also-log-to-stderr

cd /tmp/mage && ./test_e2e -k "pagerank_test-test_cugraph_influential"

Docker Base Images

Done

  • Compile only the cuGraph module with latest everything
  • Find and link libcugraph-ops++.so and libraft.so -> taken from rapidsai/base Docker image under /opt/conda/lib
    Screenshot 2024-05-12 at 1 53 02 PM
  • Make cugraph.pagerank to work correctly, implement the package process and release Docker image only with memgraph and pagerank internally

Experiment / Optional

  • Take a look at the edge mask feature (there is no masking of vertices)

Description

Please briefly explain the changes you made here.

Pull request type

  • Bugfix
  • Algorithm/Module
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

Related issues

Delete if this PR doesn't resolve any issues. Link the issue if it does.

######################################

Reviewer checklist (the reviewer checks this part)

Module/Algorithm

  • Core algorithm/module implementation
  • Query module implementation
  • Tests provided (unit / e2e)
  • Code documentation
  • README short description

Documentation checklist

  • Add the documentation label tag
  • Add the bug / feature label tag
  • Add the milestone for which this feature is intended
    • If not known, set for a later milestone
  • Write a release note, including added/changed clauses
    • [Release note text]
  • Link the documentation PR here
    • [Documentation PR link]
  • Tag someone from docs team in the comments

@@ -108,6 +110,6 @@ macro(target_mage_cugraph target_name)
PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MAGE_CXX_FLAGS}>"
"$<$<COMPILE_LANGUAGE:CUDA>:${MAGE_CUDA_FLAGS}>"
)
target_link_libraries("${target_name}" PRIVATE mage_cugraph)
target_link_libraries("${target_name}" PRIVATE mage_cugraph CUDA::toolkit)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CUDA::toolkit here probably should be deleted here because it's already included.

@@ -64,8 +64,8 @@ void BetweennessProc(mgp_list *args, mgp_graph *graph, mgp_result *result, mgp_m
weight_property, kDefaultWeight);
if (mg_graph->Empty()) return;

// TODO(gitbuda): The betweenness_centrality legacy function still works -> try to replace with latest option.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +87 to +91
// TODO(gitbuda): Update types and inject valid ids / edge types.
rmm::device_uvector<int64_t> cu_edge_ids(mg_edge_ids.size(), stream);
raft::update_device(cu_edge_ids.data(), mg_edge_ids.data(), mg_edge_ids.size(), stream);
rmm::device_uvector<int32_t> cu_edge_types(mg_edge_types.size(), stream);
raft::update_device(cu_edge_types.data(), mg_edge_types.data(), mg_edge_types.size(), stream);
Copy link
Member Author

@gitbuda gitbuda May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all optional, the only algo using edge edge ids and types, uniform_neighbor sampling, part of the GNN workflow

handle, std::move(cu_vertices), std::move(cu_src), std::move(cu_dst), std::move(cu_weight),
cugraph::graph_properties_t{false, false}, false, false);
std::move(cu_edge_ids), std::move(cu_edge_types), cugraph::graph_properties_t{false, false}, false, false);
stream.synchronize_no_throw();

return std::move(cu_graph);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add weights here that's 100% required

// TODO(gitbuda): Update types and inject valid ids / edge types.
rmm::device_uvector<int64_t> cu_edge_ids(mg_edge_ids.size(), stream);
raft::update_device(cu_edge_ids.data(), mg_edge_ids.data(), mg_edge_ids.size(), stream);
rmm::device_uvector<int32_t> cu_edge_types(mg_edge_types.size(), stream);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edge type is int32_t if prebuilt libraries are use
but since this is not used in our current use-cases, delete this part because it will just do useless work

// Define handle and operation stream
raft::handle_t handle{};
auto stream = handle.get_stream();
// TODO(gitbuda): Inject the valid seed.
raft::random::RngState rng_state(0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some mechanism for selecting random seed
depends on the use-case, param to the caller, there is a parallel race condition 10% of the times with the same seed

@nad010286
Copy link

any update? look like cugraph.pagerank still results different output compared with pagerank cpu version

@gitbuda
Copy link
Member Author

gitbuda commented May 30, 2024

@nad010286 nothing yet, just didn't have a chance to finish this work, but it's still on the short-term TODO list 😄

@gitbuda
Copy link
Member Author

gitbuda commented Jun 2, 2024

Hi @nad010286! I've made some progress on a small scale. Seems like the regular CPU pagerank and cugraph pagerank are producing the same outputs (CPU test case vs cugraph test case) 🤔

Can you please provide a test case from your side (something reasonable in size, max 100k nodes + edges), in the e2e test format 🙏:

  • The graph should be in the .cyp format (it's actually cypher per line, it can be many lines, but it's important to not break down single query into multiple lines) GRAPH INPUT EXAMPLE
  • The test case should be in the .yml format (make sure you have everything, the exact query, and exact outputs) TEST CASE EXAMPLE

Copy link

sonarcloud bot commented Jun 2, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
6 Security Hotspots

See analysis details on SonarCloud

@nad010286
Copy link

nad010286 commented Jun 6, 2024

Hi @nad010286! I've made some progress on a small scale. Seems like the regular CPU pagerank and cugraph pagerank are producing the same outputs (CPU test case vs cugraph test case) 🤔

Can you please provide a test case from your side (something reasonable in size, max 100k nodes + edges), in the e2e test format 🙏:

  • The graph should be in the .cyp format (it's actually cypher per line, it can be many lines, but it's important to not break down single query into multiple lines) GRAPH INPUT EXAMPLE
  • The test case should be in the .yml format (make sure you have everything, the exact query, and exact outputs) TEST CASE EXAMPLE

hey, Im having issue to build the cuGraph and MAGE from source :( any instructions or things I have to pay attention to? Or if you can send me the docker image that you already built, Im happy to perform some tests for you with loads of graph data

@intuitiveminds
Copy link

Hi @nad010286! I've made some progress on a small scale. Seems like the regular CPU pagerank and cugraph pagerank are producing the same outputs (CPU test case vs cugraph test case) 🤔
Can you please provide a test case from your side (something reasonable in size, max 100k nodes + edges), in the e2e test format 🙏:

  • The graph should be in the .cyp format (it's actually cypher per line, it can be many lines, but it's important to not break down single query into multiple lines) GRAPH INPUT EXAMPLE
  • The test case should be in the .yml format (make sure you have everything, the exact query, and exact outputs) TEST CASE EXAMPLE

hey, Im having issue to build the cuGraph and MAGE from source :( any instructions or things I have to pay attention to? Or if you can send me the docker image that you already built, Im happy to perform some tests for you with loads of graph data

I would also be very interested in an updated guide on how to build cugraph mage - I tried building the main branch with CUDA 11.8 but the nvcc compiler does not support C++ 20, so it fails with unorder_map::contains later on.

@nad010286
Copy link

managed to build and run some tests @gitbuda
The results are similar with very small difference. But processing time is worse than CPU. Not sure if GPU is used because by monitoring the GPU via nvidia-smi, I can only see power consumption increase after the result is out. Also no VRAM is used.

Copy link

sonarcloud bot commented Aug 7, 2024

Copy link

sonarcloud bot commented Sep 15, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants