Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(learning): feature_store & graph_store V1 #4237

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

Yi-Eaaa
Copy link

@Yi-Eaaa Yi-Eaaa commented Sep 18, 2024

What do these changes do?

Step 1: Implement GraphScope-based PyG Remote Backend and complete the end-to-end integration of GraphScope and PyG. (Finished)
Step 2: Get data from the Server through PyG Remote Backend and support sampling on the Client side. (Finished)

Related issue number

PyG Remote Backend Based on GraphScope #3739

Copy link

welcome bot commented Sep 18, 2024

Thanks for submitting your first pull request! You are awesome! 🤗
Please make sure you have signed the CLA, as this is a mandatory check before a PR being merged.
Welcome to the GraphScope community! 🎉 You can meet the community on DingTalk or Slack.

@CLAassistant
Copy link

CLAassistant commented Sep 18, 2024

CLA assistant check
All committers have signed the CLA.

@Yi-Eaaa Yi-Eaaa changed the title feature_store & graph_store V1 feat: feature_store & graph_store V1 Sep 18, 2024
@Zhanghyi
Copy link
Collaborator

@Yi-Eaaa Could you make sure that the PR-Check has passed?

r"""To be implemented by :class:`GSFeatureStore`."""
raise NotImplementedError

def _get_tensor_size(self, attr: TensorAttr) -> Optional[Tuple[int, ...]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do these methods _get_tensor_size get_all_tensor_attrs _build_features work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TensorAttr stores various attributes that uniquely represent the vertex/edge feature.
_get_tensor_size gets the length of the vertex feature corresponding to TensorAttr.
get_all_tensor_attrs retrieves all TensorAttrs that exist in the FeatureStore.
_build_features is what I referred to in dist_data earlier, it doesn't work here, I forgot to delete it.

r"""To be implemented by :class:`GSFeatureStore`."""
raise NotImplementedError

def get_all_edge_attrs(self) -> List[EdgeAttr]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does get_all_edge_attrs work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_all_edge_attrs returns the EdgeAttr for all types of edges of the graph. An EdgeAttr can uniquely represent one type of edge in a graph. A heterogeneous graph returns a list of multiple Edgeattrs, and a homogeneous graph returns a list containing one EdgeAttr.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a replica of the original client.yaml?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the file I used to test FeatureStore and GraphStore in k8s, but the file name of the client code(PyG_client.py) has been changed.

from graphscope.learning.pyg_neighbor_sampler import PygNeighborSampler


class GLTNeighborLoader(NodeLoader):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the difference between GLTNeighborLoader and NeighborLoader in PyG besides asserting neighbor_sampler is not None? or, can we just use NeighborLoader in PyG

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NeighborLoader PyG originally in iteration will call filter_fn () to convert the iterative output type (SamplerOutput/HeteroSamplerOutput - > Data/HeteroData), But the output of the PygNeighborSampler iteration is the Data/HeteroData type, so I've removed the parts related to filter_fn() here so that they can be properly connected.

Copy link
Collaborator

@Zhanghyi Zhanghyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please follow python code style guide to rename your file name and function name like GSFeatureStore.py and PyG_remote_backend

Copy link
Contributor

github-actions bot commented Sep 23, 2024

Please check the preview of the documentation changes at
https://87492b2b.graphscope-docs-preview.pages.dev

@Zhanghyi Zhanghyi changed the title feat: feature_store & graph_store V1 feat(learning): feature_store & graph_store V1 Sep 27, 2024
@codecov-commenter
Copy link

codecov-commenter commented Sep 27, 2024

Codecov Report

Attention: Patch coverage is 10.34483% with 26 lines in your changes missing coverage. Please review.

Project coverage is 60.99%. Comparing base (1de43e9) to head (976650c).
Report is 123 commits behind head on main.

Files with missing lines Patch % Lines
python/graphscope/client/session.py 7.14% 26 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #4237       +/-   ##
===========================================
+ Coverage   46.22%   60.99%   +14.76%     
===========================================
  Files         174      126       -48     
  Lines       16181    13264     -2917     
===========================================
+ Hits         7480     8090      +610     
+ Misses       8701     5174     -3527     
Flag Coverage Δ
65.61% <10.34%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
python/graphscope/__init__.py 83.33% <100.00%> (+0.47%) ⬆️
python/graphscope/client/session.py 65.32% <7.14%> (-0.73%) ⬇️

... and 114 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0a82483...976650c. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants