Merge branch 'main' into sm_image_dgl_2.4
jalencato authored Sep 30, 2024
2 parents f9925e2 + 7af14b9 commit d67e009
Showing 36 changed files with 2,421 additions and 266 deletions.
2 changes: 1 addition & 1 deletion .github/workflow_scripts/e2e_gb_check.sh
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,6 @@ GS_HOME=$(pwd)
# Install graphstorm from checked out code
pip3 install "$GS_HOME" --upgrade

bash ./tests/end2end-tests/setup.sh
bash ./tests/end2end-tests/create_data.sh
bash ./tests/end2end-tests/graphbolt-gs-integration/graphbolt-graph-construction.sh
bash ./tests/end2end-tests/graphbolt-gs-integration/graphbolt-training-inference.sh
1 change: 1 addition & 0 deletions .github/workflows/continuous-integration.yml
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,7 @@ jobs:
uses: aws-actions/configure-aws-credentials@v1
with:
role-to-assume: arn:aws:iam::698571788627:role/github-oidc-role
role-duration-seconds: 14400
aws-region: us-east-1
- name: Checkout repository
uses: actions/checkout@v3
Expand Down
22 changes: 18 additions & 4 deletions docs/source/advanced/link-prediction.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,8 @@ Optimizing model performance
----------------------------
GraphStorm incorporates three ways of improving model performance of link
prediction. Firstly, GraphStorm avoids information leak in model training.
Secondly, to better handle heterogeneous graphs, GraphStorm provides three ways
to compute link prediction scores: dot product, DistMult and RotatE.
Secondly, to better handle heterogeneous graphs, GraphStorm provides four ways
to compute link prediction scores: dot product, DistMult, TransE, and RotatE.
Thirdly, GraphStorm provides two options to compute training losses, i.e.,
cross entropy loss and contrastive loss. The following sub-sections provide more details.

Expand All @@ -32,7 +32,7 @@ GraphStorm provides supports to avoid theses problems:

Computing Link Prediction Scores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
GraphStorm provides three ways to compute link prediction scores: Dot Product, DistMult and RotatE.
GraphStorm provides four ways to compute link prediction scores: Dot Product, DistMult, TransE, and RotatE.

* **Dot Product**: The Dot Product score function is as:

Expand All @@ -53,7 +53,21 @@ GraphStorm provides three ways to compute link prediction scores: Dot Product, D
The ``relation_emb`` values are initialized from a uniform distribution
within the range of ``(-gamma/hidden_size, gamma/hidden_size)``,
where ``gamma`` and ``hidden_size`` are hyperparameters defined in
:ref:`Model Configurations<configurations-model>`。
:ref:`Model Configurations<configurations-model>`.

* **TransE**: The TransE score function is as:

.. math::
score = gamma - \|h+r-t\|_{2} \text{ or } gamma - \|h+r-t\|_{1}
where ``h`` is the ``head_emb``, the node embedding of the head node,
``t`` is the ``tail_emb``, the node embedding of the tail node, and
``r`` is the ``relation_emb``, the relation embedding of the specific edge type.
The two forms use the L2 norm and the L1 norm of ``h + r - t``, respectively.
The ``relation_emb`` values are initialized from a uniform distribution
within the range of ``(-gamma/(hidden_size/2), gamma/(hidden_size/2))``,
where ``gamma`` and ``hidden_size`` are hyperparameters defined in
:ref:`Model Configurations<configurations-model>`.
To learn more about TransE, refer to `the DGLKE doc <https://dglke.dgl.ai/doc/kg.html#transe>`__.

* **RotatE**: The RotatE score function is as:

Expand Down
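The TransE score and the relation embedding initialization described above can be sketched in plain Python. This is a minimal illustration, not GraphStorm's decoder implementation: the function names ``init_relation_emb`` and ``transe_score`` are hypothetical, and the sketch assumes the two score variants use the L2 and L1 norms, as the ``transe_l2`` and ``transe_l1`` decoder names suggest.

```python
import math
import random


def init_relation_emb(hidden_size, gamma, rng):
    """Sample a relation embedding uniformly from (-bound, bound).

    The bound gamma / (hidden_size / 2) follows the range stated in the
    TransE documentation above. (Hypothetical helper, for illustration.)
    """
    bound = gamma / (hidden_size / 2)
    return [rng.uniform(-bound, bound) for _ in range(hidden_size)]


def transe_score(head_emb, rel_emb, tail_emb, gamma, norm="l2"):
    """Return gamma - ||h + r - t|| using the L1 or L2 norm."""
    diff = [h + r - t for h, r, t in zip(head_emb, rel_emb, tail_emb)]
    if norm == "l1":
        dist = sum(abs(d) for d in diff)
    elif norm == "l2":
        dist = math.sqrt(sum(d * d for d in diff))
    else:
        raise ValueError(f"Unsupported norm: {norm}")
    return gamma - dist


# h + r - t is the zero vector here, so the score equals gamma.
print(transe_score([1.0, 2.0], [0.5, -1.0], [1.5, 1.0], gamma=12.0))  # 12.0

# A relation embedding for hidden_size=4 and gamma=12.0 lies within (-6, 6).
print(init_relation_emb(4, 12.0, random.Random(0)))
```

A higher score means the translated head ``h + r`` lands closer to the tail ``t``, so a perfectly matching triple scores exactly ``gamma``.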
6 changes: 2 additions & 4 deletions docs/source/api/references/graphstorm.eval.rst
Original file line number Diff line number Diff line change
Expand Up @@ -42,8 +42,6 @@ Evaluators

GSgnnClassificationEvaluator
GSgnnRegressionEvaluator
GSgnnMrrLPEvaluator
GSgnnPerEtypeMrrLPEvaluator
GSgnnHitsLPEvaluator
GSgnnPerEtypeHitsLPEvaluator
GSgnnLPEvaluator
GSgnnPerEtypeLPEvaluator
GSgnnRconstructFeatRegScoreEvaluator
4 changes: 4 additions & 0 deletions docs/source/api/references/graphstorm.model.rst
Original file line number Diff line number Diff line change
Expand Up @@ -101,3 +101,7 @@ Decoder Layer
LinkPredictContrastiveDistMultDecoder
LinkPredictRotatEDecoder
LinkPredictContrastiveRotatEDecoder
LinkPredictWeightedRotatEDecoder
LinkPredictTransEDecoder
LinkPredictContrastiveTransEDecoder
LinkPredictWeightedTransEDecoder
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ Full argument list of the ``gconstruct.construct_graph`` command
* **-\-num-processes-for-nodes**: the number of processes to process node data simultaneously. Increasing this number can speed up node data processing.
* **-\-num-processes-for-edges**: the number of processes to process edge data simultaneously. Increasing this number can speed up edge data processing.
* **-\-output-dir**: (**Required**) the path of the output data files.
* **-\-graph-name**: (**Required**) the name assigned for the graph.
* **-\-graph-name**: (**Required**) the name assigned for the graph. The graph name must adhere to the `Python identifier naming rules <https://docs.python.org/3/reference/lexical_analysis.html#identifiers>`_ with the exception that hyphens (``-``) are permitted and the name can start with numbers.
* **-\-remap-node-id**: boolean value to decide whether to rename node IDs or not. Adding this argument will set it to be true, otherwise false.
* **-\-add-reverse-edges**: boolean value to decide whether to add reverse edges for the given graph. Adding this argument sets it to true; otherwise, it defaults to false. It is **strongly** suggested to include this argument for graph construction, as some nodes in the original data may not have in-degrees, and thus cannot update their representations by aggregating messages from their neighbors. Adding this argument helps prevent this issue.
* **-\-output-format**: the format of the constructed graph. Options are ``DGL`` and ``DistDGL``. Default is ``DistDGL``. It also accepts multiple graph formats at the same time, separated by a space, for example ``--output-format "DGL DistDGL"``. The output format is explained in the :ref:`Output <gcon-output-format>` section above.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -482,12 +482,12 @@ Link Prediction Task
- Yaml: ``num_negative_edges_eval: 1000``
- Argument: ``--num-negative-edges-eval 1000``
- Default value: ``1000``
- **lp_decoder_type**: Set the decoder type for loss function in Link Prediction tasks. Currently GraphStorm support ``dot_product``, ``distmult`` and ``rotate``.
- **lp_decoder_type**: Set the decoder type for the loss function in Link Prediction tasks. Currently GraphStorm supports ``dot_product``, ``distmult``, ``rotate``, ``transe_l1``, and ``transe_l2``.

- Yaml: ``lp_decoder_type: dot_product``
- Argument: ``--lp-decoder-type dot_product``
- Default value: ``distmult``
- **gamma**: Set the value of the hyperparameter denoted by the symbol gamma. Gamma is used in the following cases: i/ focal loss for binary classification ii/ DistMult score function for link prediction and iii/ RotatE score function for link prediction.
- **gamma**: Set the value of the hyperparameter denoted by the symbol gamma. Gamma is used in the following cases: i/ focal loss for binary classification, ii/ DistMult score function for link prediction, iii/ TransE score function for link prediction, and iv/ RotatE score function for link prediction.

- Yaml: ``gamma: 10.0``
- Argument: ``--gamma 10.0``
Expand Down Expand Up @@ -586,4 +586,4 @@ GraphStorm provides a set of parameters to control GNN distillation.

- Yaml: ``max_seq_len: 1024``
- Argument: ``--max-seq-len 1024``
- Default value: ``1024``
- Default value: ``1024``
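The ``lp_decoder_type`` option documented above now accepts five values. As a rough sketch of how a caller might validate that option before launching a job, the snippet below checks a parsed configuration dictionary. The names ``SUPPORTED_LP_DECODER_TYPES`` and ``check_lp_decoder_type`` are hypothetical and are not GraphStorm APIs; only the accepted values and the ``distmult`` default come from the documentation in this commit.

```python
# Accepted values as listed in the lp_decoder_type documentation.
# (Hypothetical constant name; GraphStorm's own constants may differ.)
SUPPORTED_LP_DECODER_TYPES = (
    "dot_product",
    "distmult",
    "rotate",
    "transe_l1",
    "transe_l2",
)


def check_lp_decoder_type(config):
    """Return the decoder type from a config dict, or raise ValueError."""
    # "distmult" is the documented default value.
    decoder = config.get("lp_decoder_type", "distmult")
    if decoder not in SUPPORTED_LP_DECODER_TYPES:
        raise ValueError(
            f"Unsupported lp_decoder_type: {decoder!r}. "
            f"Expected one of {SUPPORTED_LP_DECODER_TYPES}."
        )
    return decoder


print(check_lp_decoder_type({"lp_decoder_type": "transe_l2", "gamma": 10.0}))
print(check_lp_decoder_type({}))  # falls back to the documented default
```

Note that a bare ``transe`` is rejected by this check, because the documented values always carry the ``_l1`` or ``_l2`` suffix.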
Original file line number Diff line number Diff line change
Expand Up @@ -388,7 +388,7 @@ The rest of the arguments are passed on to ``sagemaker_train.py`` or ``sagemaker

* **--task-type**: Task type.
* **--graph-data-s3**: S3 location of the input graph.
* **--graph-name**: Name of the input graph.
* **--graph-name**: Name of the input graph. The graph name must adhere to the `Python identifier naming rules <https://docs.python.org/3/reference/lexical_analysis.html#identifiers>`_ with the exception that hyphens (``-``) are permitted and the name can start with numbers.
* **--yaml-s3**: S3 location of yaml file for training and inference.
* **--custom-script**: Custom training script provided by customers to run customer training logic. This should be a path to the Python script within the Docker image.
* **--output-emb-s3**: S3 location to store GraphStorm generated node embeddings. This is an inference only argument.
Expand Down
2 changes: 1 addition & 1 deletion graphstorm-processing/docker/push_gsprocessing_image.sh
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ parse_params() {
IMAGE='graphstorm-processing'
VERSION=`poetry version --short`
LATEST_VERSION=${VERSION}
REGION=$(aws configure get region)
REGION=$(aws configure get region) || REGION=""
REGION=${REGION:-us-west-2}
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
ARCH='x86_64'
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@
import json
import logging
import os
import re
from pathlib import Path
import tempfile
import time
Expand Down Expand Up @@ -540,7 +541,11 @@ def parse_args() -> argparse.Namespace:
parser.add_argument(
"--graph-name",
type=str,
help="Name for the graph being processed.",
help="Name for the graph being processed. "
"The graph name must adhere to the Python "
"identifier naming rules with the exception "
"that hyphens (-) are permitted and the name "
"can start with numbers.",
required=False,
default=None,
)
Expand All @@ -564,6 +569,33 @@ def parse_args() -> argparse.Namespace:
return parser.parse_args()


def check_graph_name(graph_name):
"""Check whether the graph name is a valid graph name.
We enforce that the graph name adheres to the Python
identifier naming rules as in
https://docs.python.org/3/reference/lexical_analysis.html#identifiers,
with the exception that hyphens (-) are permitted
and the name can start with numbers.
This helps avoid the cases when an invalid graph name,
such as `/graph`, causes unexpected errors.
Note: Same as graphstorm.utils.check_graph_name.
Parameters
----------
graph_name: str
Graph Name.
"""
gname = re.sub(r"^\d+", "", graph_name)
assert gname.replace("-", "_").isidentifier(), (
"GraphStorm expects the graph name to adhere to the Python "
"identifier naming rules with the exception that hyphens "
"(-) are permitted and the name can start with numbers. "
f"Got: {graph_name}"
)
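To make the naming rule concrete, here is a small self-contained sketch that applies the same two steps as ``check_graph_name`` (strip leading digits, then treat hyphens as underscores) but returns a boolean instead of asserting. The helper name ``is_valid_graph_name`` is hypothetical, and the ``isinstance`` guard is an addition for illustration: ``--graph-name`` defaults to ``None``, and ``re.sub`` raises ``TypeError`` when given ``None``.

```python
import re


def is_valid_graph_name(graph_name):
    """Return True if graph_name passes the rule used by check_graph_name.

    Hypothetical boolean variant, for illustration only.
    """
    # Guard added in this sketch: re.sub would raise TypeError on None.
    if not isinstance(graph_name, str):
        return False
    # Leading digits are allowed, so drop them before the identifier test.
    stripped = re.sub(r"^\d+", "", graph_name)
    # Hyphens are allowed, so map them to underscores.
    return stripped.replace("-", "_").isidentifier()


for name in ["my-graph", "2024-graph", "graph_1", "/graph", "my graph", "123", None]:
    print(repr(name), is_valid_graph_name(name))
```

One consequence worth noting is that a name made up only of digits, such as ``123``, fails the check: after the leading digits are stripped nothing remains, and an empty string is not a valid identifier.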


def main():
"""Main entry point for GSProcessing"""
# Allows us to get typed arguments from the command line
Expand All @@ -572,6 +604,7 @@ def main():
level=gsprocessing_args.log_level,
format="[GSPROCESSING] %(asctime)s %(levelname)-8s %(message)s",
)
check_graph_name(gsprocessing_args.graph_name)

# Determine execution environment
if os.path.exists("/opt/ml/config/processingjobconfig.json"):
Expand Down
3 changes: 2 additions & 1 deletion python/graphstorm/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,8 @@
from .gsf import create_builtin_edge_model
from .gsf import create_builtin_node_model
from .gsf import (create_task_decoder,
create_evaluator)
create_evaluator,
create_lp_evaluator)

from .gsf import (create_builtin_node_decoder,
create_builtin_edge_decoder,
Expand Down
4 changes: 3 additions & 1 deletion python/graphstorm/config/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,9 @@

from .config import (BUILTIN_LP_DOT_DECODER,
BUILTIN_LP_DISTMULT_DECODER,
BUILTIN_LP_ROTATE_DECODER)
BUILTIN_LP_ROTATE_DECODER,
BUILTIN_LP_TRANSE_L1_DECODER,
BUILTIN_LP_TRANSE_L2_DECODER)
from .config import SUPPORTED_LP_DECODER

from .config import (GRAPHSTORM_MODEL_EMBED_LAYER,
Expand Down