Graph Definition #558

RasmusOrsoe · 2023-07-18T13:35:45Z

This PR addresses the ongoing discussion in #462 and #521 (which has been a roadblock for a while) by changing Model such that it now consists of the modules

Model = [GraphDefinition, GNN, Task]

where GraphDefinition is a single, problem/model-dependent class that contains all the code responsible for data representations.

TLDR: Model, Dataset and GraphNeTI3Module now depend on GraphDefinition, which allows us to easily represent data as sequences, images, or whatever your heart desires. This change is breaking; older config files and pickled models are not compatible with these changes, but state_dicts are.

Conceptually, GraphDefinition contains all the code that alters the raw data from Dataset before it's passed to GNN. It's a single, swappable module that can be passed to Dataset and deployment modules. GraphDefinition consists of multiple submodules; the data flow is GraphDefinition = [Detector, NodeDefinition, EdgeDefinition] and can be seen here. The definition sits at a point in the data flow where events are unbatched, meaning that data representations can be constructed on CPU and in parallel, before the data is batched and sent to the GPU. That means the sequence creation included in #521 becomes much simpler and likely faster @Aske-Rosted, and it should also be useful for the transformer exploration by @MoustHolmes.
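A minimal sketch of this per-event data flow, using toy stand-in classes in plain Python. None of the names below (`Graph`, `ToyDetector`, `ToyGraphDefinition`, `nodes_as_rows`, `chain_edges`) are part of graphnet; they only illustrate the order detector, then node definition, then edge definition, applied to one unbatched event.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = List[float]


@dataclass
class Graph:
    """Stand-in for a graph object: node features plus an edge list."""

    x: List[Row]
    edges: List[Tuple[int, int]] = field(default_factory=list)


class ToyDetector:
    """Standardizes each feature with a per-feature function."""

    def __init__(self, feature_map: Dict[str, Callable[[float], float]]) -> None:
        self.feature_map = feature_map

    def __call__(self, rows: List[Row], names: List[str]) -> List[Row]:
        return [
            [self.feature_map[name](value) for name, value in zip(names, row)]
            for row in rows
        ]


class ToyGraphDefinition:
    """Chains detector -> node definition -> edge definition for ONE event."""

    def __init__(
        self,
        detector: ToyDetector,
        node_definition: Callable[[List[Row]], Graph],
        edge_definition: Callable[[Graph], Graph],
    ) -> None:
        self.detector = detector
        self.node_definition = node_definition
        self.edge_definition = edge_definition

    def __call__(self, rows: List[Row], names: List[str]) -> Graph:
        standardized = self.detector(rows, names)
        graph = self.node_definition(standardized)
        return self.edge_definition(graph)


def nodes_as_rows(rows: List[Row]) -> Graph:
    """Toy node definition: every input row becomes one node."""
    return Graph(x=rows)


def chain_edges(graph: Graph) -> Graph:
    """Toy edge definition: connect each node to the next one."""
    graph.edges = [(i, i + 1) for i in range(len(graph.x) - 1)]
    return graph


detector = ToyDetector(
    {"x": lambda v: v / 100.0, "t": lambda v: (v - 1.0e4) / 3.0e3}
)
definition = ToyGraphDefinition(detector, nodes_as_rows, chain_edges)
event = [[100.0, 10000.0], [200.0, 13000.0], [300.0, 16000.0]]
graph = definition(event, ["x", "t"])
```

Because the whole chain operates on a single event, it can run inside a dataset worker process before any batching takes place.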

The modules are defined as

NodeDefinition:
A generic class that defines what a node represents. Problem-specific versions can be implemented by overriding the abstract method

def _construct_nodes(self, x: torch.Tensor) -> Data:
    """Construct nodes from raw node features `x`.

    Args:
        x: standardized node features with shape `[num_pulses, d]`,
        where `d` is the number of node features.

    Returns:
        graph: graph without edges.
    """

_construct_nodes is the playground we've been missing for a while; it gives us the freedom to fully define exactly how we want the data to be structured for our Models. Here, one can use nodes to represent DOMs (by using Coarsening or some other method), create images for CNNs, define sequences or other forms of data representations. Our standard of representing pulses as nodes is just

class NodesAsPulses(NodeDefinition):
    """Represent each measured pulse of Cherenkov Radiation as a node."""

    def _construct_nodes(self, x: torch.Tensor) -> Data:
        return Data(x=x)
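As an illustration of a problem-specific alternative, the body of a `_construct_nodes` override that represents DOMs rather than pulses could follow the logic sketched below. This is plain Python on lists, not graphnet code: the function name `pulses_to_dom_nodes`, the `[x, y, z, t]` column order and the choice of summary features are all assumptions made for the example. It also assumes that pulses from the same sensor carry identical position values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pulses_to_dom_nodes(pulses: List[List[float]]) -> List[List[float]]:
    """Merge pulses that share a sensor position into one node per sensor.

    Each input pulse is [x, y, z, t]. Each output node is
    [x, y, z, earliest t, number of pulses on that sensor].
    """
    times_by_sensor: Dict[Tuple[float, float, float], List[float]] = defaultdict(list)
    for x, y, z, t in pulses:
        times_by_sensor[(x, y, z)].append(t)
    # Dicts keep insertion order, so nodes appear in order of first pulse.
    return [
        [x, y, z, min(times), float(len(times))]
        for (x, y, z), times in times_by_sensor.items()
    ]


pulses = [
    [0.0, 0.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 7.0],
    [0.0, 0.0, 0.0, 3.0],
]
dom_nodes = pulses_to_dom_nodes(pulses)
```

In an actual NodeDefinition the result would be converted to a tensor and wrapped in a Data object; note that the number of node features changes from 4 to 5 here, which is why the graph definition, not the dataset, has to report the number of inputs to the GNN.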

EdgeDefinition:
A generic class that defines how edges are drawn between nodes in the graph. This is essentially a refactor of our GraphBuilder. One can create problem-specific implementations by overriding the abstract method

def _construct_edges(self, graph: Data) -> Data:
    """Construct edges and assign them to graph. I.e. `graph.edge_index = edge_index`.

    Args:
        graph: graph without edges

    Returns:
        graph: graph with edges assigned.
    """

Detector:
Virtually unchanged from its known form. It is in charge of standardizing data and is now able to work on a subset of the feature space that it is defined on. I cleaned the class up a little bit. In the future, it will hold detector-specific geometry tables as mentioned in #462.

Our usual k-nn graph with nodes representing pulses can then be created like so:

from graphnet.models.graphs import GraphDefinition
from graphnet.models.graphs.nodes import NodesAsPulses
from graphnet.models.graphs.edges import KNNEdges
from graphnet.models.detector.prometheus import Prometheus

graph_definition = GraphDefinition(
    node_definition=NodesAsPulses(),
    edge_definition=KNNEdges(nb_nearest_neighbours=8),
    detector=Prometheus(),
)

Alternatively, you can also just import this graph definition directly, as it's included in the PR:

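from graphnet.models.detector.prometheus import Prometheus
from graphnet.models.graphs import KNNGraph
from graphnet.models.graphs.nodes import NodesAsPulses

graph_definition = KNNGraph(
    detector=Prometheus(),
    node_definition=NodesAsPulses(),
    nb_nearest_neighbours=8,
    node_feature_names=features,  # list of node feature names, defined elsewhere
)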
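For readers unfamiliar with k-nn graphs, the edge construction can be sketched in plain Python as follows. This is a brute-force illustration, not the graphnet implementation: the function name `knn_edge_index`, the edge direction (from neighbour to node) and the tie-breaking by lowest index are assumptions of the sketch. The `columns` argument selects which node features enter the distance calculation.

```python
import math
from typing import List, Sequence, Tuple


def knn_edge_index(
    x: List[List[float]],
    k: int,
    columns: Sequence[int] = (0, 1, 2),
) -> Tuple[List[int], List[int]]:
    """Connect every node to its k nearest neighbours.

    Returns (sources, targets), where each edge points from a
    neighbour to the node it is a neighbour of. Distances are
    Euclidean and use only the feature columns listed in `columns`.
    """
    sources: List[int] = []
    targets: List[int] = []
    for i, row_i in enumerate(x):
        point_i = [row_i[c] for c in columns]
        candidates = []
        for j, row_j in enumerate(x):
            if j == i:
                continue  # no self-loops
            point_j = [row_j[c] for c in columns]
            candidates.append((math.dist(point_i, point_j), j))
        # Sorting (distance, index) pairs breaks ties by lowest index.
        for _, j in sorted(candidates)[:k]:
            sources.append(j)
            targets.append(i)
    return sources, targets


# Four nodes with features [x, y, z, t].
nodes = [
    [0.0, 0.0, 0.0, 10.0],
    [1.0, 0.0, 0.0, 99.0],
    [5.0, 0.0, 0.0, 11.0],
    [6.0, 0.0, 0.0, 12.0],
]
spatial_edges = knn_edge_index(nodes, k=1)               # distance in x, y, z
temporal_edges = knn_edge_index(nodes, k=1, columns=(3,))  # distance in t only
```

The two calls give different graphs for the same nodes, which shows how the choice of columns changes what "nearest" means.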

It is the problem-specific implementation of a graph definition that defines the number of input parameters to our GNNs, available through graph_definition.nb_outputs. When we instantiate a Model, the syntax is now:

gnn = DynEdge(
    nb_inputs=graph_definition.nb_outputs,
    global_pooling_schemes=["min", "max", "mean", "sum"],
)
task = ...
model = StandardModel(
    graph_definition=graph_definition,
    gnn=gnn,
    tasks=[task],
    ...
)

Other things to note:

Dataset is now simpler, as graph-altering code has been moved to GraphDefinition
Changes are compatible with the Config system.
Some folder structure has been changed in graphnet.data to avoid circular imports.

RasmusOrsoe · 2023-07-28T12:48:25Z

I have addressed the initial comments from @AMHermansen, refactored the last few detectors and introduced GraphDefinition in the deployment code. In relation to this, I've created #560 which contains a list of to-do items for the deployment modules which fall outside the scope of this PR.

What's left is to update getting_started.md and to consider the default values. I'll hold off on this last step until @Aske-Rosted has had a chance to give this a look after the ICRC conference.

Aske-Rosted · 2023-08-03T02:53:14Z

configs/datasets/test_data_sqlite.yml

+features: [sensor_pos_x, sensor_pos_y, sensor_pos_z, t]
+graph_definition:
+  arguments:
+    columns: [0, 1, 2]


I think the naming of this "columns" argument (which I understand as the columns included in the "distance" calculation of your edge definition) becomes very vague. At first glance it is hard to tell what it does. Something like "edge_defining_columns" would be more descriptive, I think.

I agree that the name of that argument is not unambiguous, but I also fail to find better alternatives that are not very long. The name of this argument is the same as what we used to call it in the KNNGraphBuilder (see here) and I do think that the doc string for this argument (see here) is pretty clear. So even though it's a bit challenging to understand what that argument does when one reads the config file, I think we should keep it as-is and refer users to the docs instead (which is the intended usage anyway). Is that OK with you? @Aske-Rosted

Completely fine.

Aske-Rosted · 2023-08-03T02:57:21Z

src/graphnet/data/dataset/parquet/__init__.py

+
+    torch.multiprocessing.set_sharing_strategy("file_system")
+
+del has_torch_package


Is there a reason for deleting the has_torch_package here but keeping it in src/graphnet/data/dataset/sqlite/__init__.py?

I actually don't know. I did not add this line; the difference you point out is currently present in the main branch as well. I have now added the del statement to the sqlite part too.

Aske-Rosted · 2023-08-03T03:08:57Z

src/graphnet/utilities/config/model_config.py

@@ -133,7 +133,7 @@ def _construct_model(
            fn_kwargs={"trust": trust},
        )

-        # Construct model based on arguments
+        # Construct model based on


Was this change intentional?

Nopes. Fixed.

Aske-Rosted · 2023-08-03T03:19:17Z

I have just finished looking through everything, and I think it looks great! I also believe that this will add the necessary flexibility for a lot of the things I've been trying to implement, where I previously had to make changes in the dataset class, e.g. #521. I am looking forward to trying it out.

AMHermansen

Hello @RasmusOrsoe I've noticed some more minor things.

AMHermansen · 2023-08-04T07:14:49Z

src/graphnet/models/detector/detector.py

+    ) -> Data:
+        for idx, feature in enumerate(node_feature_names):
+            try:
+                node_features[:, idx] = self.feature_map()[feature](  # type: ignore


I'm not sure if you're supposed to call self.feature_map, since it is classified as a property?

Hey. Thanks for pointing this out. In fact, the property decorator in detector.py is poorly placed - it is actually completely redundant. I've removed it.
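The general Python behaviour behind this comment can be shown with two toy classes (hypothetical names, not the graphnet Detector): a property is read without parentheses, so adding `()` tries to call the returned dictionary, whereas a plain method must be called first.

```python
from typing import Callable, Dict

FeatureMap = Dict[str, Callable[[float], float]]


class WithProperty:
    """feature_map is a property: access it like an attribute."""

    @property
    def feature_map(self) -> FeatureMap:
        return {"t": lambda value: value / 2.0}


class WithMethod:
    """feature_map is a plain method: call it to get the dict."""

    def feature_map(self) -> FeatureMap:
        return {"t": lambda value: value / 2.0}


prop = WithProperty()
value_from_property = prop.feature_map["t"](4.0)

# Adding parentheses to a property calls the dict it returned.
try:
    prop.feature_map()
    property_call_failed = False
except TypeError:
    property_call_failed = True

value_from_method = WithMethod().feature_map()["t"](4.0)
```

Removing the decorator therefore makes `self.feature_map()[feature](...)` the consistent way to use it.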

AMHermansen · 2023-08-04T07:16:40Z

src/graphnet/models/graphs/graph_definition.py

+            # Assume all features in Detector is used.
+            node_feature_names = list(self._detector.feature_map().keys())  # type: ignore
+        self._node_feature_names = node_feature_names
+        if dtype is None:


If I understand this code correctly, you're just getting the same behavior as if you changed the default value of dtype in the constructor to torch.float instead of None, but in a slightly more complicated way :)

You're absolutely right. Fixed :-)
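The two styles discussed here can be compared in a small, self-contained sketch. The string `"float32"` stands in for `torch.float`, and both function names are made up for the example. The styles agree when the argument is omitted; they differ only when a caller passes `None` explicitly.

```python
from typing import Optional

DEFAULT_DTYPE = "float32"  # stand-in for torch.float in this sketch


def resolve_with_none_check(dtype: Optional[str] = None) -> Optional[str]:
    """Original style: None is replaced by the default inside the body."""
    if dtype is None:
        dtype = DEFAULT_DTYPE
    return dtype


def resolve_with_default(dtype: Optional[str] = DEFAULT_DTYPE) -> Optional[str]:
    """Simplified style: the default lives in the signature."""
    return dtype


omitted_a = resolve_with_none_check()
omitted_b = resolve_with_default()
explicit_none_a = resolve_with_none_check(None)
explicit_none_b = resolve_with_default(None)
```

The simplified signature also documents the default directly, at the cost of no longer mapping an explicit `None` to it.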

RasmusOrsoe · 2023-08-10T10:52:31Z

Thank you very much for your comments. I believe I have now addressed all of them, and I have updated the GETTING_STARTED.MD accordingly. @Aske-Rosted @AMHermansen.

AMHermansen

Great work!

RasmusOrsoe added 21 commits June 26, 2023 17:02

added GraphDefinition, EdgeDefinition

ae4bada

refactor of Detector

08cd6b4

add KNNGraph

238324c

polish

ea52b8d

polish

f7e93b2

replace Detector with GraphDefinition for StandardModel

8a1cace

Simplify Dataset, polish

ccf9d32

Restructure

b4b1023

simplify imports

c18e5e1

simplify imports

99d9b51

remove redundant import

9b9434d

refactor of promethus detector class

8984a01

refactor of promethus detector class

d5c8ade

refactor training example without configs

6d15019

polish

fa30515

polish

4a994bc

revert

0a21d75

Merge branch 'main' of https://github.com/RasmusOrsoe/graphnet

56474ca

revert

50e6afe

Merge branch 'main' of https://github.com/RasmusOrsoe/graphnet

469d396

merge_conflict_fix

ce248de

RasmusOrsoe marked this pull request as draft July 18, 2023 13:39

RasmusOrsoe added 8 commits July 18, 2023 15:48

Update tito example

118b580

update ensemble dataset example

639bf3a

mypy ignore Detector

a22a794

mypy ignore detector

3e01dcd

update dataloader unit tests

fed3900

update dataset unit tests, pipeline

c417ce8

update icetray examples

6627d62

update pipeline example

365c55b

RasmusOrsoe added 3 commits July 28, 2023 12:24

remove cross-inheritance of IceCube Detector classes

a7216e4

remove redundant imports

dcd00c5

add GraphDefinition to deployment modules

cec8c48

Aske-Rosted reviewed Aug 3, 2023

AMHermansen reviewed Aug 4, 2023

AMHermansen mentioned this pull request Aug 4, 2023

Refactor detector to allow for different standardizations of input features #562

Open

RasmusOrsoe added 10 commits August 10, 2023 09:47

add deletion of has_torch_package

cab8c74

fix comment in model_config.py

9eb948a

test default value of graph_definition dtype

5f1c5fe

delete if statement to handle None dtype arg

10e2eda

remove property dec. for feature_map in Detector

af94365

update graphnet overview in getting_started.md

df68aa3

Update Dataset and Dataloader section of getting_started.md

76140f2

update example DatasetConfig in getting_started.md

f20f2ed

final update of getting_started.md

f174cf3

update

3fb96c8

RasmusOrsoe requested review from AMHermansen and Aske-Rosted August 14, 2023 07:08

AMHermansen approved these changes Aug 14, 2023

RasmusOrsoe marked this pull request as ready for review August 14, 2023 07:42

RasmusOrsoe merged commit 268a17d into graphnet-team:main Aug 14, 2023
11 checks passed

AMHermansen mentioned this pull request Aug 21, 2023

Changed dtype in config files #569

Merged

AMHermansen mentioned this pull request Sep 1, 2023

Update CONTRIBUTING.md #575

Merged

RasmusOrsoe added a commit to RasmusOrsoe/graphnet that referenced this pull request Oct 25, 2023

Merge pull request graphnet-team#558 from RasmusOrsoe/graph_definitio…

2f444cf

…n_debug Graph Definition

AMHermansen mentioned this pull request Dec 19, 2023

Implement class to crop pulsemaps to maximum length #648

Open
