Scheduled NEAT run fails due to ~~ValueError~~ KeyError when attempting to load graph file(s) #31

Open
caufieldjh opened this issue May 3, 2022 · 8 comments · Fixed by #46

Comments

@caufieldjh
Contributor

During scheduled NEAT run:

16:30:56  Traceback (most recent call last):
16:30:56    File "/home/jenkinsuser/anaconda3/bin/neat", line 8, in <module>
16:30:56      sys.exit(cli())
16:30:56    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
16:30:56      return self.main(*args, **kwargs)
16:30:56    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
16:30:56      rv = self.invoke(ctx)
16:30:56    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
16:30:56      return _process_result(sub_ctx.command.invoke(sub_ctx))
16:30:56    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
16:30:56      return ctx.invoke(self.callback, **ctx.params)
16:30:56    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
16:30:56      return callback(*args, **kwargs)
16:30:56    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/neat/cli.py", line 55, in run
16:30:56      make_node_embeddings(**node_embedding_args)
16:30:56    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/neat/graph_embedding/graph_embedding.py", line 49, in make_node_embeddings
16:30:56      graph: Graph = Graph.from_csv(**main_graph_args)
16:30:56  ValueError: Cannot open the file at merged-kg_nodes-prefixcats.tsv

I suspect this is an issue with the KG-Ontoml configuration rather than something NEAT is misinterpreting, but NEAT's handling of the missing file could probably be improved too.

@caufieldjh
Contributor Author

The above error turned out to be just a missing file in the compressed source (fixed), but now there's a seemingly related issue:

16:00:32  Traceback (most recent call last):
16:00:32    File "/home/jenkinsuser/anaconda3/bin/neat", line 8, in <module>
16:00:32      sys.exit(cli())
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
16:00:32      return self.main(*args, **kwargs)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
16:00:32      rv = self.invoke(ctx)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
16:00:32      return _process_result(sub_ctx.command.invoke(sub_ctx))
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
16:00:32      return ctx.invoke(self.callback, **ctx.params)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
16:00:32      return callback(*args, **kwargs)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/neat/cli.py", line 91, in run
16:00:32      edge_method=yhelp.get_edge_embedding_method(classifier),
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/neat/link_prediction/model.py", line 111, in make_train_valid_data
16:00:32      negative_graph=graphs["neg_training"],
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/embiggen/transformers/link_prediction_transformer.py", line 90, in transform
16:00:32      negative_edge_embedding = self._transformer.transform(negative_graph)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/embiggen/transformers/graph_transformer.py", line 103, in transform
16:00:32      return self._transformer.transform(sources, destinations)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/embiggen/transformers/edge_transformer.py", line 123, in transform
16:00:32      self._transformer.transform(sources),
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/embiggen/transformers/node_transformer.py", line 106, in transform
16:00:32      return self._embedding.loc[nodes].to_numpy()
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 931, in __getitem__
16:00:32      return self._getitem_axis(maybe_callable, axis=axis)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1153, in _getitem_axis
16:00:32      return self._getitem_iterable(key, axis=axis)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1093, in _getitem_iterable
16:00:32      keyarr, indexer = self._get_listlike_indexer(key, axis)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1314, in _get_listlike_indexer
16:00:32      self._validate_read_indexer(keyarr, indexer, axis)
16:00:32    File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1374, in _validate_read_indexer
16:00:32      raise KeyError(f"None of [{key}] are in the [{axis_name}]")
16:00:32  KeyError: "None of [Index(['OBO:FBbt_00004566-biolink:subclass_of-OBO:FBbt_00002984'], dtype='object', name=0)] are in the [index]"
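
One way to narrow down a KeyError like the one above is to check, before the embedding step, whether every source and destination in the negative-edge file is actually a node ID in the node file. The key in the message looks like an edge-style identifier (subject-predicate-object) rather than a node ID, which may mean the wrong column is being read as the source; that reading is a guess and is not confirmed in this thread. The sketch below uses only the standard library. The column names `id`, `subject`, and `object`, the helper name `missing_endpoints`, and the sample rows are assumptions for illustration, not NEAT or Embiggen API.

```python
import csv
import io


def missing_endpoints(nodes_tsv, edges_tsv, id_col="id",
                      src_col="subject", dst_col="object"):
    """Return sorted edge endpoints that do not appear as node IDs.

    Both arguments are open text file objects holding tab-separated data
    with a header row. The default column names are assumptions.
    """
    node_ids = {row[id_col] for row in csv.DictReader(nodes_tsv, delimiter="\t")}
    missing = set()
    for row in csv.DictReader(edges_tsv, delimiter="\t"):
        for col in (src_col, dst_col):
            if row[col] not in node_ids:
                missing.add(row[col])
    return sorted(missing)


# Made-up sample rows: one node, and one edge whose object is not a node.
nodes = io.StringIO("id\tcategory\nOBO:FBbt_00004566\tbiolink:NamedThing\n")
edges = io.StringIO(
    "id\tsubject\tpredicate\tobject\n"
    "e1\tOBO:FBbt_00004566\tbiolink:subclass_of\tOBO:FBbt_00002984\n"
)
print(missing_endpoints(nodes, edges))  # ['OBO:FBbt_00002984']
```

An empty result would suggest the node and edge files agree, pointing the search toward how the columns are mapped in the config instead.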

@caufieldjh
Contributor Author

See also Knowledge-Graph-Hub/NEAT-kghub-scheduler#14 as I'm not sure what the origin of this issue is yet.

@caufieldjh changed the title from "Scheduled NEAT run fails due to ValueError when attempting to load graph file(s)" to "Scheduled NEAT run fails due to ~~ValueError~~ KeyError when attempting to load graph file(s)" on May 11, 2022
@caufieldjh
Contributor Author

The above run was with ensmallen==0.7.0.dev7 and embiggen==0.10.0.dev3

@caufieldjh
Contributor Author

If the fix in #35 doesn't work, try removing the neg_training graph argument from the NEAT config.

@caufieldjh
Contributor Author

caufieldjh commented May 27, 2022

The extra training and validation graphs still aren't making it into the compressed KG file.
Here's what happens in the Jenkins run:

17:02:36  + python3.8 generate_subgraphs.py --nodes merged-kg_nodes.tsv --edges merged-kg_edges.tsv
17:02:46  /var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.8/site-packages/ensmallen/__init__.py:31: UserWarning: Ensmallen is compiled for the Intel Haswell architecture (2013).On the current machine, the flags '['avx2', 'bmi2', 'popcnt']' are required but '{'avx2', 'bmi2'}' are not available.
17:02:46  The library will use a slower but more compatible version (Intel Core2 2006).
17:02:46    warnings.warn(
17:02:46  Generating training and validation subgraphs...
17:02:46  Complete.
[Pipeline] sh
17:02:47  + tar -rvfz data/merged/merged-kg.tar.gz pos_valid_edges.tsv neg_train_edges.tsv neg_valid_edges.tsv
17:02:47  data/merged/merged-kg.tar.gz
17:02:47  pos_valid_edges.tsv
17:02:47  neg_train_edges.tsv
17:02:47  neg_valid_edges.tsv

But then kg-phenio.tar.gz (like https://kg-hub.berkeleybop.io/kg-phenio/20220525/kg-phenio.tar.gz) only contains the node and edge files.
(It doesn't contain the prefixcats version of the graph nodes, either.)
There is a later statement copying this file, but clearly it does not contain the expected contents.
Is it being created twice?
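
A possible explanation, offered as a reading of the log rather than a confirmed diagnosis: in `tar -rvfz`, the `f` option takes the rest of the option cluster as its argument, so GNU tar would treat `z` as the archive name and `data/merged/merged-kg.tar.gz` as just another file to append. That would match the verbose output listing the .tar.gz first, alongside the three edge files. GNU tar also cannot append (`-r`) to a compressed archive. One way around both problems is to rebuild the archive with the extra files included. A minimal Python sketch follows; `append_to_tar_gz` is a hypothetical helper, not part of the KG build scripts.

```python
import os
import tarfile
import tempfile


def append_to_tar_gz(archive_path, extra_files):
    """Rebuild a .tar.gz so that it also contains extra_files.

    A gzip-compressed tar cannot be appended to in place, so existing
    members are copied into a new archive, the extra files are added
    under their base names, and the new archive replaces the old one.
    """
    archive_dir = os.path.dirname(os.path.abspath(archive_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".tar.gz", dir=archive_dir)
    os.close(fd)
    try:
        with tarfile.open(archive_path, "r:gz") as src, \
                tarfile.open(tmp_path, "w:gz") as dst:
            for member in src.getmembers():
                # Only regular files carry data; directories etc. are
                # copied as header-only entries.
                fileobj = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, fileobj)
            for path in extra_files:
                dst.add(path, arcname=os.path.basename(path))
        os.replace(tmp_path, archive_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage, with the file names from the Jenkins log:
# append_to_tar_gz("data/merged/merged-kg.tar.gz",
#                  ["pos_valid_edges.tsv", "neg_train_edges.tsv",
#                   "neg_valid_edges.tsv"])
```

Writing to a temporary file in the same directory and swapping it in with `os.replace` means a failed run leaves the original archive untouched.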

@caufieldjh
Contributor Author

The file at https://kg-hub.berkeleybop.io/kg-phenio/20220601/kg-phenio.tar.gz contains the following:

data/merged/merged-kg_nodes.tsv
data/merged/merged-kg_edges.tsv
merged_graph_stats_20220601.yaml
merged-kg_nodes-prefixcats.tsv
pos_valid_edges.tsv
neg_train_edges.tsv
neg_valid_edges.tsv

But then NEAT can't find the validation set:

[2022-06-03T16:12:28.378Z] Traceback (most recent call last):
[2022-06-03T16:12:28.378Z]   File "/home/jenkinsuser/anaconda3/bin/neat", line 8, in <module>
[2022-06-03T16:12:28.378Z]     sys.exit(cli())
[2022-06-03T16:12:28.378Z]   File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
[2022-06-03T16:12:28.378Z]     return self.main(*args, **kwargs)
[2022-06-03T16:12:28.378Z]   File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
[2022-06-03T16:12:28.378Z]     rv = self.invoke(ctx)
[2022-06-03T16:12:28.378Z]   File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
[2022-06-03T16:12:28.378Z]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[2022-06-03T16:12:28.378Z]   File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
[2022-06-03T16:12:28.378Z]     return ctx.invoke(self.callback, **ctx.params)
[2022-06-03T16:12:28.378Z]   File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
[2022-06-03T16:12:28.378Z]     return callback(*args, **kwargs)
[2022-06-03T16:12:28.378Z]   File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/neat/cli.py", line 90, in run
[2022-06-03T16:12:28.378Z]     edge_method=yhelp.get_edge_embedding_method(classifier),
[2022-06-03T16:12:28.378Z]   File "/home/jenkinsuser/anaconda3/lib/python3.7/site-packages/neat/link_prediction/model.py", line 99, in make_train_valid_data
[2022-06-03T16:12:28.378Z]     graphs[name] = Graph.from_csv(**graph_args)
[2022-06-03T16:12:28.378Z] ValueError: Cannot open the file at pos_valid_edges.tsv
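
A pre-flight check before any graph is loaded could turn this kind of failure into a single message listing every missing file at once. The sketch below assumes that relative paths in the config are resolved against the current working directory, which is not confirmed in this thread; `missing_inputs` is a hypothetical helper, not part of NEAT.

```python
from pathlib import Path


def missing_inputs(filenames, base_dir="."):
    """Return the names in filenames that are not regular files under base_dir."""
    base = Path(base_dir)
    return [name for name in filenames if not (base / name).is_file()]


# Usage, with the file names from the archive listing above:
# absent = missing_inputs(["merged-kg_nodes-prefixcats.tsv",
#                          "pos_valid_edges.tsv",
#                          "neg_train_edges.tsv",
#                          "neg_valid_edges.tsv"])
# if absent:
#     raise FileNotFoundError(f"Missing input files: {absent}")
```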

@caufieldjh reopened this on Jun 6, 2022
@caufieldjh
Contributor Author

caufieldjh commented Jun 7, 2022

KG-Phenio Jenkins build 20 completed without issue, and it looks like it contains all the necessary components:

$ wget https://kg-hub.berkeleybop.io/kg-phenio/20220606/kg-phenio.tar.gz
$ tar xvzf kg-phenio.tar.gz
data/merged/merged-kg_nodes.tsv
data/merged/merged-kg_edges.tsv
merged_graph_stats_20220606.yaml
merged-kg_nodes-prefixcats.tsv
pos_valid_edges.tsv
neg_train_edges.tsv
neg_valid_edges.tsv

There's an issue here (it was present in the last build, too): the merged KG node and edge files are in a subdirectory, while the other files are at the top level.
It should be created with something like

tar czvf test.tar.gz -C data/merged/ .

(or move all the files to the current working directory before building the tar.gz)
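
To confirm that a rebuilt archive really is flat, its member names can be inspected without extracting anything. A minimal sketch using the standard `tarfile` module; `nested_members` is a hypothetical helper name.

```python
import tarfile


def nested_members(archive_path):
    """Return names of regular files stored below the archive's top level.

    A leading "./" (as produced by `tar -C some_dir .`) is ignored, so
    "./nodes.tsv" counts as top level while "data/merged/nodes.tsv"
    does not.
    """
    nested = []
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            if "/" in name:
                nested.append(member.name)
    return nested


# Usage:
# print(nested_members("kg-phenio.tar.gz"))
```

For the 20220606 listing above, this would be expected to report the two files under data/merged/ and nothing else.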

Will also need to verify that NEAT is looking in the same directory for all input files.

@caufieldjh
Contributor Author

This seems to be resolved, but I can't verify it while blocked by Knowledge-Graph-Hub/NEAT-kghub-scheduler#18
