Merge branch 'release/2.0.1'

Merging after release.
Accenture · Jul 12, 2023 · 5dc0c40
2 parents 71716ac + bbbb9ed
commit 5dc0c40
Showing 20 changed files with 332 additions and 166 deletions.
diff --git a/.circleci/config.yml b/.circleci/config.yml
@@ -106,28 +106,28 @@ workflows:
             branches:
               only:  
                 - main
-                - ampligraph2/develop
+                - develop
       - pip-check:
           filters:
             branches:
               only:  
                 - main
-                - ampligraph2/develop      
+                - develop
       - lint:
           filters:
             branches:
               only:  
                 - main
-                - ampligraph2/develop      
+                - develop
       - docs:
           filters:
             branches:
               only:  
                 - main
-                - ampligraph2/develop      
+                - develop
       - test:
           filters:
             branches:
               only:  
                 - main
-                - ampligraph2/develop       
+                - develop
diff --git a/README.md b/README.md
@@ -62,69 +62,117 @@ AmpliGraph includes the following submodules:
 * Linux, macOS, Windows
 * Python ≥ 3.8
 
-#### Provision a Virtual Environment
+### Provision a Virtual Environment
 
-Create and activate a virtual environment (conda)
+To provision a virtual environment for installing AmpliGraph, any option can work; here we provide
+instructions for using `venv` and `Conda`.
+
+#### venv
+
+The first step is to create and activate the virtual environment.
+
+```
+python3.8 -m venv PATH/TO/NEW/VIRTUAL_ENVIRONMENT
+source PATH/TO/NEW/VIRTUAL_ENVIRONMENT/bin/activate
+```
+
+Once this is done, we can proceed with the installation of TensorFlow 2:
+
+```
+pip install "tensorflow==2.9.0"
+```
+
+If you are installing TensorFlow on macOS, use the following instead:
+
+```
+pip install "tensorflow-macos==2.9.0"
+```
+
+**IMPORTANT**: installing TensorFlow can be tricky on macOS with an Apple silicon chip. Although `venv` can
+provide a smooth experience, please refer to the [dedicated section](#install-tensorflow-2-for-mac-os-m1-chip)
+below and, if issues persist, consider using `conda`, in line with the
+[TensorFlow Plugin page on the Apple developer site](https://developer.apple.com/metal/tensorflow-plugin/).
+
+
+#### Conda
+
+The first step is to create and activate the virtual environment.
 
 ```
 conda create --name ampligraph python=3.8
 source activate ampligraph
 ```
 
-#### Install TensorFlow
+Once this is done, we can proceed with the installation of TensorFlow 2, which can be done through `pip` or `conda`.
 
-AmpliGraph 2 is built on TensorFlow 2.x.
-Install from pip or conda:
+```
+pip install "tensorflow==2.9.0"
 
-**CPU-only**
+or 
 
+conda install "tensorflow==2.9.0"
 ```
-pip install "tensorflow>=2.9"
 
-or 
+#### Install TensorFlow 2 for Mac OS M1 chip
 
-conda install tensorflow'>=2.9'
+When installing TensorFlow 2 on macOS with an Apple silicon chip, we recommend using a conda environment.
+
+```
+conda create --name ampligraph python=3.8
+source activate ampligraph
 ```
 
-**Install TensorFlow 2 for Mac OS M1 chip**
+After creating and activating the virtual environment, run the following to install TensorFlow.
 
 ```
 conda install -c apple tensorflow-deps
-pip install --user tensorflow-macos==2.10
+pip install --user tensorflow-macos==2.9.0
 pip install --user tensorflow-metal==0.6
 ```
 
-In case of problems with installation refer to [Tensorflow Plugin page on Apple developer site](https://developer.apple.com/metal/tensorflow-plugin/).
+In case of problems with the installation or for further details, refer to the
+[TensorFlow Plugin page](https://developer.apple.com/metal/tensorflow-plugin/) on the official Apple developer website.
 
 ### Install AmpliGraph
 
+Once the installation of TensorFlow is complete, we can proceed with the installation of AmpliGraph.
 
-Install the latest stable release from pip:
+To install the latest stable release from pip:
 
 ```
 pip install ampligraph
 ```
 
-If instead you want the most recent development version, you can clone the repository
-and install from source (your local working copy will be on the latest commit on the `develop` branch).
-The code snippet below will install the library in editable mode (`-e`):
+To sanity check the installation, run the following:
+
+```python
+>>> import ampligraph
+>>> ampligraph.__version__
+'2.0.1'
+```
+
+If instead you want the most recent development version, you can clone the repository from
+[GitHub](https://github.com/Accenture/AmpliGraph.git), check out the `develop` branch, and install AmpliGraph
+from source. In this way, your local working copy will be on the latest commit on the `develop` branch.
 
 ```
 git clone https://github.com/Accenture/AmpliGraph.git
 cd AmpliGraph
+git checkout develop
 pip install -e .
 ```
+Notice that the code snippet above installs the library in editable mode (`-e`).
 
-
-### Sanity Check
+To sanity check the installation run the following:
 
 ```python
 >>> import ampligraph
 >>> ampligraph.__version__
-'2.0.0'
+'2.0-dev'
 ```
 
 
+
 ## Predictive Power Evaluation (MRR Filtered)
 
 AmpliGraph includes implementations of TransE, DistMult, ComplEx, HolE, ConvE, and ConvKB.
@@ -134,9 +182,9 @@ Their predictive power is reported below and compared against the state-of-the-a
 |                           | FB15K-237 | WN18RR   | YAGO3-10 | FB15k      | WN18      |
 |---------------------------|-----------|----------|----------|------------|-----------|
 | Literature Best           | **0.35*** | 0.48*    | 0.49*    | **0.84**** | **0.95*** |
-| TransE (AmpliGraph 2)     | 0.31      | 0.22     | 0.50     | 0.62       | 0.64      |
+| TransE (AmpliGraph 2)     | 0.31      | 0.22     | 0.50     | 0.62       | 0.66      |
 | DistMult (AmpliGraph 2)   | 0.30      | 0.47     | 0.48     | 0.71       | 0.82      |
-| ComplEx  (AmpliGraph 2)   | 0.31      | 0.50     | 0.49     | 0.73       | 0.94      |
+| ComplEx  (AmpliGraph 2)   | 0.31      | **0.51** | 0.49     | 0.73       | 0.94      |
 | HolE (AmpliGraph 2)       | 0.30      | 0.47     | 0.47     | 0.73       | 0.94      |
 | TransE (AmpliGraph 1)     | 0.31      | 0.22     | **0.51** | 0.63       | 0.66      |
 | DistMult (AmpliGraph 1)   | 0.31      | 0.47     | 0.50     | 0.78       | 0.82      |

diff --git a/ampligraph/__init__.py b/ampligraph/__init__.py
@@ -13,7 +13,7 @@
 
 tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
 
-__version__ = '2.0.0'
+__version__ = '2.0.1'
 __all__ = ['datasets', 'latent_features', 'discovery', 'evaluation', 'utils']
 
 logging.config.fileConfig(

diff --git a/ampligraph/datasets/data_indexer.py b/ampligraph/datasets/data_indexer.py
@@ -562,19 +562,24 @@ def get_indexes_from_a_dictionary_single(
             logger.error(msg)
             raise Exception(msg)
 
+        elements = []
         if type_of == "e":
-            elements = np.array([entities[x] for x in sample], dtype=dtype)
-            return elements
+            elements = np.array([entities[x] for x in sample if x in entities.keys()], dtype=dtype)
         elif type_of == "r":
-            elements = np.array([relations[x] for x in sample], dtype=dtype)
-            return elements
+            elements = np.array([relations[x] for x in sample if x in relations.keys()], dtype=dtype)
         else:
             if type_of not in ["r", "e"]:
                 msg = "No such option, should be r (relations) or e (entities), instead got {}".format(
                     type_of
                 )
                 logger.error(msg)
                 raise Exception(msg)
+        invalid_keys = len(sample) - len(elements)
+        if invalid_keys > 0:
+            print(
+                "\n{} entities with invalid keys skipped!".format(invalid_keys)
+            )
+        return elements
 
     def get_relations_count(self):
         """Get number of unique relations."""

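Note on the `data_indexer.py` change above: keys missing from the mapping are now dropped rather than looked up directly (a plain dict lookup would raise `KeyError`), so the returned array can be shorter than `sample`. The following is a minimal sketch of that filtering pattern, not the library's code. The helper `lookup_indexes` and the toy `entities` mapping are hypothetical names used only for illustration.

```python
import numpy as np


def lookup_indexes(sample, mapping, dtype=np.int32):
    """Map each key in `sample` to its index, skipping keys absent from `mapping`.

    Hypothetical helper mirroring the filtering pattern in the diff above.
    Returns the index array and the number of skipped keys.
    """
    elements = np.array([mapping[x] for x in sample if x in mapping], dtype=dtype)
    skipped = len(sample) - len(elements)
    return elements, skipped


# Toy mapping, assumed for illustration only.
entities = {"a": 0, "b": 1, "c": 2}
idx, skipped = lookup_indexes(["a", "z", "c"], entities)
print(idx.tolist(), skipped)  # [0, 2] 1
```

Because skipped keys shift the positions of later elements, a caller that expects the output to align one-to-one with the input would need to check the skipped count.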
diff --git a/ampligraph/datasets/datasets.py b/ampligraph/datasets/datasets.py
@@ -315,7 +315,8 @@ def _add_reciprocal_relations(triples_df):
     df_reciprocal.iloc[:, 1] = df_reciprocal.iloc[:, 1] + "_reciprocal"
 
     # append to original triples
-    triples_df = triples_df.append(df_reciprocal)
+    triples_df = pd.concat([triples_df, df_reciprocal])
+
     return triples_df
 
 

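Note on the `datasets.py` change above: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, and `pd.concat` is the supported replacement. The sketch below shows the replacement on a toy triples frame. The column names `s`, `p`, `o` and the subject/object swap are assumptions made for illustration; only the `_reciprocal` suffix and the `pd.concat` call come from the diff.

```python
import pandas as pd

# Toy triples (subject, predicate, object); column names are assumed.
triples_df = pd.DataFrame(
    [["a", "likes", "b"], ["b", "knows", "c"]], columns=["s", "p", "o"]
)

# Build the reciprocal triples: swap subject and object, suffix the relation.
df_reciprocal = triples_df.rename(columns={"s": "o", "o": "s"})[["s", "p", "o"]].copy()
df_reciprocal["p"] = df_reciprocal["p"] + "_reciprocal"

# Equivalent of the removed `triples_df.append(df_reciprocal)`.
combined = pd.concat([triples_df, df_reciprocal])
print(combined.shape)  # (4, 3)
```

Like `append`, `pd.concat` keeps the original row labels, so the result above has a repeated index (0, 1, 0, 1). Passing `ignore_index=True` would renumber the rows if a unique index is needed.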
diff --git a/ampligraph/datasets/graph_data_loader.py b/ampligraph/datasets/graph_data_loader.py
@@ -17,6 +17,7 @@
 from datetime import datetime
 
 import numpy as np
+import pandas as pd
 import tensorflow as tf
 
 from .data_indexer import DataIndexer
@@ -122,6 +123,12 @@ def _load(self, data_source, dataset_type):
         )
         self.data_source = data_source
         self.dataset_type = dataset_type
+
+        if isinstance(self.data_source, pd.DataFrame):
+            self.data_source = self.data_source.values
+        elif isinstance(self.data_source, list):
+            self.data_source = np.array(self.data_source)
+
         if isinstance(self.data_source, np.ndarray):
             if self.use_indexer is True:
                 self.mapper = DataIndexer(

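Note on the `graph_data_loader.py` change above: DataFrames and lists are converted to NumPy arrays before the existing `np.ndarray` branch runs, so all three input types share one code path. A minimal standalone sketch of that normalization follows. The function name `to_ndarray` is hypothetical, and other input types (for example a file path) are assumed to pass through unchanged.

```python
import numpy as np
import pandas as pd


def to_ndarray(data_source):
    """Convert a DataFrame or list of triples to a NumPy array.

    Hypothetical helper mirroring the checks added in the diff above;
    any other input type is returned unchanged.
    """
    if isinstance(data_source, pd.DataFrame):
        return data_source.values
    if isinstance(data_source, list):
        return np.array(data_source)
    return data_source


triples = [["a", "likes", "b"], ["b", "knows", "c"]]
from_list = to_ndarray(triples)
from_frame = to_ndarray(pd.DataFrame(triples))
print(from_list.shape, from_frame.shape)  # (2, 3) (2, 3)
```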
diff --git a/ampligraph/discovery/__init__.py b/ampligraph/discovery/__init__.py
@@ -24,6 +24,7 @@
     find_clusters,
     find_duplicates,
     query_topn,
+    find_nearest_neighbours
 )
 
-__all__ = ["discover_facts", "find_clusters", "find_duplicates", "query_topn"]
+__all__ = ["discover_facts", "find_clusters", "find_duplicates", "query_topn", "find_nearest_neighbours"]