diff --git a/docs/developer-guide/getting-started-with-persistence.md b/docs/developer-guide/getting-started-with-persistence.md
index db9d3c9..c57e0b1 100644
--- a/docs/developer-guide/getting-started-with-persistence.md
+++ b/docs/developer-guide/getting-started-with-persistence.md
@@ -4,65 +4,33 @@ This page contains explanations and code samples for developers who need to
 store their entities into the database.

-The Strongbox project uses [OrientDB](http://orientdb.com/orientdb/) as its internal persistent storage through the
-corresponding `JPA` implementation and `spring-orm` middle tier. Also we use `JTA` for transaction management and
-`spring-tx` implementation module from Spring technology stack.
+The Strongbox project uses [JanusGraph](https://janusgraph.org/) as its internal persistent storage through the
+corresponding [Gremlin](https://tinkerpop.apache.org/gremlin.html) implementation and [spring-data-neo4j](https://spring.io/projects/spring-data-neo4j#overview) middle tier. We also use `JTA` for transaction management and the `spring-tx` implementation module from the Spring technology stack.

-## OrientDB Studio
+## Persistence stack

-As you are learning about Strongbox persistence, you may want to explore the existing persistence implementation.
-For development environments, Strongbox includes an embedded OrientDB server as well as an embedded instance of
-OrientDB Studio. By default, when you run the application from the source tree, you'll use the embedded database
-server. However, OrientDB Studio is disabled by default.
+We're using the following technology stack to deal with persistence:

-### Running OrientDB Studio From Source Tree
+ - Embedded Cassandra as the underlying storage (`CassandraDaemon` lets us run the Cassandra instance inside the same JVM as the application)
+ - JanusGraph as our graph DBMS (it is not itself a data store; it provides access to the data in the form of a graph)
+ - [Apache TinkerPop](http://tinkerpop.apache.org/docs/current/reference/) as a set of tools to interact with the database
+ - [spring-data-neo4j](https://github.com/spring-projects/spring-data-neo4j) to manage transactions in Spring with `Neo4jTransactionManager`, and to implement custom Cypher queries in Spring Data repositories via the `@org.springframework.data.neo4j.annotation.Query` annotation
+ - [cypher-for-gremlin](https://github.com/opencypher/cypher-for-gremlin), which translates Cypher queries into Gremlin traversals (it has some issues that prevent us from using it for `neo4j-ogm` CRUD operations; these issues are explained below)
+ - [neo4j-ogm](https://github.com/neo4j/neo4j-ogm) to map Java POJOs to the vertices and edges of the graph
+ - Custom `EntityTraversalAdapter` implementations, which provide anonymous Gremlin traversals for CRUD operations on `neo4j-ogm` entities

-To enable OrientDB Studio, you need only to set the property `strongbox.orientdb.studio.enabled` to `true`. You
-can do this on the Maven command line by running Strongbox as follows:
+## Vertices and Edges

-```
-$ mvn spring-boot:run -Dspring-boot.run.jvmArguments="-Dstrongbox.orientdb.studio.enabled=true"
-```
-
-There are two additional properties that can be used to configure OrientDB Studio:
-
-- `strongbox.orientdb.studio.ip.address`
-- `strongbox.orientdb.studio.port`
-
-### Running OrientDB Studio From The Distribution
-
-If you're running from the `tar.gz`, or `rpm` distributions, you can start Strongbox as follows to enable OrientDB Studio:
-
-```
-$ cd /opt/strongbox
-$ STRONGBOX_VAULT=/opt/strongbox-vault STRONGBOX_ORIENTDB_STUDIO_ENABLED=true ./bin/strongbox console
-```
+Unlike a relational DBMS, which stores data in tables and rows, a graph DBMS stores data as vertices and edges. In graph terms, every persistent entity is therefore stored either as a vertex or as an edge. For example, `Artifact` and `ArtifactCoordinates` would be vertices, and the relation between them would be an edge. Note that, unlike in an RDBMS, a relation between objects is represented by a separate edge instead of a foreign key column in a table. Persistent objects can also be edges themselves: for example, `ArtifactDependency` would be an edge between `ArtifactCoordinates` vertices.

-Please, note that the `STRONGBOX_VAULT` environment variable needs to be pointing to an absolute path for this to work.
+## Gremlin Server

-As with the source distribution, you can set additional environment variables to further configure OrientDB Studio:
+`TODO`

-```
-$ export STRONGBOX_ORIENTDB_STUDIO_IP_ADDRESS=0.0.0.0
-$ export STRONGBOX_ORIENTDB_STUDIO_PORT=2480
-```
-
-Once the application is running, you can login to OrientDB Studio by visiting
-http://127.0.0.1:2480/studio/index.html in your browser. The initial credentials are `admin` and `password`.
-
-![Login Screen](/assets/screenshots/orientdb-studio/login-screen.png)
-
-After your login, you'll land on the Browse Screen, which allows you to query the embedded database.
-
-![Browse Screen](/assets/screenshots/orientdb-studio/browse-screen.png)
-
-Finally, you can explore the schema defined in the database by clicking `SCHEMA`.
-
-![Schema Screen](/assets/screenshots/orientdb-studio/schema-screen.png)

 ## Adding Dependencies

-Let's assume that you, as a Strongbox developer, need to create a new module or write some persistence code in an
+Let's assume that you, as a Strongbox developer, need to create a new module, or write some persistence code in an
 existing module that does not contain any persistence dependencies yet. (Otherwise you will already have the proper
 `` section in your `pom.xml`, similar to the one in the example below). You will need to add the
 following code snippet to your module's `pom.xml` under the `` section:
@@ -75,173 +43,195 @@ following code snippet to your module's `pom.xml` under the `` sec
 ```

-Notice that there is no need to define any direct dependencies on OrientDB or Spring Data - it's already done via
+Notice that there is no need to define any direct dependencies on JanusGraph or Spring Data; they are already provided via
 the `strongbox-data-service` module.

 ## Creating Your Entity Class

 Let's now assume that you have a POJO and you need to save it to the database (and that you probably have at least
-CRUD operation's implemented in it as well). Place your code under the `org.carlspring.strongbox.domain.yourstuff`
-package. For the sake of the example, let's pick `MyEntity` as the name of your entity.
+CRUD operations implemented in it as well). Place your code under the `org.carlspring.strongbox.domain`
+package. For the sake of the example, let's pick `PetEntity` as the name of your entity.
 If you want to store that entity properly you need to adopt the following rules:

-* Extend the `org.carlspring.strongbox.data.domain.GenericEntity` class to inherit all required fields and logic from
-  the superclass.
-* Define getters and setters according to the `JavaBeans` coding convention for all non-transient properties in your
-  class.
-* Define a default empty constructor for safety (even if the compiler will create one for you, if you don't define any
-  other constructors) and follow the `JPA` and `java.io.Serializable` standards.
-* Override the `equals() `and `hashCode()` methods according to java `hashCode` contract (because your entity could be
-  used in collection classes such as `java.util.Set` and if you don't define such methods properly other developers or
-  yourself will be not able to use your entity).
-* _Optional_ - define a `toString()` implementation to let yourself and other developers see something meaningful in
-  the debug messages.
+* Create an interface for your entity that declares all the getters and setters needed to interact with it, following the `JavaBeans` coding convention. This interface should extend `org.carlspring.strongbox.data.domain.DomainObject`. The interface hides implementation-specific details that depend on the underlying database, such as the inheritance strategy.
+* Create the entity class, which implements the above interface and extends `org.carlspring.strongbox.data.domain.DomainEntity`.
+* Annotate the entity class with `@NodeEntity` (for entities stored as vertices) or `@RelationshipEntity` (for entities stored as edges).
+* Define a default (no-argument) constructor, because `neo4j-ogm` requires one to create entity instances internally.
 The complete source code example that follows all requirements should look something like this:

 ```java
 package org.carlspring.strongbox.domain;

-import org.carlspring.strongbox.data.domain.GenericEntity;
-
-import com.google.common.base.Objects;
-
-public class MyEntity
-        extends GenericEntity
+import org.carlspring.strongbox.data.domain.DomainEntity;
+import org.neo4j.ogm.annotation.NodeEntity;
+
+@NodeEntity("Pet")
+public class PetEntity
+        extends DomainEntity
+        implements Pet
 {
-    private String property;
+    private Integer age;

-    public MyEntity()
+    public PetEntity()
     {
     }

-    public String getProperty()
+    @Override
+    public Integer getAge()
     {
-        return property;
+        return age;
     }

-    public void setProperty(String property)
+    @Override
+    public void setAge(Integer age)
     {
-        this.property = property;
+        this.age = age;
     }
+}
+```
+
+## Creating an `EntityTraversalAdapter`
+
+As mentioned above, in addition to `neo4j-ogm` and `spring-data-neo4j`, we had to use custom CRUD implementations based on Gremlin. This has an advantage: it lets us optimize the CRUD operations for OGM entities and make them faster than what `neo4j-ogm` provides out of the box. The core of the Gremlin-based CRUD is the `EntityTraversalAdapter`, a strategy for create/read/update/delete operations. Each concrete `EntityTraversalAdapter` provides [Anonymous Traversals](http://tinkerpop.apache.org/docs/current/tutorials/gremlins-anatomy/) for every operation on a specific entity type. These traversals are used in Gremlin-based repositories to perform the common CRUD operations:
+
+- `fold`: builds an entity instance from a vertex/edge and its properties
+- `unfold`: writes the entity's properties to a vertex/edge
+- `cascade`: cascades a delete to other vertices/edges, if needed
+
+All of these operations are implemented using the special `__` class, which represents an anonymous traversal in Gremlin.
+
+The `EntityTraversalAdapter` implementations can also use each other to support relations between entities, inheritance, and cascade operations.
+
+Below is an example `EntityTraversalAdapter` implementation for `PetEntity`:
+
+```java
+package org.carlspring.strongbox.gremlin.adapters;
+
+import static org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality.single;
+import static org.carlspring.strongbox.gremlin.adapters.EntityTraversalUtils.extractObject;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
+import org.apache.tinkerpop.gremlin.structure.Element;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.carlspring.strongbox.domain.Pet;
+import org.carlspring.strongbox.domain.PetEntity;
+import org.carlspring.strongbox.gremlin.dsl.EntityTraversal;
+import org.carlspring.strongbox.gremlin.dsl.__;
+import org.springframework.stereotype.Component;
+
+@Component
+public class PetAdapter extends VertexEntityTraversalAdapter
+{
+
     @Override
-    public boolean equals(Object o)
+    public Set labels()
     {
-        if (this == o)
-        {
-            return true;
-        }
-        if (o == null || getClass() != o.getClass())
-        {
-            return false;
-        }
-
-        MyEntity myEntity = (MyEntity) o;
+        return Collections.singleton("Pet");
+    }

-        return Objects.equal(property, myEntity.property);
+    @Override
+    public EntityTraversal fold()
+    {
+        return __.project("uuid", "age")
+                 .by(__.enrichPropertyValue("uuid"))
+                 .by(__.enrichPropertyValue("age"))
+                 .map(this::map);
+    }
+
+    // Builds a PetEntity from the projected "uuid" and "age" property values
+    private Pet map(Traverser> t)
+    {
+        PetEntity result = new PetEntity();
+        result.setUuid(extractObject(String.class, t.get().get("uuid")));
+        result.setAge(extractObject(Integer.class, t.get().get("age")));
+
+        return result;
     }

     @Override
-    public int hashCode()
+    public UnfoldEntityTraversal unfold(Pet entity)
     {
-        return Objects.hashCode(property);
+        EntityTraversal t = __.identity();
+        // Only write the "age" property when it is set
+        if (entity.getAge() != null)
+        {
+            t = t.property(single, "age", entity.getAge());
+        }
+
+        return new UnfoldEntityTraversal<>("Pet", t);
     }

     @Override
-    public String toString()
+    public EntityTraversal cascade()
     {
-        final StringBuilder sb = new StringBuilder("MyEntity{");
-        sb.append("property='").append(property).append('\'');
-        sb.append('}');
-
-        return sb.toString();
+        return __.identity();
     }
+
 }
-```

-## Creating a DAO Layer
+```

-First of all you will need to extend the `CrudService` with the second type parameter that corresponds to your ID's data type. Usually it's just strings.
+## Creating a `Repository`
+
+All database interactions should go through repositories. For compatibility with `spring-data`, we use `org.springframework.data.repository.CrudRepository` as the basis for our repositories. The base class for `EntityTraversalAdapter`-based repositories is `org.carlspring.strongbox.gremlin.repositories.GremlinRepository`. The concrete subclass to extend depends on the type of entity; for vertex-backed entities, it is `GremlinVertexRepository`.
+
+In addition to CRUD operations, we also need to select data using queries. These queries can be written in [Cypher](https://neo4j.com/docs/cypher-manual/current/introduction/) and executed through `spring-data-neo4j` using the `@org.springframework.data.neo4j.annotation.Query` annotation. The final repository is therefore a mixin that extends `GremlinRepository` and delegates custom Cypher queries to the `org.springframework.data.repository.Repository` instance provided by `spring-data-neo4j`.

-!!! tip "To read more about ID's in OrientDB, check the manual"
+Putting all of the above together, the repository for `PetEntity` looks like this:

 ```java
-package org.carlspring.strongbox.users.service;
+package org.carlspring.strongbox.repositories;

-import org.carlspring.strongbox.data.service.CrudService;
-import org.carlspring.strongbox.users.domain.MyEntity;
+import java.util.List;
+
+import javax.inject.Inject;

-import org.springframework.transaction.annotation.Transactional;
+import org.carlspring.strongbox.domain.Pet;
+import org.carlspring.strongbox.gremlin.adapters.PetAdapter;
+import org.carlspring.strongbox.gremlin.repositories.GremlinVertexRepository;
+import org.springframework.data.neo4j.annotation.Query;
+import org.springframework.data.repository.query.Param;
+import org.springframework.stereotype.Repository;

-/**
- * CRUD service for managing {@link MyEntity} entities.
- *
- * @author Alex Oreshkevich
- */
-@Transactional
-public interface MyEntityService
-        extends CrudService
+@Repository
+public class PetRepository extends GremlinVertexRepository
+        implements PetQueries
 {

-    MyEntity findByProperty(String property);
+    @Inject
+    PetAdapter adapter;
+
+    @Inject
+    PetQueries queries;

-}
-```
+    @Override
+    protected PetAdapter adapter()
+    {
+        return adapter;
+    }

-After that you will need to define an implementation of your service class.
-
-Follow these rules for the service implementation:
-
-* Inherit your CRUD service from `CommonCrudService` class;
-* Name it like your service interface with an `Impl` suffix, for example `MyEntityServiceImpl`;
-* Annotate your class with the Spring `@Service` and `@Transactional` annotations;
-* Do **not** define your service class as public and use interface instead of class for injection (with `@Autowired`);
-  this follows the best practice principles from Joshua Bloch 'Effective Java' book called Programming to Interface;
-* _Optional_ - define any methods you need to work with your `MyEntity` class; these methods mostly should be based on
-  common API form `javax.persistence.EntityManager`, or custom queries (see example below);
-
-* !!! warning "Avoid query parameters construction through string concatenation!"
-    Please avoid using query parameter construction through string concatenation!
-    This usually leads to [SQL Injection](https://en.wikipedia.org/wiki/SQL_injection) issues!
-    Bad query example:
-    `String sQuery = "select * from MyEntity where proprety='" + propertyValue + "'"`;
-    What you should do instead is to create a service which does properly assigns the parameters.
-    Here's an example service:
-    ```java
-    @Transactional
-    public class MyEntityServiceImpl
-            extends CommonCrudService implements MyEntityService
-    {
-        public MyEntity findByProperty(String property)
-        {
-            String sQuery = "select * from MyEntity where property = :propertyValue";
-
-            OSQLSynchQuery oQuery = new OSQLSynchQuery(sQuery);
-            oQuery.setLimit(1);
-
-            HashMap params = new HashMap();
-            params.put("propertyValue", property);
-
-            List resultList = getDelegate().command(oQuery).execute(params);
-            return !resultList.isEmpty() ? resultList.iterator().next() : null;
-        }
-    }
-    ```
-
-## Register entity schema in EntityManager
-Before using entities you will need to register them. Consider the following example:
+    // Delegates the custom Cypher query to the spring-data-neo4j repository
+    @Override
+    public List findByAgeGreater(Integer age)
+    {
+        return queries.findByAgeGreater(age);
+    }

-```java
-@Inject
-private OEntityManager oEntityManager;
+}

-@PostConstruct
-public void init()
+@Repository
+interface PetQueries
+        extends org.springframework.data.repository.Repository
 {
-    oEntityManager.registerEntityClass(MyEntity.class);
+
+    @Query("MATCH (pet:Pet) " +
+           "WHERE pet.age > $age " +
+           "RETURN pet")
+    List findByAgeGreater(@Param("age") Integer age);
+
 }
 ```
+
+## Issues with `cypher-for-gremlin` and `neo4j-ogm`
+
+The first issue is that `cypher-for-gremlin` does not fully support the Cypher syntax that `neo4j-ogm` produces for CRUD operations. On every CRUD operation, `neo4j-ogm` generates a Cypher query, which `cypher-for-gremlin` then translates to Gremlin. As a workaround, we modify the Cypher queries produced by `neo4j-ogm` and replace some of their clauses (see `org.opencypher.gremlin.neo4j.ogm.request.GremlinRequest`).
+
+Another issue is that `cypher-for-gremlin` handles `null` values in Gremlin ambiguously: it puts a lot of noisy tokens into the generated Gremlin traversals, which prevents the JanusGraph engine from using the expected indexes. This, in turn, causes heavy full scans on every query (see [#342](https://github.com/opencypher/cypher-for-gremlin/issues/342)). This was the main reason why we couldn't use `neo4j-ogm` for CRUD operations.
+
+Nevertheless, we still use this stack for custom Cypher queries via the `@org.springframework.data.neo4j.annotation.Query` annotation. Cypher is a good option for these queries, because Cypher queries are clearer than their Gremlin equivalents and take less time to read and write.
+
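The entity rules introduced in this change (an interface that extends `DomainObject`, an implementation class that extends `DomainEntity`, and a no-argument constructor) can be illustrated with a small, dependency-free sketch. It is not Strongbox code: `DomainObject` and `DomainEntity` below are simplified stand-ins (the real types are not shown in this document and may declare more members), the `@NodeEntity("Pet")` annotation is omitted so the block compiles without `neo4j-ogm`, and `PetFactory` is a hypothetical helper.

```java
import java.util.UUID;

// Simplified stand-in for org.carlspring.strongbox.data.domain.DomainObject
// (assumption: the real interface may declare more members).
interface DomainObject
{
    String getUuid();

    void setUuid(String uuid);
}

// Simplified stand-in for org.carlspring.strongbox.data.domain.DomainEntity.
abstract class DomainEntity
        implements DomainObject
{
    private String uuid;

    @Override
    public String getUuid()
    {
        return uuid;
    }

    @Override
    public void setUuid(String uuid)
    {
        this.uuid = uuid;
    }
}

// Entity interface: JavaBeans accessors only, no database-specific details.
interface Pet
        extends DomainObject
{
    Integer getAge();

    void setAge(Integer age);
}

// Entity class; the real one would also carry @NodeEntity("Pet").
class PetEntity
        extends DomainEntity
        implements Pet
{
    private Integer age;

    // No-argument constructor, used when instances are created reflectively
    public PetEntity()
    {
    }

    @Override
    public Integer getAge()
    {
        return age;
    }

    @Override
    public void setAge(Integer age)
    {
        this.age = age;
    }
}

// Hypothetical helper: callers work with Pet and never reference PetEntity directly.
final class PetFactory
{
    private PetFactory()
    {
    }

    static Pet newPet(Integer age)
    {
        Pet pet = new PetEntity();
        pet.setUuid(UUID.randomUUID().toString());
        pet.setAge(age);
        return pet;
    }

    // Creates an instance through the no-argument constructor, as mapping frameworks typically do
    static Pet instantiateReflectively()
    {
        try
        {
            return PetEntity.class.getDeclaredConstructor().newInstance();
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalStateException("PetEntity needs a no-argument constructor", e);
        }
    }
}
```

Code that depends only on `Pet` never sees `PetEntity`, so mapping details such as annotations or the inheritance strategy can change without touching it; `instantiateReflectively()` fails fast if the no-argument constructor is ever removed.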
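The `fold`/`unfold` strategy behind `EntityTraversalAdapter` can be sketched without TinkerPop by treating a vertex as a label plus a property map. This is a simplified analogue for illustration only: `VertexData`, `SimpleEntityAdapter`, `PetData` and `PetAdapterSketch` are hypothetical names, and the real adapters return Gremlin anonymous traversals (built with `__`) instead of doing the conversion directly.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// A vertex reduced to its label and properties (stand-in for a TinkerPop Vertex).
final class VertexData
{
    final String label;
    final Map<String, Object> properties;

    VertexData(String label, Map<String, Object> properties)
    {
        this.label = label;
        this.properties = properties;
    }
}

// Hypothetical, simplified analogue of EntityTraversalAdapter: one strategy per entity type.
interface SimpleEntityAdapter<E>
{
    // Vertex labels this adapter handles
    Set<String> labels();

    // fold: vertex and its properties -> entity instance
    E fold(VertexData vertex);

    // unfold: entity -> property values to write to the vertex
    Map<String, Object> unfold(E entity);
}

// Minimal entity for the sketch.
final class PetData
{
    String uuid;
    Integer age;
}

final class PetAdapterSketch
        implements SimpleEntityAdapter<PetData>
{
    @Override
    public Set<String> labels()
    {
        return Collections.singleton("Pet");
    }

    @Override
    public PetData fold(VertexData vertex)
    {
        if (!labels().contains(vertex.label))
        {
            throw new IllegalArgumentException("Not a Pet vertex: " + vertex.label);
        }
        PetData pet = new PetData();
        pet.uuid = (String) vertex.properties.get("uuid");
        pet.age = (Integer) vertex.properties.get("age");
        return pet;
    }

    @Override
    public Map<String, Object> unfold(PetData entity)
    {
        Map<String, Object> properties = new HashMap<>();
        // Like PetAdapter.unfold(...): only "age" is written, and only when it is set
        if (entity.age != null)
        {
            properties.put("age", entity.age);
        }
        return properties;
    }
}
```

Keeping one adapter object per entity type is what lets repositories stay generic: a repository only needs the adapter to translate between graph elements and entities.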
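The repository mixin described in this change (CRUD handled by the repository itself, custom queries delegated to a `spring-data-neo4j` query interface) can be reduced to plain Java to show the delegation. In this sketch an in-memory map stands in for the graph, and `InMemoryPetQueries` stands in for the implementation that `spring-data-neo4j` would derive from the `@Query` annotation; apart from `PetQueries` and `findByAgeGreater`, all names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Minimal entity for the sketch (hypothetical; not the Strongbox Pet type).
final class PetRow
{
    final String uuid;
    final Integer age;

    PetRow(String uuid, Integer age)
    {
        this.uuid = uuid;
        this.age = age;
    }
}

// Query contract, analogous to PetQueries in the document.
interface PetQueries
{
    List<PetRow> findByAgeGreater(Integer age);
}

// Stand-in for the spring-data-neo4j implementation of PetQueries: filters in memory
// instead of running "MATCH (pet:Pet) WHERE pet.age > $age RETURN pet".
final class InMemoryPetQueries
        implements PetQueries
{
    private final Map<String, PetRow> store;

    InMemoryPetQueries(Map<String, PetRow> store)
    {
        this.store = store;
    }

    @Override
    public List<PetRow> findByAgeGreater(Integer age)
    {
        // Pets without an age never match, like a null comparison in Cypher
        return store.values()
                    .stream()
                    .filter(p -> p.age != null && p.age > age)
                    .collect(Collectors.toList());
    }
}

// The mixin: CRUD is handled here, custom queries are delegated.
final class PetRepositorySketch
        implements PetQueries
{
    private final Map<String, PetRow> store;
    private final PetQueries queries;

    PetRepositorySketch(Map<String, PetRow> store, PetQueries queries)
    {
        this.store = store;
        this.queries = queries;
    }

    PetRow save(PetRow pet)
    {
        store.put(pet.uuid, pet);
        return pet;
    }

    PetRow findById(String uuid)
    {
        return store.get(uuid);
    }

    // Must be public: it implements an (implicitly public) interface method
    @Override
    public List<PetRow> findByAgeGreater(Integer age)
    {
        return queries.findByAgeGreater(age);
    }
}
```

Because interface methods are implicitly `public`, the delegating `findByAgeGreater` has to be declared `public` in the repository; a package-private implementation would not compile.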