diff --git a/docs/installation/manual/installing_fedora_syn_and_blazegraph.md b/docs/installation/manual/installing_fedora_syn_and_blazegraph.md index 4914a6b72..4f1f6ebd7 100644 --- a/docs/installation/manual/installing_fedora_syn_and_blazegraph.md +++ b/docs/installation/manual/installing_fedora_syn_and_blazegraph.md @@ -1,11 +1,8 @@ # Installing Fedora, Syn, and Blazegraph -!!! warning "Needs Maintenance" - The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see [Contributing](../../../contributing/CONTRIBUTING). - ## In this section, we will install: -- [Fedora 6](https://duraspace.org/fedora/), the back-end repository that Islandora will use +- [Fedora 6](https://fedora.lyrasis.org/), the back-end repository that Islandora will use - [Syn](https://github.com/Islandora/Syn), the authentication broker that will manage communication with Fedora - [Blazegraph](https://blazegraph.com/), the resource index layer on top of Fedora for managing discoverability via RDF @@ -40,6 +37,7 @@ create user FEDORA_DB_USER with encrypted password 'FEDORA_DB_PASSWORD'; grant all privileges on database FEDORA_DB to FEDORA_DB_USER; \q ``` + - `FEDORA_DB`: `fcrepo` - This will be used as the database Fedora will store the repository in. - `FEDORA_DB_USER`: `fedora` @@ -50,193 +48,122 @@ grant all privileges on database FEDORA_DB to FEDORA_DB_USER; The Fedora configuration is going to come in a few different chunks that need to be in place before Fedora will be functional. We’re going to place several files outright, with mildly modified parameters according to our configuration. 
-The basics of these configuration files have been pulled largely from the templates in [Islandora-Devops/ansible-role-fcrepo](https://github.com/islandora-devops/ansible-role-fcrepo); you may consider referencing the playbook’s templates directory for more details. - -`i8_namespaces.cnd` is a list of namespaces used by Islandora that may not necessarily be present in Fedora; we add them here to ensure we can use them in queries. - -`/opt/fcrepo/config/i8_namespaces.cnd | tomcat:tomcat/644` -``` - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +The basics of these configuration files have been pulled largely from the templates in Islandora-Devops/islandora-playbook [internal Fedora role](https://github.com/Islandora-Devops/islandora-playbook/tree/dev/roles/internal/Islandora-Devops.fcrepo); you may consider referencing the playbook’s templates directory for more details. + +#### Namespace prefixes + +`i8_namespaces.yml` is a list of namespaces used by Islandora that may not necessarily be present in Fedora; we add them here to ensure we can use them in queries. + +`/opt/fcrepo/config/i8_namespaces.yml | tomcat:tomcat/644` +```{ .yaml .copy } +# Islandora 8/Fedora namespaces +# +# This file contains ALL the prefix mappings, if a URI +# does not appear in this file it will be displayed as +# the full URI in Fedora. 
+acl: http://www.w3.org/ns/auth/acl# +bf: http://id.loc.gov/ontologies/bibframe/ +cc: http://creativecommons.org/ns# +dc: http://purl.org/dc/elements/1.1/ +dcterms: http://purl.org/dc/terms/ +dwc: http://rs.tdwg.org/dwc/terms/ +ebucore: http://www.ebu.ch/metadata/ontologies/ebucore/ebucore# +exif: http://www.w3.org/2003/12/exif/ns# +fedoraconfig: http://fedora.info/definitions/v4/config# +fedoramodel: info:fedora/fedora-system:def/model# +foaf: http://xmlns.com/foaf/0.1/ +geo: http://www.w3.org/2003/01/geo/wgs84_pos# +gn: http://www.geonames.org/ontology# +iana: http://www.iana.org/assignments/relation/ +islandorarelsext: http://islandora.ca/ontology/relsext# +islandorarelsint: http://islandora.ca/ontology/relsint# +ldp: http://www.w3.org/ns/ldp# +memento: http://mementoweb.org/ns# +nfo: http://www.semanticdesktop.org/ontologies/2007/03/22/nfo# +ore: http://www.openarchives.org/ore/terms/ +owl: http://www.w3.org/2002/07/owl# +premis: http://www.loc.gov/premis/rdf/v1# +prov: http://www.w3.org/ns/prov# +rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# +rdfs: http://www.w3.org/2000/01/rdf-schema# +rel: http://id.loc.gov/vocabulary/relators/ +schema: http://schema.org/ +skos: http://www.w3.org/2004/02/skos/core# +test: info:fedora/test/ +vcard: http://www.w3.org/2006/vcard/ns# +webac: http://fedora.info/definitions/v4/webac# +xml: http://www.w3.org/XML/1998/namespace +xmlns: http://www.w3.org/2000/xmlns/ +xs: http://www.w3.org/2001/XMLSchema +xsi: http://www.w3.org/2001/XMLSchema-instance ``` -We intend to have Crayfish installed later. Since Fedora needs to be able to read data from Crayfish, we need to tell Fedora that the Crayfish endpoint is a valid data source. +#### Allowed External Content Hosts + +We have Fedora provide metadata for some resources that are contained in Drupal. Fedora needs to know to allow access to these External Content hosts. 
-`/opt/fcrepo/config/allowed_hosts.txt | tomcat:tomcat/644` +We create a file `/opt/fcrepo/config/allowed_external_hosts.txt | tomcat:tomcat/644` ``` -http://CRAYFISH_HOST:CRAYFISH_PORT/ +http://localhost:8000/ ``` -- `CRAYFISH_HOST`: localhost -- `CRAYFISH_PORT`: 80 - - This guide will install Crayfish on the same host and port that Drupal is installed on. This may not be desirable, and if Crayfish is installed on a different host or port later, that change should be reflected here. - -The next part of the configuration defines where the pieces of the actual repository will live. Note that this file contains some of the defined `FEDORA_DB` variables from earlier. - -`/opt/fcrepo/config/repository.json | tomcat:tomcat/644` -```json -{ - "name" : "repo", - "jndiName" : "", - "workspaces" : { - "predefined" : ["default"], - "default" : "default", - "allowCreation" : true, - "cacheSize" : 10000 - }, - "storage" : { - "persistence": { - "type" : "db", - "connectionUrl": "jdbc:postgresql://localhost:5432/FEDORA_DB", - "driver" : "org.postgresql.Driver", - "username" : "FEDORA_DB_USER", - "password" : "FEDORA_DB_PASSWORD" - }, - "binaryStorage" : { - "type" : "file", - "directory" : "/opt/fcrepo/data/binaries", - "minimumBinarySizeInBytes" : 4096 - } - }, - "security" : { - "anonymous" : { - "roles" : ["readonly","readwrite","admin"], - "useOnFailedLogin" : false - }, - "providers" : [ - { "classname" : "org.fcrepo.auth.common.BypassSecurityServletAuthenticationProvider" } - ] - }, - "garbageCollection" : { - "threadPool" : "modeshape-gc", - "initialTime" : "00:00", - "intervalInHours" : 24 - }, - "node-types" : ["fedora-node-types.cnd", "file:/opt/fcrepo/config/i8_namespaces.cnd"] -} + +**Note**: the trailing slash is important here. 
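
The entry above ends in `/`. As a minimal sketch (the helper name `check_allowed_hosts` is hypothetical and not part of Fedora or Islandora), a small shell function along these lines could flag entries in `allowed_external_hosts.txt` that lack that final character before Tomcat is restarted:

```bash
# Hypothetical helper, not provided by Fedora: report every non-blank line
# of the given file that does not end in "/".
# Returns 0 when all entries end in "/", 1 otherwise.
check_allowed_hosts() {
  local file="$1" line status=0
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines.
    [ -z "$line" ] && continue
    case "$line" in
      */) ;;  # entry ends with a slash, nothing to report
      *)  printf 'missing trailing slash: %s\n' "$line"
          status=1 ;;
    esac
  done < "$file"
  return "$status"
}
```

For example, `check_allowed_hosts /opt/fcrepo/config/allowed_external_hosts.txt` prints nothing and returns 0 when every non-blank line ends in `/`.
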
For more information on Fedora's External Content and configuring it, see the [Fedora Wiki pages](https://wiki.lyrasis.org/display/FEDORA6x/External+Content). + +#### Fedora configuration properties file + +Fedora 6 now allows you to put all your configuration properties into a single file. We use `0640` permissions because this file will hold your database credentials. + +`/opt/fcrepo/config/fcrepo.properties | tomcat:tomcat/640` +```{ .text .copy } +fcrepo.home=FCREPO_HOME +# External content using path defined above. +fcrepo.external.content.allowed=/opt/fcrepo/config/allowed_external_hosts.txt +# Namespace registry using path defined above. +fcrepo.namespace.registry=/opt/fcrepo/config/i8_namespaces.yml +fcrepo.auth.principal.header.enabled=true +# The principal header is the syn-settings.xml "config" element's "header" attribute +fcrepo.auth.principal.header.name=X-Islandora +# false to use manual versioning, true to create a version on each change +fcrepo.autoversioning.enabled=true +fcrepo.db.url=FCREPO_DB_URL +fcrepo.db.user=FCREPO_DB_USERNAME +fcrepo.db.password=FCREPO_DB_PASSWORD +fcrepo.ocfl.root=FCREPO_OCFL_ROOT +fcrepo.ocfl.temp=FCREPO_TEMP_ROOT +fcrepo.ocfl.staging=FCREPO_STAGING_ROOT +# Can be sha512 or sha256 +fcrepo.persistence.defaultDigestAlgorithm=sha512 +# JMS moved from 61616 to allow external ActiveMQ to use that port +fcrepo.dynamic.jms.port=61626 +# Same as above +fcrepo.dynamic.stomp.port=61623 +fcrepo.velocity.runtime.log=FCREPO_VELOCITY_LOG +fcrepo.jms.baseUrl=FCREPO_JMS_BASE ``` -Finally, we need an actual `fcrepo-config.xml` to pull this configuration into place. There's nothing to edit in here by default, but pay attention to the `p:repositoryConfiguration` property of the `modeshapeRepofactory` bean, which contains the path to the `repository.json` file we made earlier. If you've placed this somewhere else, you'll need to change it here. +* `FCREPO_HOME` - The home directory for all Fedora generated output and state. 
Unless otherwise specified, all logs, metadata, binaries, and internally generated indexes are stored here. If unset, it defaults to the Tomcat starting directory. A good choice is `/opt/fcrepo`. +* `FCREPO_DB_URL` - The database connection URL. In general, the format is as follows: -`/opt/fcrepo/config/fcrepo-config.xml | tomcat:tomcat/644` -```xml - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - /** = servletContainerAuthFilter,headerProvider,delegatedPrincipalProvider,webACFilter - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -``` + `jdbc:<database_type>://<database_host>:<database_port>/<database_name>` + + Fedora currently supports H2, PostgreSQL 12.3, MariaDB 10.5.3, and MySQL 8.0. + + Using the default ports for the supported databases, these are the values we typically use: + + * PostgreSQL: `jdbc:postgresql://localhost:5432/fcrepo` + * MariaDB: `jdbc:mariadb://localhost:3306/fcrepo` + * MySQL: `jdbc:mysql://localhost:3306/fcrepo` + +* `FCREPO_DB_USERNAME` - The database username +* `FCREPO_DB_PASSWORD` - The database password +* `FCREPO_OCFL_ROOT` - Sets the root directory of the OCFL. Defaults to `FCREPO_HOME/data/ocfl-root` if not set. +* `FCREPO_TEMP_ROOT` - Sets the temp directory used by OCFL. Defaults to `FCREPO_HOME/data/temp` if not set. +* `FCREPO_STAGING_ROOT` - Sets the staging directory used by OCFL. Defaults to `FCREPO_HOME/data/staging` if not set. +* `FCREPO_VELOCITY_LOG` - The Fedora HTML template code uses Apache Velocity, which generates a runtime log called `velocity.log`. Defaults to `FCREPO_HOME/logs/velocity`. A good choice might be `/opt/tomcat/logs/velocity.log`. +* `FCREPO_JMS_BASE` - This specifies the baseUrl to use when generating JMS messages. You can specify the hostname with or without port and with or without path. If your system is behind a NAT firewall you may need this to avoid your message consumers trying to access the system on an invalid port. 
If this system property is not set, the host, port and context from the user's request will be used in the emitted JMS messages. If your Alpaca is on the same machine as your Fedora and you use the `islandora-indexing-fcrepo`, you could use http://localhost:8080/fcrepo/rest. + + +Check the Lyrasis Wiki to find all of [Fedora's properties](https://wiki.lyrasis.org/display/FEDORA6x/Properties) ### Adding the Fedora Variables to `JAVA_OPTS` @@ -248,14 +175,14 @@ We need our Tomcat `JAVA_OPTS` to include references to our repository configura > 3 | export JAVA_OPTS="-Djava.awt.headless=true -server -Xmx1500m -Xms1000m" **After**: -> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.home=/otp/fcrepo/data -Dfcrepo.velocity.runtime.log=/opt/tomcat/logs/velocity.log -Dfcrepo.jms.baseUrl=http://localhost:8080/fcrepo/rest -Dfcrepo.autoversioning.enabled=false -DconnectionTimeout=-1 -Dfcrepo.db.url=jdbc:postgresql://localhost:5432/fcrepo -Dfcrepo.db.user=fedora -Dfcrepo.db.password=fedora -server -Xmx1500m -Xms1000m" +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.config.file=/opt/fcrepo/config/fcrepo.properties -DconnectionTimeout=-1 -server -Xmx1500m -Xms1000m" ### Ensuring Tomcat Users Are In Place While not strictly necessary, we can use the `tomcat-users.xml` file to give us direct access to the Fedora endpoint. Fedora defines, out of the box, a `fedoraAdmin` and `fedoraUser` role that can be reflected in the users list for access. The following file will also include the base `tomcat` user. As always, these default passwords should likely not stay as the defaults. 
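
One way to replace the default passwords is to generate random values. The following is a minimal sketch, assuming a system with `/dev/urandom`, `head`, `od`, and `tr` available; `gen_password` is a hypothetical helper name, not something provided by Tomcat or Fedora:

```bash
# Hypothetical helper: print a random hexadecimal string.
# $1 is the number of random bytes to read (default 16); the output
# contains two hexadecimal characters per byte, followed by a newline.
gen_password() {
  local bytes="${1:-16}"
  # Read random bytes, dump them as hex, and strip od's spaces and newlines.
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
  printf '\n'
}
```

Running `gen_password 24` once per placeholder (`TOMCAT_PASSWORD`, `FEDORA_ADMIN_PASSWORD`, `FEDORA_USER_PASSWORD`) would give each account its own 48-character value.
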
`/opt/tomcat/conf/tomcat-users.xml | tomcat:tomcat/600` -```xml +```{ .xml .copy } ``` + - `TOMCAT_PASSWORD`: `tomcat` - `FEDORA_ADMIN_PASSWORD`: `islandora` - `FEDORA_USER_PASSWORD`: `islandora` @@ -282,6 +210,7 @@ sudo wget -O fcrepo.war FCREPO_WAR_URL sudo mv fcrepo.war /opt/tomcat/webapps sudo chown tomcat:tomcat /opt/tomcat/webapps/fcrepo.war ``` + - `FCREPO_WAR_URL`: This can be found at the [fcrepo downloads page](https://github.com/fcrepo/fcrepo/releases); the file you're looking for is: - Tagged in green as the 'Latest release' - Named "fcrepo-webapp-VERSION.war" @@ -310,6 +239,7 @@ sudo wget -P /opt/tomcat/lib SYN_JAR_URL sudo chown -R tomcat:tomcat /opt/tomcat/lib sudo chmod -R 640 /opt/tomcat/lib ``` + - `SYN_JAR_URL`: The latest stable release of the Syn JAR from the [releases page](https://github.com/Islandora/Syn/releases). Specifically, the JAR compiled as `-all.jar` is required. ### Generating an SSL Key for Syn @@ -328,12 +258,13 @@ sudo chown www-data:www-data /opt/keys/syn* Syn sites and tokens belong in a settings file that we’re going to reference in Tomcat. `/opt/fcrepo/config/syn-settings.xml | tomcat:tomcat/600` -```xml +```{ .xml .copy } ISLANDORA_SYN_TOKEN ``` + - `ISLANDORA_SYN_TOKEN`: `islandora` - This should be a secure generated token rather than this default; it will be configured on the Drupal side later. @@ -341,6 +272,10 @@ Syn sites and tokens belong in a settings file that we’re going to reference i Referencing the valve we’ve created in our `syn-settings.xml` involves creating a `<Valve>` entry in Tomcat’s `context.xml`: +There are two options here: + +#### 1. Enable the Syn Valve for all of Tomcat. + `/opt/tomcat/conf/context.xml` **Before**: @@ -355,6 +290,20 @@ Referencing the valve we’ve created in our `syn-settings.xml` involves creatin > 31 | `` +#### 2. Enable the Syn Valve for only Fedora. 
+ +Create a new file at + +`/opt/tomcat/conf/Catalina/localhost/fcrepo.xml` + +```{ .xml .copy } + + + +``` + +Your Fedora web application needs to be deployed in Tomcat with the name `fcrepo.war`. Otherwise, change the name of the above XML file to match the deployed web application's name. + ### Restarting Tomcat Finally, restart tomcat to apply the new configurations. @@ -365,7 +314,95 @@ sudo systemctl restart tomcat **Note:** sometimes it takes a while for Fedora and Tomcat to start up, usually it shouldn't take longer than 5 minutes. -**Note:** after installing the Syn valve, you'll no longer be able to manually manage objects via Fedora Web UI or access the Fedora home page (http://localhost:8080/fcrepo). All communication with Fedora will now be handled from the Islandora module in Drupal. +**Note:** after installing the Syn valve, you'll no longer be able to manually create, edit, or delete objects via the Fedora Web UI. All communication with Fedora will now be handled from the Islandora module in Drupal. + +### Red Hat logging + +Red Hat systems have stopped generating an all-inclusive `catalina.out`, and the `catalina.YYYY-MM-DD.log` file does not include web applications' log statements. To get Fedora log statements flowing, you can create your own [Logback](https://logback.qos.ch/) configuration file and point to it. 
+ +`/opt/fcrepo/config/fcrepo-logback.xml | tomcat:tomcat/644` +```{ .xml .copy } + + + + + + %p %d{HH:mm:ss.SSS} [%thread] \(%c{0}\) %m%n + + + + + ${catalina.base}/logs/fcrepo.log + true + + ${catalina.base}/logs/fcrepo.%d{yyyy-MM-dd}.log.%i + 10MB + 30 + 2GB + + + %p %d{HH:mm:ss.SSS} [%thread] \(%c{0}\) %m%n + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +Then alter your `$JAVA_OPTS` like [above](#adding-the-fedora-variables-to-java_opts) to include +``` +-Dlogback.configurationFile=/opt/fcrepo/config/fcrepo-logback.xml +``` + +This will generate a log file at `${catalina.base}/logs/fcrepo.log` and will rotate each day or if the logs reaches 10MB. It will maintain 30 days of old logs, or 2GB whichever comes first. ## Blazegraph 2 @@ -389,6 +426,7 @@ sudo wget -O blazegraph.war BLAZEGRAPH_WARFILE_LINK sudo mv blazegraph.war /opt/tomcat/webapps sudo chown tomcat:tomcat /opt/tomcat/webapps/blazegraph.war ``` + - BLAZEGRAPH_WAR_URL: You can find a link to this at the [Maven repository for Blazegraph](https://repo1.maven.org/maven2/com/blazegraph/bigdata-war/); you’ll want to click the link for the latest version of Blazegraph 2.1.x, then get the link to the `.war` file within that version folder. Once this is downloaded, give it a moment to expand before moving on to the next step. @@ -398,7 +436,7 @@ Once this is downloaded, give it a moment to expand before moving on to the next We would like to have an appropriate logging configuration for Blazegraph, which can be useful for looking at incoming traffic and determining if anything has gone wrong with Blazegraph. Our logger isn’t going to be much different than the default logger; it can be made more or less verbose by changing the default `WARN` levels. There are several other loggers that can be enabled, like a SPARQL query trace or summary query evaluation log; if these are desired they should be added in. 
Consult the Blazegraph documentation for more details. `/opt/blazegraph/conf/log4j.properties | tomcat:tomcat/644` -``` +```{ .text .copy } log4j.rootCategory=WARN, dest1 # Loggers. @@ -434,7 +472,7 @@ log4j.appender.ruleLog.layout.ConversionPattern=%m Our configuration will be built from a few different files that we will eventually reference in `JAVA_OPTS` and directly apply to Blazegraph; these include most of the functional pieces Blazegraph requires, as well as a generalized configuration for the `islandora` namespace it will use. As with most large configurations like this, these should likely be tuned to your preferences, and the following files only represent sensible defaults. `/opt/blazegraph/conf/RWStore.properties | tomcat:tomcat/644` -``` +``` { .text .copy } com.bigdata.journal.AbstractJournal.file=/opt/blazegraph/data/blazegraph.jnl com.bigdata.journal.AbstractJournal.bufferMode=DiskRW com.bigdata.service.AbstractTransactionService.minReleaseAge=1 @@ -454,7 +492,7 @@ com.bigdata.journal.Journal.collectPlatformStatistics=false ``` `/opt/blazegraph/conf/blazegraph.properties | tomcat:tomcat/644` -``` +```{ .text .copy } com.bigdata.rdf.store.AbstractTripleStore.textIndex=false com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.OwlAxioms com.bigdata.rdf.sail.isolatableIndices=false @@ -470,7 +508,7 @@ com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false ``` `/opt/blazegraph/conf/inference.nt | tomcat:tomcat/644` -``` +```{ .text .copy } . . 
``` @@ -482,11 +520,10 @@ In order to enable our configuration when Tomcat starts, we need to reference th `/opt/tomcat/bin/setenv.sh` **Before**: -> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.home=/otp/fcrepo/data -Dfcrepo.velocity.runtime.log=/opt/tomcat/logs/velocity.log -Dfcrepo.jms.baseUrl=http://localhost:8080/fcrepo/rest -Dfcrepo.autoversioning.enabled=false -DconnectionTimeout=-1 -Dfcrepo.db.url=jdbc:postgresql://localhost:5432/fcrepo -Dfcrepo.db.user=fedora -Dfcrepo.db.password=fedora -server -Xmx1500m -Xms1000m" - +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.config.file=/opt/fcrepo/config/fcrepo.properties -DconnectionTimeout=-1 -server -Xmx1500m -Xms1000m" **After**: -> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.home=/otp/fcrepo/data -Dfcrepo.velocity.runtime.log=/opt/tomcat/logs/velocity.log -Dfcrepo.jms.baseUrl=http://localhost:8080/fcrepo/rest -Dfcrepo.autoversioning.enabled=false -DconnectionTimeout=-1 -Dfcrepo.db.url=jdbc:postgresql://localhost:5432/fcrepo -Dfcrepo.db.user=fedora -Dfcrepo.db.password=fedora -Dcom.bigdata.rdf.sail.webapp.ConfigParams.propertyFile=/opt/blazegraph/conf/RWStore.properties -Dlog4j.configuration=file:/opt/blazegraph/conf/log4j.properties -server -Xmx1500m -Xms1000m" +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.config.file=/opt/fcrepo/config/fcrepo.properties -DconnectionTimeout=-1 -Dcom.bigdata.rdf.sail.webapp.ConfigParams.propertyFile=/opt/blazegraph/conf/RWStore.properties -Dlog4j.configuration=file:/opt/blazegraph/conf/log4j.properties -server -Xmx1500m -Xms1000m" ### Restarting Tomcat @@ -501,11 +538,11 @@ sudo systemctl restart tomcat The two other files we created, `blazegraph.properties` and `inference.nt`, contain information that Blazegraph requires in order to establish and correctly use the datasets Islandora will send to it. 
First, we need to create a dataset - contained in `blazegraph.properties` - and then we need to inform that dataset of the inference set we have contained in `inference.nt`. -```bash +``` { .bash .copy } curl -X POST -H "Content-Type: text/plain" --data-binary @/opt/blazegraph/conf/blazegraph.properties http://localhost:8080/blazegraph/namespace -# If this worked correctly, Blazegraph should respond with "CREATED: islandora" -# to let us know it created the islandora namespace. +``` +If this worked correctly, Blazegraph should respond with "CREATED: islandora" to let us know it created the islandora namespace. +``` { .bash .copy } curl -X POST -H "Content-Type: text/plain" --data-binary @/opt/blazegraph/conf/inference.nt http://localhost:8080/blazegraph/namespace/islandora/sparql -# If this worked correctly, Blazegraph should respond with some XML letting us -# know it added the 2 entries from inference.nt to the namespace. ``` +If this worked correctly, Blazegraph should respond with some XML letting us know it added the 2 entries from inference.nt to the namespace. diff --git a/mkdocs.yml b/mkdocs.yml index 90ad942f9..c86dbcaaa 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -27,6 +27,13 @@ markdown_extensions: - footnotes - toc: permalink: True + - pymdownx.highlight: + anchor_linenums: true + line_spans: __span + pygments_lang_class: true + - pymdownx.inlinehilite + - pymdownx.snippets + - pymdownx.superfences extra_css: - css/custom.css plugins: