Merge pull request #410 from netarchivesuite/331_major_upgrade
Upgrade Java 1.8 → 11, Tomcat 8.5 → 9 and Solr 7 → 9
thomasegense authored Dec 20, 2023
2 parents ecdbad9 + 9c72537 commit 963f872
Showing 138 changed files with 6,455 additions and 20,954 deletions.
4 changes: 4 additions & 0 deletions CHANGES.md
@@ -1,5 +1,9 @@
# SolrWayback changelog

4.5.0
-----
Upgrade Java 1.8 → 11, Tomcat 8.5 → 9 and Solr 7 → 9. SolrWayback 4.5.0 is backwards compatible with existing Solr 7 installations.

4.4.3
-----

56 changes: 23 additions & 33 deletions README.md
@@ -128,18 +128,20 @@ Documents in SolrWayback are indexed through the [warc-indexer](https://github.c


## Requirements
* Works on macOS/Linux/Windows.
* JDK 8/9/10/11
* Works on macOS/Linux/Windows
* Java 11 (tested with OpenJDK)
* A nice collection of ARC/WARC files or harvest your own with Heritrix, Webrecorder, Brozzler, Wget, etc.
* Tomcat 8+ or another J2EE server for deploying the WAR-file
* A Solr 7.X server with the index build from the Arc/Warc files using the Warc-Indexer version 3.2.0-SNAPSHOT +
* Tomcat 9+ or another J2EE server for deploying the WAR-file
* A Solr 9+ server with the index built from the ARC/WARC files using the Warc-Indexer version 3.2.0-SNAPSHOT+
* (Optional) chrome/(chromium) installed for page previews to work. (headless chrome)

## Build and usage
* Build the application with: `mvn package`
* Deploy the `target/solrwayback-*.war` file in a web-container
* Copy `src/test/resources/properties/solrwayback.properties` and `/src/test/resources/properties/solrwaybackweb.properties`
to `user/home/` folder for the J2EE server
to either the root of the tomcat folder or the `user/home/` folder for the J2EE server.
Alternatively use the [src/main/webapp/META-INF/context.xml](src/main/webapp/META-INF/context.xml) as template
for a context for the SolrWayback WAR and set the paths for the properties directly.
* Modify the property files. (default all urls http://localhost:8080)
* Open search interface: http://localhost:8080/solrwayback
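
The property-file copy step can be sketched as a small shell helper. This is a minimal sketch and not part of the repository: the function name `install_properties` and the choice to leave already existing files untouched are assumptions made for illustration. Only the two file names come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SolrWayback): copy the two property files
# from a source folder to a destination folder without overwriting files
# that are already present in the destination.
install_properties() {
    local src="$1" dest="$2" name
    mkdir -p "$dest"
    for name in solrwayback.properties solrwaybackweb.properties; do
        # Fail early if the source file is not where we expect it.
        if [ ! -f "$src/$name" ]; then
            echo "missing: $src/$name" >&2
            return 1
        fi
        if [ -f "$dest/$name" ]; then
            echo "kept existing $dest/$name"
        else
            cp "$src/$name" "$dest/$name"
            echo "copied $name to $dest"
        fi
    done
}
```

From the root of the source tree it could be called as `install_properties src/test/resources/properties "$HOME"`. Keeping existing files is a deliberate choice here, so that a locally edited configuration is not lost on a rebuild.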

@@ -168,38 +170,40 @@ Unzip and follow the instructions below.
## Installation instructions

### 1) INITIAL SETUP
* **Setup:** Copy the two property files: `properties/solrwayback.properties` and `properties/solrwaybackweb.properties` to your HOME folder (or the home-folder for Tomcat user).
* **Optional:** For screenshot previews to work you may have to edit the file `solrwayback.properties` and change the value of the last two properties : `chrome.command` and `screenshot.temp.imagedir`.

* **Optional:** For screenshot previews to work you may have to edit the file `properties/solrwayback.properties` and change the value of the last two properties : `chrome.command` and `screenshot.temp.imagedir`.
Chrome(Chromium) must be installed for preview of images to work.

If you encounter errors when running a script during installation or setup, try changing the permissions of the file (`startup.sh` etc.). On Linux and macOS, this can be done with the following command: `chmod +x filename.sh`

**Note:** Previous versions of the SolrWayback bundle expected the property files to be located at the root of the home folder of the user. If this is preferable, move the two property files `solrwayback.properties` and `solrwaybackweb.properties` from the `properties/` folder in the bundle to the root of the home folder of the user.

### 2) STARTING SOLRWAYBACK
SolrWayback requires both Solr and Tomcat to be running. These processes are started and stopped separately with the following commands:

* **Step 1:** Navigate to the location of the bundle on your computer.

For Linux and Mac:
* **Step 2.1:** Start tomcat with this command: `apache-tomcat-8.5.60/bin/startup.sh`
* **Step 2.2:** Start solr with this command: `solr-7.7.3/bin/solr start`
* **Step 2.1:** Start tomcat with this command: `tomcat-9/bin/startup.sh`
* **Step 2.2:** Start solr with this command: `solr-9/bin/solr start -c -m 1g`

For Windows:
* **Step 2.1:** To start tomcat navigate to `apache-tomcat-8.5.60/bin/` and type `startup.bat`
* **Step 2.2:** To start solr navigate to `solr-7.7.3/bin/` and type `solr.cmd start`

* **Step 2.1:** To start tomcat navigate to `tomcat-9/bin/` and type `startup.bat`
* **Step 2.2:** To start solr navigate to `solr-9/bin/` and type `solr.cmd start -c -m 1g`

* **Step 3:** To check that Tomcat and Solr are running, open the following links: http://localhost:8080/solrwayback/ and http://localhost:8983/solr/#/netarchivebuilder. If neither shows an error, the services have started successfully.
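
The check in Step 3 can also be done from a terminal. The following is a minimal sketch that assumes `curl` is installed. The function names `status_to_state` and `check_service` are hypothetical, and treating any 2xx or 3xx response as "up" is a simplification chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: report whether a service answers on a given URL.
# Function names are hypothetical and not part of the bundle.

# Map an HTTP status code to "up" or "down".
# 2xx and 3xx count as up; everything else, including the 000 that curl
# reports when no connection could be made, counts as down.
status_to_state() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) echo "up" ;;
        *) echo "down" ;;
    esac
}

# Fetch a URL with curl and print "<state> <url>".
check_service() {
    local url="$1" code
    # "|| true" keeps the script alive when curl cannot connect.
    code="$(curl --silent --output /dev/null --max-time 5 \
        --write-out '%{http_code}' "$url")" || true
    echo "$(status_to_state "$code") $url"
}
```

Possible calls are `check_service http://localhost:8080/solrwayback/` and `check_service http://localhost:8983/solr/`. The `#/netarchivebuilder` part of the Solr link is a browser-side fragment that is never sent to the server, so it is left out here.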

#### Tomcat:

* Start tomcat: `apache-tomcat-8.5.60/bin/startup.sh`
* Stop tomcat: `apache-tomcat-8.5.60/bin/shutdown.sh`
* (For windows navigate to `apache-tomcat-8.5.60/bin/` and type `startup.bat` or `shutdown.bat`)
* Start tomcat: `tomcat-9/bin/startup.sh`
* Stop tomcat: `tomcat-9/bin/shutdown.sh`
* (For windows navigate to `tomcat-9/bin/` and type `startup.bat` or `shutdown.bat`)
* To see Tomcat is running open: http://localhost:8080/solrwayback/

#### Solr:
* Start solr: `solr-7.7.3/bin/solr start`
* Stop solr: `solr-7.7.3/bin/solr stop -all`
* (For windows navigate to `solr-7.7.3/bin/` and type `solr.cmd start` or `solr.cmd stop -all`)
* Start solr: `solr-9/bin/solr start -c -m 1g`
* Stop solr: `solr-9/bin/solr stop -all`
* (For windows navigate to `solr-9/bin/` and type `solr.cmd start -c -m 1g` or `solr.cmd stop -all`)
* To see Solr is running open: http://localhost:8983/solr/#/netarchivebuilder

### 3) INDEXING
@@ -314,7 +318,7 @@ A more advanced distributed indexing flow can be handled by the archon/arctika i
To remove an old index and create a new index from scratch, follow these steps:

1. Stop solr
2. Delete the folder `solr-7.7.3/server/solr/configsets/netarchivebuilder/netarchivebuilder_data/index` (or rename to `index1` etc, if you want to switch back later)
2. Delete the folder `solr-9/server/solr/netarchivebuilder_shard1_replica_n1/data/index/` (or rename to `index1` etc, if you want to switch back later)
3. Start solr
4. Start the indexing script
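
Steps 1 to 3 can be sketched as a script for the bundled Solr 9. This is a sketch under assumptions, not a script shipped with the bundle: the function names `next_backup_name` and `reset_index` are hypothetical, and the script takes the rename route (`index1`, `index2`, ...) rather than deleting, so the old index can be restored later. The paths and `solr` commands are the ones given in this README.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-3 for the bundled Solr 9.
# Function names are hypothetical; run from the root of the bundle folder.

# Print the first unused backup name (index1, index2, ...) inside DATA_DIR.
next_backup_name() {
    local data_dir="$1" n=1
    while [ -e "$data_dir/index$n" ]; do
        n=$((n + 1))
    done
    echo "index$n"
}

# Stop Solr, move the current index aside, and start Solr again.
reset_index() {
    local data_dir="solr-9/server/solr/netarchivebuilder_shard1_replica_n1/data"
    solr-9/bin/solr stop -all
    if [ -d "$data_dir/index" ]; then
        mv "$data_dir/index" "$data_dir/$(next_backup_name "$data_dir")"
    fi
    solr-9/bin/solr start -c -m 1g
}
```

After `reset_index` has finished, the indexing script is started as usual (step 4).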

@@ -345,17 +349,3 @@ The optional `--span-hosts` parameter will also harvest resources outside the do
`wget --span-hosts --level=1 --recursive --warc-cdx --page-requisites --warc-file=warcfilename --warc-max-size=1G -i url_list.txt`\
where `level=1` means "starting URLs and the first level of URLs linked from the starting URLs".
This will substantially increase the size of the WARC file(s).


**THIS VERSION HAS BEEN PATCHED AGAINST 'log4shell'**

## log4shell security alert - Patch your SolrWayback Bundle if you are using a release before 4.2.3.
SolrWayback itself does not use log4j 2+ and is not directly affected by [CVE-2021-44228](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228).

The SolrWayback bundle uses Solr 7.7.3, which **is affected by log4shell**. Please follow the [Solr log4shell mitigation guide](https://solr.apache.org/security.html#apache-solr-affected-by-apache-log4j-cve-2021-44228) if the bundled Solr is used before version 4.2.3. The quickest fix, taken from the guide, is the following:
* (Linux/MacOS) Edit your solr.in.sh file to include: SOLR_OPTS="$SOLR_OPTS -Dlog4j2.formatMsgNoLookups=true"
* (Windows) Edit your solr.in.cmd file to include: set SOLR_OPTS=%SOLR_OPTS% -Dlog4j2.formatMsgNoLookups=true

If another version of Solr is used, note that Solr >= 7.4 and < 8.11 are vulnerable. See the mitigation guide above for details.


46 changes: 32 additions & 14 deletions pom.xml
@@ -109,19 +109,37 @@
<version>30.0-jre</version>
</dependency>

<!-- SOLR -->
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-solrj</artifactId>
<version>7.1.0</version>
</dependency>
<!-- SOLR -->
<!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
<!-- SOLR -->
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-solrj</artifactId>
<version>9.1.0</version>
</dependency>

<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-test-framework</artifactId>
<version>9.1.0</version>
<!-- <version>7.1.0</version> -->
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.solr/solr-core -->
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-core</artifactId>
<version>9.1.0</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
<version>9.2.0</version>
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-test-framework</artifactId>
<version>7.1.0</version>
<scope>test</scope>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
<dependency>
@@ -269,8 +287,8 @@
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<source>11</source>
<target>11</target>
</configuration>
</plugin>

98 changes: 97 additions & 1 deletion src/bundle/README.md
@@ -5,4 +5,100 @@ Resources used when building the SolrWayback bundle.
- `install SolrWayback bundle`: See install guide [SolrWayback README](https://github.com/netarchivesuite/solrwayback/blob/master/README.md/)
- `indexing`: Scripts for indexing WARC files using [webarchive-discovery](https://github.com/ukwa/webarchive-discovery/)
- `Changes.md`: See version history [SolrWayback](https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md/)
- `properties`: Default properties for the SolrWayback Bundle

- solrwaybackproxy
- Solr 9 config files
- Tomcat 9
- Solr 9

## How to for package managers

### Build WARs and JAR

Create the SolrWayback WAR
```
mvn clean package
```

Build a `warc-indexer-3.2.0-SNAPSHOT-jar-with-dependencies.jar` from [webarchive-discovery](https://github.com/ukwa/webarchive-discovery/).

Build a `solrwaybackrootproxy-4.3.1.war` from [solrwaybackrootproxy](https://github.com/netarchivesuite/solrwaybackrootproxy).

### Folder structure

```
mkdir solrwayback_package_4.5
cd solrwayback_package_4.5/
cp -r ../src/bundle/indexing/ .
cp
cp -r ../src/test/resources/solr_9/ solr_9_files
cp ../README.md ../CHANGES.md .
mkdir properties
cp ../src/test/resources/properties/solrwayback.properties properties/
cp ../src/test/resources/properties/solrwaybackweb.properties properties/
```

Copy the previously generated `warc-indexer-XXX-jar-with-dependencies.jar` to the `indexing/` folder.

### Tomcat 9

Download and unpack Tomcat 9 (in current folder `solrwayback_package_4.5`)
```
wget 'https://dlcdn.apache.org/tomcat/tomcat-9/v9.0.84/bin/apache-tomcat-9.0.84.tar.gz'
tar -xzovf apache-tomcat-9.0.84.tar.gz
mv apache-tomcat-9.0.84 tomcat-9
rm apache-tomcat-9.0.84.tar.gz
```

Copy WAR and context:
```
cp ../target/solrwayback-*.war tomcat-9/webapps/solrwayback.war
mkdir -p tomcat-9/conf/Catalina/localhost/
cp ../src/main/webapp/META-INF/context.xml tomcat-9/conf/Catalina/localhost/solrwayback.xml
```

Edit `tomcat-9/conf/Catalina/localhost/solrwayback.xml` and set
* `solrwayback-config` to `properties/solrwayback.properties`
* `solrwaybackweb-config` to `properties/solrwaybackweb.properties`

Copy and rename the previously generated `solrwaybackrootproxy-4.3.1.war` to `tomcat-9/webapps/ROOT.war`.

### Solr 9

Download and unpack Solr 9 (in current folder `solrwayback_package_4.5`)
```
wget 'https://www.apache.org/dyn/closer.lua/solr/solr/9.4.0/solr-9.4.0.tgz?action=download' -O solr-9.4.0.tgz
tar -xovf solr-9.4.0.tgz
mv solr-9.4.0 solr-9
rm solr-9.4.0.tgz
```

*Optional, but makes debugging easier:* Open Solr to the world instead of just localhost
```
sed -i 's/#SOLR_JETTY_HOST="127.0.0.1"/SOLR_JETTY_HOST="0.0.0.0"/' solr-9/bin/solr.in.sh
sed -i 's/REM set SOLR_JETTY_HOST=127.0.0.1/set SOLR_JETTY_HOST=0.0.0.0/' solr-9/bin/solr.in.cmd
```

Start Solr in cloud mode, create a 1 shard `netarchivebuilder` collection and shut it down
```
solr-9/bin/solr start -c -m 1g
solr-9/bin/solr create_collection -c netarchivebuilder -d solr_9_files/netarchivebuilder/conf/ -n sw_conf_1 -shards 1
solr-9/bin/solr stop
```

### Finishing and packing (in current folder `solrwayback_package_4.5`)

Remove Emacs backup files (if any)
```
find . -iname "*~" | xargs rm
```

Create the bundle
```
cd ..
zip -r solrwayback_package_4.5.zip solrwayback_package_4.5/
```

- `properties`: Default properties for the SolrWayback Bundle

2 changes: 1 addition & 1 deletion src/bundle/indexing/show_warc_config.sh
@@ -2,5 +2,5 @@

pushd ${BASH_SOURCE%/*} > /dev/null

java -cp warc-indexer-3.1.0-KB-SNAPSHOT-jar-with-dependencies.jar uk.bl.wa.util.ConfigPrinter
java -cp warc-indexer-3.2.0-SNAPSHOT-jar-with-dependencies.jar uk.bl.wa.util.ConfigPrinter

2 changes: 1 addition & 1 deletion src/bundle/indexing/warc-indexer.sh
@@ -238,7 +238,7 @@ index_warcs() {
export SOLR_URL
export STATUS_ROOT
export TMP_ROOT
cat "$WARCS" | xargs -P "$THREADS" -n 1 -I "{}" bash -c 'index_warc "{}"'
cat "$WARCS" | xargs -P "$THREADS" -I "{}" bash -c 'index_warc "{}"'
}

index_all() {