Skip to content

Releases: ggreen/spring-file-service

spring-file-service-0.0.1

16 Apr 11:30
Compare
Choose a tag to compare

Spring File Service

This project was developed to demonstrate file movement and integration
using VMware Data Solutions and Spring

  • High throughput file streaming
  • Multi-site replication (hub-spoke)
  • Distribution of millions of small files
  • Maintain file source directory structure
  • Low-latency file data transfers

Architecture

architecture.png

RabbitMQ

High throughput file streaming

RabbitMQ supports moving a large number of small files.
The file-consumer-sink application supports the RabbitMQ streaming that has benchmarks to support the throughput of millions messages per second.

Multi-site replication (hub-spoke)

RabbitMQ features such as exchanges, routing with binding rules, site replication (shovel and federation) make it a great solution for implementing hub and spoke integration patterns. RabbitMQ decouples producer and consumer applications. The application can be running on the same network or distributed over a Wide Area Network (WAN). This allows for better maintainability and extensibility compared to disk-based replication.

Distribution of millions of small files

RabbitMQ supports high availability and fault tolerance for messaging.
Rabbit can be set up as a cluster within a single network. Outages to one or more RabbitMQ servers can be transparent to producer and consumer applications.

In general, messages utilize memory, disk, and network resources. In RabbitMQ (ex: version RabbitMQ 11) the default max size if 134 MB. The messages should be less than maximum allowed size of 512 MB. See RabbitMQ configuration.

Maintain file source directory structure

The file-send-source application implementation adds the file attributes to each message. This includes absolute path, relative path, in addition to the file content. The application sends messages to an RabbitMQ exchange that allows for future custom routing logic.

Low-latency file data transfers

RabbitMQ supports low-latency event streaming.
It uses a push-based model from producers to consumers.

The file-consumer-sink saves meta-data to remember files sent to RabbitMQ.
It uses GemFire (based on Apache Geode).
GemFire is an In-memory SQL-database. It provides high-performance real-time apps with ultra-high speed, in-memory data and compute grid data processing.


Operations

RabbitMQ

This solution uses RabbitMQ.
See the following to download and install RabbitMQ.

File Consumer Sink

The file-consumer-sink application uses RabbitMQ streams to save file content from Rabbit to a local directory.

Application Properties

Properties Notes Default
spring.rabbitmq.host Rabbit host connection
spring.rabbitmq.username Rabbit username guest
spring.rabbitmq.password Rabbit password guest
spring.rabbitmq.port Rabbit port 5672
file.sink.rootDirectory Root directory to save files

Run file-consumer-sink

java -jar applications/file-consumer-sink/build/libs/file-consumer-sink-0.0.1-SNAPSHOT.jar --spring.rabbitmq.host=localhost --file.sink.rootDirectory=/tmp/io/output

Send File Source

The Send File Source using the Nyla Solutions library FileMonitor
to watch files in given a root source directory.

New or updated files will be sent to RabbitMQ exchange.

GemFire

The application stores meta-data about which files have been sent
with an embedded GemFire server
within the Spring Boot application using the @CacheServerApplication annotation uses
Spring Data for GemFire. The data is stored in a GemFire region named "File".

Application Properties

Properties Notes Default
spring.rabbitmq.host Rabbit host connection
spring.rabbitmq.username Rabbit username guest
spring.rabbitmq.password Rabbit password guest
spring.rabbitmq.port Rabbit port 5672
spring.data.gemfire.locators GemFire locator (only needed to inspect meta-data)
file.source.rootDirectory Root directory to watch for file changes
file.source.pollingIntervalMs Milliseconds rate to poll for file changes 1000
file.source.delayMs The delay between polling interval to monitor files 1000
file.source.fileNameFilter File name pattern to process under the root dir *
file.source.processCurrentFiles Boolean to process current files at startup true
spring.data.gemfire.disk.store.directory.location Directory to store GemFire file meta-data

Run file-send-source

java -jar applications/file-send-source/build/libs/file-send-source-0.0.1-SNAPSHOT.jar --spring.data.gemfire.disk.store.directory.location=/Users/Projects/solutions/integration/files/dev/spring-file-service/deployment/gemfire/work-dir --spring.rabbitmq.host=localhost --file.source.rootDirectory=/tmp/io/input/

If you would like to connect the File Send Source to a GemFire Locator, then add the following (note the following was tested on Java SDK version 17):

Start GemFire locator using Gfsh

$GEMFIRE_HOME/bin/gfsh -e "start locator --name=locator"

Adding JVM --add-opens --add-exports, and spring.data.gemfire.locators arguments

java   --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-exports  java.management/com.sun.jmx.remote.security=ALL-UNNAMED --add-exports  java.base/sun.nio.ch=ALL-UNNAMED -jar applications/file-send-source/build/libs/file-send-source-0.0.1-SNAPSHOT.jar --spring.data.gemfire.disk.store.directory.location=/Users/Projects/solutions/integration/files/dev/spring-file-service/deployment/gemfire/work-dir --spring.rabbitmq.host=localhost --file.source.rootDirectory=/tmp/io/input/ --spring.data.gemfire.locators="localhost[10334]"

List Files in GemFire

  • query=select * from /File
$GEMFIRE_HOME/bin/gfsh -e "connect --locator=localhost[10334]" -e "--query='select * from /File' "

Example output

/Users/devtools/repositories/IMDG/gemfire/vmware-gemfire-9.15.4/bin$ ./gfsh -e "connect --locator=localhost[10334]" -e "query --query='select * from /File' "

(1) Executing - connect --locator=localhost[10334]

Connecting to Locator at [host=localhost, port=10334] ..
Connecting to Manager at [host=192.168.86.201, port=1099] ..
Successfully connected to: [host=192.168.86.201, port=1099]

You are connected to a cluster of version 9.15.4.


(2) Executing - query --query='select * from /File'

Result : true
Limit  : 100
Rows   : 13

              absolutePath               | lastModified
---------------------------------------- | -------------
"/tmp/io/input/test3.txt"                | 1680878671420
"/tmp/io/input/heat/test1.txt"           | 1680871569198
"/tmp/io/input/e-receipt3.pdf"           | 1680878671420
"/tmp/io/input/heat/e-receipt.pdf"       | 1680871569197
"/tmp/io/input/heat/out.txt"             | 1680871569198
"/tmp/io/input/heat/test.txt"            | 168087156919...
Read more