Cloud Foundry buildpack to manage buckets (S3, GCS, ...) based on rclone
Functionalities of this buildpack:
- Automatically configure rclone from AWS and GCP service broker services
- Provide a web interface to explore the contents of the buckets
- Enable serving of remote objects via HTTP
- Clone data from one bucket to another, keeping it in sync periodically
- Use an rclone server with an HTTP API
Example manifest.yml:
---
applications:
- name: rclone
  memory: 512M
  instances: 1
  stack: cflinuxfs3
  random-route: true
  buildpacks:
  - https://github.com/SpringerPE/cf-rclone-buildpack.git
  services:
  - jose-rclone-gcs
  - jose-rclone-aws
  env:
    AUTH_USER: "admin"
    AUTH_PASSWORD: "admin"
    CLONE_SOURCE_SERVICE: "jose-rclone-aws"
    CLONE_DESTINATION_SERVICE: "jose-rclone-gcs"
    CLONE_MODE: sync
    CLONE_TIMER: 600
With this configuration, the program will run rclone sync to synchronize data from the bucket of jose-rclone-aws to the one of jose-rclone-gcs every 10 minutes. As each service offers only one bucket, you do not need to know the bucket names.
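For reference, a full deployment could look like the following sketch. The marketplace offerings and plans (aws-s3 basic, google-storage standard) are only assumptions, use whatever brokers your platform provides:
# Hypothetical marketplace offerings and plans, adjust to your platform
cf create-service aws-s3 basic jose-rclone-aws
cf create-service google-storage standard jose-rclone-gcs
# Push the application using the manifest above
cf push -f manifest.yml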
The web service always requires authentication. If AUTH_USER is not defined, it defaults to admin, and AUTH_PASSWORD will be autogenerated, printed to stdout (you can see it with cf logs) and stored in /home/vcap/auth/${AUTH_USER}.password
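For example, assuming the app is called rclone and AUTH_USER is the default admin, one way to read the generated password is:
# Print the autogenerated password from inside the running container
cf ssh rclone -c 'cat /home/vcap/auth/admin.password'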
GCS_PROJECT_NUMBER is predefined, but if you have your own project in GCP you will need to redefine it.
CLONE_SOURCE_SERVICE and CLONE_DESTINATION_SERVICE should match the names of the services bound to the application, and both need to be set in order to run the clone operation.
CLONE_MODE is one of:
- copy (default): copies data from one bucket to the other, only adding files to the destination bucket. It does not delete files in either the source or the destination bucket. See rclone copy.
- sync: synchronizes data from source to destination, making both identical by modifying the destination only. The destination is updated to match the source, including deleting files if necessary. See rclone sync.
- move: moves the contents of the source bucket to the destination bucket. Source contents are deleted as soon as they are copied to the destination. See rclone move.
Be careful with CLONE_MODE=sync or CLONE_MODE=move, those are destructive options.
CLONE_TIMER specifies the number of seconds to wait before re-running the clone operation. By default it is 0, so the clone process will not run periodically, just once after the program starts. The timer only starts counting after the previous run has finished (jobs are not queued), so if the clone process takes one hour, the next run will start 10 minutes after it ends (with the previous manifest).
Extra rclone parameters can be defined via environment variables, see https://rclone.org/docs/#environment-variables, but be aware that the automatic clone process uses the rclone API, so most likely those environment variables will be ignored.
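For instance, both kinds of settings can be changed on an existing app with the standard CF CLI (values here are only illustrative; as noted, plain RCLONE_* variables may be ignored by the API-driven clone):
# Re-run the clone every hour instead of every 10 minutes
cf set-env rclone CLONE_TIMER 3600
# Example of an extra rclone option via the environment (may be ignored by the automatic clone)
cf set-env rclone RCLONE_TRANSFERS 8
# Restart so the new environment is picked up
cf restart rclone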
This buildpack does not allow more than one instance; if you deploy more than one, the extra instances will fail.
Just copy the environment variable VCAP_SERVICES from the other CF platform and create a file called VCAP_SERVICES in the root of the application with the contents of the variable. At startup, the buildpack will merge the contents of the file with the environment variable and set up the rclone configuration.
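One possible way to capture the variable, assuming the source app on the other platform is called other-app:
# On the other CF platform, dump the variable from a running container
cf ssh other-app -c 'echo "$VCAP_SERVICES"' > VCAP_SERVICES
# Put the resulting VCAP_SERVICES file in the root of this application before pushing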
Create an rclone configuration file rclone.conf with the parameters of the buckets, something like:
# S3 example, please fill the access key and key id
[s3-service]
type = s3
provider = AWS
access_key_id = <S3-KEY-ID>
secret_access_key = <S3-ACCESS-KEY>
region = eu-central-1
location_constraint = eu-central-1
acl = private
env_auth = false
# GCS Example. Please put the `auth.json` file in the app folder
[gcs-service]
type = google cloud storage
client_id =
client_secret =
project_number =
service_account_file = /home/vcap/app/auth.json
storage_class = REGIONAL
location = europe-west4
Note that the bucket names are not defined in this configuration; you have to define them in the environment variables CLONE_SOURCE_BUCKET or CLONE_DESTINATION_BUCKET, and set the variable CLONE_SOURCE_SERVICE or CLONE_DESTINATION_SERVICE to the name of the entry between brackets (s3-service or gcs-service, without the brackets, in this example).
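For example, with the entries above and two illustrative bucket names (these variables can also go in the env block of the manifest):
cf set-env rclone CLONE_SOURCE_SERVICE s3-service
cf set-env rclone CLONE_SOURCE_BUCKET bucket1
cf set-env rclone CLONE_DESTINATION_SERVICE gcs-service
cf set-env rclone CLONE_DESTINATION_BUCKET bucket2
cf restart rclone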
Just create a file post-start.sh, like this:
#!/bin/bash
# $RCLONE is a predefined environment variable, just use it to execute commands
# Example command
$RCLONE rc core/version
# Sync these 2 buckets
$RCLONE -vv rc sync/sync srcFs=s3-service:bucket1 dstFs=gcs-service:bucket2
# Alternative way to do it (async == true)
$RCLONE rc sync/sync --json '{ "srcFs": "s3-service:bucket1", "dstFs": "gcs-service:bucket2", "_async": true }'
The variables CLONE_SOURCE_BUCKET and CLONE_DESTINATION_BUCKET are automatically defined if the counterpart SERVICE variables are provided.
If a post-start.sh file is found, no automatic clone operation will be performed. You can define all kinds of logic in this file, sync or async operations, it does not matter; the file will be executed in the background automatically at startup.
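As a sketch, a post-start.sh that re-runs a sync every hour could look like this (the remotes and bucket names are the illustrative ones used above):
#!/bin/bash
# Hypothetical post-start.sh: periodic sync driven by the script itself
while true; do
    # Synchronous call, blocks until the transfer has finished
    $RCLONE rc sync/sync srcFs=s3-service:bucket1 dstFs=gcs-service:bucket2
    sleep 3600
done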
Just open in a browser https://rclone.example.app/[SERVICE_NAME:]BUCKET_NAME/ changing SERVICE_NAME and BUCKET_NAME to the correct values.
Or using curl:
# Note the square brackets are escaped with \
# curl -u admin:password 'https://rclone.example.app/\[SERVICE_NAME:\]BUCKET_NAME/'
curl -u admin:password 'https://rclone.example.app/\[s3-service:\]bucket1/'
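Individual objects can be fetched the same way by appending the object path; the file name here is just an example:
# Download a single object from the bucket
curl -u admin:password -o backup.tar.gz 'https://rclone.example.app/\[s3-service:\]bucket1/backup.tar.gz'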
You can define a lot of buckets and use the rclone API to trigger actions on those buckets (and also retrieve them using HTTP). All calls must be made using POST.
https://rclone.org/rc/#accessing-the-remote-control-via-http
curl -u admin:password -H "Content-Type: application/json" -X POST -d '{"potato":2,"sausage":1}' http://rclone.example.com/rc/noop
Real-world example, performing a sync between two buckets:
curl -u admin:password -H "Content-Type: application/json" -X POST -d '{ "srcFs": "s3-service:bucket1", "dstFs": "gcs-service:bucket2", "_async": true }' http://rclone.example.com/sync/sync
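Since the call above is asynchronous, rclone answers with a jobid that can be queried later; the jobid value here is just an example:
# Check the progress of an async job
curl -u admin:password -H "Content-Type: application/json" -X POST -d '{ "jobid": 1 }' http://rclone.example.com/job/status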
All issues found are related to the new web UI. It is quite a new piece of software (Aug 2019) and is currently in alpha.
- When logging in, you have to enter the auth settings twice. The second time, in the program interface, click first on Verify and then on Login.
- It allows you to visualize the contents of the buckets, see current operations and view/delete objects. In order to see the contents of a bucket, go to Explorer, type <name-of-service>:<name-of-bucket> and click Open (yep, you need to know the name of the bucket!).
- The graph does not get refreshed after the transaction is done.
The buildpack is implemented using bash scripts to make it easy to understand and change.
https://docs.cloudfoundry.org/buildpacks/understand-buildpacks.html
The buildpack uses the deps and cache folders according to their intended purposes, so the first time the buildpack is used it will download all resources; subsequent runs will use the cached resources.
(c) 2019 Jose Riguera Lopez [email protected] Springernature Engineering Enablement
MIT License