User Guide
The PySpark Plaso extraction process is controlled by a Web service through a REST API. The service runs in the Docker Spark Application container into which the PySpark Plaso build artefacts were deployed (see the build process).
Several shell scripts in deployment/scripts transfer data between a local file-system and the distributed file-system used by the distributed environment (HDFS), and control the extraction process in that environment. The scripts require curl and either zip or p7zip.
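Before using the scripts, you may want to confirm that their prerequisites are installed. The following minimal Python sketch is not part of the repository. The function name `missing_client_tools` is hypothetical, and the sketch assumes that p7zip is installed under one of its usual executable names, `7z` or `7za`.

```python
import shutil


def missing_client_tools(which=shutil.which):
    """Hypothetical helper: list the client-script prerequisites not on PATH.

    Assumption: p7zip is detected through its usual executables `7z` or `7za`.
    """
    missing = []
    if which("curl") is None:
        missing.append("curl")
    # Either zip or p7zip satisfies the archive requirement.
    if all(which(tool) is None for tool in ("zip", "7z", "7za")):
        missing.append("zip or p7zip")
    return missing


if __name__ == "__main__":
    absent = missing_client_tools()
    if absent:
        print("Missing tools: " + ", ".join(absent))
    else:
        print("All client-script prerequisites found.")
```

The `which` parameter defaults to `shutil.which`. It can be replaced by a stub, which makes the check easy to exercise without changing the real PATH.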
- ./client-ls.sh [--url=http://0.0.0.0:5432/] [path-to-list] [another-path ...]
  -- to list the content of HDFS at a particular path
- ./client-rm.sh [--url=http://0.0.0.0:5432/] <path-remove> [another-path ...]
  -- to remove a file or a directory from HDFS
- ./client-download-file.sh [--url=http://0.0.0.0:5432/] <path-where-to-download> <file-path-to-download> [another-file ...]
  -- to download a file from HDFS into the local file-system
- ./client-download-into-zip.sh [--url=http://0.0.0.0:5432/] <path-where-to-download> <file-or-dir-path-to-download> [another-file-or-dir ...]
  -- to download a file or a directory from HDFS as a ZIP file into a local directory
- ./client-upload-file-dir.sh [--url=http://0.0.0.0:5432/] <path-where-to-upload> <file-or-directory-to-upload> [another-file-or-dir ...]
  -- to upload a file or a directory from the local file-system into HDFS
- ./client-upload-zip.sh [--url=http://0.0.0.0:5432/] <path-where-to-upload> <zip-file-to-extract-there> [another-file-or-dir ...]
  -- to upload the content of a ZIP file from the local file-system into HDFS
- ./client-extract.sh [--url=http://0.0.0.0:5432/] [path-to-extract] [another-path ...]
  -- to run the extraction process on a given path in HDFS
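A typical session chains these scripts: upload local data into HDFS, run the extraction on it, then list the HDFS directory. The Python sketch below only assembles and runs the commands documented above. The following are illustrative assumptions, not taken from the scripts themselves:

- the helper names
- the example paths
- that the extraction can be pointed at the directory the data was uploaded into

The sketch runs the scripts from inside deployment/scripts, matching the `./` prefix in the usage lines. For that reason, local paths are best given as absolute paths.

```python
import subprocess

# Assumption: the code runs from the repository root, so the scripts live here.
SCRIPTS_DIR = "deployment/scripts"


def workflow_commands(hdfs_dir, local_path, url=None):
    """Hypothetical helper: argv lists for upload -> extract -> list."""
    # The optional --url switch comes first, as in the usage lines.
    opts = [f"--url={url}"] if url else []
    return [
        ["./client-upload-file-dir.sh", *opts, hdfs_dir, local_path],
        ["./client-extract.sh", *opts, hdfs_dir],
        ["./client-ls.sh", *opts, hdfs_dir],
    ]


def run_workflow(hdfs_dir, local_path, url=None):
    """Hypothetical helper: run the three scripts in order."""
    for argv in workflow_commands(hdfs_dir, local_path, url):
        # cwd matches the "./client-*.sh" invocation style; check=True stops
        # at the first script that exits with a non-zero status.
        subprocess.run(argv, cwd=SCRIPTS_DIR, check=True)
```

For example, `run_workflow("cases/case1", "/data/evidence")` would upload the hypothetical local folder /data/evidence into the HDFS path cases/case1, start the extraction there, and print the listing. Keeping command construction separate from execution makes the argument order easy to inspect before anything is run.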
In the default configuration (see the deployment/docker-compose/webapp.yml docker-compose file), the REST Web API runs at http://0.0.0.0:5432/. The following operations are available in the Web API:
- to list the content of HDFS at a particular path (the path can be empty to list the content of the root directory):
  GET http://0.0.0.0:5432/ls/[path-to-list]
- to remove a file or a directory from HDFS (the path is mandatory here):
  GET http://0.0.0.0:5432/rm/<path-to-remove>
  DELETE http://0.0.0.0:5432/file/<path-to-remove>
- xxx
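The listing and removal operations can also be called directly over HTTP. This minimal Python sketch uses only the standard library (`urllib`) to build requests for the endpoints listed above. The helper names and the percent-encoding of path segments are assumptions. The response format is not described here, so `send` simply returns the raw body.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Default base URL from deployment/docker-compose/webapp.yml. When calling the
# service from another machine, use the service host's address instead.
BASE_URL = "http://0.0.0.0:5432/"


def _endpoint(base_url, op, path):
    # Percent-encode the path but keep "/" separators between segments.
    return base_url.rstrip("/") + "/" + op + "/" + quote(path.lstrip("/"), safe="/")


def ls_request(path="", base_url=BASE_URL):
    """GET /ls/[path]: an empty path lists the root directory."""
    return Request(_endpoint(base_url, "ls", path), method="GET")


def rm_request(path, base_url=BASE_URL, use_delete=False):
    """Remove a file or directory via GET /rm/<path> or DELETE /file/<path>."""
    if not path.strip("/"):
        raise ValueError("a path to remove is required")
    if use_delete:
        return Request(_endpoint(base_url, "file", path), method="DELETE")
    return Request(_endpoint(base_url, "rm", path), method="GET")


def send(request, timeout=30):
    """Send a prepared request and return the raw response body."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, `send(ls_request())` would list the HDFS root. `send(rm_request("cases/old", use_delete=True))` would issue `DELETE http://0.0.0.0:5432/file/cases/old`, where the path is hypothetical.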