PySpark Plaso User Guide

The PySpark Plaso extraction process is controlled by a Web service via a REST API. The service runs in the Docker Spark application container where the PySpark Plaso build artefacts have been deployed (see the build process).

Command-line Interface (CLI)

There are several shell scripts in deployment/scripts to transfer data between a local file-system and the distributed file-system (HDFS) utilised by the distributed environment, and to control the extraction process in that environment (the scripts require curl and zip or p7zip); a sample session is shown after the list.

  • ./client-ls.sh [--url=http://0.0.0.0:5432/] [path-to-list] [another-path ...] -- to list the content of HDFS at a particular path

  • ./client-rm.sh [--url=http://0.0.0.0:5432/] <path-remove> [another-path ...] -- to remove a file or a directory from HDFS

  • ./client-download-file.sh [--url=http://0.0.0.0:5432/] <path-where-to-download> <file-path-to-download> [another-file ...] -- to download a file from HDFS into a local file-system

  • ./client-download-into-zip.sh [--url=http://0.0.0.0:5432/] <path-where-to-download> <file-or-dir-path-to-download> [another-file-or-dir ...] -- to download a file or a directory from HDFS as a ZIP file into a local directory

  • ./client-upload-file-dir.sh [--url=http://0.0.0.0:5432/] <path-where-to-upload> <file-or-directory-to-upload> [another-file-or-dir ...] -- to upload a file or a directory from a local file-system into HDFS

  • ./client-upload-zip.sh [--url=http://0.0.0.0:5432/] <path-where-to-upload> <zip-file-to-extract-there> [another-file-or-dir ...] -- to upload the content of a ZIP file from a local file-system into HDFS

  • ./client-extract.sh [--url=http://0.0.0.0:5432/] [path-to-extract] [another-path ...] -- to run the extraction process on a given path in the HDFS
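
The scripts can be chained into a simple workflow. A minimal example session, assuming the default service URL; the HDFS path evidence, the local disk image image.dd, and the local directory ./results are illustrative names only:

    # upload a local disk image into the HDFS path "evidence"
    ./client-upload-file-dir.sh --url=http://0.0.0.0:5432/ evidence ./image.dd
    # run the extraction process on that HDFS path
    ./client-extract.sh --url=http://0.0.0.0:5432/ evidence
    # list the content of the HDFS path
    ./client-ls.sh --url=http://0.0.0.0:5432/ evidence
    # download the HDFS path as a ZIP file into the local directory ./results
    ./client-download-into-zip.sh --url=http://0.0.0.0:5432/ ./results evidence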

REST Web API

In the default configuration (see the deployment/docker-compose/webapp.yml docker-compose file), the REST Web API runs at http://0.0.0.0:5432/. The following operations are available in the Web API (example curl calls are shown after the list):

  • to list the content of HDFS at a particular path (the path can be empty to list the content of the root directory)

    • GET http://0.0.0.0:5432/ls/[path-to-list]
  • to remove a file or a directory from HDFS (the path is mandatory here)

    • GET http://0.0.0.0:5432/rm/<path-to-remove>
    • DELETE http://0.0.0.0:5432/file/<path-to-remove>
  • xxx
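
The documented operations can be called with curl. A minimal sketch, assuming the default service URL; the HDFS path evidence and the file name old.zip are illustrative names only:

    # list the content of the HDFS root directory
    curl http://0.0.0.0:5432/ls/
    # list the content of a particular HDFS path
    curl http://0.0.0.0:5432/ls/evidence
    # remove a file from HDFS via the GET operation
    curl http://0.0.0.0:5432/rm/evidence/old.zip
    # remove a file from HDFS via the DELETE method
    curl -X DELETE http://0.0.0.0:5432/file/evidence/old.zip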
