From 230d5186027df96a50c6548f9cef06baf6081b81 Mon Sep 17 00:00:00 2001 From: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com> Date: Fri, 24 Jan 2025 01:32:20 -0500 Subject: [PATCH] docs: Add guides for using clp-json with object storage; Update compression scripts docs missed in previous PRs. (#683) Co-authored-by: Haiqi Xu <14502009+haiqi96@users.noreply.github.com> --- docs/src/user-guide/guides-overview.md | 14 ++ .../guides-using-object-storage/clp-config.md | 78 +++++++++ .../guides-using-object-storage/clp-usage.md | 52 ++++++ .../guides-using-object-storage/index.md | 95 +++++++++++ .../object-storage-config.md | 152 ++++++++++++++++++ docs/src/user-guide/index.md | 16 ++ .../quick-start-compression/json.md | 8 +- .../quick-start-compression/text.md | 2 +- 8 files changed, 415 insertions(+), 2 deletions(-) create mode 100644 docs/src/user-guide/guides-overview.md create mode 100644 docs/src/user-guide/guides-using-object-storage/clp-config.md create mode 100644 docs/src/user-guide/guides-using-object-storage/clp-usage.md create mode 100644 docs/src/user-guide/guides-using-object-storage/index.md create mode 100644 docs/src/user-guide/guides-using-object-storage/object-storage-config.md diff --git a/docs/src/user-guide/guides-overview.md b/docs/src/user-guide/guides-overview.md new file mode 100644 index 000000000..5e8179bf7 --- /dev/null +++ b/docs/src/user-guide/guides-overview.md @@ -0,0 +1,14 @@ +# Overview + +The guides below describe how to use CLP in different use cases. + +::::{grid} 1 1 2 2 +:gutter: 2 + +:::{grid-item-card} +:link: guides-using-object-storage/index +Using object storage +^^^ +Using CLP to ingest logs from object storage and store archives on object storage. 
+::: +:::: diff --git a/docs/src/user-guide/guides-using-object-storage/clp-config.md b/docs/src/user-guide/guides-using-object-storage/clp-config.md new file mode 100644 index 000000000..02e3b9360 --- /dev/null +++ b/docs/src/user-guide/guides-using-object-storage/clp-config.md @@ -0,0 +1,78 @@ +# Configuring CLP + +To use object storage with CLP, follow the steps below to configure each use case you require. + +:::{note} +If CLP is already running, shut it down, update its configuration, and then start it again. +::: + +## Configuration for archive storage + +To configure CLP to store archives on S3, update the `archive_output.storage` key in +`/etc/clp-config.yml` with the values in the code block below, replacing the fields in +angle brackets (`<>`) with the appropriate values: + +```yaml +archive_output: + storage: + type: "s3" + staging_directory: "var/data/staged-archives" # Or a path of your choosing + s3_config: + region_code: "" + bucket: "" + key_prefix: "" + credentials: + access_key_id: "" + secret_access_key: "" + + # archive_output's other config keys +``` + +* `staging_directory` is the local filesystem directory where archives will be temporarily stored + before being uploaded to S3. +* `s3_config` configures both the S3 bucket where archives should be stored and the credentials + for accessing it. + * `` is the AWS region [code][aws-region-codes] for the bucket. + * `` is the bucket's name. + * `` is the "directory" where all archives will be stored within the bucket and + must end with a trailing forward slash (e.g., `archives/`). + * `credentials` contains the CLP IAM user's credentials. 
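The `archive_output.storage` keys described above can be sanity-checked before CLP is restarted. The following is a minimal Python sketch, assuming the `storage` mapping has already been loaded from `clp-config.yml` into a plain dict. `check_s3_storage_config` is a hypothetical helper and not part of CLP, and the region, bucket, and credential values are made-up examples.

```python
def check_s3_storage_config(storage: dict) -> list[str]:
    """Return a list of problems found in a `storage` mapping (hypothetical helper)."""
    problems = []
    if storage.get("type") != "s3":
        problems.append('type must be "s3"')
    if not storage.get("staging_directory"):
        problems.append("staging_directory is missing")

    s3_config = storage.get("s3_config") or {}
    for key in ("region_code", "bucket", "key_prefix"):
        if not s3_config.get(key):
            problems.append(f"s3_config.{key} is missing")

    # The guide requires the key prefix to end with a trailing forward slash.
    key_prefix = s3_config.get("key_prefix") or ""
    if key_prefix and not key_prefix.endswith("/"):
        problems.append("s3_config.key_prefix must end with a trailing forward slash")

    credentials = s3_config.get("credentials") or {}
    for key in ("access_key_id", "secret_access_key"):
        if not credentials.get(key):
            problems.append(f"s3_config.credentials.{key} is missing")
    return problems


# Example values only; none of these come from the guide.
example_storage = {
    "type": "s3",
    "staging_directory": "var/data/staged-archives",
    "s3_config": {
        "region_code": "us-east-2",
        "bucket": "my-clp-bucket",
        "key_prefix": "archives",  # Missing the trailing slash on purpose
        "credentials": {
            "access_key_id": "EXAMPLE_ID",
            "secret_access_key": "EXAMPLE_SECRET",
        },
    },
}
problems = check_s3_storage_config(example_storage)
print(problems)
```

The same check should apply unchanged to `stream_output.storage`, since both sections use the same keys.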
+ +## Configuration for stream storage + +To configure CLP to cache stream files on S3, update the `stream_output.storage` key in +`/etc/clp-config.yml` with the values in the code block below, replacing the fields in +angle brackets (`<>`) with the appropriate values: + +```yaml +stream_output: + storage: + type: "s3" + staging_directory: "var/data/staged-streams" # Or a path of your choosing + s3_config: + region_code: "" + bucket: "" + key_prefix: "" + credentials: + access_key_id: "" + secret_access_key: "" + + # stream_output's other config keys +``` + +* `staging_directory` is the local filesystem directory where streams will be temporarily stored + before being uploaded to S3. +* `s3_config` configures both the S3 bucket where streams should be stored and the credentials + for accessing it. + * `` is the AWS region [code][aws-region-codes] for the bucket. + * `` is the bucket's name. + * `` is the "directory" where all streams will be stored within the bucket and + must end with a trailing forward slash (e.g., `streams/`). + * `credentials` contains the CLP IAM user's credentials. + +:::{note} +CLP currently doesn't explicitly delete the cached streams. This limitation will be addressed in a +future release. +::: + +[aws-region-codes]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Availability diff --git a/docs/src/user-guide/guides-using-object-storage/clp-usage.md b/docs/src/user-guide/guides-using-object-storage/clp-usage.md new file mode 100644 index 000000000..6fab2db44 --- /dev/null +++ b/docs/src/user-guide/guides-using-object-storage/clp-usage.md @@ -0,0 +1,52 @@ +# Using CLP with object storage + +To compress logs from S3, follow the steps in the section below. For all other operations, you +should be able to use CLP as described in the [quick start](../quick-start-overview.md) guide. 
+ +## Compressing logs from S3 + +To compress logs from S3, use the `s3` subcommand as follows, replacing the fields in angle brackets +(`<>`) with the appropriate values: + +```bash +sbin/compress.sh \ + s3 \ + --aws-credentials-file \ + --timestamp-key \ + https://.s3..amazonaws.com/ +``` + +* `` is the path to an AWS credentials file like the following: + + ```ini + [default] + aws_access_key_id = + aws_secret_access_key = + ``` + + * CLP expects the credentials to be in the `default` section. + * `` and `` are the access key ID and secret access + key of the CLP IAM user. + * If you don't want to use a credentials file, you can specify the credentials on the command + line using the `--aws-access-key-id` and `--aws-secret-access-key` flags (note that this may + expose your credentials to other users running on the system). + +* `` is the field path of the kv-pair that contains the timestamp in each log event. +* `` is the name of the S3 bucket containing your logs. +* `` is the AWS region [code][aws-region-codes] for the S3 bucket containing your logs. +* `` is the prefix of all logs you wish to compress and must begin with the + `` value from the [compression IAM policy][compression-iam-policy]. + +:::{note} +The `s3` subcommand only supports a single URL but will compress any logs that have the given +prefix. + +If you wish to compress a single log file, specify the entire path to the log file. However, if that +log file's path is a prefix of another log file's path, then both log files will be compressed +(e.g., with two files "logs/syslog" and "logs/syslog.1", a prefix like "logs/syslog" will cause +both logs to be compressed). This limitation will be addressed in a future release. 
+::: + +[add-iam-policy]: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#embed-inline-policy-console +[aws-region-codes]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Availability +[compression-iam-policy]: ./object-storage-config.md#configuration-for-compression \ No newline at end of file diff --git a/docs/src/user-guide/guides-using-object-storage/index.md b/docs/src/user-guide/guides-using-object-storage/index.md new file mode 100644 index 000000000..d3f0aa536 --- /dev/null +++ b/docs/src/user-guide/guides-using-object-storage/index.md @@ -0,0 +1,95 @@ +# Using object storage + +CLP can: + +* compress logs from object storage (e.g., S3); +* store archives on object storage; and +* cache stream files (used for viewing compressed logs) on object storage. + +This guide explains how to configure and use CLP for all three use cases. Note that you can choose +to use object storage for any combination of the three use cases (e.g., compress logs from S3 and +cache the stream files on S3, but store archives on the local filesystem). + +:::{note} +Currently, only the [clp-json][release-choices] release supports object storage. Support for +`clp-text` will be added in a future release. +::: + +:::{note} +Currently, CLP only supports using S3 as object storage. Support for other object storage services +will be added in a future release. +::: + +## Prerequisites + +1. This guide assumes you're able to configure, start, stop, and use a CLP cluster as described in + the [quick-start guide](../quick-start-overview.md). +2. An S3 bucket and [key prefix][aws-key-prefixes] containing the logs you wish to compress. +3. An S3 bucket and key prefix where you wish to store compressed archives. +4. An S3 bucket and key prefix where you wish to cache stream files. +5. 
An AWS IAM user with the necessary permissions to access the S3 bucket(s) and prefixes mentioned + above. + * To create a user, follow [this guide][aws-create-iam-user]. + * You don't need to assign any groups or policies to the user at this stage since we will + attach policies in later steps, depending on which object storage use cases you require. + * You may use a single IAM user for all use cases, or a separate one for each. + * For brevity, we'll refer to this user as the "CLP IAM user" in the rest of this guide. +6. IAM user (long-term) credentials for the IAM user(s) created in step (5) above. + * To create these credentials, follow [this guide][aws-create-access-keys]. + * Choose the "Other" use case to generate long-term credentials. + + :::{note} + CLP currently requires IAM user (long-term) credentials to access the relevant S3 buckets. + Support for other authentication methods (e.g., temporary credentials) will be added in a future + release. + ::: + +## Configuration + +The subsections below explain how to configure your object storage bucket and CLP for each use case: + +::::{grid} 1 1 1 1 +:gutter: 2 + +:::{grid-item-card} +:link: object-storage-config +Configuring object storage +^^^ +Configuring your object storage bucket for each use case. +::: + +:::{grid-item-card} +:link: clp-config +Configuring CLP +^^^ +Configuring CLP to use object storage for each use case. +::: +:::: + +## Using CLP with object storage + +The subsection below explains how to use CLP with object storage for each use case: + +::::{grid} 1 1 1 1 +:gutter: 2 + +:::{grid-item-card} +:link: clp-usage +Using CLP with object storage +^^^ +Using CLP to compress, search, and view log files from object storage. 
+::: +:::: + +:::{toctree} +:hidden: + +object-storage-config +clp-config +clp-usage +::: + +[aws-create-access-keys]: https://docs.aws.amazon.com/keyspaces/latest/devguide/create.keypair.html +[aws-create-iam-user]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html +[aws-key-prefixes]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-prefixes.html +[release-choices]: ../quick-start-cluster-setup/index.md#choosing-a-release diff --git a/docs/src/user-guide/guides-using-object-storage/object-storage-config.md b/docs/src/user-guide/guides-using-object-storage/object-storage-config.md new file mode 100644 index 000000000..9cce64b99 --- /dev/null +++ b/docs/src/user-guide/guides-using-object-storage/object-storage-config.md @@ -0,0 +1,152 @@ +# Configuring object storage + +To use object storage with CLP, follow the steps below to configure the CLP IAM user and your object +storage bucket(s) for each use case you require. + +## Configuration for compression + +[Attach the inline policy][add-iam-policy] below to the CLP IAM user (you can use the JSON editor), +replacing the fields in angle brackets (`<>`) with the appropriate values: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": "s3:GetObject", + "Resource": [ + "arn:aws:s3:::/*" + ] + }, + { + "Effect": "Allow", + "Action": "s3:ListBucket", + "Resource": [ + "arn:aws:s3:::" + ], + "Condition": { + "StringLike": { + "s3:prefix": "*" + } + } + } + ] +} +``` + +* `` should be the name of the S3 bucket containing your logs. +* `` should be the prefix of all logs you wish to compress. + + :::{note} + If you want to enforce that only logs under a directory-like prefix, e.g., `logs/`, can be + compressed, you can append a trailing slash (`/`) after the `` value. This will + prevent CLP from compressing logs with prefixes like `logs-private`. 
However, note that to + compress all logs under the `logs/` prefix, you will need to include the trailing slash when + invoking `sbin/compress.sh` below. + ::: + +## Configuration for archive storage + +[Attach the inline policy][add-iam-policy] below to the CLP IAM user (you can use the JSON editor), +replacing the fields in angle brackets (`<>`) with the appropriate values: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "s3:GetObject", + "s3:PutObject" + ], + "Resource": [ + "arn:aws:s3::://*" + ] + } + ] +} +``` + +* `` should be the name of the S3 bucket where compressed archives should be stored. +* `` should be the prefix (used like a directory path) where compressed archives should + be stored. + +## Configuration for stream storage + +The [log viewer][yscope-log-viewer] currently supports viewing [IR][uber-clp-blog-1] and JSONL +stream files but not CLP archives; thus, to view the compressed logs from a CLP archive, CLP first +converts the compressed logs into stream files. These streams can be cached on the filesystem, or on +object storage. + +:::{note} +A future version of the log viewer will support viewing CLP archives directly. +::: + +Storing streams on S3 requires both configuring the CLP IAM user and setting up a cross-origin +resource sharing (CORS) policy for the S3 bucket. + +### IAM user configuration + +[Attach the inline policy][add-iam-policy] below to the CLP IAM user (you can use the JSON editor), +replacing the fields in angle brackets (`<>`) with the appropriate values: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "s3:GetObject", + "s3:PutObject" + ], + "Resource": [ + "arn:aws:s3::://*" + ] + } + ] +} +``` + +* `` should be the name of the S3 bucket where cached streams should be stored. +* `` should be the prefix (used like a directory path) where cached streams should be + stored. 
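The inline policy above can also be generated rather than edited by hand. The following is a minimal Python sketch, assuming the two placeholders in the `Resource` ARN are the bucket name and the key prefix, in that order, as the bullets above describe. `build_stream_storage_policy` is a hypothetical helper, and the bucket and prefix values are made-up examples.

```python
import json


def build_stream_storage_policy(bucket_name: str, key_prefix: str) -> dict:
    """Build the GetObject/PutObject inline policy for one bucket and prefix."""
    # Strip slashes so a prefix written as "streams/" doesn't produce "streams//*".
    prefix = key_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/{prefix}/*"],
            }
        ],
    }


# Example values only; substitute your own bucket and prefix.
policy = build_stream_storage_policy("my-clp-bucket", "streams/")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console's JSON editor. The archive storage policy has the same shape, so the same sketch should work there with the archive bucket and prefix.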
+ +### Cross-origin resource sharing (CORS) configuration + +For CLP's log viewer to be able to access the cached stream files from S3 over the internet, the S3 +bucket must have a CORS policy configured. + +Add the CORS configuration below to your bucket by following [this guide][aws-cors-guide]: + +```json +[ + { + "AllowedHeaders": [ + "*" + ], + "AllowedMethods": [ + "GET" + ], + "AllowedOrigins": [ + "*" + ], + "ExposeHeaders": [ + "Access-Control-Allow-Origin" + ] + } +] +``` + +:::{tip} +The CORS policy above allows requests from any host (origin). If you already know what hosts will +access CLP's web interface, you can enhance security by changing `AllowedOrigins` from `["*"]` to +the specific list of hosts that will access the web interface. +::: + +[aws-cors-guide]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html +[add-iam-policy]: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#embed-inline-policy-console +[uber-clp-blog-1]: https://www.uber.com/en-US/blog/reducing-logging-cost-by-two-orders-of-magnitude-using-clp +[yscope-log-viewer]: https://github.com/y-scope/yscope-log-viewer diff --git a/docs/src/user-guide/index.md b/docs/src/user-guide/index.md index a38b606b3..642eac4a6 100644 --- a/docs/src/user-guide/index.md +++ b/docs/src/user-guide/index.md @@ -15,6 +15,13 @@ Quick start A quick start guide for setting up a CLP cluster, compressing your logs, and searching them. ::: +:::{grid-item-card} +:link: guides-overview +Guides +^^^ +Guides for using CLP in a variety of use cases. 
+::: + :::{grid-item-card} :link: core-overview Core @@ -47,6 +54,15 @@ quick-start-compression/index quick-start-search/index ::: +:::{toctree} +:hidden: +:caption: Guides +:glob: + +guides-overview +guides-using-object-storage/index +::: + :::{toctree} :hidden: :caption: Core diff --git a/docs/src/user-guide/quick-start-compression/json.md b/docs/src/user-guide/quick-start-compression/json.md index 430363f83..6091762a8 100644 --- a/docs/src/user-guide/quick-start-compression/json.md +++ b/docs/src/user-guide/quick-start-compression/json.md @@ -3,9 +3,10 @@ To compress JSON logs, from inside the package directory, run: ```bash -sbin/compress.sh --timestamp-key '' [ ...] +sbin/compress.sh fs --timestamp-key '' [ ...] ``` +* `fs` is a subcommand for compressing logs from the filesystem. * `` is the field path of the kv-pair that contains the timestamp in each log event. * E.g., if your log events look like `{"timestamp": {"iso8601": "2024-01-01 00:01:02.345", ...}}`, you should enter @@ -21,6 +22,11 @@ sbin/compress.sh --timestamp-key '' [ ...] * Each JSON log file should contain each log event as a [separate JSON object][json-log-format], i.e., _not_ as an array. +:::{tip} +To compress logs from object storage, see +[Using object storage](../guides-using-object-storage/index). +::: + # Sample logs For some sample logs, check out the open-source [datasets](../resources-datasets.md). diff --git a/docs/src/user-guide/quick-start-compression/text.md b/docs/src/user-guide/quick-start-compression/text.md index 29e798b9d..18179a65b 100644 --- a/docs/src/user-guide/quick-start-compression/text.md +++ b/docs/src/user-guide/quick-start-compression/text.md @@ -3,7 +3,7 @@ To compress unstructured text logs, from inside the package directory, run: ```bash -sbin/compress.sh [ ...] +sbin/compress.sh fs [ ...] ``` `` are paths to unstructured text log files or directories containing such files.