Lambda-Promtail (grafana#2282)

* specs out lambda-promtail * init fn * updates readme/template * lambda-promtail docs * lambda promtail includes source timestamp and uses context * non markdown links * new doc structure
owen-d · Jul 29, 2020 · 2a596a7 · 2a596a7
1 parent d93b410
commit 2a596a7
Showing 81 changed files with 4,663 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -23,4 +23,5 @@ dlv
 rootfs/
 dist
 coverage.txt
-.DS_Store
+.DS_Store
+.aws-sam
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,7 @@
+## Unreleased (Master)
+
+* [2282](https://github.com/grafana/loki/pull/2282) **owen-d**: introduces a [lambda-promtail](https://github.com/grafana/loki/blob/master/docs/clients/lambda-promtail/README.md) workflow for shipping Cloudwatch logs to Loki.
+
 ## 1.5.0 (2020-05-20)
 
 It's been a busy month and a half since 1.4.0 was released, and a lot of new improvements have been added to Loki since!

diff --git a/docs/sources/clients/_index.md b/docs/sources/clients/_index.md
@@ -11,6 +11,7 @@ Loki supports the following official clients for sending logs:
 - [Fluentd](fluentd/)
 - [Fluent Bit](fluentbit/)
 - [Logstash](logstash/)
+- [Lambda Promtail](lambda-promtail/)
 
 ## Picking a client
 
@@ -51,6 +52,12 @@ Prometheus plugin.
 If you are already using logstash and/or beats, this will be the easiest way to start.
 By adding our output plugin you can quickly try Loki without doing big configuration changes.
 
+### Lambda Promtail
+
+This is a workflow combining the promtail push-api [scrape config](./promtail/configuration#loki_push_api_config) and the [lambda-promtail](../../tools/lambda-promtail/) AWS Lambda function which pipes logs from Cloudwatch to Loki.
+
+This is a good choice if you're looking to try out Loki in a low-footprint way or if you wish to monitor AWS lambda logs in Loki.
+
 # Unofficial clients
 
 Please note that the Loki API is not stable yet, so breaking changes might occur

diff --git a/docs/sources/clients/lambda-promtail/_index.md b/docs/sources/clients/lambda-promtail/_index.md
@@ -0,0 +1,84 @@
+# Lambda Promtail
+
+Loki includes an [AWS SAM](https://aws.amazon.com/serverless/sam/) package template, available [here](../../../tools/lambda-promtail/), for shipping Cloudwatch logs to Loki via a set of promtails. This is done via an intermediary [lambda function](https://aws.amazon.com/lambda/) which processes Cloudwatch events and propagates them to a promtail instance (or set of instances behind a load balancer) via the push-api [scrape config](docs/clients/promtail/configuration#loki_push_api_config).
+
+## Uses
+
+### Ephemeral Jobs
+
+This workflow is intended to be an effective approach for monitoring ephemeral jobs such as those run on AWS Lambda which are otherwise hard/impossible to monitor via one of the other Loki [clients](../).
+
+Ephemeral jobs can quite easily run afoul of cardinality best practices. During high request load, an AWS lambda function might balloon in concurrency, creating many log streams in Cloudwatch. However, these may only be active for a very short while. This creates a problem for combining these short-lived log streams in Loki because timestamps may not strictly increase across multiple log streams. The other obvious route is creating labels based on log streams, which is also undesirable because it leads to cardinality problems via many low-throughput log streams.
+
+Instead we can pipeline Cloudwatch logs to a set of promtails, which can mitigate these problems in two ways:
+
+1) Using promtail's push api along with the `use_incoming_timestamp: false` config, we let promtail determine the timestamp based on when it ingests the logs, not the timestamp assigned by Cloudwatch. This means we lose the origin timestamp because promtail now assigns it, but the difference is relatively small in a real-time ingestion system like this.
+2) In conjunction with (1), promtail can coalesce logs across Cloudwatch log streams because it's no longer susceptible to `out-of-order` errors when combining multiple sources (lambda invocations).
+
+One important aspect to keep in mind when running with a set of promtails behind a load balancer is that we're effectively moving the cardinality problems from the `number_of_log_streams` -> `number_of_promtails`. You'll need to assign a promtail specific label on each promtail so that you don't run into `out-of-order` errors when the promtails send data for the same log groups to Loki. This can easily be done via a config like `--client.external-labels=promtail=${HOSTNAME}` passed to promtail.
+
+### Proof of concept Loki deployments
+
+For those using Cloudwatch and wishing to test out Loki in a low-risk way, this workflow allows piping Cloudwatch logs to Loki regardless of the event source (EC2, Kubernetes, Lambda, ECS, etc) without setting up a set of promtail daemons across their infrastructure. However, running promtail as a daemon on your infrastructure is the best-practice deployment strategy in the long term for flexibility, reliability, performance, and cost.
+
+Note: Propagating logs from Cloudwatch to Loki means you'll still need to _pay_ for Cloudwatch.
+
+## Propagated Labels
+
+Incoming logs will have three special labels assigned to them which can be used in [relabeling](../promtail#relabel_config) or later stages in a promtail [pipeline](../promtail/pipelines):
+
+- `__aws_cloudwatch_log_group`: The associated Cloudwatch Log Group for this log.
+- `__aws_cloudwatch_log_stream`: The associated Cloudwatch Log Stream for this log.
+- `__aws_cloudwatch_owner`: The AWS ID of the owner of this event.
+
+## Limitations
+
+### Promtail labels
+
+As stated earlier, this workflow moves the worst case stream cardinality from `number_of_log_streams` -> `number_of_log_groups` * `number_of_promtails`. For this reason, each promtail must have a unique label attached to logs it processes (ideally via something like `--client.external-labels=promtail=${HOSTNAME}`) and it's advised to run a small number of promtails behind a load balancer according to your throughput and redundancy needs. 
+
+This trade-off is very effective when you have a large number of log streams but want to aggregate them by the log group. This is very common in AWS Lambda, where log groups are the "application" and log streams are the individual application containers which are spun up and down at a whim, possibly just for a single function invocation.
+
+### Data Persistence
+
+#### Availability
+
+For availability concerns, run a set of promtails behind a load balancer.
+
+#### Batching
+
+Since promtail batches writes to Loki for performance, it's possible that promtail will receive a log, issue a successful `204` HTTP status code for the write, then be killed before it writes upstream to Loki. This should be rare, but it is a downside of this workflow.
+
+### Templating
+
+The current SAM template is rudimentary. If you need to add vpc configs, extra log groups to monitor, subnet declarations, etc, you'll need to edit the template manually. Currently this requires pulling the Loki source.
+
+## Example Promtail Config
+
+Note: this should be run in conjunction with a promtail-specific label attached, ideally via a flag argument like `--client.external-labels=promtail=${HOSTNAME}`. It will receive writes via the push-api on ports `3500` (http) and `3600` (grpc).
+
+```yaml
+server:
+  http_listen_port: 9080
+  grpc_listen_port: 0
+
+positions:
+  filename: /tmp/positions.yaml
+
+clients:
+  - url: http://ip_or_hostname_where_Loki_run:3100/loki/api/v1/push
+
+scrape_configs:
+  - job_name: push1
+    loki_push_api:
+      server:
+        http_listen_port: 3500
+        grpc_listen_port: 3600
+      labels:
+        # Adds a label on all streams indicating it was processed by the lambda-promtail workflow.
+        promtail: 'lambda-promtail'
+      relabel_configs:
+        # Maps the cloudwatch log group into a label called `log_group` for use in Loki.
+        - source_labels: ['__aws_cloudwatch_log_group']
+          target_label: 'log_group'
+```
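The worst-case stream count described under "Promtail labels" is simple arithmetic. The following Go sketch works it through with hypothetical numbers; the function name `worstCaseStreams` and the figures are illustrative assumptions, not part of this change:

```go
package main

import "fmt"

// worstCaseStreams returns the upper bound on Loki streams when logs are
// keyed by log group and every promtail attaches its own unique label.
func worstCaseStreams(logGroups, promtails int) int {
	return logGroups * promtails
}

func main() {
	// Hypothetical deployment: 20 log groups, 3 promtails behind a load balancer.
	fmt.Println(worstCaseStreams(20, 3)) // prints 60
}
```

With these numbers the bound is 60 streams, regardless of how many short-lived Cloudwatch log streams the Lambda concurrency creates.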
diff --git a/go.mod b/go.mod
@@ -3,6 +3,7 @@ module github.com/grafana/loki
 go 1.14
 
 require (
+	github.com/aws/aws-lambda-go v1.17.0
 	github.com/blang/semver v3.5.1+incompatible // indirect
 	github.com/bmatcuk/doublestar v1.2.2
 	github.com/c2h5oh/datasize v0.0.0-20200112174442-28bbd4740fee

diff --git a/go.sum b/go.sum
@@ -163,7 +163,10 @@ github.com/asaskevich/govalidator v0.0.0-20190424111038-f61b66f89f4a h1:idn718Q4
 github.com/asaskevich/govalidator v0.0.0-20190424111038-f61b66f89f4a/go.mod h1:lB+ZfQJz7igIIfQNfa7Ml4HSf2uFQQRzpGGRXenZAgY=
 github.com/asaskevich/govalidator v0.0.0-20200108200545-475eaeb16496 h1:zV3ejI06GQ59hwDQAvmK1qxOQGB3WuVTRoY0okPTAv0=
 github.com/asaskevich/govalidator v0.0.0-20200108200545-475eaeb16496/go.mod h1:oGkLhpf+kjZl6xBf758TQhh5XrAeiJv/7FRz/2spLIg=
+github.com/aws/aws-lambda-go v1.13.3 h1:SuCy7H3NLyp+1Mrfp+m80jcbi9KYWAs9/BXwppwRDzY=
 github.com/aws/aws-lambda-go v1.13.3/go.mod h1:4UKl9IzQMoD+QF79YdCuzCwp8VbmG4VAQwij/eHl5CU=
+github.com/aws/aws-lambda-go v1.17.0 h1:Ogihmi8BnpmCNktKAGpNwSiILNNING1MiosnKUfU8m0=
+github.com/aws/aws-lambda-go v1.17.0/go.mod h1:FEwgPLE6+8wcGBTe5cJN3JWurd1Ztm9zN4jsXsjzKKw=
 github.com/aws/aws-sdk-go v1.15.78/go.mod h1:E3/ieXAlvM0XWO57iftYVDLLvQ824smPP3ATZkfNZeM=
 github.com/aws/aws-sdk-go v1.17.7/go.mod h1:KmX6BPdI08NWTb3/sm4ZGu5ShLoqVDhKgpiN924inxo=
 github.com/aws/aws-sdk-go v1.22.4/go.mod h1:KmX6BPdI08NWTb3/sm4ZGu5ShLoqVDhKgpiN924inxo=
@@ -1227,6 +1230,7 @@ github.com/ugorji/go/codec v1.1.7 h1:2SvQaVZ1ouYrrKKwoSk2pzd4A9evlKJb9oTL+OaLUSs
 github.com/ugorji/go/codec v1.1.7/go.mod h1:Ax+UKWsSmolVDwsd+7N3ZtXu+yMGCf907BLYF3GoBXY=
 github.com/urfave/cli v1.20.0/go.mod h1:70zkFmudgCuE/ngEzBv17Jvp/497gISqfk5gWijbERA=
 github.com/urfave/cli v1.22.1/go.mod h1:Gos4lmkARVdJ6EkW0WaNv/tZAAMe9V7XWyB60NtXRu0=
+github.com/urfave/cli/v2 v2.1.1/go.mod h1:SE9GqnLQmjVa0iPEY0f1w3ygNIYcIJ0OKPMoW2caLfQ=
 github.com/vektah/gqlparser v1.1.2/go.mod h1:1ycwN7Ij5njmMkPPAOaRFY4rET2Enx7IkVv3vaXspKw=
 github.com/weaveworks/common v0.0.0-20200206153930-760e36ae819a/go.mod h1:6enWAqfQBFrE8X/XdJwZr8IKgh1chStuFR0mjU/UOUw=
 github.com/weaveworks/common v0.0.0-20200625145055-4b1847531bc9 h1:dNVIG9aKQHR9T4uYAC4YxmkHHryOsfTwsL54WrS7u28=

diff --git a/tools/lambda-promtail/Makefile b/tools/lambda-promtail/Makefile
@@ -0,0 +1,15 @@
+UNAME_S := $(shell uname -s)
+LOCAL_PORT ?= 8080
+ifeq ($(UNAME_S),Linux)
+	LOCAL_ENDPOINT=http://localhost:$(LOCAL_PORT)/loki/api/v1/push
+else
+	LOCAL_ENDPOINT=http://host.docker.internal:$(LOCAL_PORT)/loki/api/v1/push
+endif
+
+.PHONY: build dry-run
+
+build:
+	sam build
+
+dry-run:
+	echo $$(sam local generate-event cloudwatch logs) | sam local invoke LambdaPromtailFunction -e - --parameter-overrides PromtailAddress=$(LOCAL_ENDPOINT)
diff --git a/tools/lambda-promtail/README.md b/tools/lambda-promtail/README.md
@@ -0,0 +1,120 @@
+# lambda-promtail
+
+This directory contains the AWS SAM template and source code for lambda-promtail. Below is a brief explanation of its layout:
+
+```bash
+.
+├── Makefile                    <-- Make to automate build
+├── README.md                   <-- This instructions file
+├── lambda-promtail             <-- Source code for a lambda function
+│   └── main.go                 <-- Lambda function code
+└── template.yaml
+```
+
+## Requirements
+
+* AWS CLI already configured with Administrator permission
+* [Docker installed](https://www.docker.com/community-edition)
+* [Golang](https://golang.org)
+* SAM CLI - [Install the SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html)
+
+## Setup process
+
+### Installing dependencies & building the target 
+
+In this example we use the built-in `sam build` to automatically download all the dependencies and package our build target.   
+Read more about [SAM Build here](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-cli-command-reference-sam-build.html) 
+
+The `sam build` command is wrapped inside of the `Makefile`. To execute this simply run
+
+```shell
+make
+```
+
+### Local development
+
+**Invoking function locally**
+
+```bash
+make dry-run
+```
+
+## Packaging and deployment
+
+The AWS Lambda Golang runtime requires a flat folder containing the executable generated by the build step. SAM uses the `CodeUri` property to locate the application:
+
+```bash
+make build
+```
+
+To deploy your application for the first time, first make sure you've set the following parameters in the template:
+- `LogGroup`
+- `PromtailAddress`
+- `ReservedConcurrency`
+
+These can also be set via overrides by passing the following argument to `sam deploy`:
+```
+  --parameter-overrides           Optional. A string that contains AWS
+                                  CloudFormation parameter overrides encoded
+                                  as key=value pairs. For example,
+                                  'ParameterKey=KeyPairName,ParameterValue=MyKey
+                                  ParameterKey=InstanceType,ParameterValue=t1.micro'
+                                  or KeyPairName=MyKey InstanceType=t1.micro
+```
+
+Also, if your deployment requires a VPC configuration, make sure to edit the `VpcConfig` field in the `template.yaml` manually.
+
+Then run the following in your shell:
+
+```bash
+sam deploy --guided --capabilities CAPABILITY_IAM,CAPABILITY_NAMED_IAM --parameter-overrides PromtailAddress=<>,LogGroup=<>
+```
+
+The command will package and deploy your application to AWS, with a series of prompts:
+
+* **Stack Name**: The name of the stack to deploy to CloudFormation. This should be unique to your account and region, and a good starting point would be something matching your project name.
+* **AWS Region**: The AWS region you want to deploy your app to.
+* **Confirm changes before deploy**: If set to yes, any change sets will be shown to you before execution for manual review. If set to no, the AWS SAM CLI will automatically deploy application changes.
+* **Allow SAM CLI IAM role creation**: Many AWS SAM templates, including this example, create AWS IAM roles required for the AWS Lambda function(s) included to access AWS services. By default, these are scoped down to minimum required permissions. To deploy an AWS CloudFormation stack which creates or modifies IAM roles, the `CAPABILITY_IAM` value for `capabilities` must be provided. If permission isn't provided through this prompt, to deploy this example you must explicitly pass `--capabilities CAPABILITY_IAM` to the `sam deploy` command.
+* **Save arguments to samconfig.toml**: If set to yes, your choices will be saved to a configuration file inside the project, so that in the future you can just re-run `sam deploy` without parameters to deploy changes to your application.
+
+# Appendix
+
+### Golang installation
+
+Please ensure Go 1.x (where 'x' is the latest version) is installed as per the instructions on the official golang website: https://golang.org/doc/install
+
+A quick way to get started is to use Homebrew, Chocolatey, or your Linux package manager.
+
+#### Homebrew (Mac)
+
+Issue the following command from the terminal:
+
+```shell
+brew install golang
+```
+
+If it's already installed, run the following command to ensure it's the latest version:
+
+```shell
+brew update
+brew upgrade golang
+```
+
+#### Chocolatey (Windows)
+
+Issue the following command from PowerShell:
+
+```shell
+choco install golang
+```
+
+If it's already installed, run the following command to ensure it's the latest version:
+
+```shell
+choco upgrade golang
+```
+
+## Limitations
+- Error handling: If promtail is unresponsive, `lambda-promtail` will drop logs after `retry_count`, which defaults to 2.
+- AWS does not support passing log lines over 256kb to lambdas.
diff --git a/tools/lambda-promtail/lambda-promtail/main.go b/tools/lambda-promtail/lambda-promtail/main.go
@@ -0,0 +1,102 @@
+package main
+
+import (
+	"bufio"
+	"bytes"
+	"context"
+	"errors"
+	"fmt"
+	"io"
+	"net/http"
+	"net/url"
+	"os"
+
+	"github.com/aws/aws-lambda-go/events"
+	"github.com/aws/aws-lambda-go/lambda"
+	"github.com/cortexproject/cortex/pkg/util"
+	"github.com/gogo/protobuf/proto"
+	"github.com/golang/snappy"
+	"github.com/grafana/loki/pkg/logproto"
+	"github.com/prometheus/common/model"
+)
+
+const (
+	// We use snappy-encoded protobufs over http by default.
+	contentType = "application/x-protobuf"
+
+	maxErrMsgLen = 1024
+)
+
+var promtailAddress *url.URL
+
+func init() {
+	addr := os.Getenv("PROMTAIL_ADDRESS")
+	if addr == "" {
+		panic(errors.New("required environment variable PROMTAIL_ADDRESS not present"))
+	}
+	var err error
+	promtailAddress, err = url.Parse(addr)
+	if err != nil {
+		panic(err)
+	}
+}
+
+func handler(ctx context.Context, ev events.CloudwatchLogsEvent) error {
+
+	data, err := ev.AWSLogs.Parse()
+	if err != nil {
+		return err
+	}
+
+	stream := logproto.Stream{
+		Labels: model.LabelSet{
+			model.LabelName("__aws_cloudwatch_log_group"):  model.LabelValue(data.LogGroup),
+			model.LabelName("__aws_cloudwatch_log_stream"): model.LabelValue(data.LogStream),
+			model.LabelName("__aws_cloudwatch_owner"):      model.LabelValue(data.Owner),
+		}.String(),
+		Entries: make([]logproto.Entry, 0, len(data.LogEvents)),
+	}
+
+	for _, entry := range data.LogEvents {
+		stream.Entries = append(stream.Entries, logproto.Entry{
+			Line: entry.Message,
+			// Forward the source timestamp from Cloudwatch; promtail's use_incoming_timestamp setting decides whether it is kept or replaced at ingestion.
+			Timestamp: util.TimeFromMillis(entry.Timestamp),
+		})
+	}
+
+	buf, err := proto.Marshal(&logproto.PushRequest{
+		Streams: []logproto.Stream{stream},
+	})
+	if err != nil {
+		return err
+	}
+
+	// Push to promtail
+	buf = snappy.Encode(nil, buf)
+	req, err := http.NewRequest("POST", promtailAddress.String(), bytes.NewReader(buf))
+	if err != nil {
+		return err
+	}
+	req.Header.Set("Content-Type", contentType)
+
+	resp, err := http.DefaultClient.Do(req.WithContext(ctx))
+	if err != nil {
+		return err
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode/100 != 2 {
+		scanner := bufio.NewScanner(io.LimitReader(resp.Body, maxErrMsgLen))
+		line := ""
+		if scanner.Scan() {
+			line = scanner.Text()
+		}
+		err = fmt.Errorf("server returned HTTP status %s (%d): %s", resp.Status, resp.StatusCode, line)
+	}
+
+	return err
+}
+
+func main() {
+	lambda.Start(handler)
+}