Skip to content

Latest commit

 

History

History
342 lines (275 loc) · 14.5 KB

component.md

File metadata and controls

342 lines (275 loc) · 14.5 KB

Component Specification

This document specifies Vector Component behavior (source, transforms, and sinks) for the development of Vector.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Introduction

Vector is a highly flexible observability data pipeline due to its directed acyclic graph processing model. Each node in the graph is a Vector Component, and in order to meet our high user experience expectations each Component must adhere to a common set of behavioral rules. This document aims to clearly outline these rules to guide new component development and ongoing maintenance.

Scope

This specification addresses direct component development and does not cover aspects that components inherit "for free". For example, this specification does not cover global context, such as component_id, that all components receive in their telemetry by nature of being a Vector component.

Finally, this document is written from the broad perspective of a Vector component. Unless otherwise stated, a section applies to all component types (sources, transforms, and sinks).

Naming

To align with the logical boundaries of components, component naming MUST follow the following guidelines.

Source and sink naming

  • MUST only contain ASCII alphanumeric, lowercase, and underscores.
  • MUST be a noun named after the protocol or service that the component integrates with.
  • MAY be suffixed with the event type only if the component is specific to that type, logs, metrics, or traces (e.g., kubernetes_logs, apache_metrics).

Transform naming

  • MUST only contain ASCII alphanumeric, lowercase, and underscores.
  • MUST be a verb describing the broad purpose of the transform (e.g., route, sample, delegate).

Configuration

This section extends the Configuration Specification for component specific configuration.

Options

endpoint(s)

When a component makes a connection to a downstream target, it SHOULD expose either an endpoint option that takes a string representing a single endpoint, or an endpoints option that takes an array of strings representing multiple endpoints. If a component uses multiple options to automatically build the endpoint, then the endpoint(s) option MUST override that process.

listen

When a component listens for incoming connections, it SHOULD expose a listen configuration option that takes a string representing an address with <protocol>:<address>.

Options for protocol are:

  • unix+stream, where address should be a file path
  • unix+datagram, where address should be a file path
  • unix, same as unix+stream
  • tcp, where address should be <host>:<port>
  • udp, where address should be <host>:<port>

Components MAY have a default protocol. For example, a statsd component may default the protocol to udp and only require the <host>:<port> to bind to.

Instrumentation

Extends the Instrumentation Specification.

Vector components MUST be instrumented for optimal observability and monitoring.

Events

This section lists all required events that a component MUST emit. Additional events are listed that a component is RECOMMENDED to emit, but remain OPTIONAL. It is expected that components will emit custom events beyond those listed here that reflect component specific behavior. There is leeway in the implementation of these events:

  • Events MAY be augmented with additional component-specific context. For example, the socket source adds a mode attribute as additional context.
  • The naming of the events MAY deviate to satisfy implementation. For example, the socket source may rename the EventReceived event to SocketEventReceived to add additional socket specific context.
  • Components MAY emit events for batches of Vector events for performance reasons, but the resulting telemetry state MUST be equivalent to emitting individual events. For example, emitting the EventsReceived event for 10 events MUST increment the component_received_events_total counter by 10.

ComponentEventsReceived

All components MUST emit a ComponentEventsReceived event that represents the reception of Vector events from an upstream component.

  • Emission
    • MUST emit immediately after creating or receiving Vector events, before modification or metadata is added.
  • Properties
    • count - The count of Vector events.
    • byte_size - The estimated JSON byte size of all events received.
  • Metrics
    • MUST increment the component_received_events_total counter by the defined quantity property with the other properties as metric tags.
    • MUST increment the component_received_event_bytes_total counter by the defined byte_size property with the other properties as metric tags.
  • Logs
    • MUST log a Events received. message at the trace level with the defined properties as key-value pairs.
    • MUST NOT be rate limited.

ComponentBytesReceived

Sources MUST emit a ComponentBytesReceived event that represent the reception of bytes.

  • Emission
    • MUST emit immediately after receiving, decompressing and filtering bytes from the upstream source and before the creation of a Vector event.
  • Properties
    • byte_size
      • For UDP, TCP, and Unix protocols, the total number of bytes received from the socket excluding the delimiter.
      • For HTTP-based protocols, the total number of bytes in the HTTP body, as represented by the Content-Length header.
      • For files, the total number of bytes read from the file excluding the delimiter.
    • protocol - The protocol used to send the bytes (i.e., tcp, udp, unix, http, https, file, etc.).
    • http_path - If relevant, the HTTP path, excluding query strings.
    • socket - If relevant, the socket number that bytes were received from.
  • Metrics
    • MUST increment the component_received_bytes_total counter by the defined value with the defined properties as metric tags.
  • Logs
    • MUST log a Bytes received. message at the trace level with the defined properties as key-value pairs.
    • MUST NOT be rate limited.

ComponentBytesSent

Sinks MUST emit a ComponentBytesSent event that represent the transmission of bytes.

  • Emission
    • MUST emit a ComponentBytesSent event immediately after sending bytes to the downstream target, if the transmission was successful. The reported bytes MUST be before compression.
    • Note that sinks that simply expose data, but don't delete the data after sending it, like the prometheus_exporter sink, SHOULD NOT emit this metric.
  • Properties
    • byte_size
      • For UDP, TCP, and Unix protocols, the total number of bytes placed on the socket excluding the delimiter.
      • For HTTP-based protocols, the total number of bytes in the HTTP body, as represented by the Content-Length header.
      • For files, the total number of bytes written to the file excluding the delimiter.
    • protocol - The protocol used to send the bytes (i.e., tcp, udp, unix, http, https, file, etc.).
    • endpoint - If relevant, the endpoint that the bytes were sent to. For HTTP, this MUST be the host and path only, excluding the query string.
    • file - If relevant, the absolute path of the file.
  • Metrics
    • MUST increment the component_sent_bytes_total counter by the defined value with the defined properties as metric tags.
  • Logs
    • MUST log a Bytes sent. message at the trace level with the defined properties as key-value pairs.
    • MUST NOT be rate limited.

ComponentEventsSent

All components MUST emit an ComponentEventsSent event that represents the emission of Vector events to the next downstream component(s).

  • Emission
    • MUST emit immediately after successful transmission of Vector events. MUST NOT emit if the transmission was unsuccessful.
    • MUST NOT emit for pull-based sinks since they do not send events. For example, the prometheus_exporter sink MUST NOT emit this event.
  • Properties
    • count - The count of Vector events.
    • byte_size - The estimated JSON byte size of all events sent.
    • output - OPTIONAL, for components that can use multiple outputs, the name of the output that events were sent to. For events sent to the default output, this value MUST be _default.
  • Metrics
    • MUST increment the component_sent_events_total counter by the defined quantity property with the other properties as metric tags.
    • MUST increment the component_sent_event_bytes_total counter by the defined byte_size property with the other properties as metric tags.
  • Logs
    • MUST log a Events sent. message at the trace level with the defined properties as key-value pairs.
    • MUST NOT be rate limited.

ComponentError

Extends the Error event.

All components MUST emit error events in accordance with the Error event requirements.

This specification does not list a standard set of errors that components must implement since errors are specific to the component.

ComponentEventsDropped

Extends the EventsDropped event.

All components that can drop events MUST emit a ComponentEventsDropped event in accordance with the EventsDropped event requirements.

SinkNetworkBytesSent

(to be implemented)

Sinks MUST emit a SinkNetworkBytesSent that represents the egress of raw network bytes.

  • Emission
    • MUST emit immediately after egress of raw network bytes regardless of whether the transmission was successful or not.
      • This includes pull-based sinks, such as the prometheus_exporter sink, and SHOULD reflect the bytes sent to the client when requested (pulled).
    • MUST emit after processing of the bytes (encryption, compression, filtering, etc.)
  • Properties
    • byte_size - The number of raw network bytes sent after processing.
      • SHOULD be the closest representation possible of raw network bytes based on the sink's capabilities. For example, if the sink uses an HTTP client that does not provide access to the total request byte size, then the sink should use the byte size of the payload/body.
  • Metrics
    • MUST increment the component_sent_network_bytes_total counter by the defined value with the defined properties as metric tags.
  • Logs
    • MUST log a Network bytes sent. message at the trace level with the defined properties as key-value pairs.
    • MUST NOT be rate limited.

SourceNetworkBytesReceived

(to be implemented)

Sources MUST emit a SourceNetworkBytesReceived event that represents the ingress of raw network bytes.

  • Emission
    • MUST emit immediately after ingress of raw network bytes.
    • MUST emit before processing of the bytes (decryption, decompression, filtering, etc.).
      • This includes pull-based sources that issue requests to ingest bytes.
  • Properties
    • byte_size - The number of raw network bytes received before processing (decryption, decompression, filtering, etc.).
      • SHOULD be the closest representation possible of raw network bytes based on the source's capabilities. For example, if the source uses an HTTP client that only provides access to the request body, then the raw request body bytes should be used.
  • Metrics
    • MUST increment the component_received_network_bytes_total counter by the defined value with the defined properties as metric tags.
  • Logs
    • MUST log a Network bytes received. message at the trace level with the defined properties as key-value pairs.
    • MUST NOT be rate limited.

Sink Operational Requirements

Health checks

All sink components SHOULD define a health check. These checks are executed at boot and as part of vector validate. This health check SHOULD, as closely as possible, emulate the sink's normal operation to give the best possible signal that Vector is configured correctly.

These checks SHOULD NOT query the health of external systems, but MAY fail due to external system being unhealthy. For example, a health check for the aws_s3 sink might fail if AWS is unhealthy, but the check itself should not query for AWS's status.

See the development documentation for more context guidance.

Finalization

All sink components MUST defer finalization of events until after those events have been delivered. This finalization controls when the events are removed from any source disk buffer. To do this, the sink must extract the finalizers from events before they are delivered and ensure they are not dropped until after delivery is completed.

Acknowledgements

Further to the above, all sink components MUST support acknowledgements. This requires both a configuration option named acknowledgements conforming to the AcknowledgementsConfig type, as well as updating the status of all finalizers deferred above after delivery of the events is completed. This update is automatically handled for all sinks that use the newer StreamSink framework. Additionally, unit tests for the sink SHOULD ensure through unit tests that delivered batches have their status updated properly for both normal delivery and delivery errors.