Commit 11b598a

Add reroute processor (#76511)
1 parent f13f77b commit 11b598a

9 files changed

+905
-0
lines changed

docs/changelog/76511.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 76511
+summary: Add `reroute` processor
+area: Ingest Node
+type: enhancement
+issues: []

docs/reference/ingest/processors.asciidoc

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@ include::processors/redact.asciidoc[]
 include::processors/registered-domain.asciidoc[]
 include::processors/remove.asciidoc[]
 include::processors/rename.asciidoc[]
+include::processors/reroute.asciidoc[]
 include::processors/script.asciidoc[]
 include::processors/set.asciidoc[]
 include::processors/set-security-user.asciidoc[]

docs/reference/ingest/processors/reroute.asciidoc

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+[[reroute-processor]]
+=== Reroute processor
+++++
+<titleabbrev>Reroute</titleabbrev>
+++++
+
+experimental::[]
+
+The `reroute` processor allows you to route a document to another target index or data stream.
+It has two main modes:
+
+When setting the `destination` option, the target is explicitly specified and the `dataset` and `namespace` options can't be set.
+
+When the `destination` option is not set, this processor is in a data stream mode.
+Note that in this mode, the `reroute` processor can only be used on data streams that follow the {fleet-guide}/data-streams.html#data-streams-naming-scheme[data stream naming scheme].
+Trying to use this processor on a data stream with a non-compliant name will raise an exception.
+
+The name of a data stream consists of three parts: `<type>-<dataset>-<namespace>`.
+See the {fleet-guide}/data-streams.html#data-streams-naming-scheme[data stream naming scheme] documentation for more details.
+
+This processor can use either static values or field references from the document to determine the `dataset` and `namespace` components of the new target.
+See <<reroute-options>> for more details.
+
+NOTE: It's not possible to change the `type` of the data stream with the `reroute` processor.
+
+After a `reroute` processor has been executed, all the other processors of the current pipeline are skipped, including the final pipeline.
+If the current pipeline is executed in the context of a <<pipeline-processor>>, the calling pipeline will be skipped, too.
+This means that at most one `reroute` processor is ever executed within a pipeline,
+which lets you define mutually exclusive routing conditions,
+similar to an if, else-if, else-if, … condition.
+
+The `reroute` processor ensures that the `data_stream.<type|dataset|namespace>` fields are set according to the new target.
+If the document contains an `event.dataset` value, it will be updated to reflect the same value as `data_stream.dataset`.
+
+Note that the client needs to have permissions for the final target.
+Otherwise, the document will be rejected with a security exception which looks like this:
+
+[source,js]
+--------------------------------------------------
+{"type":"security_exception","reason":"action [indices:admin/auto_create] is unauthorized for API key id [8-dt9H8BqGblnY2uSI--] of user [elastic/fleet-server] on indices [logs-foo-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]"}
+--------------------------------------------------
+// NOTCONSOLE
+
+[[reroute-options]]
+.Reroute options
+[options="header"]
+|======
+| Name | Required | Default | Description
+| `destination` | no | - | A static value for the target. Can't be set when the `dataset` or `namespace` option is set.
+| `dataset` | no | `{{data_stream.dataset}}` a| Field references or a static value for the dataset part of the data stream name. In addition to the criteria for <<indices-create-api-path-params, index names>>, cannot contain `-` and must be no longer than 100 characters. Example values are `nginx.access` and `nginx.error`.
+
+Supports field references with a mustache-like syntax (denoted as `{{double}}` or `{{{triple}}}` curly braces). When resolving field references, the processor replaces invalid characters with `_`. Uses the `<dataset>` part of the index name as a fallback if all field references resolve to a `null`, missing, or non-string value.
+| `namespace` | no | `{{data_stream.namespace}}` a| Field references or a static value for the namespace part of the data stream name. See the criteria for <<indices-create-api-path-params, index names>> for allowed characters. Must be no longer than 100 characters.
+
+Supports field references with a mustache-like syntax (denoted as `{{double}}` or `{{{triple}}}` curly braces). When resolving field references, the processor replaces invalid characters with `_`. Uses the `<namespace>` part of the index name as a fallback if all field references resolve to a `null`, missing, or non-string value.
+include::common-options.asciidoc[]
+|======
+
+The `if` option can be used to define the condition under which the document should be rerouted to a new target.
+
+[source,js]
+--------------------------------------------------
+{
+  "reroute": {
+    "tag": "nginx",
+    "if" : "ctx?.log?.file?.path?.contains('nginx')",
+    "dataset": "nginx"
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE
+
+The `dataset` and `namespace` options can contain either a single value or a list of values that are used as a fallback.
+If a field reference evaluates to `null` or is not present in the document, the next value or field reference is used.
+If a field reference evaluates to a non-`String` value, the processor fails.
+
+In the following example, the processor would first try to resolve the value for the `service.name` field to determine the value for `dataset`.
+If that field resolves to `null` or is missing, it would try the next element in the list.
+In this case, this is the static value `"generic"`.
+The `namespace` option is configured with just a single static value.
+
+[source,js]
+--------------------------------------------------
+{
+  "reroute": {
+    "dataset": [
+      "{{service.name}}",
+      "generic"
+    ],
+    "namespace": "default"
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE
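
The data stream mode documented in this file can be illustrated with a small standalone sketch. It is not the processor itself and uses no Elasticsearch classes: `RerouteTargetSketch`, `firstNonNull`, and `resolveTarget` are hypothetical names, the candidate lists are assumed to hold already-resolved values (with `null` standing in for a field reference that is missing or `null`), and sanitization of invalid characters is left out.

```java
import java.util.Arrays;
import java.util.List;

class RerouteTargetSketch {

    // Returns the first non-null candidate, or the value parsed from the current index name.
    static String firstNonNull(List<String> candidates, String fallback) {
        for (String candidate : candidates) {
            if (candidate != null) {
                return candidate;
            }
        }
        return fallback;
    }

    // Builds <type>-<dataset>-<namespace> for a document that currently targets currentIndex.
    static String resolveTarget(String currentIndex, List<String> datasetCandidates, List<String> namespaceCandidates) {
        int firstDash = currentIndex.indexOf('-');
        int secondDash = firstDash < 0 ? -1 : currentIndex.indexOf('-', firstDash + 1);
        if (secondDash < 0) {
            // the name does not follow the <type>-<dataset>-<namespace> scheme
            throw new IllegalArgumentException("invalid data stream name: [" + currentIndex + "]");
        }
        String type = currentIndex.substring(0, firstDash); // the type is never changed
        String currentDataset = currentIndex.substring(firstDash + 1, secondDash);
        String currentNamespace = currentIndex.substring(secondDash + 1);
        String dataset = firstNonNull(datasetCandidates, currentDataset);
        String namespace = firstNonNull(namespaceCandidates, currentNamespace);
        return type + "-" + dataset + "-" + namespace;
    }

    public static void main(String[] args) {
        // "{{service.name}}" did not resolve (null), so the static value "generic" is used
        System.out.println(resolveTarget("logs-nginx-prod", Arrays.asList(null, "generic"), List.of("default")));
        // prints "logs-generic-default"
    }
}
```

If every candidate is `null`, the sketch falls back to the dataset and namespace parsed from the current index name, so the document keeps its original target.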

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/IngestCommonPlugin.java

Lines changed: 1 addition & 0 deletions
@@ -79,6 +79,7 @@ public Map<String, Processor.Factory> getProcessors(Processor.Parameters paramet
             entry(RegisteredDomainProcessor.TYPE, new RegisteredDomainProcessor.Factory()),
             entry(RemoveProcessor.TYPE, new RemoveProcessor.Factory(parameters.scriptService)),
             entry(RenameProcessor.TYPE, new RenameProcessor.Factory(parameters.scriptService)),
+            entry(RerouteProcessor.TYPE, new RerouteProcessor.Factory()),
             entry(ScriptProcessor.TYPE, new ScriptProcessor.Factory(parameters.scriptService)),
             entry(SetProcessor.TYPE, new SetProcessor.Factory(parameters.scriptService)),
             entry(SortProcessor.TYPE, new SortProcessor.Factory()),

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java

Lines changed: 262 additions & 0 deletions
@@ -0,0 +1,262 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0 and the Server Side Public License, v 1; you may not use this file except
+ * in compliance with, at your election, the Elastic License 2.0 or the Server
+ * Side Public License, v 1.
+ */
+
+package org.elasticsearch.ingest.common;
+
+import org.elasticsearch.core.Nullable;
+import org.elasticsearch.ingest.AbstractProcessor;
+import org.elasticsearch.ingest.ConfigurationUtils;
+import org.elasticsearch.ingest.IngestDocument;
+import org.elasticsearch.ingest.Processor;
+
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.regex.Pattern;
+
+import static org.elasticsearch.core.Strings.format;
+import static org.elasticsearch.ingest.ConfigurationUtils.newConfigurationException;
+import static org.elasticsearch.ingest.common.RerouteProcessor.DataStreamValueSource.DATASET_VALUE_SOURCE;
+import static org.elasticsearch.ingest.common.RerouteProcessor.DataStreamValueSource.NAMESPACE_VALUE_SOURCE;
+
+public final class RerouteProcessor extends AbstractProcessor {
+
+    public static final String TYPE = "reroute";
+
+    private static final String NAMING_SCHEME_ERROR_MESSAGE =
+        "invalid data stream name: [%s]; must follow naming scheme <type>-<dataset>-<namespace>";
+
+    private static final String DATA_STREAM_PREFIX = "data_stream.";
+    private static final String DATA_STREAM_TYPE = DATA_STREAM_PREFIX + "type";
+    private static final String DATA_STREAM_DATASET = DATA_STREAM_PREFIX + "dataset";
+    private static final String DATA_STREAM_NAMESPACE = DATA_STREAM_PREFIX + "namespace";
+    private static final String EVENT_DATASET = "event.dataset";
+    private final List<DataStreamValueSource> dataset;
+    private final List<DataStreamValueSource> namespace;
+    private final String destination;
+
+    RerouteProcessor(
+        String tag,
+        String description,
+        List<DataStreamValueSource> dataset,
+        List<DataStreamValueSource> namespace,
+        String destination
+    ) {
+        super(tag, description);
+        if (dataset.isEmpty()) {
+            this.dataset = List.of(DATASET_VALUE_SOURCE);
+        } else {
+            this.dataset = dataset;
+        }
+        if (namespace.isEmpty()) {
+            this.namespace = List.of(NAMESPACE_VALUE_SOURCE);
+        } else {
+            this.namespace = namespace;
+        }
+        this.destination = destination;
+    }
+
+    @Override
+    public IngestDocument execute(IngestDocument ingestDocument) throws Exception {
+        if (destination != null) {
+            ingestDocument.reroute(destination);
+            return ingestDocument;
+        }
+        final String indexName = ingestDocument.getFieldValue(IngestDocument.Metadata.INDEX.getFieldName(), String.class);
+        final String type;
+        final String currentDataset;
+        final String currentNamespace;
+
+        // parse out the <type>-<dataset>-<namespace> components from _index
+        int indexOfFirstDash = indexName.indexOf('-');
+        if (indexOfFirstDash < 0) {
+            throw new IllegalArgumentException(format(NAMING_SCHEME_ERROR_MESSAGE, indexName));
+        }
+        int indexOfSecondDash = indexName.indexOf('-', indexOfFirstDash + 1);
+        if (indexOfSecondDash < 0) {
+            throw new IllegalArgumentException(format(NAMING_SCHEME_ERROR_MESSAGE, indexName));
+        }
+        type = parseDataStreamType(indexName, indexOfFirstDash);
+        currentDataset = parseDataStreamDataset(indexName, indexOfFirstDash, indexOfSecondDash);
+        currentNamespace = parseDataStreamNamespace(indexName, indexOfSecondDash);
+
+        String dataset = determineDataStreamField(ingestDocument, this.dataset, currentDataset);
+        String namespace = determineDataStreamField(ingestDocument, this.namespace, currentNamespace);
+        String newTarget = type + "-" + dataset + "-" + namespace;
+        ingestDocument.reroute(newTarget);
+        ingestDocument.setFieldValue(DATA_STREAM_TYPE, type);
+        ingestDocument.setFieldValue(DATA_STREAM_DATASET, dataset);
+        ingestDocument.setFieldValue(DATA_STREAM_NAMESPACE, namespace);
+        if (ingestDocument.hasField(EVENT_DATASET)) {
+            // ECS specifies that "event.dataset should have the same value as data_stream.dataset"
+            // not eagerly set event.dataset but only if the doc contains it already to ensure it's consistent with data_stream.dataset
+            ingestDocument.setFieldValue(EVENT_DATASET, dataset);
+        }
+        return ingestDocument;
+    }
+
+    private static String parseDataStreamType(String dataStreamName, int indexOfFirstDash) {
+        return dataStreamName.substring(0, indexOfFirstDash);
+    }
+
+    private static String parseDataStreamDataset(String dataStreamName, int indexOfFirstDash, int indexOfSecondDash) {
+        return dataStreamName.substring(indexOfFirstDash + 1, indexOfSecondDash);
+    }
+
+    private static String parseDataStreamNamespace(String dataStreamName, int indexOfSecondDash) {
+        return dataStreamName.substring(indexOfSecondDash + 1);
+    }
+
+    private String determineDataStreamField(
+        IngestDocument ingestDocument,
+        List<DataStreamValueSource> valueSources,
+        String fallbackFromCurrentTarget
+    ) {
+        // first try to get value from the configured dataset/namespace field references
+        // if this contains a static value rather than a field reference, this is guaranteed to return
+        for (DataStreamValueSource value : valueSources) {
+            String result = value.resolve(ingestDocument);
+            if (result != null) {
+                return result;
+            }
+        }
+        // use the dataset/namespace value we parsed out from the current target (_index) as a fallback
+        return fallbackFromCurrentTarget;
+    }
+
+    @Override
+    public String getType() {
+        return TYPE;
+    }
+
+    List<DataStreamValueSource> getDataStreamDataset() {
+        return dataset;
+    }
+
+    List<DataStreamValueSource> getDataStreamNamespace() {
+        return namespace;
+    }
+
+    String getDestination() {
+        return destination;
+    }
+
+    public static final class Factory implements Processor.Factory {
+
+        @Override
+        public RerouteProcessor create(
+            Map<String, Processor.Factory> processorFactories,
+            String tag,
+            String description,
+            Map<String, Object> config
+        ) throws Exception {
+            List<DataStreamValueSource> dataset;
+            try {
+                dataset = ConfigurationUtils.readOptionalListOrString(TYPE, tag, config, "dataset")
+                    .stream()
+                    .map(DataStreamValueSource::dataset)
+                    .toList();
+            } catch (IllegalArgumentException e) {
+                throw newConfigurationException(TYPE, tag, "dataset", e.getMessage());
+            }
+            List<DataStreamValueSource> namespace;
+            try {
+                namespace = ConfigurationUtils.readOptionalListOrString(TYPE, tag, config, "namespace")
+                    .stream()
+                    .map(DataStreamValueSource::namespace)
+                    .toList();
+            } catch (IllegalArgumentException e) {
+                throw newConfigurationException(TYPE, tag, "namespace", e.getMessage());
+            }
+
+            String destination = ConfigurationUtils.readOptionalStringProperty(TYPE, tag, config, "destination");
+            if (destination != null && (dataset.isEmpty() == false || namespace.isEmpty() == false)) {
+                throw newConfigurationException(TYPE, tag, "destination", "can only be set if dataset and namespace are not set");
+            }
+
+            return new RerouteProcessor(tag, description, dataset, namespace, destination);
+        }
+    }
+
+    /**
+     * Contains either a {{field reference}} or a static value for a dataset or a namespace field
+     */
+    static final class DataStreamValueSource {
+
+        private static final int MAX_LENGTH = 100;
+        private static final String REPLACEMENT = "_";
+        private static final Pattern DISALLOWED_IN_DATASET = Pattern.compile("[\\\\/*?\"<>| ,#:-]");
+        private static final Pattern DISALLOWED_IN_NAMESPACE = Pattern.compile("[\\\\/*?\"<>| ,#:]");
+        static final DataStreamValueSource DATASET_VALUE_SOURCE = dataset("{{" + DATA_STREAM_DATASET + "}}");
+        static final DataStreamValueSource NAMESPACE_VALUE_SOURCE = namespace("{{" + DATA_STREAM_NAMESPACE + "}}");
+
+        private final String value;
+        private final String fieldReference;
+        private final Function<String, String> sanitizer;
+
+        public static DataStreamValueSource dataset(String dataset) {
+            return new DataStreamValueSource(dataset, ds -> sanitizeDataStreamField(ds, DISALLOWED_IN_DATASET));
+        }
+
+        public static DataStreamValueSource namespace(String namespace) {
+            return new DataStreamValueSource(namespace, nsp -> sanitizeDataStreamField(nsp, DISALLOWED_IN_NAMESPACE));
+        }
+
+        private static String sanitizeDataStreamField(String s, Pattern disallowedInDataset) {
+            if (s == null) {
+                return null;
+            }
+            s = s.toLowerCase(Locale.ROOT);
+            s = s.substring(0, Math.min(s.length(), MAX_LENGTH));
+            return disallowedInDataset.matcher(s).replaceAll(REPLACEMENT);
+        }
+
+        private DataStreamValueSource(String value, Function<String, String> sanitizer) {
+            this.sanitizer = sanitizer;
+            this.value = value;
+            if (value.contains("{{") || value.contains("}}")) {
+                if (value.startsWith("{{") == false || value.endsWith("}}") == false) {
+                    throw new IllegalArgumentException("'" + value + "' is not a valid field reference");
+                }
+                String fieldReference = value.substring(2, value.length() - 2);
+                // field references may have two or three curly braces
+                if (fieldReference.startsWith("{") && fieldReference.endsWith("}")) {
+                    fieldReference = fieldReference.substring(1, fieldReference.length() - 1);
+                }
+                // only a single field reference is allowed
+                // so something like this is disallowed: {{foo}}-{{bar}}
+                if (fieldReference.contains("{") || fieldReference.contains("}")) {
+                    throw new IllegalArgumentException("'" + value + "' is not a valid field reference");
+                }
+                this.fieldReference = fieldReference;
+            } else {
+                this.fieldReference = null;
+                if (Objects.equals(sanitizer.apply(value), value) == false) {
+                    throw new IllegalArgumentException("'" + value + "' contains disallowed characters");
+                }
+            }
+        }
+
+        /**
+         * Resolves the field reference from the provided ingest document or returns the static value if this value source doesn't represent
+         * a field reference.
+         * @param ingestDocument
+         * @return the resolved field reference or static value
+         */
+        @Nullable
+        public String resolve(IngestDocument ingestDocument) {
+            if (fieldReference != null) {
+                return sanitizer.apply(ingestDocument.getFieldValue(fieldReference, String.class, true));
+            } else {
+                return value;
+            }
+        }
+    }
+}
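
To see what `DataStreamValueSource` does to concrete inputs, the sanitizing and field-reference parsing steps can be copied into a standalone sketch. `DataStreamValueSketch`, `sanitize`, and `fieldReference` are hypothetical names used only for illustration; the regular expressions and the 100-character limit are taken from the diff above, and nothing here depends on Elasticsearch classes.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class DataStreamValueSketch {

    static final int MAX_LENGTH = 100;
    // Same character classes as DISALLOWED_IN_DATASET and DISALLOWED_IN_NAMESPACE in the diff;
    // the only difference is that '-' is also disallowed in a dataset.
    static final Pattern DISALLOWED_IN_DATASET = Pattern.compile("[\\\\/*?\"<>| ,#:-]");
    static final Pattern DISALLOWED_IN_NAMESPACE = Pattern.compile("[\\\\/*?\"<>| ,#:]");

    // Lowercases, truncates to MAX_LENGTH, then replaces each disallowed character with '_'.
    static String sanitize(String s, Pattern disallowed) {
        if (s == null) {
            return null;
        }
        s = s.toLowerCase(Locale.ROOT);
        s = s.substring(0, Math.min(s.length(), MAX_LENGTH));
        return disallowed.matcher(s).replaceAll("_");
    }

    // Returns the field name for "{{field}}" or "{{{field}}}", or null for a static value.
    static String fieldReference(String value) {
        if (value.contains("{{") == false && value.contains("}}") == false) {
            return null; // static value
        }
        if (value.startsWith("{{") == false || value.endsWith("}}") == false) {
            throw new IllegalArgumentException("'" + value + "' is not a valid field reference");
        }
        String reference = value.substring(2, value.length() - 2);
        if (reference.startsWith("{") && reference.endsWith("}")) {
            reference = reference.substring(1, reference.length() - 1); // triple braces
        }
        if (reference.contains("{") || reference.contains("}")) {
            // more than one reference, such as {{foo}}-{{bar}}
            throw new IllegalArgumentException("'" + value + "' is not a valid field reference");
        }
        return reference;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("Nginx-Access Log", DISALLOWED_IN_DATASET));   // nginx_access_log
        System.out.println(sanitize("Nginx-Access Log", DISALLOWED_IN_NAMESPACE)); // nginx-access_log
        System.out.println(fieldReference("{{{service.name}}}"));                  // service.name
    }
}
```

Because truncation happens before replacement and each disallowed character maps to a single `_`, a sanitized value never exceeds 100 characters.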
